[bitbake-devel] [PATCH] codeparser: Switch to sha256 from md5

Richard Purdie <richard.purdie at linuxfoundation.org>
Mon Dec 17 21:33:46 UTC 2018


On Mon, 2018-12-17 at 19:47 +0100, Jacob Kroon wrote:
> On Mon, Dec 17, 2018 at 3:40 PM Richard Purdie
> <richard.purdie at linuxfoundation.org> wrote:
> > We've had reports of hash collisions with codeparser. Looking at the way
> > collision problems occur with md5 and the way our function templating works,
> > I can believe we may run into issues.
> > 
> > This patch therefore switches to sha256.
> > 
> > Performance-wise, parse time appears to rise by 4s in 374s.
> > 
> > Before:
> > 
> > 384329 in 2.966s (md5)
> > 
> > After:
> > 
> > 349743 in 2.340s (sha256)
> > 34723 in 1.245s (md5)
> > 
> > since we still have md5 used elsewhere in the code, something we should look at
> > next (using sha256 everywhere is around 5.3s in total)
> > 
> > Unfortunately this does nearly double the size of the codeparser cache file
> > due to the hash size change.
> > 
> > Signed-off-by: Richard Purdie <richard.purdie at linuxfoundation.org>
> > ---
> >  lib/bb/codeparser.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/bb/codeparser.py b/lib/bb/codeparser.py
> > index 3f8ac1d5f6..ac995a6a1d 100644
> > --- a/lib/bb/codeparser.py
> > +++ b/lib/bb/codeparser.py
> > @@ -33,7 +33,7 @@ from bb.cache import MultiProcessCache
> >  logger = logging.getLogger('BitBake.CodeParser')
> > 
> >  def bbhash(s):
> > -    return hashlib.md5(s.encode("utf-8")).hexdigest()
> > +    return hashlib.sha256(s.encode("utf-8")).hexdigest()
> > 
> >  def check_indent(codestr):
> >      """If the code is indented, add a top level piece of code to 'remove' the indentation"""
> > @@ -140,7 +140,7 @@ class CodeParserCache(MultiProcessCache):
> >      # so that an existing cache gets invalidated. Additionally you'll need
> >      # to increment __cache_version__ in cache.py in order to ensure that old
> >      # recipe caches don't trigger "Taskhash mismatch" errors.
> > -    CACHE_VERSION = 10
> > +    CACHE_VERSION = 11
> 
> The comment above mentions a "__cache_version__" in cache.py. Should
> that be bumped as well?

Good question, but in this case no. We didn't change anything about the
form of the data, just the way it's represented internally in the cache.
The other code will see the same results before and after the change, so
__cache_version__ in cache.py doesn't have to be bumped. It wouldn't hurt
either; we just don't need to.
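
For anyone wondering where the cache size increase mentioned in the
commit message comes from, here is a minimal standalone sketch. It is
not code from the patch: bbhash_md5 and bbhash_sha256 are names made up
for the comparison (the real function is just bbhash), and the sample
string is a hypothetical shell function body.

```python
import hashlib

def bbhash_md5(s):
    # Behaviour of bbhash() before the patch (hypothetical name).
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def bbhash_sha256(s):
    # Behaviour of bbhash() after the patch (hypothetical name).
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

# Hypothetical function body, standing in for what codeparser would hash.
sample = "do_compile() {\n    oe_runmake\n}\n"

old_key = bbhash_md5(sample)
new_key = bbhash_sha256(sample)

# md5 hex digests are 32 characters, sha256 hex digests are 64.
print(len(old_key), len(new_key))
```

The keys double in length while the data stored against them keeps the
same form, which is consistent with the cache file nearly doubling in
size and with only CACHE_VERSION in codeparser.py needing a bump.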

Cheers,

Richard


