[bitbake-devel] [RFC][WIP][PATCHv1] lib/bb/checksum.py: Speed-up checksum gen when directory is git

Anibal Limon anibal.limon at linaro.org
Wed Nov 13 19:44:03 UTC 2019


On Tue, 5 Nov 2019 at 22:50, Nicolas Dechesne <nicolas.dechesne at linaro.org>
wrote:

> hi,
>
> On Tue, Oct 8, 2019 at 8:42 PM Aníbal Limón <anibal.limon at linaro.org>
> wrote:
> >
> > Some people and organizations use a SRC_URI of the form
> > file:///PATH_TO_DIR that points at a git repository, for various
> > reasons; it is useful when you want to do an internal build without
> > cloning sources from outside.
> >
> > This can consume a lot of CPU time because the current taskhash
> > generation mechanism does not recognize that the directory is under
> > version control (git, svn, cvs) and computes a checksum for every
> > file, including, in this case, the contents of the .git directory.
> >
> > There are different ways to improve the situation:
> >
> > * Add protocol=gitscm to the file:// SRC_URI. However, the taskhash
> >   is calculated before the fetcher identifies the protocol, so this
> >   would require some changes in the bitbake codebase.
> > * This patch: when the directory is a git repository (contains .git),
> >   use the HEAD revision plus git diff to calculate the checksum
> >   instead of checksumming every file. This is hackish because it
> >   makes some assumptions about the contents of the .git directory.
> > * Variant of this patch: make a list of VCS directories (.git, .svn,
> >   .cvs) and leave them out of the checksum calculation. As before,
> >   this makes assumptions about the contents of those dot directories.
>
> I've discussed with Khem and Richard today (@ELCE) about this patch.
> We more or less agreed that the current approach is not really good,
> since the local fetcher isn't supposed to 'deal' with SCM commands.
> However, we agreed that the last variant proposed above might be a much
> better approach. The idea would be to exclude VCS directories from the
> checksum computation. It could potentially be extended in a slightly
> more generic way, for example by using a variable to specify a list of
> directories to exclude, which could be unset by default and set in
> OE-Core as
> BB_LOCAL_DIRS_EXCLUDE = ".git .cvs .svn"
>
> Anibal: do you think you can give it a try?
>

Sounds good. Will this new variable be used only to exclude directories
from the checksum? If so, maybe it should get a more specific name, such
as BB_LOCAL_DIRS_TASKHASH_EXCLUDE.
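
To make the idea concrete, here is a rough, standalone sketch of a
directory checksum that skips excluded directories. It is not the
bitbake implementation: EXCLUDE_DIRS, checksum_file and the exclude
parameter are made up for illustration, and the real code would read
the list from the datastore variable instead of a module constant.

```python
import hashlib
import os

# Hypothetical default; in the proposal this would come from a variable
# such as BB_LOCAL_DIRS_TASKHASH_EXCLUDE.
EXCLUDE_DIRS = ".git .cvs .svn"


def checksum_file(path):
    """Return the md5 hex digest of one file, read in chunks."""
    m = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            m.update(chunk)
    return m.hexdigest()


def checksum_dir(pth, exclude=EXCLUDE_DIRS):
    """Checksum every file under pth, skipping excluded directory names."""
    skip = set(exclude.split())
    results = []
    for root, dirs, files in os.walk(pth):
        # Prune in place so os.walk never descends into excluded dirs;
        # sorting keeps the result order stable between runs.
        dirs[:] = sorted(d for d in dirs if d not in skip)
        for name in sorted(files):
            full = os.path.join(root, name)
            results.append((full, checksum_file(full)))
    return results
```

With an empty exclude string nothing is skipped, which would match the
"unset by default" behaviour suggested above.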

Regards,
Anibal


>
>
> >
> > Signed-off-by: Aníbal Limón <anibal.limon at linaro.org>
> > ---
> >  lib/bb/checksum.py | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/lib/bb/checksum.py b/lib/bb/checksum.py
> > index 5bc8a8fc..ee125cb5 100644
> > --- a/lib/bb/checksum.py
> > +++ b/lib/bb/checksum.py
> > @@ -86,6 +86,19 @@ class FileChecksumCache(MultiProcessCache):
> >              return checksum
> >
> >          def checksum_dir(pth):
> > +            git_dir = os.path.join(pth, '.git')
> > +            if os.path.exists(git_dir):
> > +                import subprocess, hashlib
> > +                m = hashlib.md5()
> > +                head = subprocess.check_output("cd %s && git rev-parse HEAD" % pth, shell=True)
> > +                diff = subprocess.check_output("cd %s && git diff" % pth, shell=True)
> > +                m.update(head)
> > +                if diff:
> > +                    m.update(diff)
> > +
> > +                return [(pth, m.hexdigest())]
> > +
> > +
> >              # Handle directories recursively
> >              if pth == "/":
> >                  bb.fatal("Refusing to checksum /")
> > --
> > 2.23.0
> >
>
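
For reference, the HEAD rev + git diff idea from the quoted patch can
also be written without shell=True, by passing an argument list and
cwd= to subprocess.check_output. This is only a sketch of that variant,
not a replacement patch: git_dir_checksum and its run parameter are
hypothetical names, and run exists only so the function can be tried
without a real git checkout.

```python
import hashlib
import subprocess


def git_dir_checksum(pth, run=subprocess.check_output):
    """Derive one checksum for a git work tree from HEAD plus local diff.

    As in the quoted patch, plain `git diff` is used, so only unstaged
    changes to tracked files affect the result.
    """
    # An argument list plus cwd= avoids building a shell command from pth.
    head = run(["git", "rev-parse", "HEAD"], cwd=pth)
    diff = run(["git", "diff"], cwd=pth)

    m = hashlib.md5()
    m.update(head)
    m.update(diff)  # an empty diff leaves the digest unchanged
    return [(pth, m.hexdigest())]
```

Like the patch, this still makes the local checksum code run SCM
commands, which is the concern raised in the discussion above.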