[bitbake-devel] [RFC][WIP][PATCHv1] lib/bb/checksum.py: Speed-up checksum gen when directory is git

Nicolas Dechesne nicolas.dechesne at linaro.org
Tue Oct 8 21:07:47 UTC 2019


On Tue, Oct 8, 2019 at 22:54, Mark Hatle <mark.hatle at kernel.crashing.org>
wrote:

>
>
> On 10/8/19 1:45 PM, Aníbal Limón wrote:
> > In some cases people/organizations use a SRC_URI with
> > file:///PATH_TO_DIR that points to a git repository, for various
> > reasons; it is useful when you want to do an internal build without
> > cloning sources from outside.
> >
> > This can consume a lot of CPU time because the current taskhash
> > generation mechanism doesn't identify that the folder is a VCS
> > checkout (git, svn, cvs) and computes the checksum of every file,
> > including the contents of the .git directory in this case.
> >
> > There are different ways to improve the situation:
> >
> > * Add protocol=gitscm to the file:// SRC_URI. However, the taskhash
> >   is calculated before the fetcher identifies the protocol, so this
> >   would require some changes in the bitbake codebase.
>
> When I have done this before, I've -always- defined it as a git
> repository via SRC_URI:
>
> git://<local file path>;protocol=file
>
> Then it does exactly what this patch appears to do and uses the git
> logic to handle everything automatically.  (The file protocol is
> already implemented.)
>
> Wouldn't this be better than specifying file:// and then attempting
> to infer what the path refers to?


Anibal and I are working on the same use case / problem. Folks who rely on
this setup also use it as a workspace, i.e. they make local changes in the
source tree, so when the fetch task runs again the local changes are picked
up. So this is not strictly equivalent to what you propose, right?

Another major flaw with the current file:// pointing at a git folder is
that doing a git fetch in the source tree will invalidate the fetch task
even if the actual sources haven’t changed, because the fetch rewrites
files under .git and those files are checksummed too!
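
To make that concrete, here is a minimal sketch (not BitBake's actual
checksum code; tree_checksum and write are hypothetical helpers) that hashes
every file under a directory. It simulates a git fetch by rewriting only
.git/FETCH_HEAD, then compares the result with a variant that skips .git:

```python
import hashlib
import os
import tempfile

def tree_checksum(root, skip_dirs=()):
    """MD5 over all files under root (hypothetical helper, not BitBake code)."""
    m = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place and sort for a stable order.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            m.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                m.update(f.read())
    return m.hexdigest()

def write(path, data):
    """Create parent directories as needed and write text to path."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(data)

with tempfile.TemporaryDirectory() as src:
    write(os.path.join(src, "main.c"), "int main(void) { return 0; }\n")
    write(os.path.join(src, ".git", "FETCH_HEAD"), "old remote state\n")
    full_before = tree_checksum(src)
    clean_before = tree_checksum(src, skip_dirs=(".git",))

    # Simulate `git fetch`: only metadata under .git changes.
    write(os.path.join(src, ".git", "FETCH_HEAD"), "new remote state\n")
    full_after = tree_checksum(src)
    clean_after = tree_checksum(src, skip_dirs=(".git",))

print(full_before != full_after)    # whole-tree checksum changed
print(clean_before == clean_after)  # checksum ignoring .git is stable
```

Skipping .git here corresponds to the "variant" idea in the patch
description; whether that is acceptable for real recipes is a separate
question.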



>
> --Mark
>
> > * This patch: when the directory is a git repository (contains .git),
> >   use the HEAD rev + git diff to calculate the checksum instead of
> >   checksumming every file. That is hackish because it makes some
> >   assumptions about the .git directory contents.
> > * Variant of this patch: make a list of VCS directories (.git, .svn,
> >   .cvs) and exclude them from the cksum calculations; as before, this
> >   makes assumptions about the contents of the dot folders.
> >
> > Signed-off-by: Aníbal Limón <anibal.limon at linaro.org>
> > ---
> >  lib/bb/checksum.py | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/lib/bb/checksum.py b/lib/bb/checksum.py
> > index 5bc8a8fc..ee125cb5 100644
> > --- a/lib/bb/checksum.py
> > +++ b/lib/bb/checksum.py
> > @@ -86,6 +86,19 @@ class FileChecksumCache(MultiProcessCache):
> >              return checksum
> >
> >          def checksum_dir(pth):
> > +            git_dir = os.path.join(pth, '.git')
> > +            if os.path.exists(git_dir):
> > +                import subprocess, hashlib
> > +                m = hashlib.md5()
> > +                head = subprocess.check_output("cd %s && git rev-parse HEAD" % pth, shell=True)
> > +                diff = subprocess.check_output("cd %s && git diff" % pth, shell=True)
> > +                m.update(head)
> > +                if diff:
> > +                    m.update(diff)
> > +
> > +                return [(pth, m.hexdigest())]
> > +
> > +
> >              # Handle directories recursively
> >              if pth == "/":
> >                  bb.fatal("Refusing to checksum /")
> >
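
For discussion, the HEAD rev + diff idea from the patch could also be
written without a shell, roughly as below. This is only a sketch under a few
assumptions: git_dir_checksum and its run parameter are hypothetical names
(run exists so the git calls can be stubbed out), and it uses
"git diff HEAD" rather than plain "git diff" so that staged changes are
included as well. Untracked files are still not covered by either form.

```python
import hashlib
import subprocess

def git_dir_checksum(pth, run=subprocess.check_output):
    """Return [(pth, md5hex)] derived from HEAD plus uncommitted changes.

    Hypothetical helper, not part of BitBake. `run` defaults to
    subprocess.check_output and can be replaced for testing.
    """
    # Passing an argument list with cwd= avoids shell=True and quoting
    # problems when the path contains spaces or shell metacharacters.
    head = run(["git", "rev-parse", "HEAD"], cwd=pth)
    # Diff against HEAD so both staged and unstaged edits are hashed.
    diff = run(["git", "diff", "HEAD"], cwd=pth)

    m = hashlib.md5()
    m.update(head)
    m.update(diff)
    return [(pth, m.hexdigest())]
```

Calling git_dir_checksum("/path/to/checkout") returns a single
(path, checksum) pair, the same shape the patch returns from checksum_dir.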