[bitbake-devel] [RFC][WIP][PATCHv1] lib/bb/checksum.py: Speed-up checksum gen when directory is git

Mark Hatle mark.hatle at kernel.crashing.org
Tue Oct 8 22:31:59 UTC 2019



On 10/8/19 4:07 PM, Nicolas Dechesne wrote:
> 
> 
> On Tue, Oct 8, 2019 at 22:54, Mark Hatle <mark.hatle at kernel.crashing.org>
> wrote:
> 
> 
> 
>     On 10/8/19 1:45 PM, Aníbal Limón wrote:
>     > In some cases people/organizations use a SRC_URI of the form
>     > file:///PATH_TO_DIR that points at a git repository, for various
>     > reasons; it is useful when you want to do an internal build without
>     > cloning sources from outside.
>     >
>     > This can consume a lot of CPU time because the current taskhash
>     > generation mechanism does not recognize that the folder is a VCS
>     > checkout (git, svn, cvs) and computes the cksum of every file,
>     > including the contents of the .git directory in this case.
>     >
>     > There are different ways to improve the situation,
>     >
>     > * Add protocol=gitscm to the file:// SRC_URI; but the taskhash is
>     >   calculated before the fetcher identifies the protocol, so this would
>     >   require some changes in the bitbake codebase.
> 
>     When I have done this before, I've -always- defined it as a git repository by
>     SRC_URI:
> 
>     git://<local file path>;protocol=file
> 
>     Then it does exactly what this patch appears to do and uses the git logic to
>     handle everything automatically.  (The file protocol is already implemented.)
> 
>     Wouldn't this be better than specifying file:// and then attempting to infer
>     what file refers to?
> 
> 
> Anibal and I are working on the same use case / problem. Folks who rely on this
> setup also use it as a workspace, e.g. they make local changes in the source
> tree, so when the fetch runs again the local changes are used. So this is not
> strictly equivalent to what you propose, right?

Sometimes I do my work on a local clone of a repository.  Then I redirect the
recipe to my local git repository using SRC_URI and SRCREV.
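
As a rough illustration of that workflow, the sketch below builds the two recipe lines from a local clone. It assumes git is on PATH; head_rev and recipe_override are hypothetical helper names, not bitbake API, and the URI form is the git://<local file path>;protocol=file one quoted earlier in this thread.

```python
import os
import subprocess


def head_rev(repo_path):
    """Return the commit id HEAD points to in repo_path (needs git on PATH)."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_path)
    return out.decode("ascii").strip()


def recipe_override(repo_path, srcrev):
    """Build the two recipe lines that point a recipe at a local clone."""
    # Hypothetical helper: uses the git://<path>;protocol=file form from the thread.
    src_uri = "git://%s;protocol=file" % os.path.abspath(repo_path)
    return 'SRC_URI = "%s"\nSRCREV = "%s"\n' % (src_uri, srcrev)


if __name__ == "__main__":
    import sys
    clone = sys.argv[1]
    print(recipe_override(clone, head_rev(clone)), end="")
```

Because SRCREV pins a commit, only committed work is picked up this way; uncommitted edits in the clone are not part of what gets fetched.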

--Mark

> Another major flaw with the current file:// with a git folder is that doing a
> git fetch in the source tree will invalidate the fetch task even if the actual
> sources haven’t changed!
> 
> 
> 
> 
>     --Mark
> 
>     > * This patch: when the directory is a git repository (contains .git),
>     >   use the HEAD rev + git diff to calculate the checksum instead of
>     >   checksumming every file. This is hackish because it makes some
>     >   assumptions about the .git directory contents.
>     > * Variant of this patch: make a list of VCS directories (.git, .svn,
>     >   .cvs) and leave them out of the cksum calculations; as before, this
>     >   makes assumptions about the contents of the dot folders.
>     >
>     > Signed-off-by: Aníbal Limón <anibal.limon at linaro.org>
>     > ---
>     >  lib/bb/checksum.py | 13 +++++++++++++
>     >  1 file changed, 13 insertions(+)
>     >
>     > diff --git a/lib/bb/checksum.py b/lib/bb/checksum.py
>     > index 5bc8a8fc..ee125cb5 100644
>     > --- a/lib/bb/checksum.py
>     > +++ b/lib/bb/checksum.py
>     > @@ -86,6 +86,19 @@ class FileChecksumCache(MultiProcessCache):
>     >              return checksum
>     >
>     >          def checksum_dir(pth):
>     > +            git_dir = os.path.join(pth, '.git')
>     > +            if os.path.exists(git_dir):
>     > +                import subprocess, hashlib
>     > +                m = hashlib.md5()
>     > +                head = subprocess.check_output("cd %s && git rev-parse HEAD" % pth, shell=True)
>     > +                diff = subprocess.check_output("cd %s && git diff" % pth, shell=True)
>     > +                m.update(head)
>     > +                if diff:
>     > +                    m.update(diff)
>     > +
>     > +                return [(pth, m.hexdigest())]
>     > +
>     > +
>     >              # Handle directories recursively
>     >              if pth == "/":
>     >                  bb.fatal("Refusing to checksum /")
>     >
> 
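
For comparison, the "variant" option from the quoted patch description (leave VCS directories out of the checksum) can be sketched without calling git at all. This is only an illustration under assumptions, not the code from the patch: checksum_file, checksum_dir_skip_vcs and VCS_DIRS are hypothetical names, and the directory list is taken from the thread as-is. It returns the same kind of (path, checksum) list the patch returns.

```python
import hashlib
import os

# Assumed VCS metadata directory names, copied from the list in the thread.
VCS_DIRS = {".git", ".svn", ".cvs"}


def checksum_file(path):
    """Return the md5 hex digest of one file, read in chunks."""
    m = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            m.update(chunk)
    return m.hexdigest()


def checksum_dir_skip_vcs(pth):
    """Return (path, md5) pairs for files under pth, skipping VCS_DIRS."""
    results = []
    for root, dirs, files in os.walk(pth):
        # Prune in place so os.walk never descends into VCS metadata;
        # sorting keeps the traversal order deterministic.
        dirs[:] = sorted(d for d in dirs if d not in VCS_DIRS)
        for name in sorted(files):
            full = os.path.join(root, name)
            results.append((full, checksum_file(full)))
    return results
```

With this shape, a git fetch that only touches files under .git leaves the result unchanged, while any edit to a source file still changes it. The cost is that every source file is still read, unlike the HEAD rev + git diff approach.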

