[bitbake-devel] [RFC][WIP][PATCHv1] lib/bb/checksum.py: Speed-up checksum gen when directory is git

Mark Hatle mark.hatle at kernel.crashing.org
Tue Oct 8 20:53:59 UTC 2019



On 10/8/19 1:45 PM, Aníbal Limón wrote:
> In some cases people/organizations use a SRC_URI with
> file:///PATH_TO_DIR that points to a git repository, for different
> reasons. It is useful when you want to do an internal build without
> cloning sources from outside.
> 
> This can consume a lot of CPU time because the current taskhash
> generation mechanism does not identify that the folder is under a VCS
> (git, svn, cvs) and computes the checksum of every file, including the
> contents of the .git directory in this case.
> 
> There are different ways to improve the situation:
> 
> * Add protocol=gitscm to the file:// SRC_URI. However, the taskhash is
>   calculated before the fetcher identifies the protocol, so this would
>   require some changes in the bitbake codebase.

When I have done this before, I've -always- defined it as a git repository in
SRC_URI:

git://<local file path>;protocol=file

Then it does exactly what this patch appears to do and uses the git logic to
handle everything automatically.  (The file protocol is already implemented.)

Wouldn't this be better than specifying file:// and then attempting to infer
what the path refers to?

--Mark

> * This patch: when the directory is a git repository (contains .git),
>   use the HEAD rev + git diff to calculate the checksum instead of
>   checksumming every file. This is hackish because it makes some
>   assumptions about the .git directory contents.
> * Variant of this patch: make a list of VCS directories (.git, .svn,
>   .cvs) and exclude them from the cksum calculation. As before, this
>   makes assumptions about the contents of those dot-directories.
> 
> Signed-off-by: Aníbal Limón <anibal.limon at linaro.org>
> ---
>  lib/bb/checksum.py | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/lib/bb/checksum.py b/lib/bb/checksum.py
> index 5bc8a8fc..ee125cb5 100644
> --- a/lib/bb/checksum.py
> +++ b/lib/bb/checksum.py
> @@ -86,6 +86,19 @@ class FileChecksumCache(MultiProcessCache):
>              return checksum
>  
>          def checksum_dir(pth):
> +            git_dir = os.path.join(pth, '.git')
> +            if os.path.exists(git_dir):
> +                import subprocess, hashlib
> +                m = hashlib.md5()
> +                head = subprocess.check_output("cd %s && git rev-parse HEAD" % pth, shell=True)
> +                diff = subprocess.check_output("cd %s && git diff" % pth, shell=True)
> +                m.update(head)
> +                if diff:
> +                    m.update(diff)
> +
> +                return [(pth, m.hexdigest())]
> +
> +
>              # Handle directories recursively
>              if pth == "/":
>                  bb.fatal("Refusing to checksum /")
> 
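For reference, the idea in the quoted hunk (hash the HEAD revision plus the working-tree diff) can be sketched as a standalone function. This is only an illustration, not the code that was posted: the name git_dir_checksum is hypothetical and not part of bitbake, it passes argument lists with cwd= instead of building a shell string so that paths with spaces or shell metacharacters are safe, and it uses "git diff HEAD" rather than plain "git diff" so that staged changes are also covered. Untracked files are still ignored, and a repository with no commits will make "git rev-parse HEAD" fail.

```python
import hashlib
import os
import subprocess


def git_dir_checksum(pth):
    """Return [(pth, md5_hex)] for a git work tree, or None otherwise.

    Hypothetical helper for illustration; not part of bitbake.
    """
    if not os.path.exists(os.path.join(pth, ".git")):
        # Not a git work tree: the caller falls back to per-file checksums.
        return None

    # Run git inside the directory without going through a shell.
    head = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=pth)
    # Assumption: "git diff HEAD" (staged + unstaged changes to tracked
    # files) is used here instead of the plain "git diff" in the patch.
    diff = subprocess.check_output(["git", "diff", "HEAD"], cwd=pth)

    m = hashlib.md5()
    m.update(head)
    m.update(diff)
    return [(pth, m.hexdigest())]
```

The return value keeps the same [(path, hexdigest)] shape as the quoted hunk, so a caller could use it as an early return at the top of checksum_dir(), assuming that is what the surrounding code expects.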

