[bitbake-devel] [RFC][WIP][PATCHv1] lib/bb/checksum.py: Speed-up checksum gen when directory is git

Nicolas Dechesne nicolas.dechesne at linaro.org
Wed Nov 6 04:50:22 UTC 2019


hi,

On Tue, Oct 8, 2019 at 8:42 PM Aníbal Limón <anibal.limon at linaro.org> wrote:
>
> In some cases, people/organizations use a SRC_URI with
> file:///PATH_TO_DIR pointing at a directory that contains a git
> repository. This is useful, for example, for an internal build that
> should not clone sources from outside.
>
> This can consume a lot of CPU time because the current taskhash
> generation mechanism doesn't recognize that the folder is a VCS
> checkout (git, svn, cvs) and computes a checksum for every file,
> including everything inside the .git directory.
>
> There are different ways to improve the situation:
>
> * Add protocol=gitscm to the file:// SRC_URI. However, the taskhash is
>   calculated before the fetcher identifies the protocol, so this would
>   require some changes to the bitbake codebase.
> * This patch: when the directory is a git repository (contains .git),
>   use the HEAD revision plus git diff to calculate the checksum instead
>   of checksumming every file. This is hackish because it makes some
>   assumptions about the .git directory contents.
> * Variant of this patch: make a list of VCS directories (.git, .svn,
>   .cvs) and exclude them from the checksum calculation. Like the
>   previous option, this makes assumptions about the contents of those
>   dot directories.

I discussed this patch with Khem and Richard today (@ELCE).
We more or less agreed that the current approach is not really good, since
the local fetcher isn't supposed to 'deal' with SCM commands. However,
we agreed that the last variant proposed above might be a much better
approach. The idea would be to exclude VCS directories from the
checksum computation. It could potentially be made slightly more
generic by using a variable to specify a list of directories to
exclude, which could be unset by default in bitbake and set in OE-Core
as
BB_LOCAL_DIRS_EXCLUDE = ".git .cvs .svn"
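A minimal sketch of that exclusion idea, in Python since bitbake is written in Python. This is not bitbake's actual checksum_dir() code: the function name checksum_dir_excluding and its exclude parameter are hypothetical, and the string passed as exclude simply stands in for the value of the proposed BB_LOCAL_DIRS_EXCLUDE variable. Names are matched against directories at any depth, and pruning os.walk()'s dirnames list in place keeps the walk from ever descending into them, so a large .git directory is never read at all rather than being filtered out afterwards.

```python
import hashlib
import os
import tempfile


def checksum_dir_excluding(top, exclude=""):
    """Return sorted (path, md5 hexdigest) pairs for the files under top.

    exclude is a whitespace-separated list of directory names (a hypothetical
    stand-in for BB_LOCAL_DIRS_EXCLUDE); an empty string excludes nothing.
    """
    excluded = set(exclude.split())
    results = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune in place: a top-down os.walk will not descend into removed names.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks, FIFOs, sockets, ...
            h = hashlib.md5()
            with open(path, "rb") as f:
                # Read in chunks so large files are not loaded into memory at once.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            results.append((path, h.hexdigest()))
    # Sort so the result does not depend on directory listing order.
    return sorted(results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, ".git", "objects"))
        files = ((".git/objects/blob", b"x"), ("main.c", b"int main(void) { return 0; }\n"))
        for rel, data in files:
            with open(os.path.join(top, rel), "wb") as f:
                f.write(data)
        print(len(checksum_dir_excluding(top)))                    # 2
        print(len(checksum_dir_excluding(top, ".git .cvs .svn")))  # 1
```

With the exclude value left empty, every file, .git contents included, is still hashed, which matches the "unset by default" behaviour suggested above.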

Anibal: do you think you can give it a try?


>
> Signed-off-by: Aníbal Limón <anibal.limon at linaro.org>
> ---
>  lib/bb/checksum.py | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/lib/bb/checksum.py b/lib/bb/checksum.py
> index 5bc8a8fc..ee125cb5 100644
> --- a/lib/bb/checksum.py
> +++ b/lib/bb/checksum.py
> @@ -86,6 +86,19 @@ class FileChecksumCache(MultiProcessCache):
>              return checksum
>
>          def checksum_dir(pth):
> +            git_dir = os.path.join(pth, '.git')
> +            if os.path.exists(git_dir):
> +                import subprocess, hashlib
> +                m = hashlib.md5()
> +                head = subprocess.check_output("cd %s && git rev-parse HEAD" % pth, shell=True)
> +                diff = subprocess.check_output("cd %s && git diff" % pth, shell=True)
> +                m.update(head)
> +                if diff:
> +                    m.update(diff)
> +
> +                return [(pth, m.hexdigest())]
> +
> +
>              # Handle directories recursively
>              if pth == "/":
>                  bb.fatal("Refusing to checksum /")
> --
> 2.23.0
>
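For comparison, a sketch of the patch's own git-based shortcut with two changes (a sketch, not the submitted code). First, the commands run with cwd= and an argument list instead of "cd %s && ..." with shell=True, so a path containing spaces or shell metacharacters cannot break or inject into the command. Second, it hashes git diff HEAD rather than plain git diff, because plain git diff only shows changes not yet staged and would miss edits already added with git add. Untracked files are still ignored, as in the patch, and git rev-parse HEAD still fails (check_output raises CalledProcessError) in a repository with no commits. The function name git_state_checksum is hypothetical, and returning None for a non-git directory is an assumption meant to let the caller fall back to the normal per-file checksum.

```python
import hashlib
import os
import subprocess


def git_state_checksum(pth):
    """Checksum a git checkout from HEAD plus uncommitted tracked changes.

    Returns [(pth, hexdigest)] like the patch, or None when pth has no .git
    entry so the caller can fall back to checksumming individual files.
    """
    if not os.path.exists(os.path.join(pth, ".git")):
        return None

    def git(*args):
        # cwd= replaces "cd %s &&"; an argument list avoids shell=True.
        return subprocess.check_output(["git", *args], cwd=pth)

    m = hashlib.md5()
    m.update(git("rev-parse", "HEAD"))
    # "git diff HEAD" covers both staged and unstaged edits to tracked files.
    # Updating with empty output is a no-op, so no "if diff:" guard is needed.
    m.update(git("diff", "HEAD"))
    return [(pth, m.hexdigest())]
```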

