[bitbake-devel] [RFC][WIP][PATCHv1] lib/bb/checksum.py: Speed-up checksum gen when directory is git

Nicolas Dechesne nicolas.dechesne at linaro.org
Wed Oct 9 09:02:11 UTC 2019


On Wed, Oct 9, 2019 at 1:15 AM <richard.purdie at linuxfoundation.org> wrote:
>
> On Tue, 2019-10-08 at 13:45 -0500, Aníbal Limón wrote:
> > In some cases people/organizations use a SRC_URI with
> > file:///PATH_TO_DIR that contains a git repository, for different
> > reasons; it is useful when you want to do an internal build without
> > cloning sources from outside.
> >
> > This can consume a lot of CPU time because the current taskhash
> > generation mechanism doesn't identify that the folder is under a VCS
> > (git, svn, cvs) and computes the cksum for every file, including the
> > .git directory in this case.
> >
> > There are different ways to improve the situation:
> >
> > * Add protocol=gitscm to the file:// SRC_URI, but the taskhash is
> >   calculated before the fetcher identifies the protocol, so this will
> >   require some changes in the bitbake codebase.
> > * This patch: when the directory is a git repository (contains .git),
> >   use HEAD rev + git diff to calculate the checksum instead of doing
> >   it for every file; that is hackish because it makes some
> >   assumptions about the .git directory contents.
> > * Variant of this patch: make a list of VCS directories (.git, .svn,
> >   .cvs) and leave them out of the cksum calculations; same as before,
> >   this makes assumptions about the contents of the dot folders.
>
> This is an interesting one.

Are you referring to the last bullet here? I suspect it's the second one.

Also, to give a bit more background to everyone, as it might not be
obvious: I've seen the same pattern used several times, especially in
large/corporate deployments of OE/YP. The whole build workspace is
laid out as:

|- sources
|---- kernel
|---- component_A
|---- component_B
|- layers
|---- poky
|---- meta-mycompany
|-------- recipes for kernel, component_A, ...

The whole workspace is managed with a repo manifest, and the recipes
are written to use source code from the 'sources' local folder.

I am not trying to argue whether this is a good practice or not ;-)
but from the perspective of the folks I've talked to, there are a
couple of critical advantages to doing something like that:
* it looks like the Android development workflow ;-)
* it fits the company license/legal process and review, e.g. all
the software that gets out of the company is managed by a single repo
manifest XML file
* it "nicely" solves the problem of being able to develop iteratively
using bitbake natively, e.g. "bitbake myimage" always works, and uses
local changes from 'sources'

So overall, I am convinced that this is a valid use case for OE
end users. I don't think we can use the git:// fetcher as we need the
snapshot of the current 'sources' (with local changes), and using the
file:// fetcher has a significant performance impact:
* a checksum is computed for every file (and the tree can be large,
especially for the kernel)
* unexpected rebuilds when running repo sync, if any new git objects
are put in .git (even when no changes are made to the local worktree
of the git project).
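
To make the second point concrete, here is a minimal sketch of the
"variant" idea from the quoted list: walk the tree and checksum every
file, but prune VCS metadata directories. This is not bitbake code;
tree_checksum and VCS_DIRS are hypothetical names, and the directory
names are simply the ones listed in the thread.

```python
import hashlib
import os
import tempfile

# Directory names as quoted in the thread (an assumption; adjust to the
# metadata directory names your VCS actually uses).
VCS_DIRS = frozenset({".git", ".svn", ".cvs"})


def tree_checksum(root, skip_dirs=VCS_DIRS):
    """Hash relative paths and contents of all files under root,
    skipping VCS metadata directories."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into VCS directories;
        # sort so the traversal order is deterministic.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())
            digest.update(b"\0")
            with open(path, "rb") as f:
                digest.update(f.read())
            digest.update(b"\0")
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, ".git", "objects"))
        with open(os.path.join(root, "main.c"), "w") as f:
            f.write("int main(void) { return 0; }\n")
        before = tree_checksum(root)

        # Simulate "repo sync" dropping new objects into .git.
        with open(os.path.join(root, ".git", "objects", "pack"), "w") as f:
            f.write("new objects")
        print("unchanged after .git update:", tree_checksum(root) == before)

        # A new file in the worktree is still picked up.
        with open(os.path.join(root, "new.c"), "w") as f:
            f.write("/* new */\n")
        print("changed after new file:", tree_checksum(root) != before)
```

With this approach new objects under .git no longer change the result,
while a new file copied into the worktree still does. It still reads
every source file, though, so it only addresses the rebuild problem,
not the cost of checksumming a large kernel tree.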

>
> File checksums are added to the hashes "late" so that we don't have to
> reparse entire recipes when files change. We do need a mechanism to
> know when we need to reparse the checksum. I think this means you can
> skip the checksum calculation for each file but you do still end up
> having to stat all files in the tree separately for bitbake's tracking
> and for git. We also have to notice when new files are added.
>
> As such I'm not convinced this patch will work correctly (e.g. would it
> notice if I copy in a new file to the directory untracked by git).

At least I can confirm that with the file:// fetcher everything works
fine when modifying files. I don't think I have tried adding new
files, but I will try that.
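
For reference, a rough sketch of how a "HEAD rev + git diff" checksum
could also account for files that git does not track. This is not the
code from the patch; git_worktree_checksum and its run parameter are
hypothetical, and it assumes repo_dir is the top level of the git
worktree. It relies only on standard git commands (rev-parse, diff,
ls-files).

```python
import hashlib
import os
import subprocess


def git_worktree_checksum(repo_dir, run=subprocess.check_output):
    """Checksum a git worktree from HEAD, the diff against HEAD and
    the contents of untracked files.

    run is a hypothetical hook taking an argv list and returning
    bytes; it defaults to subprocess.check_output.
    """
    def git(*args):
        return run(["git", "-C", repo_dir, *args])

    digest = hashlib.sha256()
    # The committed state is identified by the HEAD revision alone.
    digest.update(git("rev-parse", "HEAD"))
    # Staged and unstaged changes to tracked files, binary ones included.
    digest.update(git("diff", "--binary", "HEAD"))
    # Untracked files, NUL-separated; ignored files are left out.
    untracked = git("ls-files", "--others", "--exclude-standard", "-z")
    for name in sorted(n for n in untracked.split(b"\0") if n):
        path = os.path.join(os.fsencode(repo_dir), name)
        digest.update(name + b"\0")
        if os.path.isfile(path):
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()
```

Note that files matched by .gitignore would not influence this
checksum, which is one of the assumptions such an approach makes. It
also does not address the separate question of how bitbake would know
when the checksum needs to be recomputed.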

Are you trying to say that to fix this properly we might need another
fetcher, something in between file:// and git://, e.g. localgit://?
Would that make this problem easier to solve?

>
> A first step may be to add some further tests to bitbake-selftest to
> better cover this area...
>
> Cheers,
>
> Richard

