[OE-core] Hash Equiv rehash problems

Joshua Watt jpewhacker at gmail.com
Sun Nov 24 20:51:45 UTC 2019


On Sun, Nov 24, 2019 at 9:58 AM Richard Purdie
<richard.purdie at linuxfoundation.org> wrote:
>
> We have a small problem with hash equiv. I've spent quite a lot of time
> staring at broken builds trying to figure out what is happening and it's
> not even easy to explain. The easiest example I have is perhaps this:
>
> https://autobuilder.yoctoproject.org/typhoon/#/builders/59/builds/1260
>
> where the eSDK is failing as meta-extsdk-toolchain can't be installed
> into the eSDK.
>
> It's easy enough to reproduce from the build artefacts:
>
> pokybuild at debian8-ty-1:/tmp$ ~/yocto-worker/qemux86/build/build/tmp/deploy/sdk/poky-glibc-x86_64-core-image-minimal-core2-32-qemux86-toolchain-ext-3.0.sh
>  Poky (Yocto Project Reference Distro) Extensible SDK installer version 3.0
>  ==========================================================================
>  Enter target directory for SDK (default: ~/poky_sdk): /tmp/rptest
>  [...]
> pokybuild at debian8-ty-1:/tmp$ . /tmp/rptest/environment-setup-core2-32-poky-linux
>  SDK environment now set up; additionally you may now run devtool to perform development tasks.
>  Run devtool --help for further details.
> pokybuild at debian8-ty-1:/tmp$ devtool sdk-install meta-extsdk-toolchain
>  NOTE: Starting bitbake server...
>  Loading cache: 100% |###############################################################################################################################| Time: 0:00:00
>  Loaded 1302 entries from dependency cache.
>  INFO: Installing meta-extsdk-toolchain...
>  Loading cache: 100% |###############################################################################################################################| Time: 0:00:00
>  Loaded 1302 entries from dependency cache.
>  NOTE: Resolving any missing task queue dependencies
>  Initialising tasks: 100% |##########################################################################################################################| Time: 0:00:00
>  Checking sstate mirror object availability: 100% |##################################################################################################| Time: 0:00:00
>  Sstate summary: Wanted 53 Found 50 Missed 3 Current 133 (94% match, 98% complete)
>  NOTE: Executing Tasks
>  NOTE: Setscene tasks completed
>  NOTE: Tasks Summary: Attempted 0 tasks of which 0 didn't need to be rerun and all succeeded.
> ERROR: Failed to install meta-extsdk-toolchain - unavailable
>
> [adding -s allows it to work, it doesn't build much]
>
> The interesting bit is to look back at the original build logs:
>
> [meta-extsdk-toolchain comes from the sstate cache]
> NOTE: Task /home/pokybuild/yocto-worker/qemux86/build/meta/recipes-devtools/gdb/gdb-cross_8.3.1.bb:do_populate_sysroot unihash changed to 577ad44bb0150016da775bda4d1eadbad665a3714e9ad9c4a2b3d28d0f20a28b
> NOTE: Already covered setscene for /home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/meta/meta-extsdk-toolchain.bb:do_package_qa so ignoring rehash
> NOTE: Already covered setscene for /home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/meta/meta-extsdk-toolchain.bb:do_packagedata so ignoring rehash
> NOTE: Already covered setscene for /home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/meta/meta-extsdk-toolchain.bb:do_package so ignoring rehash
> NOTE: Already covered setscene for /home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/meta/meta-extsdk-toolchain.bb:do_package_write_ipk so ignoring rehash
> NOTE: Already covered setscene for /home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/meta/meta-extsdk-toolchain.bb:do_package_write_deb so ignoring rehash
> NOTE: Already covered setscene for /home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/meta/meta-extsdk-toolchain.bb:do_package_write_rpm so ignoring rehash
> NOTE: Already covered setscene for /home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/meta/meta-extsdk-toolchain.bb:do_populate_sysroot so ignoring rehash
> NOTE: Running task 8970 of 8974 (/home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/meta/meta-extsdk-toolchain.bb:do_locked_sigs)
> NOTE: Running task 8971 of 8974 (/home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/images/core-image-minimal.bb:do_sdk_depends)
> NOTE: Running task 8972 of 8974 (/home/pokybuild/yocto-worker/qemux86/build/meta/recipes-sato/images/core-image-sato.bb:do_sdk_depends)
> NOTE: recipe meta-extsdk-toolchain-1.0-r0: task do_locked_sigs: Started
> NOTE: recipe core-image-minimal-1.0-r0: task do_sdk_depends: Started
> NOTE: recipe meta-extsdk-toolchain-1.0-r0: task do_locked_sigs: Succeeded
> NOTE: recipe core-image-sato-1.0-r0: task do_sdk_depends: Started
> NOTE: recipe core-image-minimal-1.0-r0: task do_sdk_depends: Succeeded
> NOTE: recipe core-image-sato-1.0-r0: task do_sdk_depends: Succeeded
> NOTE: recipe buildtools-tarball-1.0-r0: task do_populate_sdk: Succeeded
> NOTE: Running task 8973 of 8974 (/home/pokybuild/yocto-worker/qemux86/build/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk_ext)
> NOTE: Running task 8974 of 8974 (/home/pokybuild/yocto-worker/qemux86/build/meta/recipes-sato/images/core-image-sato.bb:do_populate_sdk_ext)
> NOTE: recipe core-image-minimal-1.0-r0: task do_populate_sdk_ext: Started
> NOTE: recipe core-image-sato-1.0-r0: task do_populate_sdk_ext: Started
> NOTE: recipe core-image-sato-1.0-r0: task do_populate_sdk: Succeeded
> NOTE: recipe core-image-sato-1.0-r0: task do_populate_sdk_ext: Succeeded
> NOTE: recipe core-image-minimal-1.0-r0: task do_populate_sdk_ext: Succeeded
> NOTE: Tasks Summary: Attempted 8974 tasks of which 8939 didn't need to be rerun and all succeeded.
> NOTE: Writing buildhistory
> NOTE: Writing buildhistory took: 6 seconds
>
> What this means is that gdb-cross changes the sstate hash of
> meta-extsdk-toolchain:do_* as it found an equiv match but those artefacts
> were never generated. That hash was written into the eSDK sigs but it
> isn't present in sstate.

So, to summarize and make sure I understand: The eSDK built with hash
equivalence enabled doesn't function because it is looking for sstate
objects that do not exist?
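
To check my own understanding, here is a toy model of that failure as I
read it. This is not bitbake code: the function name, the dict layout and
the shortened hashes are all made up for illustration. It only shows the
idea that the locked sigs can name a hash for which no sstate object was
ever written.

```python
# Toy model only: none of these names exist in bitbake.
def missing_sstate_objects(locked_sigs, sstate_cache):
    """Return the tasks whose locked hash has no matching sstate object."""
    return sorted(
        task for task, unihash in locked_sigs.items()
        if unihash not in sstate_cache
    )


# Hashes are shortened placeholders, not real values.
sstate_cache = {"aaaa", "bbbb"}  # objects actually present in sstate

locked_sigs = {
    "gdb-cross:do_populate_sysroot": "aaaa",
    # Rehashed after the setscene task had already run, so an object
    # with this hash was never generated.
    "meta-extsdk-toolchain:do_populate_sysroot": "cccc",
}

missing = missing_sstate_objects(locked_sigs, sstate_cache)
print(missing)
```

If that picture is right, anything listed in "missing" is something the
eSDK cannot install without building it.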

I'm not very familiar with the eSDK, but it doesn't surprise me too
much; the locking mechanism used by the eSDK (at least the little I
understand of it) has some overlap with hash equivalence, so it seems
like they might not play nice together.

>
> I did put a comment in runqueue.py a while back:
>
> # Potentially risky, should we report this hash as a match?
> logger.info("Already covered setscene for %s so ignoring rehash" % (tid))
>
> and it does re-raise this question.
>
> There may be two possible fixes:
>
> a) We force setscene tasks which have already run to rerun if this kind
> of rehash happens. We did previously do this but runqueue *really*
> doesn't like rerunning setscene tasks.

Do we remember why this is the case? I know we've talked about it off
and on, but I don't think we've ever listed out the specific reasons
why this doesn't work.

I do know that one of the reasons was that it isn't possible to
transition a task from being covered by sstate to being uncovered,
because there might be dependent tasks that were completely skipped
when the task was restored from sstate (e.g. do_fetch, do_unpack,
do_compile, do_install might all get skipped if do_package_setscene is
run).
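
As a rough sketch of that reason (a toy graph, not how runqueue actually
stores things; the DEPENDS table and the function name are hypothetical
and the chain is simplified), restoring one task from sstate skips
everything it depends on, so "uncovering" it later means all of those
tasks suddenly have to be scheduled:

```python
# Toy model only; a simplified, hypothetical single-recipe task chain.
DEPENDS = {
    "do_fetch": [],
    "do_unpack": ["do_fetch"],
    "do_compile": ["do_unpack"],
    "do_install": ["do_compile"],
    "do_package": ["do_install"],
}


def skipped_by_setscene(covered_task, depends):
    """Collect every task skipped when covered_task is restored from sstate."""
    skipped = set()
    stack = list(depends[covered_task])
    while stack:
        task = stack.pop()
        if task not in skipped:
            skipped.add(task)
            stack.extend(depends[task])
    return skipped


skipped = skipped_by_setscene("do_package", DEPENDS)
# Uncovering do_package would mean every one of these has to run first.
print(sorted(skipped))
```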

Are there other reasons?

> b) We report these 'additional' equivalences to the server

I'll admit, this never sat well with me. I couldn't ever prove why
this was "correct" (or, it has been shown to me and I didn't
understand :/ ). From a theoretical standpoint, it seems like
reporting these hashes as equivalent would be wrong since I'm not
necessarily sure we can say that the task would reproduce in an
equivalent manner. The implementation of this would, I think, agree
with me since the best mechanisms I can think of to implement this on
the server side aren't particularly pretty :(.

However, I do think it's possible I'm just missing something, and this
isn't as bad as I think.

>
> We are seeing other symptoms of this "rehashing" in builds where
> "bitbake X; bitbake X" will suddenly rebuild a load of stuff in the
> second build as it didn't build it originally due to the "ignoring
> rehash" messages. This is very counter-intuitive and effectively a
> different representation of the same bug. It's less problematic since we
> just rebuild things (eSDK can't).
>
> If we decide b) is correct, it also raises an interesting scope
> question. Should we:
>
> a) only report things we've run into in real builds

This would effectively mean only reporting equivalence when we would
print the log message "Already covered setscene..."?

> b) report all hashes (there'd be loads)

I'm not quite clear on how this would work, can you elaborate?

> c) report all hashes which are present in sstate

I'm also not sure how this one would work, can you provide more detail?

>
> ?
>
> I think I need to sit and think about this for a while. Ccing Joshua in
> case he has any insights into this...
>
> We can't enable hashequiv by default until we get this fixed somehow.
>
> Cheers,
>
> Richard

