[OE-core] Debug from failing hashequiv builds - server side problem?

Sat Dec 28 11:55:11 UTC 2019

On Fri, 2019-12-27 at 13:55 -0600, Joshua Watt wrote:
> From a more theoretical standpoint, I think that hash equivalence
> only works if all of the possible variables that can impact the
> output hash are present in the taskhash. Otherwise, you can end up in
> cases described above where the same taskhash results in diverging
> output hashes that will never converge back together (in this case
> because the build host arch differs). For most target recipes this
> should already be true because we certainly want to rebuild the
> recipe if any of the inputs that could possibly affect the output
> change. One of the great parts about hash equivalence is that it sort
> of dynamically and automatically figures out if these changes don't
> actually change anything and accounts for it so we don't have to
> manually figure out which variables for which recipes could be
> whitelisted (not that we would embark on that madness!).
> 
> However, for some recipes (e.g. native and cross), the taskhash is
> incomplete in this case because we don't actually care about the
> specific binary output from these recipes; we actually care about the
> behaviour of these recipes when they are run (e.g. we don't care if
> gcc-cross running on x86_64 or aarch is the exact same binary as long
> as it produces the exact same cross compiled target binary in both
> cases). Hash equivalence sort of falls down a little bit here because
> there is no sane way to hash the "behaviour" of a native or cross
> recipe. If there were we could set the output hash algorithm for
> native and cross recipes to "OEOminiscentBehaviourHash" and all would
> be well :) In lieu of that, the next best thing is to make sure that
> the inputs we give to the hash equivalence server are the full set
> that affect the output hash, just like we do for target builds. This
> works because we do expect that on a given build host arch, the
> output should be the same from build to build.
> 
> So, the real question is how do we make sure that the inputs to the
> hash equivalence server are all the ones that affect the output hash?
> I think there are a few options:
>  1) Make up yet another hash (yahash) that is equivalent to the
> taskhash + the hash of variables "unwhitelisted" for the purpose of
> hash equivalence. Instead of dealing with taskhashes, the hash
> equivalence deals with yahashes
>  2) Just like #1, except that the yahash is only the unwhitelisted
> variables and hash equivalence uses both the taskhash and the yahash
> for lookups.
>  3) Implement some sort of "namespace" field in the hash equivalence
> tables that is used to filter out entries appropriately (basically,
> the solution proposed by Richard).
> 
> I don't really have a strong opinion either way.... #3 seems much
> easier, but the thing that worries me is if we are going to find
> other variables that should be unwhitelisted and do #1 or #2 anyway.
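
For reference, the difference between options 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of SHA-256, and the choice of BUILD_ARCH as the single "unwhitelisted" variable are hypothetical and not taken from bitbake or hashserv.

```python
import hashlib


def _hash_vars(h, variables):
    # Feed name/value pairs in sorted order so the digest is deterministic.
    for name in sorted(variables):
        h.update(f"{name}={variables[name]}\0".encode())
    return h


def yahash_option1(taskhash, unwhitelisted):
    # Option 1: one hash covering the taskhash plus the extra variables.
    h = hashlib.sha256(taskhash.encode() + b"\0")
    return _hash_vars(h, unwhitelisted).hexdigest()


def lookup_key_option2(taskhash, unwhitelisted):
    # Option 2: keep the taskhash and pair it with a second hash that
    # covers only the extra variables; both are used for lookups.
    return (taskhash, _hash_vars(hashlib.sha256(), unwhitelisted).hexdigest())


# Hypothetical inputs: the same taskhash seen on two build host arches.
x86 = {"BUILD_ARCH": "x86_64"}
arm = {"BUILD_ARCH": "aarch64"}
same_task = "0123abcd"  # placeholder, not a real taskhash
```

In both variants the two hosts end up with different lookup keys for the same taskhash, which is the property all three options are trying to achieve.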

We do have quite a number of variables we exclude from the taskhashes.
The most obvious is TMPDIR since we make relocatable binaries and don't
care where something was built. For the most part hashequiv is fairly
tolerant of this: in the worst case, the outhashes don't match if a
path leaks.

I was further thinking about this and another issue is the version of
gcc on the host machine for building native binaries. We're naturally
going to end up with different outhashes for native binaries built
with different versions of gcc (or other host tools). This is more
problematic as we're back to two outhashes which are treated
equivalently for the same input hash.
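
To make the problem concrete, here is a toy model of it. It is not the real hashserv code: the class, its fields, and the input strings are made up for illustration, and the method name is used purely as a label.

```python
import hashlib


def sha256(text):
    return hashlib.sha256(text.encode()).hexdigest()


class ToyEquivTable:
    """Hypothetical in-memory stand-in for a hash equivalence table."""

    def __init__(self):
        self.by_outhash = {}   # (method, outhash) -> unihash
        self.by_taskhash = {}  # (method, taskhash) -> set of outhashes seen

    def report(self, method, taskhash, outhash):
        # The first reporter of a given output fixes the unihash for it.
        unihash = self.by_outhash.setdefault((method, outhash), taskhash)
        self.by_taskhash.setdefault((method, taskhash), set()).add(outhash)
        return unihash

    def diverged(self, method, taskhash):
        # True when one taskhash has produced more than one distinct output.
        return len(self.by_taskhash.get((method, taskhash), ())) > 1


table = ToyEquivTable()
taskhash = sha256("some-native-recipe:do_populate_sysroot")
# Same inputs as far as the taskhash is concerned, different host compiler.
table.report("OEOuthashBasic", taskhash, sha256("output built with host gcc A"))
table.report("OEOuthashBasic", taskhash, sha256("output built with host gcc B"))
print(table.diverged("OEOuthashBasic", taskhash))
```

Nothing in the taskhash distinguishes the two reports, so the table ends up holding two outhashes for one input hash.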

I'm not sure it's realistic to even try to encode all the permutations
of host tools that may influence the binary output from this
perspective.

> As stated, I think that one of the advantages of hash equivalence is
> that we can be a little more aggressive with what we "unwhitelist"
> and let the hash equivalence process itself figure out if a variable
> actually makes a difference in the output.

Having also spent a bit of time thinking about this, I think the
important piece is that we need to match the behaviour of sstate in
hashequiv. The theory is great but in practice they have to work
consistently and together. Where sstate allows two coexisting hashes,
we need to allow for that on the hashequiv side.

For that reason I don't think 1/2 are feasible to implement and
we'll need to add some namespace field (as in 3) which allows us to
have the two systems work together.

The native arch issue is the big current one and when uninative is
active, I *think* it's the only one we have. We may need to account for
the gcc 4.8/4.9/later sstate splitting now that I think about it. When
uninative is not active, I think it would have to use the NATIVELSB
string that sstate uses to split the sstate feeds per distro.

I've hacked the method overloading into master-next as an experiment to
see how it works out, just with BUILD_ARCH for now.
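
As a rough sketch of the idea (not the actual master-next change), folding BUILD_ARCH into the method name gives each build host arch its own namespace in the equivalence tables. The class, the helper, and the ":" separator below are all assumptions made for illustration.

```python
def namespaced_method(method, build_arch):
    # Hypothetical: overload the method string so it doubles as a namespace.
    return f"{method}:{build_arch}"


class ToyEquivServer:
    """Hypothetical in-memory model of a server keyed by method."""

    def __init__(self):
        self.unihash_by_output = {}  # (method, outhash) -> unihash
        self.unihash_by_task = {}    # (method, taskhash) -> unihash

    def report(self, method, taskhash, outhash):
        # Reuse the unihash of the first task that produced this output.
        unihash = self.unihash_by_output.setdefault((method, outhash), taskhash)
        self.unihash_by_task.setdefault((method, taskhash), unihash)
        return unihash

    def lookup(self, method, taskhash):
        return self.unihash_by_task.get((method, taskhash))


server = ToyEquivServer()
x86 = namespaced_method("OEOuthashBasic", "x86_64")
arm = namespaced_method("OEOuthashBasic", "aarch64")

server.report(x86, "task-1", "out-A")   # first x86_64 report of out-A
server.report(x86, "task-2", "out-A")   # equivalent output, maps to task-1
print(server.lookup(x86, "task-2"))     # equivalence visible on x86_64
print(server.lookup(arm, "task-2"))     # nothing leaks into aarch64
```

Because the arch is part of every key, equivalences learned on one build host arch are never returned to another, which mirrors how sstate keeps the two sets of native objects apart.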

I also tracked down the mystery runqueue "not executing tasks" problem
and put hard aborts into runqueue so it does not do that. The challenge
now is the conditionals around that abort aren't right so it's aborting
some builds it shouldn't. I'll continue to work on figuring out the
right set.

Cheers,

Richard