[OE-core] Debug from failing hashequiv builds - server side problem?

Sun Dec 22 16:09:27 UTC 2019

On Sun, 2019-12-22 at 10:00 -0600, Joshua Watt wrote:
> On Sun, Dec 22, 2019, 6:49 AM Richard Purdie <
> richard.purdie at linuxfoundation.org> wrote:
> > On Sun, 2019-12-22 at 12:08 +0000, Richard Purdie wrote:
> > 
> > At query time in a clean build, the hashserver cannot know which of
> > the two output hashes it needs to return the value for.
> 
> In the case of multiple taskhashes mapping to different output
> hashes, the server is supposed to simply return the oldest unihash.
> If it's not doing that, there might be a bug.

The problem here is a single taskhash mapping to two different outputs.

m4-native:do_populate_sysroot (on aarch64)
m4-native:do_populate_sysroot (on x86-64)

When we query based on the input hash we'll get the first built.

We then rebuild as it's not in sstate, and the result can match a
previous task output, so we then get a new hash.

If we start a new build with no cache, we look up the input hash and
get the first built task of the wrong arch again.
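
To make that sequence concrete, here is a toy sketch in Python. It is
not the real hashserv code: ToyHashServer, the "task-*"/"out-*" strings
and the unihash names are all made up for illustration, and it assumes
the server keeps the first unihash seen per taskhash and per outhash.

```python
# Toy in-memory model of a hash equivalence server (hypothetical, not hashserv).
class ToyHashServer:
    def __init__(self):
        self.by_taskhash = {}  # (method, taskhash) -> unihash of the first report
        self.by_outhash = {}   # (method, outhash)  -> unihash of the first report

    def query(self, method, taskhash):
        """Return the unihash first recorded for this taskhash, or None."""
        return self.by_taskhash.get((method, taskhash))

    def report(self, method, taskhash, outhash, unihash):
        """Record a finished task, reusing the unihash of a known outhash."""
        final = self.by_outhash.setdefault((method, outhash), unihash)
        self.by_taskhash.setdefault((method, taskhash), final)
        return final


server = ToyHashServer()
# An earlier x86-64 build of a different taskhash produced the same output.
server.report("XXX", "task-old", "out-x86", "unihash-B")
# aarch64 builds the current taskhash first.
server.report("XXX", "task-m4", "out-arm", "unihash-A")

# Clean x86-64 build: the query returns the aarch64 entry (not in x86-64 sstate).
first_lookup = server.query("XXX", "task-m4")
# After rebuilding, the output matches the older x86-64 entry.
after_rebuild = server.report("XXX", "task-m4", "out-x86", first_lookup)
# The next clean build asks again and is handed the aarch64 entry again.
second_lookup = server.query("XXX", "task-m4")
print(first_lookup, after_rebuild, second_lookup)
```

The rebuild ends up on a different unihash than the one the query keeps
handing out, so every clean x86-64 build repeats the same rebuild.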

> > I can think of two possible options:
> > 
> > a) When we report after running a task, we check if that input hash
> > already has a value and then reuse it for the output hash mapping.
> 
> I don't quite follow this one, can you be a little more precise with
> what hashes you are referring to (e.g. taskhash, unihash, outhash)?

We look up a taskhash, get unihash A but it's not in sstate (wrong
arch). We build this thing and send the outhash; currently we get
unihash B.

I'm saying we map that outhash to unihash A since the server already
has an entry for it.
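
Roughly, as a Python sketch of the idea rather than a patch against the
server: report_option_a and the two dict tables are hypothetical, and
overwriting an outhash entry that already points elsewhere is my
assumption about how the conflict would be resolved.

```python
def report_option_a(by_taskhash, by_outhash, method, taskhash, outhash, unihash):
    """Sketch of option (a): prefer the unihash already known for the taskhash."""
    existing = by_taskhash.get((method, taskhash))
    if existing is not None:
        # The taskhash already has a unihash, so tie this output to it.
        # Assumption: this replaces any older mapping for the same outhash.
        by_outhash[(method, outhash)] = existing
        return existing
    # First report for this taskhash: fall back to matching on the outhash.
    final = by_outhash.setdefault((method, outhash), unihash)
    by_taskhash[(method, taskhash)] = final
    return final


# State after aarch64 built first and an older x86-64 output is on record.
by_taskhash = {("XXX", "task-m4"): "unihash-A"}
by_outhash = {("XXX", "out-arm"): "unihash-A", ("XXX", "out-x86"): "unihash-B"}

result = report_option_a(by_taskhash, by_outhash,
                         "XXX", "task-m4", "out-x86", "unihash-A")
print(result)
```

With that, the x86-64 rebuild stays on unihash A, which is what the
next clean build will be told to look for.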

> > b) We start adding some kind of suffix to the reported hashes for
> > native output which is used within the hash equiv server but not
> > sstate.

I was thinking about this further and I had a slightly evil idea. What
if we set the method to "XXX:<native arch>"? (where XXX is the current
value).

This would namespace the two native arches separately and I think then
avoid the problems?
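
As a sketch of what I mean, in Python: namespaced_method and the table
are hypothetical, and where the native arch string would actually come
from on the client side is left open here.

```python
def namespaced_method(method, native_arch):
    """Append the native arch to the method so each arch gets its own namespace."""
    return f"{method}:{native_arch}"


# Toy lookup table keyed the way the server keys entries: (method, taskhash).
table = {}

# Same taskhash reported from two build hosts with different native arches.
table.setdefault((namespaced_method("XXX", "aarch64"), "task-m4"), "unihash-A")
table.setdefault((namespaced_method("XXX", "x86_64"), "task-m4"), "unihash-B")

# Each arch now only ever sees its own entry.
arm_result = table[(namespaced_method("XXX", "aarch64"), "task-m4")]
x86_result = table[(namespaced_method("XXX", "x86_64"), "task-m4")]
print(arm_result, x86_result)
```

The taskhash and the sstate naming are untouched; only the key used
inside the hash equivalence server changes.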

> Just for clarification, this is because "native" can either be x86_64
> or aarch64 but the actual arch (HOST_ARCH ?) isn't part of the
> taskhash calculation?

Correct.

> This sounds related to the gcc-cross issues we had with the eSDK.

The previous problem was on the boundary between cross/native and
target. This is a pure native (or cross) issue.

> Is there a reason that the host arch isn't part of the taskhash?

Yes. Should the target output depend upon which arch it was built on?
(answer, no it shouldn't so the native hashes have to match).

> I think it might be possible to report additional inputs in the
> hashes reported to the server. The server doesn't really care if it's
> the exact taskhash, as long as the client is consistent. Perhaps that
> would help with the gcc-cross issue as well?

It won't help the other native/cross to target boundary mapping issue.

See above, I'm wondering if we could abuse the method field to make
this work? Certainly we could test that...

Cheers,

Richard