[OE-core] Need arm64/qemu help

Richard Purdie richard.purdie at linuxfoundation.org
Sat Mar 3 11:06:15 UTC 2018


On Sat, 2018-03-03 at 10:51 +0000, Ian Arkver wrote:
> On 03/03/18 09:00, Richard Purdie wrote:
> > I need some help with a problem we keep seeing:
> > 
> > https://autobuilder.yocto.io/builders/nightly-arm64/builds/798
> > 
> > Basically, now and again, for reasons we don't understand, all the
> > sanity tests fail for qemuarm64.
> > 
> > I've poked at this a bit and if I go in onto the failed machine and
> > run this again, they work, using the same image, kernel and qemu
> > binaries. We've seen this on two different autobuilder
> > infrastructure on varying host OSs. They always seem to fail all
> > three at once.
> > 
> > Whilst this was a mut build, I saw this repeat three builds in a row
> > on the new autobuilder we're setting up with master.
> > 
> > The kernels always seem to hang somewhere around the:
> > 
> > > 
> > > [    0.766079] raid6: int64x1  xor()   302 MB/s
> > > [    0.844597] raid6: int64x2  gen()   675 MB/s
> I believe this is related to btrfs and comes from having btrfs
> compiled in to the kernel. You could maybe side-step the problem (and
> hence leave it lurking) by changing btrfs to a module.

That would make an interesting experiment. It depends on whether the
issue is really due to this code, or something else like the kernel
timer interrupts failing for some reason.

If it were timer interrupts, the boot would hang somewhere else; if it
were this code, it would change the place the problem occurs in the
boot process.
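
To compare where different runs stop, something like the following
could be pointed at the saved serial console logs. This is only a rough
sketch: the function names are made up for illustration, and it assumes
the logs carry the usual "[ seconds.micros] subsystem: message" printk
prefix as in the raid6 lines quoted above.

```python
import re

# Matches the standard printk timestamp prefix, e.g.
# "[    0.844597] raid6: int64x2  gen()   675 MB/s"
KMSG_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s?(.*)$")


def last_kernel_message(log_text):
    """Return (timestamp, message) for the last timestamped line, or None."""
    last = None
    for line in log_text.splitlines():
        match = KMSG_RE.match(line)
        if match:
            last = (float(match.group(1)), match.group(2).strip())
    return last


def hang_subsystem(log_text):
    """Return the 'subsystem' prefix of the last message, or None.

    Only a heuristic: it takes whatever precedes the first colon.
    """
    last = last_kernel_message(log_text)
    if last is None:
        return None
    prefix, sep, _rest = last[1].partition(":")
    return prefix if sep else None
```

If the last message moves away from raid6 once btrfs is a module, that
points at this code. If the boot still stalls but at some unrelated
point, something like the timer interrupts looks more likely.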

This issue does have parallels with the qemuppc issue I debugged a
month or two ago where the timer interrupts stopped and the machines
appeared to hang.

If the interrupts were disappearing when the host machine was under
load, that could explain why all the machines stop or all succeed.

Interesting food for thought though, thanks!

Cheers,

Richard




