[OE-core] Sanity Failures - Segfaults in qemu images

Richard Purdie richard.purdie at linuxfoundation.org
Sun Apr 7 08:23:27 UTC 2013


We're coming up to release, but we're struggling with various sanity
test failures that keep showing up on the autobuilder.

A lot of them have been caused by issues in the qemu scripts and the
fact that the systems are being asked to do more in parallel due to the
new autobuilder infrastructure. I believe we have these ones resolved
now.

The ones that worry me are like these two, which happened in the last
build:

http://autobuilder.yoctoproject.org:8011/builders/nightly-arm-lsb/builds/95/steps/Running%20Sanity%20Tests_1/logs/stdio
http://autobuilder.yoctoproject.org:8011/builders/nightly-x86-64-lsb/builds/87/steps/Running%20Sanity%20Tests/logs/stdio

In both cases we have a segfault happening in the guest, one directly
triggered by a sanity test, the other being detected in dmesg.

We saw one of these on the previous build:

http://autobuilder.yoctoproject.org:8011/builders/nightly-x86/builds/92/steps/Running%20Sanity%20Tests/logs/stdio
(ignore the minimal failure, that was likely a timeout issue, resolved
by a recent change)

I've also seen the smart help segfault on a qemumips image. I did
download that one locally and saw the same fault the first time I booted
it. I then didn't see it again, despite running the image many times.
The booting was of a copy of the image so it wasn't a first boot issue.
The checksum matched that on the autobuilder.

At this point I think it may well be a qemu issue but we don't know that
for sure. I've not seen any report of this on real hardware.

The question is how do we debug this? Does anyone have any ideas?

The best idea I've heard so far is to generate a coredump in the image
and save that off; maybe it would give some clue in later analysis. We
could also, upon failure, move the image that was actually booted
somewhere for later analysis. I wondered if we could save off the qemu
state too somehow. The trouble is that none of these are simple to do
this close to release.
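
To make the "keep the booted image on failure" idea a bit more
concrete, here is a rough sketch in Python. It is only an illustration,
not what the sanity tests do today. It assumes the guest's dmesg output
has already been captured as text, and that the kernel reports faults
in the x86-style "name[pid]: segfault at ..." form (ARM and MIPS
kernels may word this differently). The names find_segfaults and
preserve_image, and the idea of a build tag, are made up for the
example.

```python
import re
import shutil
from pathlib import Path

# Assumed x86-style kernel message, for example:
#   smart[612]: segfault at 0 ip ... sp ... error 4 in libc-2.17.so[...]
# Other architectures may need a different pattern.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)"
)


def find_segfaults(dmesg_text):
    """Return (command, pid, fault address) for each segfault line found."""
    return [
        (m.group("comm"), int(m.group("pid")), m.group("addr"))
        for m in SEGFAULT_RE.finditer(dmesg_text)
    ]


def preserve_image(image_path, keep_dir, tag):
    """Copy the image that was actually booted into keep_dir.

    The copy is prefixed with a caller-supplied tag (hypothetical, e.g. a
    build number) so images from different failures do not overwrite
    each other.
    """
    keep_dir = Path(keep_dir)
    keep_dir.mkdir(parents=True, exist_ok=True)
    dest = keep_dir / f"{tag}-{Path(image_path).name}"
    shutil.copy2(image_path, dest)
    return dest
```

A test harness could call find_segfaults() on the captured dmesg after
each run and, if the list is non-empty, call preserve_image() before
the image is reused or cleaned up. Copying rather than moving keeps the
rest of the run unaffected, at the cost of disk space.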

So if anyone has any ideas on what is causing this or how to debug/fix
it, I'd be very receptive to them.

Cheers,

Richard






