[OE-core] [PATCH 0/2] Avoid build failures due to setscene errors

Wed Aug 30 07:54:31 UTC 2017

I agree with this patchset, and I would also be OK with making it
conditional on IGNORE_SETSCENE_ERRORS.
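To make the proposed knob concrete, here is a minimal Python sketch of how a setscene fetch failure could be downgraded from an error to a warning when IGNORE_SETSCENE_ERRORS is "1". It is hypothetical: the function name report_setscene_fetch_failure is made up, `d` is a plain dict standing in for the BitBake datastore, and the real patch may gate the message differently.

```python
import logging

logger = logging.getLogger("setscene")


def report_setscene_fetch_failure(d: dict, task: str, url: str) -> bool:
    """Report a failed setscene fetch (hypothetical helper).

    `d` is a plain dict standing in for the BitBake datastore, and
    IGNORE_SETSCENE_ERRORS is the variable proposed in this thread, not an
    existing OE-core setting. Returns True if the failure should count as
    a build error.
    """
    msg = f"{task}: Fetcher failure: Unable to find file {url}"
    if d.get("IGNORE_SETSCENE_ERRORS", "0") == "1":
        # Opt-in: log a warning only; the corresponding real task is
        # expected to run instead of the failed setscene task.
        logger.warning(msg)
        return False
    # Default ("0"): keep the current behaviour and fail the build.
    logger.error(msg)
    return True
```

With the default of "0", nothing changes for anyone who does not opt in, which keeps the error visible in the way Richard wants while letting CI setups choose otherwise.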

We also see these errors occasionally: sometimes expectedly, when cleaning
the shared sstate-cache on the NFS server, and sometimes unexpectedly, when
NFS or the network goes down for a minute and, for some builds, that happens
between sstate_checkhashes() and the actual use of the sstate.
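The failure mode described here is a check-then-use race. The following Python sketch models it with local files, assuming a simplified two-step flow (an existence check, then a copy); the function check_then_fetch and its messages are hypothetical and do not reflect how sstate.bbclass or the BitBake fetcher are actually structured.

```python
import os
import shutil
import tempfile
from typing import Callable


def check_then_fetch(cache_dir: str, name: str, dest_dir: str,
                     between: Callable[[], None] = lambda: None) -> str:
    """Hypothetical model of two-step sstate access.

    Step 1 is an existence check (standing in for sstate_checkhashes()),
    step 2 is the later copy. Nothing ties the two steps together, so
    `between` can simulate a cleanup job or NFS outage in the gap.
    """
    src = os.path.join(cache_dir, name)
    if not os.path.exists(src):
        # Not in the cache: the real task would simply be run.
        return "not in cache: run the real task"
    between()  # the window in which the file can disappear
    try:
        shutil.copy(src, os.path.join(dest_dir, name))
        return "fetched"
    except FileNotFoundError:
        return "ERROR: found during check, missing at fetch time"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache, \
         tempfile.TemporaryDirectory() as dest:
        name = "example_populate_sysroot.tgz"
        open(os.path.join(cache, name), "wb").close()
        print(check_then_fetch(cache, name, dest))  # fetched
        # Simulate the cleanup removing the file right after the check.
        remove = lambda: os.remove(os.path.join(cache, name))
        print(check_then_fetch(cache, name, dest, between=remove))
        # ERROR: found during check, missing at fetch time
```

The check itself is correct at the moment it runs; the error comes purely from the gap between the two steps.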

We normally stop all Jenkins builds until the cleanup is complete. A Jenkins
job does the cleanup: it puts Jenkins into stop mode, waits for all current
jobs to finish (which can take hours), performs the cleanup, and then
cancels the stop mode. However, we cannot stop hundreds of developers from
using the same sstate-cache in local builds, especially since we cannot
really know exactly when the job will find Jenkins free to perform the
cleanup. Luckily, in local builds it doesn't hurt so badly, because the
developers are more likely to ignore the error as long as the image was
created. In Jenkins builds, however, when bitbake returns an error we cannot
easily distinguish between "RP is intentionally warning us that something
went wrong with sstate, but everything was built correctly in the end" and
"something failed in the build and we weren't able to recover from it;
maybe the image wasn't even created". So we don't trigger the follow-up
actions, such as announcing new official builds, parsing release notes, or
automated testing.

Yes, we could add more logic to these CI jobs to grep the logs, decide
whether this error was the only one that caused bitbake to return an error
code, and ignore the returned error in that case. But a simple variable is
easier to maintain (even at the cost of forking bitbake and oe-core) and
will work for local builds as well.
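For comparison, the log-grepping alternative might look roughly like this Python sketch. It assumes the ERROR line format quoted further down in this thread (`ERROR: <recipe> do_<task>_setscene: Fetcher failure: ...`); the function name only_setscene_errors is hypothetical, and real bitbake output may contain other ERROR lines that a CI job would also need to classify.

```python
import re

# Assumed format, modelled on the ERROR line quoted in this thread:
#   ERROR: <recipe> do_<task>_setscene: Fetcher failure: ...
SETSCENE_FETCH_ERROR = re.compile(
    r"^ERROR: \S+ do_\w+_setscene: Fetcher failure:")


def only_setscene_errors(log_text: str) -> bool:
    """Hypothetical CI helper.

    Returns True if the log has at least one ERROR line and every ERROR
    line is a setscene fetcher failure, i.e. the non-zero exit code could
    be treated as benign.
    """
    errors = [line for line in log_text.splitlines()
              if line.startswith("ERROR:")]
    return bool(errors) and all(
        SETSCENE_FETCH_ERROR.match(line) for line in errors)
```

Every new message format would need another pattern here, which is the maintenance cost that a single variable avoids.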

Regards,

On Wed, Aug 30, 2017 at 8:44 AM, Peter Kjellerstedt <peter.kjellerstedt at axis.com> wrote:

> > -----Original Message-----
> > From: openembedded-core-bounces at lists.openembedded.org
> > [mailto:openembedded-core-bounces at lists.openembedded.org] On Behalf Of
> > Richard Purdie
> > Sent: den 29 augusti 2017 23:50
> > To: Peter Kjellerstedt <peter.kjellerstedt at axis.com>; Andre McCurdy
> > <armccurdy at gmail.com>
> > Cc: OE Core mailing list <openembedded-core at lists.openembedded.org>
> > Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to setscene
> > errors
> >
> > On Tue, 2017-08-29 at 20:59 +0000, Peter Kjellerstedt wrote:
> > > > -----Original Message-----
> > > > From: Andre McCurdy [mailto:armccurdy at gmail.com]
> > > > Sent: den 29 augusti 2017 22:38
> > > > To: Peter Kjellerstedt <peter.kjellerstedt at axis.com>
> > > > Cc: OE Core mailing list <openembedded-core at lists.openembedded.org>
> > > > Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to
> > > > setscene errors
> > > >
> > > > On Tue, Aug 29, 2017 at 1:00 PM, Peter Kjellerstedt
> > > > <peter.kjellerstedt at axis.com> wrote:
> > > > >
> > > > > Occasionally, we see errors on our autobuilders where a setscene task
> > > > > fails to retrieve a file from our global sstate cache. It typically
> > > > > looks something like this:
> > > > >
> > > > > WARNING: zip-3.0-r2 do_populate_sysroot_setscene: Failed to fetch URL
> > > > > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > > > > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz, attempting
> > > > > MIRRORS if available
> > > > > ERROR: zip-3.0-r2 do_populate_sysroot_setscene: Fetcher failure:
> > > > > Unable to find file
> > > > > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > > > > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz anywhere.
> > > > > The paths that were searched were:
> > > > >     /home/pkj/.openembedded/sstate-cache
> > > > To trigger this, do you have SSTATE_MIRRORS pointing to
> > > > "/home/pkj/.openembedded/sstate-cache" and SSTATE_DIR pointing
> > > > somewhere else? Or are they both pointing to the same local
> > > > directory? Or something else?
> > > No, the directory above is actually what is in SSTATE_DIR.
> > > SSTATE_MIRRORS is set to:
> > >
> > > SSTATE_MIRRORS ?= "\
> > > file://.* file:///n/oe/sstate-cache/PATH;downloadfilename=PATH"
> > >
> > > where /n/oe is an NFS mount where we share a global sstate cache.
> > >
> > > The only way I have figured out to manually simulate the problem is
> > > by modifying the code in sstate_checkhashes() in sstate.bbclass and
> > > commenting out the call to fetcher.checkstatus(). Then, as long as
> > > there actually are no sstate files for the task in either the global
> > > or the local sstate cache, I get the above.
> > >
> > > I do not know what triggers it on the autobuilder, though. My guess
> > > is that somehow the sstate tgz file disappears between the call to
> > > sstate_checkhashes() and when bitbake actually tries to download the
> > > file.
> > >
> > > We do have a daily job that cleans up the global sstate cache and
> > > removes files that have not been accessed in the last ten days, but
> > > it seems unlikely that it would remove a file that just happens to
> > > be needed again, at exactly the time when that task is building.
> >
> > I have left this code as an error deliberately as this kind of thing
> > should not happen and if it does, there is really something wrong which
> > you need to figure out. It means that at one point bitbake thinks the
> > sstate is present and valid, then later it isn't.
>
> True, but since checking whether an sstate file exists and retrieving it
> is not an atomic operation, problems can always occur. Some may be
> fixable, some may not. However, using a build failure to detect these
> kinds of problems is a bit harsh on the developers, who see their builds
> complete only to get an error for something that is not their fault. We
> have better ways to detect these kinds of problems, e.g., through log
> monitoring, without causing unnecessary grief amongst the developers.
>
> > I'm not convinced patching out the errors is the right solution here...
>
> How about I make it conditional by adding an IGNORE_SETSCENE_ERRORS
> variable? That way it can default to "0", but we can set it to "1" to
> prioritize the production builds.
>
> > Cheers,
> >
> > Richard
>
> //Peter