[OE-core] [PATCH 0/2] Avoid build failures due to setscene errors
Peter Kjellerstedt
peter.kjellerstedt at axis.com
Wed Aug 30 09:52:48 UTC 2017
> -----Original Message-----
> From: Richard Purdie [mailto:richard.purdie at linuxfoundation.org]
> Sent: den 30 augusti 2017 10:03
> To: Peter Kjellerstedt <peter.kjellerstedt at axis.com>; Andre McCurdy
> <armccurdy at gmail.com>
> Cc: OE Core mailing list <openembedded-core at lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to setscene
> errors
>
> On Wed, 2017-08-30 at 06:44 +0000, Peter Kjellerstedt wrote:
> > > I have left this code as an error deliberately as this kind of
> > > thing should not happen and if it does, there is really something
> > > wrong which you need to figure out. It means that at one point
> > > bitbake thinks the sstate is present and valid, then later it
> > > isn't.
> >
> > True, but since checking whether an sstate file exists and retrieving
> > it are not one atomic operation, there are always problems that can
> > occur. Some may be fixable, some may not. However, using a build
> > failure to detect these kinds of problems is a bit harsh on the
> > developers, who see their builds run to completion only to get an
> > error for something that is not their fault. We have better ways to
> > detect these kinds of problems, e.g., through log monitoring, without
> > having to cause unnecessary grief amongst the developers.
>
> Files are randomly disappearing from your sstate source. So far you've
> been lucky and these are not causing corruption, but they could.
Somehow I fail to see how missing sstate cache files can cause
corruption. If they are missing, the real task is run and all is well.

Also, I do not actually know if the files disappear permanently or
temporarily, because at the time when I look at the global sstate cache
the files are there, newly created because the build continued and let
the real task run. My guess though is that the files only temporarily
disappeared due to some network glitch, but currently I cannot verify it.

Regardless of whether my proposed changes are accepted or not, if you
want to keep the default behavior that a failed setscene task will
eventually cause the build to fail, then we should change it to fail
immediately instead. Continuing the build when you know it will fail
makes no sense at all.
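
To make the three behaviours discussed here concrete, below is a toy
model in Python. It is only a sketch and not bitbake's code: Policy,
run_build and the fetch callback are hypothetical names made up for the
illustration, and the real runqueue is considerably more involved.

```python
# Toy model only: none of these names exist in bitbake.
from enum import Enum


class Policy(Enum):
    FAIL_AT_END = "fail_at_end"  # keep building, report failure at the end
    FAIL_FAST = "fail_fast"      # stop as soon as a setscene task fails
    FALL_BACK = "fall_back"      # run the real task and report success


def run_build(tasks, fetch, policy):
    """Run every task in order and return (log, exit_code).

    tasks  -- list of task names
    fetch  -- callable(name) -> bool, True if the sstate object was retrieved
    policy -- one of the Policy values above
    """
    log = []
    failed = []
    for name in tasks:
        if fetch(name):
            log.append(f"{name}: restored from sstate")
            continue
        if policy is Policy.FAIL_FAST:
            log.append(f"{name}: setscene failed, aborting")
            return log, 1
        # In the other two modes the real task is run instead.
        log.append(f"{name}: setscene failed, ran real task")
        failed.append(name)
    if failed and policy is Policy.FAIL_AT_END:
        return log, 1
    return log, 0
```

With FAIL_AT_END every task is still processed although the exit code is
already known to be non-zero, which is the wasted work I am objecting
to. FAIL_FAST gives the same exit code without it, and FALL_BACK
corresponds to what my patches propose.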
> Please figure out and fix your sstate infrastructure, not hack the code
> to avoid the errors.
As Martin Jansa mentioned in another response, the problem may be due
to NFS or general network disturbances. And I see no way to protect
ourselves from them. And apparently we are not alone in seeing these
kinds of transient errors.
> I do appreciate it's painful, we did once see this issue on the
> autobuilder. There was a real error in the sstate cleanup scripts and
> we fixed that but it took some work to find it.
Are your sstate cache cleanup scripts available somewhere? Because
obviously it is not trivial to get it right, and since keeping the
sstate cache clean is something that I expect many would like to do,
having a common script for this seems like a good thing.

Otherwise I can contribute our script. If nothing else it would
probably be good to have it reviewed by someone who is an expert on
the sstate cache. It currently features:

* configurable retention period (default is 10 days)
* removes related .tgz and .tgz.siginfo files as one unit
* can remove stale symbolic links (typically wanted for a local sstate
  cache which has links into a global sstate cache where the actual
  files have been cleaned away)
* dry run mode
* quiet mode (only prints a summary stating how much was cleaned up and
  the current size of the sstate cache; very nice for running it as a
  cronjob)
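
To give an idea of the moving parts, here is a minimal Python sketch of
such a cleaner. It is not our actual script: the function name
clean_sstate and its parameters are made up for illustration, and it
assumes that a file counts as stale when its access time is older than
the retention period, which may not suit every setup (for example
filesystems mounted with noatime). Unlike the quiet mode described
above, it does not report the remaining cache size.

```python
# Sketch only; NOT the script described above. All names are hypothetical.
import os
import time


def clean_sstate(cache_dir, days=10, dry_run=False, quiet=False, now=None):
    """Remove stale sstate archives and dangling symlinks.

    Returns (number of files removed, bytes freed). In dry run mode
    nothing is deleted, but the same numbers are reported.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    removed = 0
    freed = 0

    def remove(path):
        nonlocal removed, freed
        size = os.lstat(path).st_size
        if not quiet:
            print(("would remove " if dry_run else "removing ") + path)
        if not dry_run:
            os.unlink(path)
        removed += 1
        freed += size

    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                # A link whose target is gone, e.g. cleaned from a global cache.
                if not os.path.exists(path):
                    remove(path)
                continue
            if not name.endswith(".tgz"):
                continue
            # Assumption: access time decides whether an archive is stale.
            if os.stat(path).st_atime >= cutoff:
                continue
            # Treat the archive and its .siginfo file as one unit.
            siginfo = path + ".siginfo"
            if os.path.lexists(siginfo):
                remove(siginfo)
            remove(path)

    # The summary is always printed; quiet only suppresses per-file lines.
    print(f"cleaned up {removed} files, {freed} bytes")
    return removed, freed
```

Running it first as clean_sstate("/path/to/sstate-cache", dry_run=True)
shows what would go away without touching anything. Note that .siginfo
files without a matching .tgz are left alone in this sketch.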
> Also, with changes like this you can end up in a state where sstate can
> completely stop working and the only way you'd tell is by increased
> build time.
As I mentioned, we have monitoring of our builds in place and would
definitely notice if the global sstate cache is not used as expected.
> > > I'm not convinced patching out the errors is the right solution
> > > here...
> >
> > How about I make it conditional by adding an IGNORE_SETSCENE_ERRORS?
> > That way it can default to "0", but we can set it to "1" to
> > prioritize the production builds.
>
> I'm still not convinced, sorry.
>
> [The reason being complexity. I don't like having multiple ways of
> doing things if we can help it, particularly when one of them is a
> workaround for a problem elsewhere. One of the codepaths in a case like
> this is unlikely to get well tested.]
Well, as long as the conditional path is clearly marked as "only
enable this if you know what you are doing", I do not see a problem
with that path receiving less or no testing by you. It should get
enough testing by those of us who rely on it.

The problem for me in these kinds of situations is that we do not want
to make changes to anything inside the Poky repository (which would
effectively fork it), because down that route lies madness. So instead
we rely on making all adaptations in our own layers. Making changes to
recipes is easy as we can use .bbappends in our layers. Making changes
to classes or configuration files works by copying them to our layers
and changing them there, even though I personally hate it because it
causes extra maintenance for me, since I often need to build with a
newer version of Poky than our layers are currently adapted for in
preparation for updating to the next Poky release. However, changes
to anything inside bitbake are nearly impossible. The same goes for
changes to anything in meta/lib/oe. Thus we rely on being able to find
a way to get these kinds of changes integrated upstream.
> Cheers,
>
> Richard
And in case any of the above sounds as if I am trying to force a
feature down your throat that you do not like, then I beg for
forgiveness. We really do appreciate your expertise and dedication
to the OE community, and I hope we can work this out into something
that you can accept and that we can use.

//Peter