[OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

Wed Aug 14 12:55:32 UTC 2019

On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 13:36, <richard.purdie at linuxfoundation.org>
> wrote:
> > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > richard.purdie at linuxfoundation.org> wrote:
> > > > I had a glance at the profile output from master-next and the
> > > > problem wasn't where I thought it would be, it was in the
> > > > scheduler code. That is good as those classes are effectively
> > > > independent of the other changes and hence are a separate fix.
> > > > 
> > > > I've put a patch in -next which takes the above test to 36s
> > > > which is close to the older bitbake.
> > > > 
> > > > Could be interesting to see how it looks for others and
> > > > different workloads.
> > > 
> > > I just tried the same test I did yesterday with
> > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it
> > > doesn't seem to improve things much: bitbake is stuck at "NOTE:
> > > Executing Tasks" for 15 minutes now.
> > 
> > This might sound slightly crazy but can you try commenting out this
> > line in runqueue.py:
> > 
> > logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks))
> > 
> > ?
> 
> Even crazier is the outcome: it helped! 

Cool, I think I can explain it.

The holdoff_tasks list can contain nearly all the tasks at some points
in execution. Even though the debug messages aren't being printed on
the console, they are still being sent over the internal IPC bus
between the cooker, UI and other event handlers. Obviously for small
task lists it's not a problem, but for large ones it's multiple 4k
chunks over pipes, which isn't going to be fast.
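
To make the size effect concrete, here is a minimal sketch using only
the Python standard library, not bitbake's own logger or event code.
The task names, the summarize_tasks() helper and the 4096-byte chunk
size are assumptions for illustration; it only compares how much text
a full pprint.pformat() of a large set produces against a bounded
summary.

```python
import logging
import pprint

# Standard-library logger; bitbake's logger has a different debug() signature.
logger = logging.getLogger("runqueue-sketch")


def summarize_tasks(tasks, limit=5):
    """Return a short, bounded description of a possibly huge task collection.

    Hypothetical helper: it reports the total count plus a small sorted sample.
    """
    sample = sorted(tasks)[:limit]
    more = len(tasks) - len(sample)
    suffix = " (+%d more)" % more if more > 0 else ""
    return "%d tasks: %s%s" % (len(tasks), ", ".join(sample), suffix)


# Made-up task names standing in for a large holdoff set.
holdoff_tasks = {"recipe-%05d:do_compile" % i for i in range(20000)}

# Full dump, as in the debug line discussed above: grows with the task count.
full = pprint.pformat(holdoff_tasks)

# Bounded summary: stays small regardless of the task count.
short = summarize_tasks(holdoff_tasks)

# Number of 4096-byte chunks each message would need (ceiling division).
full_chunks = -(-len(full) // 4096)
short_chunks = -(-len(short) // 4096)

# Passing the argument separately defers formatting until a handler wants it.
logger.debug("Holding off tasks %s", short)

print("full message: %d chars, %d chunks" % (len(full), full_chunks))
print("short message: %d chars, %d chunks" % (len(short), short_chunks))
```

Whether a summary like this keeps enough detail for debugging is a
judgment call; dropping the message or only emitting it when debug
output is actually wanted are alternatives.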

We have done a lot of optimisation in the past but it's all too easy to
tread on something like this and upset things :/.

> The whole thing completed after 15m49seconds (with much of the time
> going to the empty task spin), that's some 3 minutes slower, but
> certainly it's usable again.

You followed up mentioning this wasn't with master-next. I think there
is a patch in -next which will help with the empty task spin so both
together might get us back to more normal numbers.

Cheers,

Richard