[OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

Fri Aug 16 15:00:40 UTC 2019

> Will try to bump BB_NUMBER_THREADS from 8 to 72.

I've tried to remove icecc.bbclass inherit (because it does things while
parsing and RP probably doesn't have it inherited), but that didn't save
much time.

All 3 tests were with bitbake 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
94m19.081s  8 BB_NUMBER_THREADS and icecc
82m59.574s  8 BB_NUMBER_THREADS no icecc
68m3.556s    72 BB_NUMBER_THREADS no icecc

Still don't know how to get to sub 10min world runs RP was seeing, but at
least it's as slow as it was before runqeueu changes (or even a bit faster
in lastest master).

On Fri, Aug 16, 2019 at 12:24 PM Martin Jansa <martin.jansa at gmail.com>
wrote:

> On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote:
> > On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> > > On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> > > > Comparing that build to a corresponding do-nothing build with Thud,
> > > > the time difference matches those three minutes where I have no idea
> > > > what bitbake is doing now that it didn’t need to do before…
> > > >
> > > > Hopefully these time degradations can be solved, because the current
> > > > state of bitbake is barely usable. We also need to look into possible
> > > > ways to improve the cooker output when it is running the setscene
> > > > tasks so it makes some kind of sense again.
> > >
> > > We talked on irc and you pointed at the commit things started to go
> > > wrong. Just to summarise things for the benefit of the list, this is
> > > some quick testing I did:
> > >
> > > "bitbake -p; time bitbake core-image-minimal -n"
> > >
> > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > > 40.3s 7df31ff36892c2f9c65326b06b4c70
> > > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> > > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> > > 76.9s master-next
> >
> > I know I'm late to the party, but it took really long for the test to
> finish..
> >
> > I'm not using poky, so reproduce this testing on our builds I've used
> > bitbake revisions matching with poky revision RP was testing:
> >
> > + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake,
> because current master isn't compatible with old bitbake
> (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=],
> expression was ${@oe_import(d)} which triggered exception AttributeError:
> module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn')
> > and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake
> (to possibly eliminate impact of metadata changes)
> >
> > tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB
> RAM
> >
> > nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence
> Server configured
> >
> > on a build with few more layers:
> > Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632
> targets, 1927 skipped, 54 masked, 0 errors.
> >
> > but first doing just core-image-minimal like RP did:
> >
> > time          poky                                            bitbake
>                                      oe-core
> > 2m8.191s      6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > N/A           a0d941c787cf3ef030d190903279d311bc05d752        doesn't
> exist in poky/poky-contrib
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 2m17.053s     7df31ff36892c2f9c65326b06b4c70
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 2m18.515s     a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> > 2m22.220s     9983b07fffd19082abded7c3f15cc77d306dd69c
> 8c26b451f22193ef1c544e2017cc84515566c1b8
> > 2m38.185s     master-next
>  fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> > cache.py:446: ResourceWarning: unclosed <socket.socket fd=10,
> family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0,
> laddr=('127.0.0.1', 41117)>
> >   value = pickled.load()
> > started showing with this revision
> > 2m17.991s     master-next + fixups
> f7cd14f90463957b3e4be6d3876def98b78f1424
> > 2m9.651s      master-next + "Holding off tasks %s" out
> >
> > now world
> > 253m36.637s   7df31ff36892c2f9c65326b06b4c70
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 174m13.324s   a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> >    this is time when killed at "NOTE: Executing Tasks" without showing
> any progress on the tasks (unlike other tests) and only one bitbake process
> is running
> > 633m27.475s   master-next
>  fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> >    this is time when killed at (1417 from 71749) - running very slowly
> unlike other bitbake revisions where the task number changes relatively
> quickly - about 10 tasks per second
> > 89m13.992s    master-next + "Holding off tasks %s" out
> > 89m59.201s    master-next updated today
>  85fe627fdb6510f0942917964386fad9d8c479c8
> >
> > Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times
> > slower than recent master-next.
> >
> > I'll send -P output next.
>
>
> So 1f630fdf0260db08541d3ca9f25f852931c19905 was already "bad", because
> this commit introduced that
> logger.debug(2, "Holding off tasks %s" %
> pprint.pformat(self.holdoff_tasks))
>
> Here is -P output from it:
>
> 5min (til "Executing Tasks" message):
> profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.return.at
> .Executing.Tasks
> http://paste.ubuntu.com/p/Nw6n5hTPmF/
>
> 5hours (whole dry run):
>
> profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.whole.dry.run
> http://paste.ubuntu.com/p/sydFGvYTxQ/
>
> profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.worker.log.processed
> http://paste.ubuntu.com/p/RMGbm9DyZz/
>
> So for good old baseline I need to go few commits back, to e.g. 18d4a31f
> RP was testing before:
>
> 1f630fdf runqueue: Enable dynamic task adjustment to hash equivalency
> a0d941c7 siggen: Convert to use self.unitaskhashes
> 1f326f2c siggen: Add new unitaskhashes data variable which is cached
> b707d0cb runqueue: Improve scenequeue processing logic
> 19a6e356 cache: Add SimpleCache class
> 7bb79099 siggen: Import unihash code from OE-Core
> 7aec8632 hashserv: SQL Optimizations
> 18d4a31f fetch2/wget: avoid 'maximum recursion depth' RuntimeErrors when
> handling 403 codes
>
> 94m19.081s
>
> profile.log.processed.18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.whole.dry.run
> http://paste.ubuntu.com/p/Jnvv8GJNKB/
>
> And here is profile with latest master from yesterday:
> 90mins:
>
> profile.log.processed.a5e084a266f63c2fd370122327615e49beaeb94e.whole.dry.run
> http://paste.ubuntu.com/p/YhMCHPmg56/
>
> So good news is that current master is a bit faster than "good old"
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.
>
> But still significantly slower on my fast builder compared to world tests
> on poky + meta-oe,
> not sure if it's all just because the size of metadata in my builds (with
> 71K tasks).
>
> Even the dry running the tasks was about 10 tasks/second while RP was
> seeing around 200 tasks/second.
>
> Will try to bump BB_NUMBER_THREADS from 8 to 72.
>
> I've already disabled rm_work and PRserv, but that doesn't seem to have
> any significant impact on performance here. Other than resolving the PRserv
> terminating and then warnings about lost connection.
>
> Cheers,
> >
> > >
> > > So basically the original changes showed a 25% hit but the performance
> > > of -next is dire.
> > >
> > > This is with no hash equiv server configured.
> > >
> > > It will vary depending on the target used (numbers with -sato for the
> > > above would be interesting for comparision) and how much was or is in
> > > sstate, they type of sstate mirror configured and so on.
> > >
> > > I really need to focus on getting the new code functioning correctly
> > > before we attempt to optimise but if nobody tests the new code due to
> > > performance problems we have a different issue. We also have a scaling
> > > problem with the hash server itself I need to fix to stop the
> > > autobuilder throwing weird errors. I'm therefore a bit challenged on
> > > where to start with it all :/.
> > >
> > > Cheers,
> > >
> > > Richard
> > >
> > >
> > >
> > > --
> > > _______________________________________________
> > > Openembedded-core mailing list
> > > Openembedded-core at lists.openembedded.org
> > > http://lists.openembedded.org/mailman/listinfo/openembedded-core
> >
> > --
> > Martin 'JaMa' Jansa     jabber: Martin.Jansa at gmail.com
>
>
>
> --
> Martin 'JaMa' Jansa     jabber: Martin.Jansa at gmail.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openembedded.org/pipermail/openembedded-core/attachments/20190816/ba6fe378/attachment.html>