[OE-core] Improving Build Speed

Martin Jansa martin.jansa at gmail.com
Thu Nov 21 12:53:59 UTC 2013


On Thu, Nov 21, 2013 at 08:15:08AM +0100, Ulf Samuelsson wrote:
> 2013-11-21 01:19, Martin Jansa skrev:
> > On Wed, Nov 20, 2013 at 11:43:13PM +0100, Ulf Samuelsson wrote:
> >> 2013-11-20 22:29, Richard Purdie skrev:
> >> Another idea:
> >>
> >> I suspect that there is a lot of unpacking and patching of recipes
> >> for the target when the native stuff is built.
> >> Does it make sense to have multiple threads reading the disk, for
> >> the target recipes during the native build or will we just lose out
> >> due to seek time?
> >>
> >> Having multiple threads accessing the disk, might force the disk to spend
> >> most of its time seeking.
> >> Found an application which measures seek time performance,
> >> and my WD Black will do 83 seeks per second, and my SAS disk will do
> >> twice that.
> >> The RAID of two SAS disks will provide close to SSD throughput (380 MB/s)
> >> but seek time is no better than a single SAS disk.
> >>
> >> Since there is "empty time" at the end of the native build, does it make
> >> sense
> >> to minimize unpack/patch of target stuff when we reach that point, and
> >> then we let loose?
> > In my benchmarks increasing PARALLEL_MAKE till number of cores was
> > significantly improving build time, but BB_NUMBER_THREADS had minimal
> > influence somewhere above 6 or 8 (tested on various systems, even only 4 was
> > optimum on my older RAID-0 and 2 on single disk).
> > Of course it was quite different for clean build without sstate
> > prepopulated and build where most of the stuff was reused from sstate.
> >
> > see http://wiki.webos-ports.org/wiki/OE_benchmark
> 
> How many cores do you have in your build machine?

The one used in OE_benchmark has 8, my local builder also has 8, and I got
the same results on machines with 32 and 48 cores.

My experience (which may differ from what you see) is
that PARALLEL_MAKE scales well with the number of cores, but
BB_NUMBER_THREADS is more or less limited by I/O performance; even
when the machine has 48 cores, that doesn't mean it can run 48
do_populate or do_package tasks at the same time without causing an
avalanche of seeks.
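
If you want PARALLEL_MAKE to follow the core count automatically while
keeping BB_NUMBER_THREADS at a fixed, I/O-friendly value, something like
this works in local.conf (just a sketch, assuming the oe.utils.cpu_count()
helper is available in your release; the thread count is illustrative):

PARALLEL_MAKE     = "-j ${@oe.utils.cpu_count()}"
BB_NUMBER_THREADS = "8"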

The other extreme is when all 48 BB threads are in do_compile: you can
get 48x48 gcc processes, which again doesn't work well on a machine
with 48 cores.
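
One way to soften that multiplication is to cap make-level parallelism by
load average as well as job count; a minimal local.conf sketch (the numbers
are only illustrative, and it assumes the recipes you build pass
PARALLEL_MAKE straight to GNU make so its -l option is honoured):

PARALLEL_MAKE     = "-j 32 -l 48"
BB_NUMBER_THREADS = "8"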

With
PARALLEL_MAKE     = "-j32"
BB_NUMBER_THREADS = "6"
and a very big image build, I see all cores well used most of the time.

> I started a build, and after 20 minutes it had completed 1500 tasks using:
> 
> PARALLEL_MAKE     = "-j24"
> BB_NUMBER_THREADS =   "6"
> 
> Then I decided to kill it.
> 
> When I did
> PARALLEL_MAKE     = "-j12"
> BB_NUMBER_THREADS =   "24"
> 
> It completed 2000 tasks in less than half the time.

You should have finished the whole image; you can get 2000 tasks done
sooner (tasks like fetch/unpack/patch), but then you're still waiting
for the rest. With a smaller BB_NUMBER_THREADS it seems to spread tasks
more evenly (doing more fetch/unpack/patch tasks later, when the CPUs
are busy compiling something, which is good for I/O).

> This does not use tmpfs though.
> Do you have any comparison between tmpfs builds and RAID builds?

I sent it to the ML a few months ago, but I cannot find it now.

> I currently do not use INHERIT += "rm_work"
> since I want to be able to make changes to some packages.
> Is there a way to define rm_work on a package basis?
> Then the majority of the packages can be removed.
> 
> I use 75 GB without "rm_work"

Understood; in my scenario I want to build world as soon as possible,
keep the sstate, record issues, and forget about BUILDDIR.
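
That said, if you do want rm_work but need to keep the work directories of
a few recipes you are hacking on, rm_work.bbclass respects an exclude list;
a minimal local.conf sketch (the recipe names are only examples):

INHERIT += "rm_work"
RM_WORK_EXCLUDE += "busybox linux-yocto"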

-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa at gmail.com