[OE-core] Improving Build Speed

Ulf Samuelsson openembedded-core at emagii.com
Wed Nov 20 22:43:13 UTC 2013


2013-11-20 22:29, Richard Purdie wrote:
> Hi Ulf,
>
> Nice to see someone else looking at this. I've shared some of my
> thoughts and observations below based on some of the work I've done
> trying to speed things up.
>
> On Wed, 2013-11-20 at 22:05 +0100, Ulf Samuelsson wrote:
>> Finally got my new build machine running, so I thought I'd measure
>> its performance vs the old machine.
>>
>> Home built
>> Core i7-980X
>>       6 cores / 12 threads @ 3.33 GHz
>>       12 GB RAM @ 1333 MHz
>>       WD Black 1 TB @ 7200 rpm
>>
>> Precision 7500
>>       2 x (X5670, 6 cores @ 2.93 GHz)
>>       2 x 24 GB RAM @ 1333 MHz
>>       2 x SAS 600 GB @ 15K rpm, striped RAID
>>
>> Run Angstrom Distribution
>>
>> oebb.sh config beaglebone
>> bitbake cloud9-<my>-gnome-image  (It is slightly extended)
>>
>> The first machine built this in about three hours using
>> PARALLEL_MAKE = "-j6"
>> BB_NUMBER_THREADS = "6"
>>
>> The second machine built this much faster:
>>
>> Initially tried
>>
>> PARALLEL_MAKE = "-j2"
>> BB_NUMBER_THREADS = "12"
>>
>> but the CPU frequency tool showed the machine mostly idling.
>> Changed to:
>>
>> PARALLEL_MAKE = "-j6"
>> BB_NUMBER_THREADS = "24"
>>
>> and it was quicker, but still seemed a little flawed.
>> Several times during the build, the cpufreq utility
>> showed most of the cores dropping to
>> minimum frequency (2.93 GHz -> 1.6 GHz).
>>
>> The image build breaks down into 7658 tasks
>>
>> 19:36    Start of Pseudo Build
>> 19:40    Start of real build
>> 19:42    Task 1000 built         2 minutes
>> 19:45    Task 2000 built         3 minutes
>> 19:47    Task 3000 built         2 minutes
>> 19:48    Task 3500 built         1 minute
>> 19:57    Task 4000 built         9 minutes ****** (1)
>> 20:00    Task 4500 built         3 minutes
>> 20:04    Task 5000 built         4 minutes
>> 20:14    Task 5700 built       10 minutes
>> 20:17    Task 6000 built         3 minutes
>> 20:27    Task 6500 built       10 minutes
>> 20:43    Task 7500 built       16 minutes
>> 20:52    Task 7657 built         9 minutes ******* (2)
>> 20:59    Task 7658 built         7 minutes ******* (3) (do_rootfs)
>>
>> Total Time 83 minutes
> FWIW this is clearly an older revision of the system. We now build
> pseudo in tree so the "Start of Pseudo Build" no longer exists. There
> have been several fixes in various performance areas recently too which
> all help a little. If that saves us the single threaded first 4 minutes
> that is clearly a good thing! :)
This is the Angstrom master, which is Yocto 1.3.
I had problems getting the build to complete with the Angstrom Yocto 1.4.

>> There are several reasons for the speed traps.
>>
>> (1) This occurs at the end of the build of the native tools.
>>         The build of the cross packages has started; things are unpacked
>>         and patched, waiting for eglibc to be ready.
> We have gone through this "critical path" and tried to strip out as many
> dependencies as we can without sacrificing correctness. I'm open to
> further ideas.
>
>> (2) This occurs at the end of the build, when very few packages
>>         are left to build, so the RunQueue only contains a few packages.
>>
>>         Had a look at the packages built at the end.
>>
>>         webkit-gtk, gimp, abiword, pulseaudio.
>>
>>       abiword has PARALLEL_MAKE = "" and takes forever.
>>       I tried building an image with PARALLEL_MAKE = "-j24" and the
>> build completes without problems,
>>       but I have not loaded it onto a target yet.
>>       AbiWord seems to be compiling almost alone for a long time.
>>
>>       Webkit-gtk has a strange fix in do_compile.
>>
>> do_compile() {
>>       if [ x"$MAKE" = x ]; then MAKE=make; fi
>>       ...
>>       for error_count in 1 2 3; do
>>           ...
>>           ${MAKE} ${EXTRA_OEMAKE} "$@" || exit_code=1
>>           ...
>>       done
>>       ...
>> }
>>
>>       Not sure, but I think this means that PARALLEL_MAKE might get ignored.
> I think we got rid of this in master. It was to work around make bugs
> which we now detect and error upon instead.
>
>>       Why restrict PARALLEL_MAKE to anything less than the number of H/W
>> threads in the machine?
>>
>>       Came up with a construct PARALLEL_HIGH which is defined alongside
>> PARALLEL_MAKE in conf/local.conf
>>
>>       PARALLEL_MAKE = "-j8"
>>       PARALLEL_HIGH = "-j24"
>>
>>       In the appropriate recipes, which bitbake seems to process mostly
>> on their own, I do:
>>
>>       PARALLEL_HIGH ?= "${PARALLEL_MAKE}"
>>       PARALLEL_MAKE  = "${PARALLEL_HIGH}"
>>
>>       This means that they will try to use each H/W thread.
> Please benchmark the difference. I suspect we can just set the high
> number of make for everything. Note that few makefiles are well enough
> written to benefit from high levels of make (webkit being a notable
> exception).

I only checked a few, and have no hard data, but judging by cpufreq
it certainly seemed better.
Hard data is needed of course, so I will try that tomorrow.
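
For hard numbers, something simple along these lines should do, run once
with the normal PARALLEL_MAKE and once with PARALLEL_HIGH in effect
(webkit-gtk is just an example recipe):

    # time one of the big recipes in isolation; its dependencies are
    # already built, so only the recipe's own tasks are measured
    bitbake webkit-gtk -c cleansstate
    time bitbake webkit-gtk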

>
>>       When I looked at the bitbake runqueue code, it seems to prioritize
>>       things with a lot of dependencies, which results in things like
>> webkit-gtk
>>       being built among the last packages.
>>
>>       It would probably be better if the webkit-gtk build started earlier,
>>       so that the gimp build which depends on webkit-gtk, does not have
>>       to run as a single task for a few minutes.
>>
>>       I am thinking of adding a few dummy packages which depend on
>> webkit-gtk and the
>>       other long builds at the end, to fool bitbake into starting their
>> build earlier,
>>       but it might be a better idea if a build hint could be part of the
>> recipe.
>>
>>       I guess a value which could be added to the dependency count would
>> not be
>>       too hard to implement (for those who know how).
> It would be easy to write a custom scheduler which hardcoded
> prioritisation of critical path items (or slow ones). It's an idea I've
> not tried yet and would be easier than artificial dependency trees.
I generated a recipe which just installs /home/root,
but depends on a few things like gimp, webkit-gtk etc.,
to see if I can get them to start earlier.
Then I duplicated it 15 times and made a recipe which depends on those 15,
and included the latter recipe in the image.
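
The forcing recipe is roughly this shape (the name and contents are just
illustrative, not the exact recipe I used):

    # force-early-builds_1.0.bb
    SUMMARY = "Dummy recipe to pull the long builds forward in the runqueue"
    LICENSE = "CLOSED"

    # build-time dependencies on the recipes that otherwise end up last
    DEPENDS = "webkit-gtk gimp abiword pulseaudio"

    do_install() {
        # install something trivial so the recipe produces a package
        install -d ${D}/home/root
    }

    FILES_${PN} = "/home/root"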

Unfortunately this does not seem to make a difference.
It was actually a few seconds slower, which I guess is due
to the extra build time of the new recipes.
gimp is still there as the only thread at the end.
It could be that webkit-gtk depends on so many things it *has* to be 
built at the end.
>
> One point to note is that looking at the build "bootcharts", there are
> "pinch points". For core-image-sato, these are notably the toolchain,
> then gettext, then gtk, then gstreamer. I suspect webkit has a similar
> issue to that.

Another idea:

I suspect that there is a lot of unpacking and patching of target
recipes going on while the native stuff is being built.
Does it make sense to have multiple threads reading the disk for
the target recipes during the native build, or will we just lose out
due to seek time?

Having multiple threads accessing the disk might force the disk to spend
most of its time seeking.
I found an application which measures seek performance:
my WD Black will do 83 seeks per second, and my SAS disk will do
twice that.
The RAID of two SAS disks provides close to SSD throughput (380 MB/s),
but seek time is no better than that of a single SAS disk.

Since there is "empty time" at the end of the native build, does it make
sense to hold back the unpack/patch of target stuff until we reach that
point, and then let loose?
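
One way to check whether the disk really is seek-bound during that window
would be to watch the device utilisation while the native stuff builds,
e.g.:

    # extended per-device statistics every 5 seconds;
    # high %util combined with low MB/s suggests the disk is seek-bound
    iostat -dxm 5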

========================

Now with 48 GB of RAM (which I might grow to 96 GB, if someone proves that
this makes it faster), this might be useful for speeding things up.

Can tmpfs beat the kernel page cache?

1.    Typically, I work on fewer than 10 recipes, and if I continuously
         rebuild those, why not create their build directories as links to
         a tmpfs file system?
         Maybe a configuration file with a list of recipes to build on
         tmpfs.

         During a build from scratch this is not so useful, but once
         most stuff is in place, it might be.

2.     If the downloads directory were shadowed in a tmpfs file system,
         then there would be less seek time during the build.
         The downloads tmpfs should be populated at boot time,
         and rsynced with a real disk in the background when new stuff
         is downloaded from the Internet (see the sketch after this list).

3.     With 96 GB of RAM, maybe the complete build directory will fit.
         Would be nice to build everything on tmpfs, and automatically rsync
         to a real disk when there is nothing else to do...

4.     If tmpfs is not used, it would still be good to have better control
         over the build directory.
         It makes sense to me to have the metadata on an SSD, but the
         build directory should be on my RAID array for fast rebuilds.
         I can set this up manually, but it would be better to be able to
         specify this in a configuration file (see the sketch below).
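
A minimal sketch of items 2 and 4, mixing a couple of shell commands with
local.conf lines, and assuming a 16 GB tmpfs plus mount points /mnt/raid
and /mnt/ssd that are purely illustrative:

    # shadow the downloads directory in tmpfs and seed it from disk
    mount -t tmpfs -o size=16g tmpfs /mnt/dl-tmpfs
    rsync -a /mnt/raid/downloads/ /mnt/dl-tmpfs/
    # (rsync back to /mnt/raid/downloads in the background when new
    #  sources have been fetched)

    # conf/local.conf: split the directories over the devices
    DL_DIR     = "/mnt/dl-tmpfs"
    SSTATE_DIR = "/mnt/ssd/sstate-cache"
    TMPDIR     = "/mnt/raid/build/tmp"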

>> (3) Creating the rootfs seems to have zero parallelism.
>>       But I have not investigated if anything can be done.
> This is something I do want to fix in 1.6. We need to convert the core
> to python to gain access to easier threading mechanisms though.
> Certainly parallel image type generation and compression would be a win
> here.
>   
>>       ===================================
>>
>> So I propose the following changes:
>>
>> 1. Remove PARALLEL_MAKE = "" from abiword.
>> 2. Add the PARALLEL_HIGH variable to a few recipes.
>> 3. Investigate if we can force the build of a few packages to an earlier
>> point.
>>
>> =======================================
>> BTW: I have noticed that some dependencies are missing from the recipes.
>>
>>
>>
>> DEPENDENCY BUGS
>> pangomm needs to depend on "pango"
>>       Otherwise, the required pangocairo might not be available when
>> pangomm is configured.
>>
>> goffice needs to depend on "librsvg gdk-pixbuf"
>>       Also on "gobject-2.0 gmodule-2.0 gio-2.0", but I did not find
>> those packages,
>>       so I assume they are generated somewhere. Did not investigate further.
> I'm sure patches would be most welcome for bugs like this.
>
> Cheers,
>
> Richard
>
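
For the dependency bugs above, the fixes should just be one-liners in the
respective recipes, roughly:

    # pangomm recipe
    DEPENDS += "pango"

    # goffice recipe
    DEPENDS += "librsvg gdk-pixbuf"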


-- 
Best Regards
Ulf Samuelsson
eMagii



