[bitbake-devel] [PATCH] use multiple processes to dump signatures.
Jianxun Zhang
jianxun.zhang at linux.intel.com
Thu Dec 22 18:39:19 UTC 2016
> On Dec 22, 2016, at 12:59 AM, Richard Purdie <richard.purdie at linuxfoundation.org> wrote:
>
> On Wed, 2016-12-21 at 12:27 -0800, Jianxun Zhang wrote:
>> This change significantly shortens the time on reparsing stage
>> of '-S' option.
>>
>> Each file is reparsed and then dumped within a dedicated
>> process. The maximum number of the running processes is not
>> greater than the value of BB_NUMBER_PARSE_THREADS if it is set.
>>
>> The dump_sigs() in class SignatureGeneratorBasic is _replaced_
>> by a new dump_sigfn() interface, so calls from the outside and
>> subclasses are dispatched to the implementation in the base
>> class of SignatureGeneratorBasic.
>>
>> Fixes [YOCTO #10352]
>
> Thanks, I think this is heading in the right direction.
>
> I am a little bit worried that this leaves OE's sstatesig.py with a
> dump_sigs() function which isn't used/connected into everything else
> though? Does this still write out a locked sigs file after this change?
I am not an expert in this area, so I will just share the results and my understanding here.
OE’s dump_sigs() is still connected and used. The lock-sigs.inc file is in the output and looks okay in a manual diff against one generated from master tip. This is because I kept the call to dump_sigs() in dump_signatures() in runqueue.py. That line triggers OE to dump lock-sigs.inc.
The “siggen” instance in dump_signatures() should be of the subclass type defined in OE’s sstatesig.py. The dump_sigs() of the OE class, in turn, writes out the additional lock-sigs.inc and also calls its base class (BB’s SignatureGeneratorBasic) dump_sigs(), which now does nothing.
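The dispatch described above can be sketched with simplified stand-in classes. This is only an illustration of the method resolution, not the real BitBake or OE code: the class name OESignatureGenerator, the locked_written flag and the method bodies are all assumptions made for the example.

```python
# Simplified stand-ins for the real classes; the bodies are illustrative only.
class SignatureGenerator:
    def dump_sigs(self, dataCaches, options):
        # Base implementation: nothing to dump.
        return


class SignatureGeneratorBasic(SignatureGenerator):
    # dump_sigs() is no longer defined at this level, so a lookup falls
    # through to SignatureGenerator.dump_sigs().
    def dump_sigfn(self, fn, dataCaches, options):
        # Placeholder for dumping the signatures of a single file.
        return "dumped " + fn


class OESignatureGenerator(SignatureGeneratorBasic):  # hypothetical name
    def __init__(self):
        self.locked_written = False

    def dump_sigs(self, dataCaches, options):
        # Stand-in for writing lock-sigs.inc in the main process.
        self.locked_written = True
        # Resolves to SignatureGenerator.dump_sigs(), which does nothing.
        return super().dump_sigs(dataCaches, options)


siggen = OESignatureGenerator()
result = siggen.dump_sigs({}, [])
```

The subclass call still runs its own extra step, and the super() call lands harmlessly in the top-level base class.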
This change removes dump_sigs() from SignatureGeneratorBasic, which amounts to saying "let’s fall back to dump_sigs() in our base class (SignatureGenerator), since the body of the removed dump_sigs() is replaced with parallel dumping". Another motivation for the removal is code clarity. The loops in the old dump_sigs() won’t work without a full data store, and keeping a loop in it just to call the new API doesn’t seem worthwhile.
What we lose by removing SignatureGeneratorBasic’s dump_sigs() is support for any existing logic or class that calls dump_sigs() in this class directly, outside of the -S option path. But I don’t see such a case after some searching in BB and OE, which is another reason to remove it.
In short, it is a trade-off between parallelizing the stamp dumping and leaving the writing of lock-sigs.inc in a single main process. I did not go further and parallelize writing lock-sigs.inc in OE because I think synchronization could be another deep hole, and this part doesn’t contribute much to performance either.
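A minimal, self-contained sketch of that split, with per-file dumping in a bounded number of worker processes followed by one step in the main process, might look like the following. It is not the patch itself: dump_one(), dump_all(), the .sigdata suffix and summary.inc are hypothetical names, and it assumes a fork-based start method (as on Linux), so the worker does not need to be picklable.

```python
import multiprocessing
import os
import tempfile
import time

# Assumption: fork start method, so workers inherit state without pickling.
ctx = multiprocessing.get_context("fork")


def dump_one(fn, outdir):
    # Hypothetical stand-in for reparsing fn and dumping its signatures.
    with open(os.path.join(outdir, fn + ".sigdata"), "w") as f:
        f.write("signature data for " + fn + "\n")


def dump_all(fns, outdir, max_process):
    pending = set(fns)
    running = []
    while pending or running:
        # Rebuild the list instead of removing entries while iterating;
        # is_alive() also joins workers that have already finished.
        running = [p for p in running if p.is_alive()]
        while pending and len(running) < max_process:
            p = ctx.Process(target=dump_one, args=(pending.pop(), outdir))
            p.start()
            running.append(p)
        time.sleep(0.01)  # avoid spinning at full speed while waiting
    # Single-process step, run once after every worker has exited.
    names = sorted(os.listdir(outdir))
    with open(os.path.join(outdir, "summary.inc"), "w") as f:
        f.write("\n".join(names) + "\n")


outdir = tempfile.mkdtemp()
dump_all(["a.bb", "b.bb", "c.bb"], outdir, max_process=2)
written = sorted(os.listdir(outdir))
```

Because the summary is written only after all workers are gone, no locking between processes is needed for it.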
I also ran the test module sstatetest in oe-selftest, and it passed.
Feel free to let me know of any better ideas or improvements I should make.
>
> Cheers,
>
> Richard
>
>
>> Signed-off-by: Jianxun Zhang <jianxun.zhang at linux.intel.com>
>> ---
>> bitbake/lib/bb/runqueue.py | 32 +++++++++++++++++++++++++++-----
>> bitbake/lib/bb/siggen.py | 4 ++--
>> 2 files changed, 29 insertions(+), 7 deletions(-)
>>
>> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
>> index 2ad8aad..c7d8d53 100644
>> --- a/bitbake/lib/bb/runqueue.py
>> +++ b/bitbake/lib/bb/runqueue.py
>> @@ -36,6 +36,7 @@ from bb import msg, data, event
>> from bb import monitordisk
>> import subprocess
>> import pickle
>> +from multiprocessing import Process
>>
>> bblogger = logging.getLogger("BitBake")
>> logger = logging.getLogger("BitBake.RunQueue")
>> @@ -1302,15 +1303,36 @@ class RunQueue:
>> else:
>> self.rqexe.finish()
>>
>> + def rq_dump_sigfn(self, fn, options):
>> + bb_cache = bb.cache.NoCache(self.cooker.databuilder)
>> + the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
>> + siggen = bb.parse.siggen
>> + dataCaches = self.rqdata.dataCaches
>> + siggen.dump_sigfn(fn, dataCaches, options)
>> +
>> def dump_signatures(self, options):
>> - done = set()
>> + fns = set()
>> bb.note("Reparsing files to collect dependency data")
>> - bb_cache = bb.cache.NoCache(self.cooker.databuilder)
>> +
>> for tid in self.rqdata.runtaskentries:
>> fn = fn_from_tid(tid)
>> - if fn not in done:
>> - the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
>> - done.add(fn)
>> + fns.add(fn)
>> +
>> + max_process = int(self.cfgData.getVar("BB_NUMBER_PARSE_THREADS") or os.cpu_count() or 1)
>> + # We cannot use the real multiprocessing.Pool easily due to some local data
>> + # that can't be pickled. This is a cheap multi-process solution.
>> + launched = []
>> + while fns:
>> + if len(launched) < max_process:
>> + p = Process(target=self.rq_dump_sigfn, args=(fns.pop(), options))
>> + p.start()
>> + launched.append(p)
>> + for q in launched:
>> + # The finished processes are joined when calling is_alive()
>> + if not q.is_alive():
>> + launched.remove(q)
>> + for p in launched:
>> + p.join()
>>
>> bb.parse.siggen.dump_sigs(self.rqdata.dataCaches, options)
>>
>> diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
>> index b20b9cf..ae50a18 100644
>> --- a/bitbake/lib/bb/siggen.py
>> +++ b/bitbake/lib/bb/siggen.py
>> @@ -307,8 +307,8 @@ class SignatureGeneratorBasic(SignatureGenerator):
>> pass
>> raise err
>>
>> - def dump_sigs(self, dataCaches, options):
>> - for fn in self.taskdeps:
>> + def dump_sigfn(self, fn, dataCaches, options):
>> + if fn in self.taskdeps:
>> for task in self.taskdeps[fn]:
>> tid = fn + ":" + task
>> (mc, _, _) = bb.runqueue.split_tid(tid)
>> --
>> 2.7.4
>>