[bitbake-devel] [PATCH] use multiple processes to dump signatures.
Jianxun Zhang
jianxun.zhang at linux.intel.com
Tue Jan 10 22:54:42 UTC 2017
Hi RP,
Ping~
Do you have any further suggestions based on my explanation?
https://bugzilla.yoctoproject.org/show_bug.cgi?id=10352
Thanks
> On Dec 22, 2016, at 10:39 AM, Jianxun Zhang <jianxun.zhang at linux.intel.com> wrote:
>
>>
>> On Dec 22, 2016, at 12:59 AM, Richard Purdie <richard.purdie at linuxfoundation.org> wrote:
>>
>> On Wed, 2016-12-21 at 12:27 -0800, Jianxun Zhang wrote:
>>> This change significantly shortens the reparsing stage of the
>>> '-S' option.
>>>
>>> Each file is reparsed and then dumped within a dedicated
>>> process. The number of concurrently running processes is capped
>>> at the value of BB_NUMBER_PARSE_THREADS if it is set, and at the
>>> CPU count otherwise.
>>>
>>> The dump_sigs() in class SignatureGeneratorBasic is _replaced_
>>> by a new dump_sigfn() interface, so calls to dump_sigs() from
>>> the outside and from subclasses are dispatched to the
>>> implementation in the base class of SignatureGeneratorBasic.
>>>
>>> Fixes [YOCTO #10352]
>>
>> Thanks, I think this is heading in the right direction.
>>
>> I am a little bit worried that this leaves OE's sstatesig.py with a
>> dump_sigs() function which isn't used/connected into everything else
>> though? Does this still write out a locked sigs file after this change?
> I am not an expert in this area, so I will just share the results and my understanding here.
>
> OE’s dump_sigs() is still connected and used. The lock-sigs.inc file is still written, and a manual diff against the one produced from master tip looks okay. This is because I kept the call to dump_sigs() in dump_signatures() in runqueue.py. That line triggers OE to dump lock-sigs.inc.
>
> The “siggen” instance in dump_signatures() should be an instance of the subclass in OE’s sstatesig.py. The OE class’s dump_sigs(), in turn, writes out the additional lock-sigs.inc and also calls the dump_sigs() of its (BB) base class SignatureGeneratorBasic, which now does nothing.
>
> This change removes dump_sigs() from SignatureGeneratorBasic, which in effect says "let’s fall back to the dump_sigs() of our base class (SignatureGenerator), since the body of the removed dump_sigs() is replaced by parallel dumping". Another motivation for the removal is code clarity. The loops in the old dump_sigs() won’t work without a full data store, and keeping a loop there just to call the new API doesn’t seem worthwhile.
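
To illustrate the dispatch described above, here is a minimal sketch. It is not the BitBake or OE code: the three classes are simplified stand-ins, OESignatureGenerator is a hypothetical name for the subclass in sstatesig.py, and the "dumped" and "locked_sigs_written" attributes exist only to make the call path observable.

```python
class SignatureGenerator:
    """Stand-in for the BB root class: its dump_sigs() does nothing."""

    def dump_sigs(self, dataCaches, options):
        return None


class SignatureGeneratorBasic(SignatureGenerator):
    """Stand-in for the BB class after the patch: no dump_sigs() of its own."""

    def __init__(self):
        self.taskdeps = {}   # fn -> list of task names
        self.dumped = []     # illustration only: records what was "dumped"

    def dump_sigfn(self, fn, dataCaches, options):
        # Per-file dump; the real code writes signature data for each task.
        if fn in self.taskdeps:
            for task in self.taskdeps[fn]:
                self.dumped.append(fn + ":" + task)


class OESignatureGenerator(SignatureGeneratorBasic):
    """Hypothetical stand-in for the OE subclass in sstatesig.py."""

    def __init__(self):
        super().__init__()
        self.locked_sigs_written = False

    def dump_sigs(self, dataCaches, options):
        # Stands in for writing lock-sigs.inc in the main process.
        self.locked_sigs_written = True
        # Resolves to SignatureGenerator.dump_sigs(), which is a no-op.
        return super().dump_sigs(dataCaches, options)
```

With this layout, a call to dump_sigs() on the OE instance still produces the OE side effect, while the per-file work is only reachable through dump_sigfn().
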
>
> What we lose by removing SignatureGeneratorBasic’s dump_sigs() is support for any existing logic or class that calls dump_sigs() of this class directly, outside of the -S option path. But I didn’t find such a case after some searching in BB and OE (another reason to remove it).
>
> In short, it is a trade-off between parallelized stamp dumping and leaving the writing of lock-sigs.inc in a single main process. I didn’t go further and parallelize writing lock-sigs.inc in OE because I think synchronization could be another deep hole, and this part doesn’t contribute much to performance either.
>
> I also ran the test module sstatetest in oe-selftest and it passed.
>
> Feel free to let me know of any better ideas or improvements I should make.
>
>
>>
>> Cheers,
>>
>> Richard
>>
>>
>>> Signed-off-by: Jianxun Zhang <jianxun.zhang at linux.intel.com>
>>> ---
>>> bitbake/lib/bb/runqueue.py | 32 +++++++++++++++++++++++++++-----
>>> bitbake/lib/bb/siggen.py | 4 ++--
>>> 2 files changed, 29 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
>>> index 2ad8aad..c7d8d53 100644
>>> --- a/bitbake/lib/bb/runqueue.py
>>> +++ b/bitbake/lib/bb/runqueue.py
>>> @@ -36,6 +36,7 @@ from bb import msg, data, event
>>> from bb import monitordisk
>>> import subprocess
>>> import pickle
>>> +from multiprocessing import Process
>>>
>>> bblogger = logging.getLogger("BitBake")
>>> logger = logging.getLogger("BitBake.RunQueue")
>>> @@ -1302,15 +1303,36 @@ class RunQueue:
>>> else:
>>> self.rqexe.finish()
>>>
>>> + def rq_dump_sigfn(self, fn, options):
>>> + bb_cache = bb.cache.NoCache(self.cooker.databuilder)
>>> + the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
>>> + siggen = bb.parse.siggen
>>> + dataCaches = self.rqdata.dataCaches
>>> + siggen.dump_sigfn(fn, dataCaches, options)
>>> +
>>> def dump_signatures(self, options):
>>> - done = set()
>>> + fns = set()
>>> bb.note("Reparsing files to collect dependency data")
>>> - bb_cache = bb.cache.NoCache(self.cooker.databuilder)
>>> +
>>> for tid in self.rqdata.runtaskentries:
>>> fn = fn_from_tid(tid)
>>> - if fn not in done:
>>> - the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
>>> - done.add(fn)
>>> + fns.add(fn)
>>> +
>>> + max_process = int(self.cfgData.getVar("BB_NUMBER_PARSE_THREADS") or os.cpu_count() or 1)
>>> + # We cannot use the real multiprocessing.Pool easily due to some local data
>>> + # that can't be pickled. This is a cheap multi-process solution.
>>> + launched = []
>>> + while fns:
>>> + if len(launched) < max_process:
>>> + p = Process(target=self.rq_dump_sigfn, args=(fns.pop(), options))
>>> + p.start()
>>> + launched.append(p)
>>> + for q in launched:
>>> + # The finished processes are joined when calling is_alive()
>>> + if not q.is_alive():
>>> + launched.remove(q)
>>> + for p in launched:
>>> + p.join()
>>>
>>> bb.parse.siggen.dump_sigs(self.rqdata.dataCaches, options)
>>>
>>> diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
>>> index b20b9cf..ae50a18 100644
>>> --- a/bitbake/lib/bb/siggen.py
>>> +++ b/bitbake/lib/bb/siggen.py
>>> @@ -307,8 +307,8 @@ class SignatureGeneratorBasic(SignatureGenerator):
>>> pass
>>> raise err
>>>
>>> - def dump_sigs(self, dataCaches, options):
>>> - for fn in self.taskdeps:
>>> + def dump_sigfn(self, fn, dataCaches, options):
>>> + if fn in self.taskdeps:
>>> for task in self.taskdeps[fn]:
>>> tid = fn + ":" + task
>>> (mc, _, _) = bb.runqueue.split_tid(tid)
>>> --
>>> 2.7.4
>>>
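
For discussion, the "at most N processes at a time" launcher in dump_signatures() could also be written without polling. The following is only a sketch under stated assumptions, not the code in the patch: run_bounded() is a hypothetical helper, it explicitly requests the "fork" start method (Unix only) so that children inherit state that cannot be pickled, and it blocks on the process sentinels with multiprocessing.connection.wait() instead of looping over is_alive() and removing entries from the list being iterated.

```python
import multiprocessing
from multiprocessing.connection import wait


def run_bounded(items, worker, max_procs):
    """Run worker(item) in one child process per item, at most max_procs at once.

    Items must be unique and hashable. Returns a dict mapping each item
    to the exit code of its child process.
    """
    # "fork" lets children inherit unpicklable parent state (Unix only).
    ctx = multiprocessing.get_context("fork")
    max_procs = max(1, int(max_procs))
    pending = list(items)
    running = {}      # sentinel -> (item, process)
    exit_codes = {}
    while pending or running:
        # Start new children until the limit is reached.
        while pending and len(running) < max_procs:
            item = pending.pop()
            proc = ctx.Process(target=worker, args=(item,))
            proc.start()
            running[proc.sentinel] = (item, proc)
        # Block until at least one child has exited, then reap it.
        for sentinel in wait(list(running)):
            item, proc = running.pop(sentinel)
            proc.join()
            exit_codes[item] = proc.exitcode
            proc.close()
    return exit_codes
```

In the context of the patch, items would be the set of recipe file names, worker would wrap rq_dump_sigfn() with the options argument bound, and max_procs would come from BB_NUMBER_PARSE_THREADS or os.cpu_count(), as in the diff above.
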
>
> --
> _______________________________________________
> bitbake-devel mailing list
> bitbake-devel at lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/bitbake-devel