[bitbake-devel] [PATCH] use multiple processes to dump signatures.

Jianxun Zhang jianxun.zhang at linux.intel.com
Tue Jan 10 22:54:42 UTC 2017


Hi RP,
Ping~
Do you have any further suggestions based on my explanation?

https://bugzilla.yoctoproject.org/show_bug.cgi?id=10352

Thanks

> On Dec 22, 2016, at 10:39 AM, Jianxun Zhang <jianxun.zhang at linux.intel.com> wrote:
> 
>> 
>> On Dec 22, 2016, at 12:59 AM, Richard Purdie <richard.purdie at linuxfoundation.org> wrote:
>> 
>> On Wed, 2016-12-21 at 12:27 -0800, Jianxun Zhang wrote:
>>> This change significantly shortens the time spent in the
>>> reparsing stage of the '-S' option.
>>> 
>>> Each file is reparsed and then dumped within a dedicated
>>> process. The number of concurrently running processes does not
>>> exceed BB_NUMBER_PARSE_THREADS if it is set, or the CPU count
>>> otherwise.
>>> 
>>> The dump_sigs() in class SignatureGeneratorBasic is _replaced_
>>> by a new dump_sigfn() interface, so calls from the outside and
>>> subclasses are dispatched to the implementation in the base
>>> class of SignatureGeneratorBasic.
>>> 
>>> Fixes [YOCTO #10352]
>> 
>> Thanks, I think this is heading in the right direction. 
>> 
>> I am a little bit worried that this leaves OE's sstatesig.py with a
>> dump_sigs() function which isn't used/connected into everything else
>> though? Does this still write out a locked sigs file after this change?
> I am not an expert in this area, so I will just share my results and my understanding here.
> 
> OE’s dump_sigs() is still connected and used. lock-sigs.inc is still produced, and a manual diff against one generated from the master tip looks okay. This is because I keep the call to dump_sigs() in dump_signatures() in runqueue.py; that call triggers OE to write lock-sigs.inc.
> 
> The “siggen” instance used in dump_signatures() should be an instance of the subclass defined in OE’s sstatesig.py. The OE class’s dump_sigs(), in turn, writes out the additional lock-sigs.inc and also calls dump_sigs() on its BitBake base class SignatureGeneratorBasic, which now resolves to SignatureGenerator’s dump_sigs() and does nothing.
> 
> This change removes SignatureGeneratorBasic’s dump_sigs() so that calls fall back to the base class (SignatureGenerator) dump_sigs(), since the body of the removed dump_sigs() has been replaced by parallel dumping. Another motivation for the removal is code clarity: the loops in the old dump_sigs() won’t work without a full data store, and keeping a loop there just to call the new API doesn’t seem useful.
> 
> What we lose by removing SignatureGeneratorBasic’s dump_sigs() is support for any existing logic or class that calls this class’s dump_sigs() directly outside the -S option path. However, after some searching in BB and OE I don’t see such a case (another reason to remove it).
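A minimal sketch of the method dispatch described above. The classes SigGenBase, SigGenBasic and OESigGen are hypothetical stand-ins that only mimic the roles of BitBake's SignatureGenerator, SignatureGeneratorBasic and the subclass in OE's sstatesig.py; they are not the real code. Once the middle class stops overriding dump_sigs(), the subclass's super() call lands on the base class's no-op, while the per-file dump_sigfn() from the patch stays on the middle class.

```python
class SigGenBase:
    """Stand-in for bb.siggen.SignatureGenerator, whose dump_sigs() does nothing."""

    def dump_sigs(self, dataCaches, options):
        return None


class SigGenBasic(SigGenBase):
    """Stand-in for SignatureGeneratorBasic after the patch: it no longer
    overrides dump_sigs() and only adds a per-file dump_sigfn()."""

    def __init__(self):
        self.taskdeps = {}   # fn -> list of task names, filled during parsing
        self.dumped = []     # records what a real implementation would write

    def dump_sigfn(self, fn, dataCaches, options):
        # Same guard as the patch: files without task data are skipped.
        if fn in self.taskdeps:
            for task in self.taskdeps[fn]:
                self.dumped.append(fn + ":" + task)


class OESigGen(SigGenBasic):
    """Stand-in for the subclass in OE's sstatesig.py."""

    def __init__(self):
        super().__init__()
        self.locked_sigs_written = False

    def dump_sigs(self, dataCaches, options):
        self.locked_sigs_written = True  # where OE writes its locked sigs file
        # SigGenBasic defines no dump_sigs(), so this resolves to SigGenBase's no-op.
        super().dump_sigs(dataCaches, options)
```

In the patch, each worker process calls only dump_sigfn() for its own file, while the main process still makes the single dump_sigs() call that lets OE write its file.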
> 
> In short, it is a trade-off between parallelized stamp dumping and keeping the lock-sigs.inc write in a single main process. I did not go further and parallelize writing lock-sigs.inc in OE because I think synchronization could be another deep hole, and that part doesn’t contribute much to performance anyway.
> 
> I also ran the sstatetest module in oe-selftest, and it passed.
> 
> Feel free to let me know of any better ideas or improvements I should make.
> 
> 
>> 
>> Cheers,
>> 
>> Richard
>> 
>> 
>>> Signed-off-by: Jianxun Zhang <jianxun.zhang at linux.intel.com>
>>> ---
>>> bitbake/lib/bb/runqueue.py | 32 +++++++++++++++++++++++++++-----
>>> bitbake/lib/bb/siggen.py   |  4 ++--
>>> 2 files changed, 29 insertions(+), 7 deletions(-)
>>> 
>>> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
>>> index 2ad8aad..c7d8d53 100644
>>> --- a/bitbake/lib/bb/runqueue.py
>>> +++ b/bitbake/lib/bb/runqueue.py
>>> @@ -36,6 +36,7 @@ from bb import msg, data, event
>>> from bb import monitordisk
>>> import subprocess
>>> import pickle
>>> +from multiprocessing import Process
>>> 
>>> bblogger = logging.getLogger("BitBake")
>>> logger = logging.getLogger("BitBake.RunQueue")
>>> @@ -1302,15 +1303,36 @@ class RunQueue:
>>>         else:
>>>             self.rqexe.finish()
>>> 
>>> +    def rq_dump_sigfn(self, fn, options):
>>> +        bb_cache = bb.cache.NoCache(self.cooker.databuilder)
>>> +        the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
>>> +        siggen = bb.parse.siggen
>>> +        dataCaches = self.rqdata.dataCaches
>>> +        siggen.dump_sigfn(fn, dataCaches, options)
>>> +
>>>     def dump_signatures(self, options):
>>> -        done = set()
>>> +        fns = set()
>>>         bb.note("Reparsing files to collect dependency data")
>>> -        bb_cache = bb.cache.NoCache(self.cooker.databuilder)
>>> +
>>>         for tid in self.rqdata.runtaskentries:
>>>             fn = fn_from_tid(tid)
>>> -            if fn not in done:
>>> -                the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
>>> -                done.add(fn)
>>> +            fns.add(fn)
>>> +
>>> +        max_process = int(self.cfgData.getVar("BB_NUMBER_PARSE_THREADS") or os.cpu_count() or 1)
>>> +        # We cannot use the real multiprocessing.Pool easily due to some local data
>>> +        # that can't be pickled. This is a cheap multi-process solution.
>>> +        launched = []
>>> +        while fns:
>>> +            if len(launched) < max_process:
>>> +                p = Process(target=self.rq_dump_sigfn, args=(fns.pop(), options))
>>> +                p.start()
>>> +                launched.append(p)
>>> +            for q in launched:
>>> +                # The finished processes are joined when calling is_alive()
>>> +                if not q.is_alive():
>>> +                    launched.remove(q)
>>> +        for p in launched:
>>> +                p.join()
>>> 
>>>         bb.parse.siggen.dump_sigs(self.rqdata.dataCaches, options)
>>> 
>>> diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
>>> index b20b9cf..ae50a18 100644
>>> --- a/bitbake/lib/bb/siggen.py
>>> +++ b/bitbake/lib/bb/siggen.py
>>> @@ -307,8 +307,8 @@ class SignatureGeneratorBasic(SignatureGenerator):
>>>                 pass
>>>             raise err
>>> 
>>> -    def dump_sigs(self, dataCaches, options):
>>> -        for fn in self.taskdeps:
>>> +    def dump_sigfn(self, fn, dataCaches, options):
>>> +        if fn in self.taskdeps:
>>>             for task in self.taskdeps[fn]:
>>>                 tid = fn + ":" + task
>>>                 (mc, _, _) = bb.runqueue.split_tid(tid)
>>> -- 
>>> 2.7.4
>>> 
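The launch loop in the patch polls is_alive() without sleeping, so the main process spins while children run, and it removes entries from `launched` while iterating over it, which skips the entry after each removed one (harmless here because the outer loop polls again, but easy to trip over). Below is a minimal sketch of the same bounded fork-per-file idea that blocks instead of spinning. It assumes a POSIX host where the "fork" start method is available, and run_bounded() is a hypothetical helper, not a BitBake API. Because each child is forked rather than fed through a pickled task queue as multiprocessing.Pool does, `target` may be a bound method or closure holding unpicklable state, which is the constraint the patch comment mentions.

```python
import multiprocessing
from multiprocessing.connection import wait


def run_bounded(target, items, max_procs):
    """Run target(item) in a separate forked process for each item,
    with at most max_procs children alive at once.

    Hypothetical helper for illustration; returns {item: exitcode}.
    """
    # "fork" lets target be a closure or bound method that cannot be pickled.
    ctx = multiprocessing.get_context("fork")
    max_procs = max(1, max_procs)  # avoid waiting on an empty set forever
    pending = list(items)
    running = {}  # process sentinel -> (process, item)
    results = {}
    while pending or running:
        # Fill every free slot before waiting.
        while pending and len(running) < max_procs:
            item = pending.pop()
            proc = ctx.Process(target=target, args=(item,))
            proc.start()
            running[proc.sentinel] = (proc, item)
        # Block until at least one child exits instead of polling is_alive().
        for sentinel in wait(list(running)):
            proc, item = running.pop(sentinel)
            proc.join()
            results[item] = proc.exitcode
            proc.close()  # release the process's resources after reading exitcode
    return results
```

Waiting on process sentinels lets the kernel wake the parent only when a child exits, and returning exit codes makes a failed reparse in one child visible to the caller instead of being silently dropped.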
> 
> -- 
> _______________________________________________
> bitbake-devel mailing list
> bitbake-devel at lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/bitbake-devel



