[bitbake-devel] [PATCH] use multiple processes to dump signatures.

Jianxun Zhang jianxun.zhang at linux.intel.com
Thu Dec 22 18:39:19 UTC 2016


> On Dec 22, 2016, at 12:59 AM, Richard Purdie <richard.purdie at linuxfoundation.org> wrote:
> 
> On Wed, 2016-12-21 at 12:27 -0800, Jianxun Zhang wrote:
>> This change significantly shortens the time spent in the
>> reparsing stage of the '-S' option.
>> 
>> Each file is reparsed and then dumped within a dedicated
>> process. The number of running processes is capped at the
>> value of BB_NUMBER_PARSE_THREADS if it is set.
>> 
>> The dump_sigs() in class SignatureGeneratorBasic is _replaced_
>> by a new dump_sigfn() interface, so calls from outside and from
>> subclasses are dispatched to the implementation in the base
>> class of SignatureGeneratorBasic.
>> 
>> Fixes [YOCTO #10352]
> 
> Thanks, I think this is heading in the right direction. 
> 
> I am a little bit worried that this leaves OE's sstatesig.py with a
> dump_sigs() function which isn't used/connected into everything else
> though? Does this still write out a locked sigs file after this change?
I am not an expert in this area, so I will just share the result and my understanding here.

OE's dump_sigs() is still connected and used. The locked-sigs.inc file is in the output and looks okay in a manual diff against one generated from the master tip. This is because I still keep the call to dump_sigs() in dump_signatures() in runqueue.py; that line is what triggers OE to dump locked-sigs.inc.

The "siggen" instance in dump_signatures() should be of the subclass type defined in OE's sstatesig.py. The OE class's dump_sigs(), in turn, writes out the additional locked-sigs.inc and also calls dump_sigs() of its (BitBake) base class SignatureGeneratorBasic, which now does nothing.

This change removes dump_sigs() from SignatureGeneratorBasic, effectively saying "let's fall back to the base class (SignatureGenerator) dump_sigs(), since the body of the removed dump_sigs() has been replaced by the parallel dumping". Another motivation for the removal is code clarity: the loop in the old dump_sigs() won't work without a full data store, and keeping a loop there just to call the new API doesn't seem worthwhile.
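
To make the dispatch concrete, here is a rough sketch of the class relationships after the change. SignatureGenerator and SignatureGeneratorBasic are the real BitBake classes, but "SignatureGeneratorOE", write_locked_sigs() and the method bodies are only placeholders standing in for the OE subclass in sstatesig.py:

    class SignatureGenerator:                  # BitBake base class (bb/siggen.py)
        def dump_sigs(self, dataCaches, options):
            return                             # no-op fallback

    class SignatureGeneratorBasic(SignatureGenerator):
        # dump_sigs() is removed by this patch; the per-recipe work moves here.
        def dump_sigfn(self, fn, dataCaches, options):
            pass                               # dump stamps for every task of one recipe file

    class SignatureGeneratorOE(SignatureGeneratorBasic):  # stand-in for the OE subclass
        def dump_sigs(self, dataCaches, options):
            self.write_locked_sigs()           # stand-in for writing locked-sigs.inc
            return super().dump_sigs(dataCaches, options)  # now resolves to the no-op above

        def write_locked_sigs(self):
            pass

So the -S path still reaches the OE dump_sigs() once from the main process, while the per-recipe signature dumping happens through dump_sigfn() in the worker processes.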

What we lose by removing SignatureGeneratorBasic's dump_sigs() is support for any existing logic/class that directly calls dump_sigs() on this class outside the -S option path, but I didn't find such a case after some searching in BB and OE (another reason to remove it).

In short, it is a trade-off between parallelizing the stamp dumping and keeping the writing of locked-sigs.inc in a single main process. I don't go further and parallelize the locked-sigs.inc writing in OE because I think the synchronization could be another deep hole, and that part doesn't contribute much to performance either.
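
For reference, the parallel side of that trade-off boils down to the bounded-process pattern below. This is only a standalone sketch under simplified assumptions, not the actual runqueue code; dump_one() is a placeholder for the per-recipe reparse-and-dump work:

    import os
    from multiprocessing import Process

    def dump_one(fn):
        # Placeholder: reparse 'fn' and dump the signatures of all of its tasks.
        print("dumping signatures for %s" % fn)

    def dump_all(fns, max_process):
        launched = []
        while fns:
            if len(launched) < max_process:
                p = Process(target=dump_one, args=(fns.pop(),))
                p.start()
                launched.append(p)
            # Drop finished workers so new ones can be launched.
            launched = [p for p in launched if p.is_alive()]
        for p in launched:
            p.join()

    if __name__ == "__main__":
        dump_all({"a.bb", "b.bb", "c.bb"}, os.cpu_count() or 1)

The single write of locked-sigs.inc then stays in the main process, after the workers are joined.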

I also ran the sstatetest module in oe-selftest and it passed.

Feel free to let me know about any better ideas or improvements I should make.


> 
> Cheers,
> 
> Richard
> 
> 
>> Signed-off-by: Jianxun Zhang <jianxun.zhang at linux.intel.com>
>> ---
>>  bitbake/lib/bb/runqueue.py | 32 +++++++++++++++++++++++++++-----
>>  bitbake/lib/bb/siggen.py   |  4 ++--
>>  2 files changed, 29 insertions(+), 7 deletions(-)
>> 
>> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
>> index 2ad8aad..c7d8d53 100644
>> --- a/bitbake/lib/bb/runqueue.py
>> +++ b/bitbake/lib/bb/runqueue.py
>> @@ -36,6 +36,7 @@ from bb import msg, data, event
>>  from bb import monitordisk
>>  import subprocess
>>  import pickle
>> +from multiprocessing import Process
>>  
>>  bblogger = logging.getLogger("BitBake")
>>  logger = logging.getLogger("BitBake.RunQueue")
>> @@ -1302,15 +1303,36 @@ class RunQueue:
>>          else:
>>              self.rqexe.finish()
>>  
>> +    def rq_dump_sigfn(self, fn, options):
>> +        bb_cache = bb.cache.NoCache(self.cooker.databuilder)
>> +        the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
>> +        siggen = bb.parse.siggen
>> +        dataCaches = self.rqdata.dataCaches
>> +        siggen.dump_sigfn(fn, dataCaches, options)
>> +
>>      def dump_signatures(self, options):
>> -        done = set()
>> +        fns = set()
>>          bb.note("Reparsing files to collect dependency data")
>> -        bb_cache = bb.cache.NoCache(self.cooker.databuilder)
>> +
>>          for tid in self.rqdata.runtaskentries:
>>              fn = fn_from_tid(tid)
>> -            if fn not in done:
>> -                the_data = bb_cache.loadDataFull(fn, self.cooker.collection.get_file_appends(fn))
>> -                done.add(fn)
>> +            fns.add(fn)
>> +
>> +        max_process = int(self.cfgData.getVar("BB_NUMBER_PARSE_THREADS") or os.cpu_count() or 1)
>> +        # We cannot use the real multiprocessing.Pool easily due to some local data
>> +        # that can't be pickled. This is a cheap multi-process solution.
>> +        launched = []
>> +        while fns:
>> +            if len(launched) < max_process:
>> +                p = Process(target=self.rq_dump_sigfn, args=(fns.pop(), options))
>> +                p.start()
>> +                launched.append(p)
>> +            for q in launched:
>> +                # The finished processes are joined when calling is_alive()
>> +                if not q.is_alive():
>> +                    launched.remove(q)
>> +        for p in launched:
>> +                p.join()
>>  
>>          bb.parse.siggen.dump_sigs(self.rqdata.dataCaches, options)
>>  
>> diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
>> index b20b9cf..ae50a18 100644
>> --- a/bitbake/lib/bb/siggen.py
>> +++ b/bitbake/lib/bb/siggen.py
>> @@ -307,8 +307,8 @@ class SignatureGeneratorBasic(SignatureGenerator):
>>                  pass
>>              raise err
>>  
>> -    def dump_sigs(self, dataCaches, options):
>> -        for fn in self.taskdeps:
>> +    def dump_sigfn(self, fn, dataCaches, options):
>> +        if fn in self.taskdeps:
>>              for task in self.taskdeps[fn]:
>>                  tid = fn + ":" + task
>>                  (mc, _, _) = bb.runqueue.split_tid(tid)
>> -- 
>> 2.7.4
>> 



