[OE-core] [PATCH 1/1] kill-bb: Add it for killing abnormal bitbake processes

Robert Yang liezhi.yang at windriver.com
Tue Aug 6 10:43:07 UTC 2019


Hi RP and Ross,

I think I have found the root cause. I can reproduce the problem nearly 100%
of the time during parsing:

$ kill-bb; rm -fr tmp-glibc/cache/default-glibc/qemux86/x86_64/bb_cache.dat*; bitbake -p

Press Ctrl-C *once* when the parsing process is at around 50%, and the problem
reproduces:

Keyboard Interrupt, closing down...

Timeout while waiting for a reply from the bitbake server

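It hangs at process.join(). See the section "Joining processes that use queues"
in:

https://docs.python.org/3.7/library/multiprocessing.html

A process that has put items on a queue does not terminate until those items
have been flushed to the underlying pipe, so join() can deadlock while
result_queue is not empty. Here is a draft patch to fix the problem.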

diff --git a/lib/bb/cooker.py b/lib/bb/cooker.py
index b4851e1..c11cfec 100644
--- a/lib/bb/cooker.py
+++ b/lib/bb/cooker.py
@@ -2062,6 +2062,14 @@ class CookerParser(object):
              for process in self.processes:
                  self.parser_quit.put(None)

+        # Clean up the queue before calling process.join(), otherwise there
+        # might be deadlocks.
+        while True:
+            try:
+                self.result_queue.get(timeout=0.25)
+            except queue.Empty:
+                break
+
          for process in self.processes:
              if force:
                  process.join(.1)

With this patch, I can no longer reproduce the problem. In theory we may also
need to clean up parser_quit, but I'm not sure, since I can't reproduce the
problem any more.
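
For anyone who wants to see the pattern outside bitbake, here is a minimal
standalone sketch of "drain the queue, then join". It is not bitbake code: the
names worker, drain_and_join and run_demo are made up for illustration, and it
assumes the POSIX "fork" start method. Unlike the draft patch, which stops at
the first empty poll, this variant keeps polling until the child has exited.

```python
import multiprocessing
import queue

# Assumption: POSIX "fork" start method (Linux build hosts).
ctx = multiprocessing.get_context("fork")


def worker(result_queue, count):
    # Hypothetical stand-in for a parser process: the results are large
    # enough that they cannot all fit in the pipe buffer at once.
    for _ in range(count):
        result_queue.put("x" * 65536)


def drain_and_join(proc, result_queue, poll=0.25):
    """Discard queued results until the child has exited, then join it."""
    drained = 0
    while True:
        # Sample liveness *before* polling: once the child is gone, all of
        # its data is already in the pipe, so an empty poll is final.
        alive = proc.is_alive()
        try:
            result_queue.get(timeout=poll)
            drained += 1
        except queue.Empty:
            if not alive:
                break
    proc.join()
    return drained


def run_demo(count=50):
    result_queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(result_queue, count))
    proc.start()
    # Calling proc.join() here, without draining first, is the hang
    # described above: the child waits for its queue data to be consumed.
    drained = drain_and_join(proc, result_queue)
    return drained, proc.exitcode


if __name__ == "__main__":
    print(run_demo())
```

The poll timeout of 0.25 seconds is taken from the draft patch. The liveness
check is an addition of this sketch, so that a slow child is not mistaken for
an empty queue.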

Now the output is:
Parsing recipes:  49% |#####################################################                                                    | ETA:  0:00:06
Keyboard Interrupt, closing down...

Parsing recipes: 100% |############################################################################################################| Time: 0:00:08
Parsing of 2804 .bb files complete (0 cached, 1428 parsed). 1987 targets, 1618 skipped, 0 masked, 0 errors.
Execution was interrupted, returning a non-zero exit code.


I will send out the patch after more testing.

This patch fixes the case of *one* KeyboardInterrupt. There are other problems
with two KeyboardInterrupts (a traceback), which I will try to fix as well.

// Robert

On 8/2/19 11:44 PM, Richard Purdie wrote:
> On Fri, 2019-08-02 at 11:21 +0100, Ross Burton wrote:
>> On 02/08/2019 11:24, Robert Yang wrote:
>>> There might be processes left after Ctrl-C, e.g.:
>>> $ rm -f tmp/cache/default-glibc/qemux86/x86_64/
>>> $ bitbake -p
>>>
>>> Press 'Ctrl-C' multiple times during parsing, then bitbake processes may
>>> not exit, and worse, we can't start bitbake again. We can't always
>>> reproduce this, only sometimes. We can only use "ps ux" to find the
>>> processes and kill them one by one. This tool can kill all of them easily.
>> I've noticed this, and also noticed that it got a lot worse recently.
>>
>> But let's fix bitbake instead of adding tools to work around it?
> 
> Heh. As someone who spends a lot of time trying to debug this, I must
> admit I could use such a script so I'm torn on this one!
> 
> Cheers,
> 
> Richard
> 
> 
> 

