[OE-core] Cache unihash ... doesn't match BB_UNIHASH ...

Alex Kiernan alex.kiernan at gmail.com
Sun Feb 9 07:27:23 UTC 2020


On Sun, Feb 9, 2020 at 12:23 AM chris.laplante at agilent.com
<chris.laplante at agilent.com> wrote:
>
> Hi Richard,
>
> > > > Anecdotally, we are running Zeus for nightly builds with three
> > > > multiconfigs. I cherry-picked your "bitbake: fix2" and "bitbake:
> > > > fixup" patches and haven't seen any of the BB_UNIHASH errors since.
> > > > Granted it's only been a week. But before that, hash equiv +
> > > > multiconfig was unusable due to the BB_UNIHASH errors.
> > >
> > > That is a really helpful data point, thanks. I should probably clean up
> > > those bitbake patches and get them merged then, I couldn't decide if
> > > they were right or not...
> > >
> >
> > I just picked all your pending changes out of master-next into our
> > local patch queue - will let you know how it looks when it's finished
> > cooking!
>
> There are two small issues I have observed.
>
> One is that I occasionally get a lot of nondeterministic-metadata errors when BB_CACHE_POLICY = "cache", multiconfig, and hash equiv are all enabled. The errors are all on recipes with SRCREV = "${AUTOREV}". It doesn't always happen, but it did just now when I rebased our "zeus-modified" branch onto the upstream "zeus" branch to pick up the changes starting with 7dc72fde6edeb5d6ac6b3832530998afeea67cbc.
>
> Two is that the "Initializing tasks" stage sometimes appears stuck at 44% for a couple of minutes. I traced it to this code in runqueue.py (line 1168 on zeus):
>
>         # Iterate over the task list and call into the siggen code
>         dealtwith = set()
>         todeal = set(self.runtaskentries)
>         while len(todeal) > 0:
>             for tid in todeal.copy():
>                 if len(self.runtaskentries[tid].depends - dealtwith) == 0:
>                     dealtwith.add(tid)
>                     todeal.remove(tid)
>                     self.prepare_task_hash(tid)
>
> When I instrument the loop to print the size of "todeal", I see it decrease very slowly, sometimes by only a couple of tasks at a time. I'm guessing this is because prepare_task_hash contacts the hash equiv server serially here, and I'm on my work VPN, which makes things extra slow. Is there an opportunity for batching here?
>
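A rough sketch of what batching could look like, not how BitBake's runqueue or hashserv client actually work: tasks are grouped into dependency "waves" (every task's dependencies are resolved in an earlier wave), so all taskhashes in a wave can be computed up front and sent in one round trip. The batch call `query` (its signature `{task id: taskhash} -> {task id: unihash}`), the toy taskhash formula and the task ids are all hypothetical; nothing like this is claimed to exist in zeus.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical batch API: {task id: taskhash} -> {task id: unihash} in one round trip.
BatchQuery = Callable[[Dict[str, str]], Dict[str, str]]


def waves(depends: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks so every task's dependencies lie in an earlier wave.

    Assumes every dependency is itself a key of `depends`.
    """
    remaining = {tid: set(deps) for tid, deps in depends.items()}
    done: Set[str] = set()
    result: List[List[str]] = []
    while remaining:
        ready = sorted(tid for tid, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle among: %s" % sorted(remaining))
        for tid in ready:
            del remaining[tid]
        done.update(ready)
        result.append(ready)
    return result


def prepare_hashes_batched(depends: Dict[str, Set[str]],
                           query: BatchQuery) -> Tuple[Dict[str, str], int]:
    """Resolve unihashes one wave at a time; returns (unihashes, round trips)."""
    unihash: Dict[str, str] = {}
    round_trips = 0
    for wave in waves(depends):
        # Toy taskhash: the task id plus the already-resolved unihashes of its
        # dependencies (a stand-in for real signature generation).
        taskhashes = {
            tid: tid + "|" + ",".join(sorted(unihash[d] for d in depends[tid]))
            for tid in wave
        }
        unihash.update(query(taskhashes))  # one round trip for the whole wave
        round_trips += 1
    return unihash, round_trips


if __name__ == "__main__":
    deps = {
        "a:do_fetch": set(),
        "a:do_compile": {"a:do_fetch"},
        "b:do_fetch": set(),
        "b:do_compile": {"b:do_fetch", "a:do_compile"},
    }
    # Fake server with no equivalences: every unihash equals its taskhash.
    hashes, trips = prepare_hashes_batched(deps, lambda th: dict(th))
    print(trips, "round trips for", len(deps), "tasks")  # 3 round trips for 4 tasks
```

A serial loop makes one round trip per task (4 here); the wave approach makes one per dependency level (3 here), which matters most on a high-latency link such as a VPN.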

I've a new failure:

00:20:59.829  Traceback (most recent call last):
00:20:59.829    File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/server/process.py", line 278, in ProcessServer.idle_commands(delay=0.1, fds=[<socket.socket fd=6, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>, <socket.socket fd=18, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>, <bb.server.process.ConnectionReader object at 0x7f831b7adb70>]):
00:20:59.829                   try:
00:20:59.829      >                retval = function(self, data, False)
00:20:59.829                       if retval is False:
00:20:59.829    File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/cooker.py", line 1434, in buildTargetsIdle(server=<ProcessServer(ProcessServer-1, started)>, rq=<bb.runqueue.RunQueue object at 0x7f82f5112f98>, abort=False):
00:20:59.829                   try:
00:20:59.829      >                retval = rq.execute_runqueue()
00:20:59.829                   except runqueue.TaskFailure as exc:
00:20:59.829    File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1522, in RunQueue.execute_runqueue():
00:20:59.829               try:
00:20:59.829      >            return self._execute_runqueue()
00:20:59.829               except bb.runqueue.TaskFailure:
00:20:59.829    File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1488, in RunQueue._execute_runqueue():
00:20:59.829               if self.state is runQueueRunning:
00:20:59.829      >            retval = self.rqexe.execute()
00:20:59.829
00:20:59.829    File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1997, in RunQueueExecute.execute():
00:20:59.829                               else:
00:20:59.829      >                            self.sqdata.outrightfail.remove(nexttask)
00:20:59.829                           if nexttask in self.sqdata.outrightfail:

Just testing locally with:

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 71108eeed752..a94a9bb27ae2 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1994,7 +1994,7 @@ class RunQueueExecute:
                             self.sq_task_failoutright(nexttask)
                             return True
                         else:
-                            self.sqdata.outrightfail.remove(nexttask)
+                            self.sqdata.outrightfail.discard(nexttask)
                     if nexttask in self.sqdata.outrightfail:
                         logger.debug(2, 'No package found, so skipping setscene task %s', nexttask)
                         self.sq_task_failoutright(nexttask)
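A minimal plain-Python illustration of what the one-line change does, using made-up task ids rather than real runqueue state: set.remove() raises KeyError when the element is absent (presumably what the truncated traceback above ended with), while set.discard() silently does nothing. It does not explain why nexttask was already missing from outrightfail at that point.

```python
def remove_strict(tasks: set, tid: str) -> bool:
    """Old code path: set.remove() raises KeyError when tid is absent."""
    try:
        tasks.remove(tid)
    except KeyError:
        return False
    return True


def remove_lenient(tasks: set, tid: str) -> None:
    """Patched code path: set.discard() is a no-op when tid is absent."""
    tasks.discard(tid)


if __name__ == "__main__":
    outrightfail = {"a:do_package_setscene"}  # made-up task id
    print(remove_strict(outrightfail, "b:do_package_setscene"))  # False: KeyError caught
    remove_lenient(outrightfail, "b:do_package_setscene")        # absent: no exception
    remove_lenient(outrightfail, "a:do_package_setscene")        # present: removed
    print(outrightfail)                                          # set()
```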



-- 
Alex Kiernan

