[OE-core] Cache unihash ... doesn't match BB_UNIHASH ...
Alex Kiernan
alex.kiernan at gmail.com
Sun Feb 9 07:27:23 UTC 2020
On Sun, Feb 9, 2020 at 12:23 AM chris.laplante at agilent.com
<chris.laplante at agilent.com> wrote:
>
> Hi Richard,
>
> > > > Anecdotally, we are running Zeus for nightly builds with three
> > > > multiconfigs. I cherry-picked your "bitbake: fix2" and "bitbake:
> > > > fixup" patches and haven't seen any of the BB_UNIHASH errors since.
> > > > Granted it's only been a week. But before that, hash equiv +
> > > > multiconfig was unusable due to the BB_UNIHASH errors.
> > >
> > > That is a really helpful data point, thanks. I should probably clean up
> > > those bitbake patches and get them merged then, I couldn't decide if
> > > they were right or not...
> > >
> >
> > I just picked all your pending changes out of master-next into our
> > local patch queue - will let you know how it looks when it's finished
> > cooking!
>
> There are two small issues I have observed.
>
> One is that I occasionally get a lot of non-deterministic metadata errors when BB_CACHE_POLICY = "cache", multiconfig, and hash equiv are all enabled. The errors are all on recipes for which SRCREV = "${AUTOREV}". It doesn't always happen, but it did just now when I rebased our "zeus-modified" branch onto the upstream "zeus" branch to get the changes starting with 7dc72fde6edeb5d6ac6b3832530998afeea67cbc.
>
> Two is that the "Initializing tasks" stage sometimes appears stuck at 44% for a couple of minutes. I traced it down to this code in runqueue.py (line 1168 on zeus):
>
>     # Iterate over the task list and call into the siggen code
>     dealtwith = set()
>     todeal = set(self.runtaskentries)
>     while len(todeal) > 0:
>         for tid in todeal.copy():
>             if len(self.runtaskentries[tid].depends - dealtwith) == 0:
>                 dealtwith.add(tid)
>                 todeal.remove(tid)
>                 self.prepare_task_hash(tid)
>
> When I instrument the loop to print out the size of "todeal", I see it decrease very slowly, sometimes only a couple at a time. I'm guessing this is because prepare_task_hash is contacting the hash equiv server, in a serial manner here. I'm over my work VPN which makes things extra slow. Is there an opportunity for batching here?
>
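To illustrate the batching question, the quoted loop can be restructured so that each pass first collects every task whose dependencies are already dealt with, then hands that whole "wave" to a single call. This is only a sketch under stated assumptions: `hash_in_waves` and the `prepare_batch` callback are hypothetical names, not bitbake APIs, and whether the hash equivalence server can answer a batched query is not established here.

```python
def hash_in_waves(depends, prepare_batch):
    """Process tasks in dependency order, one wave of ready tasks at a time.

    depends:       dict mapping task id -> set of task ids it depends on
    prepare_batch: hypothetical callable that receives a list of ready task
                   ids (a stand-in for one batched hash lookup)
    Returns the list of waves in the order they were processed.
    """
    dealtwith = set()
    todeal = set(depends)
    waves = []
    while todeal:
        # A task is ready once all of its dependencies have been dealt with.
        ready = sorted(tid for tid in todeal if not (depends[tid] - dealtwith))
        if not ready:
            # Nothing is ready but tasks remain: the graph has a cycle.
            raise ValueError("dependency cycle among: %s" % sorted(todeal))
        prepare_batch(ready)  # one call per wave instead of one per task
        dealtwith.update(ready)
        todeal.difference_update(ready)
        waves.append(ready)
    return waves


if __name__ == "__main__":
    # Made-up task ids, purely for illustration.
    deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    hash_in_waves(deps, lambda batch: print("batch:", batch))
```

Unlike the quoted loop, which lets a task become ready part-way through a pass, this version only advances between waves, so the number of calls equals the depth of the dependency graph rather than the number of tasks.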
I've a new failure:
00:20:59.829 Traceback (most recent call last):
00:20:59.829 File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/server/process.py", line 278, in ProcessServer.idle_commands(delay=0.1, fds=[<socket.socket fd=6, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>, <socket.socket fd=18, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>, <bb.server.process.ConnectionReader object at 0x7f831b7adb70>]):
00:20:59.829 try:
00:20:59.829 > retval = function(self, data, False)
00:20:59.829 if retval is False:
00:20:59.829 File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/cooker.py", line 1434, in buildTargetsIdle(server=<ProcessServer(ProcessServer-1, started)>, rq=<bb.runqueue.RunQueue object at 0x7f82f5112f98>, abort=False):
00:20:59.829 try:
00:20:59.829 > retval = rq.execute_runqueue()
00:20:59.829 except runqueue.TaskFailure as exc:
00:20:59.829 File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1522, in RunQueue.execute_runqueue():
00:20:59.829 try:
00:20:59.829 > return self._execute_runqueue()
00:20:59.829 except bb.runqueue.TaskFailure:
00:20:59.829 File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1488, in RunQueue._execute_runqueue():
00:20:59.829 if self.state is runQueueRunning:
00:20:59.829 > retval = self.rqexe.execute()
00:20:59.829
00:20:59.829 File "/var/lib/jenkins/workspace/nanohub_master/poky/bitbake/lib/bb/runqueue.py", line 1997, in RunQueueExecute.execute():
00:20:59.829 else:
00:20:59.829 > self.sqdata.outrightfail.remove(nexttask)
00:20:59.829 if nexttask in self.sqdata.outrightfail:
Just testing locally with:
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 71108eeed752..a94a9bb27ae2 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1994,7 +1994,7 @@ class RunQueueExecute:
                             self.sq_task_failoutright(nexttask)
                             return True
                         else:
-                            self.sqdata.outrightfail.remove(nexttask)
+                            self.sqdata.outrightfail.discard(nexttask)
                     if nexttask in self.sqdata.outrightfail:
                         logger.debug(2, 'No package found, so skipping setscene task %s', nexttask)
                         self.sq_task_failoutright(nexttask)
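
For reference, the one-line change relies on a standard difference between two Python set methods: `remove()` raises KeyError when the element is absent, while `discard()` silently does nothing. The traceback above is cut off before the exception line, so that KeyError was the actual failure is an assumption. A minimal standalone sketch (the helper names and task ids are made up for illustration and are not bitbake code):

```python
def drop_strict(tasks, tid):
    """Like the original line: set.remove() raises KeyError if tid is absent."""
    tasks.remove(tid)


def drop_tolerant(tasks, tid):
    """Like the patched line: set.discard() is a no-op if tid is absent."""
    tasks.discard(tid)


# Hypothetical task ids, standing in for entries of sqdata.outrightfail.
outrightfail = {"recipe-a:do_populate_sysroot_setscene"}

# The task is not in the set: discard() ignores it and leaves the set unchanged.
drop_tolerant(outrightfail, "recipe-b:do_package_setscene")

# The same call with remove() raises KeyError.
try:
    drop_strict(outrightfail, "recipe-b:do_package_setscene")
except KeyError as exc:
    print("remove() raised KeyError for", exc)
```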
--
Alex Kiernan