[OE-core] [PATCH RFC] module.bbclass: Fix potential do_compile/do_make_scripts race condition

Mike Crowe mac at mcrowe.com
Mon May 8 10:29:03 UTC 2017


On Tuesday 15 December 2015 at 14:04:34 +0000, Paul Barker wrote:
> On Sun, 6 Dec 2015 11:26:33 +0000
> Paul Barker <paul.barker at commagility.com> wrote:
> 
> > I ran into a race condition building multiple external modules against a 3.10.y
> > series kernel using the dylan branch of OpenEmbedded. This is difficult to
> > reproduce as it requires very specific timing: the do_make_scripts task for one
> > module was linking the modpost script whilst the do_compile task for another
> > module was attempting to use the modpost script. This resulted in a permission
> > error:
> > 
> > ERROR: Function failed: do_compile (see /home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/work/amc_d24a4-oe-linux-gnueabi/ti-hplib-mod/01.01.00.04-r3d/temp/log.do_compile.25434 for further information)
> > ERROR: Logfile of failure stored in: /home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/work/amc_d24a4-oe-linux-gnueabi/ti-hplib-mod/01.01.00.04-r3d/temp/log.do_compile.25434
> > Log data follows:
> > | DEBUG: Executing shell function do_compile
> > | make -C /home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/sysroots/amc-d24a4/usr/src/kernel M=$PWD clean
> > | make[1]: Entering directory `/home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/sysroots/amc-d24a4/usr/src/kernel'
> > | make[1]: Leaving directory `/home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/sysroots/amc-d24a4/usr/src/kernel'
> > | make -C /home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/sysroots/amc-d24a4/usr/src/kernel M=$PWD modules
> > | make[1]: Entering directory `/home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/sysroots/amc-d24a4/usr/src/kernel'
> > |   CC [M]  /home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/work/amc_d24a4-oe-linux-gnueabi/ti-hplib-mod/01.01.00.04-r3d/git/ti/runtime/hplib/module/hplibmod.o
> > |   Building modules, stage 2.
> > |   MODPOST 1 modules
> > | /bin/sh: scripts/mod/modpost: Permission denied
> > | make[2]: *** [__modpost] Error 126
> > | make[1]: *** [modules] Error 2
> > | make[1]: Leaving directory `/home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/sysroots/amc-d24a4/usr/src/kernel'
> > | make: *** [default] Error 2
> > | ERROR: Function failed: do_compile (see /home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/build/tmp/work/amc_d24a4-oe-linux-gnueabi/ti-hplib-mod/01.01.00.04-r3d/temp/log.do_compile.25434 for further information)
> > ERROR: Task 1284 (/home/COMMAGILITY/paul.barker/linux-bsp/work-1.0.1/ca-linux-bsp/meta-mcsdk/meta-arago-extras/recipes-bsp/ti-hplib/ti-hplib-mod_git.bb, do_compile) failed with exit code '1'
> > 
> > Later kernel versions do not rebuild the modpost script every time that 'make
> > scripts' is invoked so they should be safe from this particular failure. However
> > I'm not convinced that running 'make scripts' whilst also building an
> > out-of-tree module is always safe on later kernels and there is always the
> > potential for vendor kernels to have different behaviour here.
> > 
> > Although this was seen on the dylan branch the behaviour of master and jethro
> > looks to be the same here - do_make_scripts is locked so that only one instance
> > of it may run at one time but there is nothing to prevent one instance of
> > do_make_scripts running at the same time as an instance of do_compile.
> > 
> > The patch I'm sending attempts to solve this issue by locking the do_compile
> > task with the same lockfile as the do_make_scripts task in module.bbclass so
> > that an instance of do_compile can't run at the same time as an instance of
> > do_make_scripts. I don't know enough about the task locking to guarantee that
> > this is the right solution or to be able to test that it works as expected so
> > I'm marking the patch as an RFC.
> > 
> > Please let me know if this is the right approach and if there is any easy way to
> > test this.
> > 
> > Paul Barker (1):
> >   module.bbclass: Fix potential do_compile/do_make_scripts race
> >     condition
> > 
> >  meta/classes/module.bbclass | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> 
> ping on this.
> 
> I've just got bitten by this again so it's not a one-off. Is anyone able to
> give me some feedback on the patch, whether this is the right approach to fix
> the problem and whether this is applicable to jethro/master?

We've started seeing the same symptom, but with a v3.14 kernel. We have
several recipes that build out-of-tree modules and I can see
do_make_scripts for one running at the same time as do_compile for the one
that fails.

If I try to reproduce the problem by hand, I cannot. However, I only see
modpost being compiled for one of the tasks in the logs.

I can't really explain why I see the problem with a newer kernel.
Regardless, it seems unwise to even attempt to run do_make_scripts and
do_compile in parallel.

It looks like this patch was reviewed favourably, but it doesn't seem to have
made it into master.

In the meantime, I'll try this patch and see if it makes the problem go
away for us.
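
For anyone who wants to try the same thing locally, my understanding is that
the approach described above amounts to something like the fragment below in
module.bbclass. This is a sketch based on the description in the thread, not a
copy of the patch itself. The lock path shown is an assumption and needs to
match whatever do_make_scripts[lockfiles] is set to in your tree.

```bitbake
# Sketch only: make do_compile take the same lock as do_make_scripts so that
# BitBake will not run the two tasks at the same time.
# Assumption: do_make_scripts[lockfiles] points at this same path.
do_compile[lockfiles] = "${TMPDIR}/kernel-scripts.lock"
```

The obvious cost is that do_compile for every recipe inheriting the class
would then be serialised against every other holder of that lock, not just
against do_make_scripts.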

Thanks.

Mike.

Original patch at https://patchwork.openembedded.org/patch/109269/ and
thread at
http://lists.openembedded.org/pipermail/openembedded-core/2015-December/113752.html
for those without long-term email archives.


