[OE-core] Build failure with parallel build and opkg

Stefan Agner stefan at agner.ch
Tue Oct 2 13:12:55 UTC 2018


On 02.10.2018 10:46, Stefan Agner wrote:
> On 26.09.2018 11:34, Stefan Agner wrote:
>> Hi,
>>
>> On 12.09.2018 00:49, Stefan Agner wrote:
> <snip>
>> We discussed the issue at Linaro Connect a bit.
>>
>> To recap, we do build in two steps:
>>
>> 1. bitbake full-container-image
>> 2. bitbake -c populate_sdk full-container-image
>>
>> The issue always happens in the second step.
>>
>> We also see that in the second step, the do_package_write_ipk_setscene
>> task for every recipe is executed.
>>
>> The current assumption is
>>
>> I tried to reproduce by building a recipe using openembedded-core master
>> only in two build directories with shared sstate manually:
>>
>> 1. build1 $ bitbake eudev
>> 2. build2 $ bitbake -c cleansstate eudev
>> 3. build2 $ bitbake eudev
>> 4. build1 $ bitbake core-image-minimal
>>
>> This sequence seems not to have triggered a
>> do_package_write_ipk_setscene for eudev.
>>
>> I then tried
>> 5. build1 $ bitbake -c populate_sdk core-image-minimal
>>
>> Which did trigger a do_package_write_ipk_setscene. However, the issue
>> did not appear...
>>
>> I even tried to rebuild and replace the file manually, and run bitbake
>> -c populate_sdk -f core-image-minimal, but it just seems not to appear.
>>
>> Last time I have seen it was with oe-core
>> f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version
>> again to see whether that helps reproducing the issue.
> 
> Using the older OE version did not make a difference.
> 
> 
> So Max and I discussed a bit further. We realized that when OE rebuilds,
> the opkg package index is refreshed for a package only if the mtime (in
> seconds) is different between the previous file and the file on disk
> currently (see opkg-make-index). Can it be that two simultaneously
> started builds create two ipk with same mtime?
> 
> So I built two core-image-minimal builds (on the same machine) at the
> *very* same time. First try did not cause a collision, but already on
> the second try I managed to reproduce multiple collision. This output
> shows all the packages with the same unix mtime stamp according to the
> Packages.stamps file (generated by opkg-make-index):
> 
> $ comm -1 -2 <(sort build1/tmp-glibc/deploy/ipk/i586/Packages.stamps) <(sort build2/tmp-glibc/deploy/ipk/i586/Packages.stamps)
> kages.stamps)                                                                                                                               
> 1538424250 glibc-binary-localedata-de-it_2.27-r0_i586.ipk                                                                                   
> 1538425082 libreadline-staticdev_7.0-r0_i586.ipk                                                                                            
> 1538425162 bash-bashbug_4.4.18-r0_i586.ipk                                                                                                  
> 1538425162 bash-completion_2.8-r0_i586.ipk                                                                                                  
> 1538425162 bash-completion-dbg_2.8-r0_i586.ipk                                                                                              
> 1538425162 bash-completion-dev_2.8-r0_i586.ipk                                                                                              
> 1538425162 bash-loadable_4.4.18-r0_i586.ipk                                                                                                 
> 1538425163 bash_4.4.18-r0_i586.ipk                                                                                                          
> 1538425163 bash-completion-extra_2.8-r0_i586.ipk
> 1538425164 bash-doc_4.4.18-r0_i586.ipk
> 1538425167 bash-dbg_4.4.18-r0_i586.ipk
> 1538425233 util-linux-locale-hr_2.32-r0_i586.ipk
> 1538425233 util-linux-locale-id_2.32-r0_i586.ipk
> 1538425233 util-linux-locale-it_2.32-r0_i586.ipk
> 1538425233 util-linux-locale-sl_2.32-r0_i586.ipk
> 1538425233 util-linux-locale-zh-tw_2.32-r0_i586.ipk
> 1538425234 util-linux-dev_2.32-r0_i586.ipk
> 1538425234 util-linux-findfs_2.32-r0_i586.ipk
> 1538425234 util-linux-ionice_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-ca_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-pl_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-pt-br_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-ru_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-sv_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-tr_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-vi_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-zh-cn_2.32-r0_i586.ipk
> 1538425234 util-linux-mountpoint_2.32-r0_i586.ipk
> 1538425235 util-linux-locale-fr_2.32-r0_i586.ipk
> 1538425236 util-linux-blkid_2.32-r0_i586.ipk
> 1538425236 util-linux-fsck_2.32-r0_i586.ipk
> 1538425236 util-linux-fsck.cramfs_2.32-r0_i586.ipk
> 1538425236 util-linux-fstrim_2.32-r0_i586.ipk
> 1538425236 util-linux-lscpu_2.32-r0_i586.ipk
> 1538425236 util-linux-mkfs_2.32-r0_i586.ipk
> 1538425236 util-linux-mount_2.32-r0_i586.ipk
> 1538425236 util-linux-partx_2.32-r0_i586.ipk
> 1538425236 util-linux-readprofile_2.32-r0_i586.ipk
> 1538425236 util-linux-umount_2.32-r0_i586.ipk
> 1538425237 libblkid1_2.32-r0_i586.ipk
> 1538425237 libfdisk1_2.32-r0_i586.ipk
> 1538425237 libmount1_2.32-r0_i586.ipk
> 1538425238 util-linux_2.32-r0_i586.ipk
> 1538425240 util-linux-staticdev_2.32-r0_i586.ipk
> 
> Indeed, the two ipk files have the same mtime (at least, on second level):
> 
> $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136612          Blocks: 272        IO Block: 4096   regular file
> Device: fd00h/64768d    Inode: 2627769381  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-01 22:28:37.464716076 +0200
> Modify: 2018-10-01 22:20:37.930653796 +0200
> Change: 2018-10-01 22:20:50.998205339 +0200
>  Birth: -
>   File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136692          Blocks: 272        IO Block: 4096   regular file
> Device: fd00h/64768d    Inode: 2608876057  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-01 22:28:47.145119532 +0200
> Modify: 2018-10-01 22:20:37.358629653 +0200
> Change: 2018-10-01 22:20:50.727193901 +0200
>  Birth: -
> 
> But different md5sums!
> 
> $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> 91608ae2b109e3f4af0379c634fad9c0  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> 
> In that case, libblkid1_2.32-r0_i586.ipk of build1 managed to make it into the sstate.
> 
> On our production system, the SDK builds called later always fetch the packages
> from sstate again (do_package_write_ipk_setscene). For some reason that did not
> happen in my synthetic test. However, I simulated the fetch from sstate by
> removing libblkid1_2.32-r0_i586.ipk (part of util-linux) from build2:
> 
>   build2 $ bitbake -c clean util-linux
> 
> And rebuild the image:
> 
>   build2 $ bitbake -f -c populate_sdk core-image-minimal
> 

Note that after this command, the packages in the feed looks like this:

$ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  Size: 136612          Blocks: 272        IO Block: 4096   regular file                                                                   
Device: fd00h/64768d    Inode: 2627769381  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)                                                                   
Context: unconfined_u:object_r:home_root_t:s0
Access: 2018-10-01 22:28:37.464716076 +0200
Modify: 2018-10-01 22:20:37.930653796 +0200
Change: 2018-10-01 22:20:50.998205339 +0200
 Birth: -
  File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  Size: 136612          Blocks: 272        IO Block: 4096   regular file                                                                   
Device: fd00h/64768d    Inode: 2315301423  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)                                                                   
Context: unconfined_u:object_r:home_root_t:s0
Access: 2018-10-02 09:27:07.750888458 +0200
Modify: 2018-10-01 22:20:37.000000000 +0200
Change: 2018-10-02 09:26:47.308004397 +0200
 Birth: -

Note that the package in build2 the mtime has no nanosecond accuracy. Presumably
the tarball from sstate does not preserve sub second timestamps.

$ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
bb8ffd4443e84fede943ccc3292dc657  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk

Will send out a patch which uses inodes instead of mtime in opkg-make-index.

--
Stefan


> Which indeed showed the issues we are seeing in production:
> ...
> Downloading file:/home/ags/oe-core/build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk.
> Removing corrupt package file /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/sdk/image/usr/local/oecore-x86_64/sysroots/i586-oe-linux//var/cache/opkg/volatile/edad620216e689cf8778354f4ac2ff7e_libblkid1_2.32-r0_i586.ipk.
> ...
> Collected errors:                                                                                                                           
>  * opkg_install_pkg: Failed to download libblkid1. Perhaps you need to run 'opkg update'?
> ...
> ERROR: core-image-minimal-1.0-r0 do_populate_sdk: Function failed: do_populate_sdk                                                         
> ERROR: Logfile of failure stored in: /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/temp/log.do_populate_sdk.49541
> ERROR: Task (/home/ags/oe-core/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk) failed with exit code '1'
> 
> 
> It really seems that using mtime only in opkg-make-index is not safe enough
> in our setup. We probably should use the inode (or using the inode in addition
> to the time stamp).
> 
> Is it guaranteed that the inode changes when ipk are fetched from sstate?
> 
> --
> Stefan
> 
> 
>>
>>>
>>> I guess there is some race/asynchronous operation going on around
>>> building index/getting package from sstate/pushing package to sstate.
>>>
>>> It seems an issue others have seen in the past too:
>>> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25
>>>
>>> Any idea?
>>>
>>> --
>>> Stefan



More information about the Openembedded-core mailing list