[OE-core] Build failure with parallel build and opkg

Khem Raj raj.khem at gmail.com
Tue Oct 2 19:03:30 UTC 2018


On Tue, Oct 2, 2018 at 6:13 AM Stefan Agner <stefan at agner.ch> wrote:
>
> On 02.10.2018 10:46, Stefan Agner wrote:
> > On 26.09.2018 11:34, Stefan Agner wrote:
> >> Hi,
> >>
> >> On 12.09.2018 00:49, Stefan Agner wrote:
> > <snip>
> >> We discussed the issue at Linaro Connect a bit.
> >>
> >> To recap, we do build in two steps:
> >>
> >> 1. bitbake full-container-image
> >> 2. bitbake -c populate_sdk full-container-image
> >>
> >> The issue always happens in the second step.
> >>
> >> We also see that in the second step, the do_package_write_ipk_setscene
> >> task for every recipe is executed.
> >>
> >> The current assumption is
> >>
> >> I tried to reproduce by building a recipe using openembedded-core master
> >> only in two build directories with shared sstate manually:
> >>
> >> 1. build1 $ bitbake eudev
> >> 2. build2 $ bitbake -c cleansstate eudev
> >> 3. build2 $ bitbake eudev
> >> 4. build1 $ bitbake core-image-minimal
> >>
> >> This sequence seems not to have triggered a
> >> do_package_write_ipk_setscene for eudev.
> >>
> >> I then tried
> >> 5. build1 $ bitbake -c populate_sdk core-image-minimal
> >>
> >> Which did trigger a do_package_write_ipk_setscene. However, the issue
> >> did not appear...
> >>
> >> I even tried to rebuild and replace the file manually, and run bitbake
> >> -c populate_sdk -f core-image-minimal, but it just seems not to appear.
> >>
> >> Last time I have seen it was with oe-core
> >> f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version
> >> again to see whether that helps reproducing the issue.
> >
> > Using the older OE version did not make a difference.
> >
> >
> > So Max and I discussed a bit further. We realized that when OE rebuilds,
> > the opkg package index is refreshed for a package only if the mtime (in
> > seconds) is different between the previous file and the file on disk
> > currently (see opkg-make-index). Can it be that two simultaneously
> > started builds create two ipk with same mtime?
> >
> > So I built two core-image-minimal builds (on the same machine) at the
> > *very* same time. First try did not cause a collision, but already on
> > the second try I managed to reproduce multiple collision. This output
> > shows all the packages with the same unix mtime stamp according to the
> > Packages.stamps file (generated by opkg-make-index):
> >
> > $ comm -1 -2 <(sort build1/tmp-glibc/deploy/ipk/i586/Packages.stamps) <(sort build2/tmp-glibc/deploy/ipk/i586/Packages.stamps)
> > kages.stamps)
> > 1538424250 glibc-binary-localedata-de-it_2.27-r0_i586.ipk
> > 1538425082 libreadline-staticdev_7.0-r0_i586.ipk
> > 1538425162 bash-bashbug_4.4.18-r0_i586.ipk
> > 1538425162 bash-completion_2.8-r0_i586.ipk
> > 1538425162 bash-completion-dbg_2.8-r0_i586.ipk
> > 1538425162 bash-completion-dev_2.8-r0_i586.ipk
> > 1538425162 bash-loadable_4.4.18-r0_i586.ipk
> > 1538425163 bash_4.4.18-r0_i586.ipk
> > 1538425163 bash-completion-extra_2.8-r0_i586.ipk
> > 1538425164 bash-doc_4.4.18-r0_i586.ipk
> > 1538425167 bash-dbg_4.4.18-r0_i586.ipk
> > 1538425233 util-linux-locale-hr_2.32-r0_i586.ipk
> > 1538425233 util-linux-locale-id_2.32-r0_i586.ipk
> > 1538425233 util-linux-locale-it_2.32-r0_i586.ipk
> > 1538425233 util-linux-locale-sl_2.32-r0_i586.ipk
> > 1538425233 util-linux-locale-zh-tw_2.32-r0_i586.ipk
> > 1538425234 util-linux-dev_2.32-r0_i586.ipk
> > 1538425234 util-linux-findfs_2.32-r0_i586.ipk
> > 1538425234 util-linux-ionice_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-ca_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-pl_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-pt-br_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-ru_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-sv_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-tr_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-vi_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-zh-cn_2.32-r0_i586.ipk
> > 1538425234 util-linux-mountpoint_2.32-r0_i586.ipk
> > 1538425235 util-linux-locale-fr_2.32-r0_i586.ipk
> > 1538425236 util-linux-blkid_2.32-r0_i586.ipk
> > 1538425236 util-linux-fsck_2.32-r0_i586.ipk
> > 1538425236 util-linux-fsck.cramfs_2.32-r0_i586.ipk
> > 1538425236 util-linux-fstrim_2.32-r0_i586.ipk
> > 1538425236 util-linux-lscpu_2.32-r0_i586.ipk
> > 1538425236 util-linux-mkfs_2.32-r0_i586.ipk
> > 1538425236 util-linux-mount_2.32-r0_i586.ipk
> > 1538425236 util-linux-partx_2.32-r0_i586.ipk
> > 1538425236 util-linux-readprofile_2.32-r0_i586.ipk
> > 1538425236 util-linux-umount_2.32-r0_i586.ipk
> > 1538425237 libblkid1_2.32-r0_i586.ipk
> > 1538425237 libfdisk1_2.32-r0_i586.ipk
> > 1538425237 libmount1_2.32-r0_i586.ipk
> > 1538425238 util-linux_2.32-r0_i586.ipk
> > 1538425240 util-linux-staticdev_2.32-r0_i586.ipk
> >
> > Indeed, the two ipk files have the same mtime (at least, on second level):
> >
> > $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> >   File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> >   Size: 136612          Blocks: 272        IO Block: 4096   regular file
> > Device: fd00h/64768d    Inode: 2627769381  Links: 2
> > Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> > Context: unconfined_u:object_r:home_root_t:s0
> > Access: 2018-10-01 22:28:37.464716076 +0200
> > Modify: 2018-10-01 22:20:37.930653796 +0200
> > Change: 2018-10-01 22:20:50.998205339 +0200
> >  Birth: -
> >   File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> >   Size: 136692          Blocks: 272        IO Block: 4096   regular file
> > Device: fd00h/64768d    Inode: 2608876057  Links: 2
> > Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> > Context: unconfined_u:object_r:home_root_t:s0
> > Access: 2018-10-01 22:28:47.145119532 +0200
> > Modify: 2018-10-01 22:20:37.358629653 +0200
> > Change: 2018-10-01 22:20:50.727193901 +0200
> >  Birth: -
> >
> > But different md5sums!
> >
> > $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> > bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> > 91608ae2b109e3f4af0379c634fad9c0  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> >
> > In that case, libblkid1_2.32-r0_i586.ipk of build1 managed to make it into the sstate.
> >
> > On our production system, the SDK builds called later always fetch the packages
> > from sstate again (do_package_write_ipk_setscene). For some reason that did not
> > happen in my synthetic test. However, I simulated the fetch from sstate by
> > removing libblkid1_2.32-r0_i586.ipk (part of util-linux) from build2:
> >
> >   build2 $ bitbake -c clean util-linux
> >
> > And rebuild the image:
> >
> >   build2 $ bitbake -f -c populate_sdk core-image-minimal
> >
>
> Note that after this command, the packages in the feed looks like this:
>
> $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136612          Blocks: 272        IO Block: 4096   regular file
> Device: fd00h/64768d    Inode: 2627769381  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-01 22:28:37.464716076 +0200
> Modify: 2018-10-01 22:20:37.930653796 +0200
> Change: 2018-10-01 22:20:50.998205339 +0200
>  Birth: -
>   File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136612          Blocks: 272        IO Block: 4096   regular file
> Device: fd00h/64768d    Inode: 2315301423  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-02 09:27:07.750888458 +0200
> Modify: 2018-10-01 22:20:37.000000000 +0200
> Change: 2018-10-02 09:26:47.308004397 +0200
>  Birth: -
>
> Note that the package in build2 the mtime has no nanosecond accuracy. Presumably
> the tarball from sstate does not preserve sub second timestamps.
>
> $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> bb8ffd4443e84fede943ccc3292dc657  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>
> Will send out a patch which uses inodes instead of mtime in opkg-make-index.
>

Thats a good find Stefan, How about using ctime instead of mtime.

> --
> Stefan
>
>
> > Which indeed showed the issues we are seeing in production:
> > ...
> > Downloading file:/home/ags/oe-core/build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk.
> > Removing corrupt package file /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/sdk/image/usr/local/oecore-x86_64/sysroots/i586-oe-linux//var/cache/opkg/volatile/edad620216e689cf8778354f4ac2ff7e_libblkid1_2.32-r0_i586.ipk.
> > ...
> > Collected errors:
> >  * opkg_install_pkg: Failed to download libblkid1. Perhaps you need to run 'opkg update'?
> > ...
> > ERROR: core-image-minimal-1.0-r0 do_populate_sdk: Function failed: do_populate_sdk
> > ERROR: Logfile of failure stored in: /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/temp/log.do_populate_sdk.49541
> > ERROR: Task (/home/ags/oe-core/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk) failed with exit code '1'
> >
> >
> > It really seems that using mtime only in opkg-make-index is not safe enough
> > in our setup. We probably should use the inode (or using the inode in addition
> > to the time stamp).
> >
> > Is it guaranteed that the inode changes when ipk are fetched from sstate?
> >
> > --
> > Stefan
> >
> >
> >>
> >>>
> >>> I guess there is some race/asynchronous operation going on around
> >>> building index/getting package from sstate/pushing package to sstate.
> >>>
> >>> It seems an issue others have seen in the past too:
> >>> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25
> >>>
> >>> Any idea?
> >>>
> >>> --
> >>> Stefan
> --
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core at lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core



More information about the Openembedded-core mailing list