[OE-core] [RFC][PATCH 2/2] buildhistory: support generating md5sum of files

Jacob Kroon jacob.kroon at gmail.com
Mon Jan 7 09:38:10 UTC 2019


Hi André,

On Mon, Jan 7, 2019 at 12:09 AM André Draszik <git at andred.net> wrote:
>
> Hi,
>
> On Sun, 2019-01-06 at 19:13 +0100, Jacob Kroon wrote:
> > Introduce 'md5' in BUILDHISTORY_FEATURES and enable it by default
> > when doing reproducible builds.
> >
> > When enabled this will additionally create:
> >
> >   files-in-package-md5.txt
> >   files-in-image-md5.txt
> >   files-in-sdk-md5.txt
> >
> > containing the md5 checksums of regular files.
> >
> > Signed-off-by: Jacob Kroon <jacob.kroon at gmail.com>
> > ---
> >  meta/classes/buildhistory.bbclass | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/meta/classes/buildhistory.bbclass
> > b/meta/classes/buildhistory.bbclass
> > index 33eb1b00f6..00f0701dec 100644
> > --- a/meta/classes/buildhistory.bbclass
> > +++ b/meta/classes/buildhistory.bbclass
> > @@ -7,7 +7,8 @@
> >  # Copyright (C) 2007-2011 Koen Kooi <koen at openembedded.org>
> >  #
> >
> > -BUILDHISTORY_FEATURES ?= "image package sdk"
> > +BUILDHISTORY_FEATURES ?= "image package sdk \
> > +  ${@ "md5" if
> > bb.utils.to_boolean(d.getVar('BUILD_REPRODUCIBLE_BINARIES')) else ""}"
> >  BUILDHISTORY_DIR ?= "${TOPDIR}/buildhistory"
> >  BUILDHISTORY_DIR_IMAGE =
> > "${BUILDHISTORY_DIR}/images/${MACHINE_ARCH}/${TCLIBC}/${IMAGE_BASENAME}"
> >  BUILDHISTORY_DIR_PACKAGE =
> > "${BUILDHISTORY_DIR}/packages/${MULTIMACH_TARGET_SYS}/${PN}"
> > @@ -526,7 +527,12 @@ buildhistory_list_files() {
> >               eval ${FAKEROOTENV} ${FAKEROOTCMD} $find_cmd
> >       else
> >               eval $find_cmd
> > -     fi | sort -k5 | sed 's/ * -> $//' > $2 )
> > +     fi | sort -k5 | sed 's/ * -> $//' > $2
> > +     if [ "${@bb.utils.contains('BUILDHISTORY_FEATURES', 'md5', '1', '0',
> > d)}" = "1" ] ; then
> > +             md5filename=$(echo $2 | sed 's/\.txt$/-md5.txt/')
> > +             find -type f | xargs -I{} -n1 md5sum {} | sort -k2 >
> > $md5filename
>
> Why don't you
>   find . -type f -exec md5sum {} + | sort -sk2 > $md5filename
> ?
> It'll be quite a bit faster because way fewer processes will be spawned.
>
> Am I missing something?

You're right, I will update the patch. I'm assuming I don't need the
stable sort (-s), since the filenames should all be unique.
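
For reference, a minimal standalone sketch of the batched form suggested
above, outside of buildhistory.bbclass. The function name list_md5 and the
demo directory layout are made up for illustration, and it assumes md5sum
from coreutils and a find that supports "-exec ... {} +":

```shell
#!/bin/sh
# Sketch only: checksum every regular file below a directory using as few
# md5sum processes as possible, then order the listing by path.
# "list_md5" and the demo paths are hypothetical, not part of the patch.
list_md5() {
    # $1: directory to scan, $2: output file (should live outside $1)
    ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k2 ) > "$2"
    # Remove the listing if nothing was hashed, mirroring the patch.
    [ -s "$2" ] || rm -f "$2"
}

# Small demo tree with two regular files.
demo=$(mktemp -d)
mkdir -p "$demo/root/usr/bin"
printf 'hello\n' > "$demo/root/usr/bin/b"
printf 'world\n' > "$demo/root/a"

list_md5 "$demo/root" "$demo/files-md5.txt"
cat "$demo/files-md5.txt"
```

With "{} +", find passes many paths to each md5sum invocation, whereas
"xargs -I{} -n1" starts one process per file.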

> I don't know what the intended use-case of the md5 files is, but could
> sha256 or similar maybe be more appropriate?

I thought it would be a good idea to store some sort of checksum of files in
the buildhistory when doing reproducible builds, so that it is easier to
detect when a rebuild produces changed files. But perhaps there is already
some way to do this that I am missing?

But I have no real motivation for choosing md5, other than that I assumed
it would be less CPU-intensive than sha256, and that I'm not too worried
about collisions.
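
To make the use case concrete, here is a rough sketch of how two such
listings could be compared after a rebuild. The function name changed_files
and the sample checksums are invented for the example, and it assumes the
"checksum  path" format printed by md5sum with paths that contain no
whitespace:

```shell
#!/bin/sh
# Sketch only: print paths that were added, removed, or whose checksum
# differs between two listings. "changed_files" is a hypothetical helper.
changed_files() {
    # $1: old listing, $2: new listing
    awk '
        NR == FNR { old[$2] = $1; next }      # first file: remember checksums
        {
            if (!($2 in old) || old[$2] != $1) print $2   # new or changed
            seen[$2] = 1
        }
        END { for (p in old) if (!(p in seen)) print p }  # removed
    ' "$1" "$2" | LC_ALL=C sort
}

# Made-up sample data standing in for two buildhistory snapshots.
tmp=$(mktemp -d)
printf '%s\n' 'aaa  ./bin/foo' 'bbb  ./etc/conf' > "$tmp/old.txt"
printf '%s\n' 'aaa  ./bin/foo' 'ccc  ./etc/conf' 'ddd  ./lib/new' > "$tmp/new.txt"

changed=$(changed_files "$tmp/old.txt" "$tmp/new.txt")
echo "$changed"
```

Since buildhistory is normally kept in git, a plain "git diff" on the
listing would show the same information; the helper only narrows it down
to the affected paths.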

Thanks for the feedback,
Jacob

> Cheers,
> Andre'
>
>
> > +             [ -s $md5filename ] || rm $md5filename # remove result if
> > empty
> > +     fi )
> >  }
> >
> >  buildhistory_list_pkg_files() {
> > --
> > 2.11.0
> >
>

