[OE-core] [RFC PATCH 3/3] WIP: classes/packagefeed-stability: add class to help reduce package feed churn

Mark Hatle mark.hatle at windriver.com
Tue Sep 8 15:47:45 UTC 2015


On 9/8/15 10:03 AM, Paul Eggleton wrote:
> Hi Mark,
> 
> On Tuesday 08 September 2015 09:09:34 Mark Hatle wrote:
>> I've got some scripts where I do this externally to the build system.  So a
>> few comments on the stuff below based on what I found working on those
>> scripts.
>>
>> (Basically the system deploys as normal.. then we run the scripts to copy
>> the deploy directory contents to a release directory... we do the
>> build-compare as part of that copy...)
> 
> Ah ok, interesting. I meant to mention as part of the cover letter actually 
> that Randy and I had considered the approach of doing it separately and Randy 
> actually got some way down the path of implementing it, until I realised that 
> it would be problematic in that the images you'd generate out of the same 
> process as the package feed would have the newer packages in them instead of 
> the ones that were in the feed.

Yes, certainly an issue with image generation needing to match.

>> See below inline...
>>
>> On 9/8/15 8:41 AM, Paul Eggleton wrote:
>>> When a dependency causes a recipe to effectively be rebuilt, its output
>>> may in fact not change; but new packages (with an increased PR value, if
>>> using the PR server) will be generated nonetheless. There's no practical
>>> way for us to predict whether or not this is going to be the case based
>>> solely on the inputs, but we can compare the package output and see if
>>> that is materially different and based upon that decide to replace the
>>> old package with the new one.
>>>
>>> This class effectively intercepts packages as they are written out by
>>> do_package_write_*, causing them to be written into a different
>>> directory where we can compare them to whatever older packages might
>>> be in the "real" package feed directory, and avoid copying the new
>>> package to the feed if it has not materially changed. We use
>>> build-compare to do the package comparison.
>>>
>>> (NOTE: this is still a work in progress and there are no doubt
>>> unresolved issues.)
>>>
>>> Signed-off-by: Paul Eggleton <paul.eggleton at linux.intel.com>
>>> ---
>>>
>>>  meta/classes/packagefeed-stability.bbclass | 270 +++++++++++++++++++++++++++++
>>>  1 file changed, 270 insertions(+)
>>>  create mode 100644 meta/classes/packagefeed-stability.bbclass
>>>
>>> diff --git a/meta/classes/packagefeed-stability.bbclass b/meta/classes/packagefeed-stability.bbclass
>>> new file mode 100644
>>> index 0000000..b5207d9
>>> --- /dev/null
>>> +++ b/meta/classes/packagefeed-stability.bbclass
>>> @@ -0,0 +1,270 @@
>>> +# Class to avoid copying packages into the feed if they haven't materially changed
>>> +#
>>> +# Copyright (C) 2015 Intel Corporation
>>> +# Released under the MIT license (see COPYING.MIT for details)
>>> +#
>>> +# This class effectively intercepts packages as they are written out by
>>> +# do_package_write_*, causing them to be written into a different
>>> +# directory where we can compare them to whatever older packages might
>>> +# be in the "real" package feed directory, and avoid copying the new
>>> +# package to the feed if it has not materially changed. The idea is to
>>> +# avoid unnecessary churn in the packages when dependencies trigger task
>>> +# reexecution (and thus repackaging). Enabling the class is simple:
>>> +#
>>> +# INHERIT += "packagefeed-stability"
>>> +#
>>> +# Caveats:
>>> +# 1) Latest PR values in the build system may not match those in packages
>>> +#    seen on the target (naturally)
>>> +# 2) If you rebuild from sstate without the existing package feed present,
>>> +#    you will lose the "state" of the package feed i.e. the preserved old
>>> +#    package versions. Not the end of the world, but would negate the
>>> +#    entire purpose of this class.
>>> +#
>>> +# Note that running -c cleanall on a recipe will purposely delete the old
>>> +# package files so they will definitely be copied the next time.
>>> +
>>> +python() {
>>> +    # Package backend agnostic intercept
>>> +    # This assumes that the package_write task is called package_write_<pkgtype>
>>> +    # and that the directory in which packages should be written is
>>> +    # pointed to by the variable DEPLOY_DIR_<PKGTYPE>
>>> +    for pkgclass in (d.getVar('PACKAGE_CLASSES', True) or '').split():
>>> +        if pkgclass.startswith('package_'):
>>> +            pkgtype = pkgclass.split('_', 1)[1]
>>> +            pkgwritefunc = 'do_package_write_%s' % pkgtype
>>> +            sstate_outputdirs = d.getVarFlag(pkgwritefunc, 'sstate-outputdirs', False)
>>> +            deploydirvar = 'DEPLOY_DIR_%s' % pkgtype.upper()
>>> +            deploydirvarref = '${' + deploydirvar + '}'
>>> +            pkgcomparefunc = 'do_package_compare_%s' % pkgtype
>>> +
>>> +            if bb.data.inherits_class('image', d):
>>> +                d.appendVarFlag('do_rootfs', 'recrdeptask', ' ' + pkgcomparefunc)
>>> +
>>> +            if bb.data.inherits_class('populate_sdk_base', d):
>>> +                d.appendVarFlag('do_populate_sdk', 'recrdeptask', ' ' + pkgcomparefunc)
>>> +
>>> +            if bb.data.inherits_class('populate_sdk_ext', d):
>>> +                d.appendVarFlag('do_populate_sdk_ext', 'recrdeptask', ' ' + pkgcomparefunc)
>>> +
>>> +            d.appendVarFlag('do_build', 'recrdeptask', ' ' + pkgcomparefunc)
>>> +
>>> +            if d.getVarFlag(pkgwritefunc, 'noexec', True) or (not d.getVarFlag(pkgwritefunc, 'task', True)) or pkgwritefunc in (d.getVar('__BBDELTASKS', True) or []):
>>> +                # Packaging is disabled for this recipe, we shouldn't do anything
>>> +                continue
>>> +
>>> +            if deploydirvarref in sstate_outputdirs:
>>> +                # Set intermediate output directory
>>> +                d.setVarFlag(pkgwritefunc, 'sstate-outputdirs', sstate_outputdirs.replace(deploydirvarref, deploydirvarref + '-prediff'))
>>> +
>>> +            d.setVar(pkgcomparefunc, d.getVar('do_package_compare', False))
>>> +            d.setVarFlags(pkgcomparefunc, d.getVarFlags('do_package_compare', False))
>>> +            d.appendVarFlag(pkgcomparefunc, 'depends', ' build-compare-native:do_populate_sysroot')
>>> +            bb.build.addtask(pkgcomparefunc, 'do_build', 'do_packagedata ' + pkgwritefunc, d)
>>> +}
>>> +
>>> +# This isn't the real task function - it's a template that we use in the
>>> +# anonymous python code above
>>> +fakeroot python do_package_compare () {
>>> +    currenttask = d.getVar('BB_CURRENTTASK', True)
>>> +    pkgtype = currenttask.rsplit('_', 1)[1]
>>> +    package_compare_impl(pkgtype, d)
>>> +}
>>> +
>>> +def package_compare_impl(pkgtype, d):
>>> +    import errno
>>> +    import fnmatch
>>> +    import glob
>>> +    import subprocess
>>> +    import oe.sstatesig
>>> +
>>> +    pn = d.getVar('PN', True)
>>> +    deploydir = d.getVar('DEPLOY_DIR_%s' % pkgtype.upper(), True)
>>> +    prepath = deploydir + '-prediff/'
>>> +
>>> +    # Find out what the PKGR values are
>>> +    pkgdatadir = d.getVar('PKGDATA_DIR', True)
>>> +    packages = []
>>> +    try:
>>> +        with open(os.path.join(pkgdatadir, pn), 'r') as f:
>>> +            for line in f:
>>> +                if line.startswith('PACKAGES:'):
>>> +                    packages = line.split(':', 1)[1].split()
>>> +                    break
>>> +    except IOError as e:
>>> +        if e.errno == errno.ENOENT:
>>> +            pass
>>> +
>>> +    if not packages:
>>> +        bb.debug(2, '%s: no packages, nothing to do' % pn)
>>> +        return
>>> +
>>> +    pkgrvalues = {}
>>> +    rpkgnames = {}
>>> +    rdepends = {}
>>> +    pkgvvalues = {}
>>> +    for pkg in packages:
>>> +        with open(os.path.join(pkgdatadir, 'runtime', pkg), 'r') as f:
>>> +            for line in f:
>>> +                if line.startswith('PKGR:'):
>>> +                    pkgrvalues[pkg] = line.split(':', 1)[1].strip()
>>> +                if line.startswith('PKGV:'):
>>> +                    pkgvvalues[pkg] = line.split(':', 1)[1].strip()
>>> +                elif line.startswith('PKG_%s:' % pkg):
>>> +                    rpkgnames[pkg] = line.split(':', 1)[1].strip()
>>> +                elif line.startswith('RDEPENDS_%s:' % pkg):
>>> +                    rdepends[pkg] = line.split(':', 1)[1].strip()
>>> +
>>> +    # Prepare a list of the runtime package names for packages that were
>>> +    # actually produced
>>> +    rpkglist = []
>>> +    for pkg, rpkg in rpkgnames.iteritems():
>>> +        if os.path.exists(os.path.join(pkgdatadir, 'runtime', pkg + '.packaged')):
>>> +            rpkglist.append((rpkg, pkg))
>>> +    rpkglist.sort(key=lambda x: len(x[0]), reverse=True)
>>> +
>>> +    pvu = d.getVar('PV', False)
>>> +    if '$' + '{SRCPV}' in pvu:
>>> +        pvprefix = pvu.split('$' + '{SRCPV}', 1)[0]
>>> +    else:
>>> +        pvprefix = None
>>> +
>>> +    pkgwritetask = 'package_write_%s' % pkgtype
>>> +    files = []
>>> +    copypkgs = []
>>> +    manifest, _ = oe.sstatesig.sstate_get_manifest_filename(pkgwritetask, d)
>>> +    with open(manifest, 'r') as f:
>>> +        for line in f:
>>
>>> +            if line.startswith(prepath):
>>
>> Is this comparing the -packages- or the -files-?  I ask, because if you
>> process the packages you can short-cut some of the comparisons.. it makes a
>> huge improvement in speed.
> 
> Well the manifest for do_package_write is each package file that it writes out, 
> so it is packages and not their contents (at this point in the code).
>  
>> Specifically, skip processing all '-dev' and '-dbg' packages.  (Not
>> -staticdev). The -dev and -dbg components need to match the runtime items.
>> If the runtime items have not changed then there is no reason that the -dev
>> and -dbg items would have changed -- other than timestamps and file paths..
>> which the build-compare filters out anyway.
> 
> For -dbg I guess I was considering any case where debugging symbol generation 
> had changed in some way; I'm not familiar enough with the format of those 
> symbols to know whether that could really happen in practice or not. For -dev, 
> what if someone patches a header alone (would be unusual, but not impossible 
> to imagine)?

dbg symbols should not be able to change (other than file paths, which are
ignored); otherwise something else has changed, and the other pieces of the
recipe will reflect that.

As for the -dev, it would be possible to update a header or other component that
does not, in the end, affect the produced binaries.  I believe that in a
released system this will be an extraordinarily rare corner case... however it
is something that can certainly happen.  In our system we decided to handle that
corner case through a manual copy, but for OE, ignoring only -dbg should be
reasonable.
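To make that concrete, here is a minimal sketch (the helper name and the
skip_dev switch are hypothetical, not from the patch) of the filter this would
add in front of each build-compare invocation, keyed on the runtime package
name the surrounding code already has:

```python
def skip_compare(pkgname, skip_dev=False):
    """Return True if the build-compare run for this runtime package can
    be skipped.

    Per the discussion: -dbg contents only track the runtime packages
    (modulo paths/timestamps that pkg-diff.sh already filters out), so
    they are safe to skip; -dev could optionally be skipped too, at the
    cost of missing the rare header-only change; -staticdev must always
    be compared.
    """
    if pkgname.endswith('-staticdev'):
        # Static libraries carry real content; never skip these
        return False
    if pkgname.endswith('-dbg'):
        return True
    if skip_dev and pkgname.endswith('-dev'):
        return True
    return False
```

In the class this check would sit just before the pkg-diff.sh call, so a
skipped package is copied (or not) based purely on whether its runtime
siblings changed.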

>>> +                srcpath = line.rstrip()
>>> +                if os.path.isfile(srcpath):
>>> +                    destpath = os.path.join(deploydir, os.path.relpath(srcpath, prepath))
>>> +
>>> +                    # This is crude but should work assuming the output
>>> +                    # package file name starts with the package name
>>> +                    # and rpkglist is sorted by length (descending)
>>> +                    pkgbasename = os.path.basename(destpath)
>>> +                    pkgname = None
>>> +                    for rpkg, pkg in rpkglist:
>>> +                        if pkgbasename.startswith(rpkg):
>>> +                            pkgr = pkgrvalues[pkg]
>>> +                            destpathspec = destpath.replace(pkgr, '*')
>>> +                            if pvprefix:
>>> +                                pkgv = pkgvvalues[pkg]
>>> +                                if pkgv.startswith(pvprefix):
>>> +                                    pkgvsuffix = pkgv[len(pvprefix):]
>>> +                                    if '+' in pkgvsuffix:
>>> +                                        newpkgv = pvprefix + '*+' + pkgvsuffix.split('+', 1)[1]
>>> +                                        destpathspec = destpathspec.replace(pkgv, newpkgv)
>>> +                            pkgname = pkg
>>> +                            break
>>> +                    else:
>>> +                        bb.warn('Unable to map %s back to package' % pkgbasename)
>>> +                        destpathspec = destpath
>>> +
>>> +                    oldfiles = glob.glob(destpathspec)
>>> +                    oldfile = None
>>> +                    docopy = True
>>> +                    if oldfiles:
>>> +                        oldfile = oldfiles[-1]
>>> +                        result = subprocess.call(['pkg-diff.sh', oldfile, srcpath])
>>> +                        if result == 0:
>>> +                            docopy = False
>>> +
>>> +
>>
>> If any of the packages differ, then -all- of the packages differ.  You can
>> drop out of the loop at that point and simply copy the full set of packages
>> over at this point.
> 
> Right, that would be a lot simpler. I had imagined there might be an advantage 
> in terms of reduced on-target upgrading if you'd only for example patched the 
> source for one or two kernel modules and then rebuilt the kernel, but that 
> might not justify the extra complexity.

We found that typically this just doesn't happen.  Between embedded kernel
version numbers, built-in PN-PV-PR dependencies and other 'safety' mechanisms..
you really do have to publish all or nothing.

Large package sets (like the kernel, python, perl) would be a place for
optimization.. but I don't think it's a good idea to subset a recipe.  It really
makes regeneration of what you had very difficult without "repeating history"
(i.e. the steps you took to get there), and that makes for a bad build system
experience.  (Note there is still some of that since the deploy directory and
sstate-cache aren't in sync.. but I think it's easier to manage than partial
recipe updates.)
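The all-or-nothing publish described above could be sketched roughly as follows
(function names are hypothetical; in the class, the default differs() would
wrap the pkg-diff.sh call, which exits 0 when two packages are materially
identical):

```python
import subprocess

def recipe_needs_publish(pkg_pairs, differs=None):
    """Decide whether a recipe's whole package set goes into the feed.

    pkg_pairs is a list of (oldfile, newfile) tuples covering every
    package the recipe produced.  Because packages from one recipe carry
    PN-PV-PR interdependencies and share a -dbg package, one changed
    package means the entire set must be published -- so we can stop at
    the first difference instead of comparing everything.
    """
    if differs is None:
        # build-compare's pkg-diff.sh returns 0 for "no material change"
        differs = lambda old, new: subprocess.call(['pkg-diff.sh', old, new]) != 0
    for oldfile, newfile in pkg_pairs:
        if oldfile is None or differs(oldfile, newfile):
            return True  # short-circuit: publish the full set
    return False
```

If this returns True, the whole manifest gets copied in one shot; if False,
nothing is copied -- which is what avoids the per-package bookkeeping and the
extra comparisons on big recipes like perl and python.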

>> The reason for this is that the packages from a single recipe may (and often
>> do) have PN-PV-PR dependencies in them -- so they have to be kept as a
>> unit.  Also the -dbg package contains the overall debug information for the
>> full set of packages produced by a recipe.. so if you change a package the
>> -dbg has to be updated -- since the -dbg was updated, everything has to be
>> updated.
> 
> That's why I had the versioned dependency check, though it would be much 
> simpler (and faster) not to do any of that and just keep it all in lock-step. 
> 
>> So drop the loop, abort the compares and copy the full manifest set over in
>> one shot..  For large packages like perl and python it makes a HUGE
>> difference to avoid all of the extra comparisons.
>>
>> (I also did this because there is some concern in my mind whether releasing
>> only part of a recipe build ever makes sense.. because I don't want to get
>> into the situation where the only way to reproduce the released feed would be
>> to build the old version first, then come in and build the new version next..
>> this isn't a typical workflow and greatly complicates things.)
> 
> Right, there is a potential risk (and potential for confusion) there.
> 
>>> +                    files.append((pkgname, pkgbasename, srcpath, oldfile, destpath))
>>> +                    bb.debug(2, '%s: package %s %s' % (pn, files[-1], docopy))
>>> +                    if docopy:
>>> +                        copypkgs.append(pkgname)
>>> +
>>
>> Making the change above would eliminate the need for the code below.. and
>> reduce the possibility of things getting out of sync.
> 
> Indeed.
> 
> Cheers,
> Paul
> 



