[OE-core] [PATCH] sstate: Truncate PV in sstate filenames that are too long

Mike Crowe mac at mcrowe.com
Wed Jul 31 17:41:53 UTC 2019


On Tuesday 30 July 2019 at 09:14:01 -0500, Mark Hatle wrote:
> On 7/30/19 8:49 AM, Mike Crowe wrote:
> > On Tuesday 30 July 2019 at 08:25:52 -0500, Mark Hatle wrote:
> >> On 7/30/19 6:01 AM, Mike Crowe wrote:
> >>> sstate filenames are generated by concatenating a variety of bits of
> >>> package metadata. Some of these parts could be long, which could cause
> >>> the filename to be longer than the 255 character maximum for ext4.
> >>>
> >>> So, let's try to detect this situation and truncate the PV part of the
> >>> filename so that it will fit. If this happens, an ellipsis is added to
> >>> make it clear that the version number is incomplete.
> >>>
> >>> SSTATE_PKG needs to be consistent for all tasks so that the hash
> >>> remains stable. This means that we need to make an assumption for the
> >>> maximum length of the task name. In this implementation, the task name
> >>> is limited to 27 characters.
> >>>
> >>> This change also results in a sensible error message being emitted if
> >>> the resulting filename is still too long.
> >>>
> >>> Signed-off-by: Mike Crowe <mac at mcrowe.com>
> >>>
> >>> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> >>> index 3342c5ef50..6313b1c538 100644
> >>> --- a/meta/classes/sstate.bbclass
> >>> +++ b/meta/classes/sstate.bbclass
> >>> @@ -8,6 +8,24 @@ def generate_sstatefn(spec, hash, d):
> >>>          hash = "INVALID"
> >>>      return hash[:2] + "/" + spec + hash
> >>>  
> >>> +def sstate_path(taskname, d):
> >>> +    max_filename_len = 245 # leave some room for ".siginfo"
> >>> +    max_addendum_len = 32 # '_' + taskname + '.tgz'
> >>
> >> Since the task name is variable, is there really a 32 character limit here?
> >>
> >> It may make sense to do:
> >>
> >> # '_' + taskname + '.tgz', reserving a minimum of 32 for taskname
> >> max_addendum_len = len(taskname) + 5 if len(taskname) + 5 > 32 else 32
> >>
> >> Always reserve a minimum of 32 for consistency, but if we go over account
> >> for it.
> > 
> > I think that would just cause task hash mismatches (see third paragraph of
> > commit message.)
> > 
> > It probably does make sense to detect such long task names in this
> > situation and generate errors though.
> > 
> >>> +    sstate_prefix = d.getVar('SSTATE_PKG')
> >>> +    excess = len(os.path.basename(sstate_prefix)) - (max_filename_len - max_addendum_len)
> >>> +    if excess > 0:
> >>> +        pv = d.getVar('PV')
> >>> +        if len(pv) >= excess and len(pv) >= 3:
> >>> +            short_pv = d.getVar('PV')[:-excess-3] + '...'
> >>
> >> Is truncating the PV enough?  In a discussion on the bitbake list, I suggested
> >> possibly changing the order of the entries in the SSTATE_PKGSPEC to allow us to
> >> prune things prior to the hash w/o affecting the hash.  Maybe this is simply not
> >> needed.. but it's a possibility if this proves to not be effective.
> > 
> > Truncating PV solves the problem I was having. The other fields don't
> > really tend to be very long, so there's less to be gained by shortening
> > them.

Here's a change that just always truncates PV. It has the benefit of being
simple but it will virtually always truncate too much and risks not
truncating enough when other parts of the path are too long.

However, if I can't get my head around enough of this to come up with an
acceptable solution, we'll probably end up just using this change locally:

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 3342c5ef50..e7003ea93f 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -8,17 +8,26 @@ def generate_sstatefn(spec, hash, d):
         hash = "INVALID"
     return hash[:2] + "/" + spec + hash

+def shortened_pv(d):
+    pv = d.getVar('PV')
+    if len(pv) <= 64:
+        return pv
+    else:
+        bb.note("PV shortened: " + pv)
+        return pv[:61] + '...'
+
+SSTATE_PV = "${@shortened_pv(d)}"
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
-SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
-SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
+SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${SSTATE_PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
+SSTATE_SWSPEC     = "sstate:${PN}::${SSTATE_PV}:${PR}::${SSTATE_VERSION}:"
 SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
 SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/${SSTATE_PKGSPEC}"

-# explicitly make PV to depend on evaluated value of PV variable
-PV[vardepvalue] = "${PV}"
+# explicitly make SSTATE_PV to depend on evaluated value of PV variable
+SSTATE_PV[vardepvalue] = "${PV}"

 # We don't want the sstate to depend on things like the distro string
 # of the system, we let the sstate paths take care of this.

> >>> +            d2 = d.createCopy()
> >>> +            d2.setVar('PV', short_pv)
> >>> +            sstate_prefix = d2.getVar('SSTATE_PKG')
> >>> +
> >>> +    sstatepkg = sstate_prefix + '_'+ taskname + ".tgz"
> >>> +    if len(os.path.basename(sstatepkg)) > max_filename_len:
> >>> +        bb.error('Failed to shorten sstate filename')
> >>> +    return sstatepkg
> >>> +
> >>>  SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
> >>>  SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
> >>>  SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> >>
> >> There is something else I noticed..  "SSTATE_PKGNAME" defined as:
> >>
> >> SSTATE_PKGNAME    =
> >> "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'),
> >> d.getVar('BB_UNIHASH'), d)}"
> >>
> >> From what I can tell, this really should be using the new sstate_path function
> >> in someway.
> > 
> > Hmm, you're right. I can't have covered that in my testing. :(
> > 
> >> Would it make more sense to define SSTATE_PKGNAME in such a way that it always
> >> resulted in something "short" enough, and in the right format, that it would
> >> always work?
> > 
> > I'd considered doing it that way. If SSTATE_OKGSPEC contained ${SSTATE_PV}
> > and that either had the value of ${PV} or the a truncated version of ${PV}
> > then the rest of the file could remain the same. However, truncating PV
> > without access to the rest of the spec would mean just picking some
> > arbitrary maximum PV length which is likely to be more conservative than
> > necessary.
> > 
> >> Adjusting or rewriting "generate_sstatefn" could still accomplish the PV change,
> >> but the max length of the string would need further shrinking to accommodate an
> >> unknown task length (which goes back to my previous comment).  If the 32 default
> >> is long enough then that shouldn't be a problem -- and may also resolve my
> >> concerns that something outside of sstate class could try to use that various
> >> and without the new magic function get the wrong results.
> > 
> > I wonder whether I can get away with applying per-task PV truncation in
> > generate_sstatefn without causing hash mismatches? That's worth a try.
> 
> The only cause of a task hash mismatch (actual hash, not filename) would be in
> SSTATE_PKGNAME is part of the hash itself.  I'd contend if it is, then we should
> exclude it.

I would have thought so to, but I'm unable to persuade myself that it is
definitely correct in all situations. If I have a fully built work tree,
and change SSTATE_PKGNAME then would I expect a subsequent build to add the
new filenames to the sstate cache?

Although I don't believe that I yet really understand sstate.bbclass, I've
done some more digging today to learn more. I thought I had a working
solution (including potentially different truncations for different task
names!) but then I discovered that nothing was ever successfully fetched
from the sstate cache. This turned out to be because I hadn't appreciated
the way that BB_HASHFILENAME was being used to transport variable values
into sstate_checkhashes, so I was ending up with default values for PN, PV
etc. :(

I can't see a way to make this work without transporting each of the
individual components of SSTATE_PKGSPEC and SSTATE_DIR, along with their
unexpanded versions through BB_HASHFILENAME. I can't tunnel the full
sstate filename because I don't know the task name at that point. :(

In case anyone cares, here's the patch that shows what I'm trying to do but
fails to work as described above:

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 3342c5ef50..bd2b76bab1 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -3,16 +3,40 @@ SSTATE_VERSION = "3"
 SSTATE_MANIFESTS ?= "${TMPDIR}/sstate-control"
 SSTATE_MANFILEPREFIX = "${SSTATE_MANIFESTS}/manifest-${SSTATE_MANMACH}-${PN}"

-def generate_sstatefn(spec, hash, d):
+# Generate an sstate filename that is guaranteed to not be longer than
+# the 255 character path component maximum for ext4. Returns a tuple
+# (sstate leaf filename, sstate full path)
+def generate_sstate_filename(d, taskname, hash = None, extension = '.tgz'):
+    # leave some room for '.siginfo' or '.XXXXXXXX' temporary filename suffix
+    max_filename_len = 246
+    if not hash:
+        hash = d.getVar('BB_UNIHASH')
     if not hash:
         hash = "INVALID"
-    return hash[:2] + "/" + spec + hash
+    sstate_prefix = d.getVar('SSTATE_PKGSPEC')
+    excess = len(sstate_prefix) + len(hash) + 1 + len(taskname) + len(extension) - max_filename_len
+    if excess > 0:
+        pv = d.getVar('PV')
+        if len(pv) >= excess and len(pv) >= 3:
+            short_pv = d.getVar('PV')[:-excess-3] + '...'
+            d2 = d.createCopy()
+            d2.setVar('PV', short_pv)
+            sstate_prefix = d2.getVar('SSTATE_PKGSPEC')
+
+    sstate_filename = sstate_prefix + hash + '_'+ taskname + extension
+    if len(sstate_filename) > max_filename_len:
+        bb.error('Failed to shorten sstate filename')
+
+    return (sstate_filename, d.getVar('SSTATE_DIR') + '/' + hash[:2] + '/' + sstate_filename)

 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
-SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
+
+# Assigned in sstate_installpkg and sstate_package before running
+# tasks. Should not be used beforehand
+SSTATE_PKG        = "SSTATE_PKG_NOT_YET_ASSIGNED"
+
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
 SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/${SSTATE_PKGSPEC}"
@@ -322,9 +346,7 @@ def sstate_installpkg(ss, d):
     from oe.gpg_sign import get_signer

     sstateinst = d.expand("${WORKDIR}/sstate-install-%s/" % ss['task'])
-    sstatefetch = d.getVar('SSTATE_PKGNAME') + '_' + ss['task'] + ".tgz"
-    sstatepkg = d.getVar('SSTATE_PKG') + '_' + ss['task'] + ".tgz"
-
+    (sstatefetch, sstatepkg) = generate_sstate_filename(d, ss['task'])
     if not os.path.exists(sstatepkg):
         pstaging_fetch(sstatefetch, d)

@@ -617,7 +639,7 @@ def sstate_package(ss, d):
     tmpdir = d.getVar('TMPDIR')

     sstatebuild = d.expand("${WORKDIR}/sstate-build-%s/" % ss['task'])
-    sstatepkg = d.getVar('SSTATE_PKG') + '_'+ ss['task'] + ".tgz"
+    sstatepkg = generate_sstate_filename(d, ss['task'])[1]
     bb.utils.remove(sstatebuild, recurse=True)
     bb.utils.mkdirhier(sstatebuild)
     bb.utils.mkdirhier(os.path.dirname(sstatepkg))
@@ -813,9 +835,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,

     ret = []
     missed = []
-    extension = ".tgz"
-    if siginfo:
-        extension = extension + ".siginfo"
+    maybe_siginfo = '.siginfo' if siginfo else ''

     def gethash(task):
         if sq_unihash is not None:
@@ -844,7 +864,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,

         spec, extrapath, tname = getpathcomponents(task, d)

-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + extension)
+        sstatefile = generate_sstate_filename(d, tname, gethash(task))[1] + maybe_siginfo

         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -907,7 +927,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + extension)
+            sstatefile = generate_sstate_filename(d, tname, gethash(task))[1] + maybe_siginfo
             tasklist.append((task, sstatefile))

         if tasklist:
@@ -937,11 +957,11 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
+            sstatefile = generate_sstate_filename(d, tname, gethash(task))
             evdata['missed'].append( (sq_fn[task], sq_task[task], gethash(task), sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
+            sstatefile = generate_sstate_filename(d, tname, gethash(task))
             evdata['found'].append( (sq_fn[task], sq_task[task], gethash(task), sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)

@@ -1084,8 +1104,8 @@ python sstate_eventhandler() {
         if taskname in ["fetch", "unpack", "patch", "populate_lic", "preconfigure"] and swspec:
             d.setVar("SSTATE_PKGSPEC", "${SSTATE_SWSPEC}")
             d.setVar("SSTATE_EXTRAPATH", "")
-        sstatepkg = d.getVar('SSTATE_PKG')
-        bb.siggen.dump_this_task(sstatepkg + '_' + taskname + ".tgz" ".siginfo", d)
+        sstateinfo = generate_sstate_filename(d, taskname)[1] + ".siginfo"
+        bb.siggen.dump_this_task(sstateinfo, d)
 }

 SSTATE_PRUNE_OBSOLETEWORKDIR ?= "1"

Thanks.

Mike.


More information about the Openembedded-core mailing list