[OE-core] Mis-generation of shell script (run.do_install)?

Tue Dec 18 17:45:59 UTC 2018

On Mon, Dec 17, 2018 at 4:24 PM <richard.purdie at linuxfoundation.org> wrote:
>
> On Mon, 2018-12-17 at 12:21 -0800, Andre McCurdy wrote:
> > On Mon, Dec 17, 2018 at 6:44 AM <richard.purdie at linuxfoundation.org>
> > wrote:
> > > On Sat, 2018-12-15 at 20:19 -0500, Jason Andryuk wrote:
> > > > As far as I can tell, pysh is working properly - it's just the
> > > > bb_codeparser.dat which is returning the incorrect shellCacheLine
> > > > entry.  It seems like I have an md5 collision between a pyro
> > > > core2-64
> > > > binutils do_install and core2-32 python-async
> > > > distutils_do_install in
> > > > the shellCacheLine.  python-async's entry got in first, so that's
> > > > why
> > > > binutils run.do_install doesn't include autotools_do_install -
> > > > the
> > > > shellCacheLine `execs` entry doesn't include it.  Or somehow the
> > > > `bb_codeparser.dat` file was corrupted to have an incorrect
> > > > `execs`
> > > > for the binutils do_install hash.
> > >
> > > That is rather worrying. Looking at the known issues with md5, I
> > > can
> > > see how this could happen though.
> >
> > How do you see this could happen? By random bad luck?
> >
> > Despite md5 now being susceptible to targeted attacks, the chances of
> > accidentally hitting a collision between two 128bit hashes is as
> > unlikely as it's always been.
> >
> >   http://big.info/2013/04/md5-hash-collision-probability-using.html
> >
> > "It is not that easy to get hash collisions when using MD5 algorithm.
> > Even after you have generated 26 trillion hash values, the
> > probability of the next generated hash value to be the same as one of
> > those 26 trillion previously generated hash values is 1/1trillion (1
> > out of 1 trillion)."
> >
> > It seems much more likely that there's a bug somewhere in the way the
> > hashes are used. Unless we understand that then switching to a longer
> > hash might not solve anything.
>
> The md5 collision generators have demonstrated it's possible to get
> checksums where there is a block of contiguous fixed data and a block
> of arbitrary data in ratios of up to about 75% to 25%.
>
> That pattern nearly exactly matches our function templating mechanism
> where two functions may be nearly identical except for a name or a
> small subset of it.
>
> Two random hashes colliding are less interesting than the chances of
> two very similar but subtly different pieces of code getting the same
> hash. I don't have a mathematical level proof of it but looking at the
> way you can generate collisions, I suspect our data is susceptible and
> the fact you can do it at all with such large blocks is concerning.
>
> I would love to have definitive proof. I'd be really interested if
> Jason has the "bad" checksum and one of the inputs which matches it as
> I'd probably see if we could brute force the other. I've read enough to
> lose faith in our current code though.
>
> Also though, there is the human factor. What I don't want to have is
> people put off the project deeming it "insecure". I already get raised
> eyebrows at the use of md5. It's probably time to switch and be done
> with any perception anyway, particularly now questions are being asked,
> valid or not, as the performance hit, whilst noticeable on a profile, is
> not earth shattering.
>
> Finally, by all means please do audit the codepaths and see if there is
> another explanation. Our hash use is fairly simple but it's possible
> there is some other logic error and if there is we should fix it.

I can definitively state I have a hash in bb_codeparser.dat with an
incorrect shellCacheLine entry and I don't know how it got there.

The bad hash is 3df9018676de219bb3e46e88eea09c98.  I've attached a
file with the binutils do_install() contents which hash to that value.
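
For anyone who wants to check the attachment against that value, here is a
minimal sketch. It assumes the cache key is simply the MD5 hex digest of the
function body's bytes (the body encoded as UTF-8); the helper names md5_hex
and file_matches_key are made up for illustration and are not part of
bitbake.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def file_matches_key(path: str, expected_key: str) -> bool:
    """Hash the raw bytes of the file at `path` and compare to `expected_key`.

    Assumption: the file holds exactly the text that was hashed, with no
    extra trailing newline added when it was saved.
    """
    with open(path, "rb") as f:
        return md5_hex(f.read()) == expected_key.lower()
```

Calling file_matches_key() with the path of the saved attachment and
"3df9018676de219bb3e46e88eea09c98" should return True if the assumption
about how the key is computed holds.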

The bad 3df9018676de219bb3e46e88eea09c98 entry in bb_codeparser.dat returned:
DEBUG: execs [
DEBUG: execs rm
DEBUG: execs install
DEBUG: execs test
DEBUG: execs sed
DEBUG: execs rmdir
DEBUG: execs bbfatal_log
DEBUG: execs mv
DEBUG: execs /home/build/openxt-compartments/build/tmp-glibc/work/core2-32-oe-linux/python-async/0.6.2-r0/recipe-sysroot-native/usr/bin/python-native/python
DEBUG: execs find

These execs look like they could be from a distutils_do_install(),
but that's just a guess.  python-async was not in my tmp-glibc
directory when I started this investigation.  I don't know how it got
there.  I built it manually, but the resulting distutils_do_install
has a different hash :(

The correct shellCacheLine entry for core2-64 binutils do_install returns:
DEBUG: execs basename
DEBUG: execs rm
DEBUG: execs oe_multilib_header
DEBUG: execs ln
DEBUG: execs install
DEBUG: execs echo
DEBUG: execs cd
DEBUG: execs autotools_do_install
DEBUG: execs sed
DEBUG: execs tr
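
To make the difference between the two lists easier to see, here is a small
sketch that treats each list as a set. The entries are copied from the DEBUG
output above; the variable names are only illustrative.

```python
# execs returned by the bad cache entry for 3df9018676de219bb3e46e88eea09c98
bad_execs = {
    "[", "rm", "install", "test", "sed", "rmdir", "bbfatal_log", "mv",
    "/home/build/openxt-compartments/build/tmp-glibc/work/core2-32-oe-linux/"
    "python-async/0.6.2-r0/recipe-sysroot-native/usr/bin/python-native/python",
    "find",
}

# execs expected for the core2-64 binutils do_install
good_execs = {
    "basename", "rm", "oe_multilib_header", "ln", "install", "echo", "cd",
    "autotools_do_install", "sed", "tr",
}

missing = good_execs - bad_execs      # wanted by binutils, absent from the cache entry
unexpected = bad_execs - good_execs   # present in the cache entry, not used by binutils
common = good_execs & bad_execs       # shared by both lists

print("missing:   ", sorted(missing))
print("unexpected:", sorted(unexpected))
print("common:    ", sorted(common))
```

Only rm, install and sed are shared. autotools_do_install is among the
missing entries, which matches it being absent from the generated
run.do_install.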

Is it an md5 collision?  I don't know - I don't have a second
colliding input for 3df9018676de219bb3e46e88eea09c98.

Any hashing can potentially have collisions.  A longer and stronger
algorithm reduces the chances, but there is no absolute fix.  Without
comparing the original inputs, you can't know if two inputs collided.
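
As a rough illustration of how much a longer hash reduces the chances, here
is a sketch of the usual birthday-bound approximation. It assumes hash values
are uniformly random, which is exactly the assumption questioned earlier in
this thread for near-identical inputs, so treat it as a baseline rather than
a statement about our data. The function name is hypothetical.

```python
import math


def collision_probability(n_hashes: int, hash_bits: int) -> float:
    """Approximate P(at least one collision) among n random hashes.

    Uses the birthday bound: p ~= 1 - exp(-n * (n - 1) / (2 * N)),
    where N = 2 ** hash_bits is the number of possible hash values.
    """
    space = 2 ** hash_bits
    exponent = n_hashes * (n_hashes - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


n = 26 * 10**12  # 26 trillion hashes, the figure from the article quoted above
p_128 = collision_probability(n, 128)  # md5-sized output
p_256 = collision_probability(n, 256)  # sha256-sized output
print(f"128-bit: {p_128:.3e}")
print(f"256-bit: {p_256:.3e}")
```

Under that assumption the 128-bit figure comes out at roughly 1e-12, and the
256-bit figure is many orders of magnitude smaller, but neither is zero.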

This openxt 8 build is based on pyro, fyi.

Regards,
Jason
-------------- next part --------------
A non-text attachment was scrubbed...
Name: binutils_do_install-3df9018676de219bb3e46e88eea09c98
Type: application/octet-stream
Size: 1704 bytes
Desc: not available
URL: <http://lists.openembedded.org/pipermail/openembedded-core/attachments/20181218/8e2c7dda/attachment-0001.obj>