[oe] Fixing ipkg-make-index slowness

Paul Sokolovsky pmiscml at gmail.com
Fri Jan 5 07:33:34 UTC 2007


Hello,

      Some time ago I bragged on IRC that I had sped up ipkg-make-index
several times over, and recently the question of ipkg-make-index
slowness came up on the ML too. So I dug out my old patches, wanting to
submit them, but of course it turned out not to be that easy. I've
identified two causes of slowness, each discussed separately below.

      All benchmarking was done on an ipk repo of 5886 files and 245MB
total size, by running "bitbake package-index".


1. md5sum thrashing

In the summer, RP introduced a patch for i-m-i, ironically called
index_speedup.patch:
http://www.openembedded.org/bonsai/view/rev/17477/
It does the following: if a Packages file already exists, i-m-i
takes metadata from it instead of parsing the ipk's themselves. Before
this patch, such a cached entry was used simply when the ipk file's name
matched the filename recorded in Packages. RP added a check for the
file size, and also for the md5sum of the ipk's content. That means
that if you have a Packages file and want to index a few new ipk's,
i-m-i will happily thrash over every byte of the entire package repo
you have (like 26Gb). On my repo, running "bitbake package-index" with
an already existing and up-to-date Packages file led to:

i-m-i/md5sum
real    2m1.219s

i-m-i/no-md5sum
real    0m53.294s


Richard, what were the reasons for such conservative file matching?
Filename matching should be quite enough: by OE convention, any
package source change that leads to a change in the package metadata
must bump the recipe's PR, and that in turn changes the package
filename. Those who don't follow the PR update convention are either
asking for trouble, and no checks can really help them, or know what
they are doing (e.g. they bother to rebuild Packages from scratch). In
this regard, checking the ipk size is a great convenience for the
adventurous, because it is highly probable that an update of any
package metadata will change the file size due to compression. In
other words, I propose removing the md5sum check.
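
To make the proposal concrete, here is a minimal sketch of the
name-plus-size test, in modern Python rather than whatever i-m-i
actually does. The function name and the dict layout of a cached
Packages entry ("Filename" and "Size" keys) are my assumptions for
illustration, not the real i-m-i code:

```python
import os


def can_reuse_cached_entry(ipk_path, cached_entry):
    """Decide whether a cached Packages entry may stand in for an ipk.

    cached_entry is a hypothetical dict holding the "Filename" and
    "Size" fields of an existing Packages stanza, or None if the
    package is not in the old index at all.
    """
    if cached_entry is None:
        return False
    # Filename check: a PR bump changes the file name, so stale
    # metadata for a rebuilt package is never picked up.
    if os.path.basename(ipk_path) != os.path.basename(cached_entry["Filename"]):
        return False
    # Size check: costs one stat() call, no reading of file contents.
    return os.path.getsize(ipk_path) == int(cached_entry["Size"])
```

Only when this returns False would the ipk need to be opened and
parsed; no md5sum over the file contents is computed for cache hits.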


2. Unix process thrashing

OK, that was the cause of the slowdown with an already made Packages
file. Now, the major annoyance is (re)creating it, or adding a large
number of packages. This is due to ar, tar, and gzip being spawned for
each ipk. Multiply the Unix process-handling overhead by thousands of
files, and we get what we have.

So I just took the tarfile module from Python 2.3+, wrote an arfile
module (there is none in Python, and I even failed to google one), and
made them work nested, one on top of the other. Results:

i-m-i/spawn
real    14m30.239s

i-m-i/tarfile
real    5m21.950s

(Btw, I swear I was getting a 6-7x speedup when I initially tried it
on a Familiar buildtree.)

There's a small regression though: with this change, only deb-style
ipk's are supported. This shouldn't be an issue, as OE (and current
ipkg) generates exactly such ipk's. And I envy the people who used to
know and still remember what the other ipk format is ;-).
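
For those who want to see the idea without opening the patch, below is
a rough sketch of reading the control file from a deb-style ipk
entirely in-process. It is written for modern Python 3 and is not the
code from the patch; the function names are made up for illustration.
It assumes the common ar layout (8-byte "!<arch>" magic, 60-byte member
headers, data padded to an even length) and a member named
control.tar.gz:

```python
import io
import tarfile

AR_MAGIC = b"!<arch>\n"


def iter_ar_members(fileobj):
    """Yield (name, data) for each member of an ar archive."""
    if fileobj.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = fileobj.read(60)
        if len(header) < 60:
            return  # end of archive
        if header[58:60] != b"`\n":
            raise ValueError("bad ar member header")
        name = header[0:16].decode("ascii").rstrip()
        if name.endswith("/"):
            name = name[:-1]  # GNU ar terminates names with "/"
        size = int(header[48:58].decode("ascii").strip())
        data = fileobj.read(size)
        if size % 2:
            fileobj.read(1)  # member data is padded to an even offset
        yield name, data


def read_control(ipk_path):
    """Return the text of the control file of a deb-style ipk."""
    with open(ipk_path, "rb") as f:
        for name, data in iter_ar_members(f):
            if name != "control.tar.gz":
                continue
            # tarfile reads straight from the in-memory ar member,
            # so no ar, tar or gzip process is spawned.
            with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
                for member in tar:
                    if member.name in ("control", "./control"):
                        return tar.extractfile(member).read().decode("utf-8")
    raise ValueError("no control file found in %s" % ipk_path)
```

Calling read_control() on each ipk in a loop replaces three process
spawns per package with plain file reads, which is where the speedup
comes from.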

The patch is posted as:
http://bugs.openembedded.org/show_bug.cgi?id=1751


I hope that this analysis and the proposed changes/patches will be of
use to someone.

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com




