[oe] checksums situation

Fri Feb 13 19:35:24 UTC 2009

Hello,

On Fri, Feb 13, 2009 at 7:37 PM, Otavio Salvador <otavio at debian.org> wrote:
> Ihar Hrachyshka <ihar.hrachyshka at gmail.com> writes:
>
>> On Fri, Feb 13, 2009 at 7:08 PM, Otavio Salvador <otavio at debian.org> wrote:
>>> Marcin Juszkiewicz <openembedded at haerwu.biz> writes:
>>>
>>>>
>>>> What do you think? Which way we should go? Do you have other ideas?
>>> <...>
>>>

I have thought about this (and our current fetch and checksum) a lot
in the past.

>> Keep the checksums.ini file, but change its layout a bit.

In all cases, we need to REMOVE the URL of a package from the key, as
we are ONLY interested in its contents.

So, solution 1:

>> Make the key of a package: <filename>.

However, I would opt for solution 2:

>> Make the key of a source package: <filename, sha1sum, md5sum>.

Yes, that's right. The key of a package is its filename plus its contents.
The filename, because we humans identify a package by its name.
The dual checksum, because it lets us guarantee the desired contents.

The next step is to adapt the fetcher to:

- find URLs for a package given any of its (sub)keys.
- check the package against its dual sums.
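
As a rough illustration of the dual-sum check (a sketch only, in Python
rather than the C of my proof-of-concept; the function names dual_sums
and verify are made up for this example):

```python
import hashlib


def dual_sums(data):
    """Return the (md5sum, sha1sum) hex digests of a file's contents."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


def verify(data, md5sum, sha1sum):
    """Accept the contents only if BOTH checksums match."""
    return dual_sums(data) == (md5sum.lower(), sha1sum.lower())
```

Requiring both sums to match means a collision in one hash alone is
not enough to slip in different contents.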

I have written a working proof-of-concept of a
package-fetcher/checker/url-cache/... in C, using sqlite instead of
checksums.ini. It is public here:

http://www.sidebranch.com/leon/witpa_20080730.tar.bz2

It can still import an existing checksums.ini, though.

To summarize: remove URL from the package key.

The .bb names the upstream URL and filename.

The filename ALONE is used as the key into the file database.
The file database caches URLs for the package, and has its md5sum and sha1sum.
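
To make the idea concrete, here is a minimal sketch of such a file
database using Python's sqlite3 module. The table and column names
(files, urls) and the helpers open_db, add_file and find_urls are
assumptions for this example, not the schema of my proof-of-concept:

```python
import sqlite3

# Hypothetical layout: one row per file, plus any number of cached URLs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    filename TEXT PRIMARY KEY,
    md5sum   TEXT NOT NULL,
    sha1sum  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS urls (
    filename TEXT NOT NULL REFERENCES files(filename),
    url      TEXT NOT NULL,
    UNIQUE (filename, url)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the file database."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_file(db, filename, md5sum, sha1sum, url=None):
    """Record a file and, optionally, one URL where it was seen."""
    db.execute("INSERT OR IGNORE INTO files VALUES (?, ?, ?)",
               (filename, md5sum, sha1sum))
    if url is not None:
        db.execute("INSERT OR IGNORE INTO urls VALUES (?, ?)",
                   (filename, url))
    db.commit()


def find_urls(db, key):
    """Return cached URLs; key may be a filename, md5sum or sha1sum."""
    rows = db.execute(
        "SELECT u.url FROM urls u JOIN files f ON f.filename = u.filename "
        "WHERE ? IN (f.filename, f.md5sum, f.sha1sum) ORDER BY u.rowid",
        (key,))
    return [row[0] for row in rows]
```

Note that the URL appears only as cached data, never as part of the key.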

The fetcher tries the cached URLs first; if none of them works, it can
fall back to searching Google for the file by either filename or checksum.
Either way, it can verify the result against the checksums.
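
The fetch loop could look roughly like this. This is a sketch under
assumptions: fetch_verified is a made-up name, and the download
argument is a hypothetical callable that takes a URL and returns the
bytes (or raises OSError). The Google fallback is not implemented
here; it would only supply more candidate URLs to the same loop.

```python
import hashlib


def fetch_verified(urls, md5sum, sha1sum, download):
    """Try each candidate URL in order; return (url, data) for the first
    download whose md5sum AND sha1sum both match, or None if none does."""
    for url in urls:
        try:
            data = download(url)
        except OSError:
            continue  # dead mirror or network error: try the next URL
        if (hashlib.md5(data).hexdigest() == md5sum
                and hashlib.sha1(data).hexdigest() == sha1sum):
            return url, data
        # Wrong contents behind this URL: ignore it and keep looking.
    return None
```

For real use, download could be something like
lambda url: urllib.request.urlopen(url).read().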

Regards,
-- 
Leon