[bitbake-devel] [PATCH 1/2] bb/utils.py: add iterate_chunks and hash_file helpers

Rasmus Villemoes rasmus.villemoes at prevas.dk
Tue Jul 10 14:06:11 UTC 2018


The current (md5,sha1,sha256)_file functions are somewhat inefficient in
that they loop over the lines of the given file. In a pathological case,
a huge binary file might have no \n characters at all, causing a lot of
realloc'ing on the way to providing the caller with the file as one big
"line".

For random binary data (and compressed tarballs are effectively
that) there's a \n roughly every 256 bytes, and for text files even more
often, but splitting the input buffer into lines is a waste of time, and
all hash methods work better with nicely aligned chunks.

So introduce two helpers: One that will iterate over a given file in
chunks, another that takes a hasher object and calls m.update() on the
chunks.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk>
---
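
Note (not part of the commit message): a minimal, self-contained sketch
of how the two helpers might be used. The helper bodies mirror the ones
added below; md5_file_chunked and demo are hypothetical names used only
for illustration, and this is not necessarily how the existing
md5_file/sha1_file/sha256_file functions end up being converted.

```python
import hashlib
import os
import tempfile


def iterate_chunks(filename, chunk_size=32768):
    """Yield the contents of filename in chunks of up to chunk_size bytes."""
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def hash_file(hasher, filename):
    """Feed the contents of filename to hasher and return the hasher."""
    for chunk in iterate_chunks(filename):
        hasher.update(chunk)
    return hasher


def md5_file_chunked(filename):
    # Hypothetical wrapper: one possible way to build an md5 helper on
    # top of hash_file.
    return hash_file(hashlib.md5(), filename).hexdigest()


def demo():
    # 1 MiB without a single newline: iterating over lines would return
    # the whole file as one "line", while chunked reads stay at 32 KiB.
    data = b"\0" * (1 << 20)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        chunk_count = len(list(iterate_chunks(tmp.name)))
        digest = md5_file_chunked(tmp.name)
    finally:
        os.unlink(tmp.name)
    # The chunked digest must match hashing the data in one go.
    return chunk_count, digest == hashlib.md5(data).hexdigest()
```

Because hash_file returns the hasher rather than a digest, the caller
can pick hexdigest() or digest(), and any object with an update()
method (for example hashlib.sha256()) can be passed in.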
 lib/bb/utils.py | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index 378e699e..cb7a6fa2 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -520,6 +520,28 @@ def unlockfile(lf):
     fcntl.flock(lf.fileno(), fcntl.LOCK_UN)
     lf.close()
 
+def iterate_chunks(filename, chunk_size = 32768):
+    """
+    Return an iterator that yields the contents of filename in chunks
+    of size (up to) chunk_size.
+    """
+    with open(filename, "rb") as f:
+        while True:
+            chunk = f.read(chunk_size)
+            if chunk:
+                yield chunk
+            else:
+                return
+
+def hash_file(hasher, filename):
+    """
+    Update the hasher object with the contents of filename, returning
+    the hasher object.
+    """
+    for chunk in iterate_chunks(filename):
+        hasher.update(chunk)
+    return hasher
+
 def md5_file(filename):
     """
     Return the hex string representation of the MD5 checksum of filename.
-- 
2.16.4



