[bitbake-devel] [PATCH 1/2] bb/utils.py: add iterate_chunks and hash_file helpers
Rasmus Villemoes
rasmus.villemoes at prevas.dk
Tue Jul 10 14:06:11 UTC 2018
The current (md5,sha1,sha256)_file functions are somewhat inefficient in
that they loop over the lines of the given file. In a pathological case,
a huge binary file might have no \n characters at all, causing a lot of
realloc'ing on the way to providing the caller with the file as one big
"line".
For random binary data (and compressed tarballs are effectively
that) there's a \n roughly every 256 bytes, and for text files even more
often, but splitting the input buffer into lines is a waste of time, and
all hash methods work better with nice aligned chunks.
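As a rough check of the "every 256 bytes" figure: each byte of uniformly
random data equals \n with probability 1/256, so the expected distance
between newlines is about 256 bytes. A minimal sketch, not part of the
patch; the function name average_newline_gap and its parameters are made
up for illustration, and seeded pseudo-random bytes stand in for a
compressed tarball:

```python
import random


def average_newline_gap(num_bytes=1 << 20, seed=0):
    """Return the mean number of bytes per newline in pseudo-random data.

    Hypothetical helper for illustration only; it is not part of bb.utils.
    """
    rng = random.Random(seed)
    # Uniform random bytes, standing in for compressed file contents.
    data = bytes(rng.getrandbits(8) for _ in range(num_bytes))
    newlines = data.count(b"\n")
    if newlines == 0:
        return float("inf")
    return num_bytes / newlines


if __name__ == "__main__":
    # Expected to land close to 256 for uniformly random input.
    print(average_newline_gap())
```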
So introduce two helpers: One that will iterate over a given file in
chunks, another that takes a hasher object and calls m.update() on the
chunks.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk>
---
lib/bb/utils.py | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
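Note (not part of the commit): a minimal sketch of how a caller might use
the two helpers. It assumes an md5_file-style wrapper would be written on
top of hash_file; that wrapper is not in this patch, and the names
md5_file_sketch and demo are hypothetical. The helper bodies are copied
from the diff so the snippet runs on its own.

```python
import hashlib
import os
import tempfile


def iterate_chunks(filename, chunk_size = 32768):
    """Yield the contents of filename in chunks of up to chunk_size bytes."""
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                yield chunk
            else:
                return


def hash_file(hasher, filename):
    """Feed the contents of filename to hasher and return hasher."""
    for chunk in iterate_chunks(filename):
        hasher.update(chunk)
    return hasher


def md5_file_sketch(filename):
    # Hypothetical wrapper: one way a checksum function could use hash_file.
    return hash_file(hashlib.md5(), filename).hexdigest()


def demo():
    """Hash a newline-free temp file; return (chunked digest, one-shot digest)."""
    data = b"\x00" * 100000  # the "one big line" case from the commit message
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        sizes = [len(c) for c in iterate_chunks(tmp.name)]
        return md5_file_sketch(tmp.name), hashlib.md5(data).hexdigest(), sizes
    finally:
        os.unlink(tmp.name)


if __name__ == "__main__":
    chunked, oneshot, sizes = demo()
    print(chunked == oneshot, sizes)
```

Because hash_file returns the hasher, the caller picks the digest form
(hexdigest() or digest()) and the algorithm, so the same helper can serve
md5, sha1 and sha256 alike.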
diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index 378e699e..cb7a6fa2 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -520,6 +520,28 @@ def unlockfile(lf):
     fcntl.flock(lf.fileno(), fcntl.LOCK_UN)
     lf.close()

+def iterate_chunks(filename, chunk_size = 32768):
+    """
+    Return an iterator that yields the contents of filename in chunks
+    of size (up to) chunk_size.
+    """
+    with open(filename, "rb") as f:
+        while True:
+            chunk = f.read(chunk_size)
+            if chunk:
+                yield chunk
+            else:
+                return
+
+def hash_file(hasher, filename):
+    """
+    Update the hasher object with the contents of filename, returning
+    the hasher object.
+    """
+    for chunk in iterate_chunks(filename):
+        hasher.update(chunk)
+    return hasher
+
 def md5_file(filename):
     """
     Return the hex string representation of the MD5 checksum of filename.
--
2.16.4