Verifiably reproducible build artefacts #1269
Comments
Ensuring
|
Note: I tried this on macOS and Fedora 33. I don't think the sdist being non-deterministic is host tool related. |
tar isn't deterministic by default, do they still differ when generated with |
Quite right, thanks for the link! Our sdist tarballs are generated by setuptools, which (so far as I could tell in the relatively brief time I spent looking today) has no option to specify mtime. It's possible we just have to brute force this; unpack and re-pack the tarball with --mtime, use I intend to spend some more time on this in the next couple of weeks, unless someone gets to it first. |
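The "unpack and re-pack" idea above can be sketched with the standard library alone. This is only an illustration, not what setuptools or this project does: the function name `repack_sdist` and the choice to zero out owner fields and sort members by name are assumptions made for the example.

```python
import gzip
import tarfile


def repack_sdist(src_path, dst_path, mtime=0):
    """Rewrite a .tar.gz so its bytes depend only on file names and contents.

    Hypothetical helper: normalizes mtime and ownership, sorts members,
    and pins the timestamp stored in the gzip header.
    """
    with tarfile.open(src_path, "r:gz") as src:
        # Sort members so archive order does not depend on the build host.
        members = sorted(src.getmembers(), key=lambda m: m.name)
        # filename="" keeps the output file name out of the gzip header;
        # mtime pins the gzip header timestamp.
        with open(dst_path, "wb") as raw, \
                gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=mtime) as gz, \
                tarfile.open(fileobj=gz, mode="w", format=tarfile.PAX_FORMAT) as dst:
            for member in members:
                member.mtime = mtime
                member.uid = member.gid = 0
                member.uname = member.gname = ""
                # Drop extended headers carried over from the source archive,
                # since they can hold the original timestamps.
                member.pax_headers = {}
                data = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, data)
```

In practice `mtime` would probably be taken from `SOURCE_DATE_EPOCH` rather than left at 0, so that the archive still carries a meaningful date.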
Hello! Dropping in to say it'd be nice if these techniques could be made available to the broader Python community somehow. |
There have been pieces of work done in multiple places to support this:
I think the right next step for having a reproducible sdist for tuf is to try and get the above changes accepted into setuptools. |
In #1161, @sechkova discovered flit
|
Updating the state since I started looking at where we are with this. The wheel build does seem reproducible, and I think this is how we want to do it:

    # Use latest commit date as epoch
    SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct) python3 -m build

The source tarball issue still persists. I have a home-grown diff solution though:

    import hashlib
    import sys
    import tarfile

    chunk_size = 100000

    tar = tarfile.open(sys.argv[1])
    content_hash = hashlib.sha256()
    for member in tar:
        # Only regular files contribute to the hash
        if not member.isfile():
            continue
        member_file = tar.extractfile(member)
        data = member_file.read(chunk_size)
        while data:
            content_hash.update(data)
            data = member_file.read(chunk_size)
    print(content_hash.hexdigest())

That prints a hash of the tarball's file contents, disregarding owner, group, mtime and other metadata. It does depend on file order. This is slightly too simplified (each file should be hashed on its own, and the filename should matter), but it is very close to a process that would work. |
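The refinement mentioned above (hash each file on its own and take the filename into account) might look roughly like the sketch below. The names `content_hashes` and `diff_tarballs` are made up for this example; this is not the script that ended up in the repository.

```python
import hashlib
import sys
import tarfile

CHUNK_SIZE = 100_000


def content_hashes(path):
    """Map each regular file name in the tarball to the SHA-256 of its contents."""
    hashes = {}
    with tarfile.open(path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            digest = hashlib.sha256()
            member_file = tar.extractfile(member)
            # Read in chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: member_file.read(CHUNK_SIZE), b""):
                digest.update(chunk)
            hashes[member.name] = digest.hexdigest()
    return hashes


def diff_tarballs(path_a, path_b):
    """Return sorted names of files that are missing from one side or differ."""
    a, b = content_hashes(path_a), content_hashes(path_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__" and len(sys.argv) == 3:
    for name in diff_tarballs(sys.argv[1], sys.argv[2]):
        print(name)
```

Because the result is keyed by filename, member order, owner, group and mtime no longer matter; an empty diff means both tarballs contain the same files with the same contents.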
As recommended by the
That's neat. We could write/maintain a little tool to diff tarballs based on their content hashes. Would be nice to help fix this for other Python projects too, but not something we should block python-tuf activities on. |
The goal of this work IMO should be a tool
As a first step we could add running the tool to the release instructions. |
Potential solution: #1896 (comment) |
So reproducible builds:
|
It works for tarball and wheel repro. Or did you not mention wheels because setting
Cool stuff! Maybe at a later point we can generate in-toto metadata for builds and use in-toto also to verify them. I'll add a comment to #529.
Not sure if we need to pin the build dependency. I think it would be enough to keep a record of the build environment, but there doesn't seem to be a canonical way to do that for Python projects yet. |
Yes, correct. I did not mean to imply that #1896 didn't handle this.
Yeah that's the other option... I just think it might be easiest to "document" it in pyproject.toml :) This could have some consequences I'm not seeing at the moment though |
Sure, that makes sense to pin
No need IMO |
I think I'll close this: the build is reproducible, and there is a script to check that. These are the main items here. The build environment should perhaps be pinned, or at least documented in the build artefacts, but I think we can handle that as future work after #1550... |
Description of issue or feature request:
We can give users of our release artefacts (tarballs and wheels) greater confidence in the integrity of the artefacts and our development processes if they can verify that the artefacts we produced correspond to the signed tagged source code for the release.
We can achieve this through implementing reproducible builds.
Current behavior:
Tarball (sdist) and wheels (bdist_wheel) generated for a release are not verifiably reproducible.
Expected behavior:
Tarball (sdist) and wheels (bdist_wheel) generated for a release are verifiably reproducible.