Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bit-for-bit deterministic / reproducible builds #34902

Closed
infinity0 opened this issue Jul 18, 2016 · 76 comments
Closed

Bit-for-bit deterministic / reproducible builds #34902

infinity0 opened this issue Jul 18, 2016 · 76 comments
Labels
A-reproducibility Area: Reproducible / deterministic builds C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@infinity0
Copy link
Contributor

It would be good if rustc could generate bit-for-bit reproducible results, even in the presence of minor system environment differences. Currently we have quite a large diff: e.g. see txt diff for 1.9.0 or perhaps txt diff for 1.10.0 a few days after I'm posting this. (You might want to "save link as" instead of displaying it directly in the browser.)

Much of the diff output is due to build-id differences, which can be ignored since they are caused by other deeper issues and will go away once these deeper issues are fixed. One example of a deeper issue is this:

│   │   │   ├── ./usr/lib/i386-linux-gnu/librustc_data_structures-e8edd0fd.so
│   │   │   │   ├── readelf --wide --file-header {}
│   │   │   │   │ @@ -6,15 +6,15 @@
│   │   │   │   │    OS/ABI:                            UNIX - System V
│   │   │   │   │    ABI Version:                       0
│   │   │   │   │    Type:                              DYN (Shared object file)
│   │   │   │   │    Machine:                           Intel 80386
│   │   │   │   │    Version:                           0x1
│   │   │   │   │    Entry point address:               0x4620
│   │   │   │   │    Start of program headers:          52 (bytes into file)
│   │   │   │   │ -  Start of section headers:          338784 (bytes into file)
│   │   │   │   │ +  Start of section headers:          338780 (bytes into file)
│   │   │   │   │    Flags:                             0x0
│   │   │   │   │    Size of this header:               52 (bytes)
│   │   │   │   │    Size of program headers:           32 (bytes)
│   │   │   │   │    Number of program headers:         8
│   │   │   │   │    Size of section headers:           40 (bytes)
│   │   │   │   │    Number of section headers:         30
│   │   │   │   │    Section header string table index: 29
│   │   │   │   ├── readelf --wide --program-header {}

Here are the system variations that might be causing these issues. I myself am not that familiar with ELF, but perhaps someone else here would know why the section header is 4 bytes later in the first vs second builds.

@steveklabnik
Copy link
Member

In general, being reproducible is something we're interested in; we try to tackle it bug by bug.

@eefriedman
Copy link
Contributor

eefriedman commented Jul 18, 2016

This probably has some obvious cause:

│   │   │   │ -<h4 id='method.unzip' class='method'><code>fn <a href='../../core/iter/trait.Iterator.html#method.unzip' class='fnname'>unzip</a>&lt;A, B, FromA, FromB&gt;(self) -&gt; (FromA, FromB) <span class='where'>where FromB: <a class='trait' href='../../core/default/trait.Default.html' title='core::default::Default'>Default</a> + <a class='trait' href='../../core/iter/trait.Extend.html' title='core::iter::Extend'>Extend</a>&lt;B&gt;, FromA: <a class='trait' href='../../core/default/trait.Default.html' title='core::default::Default'>Default</a> + <a class='trait' href='../../core/iter/trait.Extend.html' title='core::iter::Extend'>Extend</a>&lt;A&gt;, Self: <a class='trait' href='../../core/iter/trait.Iterator.html' title='core::iter::Iterator'>Iterator</a>&lt;Item=(A, B)&gt;</span></code></h4>
│   │   │   │ +<h4 id='method.unzip' class='method'><code>fn <a href='../../core/iter/trait.Iterator.html#method.unzip' class='fnname'>unzip</a>&lt;A, B, FromA, FromB&gt;(self) -&gt; (FromA, FromB) <span class='where'>where FromB: <a class='trait' href='../../core/default/trait.Default.html' title='core::default::Default'>Default</a> + <a class='trait' href='../../core/iter/trait.Extend.html' title='core::iter::Extend'>Extend</a>&lt;B&gt;, Self: <a class='trait' href='../../core/iter/trait.Iterator.html' title='core::iter::Iterator'>Iterator</a>&lt;Item=(A, B)&gt;, FromA: <a class='trait' href='../../core/default/trait.Default.html' title='core::default::Default'>Default</a> + <a class='trait' href='../../core/iter/trait.Extend.html' title='core::iter::Extend'>Extend</a>&lt;A&gt;</span></code></h4>

If someone wants to look at the code generation differences, it's probably best to start with libcore. The reproducible-builds.org diff isn't that useful for that because it doesn't recognize .rlib files as archives.

@infinity0
Copy link
Contributor Author

Thanks for the heads up, I just added AR support in diffoscope and hopefully the diff output will see that in the next few weeks, when the website updates.

@infinity0
Copy link
Contributor Author

Here's a diff of rust.metadata.bin from libcore.rlib, from 1.10.0. (2.0 MB, ~1700 lines, might make your browser slow.) Mostly ordering differences. @eddyb has suggested HashMap ordering as a culprit, and that would be replaced with FnvHashMap instead.

@eddyb
Copy link
Member

eddyb commented Aug 16, 2016

If it's a HashMap, I don't know which one. cc @rust-lang/compiler

@infinity0
Copy link
Contributor Author

I tried replacing HashMap with FnvHashMap in src/librustc_metadata/loader.rs but it didn't seem to help. Here is the Makefile i'm using to automate my rebuilds:

TRIPLET = x86_64-unknown-linux-gnu

all: libcore.diff

libcore.diff: rust.metadata.bin.1 rust.metadata.bin.2
    diffoscope --html-dir "$@" rust.metadata.bin.1 rust.metadata.bin.2 \
        --max-diff-block-lines 5000 \
        --max-diff-input-lines 10000000 \
        --max-report-size 204800000; true # diffoscope returns 1 for "there were diffs"

rust.metadata.bin.1 rust.metadata.bin.2: Makefile
    rm -f $(TRIPLET)/stage2/lib/rustlib/$(TRIPLET)/lib/libcore-*.rlib
    rm -f $(TRIPLET)/stage2/lib/rustlib/$(TRIPLET)/lib/stamp.core
    $(MAKE) $(TRIPLET)/stage2/lib/rustlib/$(TRIPLET)/lib/stamp.core
    ar x $(TRIPLET)/stage2/lib/rustlib/$(TRIPLET)/lib/libcore-*.rlib rust.metadata.bin
    mv rust.metadata.bin "$@"

.PHONY: all rust.metadata.bin.1 rust.metadata.bin.2

perhaps i'm Doing It Wrong.

@infinity0
Copy link
Contributor Author

As background, HashMap in rust is non-deterministic to protect against certain types of DoS attack. You can switch it to the deterministic FnvHashMap if you're sure your code will always be called in a safe manner. This ought to be true for rustc itself. (I notice some online "try-it-yourself" rust web services let me run "ls /" and other shell commands, so I could also exploit this HashMap ordering issue, but I also assume that they're clever enough to set a ulimit and/or containerise the thing.)

@jonas-schievink
Copy link
Contributor

jonas-schievink commented Aug 22, 2016

I wrote a small script to diff metadata (I couldn't really get the makefile to work).

It looks like even more is changing between two compiles using the current master (or nightly): The crate hash and/or disambiguator, which are stored right after the target triple. @infinity0 can you confirm?

EDIT: That might get fixed by #35854

@jonas-schievink
Copy link
Contributor

Okay, apparently replacing Hash{Map,Set} in resolve with FnvHash{Map,Set} got rid of the first (smaller) blocks of the diff. The big block at the bottom is still there. Will continue to randomly sed the compiler and report back ;)

@jonas-schievink
Copy link
Contributor

The remaining issue is that (some?) predicates are encoded in non-deterministic order. Maybe we need to do this for all bounds? Not sure what would be the correct way to do that (or why these are nondeterministic in the first place).

@michaelwoerister
Copy link
Member

Just for reference, there's this: #34805

@jonas-schievink
Copy link
Contributor

@michaelwoerister Thanks! Too tired to think about it currently. Will have a look tomorrow :)

@sanxiyn
Copy link
Member

sanxiyn commented Aug 23, 2016

#24473 is rustdoc subset of this.

eddyb added a commit to eddyb/rust that referenced this issue Aug 23, 2016
Use `FnvHashMap` in more places

* A step towards rust-lang#34902 (see my comments there)
* More stable error messages in some places related to crate loading
* Possible slight performance improvements since all `HashMap`s
  replaced had small keys where `FnvHashMap` should be faster
  (although I didn't measure)

(likewise for `HashSet` -> `FnvHashSet`)
bors added a commit that referenced this issue Aug 28, 2016
Steps towards reproducible builds

cc #34902

Running `make dist` twice will result in a rustc tarball where only `librustc_back.so`, `librustc_llvm.so` and `librustc_trans.so` differ. Building `libstd` and `libcore` twice with the same compiler and flags produces identical artifacts.

The third commit should close #24473
bors added a commit that referenced this issue Aug 28, 2016
Steps towards reproducible builds

cc #34902

Running `make dist` twice will result in a rustc tarball where only `librustc_back.so`, `librustc_llvm.so` and `librustc_trans.so` differ. Building `libstd` and `libcore` twice with the same compiler and flags produces identical artifacts.

The third commit should close #24473
@nikomatsakis
Copy link
Contributor

nikomatsakis commented Sep 2, 2016

Potentially relevant, from IRC:

<eddyb> nmatsakis, infinity0: fwiw, I just found another hash table. not random, but may cause issues around incremental recompilation. see rustc_metadata::encoder::encode_reachable (the reachable NodeSet)

though I think that it's not a big issue unless the hashtable is truly random.

@eddyb
Copy link
Member

eddyb commented Sep 2, 2016

Fix suggestion is changing that to be a BTreeSet<DefIndex>. Other things could also use btrees.

@infinity0
Copy link
Contributor Author

In case this helps to motivate anyone, we got a successful reproduction on i386 on Debian testing!

https://tests.reproducible-builds.org/debian/rb-pkg/testing/i386/rustc.html

This may or may not be "an accident", we'll have to see what future tests show. Also on this arch/platform we are fixing the build path (just to see how it does). For other arch/platforms we vary the build path, and haven't seen rustc reproduce there yet.

@michaelwoerister
Copy link
Member

At least when building with debuginfo, the build path will show up in the resulting binaries as the DW_AT_comp_dir attribute of the DW_TAG_compile_unit.

@codyps
Copy link
Contributor

codyps commented Dec 12, 2016

@michaelwoerister

In gcc (and clang), the -fdebug-prefix-map argument allows one to avoid this. Is a similar option for rustc available?

@michaelwoerister
Copy link
Member

@jmesmon No, we don't have an option like that. If you open an issue, I'd be happy to discuss it with the @rust-lang/tools team.

@tarcieri
Copy link
Contributor

tarcieri commented Dec 6, 2019

@jgalenson awesome! I'm working on a project for BFT consensus-driven proofs of reproducible builds, and that'd make an amazing test case: https://github.com/iqlusioninc/synchronicity

@jgalenson
Copy link

I was able to get reproducible builds of rustc with debuginfo by upgrading to a newer version of the compiler_builtins crate that contains rust-lang/compiler-builtins@ca423fe (so at least 0.1.20 should work, although I used 0.1.22). With that, I no longer needed my last remaining patch.

I also had to use a newer C++ compiler that supports the -ffile-prefix-map argument. gcc-8 does, but unfortunately using it did not work (it seems to embed source paths into files generated from assembly). However, since https://reviews.llvm.org/D49466 was merged, Clang supports that option now, so using a random nightly build of LLVM from a few days ago worked.

That's a convoluted setup, but @infinity0, can you reproduce it? Your diff seemed to contain some other differences mine did not, so some things might remain.

@tarcieri that seems pretty cool. Have you gotten it working yet?

@infinity0
Copy link
Contributor Author

infinity0 commented Dec 29, 2019

@jgalenson I'm already including that compiler-builtins -ffile-prefix-map patch in Debian's rustc and it doesn't work, though we're using GCC not LLVM. What exactly is this issue with "it seems to embed source paths into files generated from assembly" you are talking about? I don't remember seeing that last time I was working on GCC reproducibility a few years ago.

tests.r-b.org results for rustc 1.39 are out the results are pretty similar to mine above - unreproducible including build paths relating to the syn crate.

@infinity0
Copy link
Contributor Author

I should mention that the differences include more that just C++ compiler output, which is what you seem to be suggesting is the last remaining thing, from your previous comment. My own testing and tests.r-b.org's testing indicates otherwise. At least the syn crate is embedding build paths.

@jgalenson
Copy link

I investigated the gcc bug I mentioned. If I take a trivial .c file and compile it with -g and -ffile-prefix-map, I don't see source paths in the output. But if I use the same command line for a trivial assembly file, I do see source paths in the output. Those go away if I change -ffile-prefix-map to -fdebug-prefix-map. I see this with gcc 8.3 but not with recent LLVM builds. Can you reproduce that?

This comes up because compiler-builtins includes a few .S files. This miscompilation then causes a number of other crates to be non-reproducible, including rustc_{driver,llvm} and cargo. Those all go away one I use Clang instead of gcc. I don't, however, see a difference in syn.

The tests.r-b.org link you posted no longer works for me because it seems to fail to build. But if you can send me the config.toml you're using, I'll see if I can reproduce it myself. In addition, you can try using my config and adding in debuginfo-level{,-std,-tools} = 2 and using a recent Clang.

@comex
Copy link
Contributor

comex commented Jan 22, 2020

Filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93371 for the GCC bug. As a short-term workaround, compiler-builtins could pass both -ffile-prefix-map and -fdebug-prefix-map...

@infinity0
Copy link
Contributor Author

I didn't get around to looking into the above issues in more detail, but I just noticed that tests.r-b.org for rustc has been showing "reproducible" for 1.44.1 including under varying build paths (unstable, experimental)!

1.45.0 is in-progress and we should have results in just a few more days.

@sanxiyn
Copy link
Member

sanxiyn commented Aug 6, 2020

We should use individual issues tagged A-reproducibility going forward. Closing.

@sanxiyn sanxiyn closed this as completed Aug 6, 2020
@infinity0
Copy link
Contributor Author

OK, well results for 1.45.0 say "unreproducible" again, and the auto-generated diff did not complete after 2 hours. Shall I open an issue for CI integration tests on this? If I understand correctly there are concerns about it taking up too much resources, or something.

@sanxiyn
Copy link
Member

sanxiyn commented Aug 10, 2020

Yes, I think we should open a new issue for reproducible build CI. I also think it would be impractical for a while due to computational demand, but then I am not a member of infra team.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-reproducibility Area: Reproducible / deterministic builds C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests