Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc produces indeterministic asm #57041

Closed
bmwiedemann opened this issue Dec 21, 2018 · 19 comments
Closed

rustc produces indeterministic asm #57041

bmwiedemann opened this issue Dec 21, 2018 · 19 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-reproducibility Area: Reproducible / deterministic builds C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@bmwiedemann
Copy link

bmwiedemann commented Dec 21, 2018

While working on reproducible builds for openSUSE, I found that
the svgcleaner and exa rust packages had variations in machine code.

For exa, I traced this down to variations from git2-rs that is embedded there.
From that I reduced the code as much as I could, so that now there are only 183 lines of rust left.

Steps to Reproduce:

git clone https://github.com/bmwiedemann/git2-rs
cd git2-rs
git checkout race
./build.sh
for i in $(seq 20) ; do ./build.sh ; done | sort | uniq -c

Actual Result:

  8 26e2151e7f3593339976a02365fb0608  lib.s
 12 ad32d7431b7c1edb363d1f3238a3d0f5  lib.s

Expected Result:
rustc should always produce the same asm output

--- lib.s	2018-12-21 15:18:39.671245106 +0100
+++ lib.s-20181221	2018-12-21 15:18:28.115357401 +0100
@@ -450,8 +450,9 @@
 	movaps	%xmm1, _ZN3lib5panic10LAST_ERROR7__getit5__KEY17h3776bf008a82988eE@DTPOFF+16(%rax)
 	movl	$1, %ecx
 	movq	%rcx, %xmm1
-	cmpq	$0, _ZN3lib5panic10LAST_ERROR7__getit5__KEY17h3776bf008a82988eE@DTPOFF(%rax)
+	movq	_ZN3lib5panic10LAST_ERROR7__getit5__KEY17h3776bf008a82988eE@DTPOFF(%rax), %rcx
 	movdqa	%xmm1, _ZN3lib5panic10LAST_ERROR7__getit5__KEY17h3776bf008a82988eE@DTPOFF(%rax)
+	testq	%rcx, %rcx
 	je	.LBB14_9
 	movq	%xmm0, %r14
 	testq	%r14, %r14

I have identified 6 places that can be dropped and make the output reproducible:

https://github.com/bmwiedemann/git2-rs/blob/race/src/panic.rs#L11
https://github.com/bmwiedemann/git2-rs/blob/race/src/lib.rs#L5
https://github.com/bmwiedemann/git2-rs/blob/race/src/lib.rs#L13 ++
https://github.com/bmwiedemann/git2-rs/blob/race/src/call.rs#L10 +4
https://github.com/bmwiedemann/git2-rs/blob/race/src/index.rs#L39 +8
https://github.com/bmwiedemann/git2-rs/blob/race/src/index.rs#L64 ++

This bug might be related to bug #50556, but that one is closed and this bug is still reproducible with rust-1.31

I checked that llvm-bc and llvm-ir output are reproducible even when asm is not.

@Centril Centril added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-codegen Area: Code generation labels Dec 21, 2018
@bmwiedemann
Copy link
Author

Spent some more time on this and did

for i in $(seq 20) ; do
  echo 10000 > /proc/sys/kernel/ns_last_pid
  taskset 1 setarch `arch` -R strace -s 4096 -o rustc$i.strace -f ./build.sh
done

This can create more reproducible results as long as the system is idle.
Switching firefox tabs created enough load for rustc to start varying results.
strace showed that the main rustc process spawns 4 threads, where thread 3 is short-lived and thread 4 is writing the lib.s

diffing strace outputs showed that the 'idle' result correlated with more, smaller mprotect calls and the 'busy' result correlated with fewer, larger mprotect calls.
There was also a difference in the 'brk' pattern in the main rustc process.

All this indicates that there is some raciness in the communication between the involved threads and this is where the non-determinism comes from.

rustc-strace-diff.txt

@jonas-schievink
Copy link
Contributor

Have you found any difference in the emitted LLVM IR?

@bmwiedemann
Copy link
Author

@jonas-schievink no, LLVM IR is always deterministic, as mentioned in the last line of the original report.

@bmwiedemann
Copy link
Author

Because of my very limited rust-foo I would appreciate help in further reducing the 183 lines of .rs files into an even smaller reproducer so that it becomes more visible what matters in there.

@jonas-schievink
Copy link
Contributor

@bmwiedemann In that case, can you reproduce this by running llc on the emitted IR? If so, this is an LLVM bug.

All this indicates that there is some raciness in the communication between the involved threads and this is where the non-determinism comes from.

Does it make a difference if you pass -Ccodegen-units=1 or -Ccodegen-units=2 to the compiler? It really shouldn't matter here though, since the difference seems to be in instruction selection.

(this does indeed look like the same problem as #50556)

@bmwiedemann
Copy link
Author

I changed my build.sh to have --emit=asm,llvm-ir,llvm-bc.
Then running llc on lib.ll or lib.bc` indeed produced varying lib.s files.

I also tried -Ccodegen-units=$N and 1 behaved pretty much like before but 2 produced 2 output files that did not seem to vary for some reason.

I'm on llvm-5,0,1 atm. llvm-6.0.1 is also bad, but llc from llvm-7.0.0 seems to be good.
So rebuilding our rust with libllvm7 should help solve this.
Or we find the relevant llvm commit and backport it to the stable branches.

@bmwiedemann
Copy link
Author

I now ran my autoreduce script on lib.ll and shrank it to 176 lines:
https://github.com/bmwiedemann/git2-rs/blob/race/lib.ll
dropping line 140, 129, 107, 104, 84, 82, 80, 78, 76, 74, 70, 68, 66, or 44 make the result reproducible.

@bmwiedemann
Copy link
Author

In llvm I ran a git bisect between release_60 and release_70 and this turned up this fixing commit:
llvm-mirror/llvm@8957ed6
unfortunately it is hard to cherry-pick, as it requires 4+ other commits

@memoryruins
Copy link
Contributor

cc #34902

@nikic
Copy link
Contributor

nikic commented Jan 18, 2019

Is there anything actionable to do here on the side of rustc? Per your investigation this issue has already been fixed in LLVM 7.0.

@sanxiyn
Copy link
Member

sanxiyn commented Jan 18, 2019

I think this can be closed, once someone checks the original reproduction with a nightly build.

@bmwiedemann
Copy link
Author

Found another patch that makes llc determistic with lib.ll from the full git-rs
llvm-mirror/llvm@b2c8f014
do you think, llvm guys could backport those to 6.x or is that a waste of time?

@nikic
Copy link
Contributor

nikic commented Feb 1, 2019

@bmwiedemann LLVM has short release cycles and LLVM 6 is long since dead. Not even LLVM 7 will receive further backports (there will only be one more release with a single change, that was not included in LLVM 7.0.1 because it is ABI breaking).

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 18, 2019
@jonas-schievink
Copy link
Contributor

We are now on LLVM 9, so this should be fixed since a long time. Not sure if it's effective to add a test for this though, so maybe we should just close?

@Mark-Simulacrum
Copy link
Member

I think it would be good to verify that this is indeed fixed, i.e., that we no longer encounter problems here. Once that's done, we can definitely close.

@mati865
Copy link
Contributor

mati865 commented Jul 19, 2019

I ran commands from issue description few times with changes to ignore all warnings and I think the output looks right:

    Finished release [optimized] target(s) in 25.77s
b9a7fc923e7a9fc167073bdc40d80072  lib.s
     20 b9a7fc923e7a9fc167073bdc40d80072  lib.s

Increasing number of runs from 20 to 2000 still yields deterministic assembly.

@bmwiedemann
Copy link
Author

bmwiedemann commented Jul 19, 2019

That example was minimized though to trigger one particular issue. I still experienced nondeterminism with some rust builds. E.g. in packages alacritty, bat, exa, rust, svgcleaner, tealdeer

I can debug these separately and file new issues.

@mati865
Copy link
Contributor

mati865 commented Jul 19, 2019

This specific issues seems to be resolved but there are other problems like #34902

Ideally each bug should have it's own issue.

@Mark-Simulacrum
Copy link
Member

Yep, closing this. Please file other issues if other nondeterminism arises.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-reproducibility Area: Reproducible / deterministic builds C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants