-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accelerate GVN a little #126991
Accelerate GVN a little #126991
Conversation
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Accelerate GVN a little This PR addresses a few inefficiencies I've seen in callgrind profiles. Commits are independent. Only the first commit introduces a change in behaviour: we stop substituting some constant pointers. But we keep propagating their contents that have no provenance, so we don't lose much. r? `@saethlin`
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (dfaffed): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -0.6%, secondary 2.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 2.4%, secondary 2.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%, secondary -0.4%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 695.46s -> 692.615s (-0.41%) |
@@ -264,21 +265,29 @@ struct VnState<'body, 'tcx> { | |||
impl<'body, 'tcx> VnState<'body, 'tcx> { | |||
fn new( | |||
tcx: TyCtxt<'tcx>, | |||
body: &Body<'tcx>, | |||
param_env: ty::ParamEnv<'tcx>, | |||
ssa: &'body SsaLocals, | |||
dominators: &'body Dominators<BasicBlock>, | |||
local_decls: &'body LocalDecls<'tcx>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think you could get this from the passed-in body
now, rather than needing the parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they have different lifetimes. The caller will want to mutate part of the body, so we can only keep a reference to the local decls.
Hm. I think the behavior change here is more significant to compile time than any of the optimizations. Is that expected? I ran this to build the standard library and dump MIR with each of the toolchains in the perf run
Then did a
Almost all the net deletion is due to the MIR dumps having less const allocations. That's surprising to me. I can tell from the commits that this is caused by normalizing less... but why does this even matter? If these consts can just be deleted from the MIR, shouldn't they have been deduplicated in the first place by some other mechanism? |
+ _3 = const {ALLOC0<imm>: &[u32; 3]}; | ||
+ _2 = const {ALLOC0<imm>: &[u32; 3]} as &[u32] (PointerCoercion(Unsize)); | ||
+ _3 = _9; | ||
+ _2 = _9 as &[u32] (PointerCoercion(Unsize)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saethlin this is this kind of pattern.
The constant is still there, named main::promoted[0]
. It's "evaluated" form const {ALLOC0<imm>: &[u32; 3]}
does not appear in the MIR dump, so the dump does not print ALLOC0
. This does not mean that ALLOC0
was not evaluated, just that it's not put in MIR and not printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are you saying there consts that get their allocations dumped, and some consts that don't? That seems like an undesirable property of MIR dumps, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, Ideally promoteds would just be regular statics, but lazily created in the MIR opt pipeline. This is blocked on various works around DefId
creation, most importantly #115613. I guess we could add a special case for showing promoteds like we show other allocations for now, but that seems orthogonal to this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Statics can't be generic so IMO they should be regular consts.
About perf:
I'm very much tempted to leave this as-is. |
@bors r+ |
Accelerate GVN a little This PR addresses a few inefficiencies I've seen in callgrind profiles. Commits are independent. Only the first commit introduces a change in behaviour: we stop substituting some constant pointers. But we keep propagating their contents that have no provenance, so we don't lose much. r? `@saethlin`
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (28a58f2): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary 4.1%, secondary 1.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (primary -0.1%, secondary -0.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 755.782s -> 757.595s (0.24%) |
Nice binary size reductions :) |
More improvements than regressions, and also nice binary size regressions, as Nick said. @rustbot label: +perf-regression-triaged |
This PR addresses a few inefficiencies I've seen in callgrind profiles.
Commits are independent.
Only the first commit introduces a change in behaviour: we stop substituting some constant pointers. But we keep propagating their contents that have no provenance, so we don't lose much.
r? @saethlin