Refactor query system to maintain a global job id counter #93741
Conversation
r? @oli-obk (rust-highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue Expecting this to potentially cause some noise just because this code is quite hot, but likely to be largely neutral.
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit d4a783abdfc9cad833f87532e231523117194764 with merge 5ebf14f5a1b120efc229c7b3b2e2571d82dc7496...
☀️ Try build successful - checks-actions
Queued 5ebf14f5a1b120efc229c7b3b2e2571d82dc7496 with parent c5e4148, future comparison URL.
Finished benchmarking commit (5ebf14f5a1b120efc229c7b3b2e2571d82dc7496): comparison url. Summary: This benchmark run shows 3 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Similar to #93724 (comment), the regression seems real but is likely optimizer noise we can't really do much about.
For the moment, the parallel compiler efforts have pretty much stalled. |
📌 Commit d4a783abdfc9cad833f87532e231523117194764 has been approved by |
@bors r- I was thinking about this and realized we should keep the checked_add to ensure uniqueness, so will just amend this and r=you with that most likely.
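A minimal sketch of the `checked_add` point above (hypothetical names, not the actual rustc code): incrementing the id counter with `checked_add` means an overflow panics instead of silently wrapping, which would hand out a duplicate job id.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the query system's job id counter.
struct JobIdCounter(Cell<u64>);

impl JobIdCounter {
    fn next(&self) -> u64 {
        let id = self.0.get();
        // Panic on overflow rather than wrapping back to 0 and
        // reissuing an id that may still be in use.
        self.0
            .set(id.checked_add(1).expect("job id counter overflowed"));
        id
    }
}

fn main() {
    let counter = JobIdCounter(Cell::new(0));
    assert_eq!(counter.next(), 0);
    assert_eq!(counter.next(), 1);
}
```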
This replaces the per-shard counters with a single global counter, simplifying the JobId struct down to just a u64 and removing the need to pipe a DepKind generic through a bunch of code. The performance implications on non-parallel compilers are likely minimal (this switches to `Cell<u64>` as the backing storage over a `u64`, but the latter was already inside a `RefCell`, so it's not really a significant divergence). On parallel compilers, the cost of a single global u64 counter may be more significant: it adds a serialization point in theory. On the other hand, we can imagine changing the counter to have a thread-local component if it becomes worrisome, or some similar structure. The new design is sufficiently simpler that it warrants the potential for slight changes down the line if/when we get parallel compilation to be more of a default. A u64 counter, instead of u32 (the old per-shard width), is chosen to avoid possibly overflowing it and causing problems; it is effectively impossible that we would overflow a u64 counter in this context.
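The "thread-local component" idea mentioned above could look roughly like the following sketch (an assumption for illustration, not code from this PR): each thread reserves a block of ids from one global atomic, so the shared counter is only touched once per block rather than once per id.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

// Size of the id range each thread reserves at a time (arbitrary choice).
const BLOCK: u64 = 1024;

// Single global counter handing out block start offsets.
static NEXT_BLOCK_START: AtomicU64 = AtomicU64::new(0);

thread_local! {
    // (next id to hand out, end of the reserved range), initially empty.
    static LOCAL_RANGE: Cell<(u64, u64)> = Cell::new((0, 0));
}

fn next_job_id() -> u64 {
    LOCAL_RANGE.with(|range| {
        let (next, limit) = range.get();
        if next < limit {
            // Fast path: no shared-memory traffic at all.
            range.set((next + 1, limit));
            next
        } else {
            // Slow path: one contended atomic op per BLOCK ids.
            let start = NEXT_BLOCK_START.fetch_add(BLOCK, Ordering::Relaxed);
            range.set((start + 1, start + BLOCK));
            start
        }
    })
}

fn main() {
    let a = next_job_id();
    let b = next_job_id();
    assert_ne!(a, b);
}
```

Ids are still globally unique because the blocks never overlap, though they are no longer handed out in globally sequential order.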
Compare: d4a783a to e240783
@bors r=cjgillot Replaced the u32 with a u64 -- even simpler, and avoids complicated increments to the atomic in a parallel context.
📌 Commit e240783 has been approved by |
☀️ Test successful - checks-actions |
Finished benchmarking commit (e7aca89): comparison url. Summary: This benchmark run did not return any relevant results. 28 results were found to be statistically significant but too small to be relevant. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression |