internal iteration for &mut I #100173

Closed
sarah-quinones wants to merge 2 commits

sarah-quinones

this pr implements internal iteration for &mut I when I: Sized. it additionally inlines some wrapper functions that were not previously inline, which seems to speed things up by a fair amount in some cases.

this led to up to 3x performance gains across the board for iter:: benches, with only a minor regression for iter::bench_filter_sum
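For readers unfamiliar with the term, here is a minimal sketch of what internal iteration through `&mut I` means. It is not the code from this PR: `fold_by_ref` and `sum_first_then_rest` are hypothetical names, and the sketch uses only stable APIs instead of specialization. The assumption it illustrates is that the default `fold` on `&mut I` drives the iterator one `next()` call at a time, whereas delegating to `I::try_fold` lets an adapter such as `Chain` run its own loop.

```rust
use std::convert::Infallible;

/// Hypothetical helper (not part of this PR): folds the remaining items of
/// `iter` through a mutable reference by delegating to `I::try_fold`, so the
/// underlying iterator can use its own internal-iteration loop.
fn fold_by_ref<I, B, F>(iter: &mut I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(B, I::Item) -> B,
{
    // `try_fold` takes `&mut self`, so it is callable through `&mut I`.
    // With `Infallible` as the error type the fold can never stop early.
    let result: Result<B, Infallible> = iter.try_fold(init, |acc, item| Ok(f(acc, item)));
    match result {
        Ok(acc) => acc,
        Err(never) => match never {},
    }
}

/// Hypothetical usage: consume two items via `by_ref()`, then fold the rest
/// of the same iterator without giving up ownership of it.
fn sum_first_then_rest() -> (u32, u32) {
    let mut iter = (1..=3u32).chain(4..=6);
    let first_two: u32 = iter.by_ref().take(2).sum();
    let rest = fold_by_ref(&mut iter, 0, |acc, x| acc + x);
    (first_two, rest)
}
```

Calling `sum_first_then_rest()` sums the first two items through `by_ref()` and the remaining four through the helper. Whether this kind of delegation is actually faster depends on the adapter and on inlining, which is what the benchmarks in this thread are about.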

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 5, 2022
@rustbot

This comment was marked as resolved.

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 5, 2022
@scottmcm
Member

scottmcm commented Aug 5, 2022

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 5, 2022
@bors
Contributor

bors commented Aug 5, 2022

⌛ Trying commit cb7f7ee with merge 3e685715a7ece536b2ab653e3433c06c00454bdf...

Member

@the8472 the8472 left a comment

This was tried several times before, the last attempt being #82185.
Perhaps the changes to function.rs make a difference this time.


impl<'a, I: DoubleEndedIterator + Sized> ByRefRFold for &'a mut I {
    #[inline]
    default fn try_rfold<B, F, R>(&mut self, init: B, f: F) -> R
Member

The more specific impl shouldn't have `default`.

@bors
Contributor

bors commented Aug 5, 2022

☀️ Try build successful - checks-actions
Build commit: 3e685715a7ece536b2ab653e3433c06c00454bdf (3e685715a7ece536b2ab653e3433c06c00454bdf)

@rust-timer
Collaborator

Queued 3e685715a7ece536b2ab653e3433c06c00454bdf with parent d77da9d, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (3e685715a7ece536b2ab653e3433c06c00454bdf): comparison url.

Instruction count

  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results

| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions 😿 (primary) | 0.8% | 25.3% | 66 |
| Regressions 😿 (secondary) | 0.6% | 1.9% | 32 |
| Improvements 🎉 (primary) | -0.5% | -1.5% | 14 |
| Improvements 🎉 (secondary) | -0.6% | -1.4% | 22 |
| All 😿🎉 (primary) | 0.6% | 25.3% | 80 |

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: mixed results

| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 4.0% | 4.0% | 1 |
| Improvements 🎉 (primary) | -2.3% | -2.3% | 1 |
| Improvements 🎉 (secondary) | -2.5% | -2.5% | 1 |
| All 😿🎉 (primary) | -2.3% | -2.3% | 1 |

Cycles

Results
  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: 😿 relevant regressions found

| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions 😿 (primary) | 14.7% | 37.5% | 3 |
| Regressions 😿 (secondary) | 2.8% | 3.0% | 2 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | 14.7% | 37.5% | 3 |

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Aug 5, 2022
@scottmcm
Member

scottmcm commented Aug 5, 2022

Wow, this totally changes the pattern of LTO in clap:
image

@sarah-quinones
Author

well, these don't look like the most promising results ^^'

@compiler-errors
Member

Presumably this makes compilation slower, but the perf tests don't show the effect on the performance of the compiled code, right?

@sarah-quinones
Author

are there tests that do?

@the8472
Member

the8472 commented Aug 6, 2022

Other than the std benches (which aren't great) we don't have anything automated to assess runtime performance. In the rustc-perf suite check and doc builds are the closest since they don't codegen but they're probably not diverse enough.

You could try paring down the PR by splitting out some of the changes. E.g. some of the inlining in function.rs doesn't look relevant to iterators. You can also run rustc-perf locally and focus on that one benchmark, that should yield results more quickly (assuming you have a machine that can compile a stage1 rustc in a reasonable amount of time).

@sarah-quinones
Author

rustc-perf seems to take forever on my machine and i can't display the results after it's finished. so that doesn't seem like a good option for me :/

@the8472
Member

the8472 commented Aug 6, 2022

It can be set to run a subset of the benchmarks, e.g. the serde ones. https://github.com/rust-lang/rustc-perf/tree/master/collector#benchmarking-options
Running the site locally should work as long as it uses the same DB as generated by the collector.

@sarah-quinones
Author

thanks for the tips! i managed to get it working thanks to your help. it seems that the biggest culprit was inlining the ops::function wrappers.
but even without it i still get a 1-2% regression on deeply-nested-multi

@scottmcm
Member

I'm going to send this over to

r? @m-ou-se

because I think this is going to be as much a policy decision (about compile-vs-runtime) as it is about the code itself.

@rust-highfive rust-highfive assigned m-ou-se and unassigned scottmcm Aug 11, 2022
@the8472
Member

the8472 commented Aug 11, 2022

Some changes were reverted, let's get new perf results.

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 11, 2022
@bors
Contributor

bors commented Aug 11, 2022

⌛ Trying commit f6a3462 with merge c20ee6d211784a78b94e26d37cce4e66acea976a...

@bors
Contributor

bors commented Aug 11, 2022

☀️ Try build successful - checks-actions
Build commit: c20ee6d211784a78b94e26d37cce4e66acea976a (c20ee6d211784a78b94e26d37cce4e66acea976a)

@rust-timer
Collaborator

Queued c20ee6d211784a78b94e26d37cce4e66acea976a with parent aeb5067, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (c20ee6d211784a78b94e26d37cce4e66acea976a): comparison url.

Instruction count

  • Primary benchmarks: mixed results
  • Secondary benchmarks: ❌ relevant regressions found

| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | 0.3% | 14 |
| Regressions ❌ (secondary) | 0.7% | 2.0% | 16 |
| Improvements ✅ (primary) | -0.4% | -0.7% | 7 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.0% | -0.7% | 21 |

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: mixed results

| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.4% | 4.5% | 8 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.4% | -4.2% | 3 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results
  • Primary benchmarks: ✅ relevant improvement found
  • Secondary benchmarks: ✅ relevant improvement found

| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.3% | -2.3% | 1 |
| Improvements ✅ (secondary) | -4.1% | -4.1% | 1 |
| All ❌✅ (primary) | -2.3% | -2.3% | 1 |

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 11, 2022
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 8, 2022
@m-ou-se
Member

m-ou-se commented Dec 30, 2022

r? @the8472

@rustbot rustbot assigned the8472 and unassigned m-ou-se Dec 30, 2022
@the8472
Member

the8472 commented Dec 30, 2022

The compile-time perf numbers are slightly negative, but less so than the previous attempt to do this.

But we need some runtime benchmark numbers to verify that it brings the expected benefits. There are some core::iter benchmarks that I'd expect to show some speedup.

@rustbot author
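
The kind of runtime check requested above could be sketched roughly as follows. This is an illustration only, not one of the existing `core::iter` benchmarks: the function names (`sum_external`, `sum_internal`, `timed`, `compare`) and the chained, filtered range used as a workload are assumptions made for the example. A single wall-clock timing like this is noisy, so any real numbers should come from a proper benchmark harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// External iteration: drive the iterator one `next()` call at a time.
fn sum_external<I: Iterator<Item = u64>>(iter: &mut I) -> u64 {
    let mut total = 0u64;
    while let Some(x) = iter.next() {
        total = total.wrapping_add(x);
    }
    total
}

/// Internal iteration: let the iterator run its own loop via `try_fold`,
/// which is callable through `&mut I` because it takes `&mut self`.
fn sum_internal<I: Iterator<Item = u64>>(iter: &mut I) -> u64 {
    iter.try_fold(0u64, |acc, x| Some(acc.wrapping_add(x)))
        .expect("the closure never returns None")
}

/// Runs `f` once and returns its result together with the elapsed time.
fn timed(f: impl FnOnce() -> u64) -> (u64, Duration) {
    let start = Instant::now();
    let value = black_box(f());
    (value, start.elapsed())
}

/// Hypothetical workload: a chained, filtered range of `2 * n` numbers.
fn compare(n: u64) -> (u64, u64) {
    let make = || (0..n).chain(n..2 * n).filter(|x| x % 3 != 0);
    let (external, t_ext) = timed(|| sum_external(&mut make()));
    let (internal, t_int) = timed(|| sum_internal(&mut make()));
    println!("external: {t_ext:?}, internal: {t_int:?}");
    (external, internal)
}
```

Both functions must return the same sum; only the timings are expected to differ, and by how much will depend on optimization level and on how well the adapters inline.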

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 30, 2022
@Dylan-DPC
Member

@sarah-ek any updates on this?

@Dylan-DPC
Member

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

@Dylan-DPC Dylan-DPC closed this May 16, 2023
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 16, 2023
Labels
perf-regression Performance regression. S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-libs Relevant to the library team, which will review and decide on the PR/issue.