-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Iterator
implementation for &mut impl Iterator + Sized
#111200
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
Previous / still open attempts to do the same: |
r? @the8472 |
Have you run the built-in microbenchmarks ( Meanwhile let's see how it affects compile-times. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 922cd41 with merge 87ffec05ceb5093beaab8a7340c10042c4ad211b... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (87ffec05ceb5093beaab8a7340c10042c4ad211b): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 653.436s -> 654.245s (0.12%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perf.rlo numbers are looking ok (finally!) but can you run the microbenchmarks? ./x bench core --test-args "iter::"
should run some some that test the difference between normal and by-ref iterators. Though I'm not sure if they exercise fold
.
I did try to do this at one point but I remember running into problems because all of these methods are by-value. Are you sure you're actually forwarding anything and not just running on the original |
There are many very good improvements (up to x2.5!), but surprisingly there are a few regressions. + iter::bench_cycle_take_sum 2,840,000 (+/- 56430) 2,420,000 (+/- 29910) -420,000 -14.79% x 1.17
+ iter::bench_enumerate_chain_ref_sum 6,430,000 (+/- 151070) 4,560,000 (+/- 70980) -1,870,000 -29.08% x 1.41
+ iter::bench_enumerate_ref_sum 2,850,000 (+/- 30300) 2,220,000 (+/- 28230) -630,000 -22.11% x 1.28
+ iter::bench_filter_chain_ref_count 4,520,000 (+/- 63620) 3,630,000 (+/- 56520) -890,000 -19.69% x 1.25
+ iter::bench_filter_chain_ref_sum 5,940,000 (+/- 89680) 3,930,000 (+/- 62200) -2,010,000 -33.84% x 1.51
+ iter::bench_filter_chain_sum 4,400,000 (+/- 156650) 3,870,000 (+/- 57980) -530,000 -12.05% x 1.14
+ iter::bench_filter_map_chain_ref_sum 9,880,000 (+/- 137620) 4,080,000 (+/- 168620) -5,800,000 -58.70% x 2.42
+ iter::bench_filter_map_ref_sum 3,100,000 (+/- 56680) 1,340,000 (+/- 23390) -1,760,000 -56.77% x 2.31
+ iter::bench_filter_ref_count 2,570,000 (+/- 49930) 1,310,000 (+/- 20390) -1,260,000 -49.03% x 1.96
+ iter::bench_filter_ref_sum 2,940,000 (+/- 32760) 1,390,000 (+/- 17400) -1,550,000 -52.72% x 2.12
+ iter::bench_filter_sum 1,470,000 (+/- 19750) 1,290,000 (+/- 171010) -180,000 -12.24% x 1.14
+ iter::bench_flat_map_chain_sum 17,180,000 (+/- 212360) 16,080,000 (+/- 157240) -1,100,000 -6.40% x 1.07
+ iter::bench_flat_map_ref_sum 1,870,000 (+/- 16210) 1,310,000 (+/- 18730) -560,000 -29.95% x 1.43
+ iter::bench_for_each_chain_ref_fold 4,010,000 (+/- 32240) 3,630,000 (+/- 37100) -380,000 -9.48% x 1.10
+ iter::bench_fuse_chain_ref_sum 4,400,000 (+/- 60220) 3,160,000 (+/- 41270) -1,240,000 -28.18% x 1.39
+ iter::bench_fuse_ref_sum 2,570,000 (+/- 67820) 1,300,000 (+/- 22790) -1,270,000 -49.42% x 1.98
+ iter::bench_inspect_chain_ref_sum 4,130,000 (+/- 38070) 3,150,000 (+/- 84390) -980,000 -23.73% x 1.31
+ iter::bench_inspect_chain_sum 3,840,000 (+/- 114580) 3,160,000 (+/- 44510) -680,000 -17.71% x 1.22
+ iter::bench_inspect_ref_sum 1,800,000 (+/- 32770) 1,300,000 (+/- 22700) -500,000 -27.78% x 1.38
+ iter::bench_lt 388,540 (+/- 8280) 290,920 (+/- 12680) -97,620 -25.12% x 1.34
+ iter::bench_partial_cmp 388,560 (+/- 20470) 290,940 (+/- 12430) -97,620 -25.12% x 1.34
+ iter::bench_peekable_chain_ref_sum 5,950,000 (+/- 108320) 3,880,000 (+/- 77500) -2,070,000 -34.79% x 1.53
+ iter::bench_peekable_ref_sum 3,340,000 (+/- 72060) 1,300,000 (+/- 9720) -2,040,000 -61.08% x 2.57
+ iter::bench_skip_chain_ref_sum 5,450,000 (+/- 69460) 3,150,000 (+/- 41290) -2,300,000 -42.20% x 1.73
+ iter::bench_skip_chain_sum 3,370,000 (+/- 40890) 3,140,000 (+/- 41420) -230,000 -6.82% x 1.07
+ iter::bench_skip_cycle_skip_zip_add_ref_sum 8,960,000 (+/- 194840) 7,930,000 (+/- 100570) -1,030,000 -11.50% x 1.13
+ iter::bench_skip_ref_sum 2,330,000 (+/- 28930) 1,290,000 (+/- 16200) -1,040,000 -44.64% x 1.81
+ iter::bench_skip_while_chain_ref_sum 8,070,000 (+/- 116360) 3,160,000 (+/- 20140) -4,910,000 -60.84% x 2.55
+ iter::bench_skip_while_chain_sum 3,580,000 (+/- 75270) 3,370,000 (+/- 101820) -210,000 -5.87% x 1.06
+ iter::bench_skip_while_ref_sum 2,850,000 (+/- 49380) 1,210,000 (+/- 101950) -1,640,000 -57.54% x 2.36
+ iter::bench_take_while_chain_ref_sum 2,580,000 (+/- 26740) 2,220,000 (+/- 33890) -360,000 -13.95% x 1.16
+ iter::bench_take_while_chain_sum 2,210,000 (+/- 31890) 1,930,000 (+/- 50100) -280,000 -12.67% x 1.15
- iter::bench_cycle_skip_take_ref_sum 2,330,000 (+/- 36760) 2,780,000 (+/- 49670) 450,000 19.31% x 0.84
- iter::bench_cycle_take_ref_sum 2,080,000 (+/- 23900) 2,440,000 (+/- 49420) 360,000 17.31% x 0.85
- iter::bench_cycle_take_skip_ref_sum 2,100,000 (+/- 36590) 2,510,000 (+/- 36290) 410,000 19.52% x 0.84
- iter::bench_filter_chain_count 3,780,000 (+/- 62680) 4,100,000 (+/- 68200) 320,000 8.47% x 0.92
- iter::bench_filter_map_chain_sum 3,410,000 (+/- 46900) 3,980,000 (+/- 166240) 570,000 16.72% x 0.86
- iter::bench_for_each_chain_loop 4,020,000 (+/- 73930) 4,670,000 (+/- 45530) 650,000 16.17% x 0.86
- iter::bench_fuse_chain_sum 3,370,000 (+/- 78770) 3,770,000 (+/- 122160) 400,000 11.87% x 0.89
- iter::bench_skip_cycle_skip_zip_add_sum 7,520,000 (+/- 156080) 7,850,000 (+/- 163310) 330,000 4.39% x 0.96
- iter::bench_zip_add 2,400 (+/- 79) 2,880 (+/- 113) 480 20.00% x 0.83 Using + iter::bench_cycle_take_skip_ref_sum 763,040 (+/- 21810) 723,230 (+/- 31150) -39,810 -5.22% x 1.06
+ iter::bench_enumerate_chain_ref_sum 2,490,000 (+/- 41610) 1,140,000 (+/- 17630) -1,350,000 -54.22% x 2.18
+ iter::bench_enumerate_ref_sum 712,920 (+/- 23130) 514,340 (+/- 20750) -198,580 -27.85% x 1.39
+ iter::bench_filter_chain_ref_count 2,660,000 (+/- 47500) 1,920,000 (+/- 28330) -740,000 -27.82% x 1.39
+ iter::bench_filter_map_chain_ref_sum 2,720,000 (+/- 34250) 1,550,000 (+/- 20900) -1,170,000 -43.01% x 1.75
+ iter::bench_filter_map_ref_sum 1,170,000 (+/- 18460) 777,150 (+/- 34470) -392,850 -33.58% x 1.51
+ iter::bench_flat_map_chain_ref_sum 2,070,000 (+/- 29680) 531,940 (+/- 10720) -1,538,060 -74.30% x 3.89
+ iter::bench_flat_map_sum 522,250 (+/- 5730) 404,000 (+/- 9120) -118,250 -22.64% x 1.29
+ iter::bench_for_each_chain_ref_fold 2,910,000 (+/- 42700) 891,080 (+/- 14120) -2,018,920 -69.38% x 3.27
+ iter::bench_fuse_chain_ref_sum 2,930,000 (+/- 32240) 741,010 (+/- 12080) -2,188,990 -74.71% x 3.95
+ iter::bench_fuse_chain_sum 885,260 (+/- 21620) 738,290 (+/- 21190) -146,970 -16.60% x 1.20
+ iter::bench_inspect_chain_ref_sum 2,930,000 (+/- 48850) 741,270 (+/- 9640) -2,188,730 -74.70% x 3.95
+ iter::bench_inspect_chain_sum 885,350 (+/- 21560) 741,150 (+/- 28330) -144,200 -16.29% x 1.19
+ iter::bench_inspect_sum 520,120 (+/- 29050) 370,820 (+/- 13980) -149,300 -28.70% x 1.40
+ iter::bench_peekable_chain_ref_sum 2,930,000 (+/- 39760) 737,880 (+/- 33780) -2,192,120 -74.82% x 3.97
+ iter::bench_peekable_chain_sum 891,020 (+/- 21660) 741,220 (+/- 26120) -149,800 -16.81% x 1.20
+ iter::bench_skip_chain_ref_sum 2,930,000 (+/- 40540) 884,990 (+/- 13830) -2,045,010 -69.80% x 3.31
+ iter::bench_skip_cycle_skip_zip_add_ref_sum 5,120,000 (+/- 197580) 4,770,000 (+/- 90050) -350,000 -6.84% x 1.07
+ iter::bench_skip_while_chain_ref_sum 2,640,000 (+/- 213540) 741,080 (+/- 12520) -1,898,920 -71.93% x 3.56
+ iter::bench_skip_while_ref_sum 776,700 (+/- 14200) 372,090 (+/- 6420) -404,610 -52.09% x 2.09
+ iter::bench_take_while_chain_ref_sum 605,040 (+/- 8910) 409,450 (+/- 16610) -195,590 -32.33% x 1.48
+ iter::bench_zip_then_skip 49 (+/- 1) 46 (+/- 1) -3 -6.12% x 1.07
- iter::bench_filter_chain_ref_sum 1,500,000 (+/- 46420) 1,730,000 (+/- 21530) 230,000 15.33% x 0.87
- iter::bench_filter_chain_sum 1,380,000 (+/- 28850) 1,720,000 (+/- 20850) 340,000 24.64% x 0.80
- iter::bench_filter_ref_sum 755,820 (+/- 45890) 862,870 (+/- 33230) 107,050 14.16% x 0.88
- iter::bench_for_each_chain_fold 737,070 (+/- 15980) 890,820 (+/- 13460) 153,750 20.86% x 0.83
- iter::bench_skip_ref_sum 370,420 (+/- 34000) 519,570 (+/- 34210) 149,150 40.27% x 0.71
- iter::bench_skip_sum 373,080 (+/- 5430) 519,630 (+/- 26050) 146,550 39.28% x 0.72 Some results are weird, though, because they shouldn't be affected |
Actually |
While that is the case, I do recall having performance regressions (#106463) when attempting to rewrite |
Actually there were regression in compiler performance, probably because of the additional work, but it is likely that there were runtime execution improvements (probably not that much actually because |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 6a830cf3d2a38c07a161f51bec4d3fccb06af620 with merge c4945223b4ece7576d3c0c99f6111dcd3490ab97... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c4945223b4ece7576d3c0c99f6111dcd3490ab97): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 647.661s -> 649.087s (0.22%) |
hrm, can we try again without those inlines? |
6a830cf
to
922cd41
Compare
I have removed the last commit that added With that @bors r+ rollup=never |
@@ -4018,4 +4018,66 @@ impl<I: Iterator + ?Sized> Iterator for &mut I { | |||
fn nth(&mut self, n: usize) -> Option<Self::Item> { | |||
(**self).nth(n) | |||
} | |||
fn fold<B, F>(self, init: B, f: F) -> B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this let us remove https://stdrs.dev/nightly/x86_64-unknown-linux-gnu/core/iter/struct.ByRefSized.html?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly we're getting closer, but the #[inline]
s still differ so that would affect performance and the tradeoffs aren't straight-forward.
☀️ Test successful - checks-actions |
Finished benchmarking commit (eb088b8): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 649.206s -> 652.053s (0.44%) |
This adds a specialization trait to forward
fold
,try_fold
,... to the inner iterator where possible