Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTree: inline more functions supporting iteration #85217

Closed
wants to merge 1 commit into from

Conversation

ssomers
Copy link
Contributor

@ssomers ssomers commented May 12, 2021

According to the library/alloc benchmarks, these #[inline] might boost performance. They didn't seem to a year or so ago.

r? @Mark-Simulacrum

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 12, 2021
@ssomers
Copy link
Contributor Author

ssomers commented May 12, 2021

Note that the ones on the deallocating_ functions don't appear to help (but mutable benchmarks are treacherous anyway) and that btree::map::clone_slim_100 just reverts the freak boost in #84904.

benchcmp a b --threshold 5
 name                                           a ns/iter  b ns/iter  diff ns/iter   diff %  speedup
 btree::map::clone_slim_100                     1,483      4,356             2,873  193.73%   x 0.34
 btree::map::find_seq_100                       9          10                    1   11.11%   x 0.90
 btree::map::first_and_last_0                   10         12                    2   20.00%   x 0.83
 btree::map::first_and_last_10k                 56         65                    9   16.07%   x 0.86
 btree::map::insert_rand_10_000                 38         44                    6   15.79%   x 0.86
 btree::map::iter_100                           4,292      4,523               231    5.38%   x 0.95
 btree::map::iter_10k                           5,646      6,001               355    6.29%   x 0.94
 btree::map::iteration_1000                     4,235      3,564              -671  -15.84%   x 1.19
 btree::map::iteration_20                       68         62                   -6   -8.82%   x 1.10
 btree::map::iteration_mut_1000                 4,304      4,013              -291   -6.76%   x 1.07
 btree::set::difference_random_100_vs_100       997        771                -226  -22.67%   x 1.29
 btree::set::difference_random_10k_vs_100       68,320     59,853           -8,467  -12.39%   x 1.14
 btree::set::difference_random_10k_vs_10k       218,385    191,578         -26,807  -12.28%   x 1.14
 btree::set::difference_staggered_100_vs_100    1,146      841                -305  -26.61%   x 1.36
 btree::set::difference_staggered_100_vs_10k    2,352      2,179              -173   -7.36%   x 1.08
 btree::set::difference_staggered_10k_vs_10k    113,332    82,685          -30,647  -27.04%   x 1.37
 btree::set::intersection_100_pos_vs_100_neg    16         17                    1    6.25%   x 0.94
 btree::set::intersection_10k_neg_vs_100_pos    17         18                    1    5.88%   x 0.94
 btree::set::intersection_10k_neg_vs_10k_pos    18         19                    1    5.56%   x 0.95
 btree::set::intersection_10k_pos_vs_10k_neg    18         19                    1    5.56%   x 0.95
 btree::set::intersection_random_100_vs_100     713        626                 -87  -12.20%   x 1.14
 btree::set::intersection_staggered_100_vs_100  661        617                 -44   -6.66%   x 1.07
 btree::set::intersection_staggered_10k_vs_10k  64,105     60,344           -3,761   -5.87%   x 1.06
 btree::set::is_subset_100_vs_100               533        599                  66   12.38%   x 0.89
 btree::set::is_subset_10k_vs_10k               62,611     59,448           -3,163   -5.05%   x 1.05

@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

I am generally not inclined to merge this - #[inline] on these functions increases the chances of getting inlined, probably, but it doesn't change the ability (they're already generic) of LLVM to do so. I'm not usually inclined to mess with LLVM heuristics in std code. But, we'll see if there's any change on compiler benchmarks...

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 14, 2021
@bors
Copy link
Contributor

bors commented May 14, 2021

⌛ Trying commit 154e6d1 with merge 51fbd179ae85719260f05675a60575ad1e15790f...

@bors
Copy link
Contributor

bors commented May 14, 2021

☀️ Try build successful - checks-actions
Build commit: 51fbd179ae85719260f05675a60575ad1e15790f (51fbd179ae85719260f05675a60575ad1e15790f)

@rust-timer
Copy link
Collaborator

Queued 51fbd179ae85719260f05675a60575ad1e15790f with parent 1025db8, future comparison URL.

@ssomers
Copy link
Contributor Author

ssomers commented May 15, 2021

It sure seems these inlines are as scientific as needles in a voodoo doll - they keep you awake at night and nobody can explain how they work. I sometimes add them in a PR to "disguise" a reported performance backlash - on functions that are extracted from literally inlined code, like in #84904.

I don't think I ever tried removing some of them. When I remove all of them from btree code (most are on public, stable API members), there's only backlash (on my Windows x64 system, Intel CPU) compared to master.

benchcmp a c --threshold 5
 name                                         a ns/iter  c ns/iter  diff ns/iter  diff %  speedup
 btree::map::clone_fat_val_100_and_clear      10,761     11,459              698   6.49%   x 0.94
 btree::map::clone_fat_val_100_and_into_iter  10,151     11,173            1,022  10.07%   x 0.91
 btree::map::clone_slim_100_and_into_iter     4,269      4,790               521  12.20%   x 0.89
 btree::map::clone_slim_10k_and_into_iter     183,788    242,990          59,202  32.21%   x 0.76
 btree::map::first_and_last_0                 10         12                    2  20.00%   x 0.83
 btree::map::first_and_last_10k               56         67                   11  19.64%   x 0.84
 btree::map::insert_rand_10_000               38         42                    4  10.53%   x 0.90
 btree::map::iteration_mut_1000               4,304      4,576               272   6.32%   x 0.94
 btree::map::iteration_mut_100000             534,390    603,730          69,340  12.98%   x 0.89
 btree::map::iteration_mut_20                 68         77                    9  13.24%   x 0.88
 btree::set::clone_100_and_into_iter          1,413      1,933               520  36.80%   x 0.73
 btree::set::clone_10k                        159,812    173,932          14,120   8.84%   x 0.92
 btree::set::clone_10k_and_into_iter          168,626    227,017          58,391  34.63%   x 0.74
 btree::set::difference_random_10k_vs_100     68,320     73,951            5,631   8.24%   x 0.92
 btree::set::is_subset_100_vs_100             533        630                  97  18.20%   x 0.85
 btree::set::is_subset_100_vs_10k             1,185      1,258                73   6.16%   x 0.94

But compared to this PR, clone_slim_100 is happy again.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (51fbd179ae85719260f05675a60575ad1e15790f): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 15, 2021
@Mark-Simulacrum
Copy link
Member

Looks loosely like this PR is a regression.

@ssomers ssomers deleted the btree_inlining branch May 23, 2021 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants