Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve codegen of String::retain method #96605

Merged
merged 1 commit into from
May 21, 2022

Conversation

Urgau
Copy link
Member

@Urgau Urgau commented May 1, 2022

This pull-request improve the codegen of the String::retain method.

Using unwrap_unchecked helps the optimizer to not generate a panicking path that will never be taken for valid UTF-8 like string.

Using encode_utf8 saves us from an expensive call to memcpy, as the optimizer is unable to realize that ch_len <= 4 and so can generate much better assembly code.

https://rust.godbolt.org/z/z73ohenfc

Using unwrap_unchecked helps the optimizer to not generate panicking
path, that will never be taken for valid UTF-8 like string.

Using encode_utf8 saves us a call to a memcpy, as the optimizer is
unable to realize that ch_len <= 4 and so can generate much better
assembly code.

https://rust.godbolt.org/z/z73ohenfc
@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 1, 2022
@rust-highfive
Copy link
Collaborator

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with r? rust-lang/libs-api @rustbot label +T-libs-api -T-libs to request review from a libs-api team reviewer. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 1, 2022
@the8472
Copy link
Member

the8472 commented May 1, 2022

Wouldn't it be better to optimize for runs of retained/dropped chars so larger chunks could be memcopied? At least I assume real-world workloads rarely discard every second character.

@Urgau
Copy link
Member Author

Urgau commented May 1, 2022

Wouldn't it be better to optimize for runs of retained/dropped chars so larger chunks could be memcopied? At least I assume real-world workloads rarely discard every second character.

Maybe, but the cost of doing a memcpy is pretty high if N (number of bytes) is low and because the average length of words is: ~4.5 letters in engligh, ~4.84 letters in french and ~4.96 letters in spanish, I'm not sure that it would be good to do it unconditionally, at least not without a fast path for N < 32 (or 16 depending on the assembly generated).

http://www.cs.trincoll.edu/~crypto/resources/LetFreq.html

@thomcc
Copy link
Member

thomcc commented May 20, 2022

I'll take over review for this since it's been a while since the last update.

r? @thomcc

Have you taken any benchmarks?

@rust-highfive rust-highfive assigned thomcc and unassigned kennytm May 20, 2022
@Urgau
Copy link
Member Author

Urgau commented May 20, 2022

I'll take over review for this since it's been a while since the last update.

Thanks. If you feel up to it you can also take #94647 which hasn't had any activity at all. 😁

Have you taken any benchmarks?

Yes, I run them on armv7-unknown-linux-gnueabihf where despace is just .retain(|ch| ch != ' ') (and the number is the number of character in the input string):

running 8 tests
test despace_25        ... bench:          17 ns/iter (+/- 0)
test despace_250       ... bench:          17 ns/iter (+/- 0)
test despace_5         ... bench:          17 ns/iter (+/- 0)
test despace_emoji     ... bench:          18 ns/iter (+/- 0)
test despace_std_25    ... bench:         374 ns/iter (+/- 2)
test despace_std_250   ... bench:       2,829 ns/iter (+/- 24)
test despace_std_5     ... bench:          87 ns/iter (+/- 0)
test despace_std_emoji ... bench:         168 ns/iter (+/- 1)

@the8472
Copy link
Member

the8472 commented May 20, 2022

Previously it scaled with character counts and now it's flat, is that expected or do the benchmarks get optimized away entirely?

Copy link
Member

@thomcc thomcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit surprising to me that this would make much of a perf difference, but it would also surprise me that it would hurt, there are benchmark numbers, and the logic that this avoids a memcpy call for something that is always <= 4 bytes all makes sense.

This also has the benefit of improving the safety documentation of these blocks, and actually reducing the amount of unsafe needed as well (sort of), so it seems like a positive change all around.

@thomcc
Copy link
Member

thomcc commented May 20, 2022

@bors r+

@bors
Copy link
Contributor

bors commented May 20, 2022

📌 Commit a98abe8 has been approved by thomcc

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 20, 2022
@thomcc
Copy link
Member

thomcc commented May 20, 2022

Thanks. If you feel up to it you can also take #94647 which hasn't had any activity at all. 😁

Sure. Nevermind, it's an API change, I didn't notice this.

Previously it scaled with character counts and now it's flat, is that expected or do the benchmarks get optimized away entirely?

This is a fair point. Hm. I actually think this is a reasonable change regardless, but it's worth seeing more accurate benchmark numbers.

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 20, 2022
@Urgau
Copy link
Member Author

Urgau commented May 20, 2022

Thanks. If you feel up to it you can also take #94647 which hasn't had any activity at all. grin

Sure. Nevermind, it's an API change, I didn't notice this.

Sorry.

Previously it scaled with character counts and now it's flat, is that expected or do the benchmarks get optimized away entirely?

This is a fair point. Hm. I actually think this is a reasonable change regardless, but it's worth seeing more accurate benchmark numbers.

Here is the same benchmark except that now the String always reallocated a part of the benchmark (my previous logic must have been wrong somewhat):

running 8 tests
test despace_25        ... bench:         732 ns/iter (+/- 5)
test despace_250       ... bench:       4,020 ns/iter (+/- 25)
test despace_5         ... bench:         306 ns/iter (+/- 5)
test despace_emoji     ... bench:         507 ns/iter (+/- 3)
test despace_std_25    ... bench:       1,546 ns/iter (+/- 35)
test despace_std_250   ... bench:      10,296 ns/iter (+/- 598)
test despace_std_5     ... bench:         432 ns/iter (+/- 4)
test despace_std_emoji ... bench:         783 ns/iter (+/- 9)

@thomcc
Copy link
Member

thomcc commented May 20, 2022

Sorry.

You're fine. The issue should have more traction now anyway.

Here is the same benchmark except that now the String always reallocated a part of the benchmark (my previous logic must have been wrong somewhat): ... <snip>

This looks reasonable to me, and since I was already inclined to accept the PR,

@bors r+

@bors
Copy link
Contributor

bors commented May 20, 2022

📌 Commit a98abe8 has been approved by thomcc

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 20, 2022
@bors
Copy link
Contributor

bors commented May 21, 2022

⌛ Testing commit a98abe8 with merge 4a86c79...

@bors
Copy link
Contributor

bors commented May 21, 2022

☀️ Test successful - checks-actions
Approved by: thomcc
Pushing 4a86c79 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 21, 2022
@bors bors merged commit 4a86c79 into rust-lang:master May 21, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4a86c79): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvement found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 0 0 1 0
mean2 N/A N/A N/A -1.1% N/A
max N/A N/A N/A -1.1% N/A

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: 🎉 relevant improvements found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 0 1 5 1
mean2 N/A N/A -3.0% -2.1% -3.0%
max N/A N/A -3.0% -3.0% -3.0%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. number of relevant changes 2

  2. the arithmetic mean of the percent change 2

@Urgau Urgau deleted the string-retain-codegen branch May 5, 2023 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants