
std::str: optimize is_utf8 & first_non_utf8_index. #12241

Closed (wants to merge 1 commit)

Conversation

@huonw (Member) commented Feb 13, 2014

This makes is_utf8 up to 40% faster.

Before:

    is_utf8_100_ascii               ... bench:       123 ns/iter (+/- 9)
    is_utf8_100_multibyte           ... bench:       134 ns/iter (+/- 7)

    from_utf8_lossy_100_ascii       ... bench:       118 ns/iter (+/- 8)
    from_utf8_lossy_100_invalid     ... bench:       378 ns/iter (+/- 12)
    from_utf8_lossy_100_multibyte   ... bench:       135 ns/iter (+/- 5)
    from_utf8_lossy_invalid         ... bench:       129 ns/iter (+/- 10)

After:

    is_utf8_100_ascii               ... bench:        77 ns/iter (+/- 4)
    is_utf8_100_multibyte           ... bench:        96 ns/iter (+/- 9)

    from_utf8_lossy_100_ascii       ... bench:        91 ns/iter (+/- 14)
    from_utf8_lossy_100_invalid     ... bench:       414 ns/iter (+/- 30)
    from_utf8_lossy_100_multibyte   ... bench:       119 ns/iter (+/- 10)
    from_utf8_lossy_invalid         ... bench:       108 ns/iter (+/- 11)

@huonw (Member, Author) commented Feb 13, 2014

I don't have a good explanation for why 100_invalid would be slower, especially since first_non_utf8_index is called only once in it, at the very start.

cc @kballard

@alexcrichton (Member):

I'm a little sad about all the new unsafe code in this commit; is there no way to limit the scope of the unsafe code, or to remove it entirely?

@huonw (Member, Author) commented Feb 13, 2014

Well, it's having to manipulate pointers directly, so I find it unlikely that it can be anywhere near as efficient done any other way (e.g. manually stepping an iterator puts a bounds check on every access, like the old code).
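For readers unfamiliar with the trade-off being discussed, here is a minimal sketch of a pointer-based scan in modern Rust syntax. It is not the PR's actual code, and the helper name `all_ascii_ptr` is invented for illustration: the point is that walking a raw pointer from start to end performs one comparison per byte, with no per-access bounds check.

```rust
/// Illustrative sketch (NOT the PR's code): check whether a byte slice
/// is pure ASCII by walking raw pointers. The only per-iteration test
/// is `p < end`; there is no separate bounds check on each access.
fn all_ascii_ptr(v: &[u8]) -> bool {
    unsafe {
        let mut p = v.as_ptr();
        let end = p.add(v.len()); // one past the last byte
        while p < end {
            if *p >= 0x80 {
                return false; // non-ASCII lead byte found
            }
            p = p.add(1);
        }
    }
    true
}
```

The `unsafe` block is sound because `p` only ever ranges over `[as_ptr(), as_ptr() + len]` and is dereferenced strictly before `end`; this is the kind of "proof of safety as comments" discussed later in the thread.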

@alexcrichton (Member):

I find it somewhat difficult to believe that it is impossible for safe code to reach this same level of performance.

I'm not saying that this change should have all of it removed, but we're getting into the habit of reaching for unsafe so frequently that it's diminishing its usefulness. I feel like a good rule of thumb is that every unsafe block should document why it needs to be unsafe and why it can't be done with safe code (unless that's already apparent). Could you add some comments explaining why unsafe is used, and why it's used over such a large scope?

@lilyball (Contributor):

@huonw from_utf8_lossy_100_invalid is likely doing a bunch of reallocations. Each invalid byte is replaced by 3 bytes in the output. It's also writing 3 bytes at a time instead of writing an entire contiguous good chunk, as happens with valid UTF-8.
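The 3-bytes-per-invalid-byte figure comes from the replacement character U+FFFD, whose UTF-8 encoding is 3 bytes long. A quick sketch using today's std API (the helper name `lossy_len` is invented here; the 2014 API differed) shows how an invalid-heavy input triples in size, which is what drives the reallocation cost described above:

```rust
/// Hypothetical helper: byte length of the lossily-decoded output.
/// Each invalid input byte becomes "\u{FFFD}" (3 bytes) in the output,
/// so a run of invalid bytes triples in size.
fn lossy_len(input: &[u8]) -> usize {
    String::from_utf8_lossy(input).len()
}
```

For example, 4 invalid `0xFF` bytes decode to 4 replacement characters, i.e. 12 output bytes, while valid input keeps its original length.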

@lilyball (Contributor):

I'm willing to r=me this, but I will defer to @alexcrichton's caution.

@alexcrichton (Member):

Can this be implemented with a byte iterator? That would eliminate all bounds checks, and you'd only have the unwrap-related checks when dealing with wider characters. In theory we could extend the byte iterator with a take2 method if we wanted to go even further in reducing checks, but if bounds checks were the major issue, it seems that a byte iterator should fix that?
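A rough sketch of what the iterator shape being proposed might look like, in modern Rust. This is an invented simplification, not the PR's benchmarked code: it classifies sequence width by the lead byte and consumes continuation bytes through the iterator (whose `Option` return replaces explicit bounds checks), but it does not reject every overlong or surrogate encoding that a real validator must.

```rust
/// Simplified iterator-based validity check (illustration only).
/// Lead-byte ranges follow the usual UTF-8 layout; continuation
/// bytes must match 0b10xxxxxx.
fn is_utf8_iter(v: &[u8]) -> bool {
    let mut iter = v.iter();
    while let Some(&first) = iter.next() {
        if first < 0x80 {
            continue; // ASCII fast path
        }
        // Number of continuation bytes implied by the lead byte.
        let extra = match first {
            0xC2..=0xDF => 1,
            0xE0..=0xEF => 2,
            0xF0..=0xF4 => 3,
            _ => return false, // invalid lead byte
        };
        for _ in 0..extra {
            match iter.next() {
                Some(&b) if b & 0xC0 == 0x80 => {} // valid continuation
                _ => return false, // truncated or malformed sequence
            }
        }
    }
    true
}
```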

@huonw (Member, Author) commented Feb 13, 2014

@kballard both the new and old versions are doing the same reallocations, so that doesn't explain why the timing differs between them.

@alexcrichton the byte iterator does a bounds check on every access (to see whether it's at the end or not), and it's not being used in a tight loop, so I find it unlikely that LLVM will be able to streamline it well; I'll have a go, though.

@lilyball (Contributor):

@huonw Oh I see what you're saying. Perhaps first_non_utf8_index has more overhead in the case where it fails at index 0 than the old implementation did? I'm curious what the benchmark change is for a test where the first byte is invalid but the rest of the string is valid.

@ghost commented Feb 13, 2014

@alexcrichton: Generally, iterator bounds checks are not optimized away at any optimization level, including LTO. It seems to me that LLVM does not have enough information.
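To make the two patterns under discussion concrete, here is a hypothetical side-by-side (invented names, not from the PR): an index-based loop, where every `v[i]` carries a bounds check the optimizer may or may not eliminate, versus an iterator, where the end test is folded into the `Option` returned by `next`.

```rust
/// Index-based loop: each `v[i]` performs a bounds check that the
/// optimizer can sometimes, but not always, prove away.
fn sum_indexed(v: &[u8]) -> u64 {
    let mut total = 0u64;
    for i in 0..v.len() {
        total += v[i] as u64;
    }
    total
}

/// Iterator-based loop: the slice iterator's own end-of-slice test
/// doubles as the loop condition, so there is no separate check.
fn sum_iter(v: &[u8]) -> u64 {
    v.iter().map(|&b| b as u64).sum()
}
```

Both compute the same result; the thread's disagreement is purely about which form the 2014 optimizer could compile to the tighter loop.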

@huonw (Member, Author) commented Feb 15, 2014

I investigated using iterators, results:

    test is_utf8_iter::is_utf8_100_ascii            ... bench:       144 ns/iter (+/- 10)
    test is_utf8_iter::is_utf8_100_multibyte        ... bench:       120 ns/iter (+/- 12)
    test is_utf8_iter::random                       ... bench:       494 ns/iter (+/- 33)
    test is_utf8_iter::random_invalid               ... bench:       276 ns/iter (+/- 443)

    test is_utf8_iter_inline::is_utf8_100_ascii     ... bench:       116 ns/iter (+/- 1)
    test is_utf8_iter_inline::is_utf8_100_multibyte ... bench:       110 ns/iter (+/- 3)
    test is_utf8_iter_inline::random                ... bench:       493 ns/iter (+/- 34)
    test is_utf8_iter_inline::random_invalid        ... bench:       249 ns/iter (+/- 400)

    test is_utf8_ptr::is_utf8_100_ascii             ... bench:        88 ns/iter (+/- 5)
    test is_utf8_ptr::is_utf8_100_multibyte         ... bench:        97 ns/iter (+/- 4)
    test is_utf8_ptr::random                        ... bench:       369 ns/iter (+/- 24)
    test is_utf8_ptr::random_invalid                ... bench:       239 ns/iter (+/- 342)

    test is_utf8_std::is_utf8_100_ascii             ... bench:       122 ns/iter (+/- 4)
    test is_utf8_std::is_utf8_100_multibyte         ... bench:       134 ns/iter (+/- 9)
    test is_utf8_std::random                        ... bench:       512 ns/iter (+/- 43)
    test is_utf8_std::random_invalid                ... bench:       264 ns/iter (+/- 430)

std is the current implementation, ptr is this PR, iter and iter_inline are two iterator variants; source.

So the iterator versions are slightly faster than the current implementation (and with no unsafe, unlike either the one in std or this PR's pointer one), but still significantly slower than the pointer version here. I think the pure-ASCII case is hampered by #11751.

@lilyball (Contributor):

Given how core a piece of functionality is_utf8 is, I'm in favor of going with the unsafe (but fastest) ptr approach.

@huonw (Member, Author) commented Feb 15, 2014

If we do decide to go with this pointer approach, I'll write a proof of safety as comments in the source, as a substitute for compiler verification.

@alexcrichton (Member):

Thank you for the thorough investigation, it's quite useful!

I'm not quite ready to give up this battle though, I was curious why the pure-ascii case with iterators was so much slower than with pointers, so I did some investigation of my own. With this code I got these timings:

    $ rustc --opt-level=3 gistfile1.rs --test -Zlto && ./gistfile1 --bench is_utf8_100_ascii

    running 2 tests
    test is_utf8_iter_inline::is_utf8_100_ascii ... bench:        95 ns/iter (+/- 2)
    test is_utf8_ptr::is_utf8_100_ascii         ... bench:        63 ns/iter (+/- 1)

    test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

These results lead me to believe that the vector iterator isn't quite as good as it could be, and that this is what's pushing us toward unsafe code in this implementation. With this in mind, I continue to be hesitant to merge this: I believe the speedup from using unsafe code is at least partly a result of this vector iterator performance issue.

@alexcrichton (Member):

@huonw pointed me at #11751 on IRC, which is in turn caused by #9546, which is the cause of the slowdown I was seeing.

I am tempted to postpone the unsafe change until #9546 is fixed; I would be more than happy to accept the iterator changes, however.

@huonw (Member, Author) commented Feb 16, 2014

Iterators in #12314.

@huonw closed this Feb 16, 2014