Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<bitset>: optimize streaming operator >> #5008

Merged
merged 19 commits into from
Oct 21, 2024

Conversation

AlexGuteniev
Copy link
Contributor

@AlexGuteniev AlexGuteniev commented Oct 12, 2024

📝 What it does

  • Instead of calling push_back in a loop for temporary string, use temporary buffer with integer index
    • Have it on stack if small enough
    • Lazy allocate to have exceptions handled as they were before
    • Index is also reused as _Changed, there's no need for explicit variable for that
    • We also don't need to check for string::max_size() then
    • Custom RAII for that buffer if it is large to avoid dragging in <memory>
  • Also restore not calling Traits::length in operator <<. The previous optimization avoided allocation, but introduced implicit length, which is good in total effect, but could have been better

More conservative alternative

string::_Resize_and_overwrite could have been used.
It is slightly worse due to not avoiding the allocation for mid-sized bitset.
The benchmark results are listed too.

🔍 Where's coverage

Not a vectorization PR, so no need to stress. Existing coverage ought to be enough:

> python tests\utils\stl-lit\stl-lit.py ..\..\llvm-project\libcxx\test ..\..\tests\std --filter=bitset -v
-- Testing: 207 of 53614 tests, 12 workers --

⏱️ How much of improvement

Benchmark main resize_and_overwrite custom buffer (current)
bitset_from_stream<15, char> 22858 ns 24288 ns 20708 ns
bitset_from_stream<16, char> 26394 ns 26224 ns 19523 ns
bitset_from_stream<36, char> 17921 ns 14053 ns 10970 ns
bitset_from_stream<64, char> 14559 ns 9911 ns 8045 ns
bitset_from_stream<512, char> 8799 ns 5241 ns 10640 ns
bitset_from_stream<2048, char> 9396 ns 4801 ns 4619 ns
bitset_from_stream<15, wchar_t> 24327 ns 26221 ns 21511 ns
bitset_from_stream<16, wchar_t> 27978 ns 25502 ns 20432 ns
bitset_from_stream<36, wchar_t> 18754 ns 14578 ns 11391 ns
bitset_from_stream<64, wchar_t> 15853 ns 9798 ns 8462 ns
bitset_from_stream<512, wchar_t> 9609 ns 5244 ns 5014 ns
bitset_from_stream<2048, wchar_t> 8348 ns 5109 ns 4334 ns

🔜 What's up next

operator>> non-performance

The loop structure can be improved.
It can avoid unnecessary _Istr.rdbuf()->snextc() on the final counter increment.

I'm calling that back< I realized that it is an intention.

operator>> performance

Sigh... locales and streams.
Part of it is is in CRT which means it cannot be fixed.

So overall performance cannot reach the performance of from string construction.

image

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner October 12, 2024 07:10
@StephanTLavavej StephanTLavavej added the performance Must go faster label Oct 12, 2024
@StephanTLavavej StephanTLavavej self-assigned this Oct 12, 2024
benchmarks/src/bitset_from_string.cpp Outdated Show resolved Hide resolved
benchmarks/src/bitset_from_string.cpp Show resolved Hide resolved
benchmarks/src/bitset_from_string.cpp Outdated Show resolved Hide resolved
stl/inc/bitset Outdated Show resolved Hide resolved
stl/inc/bitset Outdated Show resolved Hide resolved
stl/inc/bitset Outdated Show resolved Hide resolved
stl/inc/bitset Outdated Show resolved Hide resolved
stl/inc/bitset Outdated Show resolved Hide resolved
@StephanTLavavej

This comment was marked as outdated.

@StephanTLavavej StephanTLavavej removed their assignment Oct 17, 2024
@AlexGuteniev

This comment was marked as resolved.

@StephanTLavavej

This comment was marked as resolved.

@AlexGuteniev
Copy link
Contributor Author

Applied changes, verified that benchmark results are about the same (although there is seemingly random variation)

@StephanTLavavej StephanTLavavej self-assigned this Oct 18, 2024
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit aaca194 into microsoft:main Oct 21, 2024
39 checks passed
@StephanTLavavej
Copy link
Member

Thanks for optimizing this operator! 🚀 ⏱️ 🎉

@AlexGuteniev AlexGuteniev deleted the streambit branch October 22, 2024 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants