-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSE2 & AVX2 std::find & std::count #2434
Conversation
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Co-authored-by: S. B. Tam <[email protected]>
New vector search to resolve microsoft#2379 and test that it fixes microsoft#2431 find
tests/std/tests/GH_002431_byte_range_find_with_unreachable_sentinel/env.lst
Outdated
Show resolved
Hide resolved
tests/std/tests/GH_002431_byte_range_find_with_unreachable_sentinel/test.cpp
Outdated
Show resolved
Hide resolved
tests/std/tests/GH_002431_byte_range_find_with_unreachable_sentinel/test.cpp
Outdated
Show resolved
Hide resolved
tests/std/tests/GH_002431_byte_range_find_with_unreachable_sentinel/test.cpp
Show resolved
Hide resolved
tests/std/tests/GH_002431_byte_range_find_with_unreachable_sentinel/test.cpp
Outdated
Show resolved
Hide resolved
tests/std/tests/Dev11_0316853_find_memchr_optimization/test.cpp
Outdated
Show resolved
Hide resolved
tests/std/tests/Dev11_0316853_find_memchr_optimization/test.cpp
Outdated
Show resolved
Hide resolved
tests/std/tests/Dev11_0316853_find_memchr_optimization/test.cpp
Outdated
Show resolved
Hide resolved
tests/std/tests/Dev11_0316853_find_memchr_optimization/test.cpp
Outdated
Show resolved
Hide resolved
This is amazing, thanks! 😻 I think this will be ready to merge after resolving the issues I found - please let me know if you'd like me to push changes for them (none were in the core vectorized machinery - the number was just large enough that I didn't feel comfortable pushing changes without asking first). |
Yes please go ahead! |
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
To fix an internal build break (where the new vector algorithms wouldn't compile for ARM64EC), I've completely disabled the vector algorithms for ARM64EC. This includes the existing attempt to enable the SSE2 parts of We can explore re-enabling the vector algorithms (and extending them to |
Thanks for this major performance improvement and bugfix! 🚀 🐞 😻 |
Disabling these algorithms for ARM64EC will be a major perf degradation for x64 binaries running on ARM64 devices 😢 that dynamically link or that statically link and directly target ARM64EC. |
I'll happily approve re-enabling them once we have confidence that we're not damaging |
Ported from microsoft#2434 description, dropping unnecessary parts
📝 Summary
<xutility>
: optimize_Find_unchecked
for data sizes other than 1 #2379.New vector search, controlled by
_USE_STD_VECTOR_ALGORITHMS
. Used also instead ofmemchr
optimization for 1-byte element.unreachable_sentinel
#2431.When
_USE_STD_VECTOR_ALGORITHMS
is off, keep usingmemchr
optimization, but only for finite range.unreachable_sentinel
#2431.For both enabled and disabled
_USE_STD_VECTOR_ALGORITHMS
<xutility>
: vectorizestd::count
#2384.New vector count, very similar to vector find, also controlled by
_USE_STD_VECTOR_ALGORITHMS
🏁 Perf benchmark
Benchmark
Benchmark run and results
Results table
TL;DR: all cases have significant benefit. Decimal order of magnitude for some cases.
When AVX is disabled, the benefit becomes small for the following cases:
memchr
is also SSE2, and my version only slightly beats it, as they are very similarStill no regression found on tested cases.
⚖️ Size impact
The change adds more code, it also overrides (disables)
/Os
option forvector_algorithm.cpp
via pragma.DLLs and PDBs for them are not affected. Static libraries are affected.
The impact is negligible for static libs, small for import libs.
Table
✔️ Test coverage
I didn't try to make it working on ARM64EC. I'm not sure how to make it correctly.
_M_ARM64EC
usage<bit>
it looks like that ARM version ofpopcount
is used instead of x86._M_HYBRID
, which might have to do something with ARM64ECIt would be good to have Make ARM64EC built on GitHub #2310 addressed, so that these changes can be tested.