`<xutility>`: vectorize `std::count` #2384

AlexGuteniev · 2021-12-09T17:36:56Z

Relates to #2379

For contiguous ranges of simple types (1-, 2-, 4-, or 8-byte integers, and maybe also 4- and 8-byte floats in fast mode), the following vector algorithm is possible (shown for SSE2 and an 8-bit type, but applicable to other element sizes and vector sizes):

1. Broadcast the value to every lane of a vector register (`_mm_set1` intrinsics).
2. Compare each block of input against it to obtain a per-lane match mask (`_mm_cmpeq_epi8` intrinsic).
3. Extract the mask as bits (`_mm_movemask_epi8`) and count the set bits (a popcnt intrinsic such as `__popcnt`).
4. Accumulate this result.

A hand-coded popcount will probably be inefficient; in that case the optimization can apply starting with SSE4.2, for which we assume popcnt is available.
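
The steps above can be sketched roughly as follows for the 8-bit SSE2 case. This is only an illustration under stated assumptions, not the code that ended up in the STL: `count_bytes_sse2` and `popcount16` are hypothetical names, an x86/x64 target is assumed, and the tail is handled with a plain scalar loop.

```cpp
#include <cstddef>
#include <emmintrin.h> // SSE2 intrinsics (assumes an x86/x64 target)

#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
// MSVC: __popcnt maps directly to the POPCNT instruction.
static inline unsigned popcount16(unsigned mask) { return __popcnt(mask); }
#else
// GCC/Clang: the builtin uses POPCNT only when the target flags allow it,
// otherwise it falls back to a software popcount.
static inline unsigned popcount16(unsigned mask) {
    return static_cast<unsigned>(__builtin_popcount(mask));
}
#endif

// Hypothetical helper: counts bytes equal to `value` in [first, last).
std::size_t count_bytes_sse2(const unsigned char* first,
                             const unsigned char* last,
                             unsigned char value) {
    std::size_t result = 0;

    // Step 1: broadcast the value to all 16 byte lanes.
    const __m128i needle = _mm_set1_epi8(static_cast<char>(value));

    // Process full 16-byte blocks with unaligned loads.
    for (; last - first >= 16; first += 16) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(first));
        // Step 2: each lane becomes 0xFF on a match, 0x00 otherwise.
        const __m128i eq = _mm_cmpeq_epi8(chunk, needle);
        // Step 3: collapse the lanes to a 16-bit mask and count its set bits.
        const unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(eq));
        // Step 4: accumulate.
        result += popcount16(mask);
    }

    // Scalar tail for the remaining 0..15 bytes.
    for (; first != last; ++first) {
        result += (*first == value) ? 1u : 0u;
    }
    return result;
}
```

For wider element types the same shape should carry over with the matching compare intrinsic (for example `_mm_cmpeq_epi16`), keeping in mind that `_mm_movemask_epi8` still yields one bit per byte, so the popcount has to be divided by the element size.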


AlexGuteniev · 2021-12-12T19:20:28Z

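#include <algorithm>
#include <iterator> // std::begin, std::end

char s[] = "the quick brown fox jumps over the lazy dog";

int foxes()
{
    // std::count returns a ptrdiff_t; make the narrowing to int explicit.
    // Note: std::end(s) includes the terminating '\0', which never matches 'o'.
    return static_cast<int>(std::count(std::begin(s), std::end(s), 'o'));
}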

https://godbolt.org/z/6KeYT4YaG
clang generates code that matches my proposal.
gcc does something that I did not understand at first and like less (edit: figured it out; yes, it is a stupid vectorization, though it could still beat unvectorized code).
MSVC currently counts bytes naively, so a library optimization is indeed helpful.

CaseyCarter added the performance Must go faster label Dec 9, 2021

CaseyCarter changed the title ~~<xutulity>: optimize std::count~~ <xutulity>: vectorize std::count Dec 9, 2021

StephanTLavavej changed the title ~~<xutulity>: vectorize std::count~~ <xutility>: vectorize std::count Dec 11, 2021

AlexGuteniev mentioned this issue Dec 16, 2021

Random effect of Intel JCC Errata on micro optimizations #2405

Closed

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Dec 19, 2021

count microsoft#2384

e4a3902

AlexGuteniev mentioned this issue Dec 19, 2021

SSE2 & AVX2 std::find & std::count #2434

Merged

StephanTLavavej closed this as completed in #2434 Apr 4, 2022

StephanTLavavej added the fixed Something works now, yay! label Apr 4, 2022
