Avoid masks when possible in AVX2 logic #104
Initially I expected this to be wrong, because the unmasked stores write incorrect data to the array. But then I realized those slots get overwritten with the right values by subsequent stores. You don't need the masked store at any point (even for the last register).
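The overwrite argument above can be checked with a small scalar model. This is a sketch under stated assumptions, not x86-simd-sort's code: a `std::array<int, 8>` stands in for an AVX2 register, `compress` and `partition_unmasked` are hypothetical names, `compress` plays the role of the permute (lanes below the pivot first, the rest after), and the partition is done out of place into a separate buffer whose size is assumed to be a multiple of the vector width. Both stores always write all W lanes; the extra lanes land in the not-yet-final gap between the two write cursors and are rewritten by later stores.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model: W ints stand in for one AVX2 register (8 x int32).
// None of these names come from x86-simd-sort.
constexpr std::size_t W = 8;
using Reg = std::array<int, W>;

// Stand-in for the permute step: lanes < pivot first, then the rest,
// each group in original order. Returns the number of "left" lanes.
std::size_t compress(const Reg &in, int pivot, Reg &out) {
    std::size_t k = 0;
    for (int x : in)
        if (x < pivot) out[k++] = x;
    std::size_t j = k;
    for (int x : in)
        if (x >= pivot) out[j++] = x;
    return k;
}

// Partition src into dst using only full-width (unmasked) stores.
// Assumes src.size() is a multiple of W. Returns the count of elements < pivot.
std::size_t partition_unmasked(const std::vector<int> &src, int pivot,
                               std::vector<int> &dst) {
    dst.assign(src.size(), 0);
    std::size_t l = 0, r = src.size();  // [l, r) is the not-yet-final gap
    for (std::size_t i = 0; i < src.size(); i += W) {
        assert(r - l >= W);  // the gap always holds at least one full register
        Reg in{}, temp{};
        std::copy(src.begin() + i, src.begin() + i + W, in.begin());
        std::size_t k = compress(in, pivot, temp);
        // Full store at l: lanes k..W-1 land in the gap; a later store rewrites
        // them (on the last register, the right store writes the same values).
        std::copy(temp.begin(), temp.end(), dst.begin() + l);
        // Full store ending at r: lanes 0..k-1 land in the gap.
        std::copy(temp.begin(), temp.end(), dst.begin() + (r - W));
        l += k;
        r -= W - k;
    }
    return l;  // l == r once every register has been stored
}
```

The last register is the case the comment calls out: when the gap is exactly W wide, both stores cover the same W slots with the same `temp`, so nothing lands outside `[l, r)` and the left lanes and right lanes end up exactly where a masked pair of stores would have put them.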
Never mind, that was surprisingly horrible!
src/avx2-emu-funcs.hpp
typename vtype::reg_t temp = vtype::permutevar(reg, perm);

vtype::mask_storeu(leftStore, left, temp);
vtype::mask_storeu(rightStore, _mm256_xor_si256(oxff, left), temp);
if constexpr (masked) {
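The hunk chooses between masked and plain stores with `if constexpr (masked)`. The sketch below models that pattern in scalar C++; `store_lanes`, `double_store`, and the 8-bit lane mask are hypothetical and are not x86-simd-sort's `vtype` interface, which works on `__m256i` values through intrinsics. It also assumes `oxff` holds all-ones, so `_mm256_xor_si256(oxff, left)` is a bitwise NOT of `left`, modeled here as `~left`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t W = 8;  // int32 lanes in one 256-bit register
using Reg = std::array<int32_t, W>;

// Hypothetical scalar stand-in for a store whose masking is chosen at compile
// time. masked == true: write only lanes whose bit is set in `mask`.
// masked == false: write every lane and ignore `mask`.
template <bool masked>
void store_lanes(int32_t *dst, uint8_t mask, const Reg &v) {
    assert(dst != nullptr);
    if constexpr (masked) {
        for (std::size_t i = 0; i < W; ++i)
            if (mask & (1u << i)) dst[i] = v[i];
    } else {
        (void)mask;  // unused in the unmasked instantiation
        for (std::size_t i = 0; i < W; ++i) dst[i] = v[i];
    }
}

// Mirrors the shape of the hunk: the same register goes to two destinations,
// the right side selected by the complement of `left` (XOR with all-ones == NOT).
template <bool masked>
void double_store(int32_t *leftStore, int32_t *rightStore, uint8_t left,
                  const Reg &temp) {
    store_lanes<masked>(leftStore, left, temp);
    store_lanes<masked>(rightStore, static_cast<uint8_t>(~left), temp);
}
```

Because the branch is resolved at compile time, the `masked = false` instantiation contains no mask handling at all; presumably the masked path remains only for call sites that instantiate it with `masked = true`.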
Do we need the masked store? I think we can get away with just the regular store. Or am I wrong?
58ab67d to 23b6d32
LGTM. Thanks @sterrettm2!
Perf improvements to AVX2 sorting: see intel/x86-simd-sort#104
This avoids masked stores wherever unmasked stores can be used instead, which gives a reasonable performance improvement for AVX2 quicksort.