Feature/view to simd #1190
Conversation
force-pushed from 44d82c9 to 66b1684
Codecov Report

@@            Coverage Diff             @@
##           master    #1190      +/-   ##
==========================================
- Coverage   96.86%   96.79%   -0.08%
==========================================
  Files         212      213       +1
  Lines        8274     8403     +129
==========================================
+ Hits         8015     8134     +119
- Misses        259      269      +10

Continue to review full report at Codecov.
force-pushed from 66b1684 to c6de8be
@marehr ping
Review until now
include/seqan3/core/simd/concept.hpp
Outdated
@@ -29,36 +29,36 @@ namespace seqan3::detail
 // error: invalid use of incomplete type ‘struct incomplete::template_type<int>’
 // requires std::Same<decltype(a - b), simd_t>;
 template <typename simd_t>
-SEQAN3_CONCEPT Simd = requires (simd_t a, simd_t b)
+SEQAN3_CONCEPT Simd = requires (std::remove_reference_t<simd_t> a, std::remove_reference_t<simd_t> b)
a and b should be declared without std::remove_reference_t<>, because the operations (like a == b, a != b) should still be valid for reference types. And from a user's perspective, they should be able to expect that an expression a == b works for the given type.
- SEQAN3_CONCEPT Simd = requires (std::remove_reference_t<simd_t> a, std::remove_reference_t<simd_t> b)
+ SEQAN3_CONCEPT Simd = requires (simd_t a, simd_t b)
The rest of the std::remove_reference_t<> in this concept are okay.
On a side note what happens with const simds? Should we introduce a Writable/MutableSimd concept?
I thought about it and, honestly, I think we need to distinguish between them. Also, for many operations we seldom update a vector but always get a new one returned. That's at least my feeling of how we use it in the algorithms.
 * \see seqan3::detail::is_native_builtin_simd_v
 */
template <typename builtin_simd_t>
constexpr bool is_native_builtin_simd_v = is_native_builtin_simd<builtin_simd_t>::value;
why not evaluate it here as a lambda function? that would be more readable.
We can also just define the bool constants if we don't need it as a type anyway. Otherwise it follows the STL way of providing unary type traits.
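The "STL way" mentioned here can be sketched with an illustrative trait (the names below are made up, not SeqAn3's actual trait): the trait exists as a type, and the _v variable template is a shortcut evaluating it; if the type form is never needed, a plain variable template would suffice.

```cpp
#include <cstdint>
#include <type_traits>

// STL-style unary type trait: defined as a type (usable in tag
// dispatch and metaprogramming), with a _v shortcut as in the PR.
template <typename t>
struct is_four_bytes : std::bool_constant<sizeof(t) == 4>
{};

template <typename t>
constexpr bool is_four_bytes_v = is_four_bytes<t>::value;

// Alternative raised in the review: if the trait is never needed as a
// type, a plain variable template is enough.
template <typename t>
constexpr bool is_four_bytes_alt_v = sizeof(t) == 4;

static_assert(is_four_bytes_v<std::int32_t>);
static_assert(!is_four_bytes_alt_v<char>);
```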
        return this_view->padding_value;
    }
    else
    { // only increment if not at end.
Suggested change (whitespace only):
    { // only increment if not at end.
    // Thus, for the 8 sequences we need to load two times 16 consecutive bytes to fill the matrix.
    // This quadratic byte matrix can be transposed efficiently with simd instructions.
    constexpr int8_t max_size = simd_traits<max_simd_t>::length;
    constexpr int8_t num_chunks = max_size / chunk_size;
You called this chunks_per_load elsewhere:

- constexpr int8_t num_chunks = max_size / chunk_size;
+ constexpr int8_t num_chunks = chunks_per_load;
    // To fill the 16x16 matrix we need four 8x8 matrices.
    // Thus, for the 8 sequences we need to load two times 16 consecutive bytes to fill the matrix.
    // This quadratic byte matrix can be transposed efficiently with simd instructions.
    constexpr int8_t max_size = simd_traits<max_simd_t>::length;
- constexpr int8_t max_size = simd_traits<max_simd_t>::length;
+ constexpr int8_t max_size = simd_traits<simd_t>::max_length;
force-pushed from a1eee15 to 0328b07
@marehr ok, I think I addressed all your issues so far. Ready for the next ones 😏
Phew, I think the high-level design seems fine, but under the hood it is pretty messy.
{
    detail::transpose_matrix_sse4(matrix);
}
else // Element wise transpose matrix which is possibly auto vectorised.
I imagine you didn't test that
I tested everything! For SSE4, AVX2 and AVX512, and with no extension at all.
I meant that it auto vectorises.
Yes, I tested with the auto vectorisation, and in fact the intrinsics version was roughly 20% faster. So I decided to add it, but kept the auto vectorisation available for the larger instruction sets for now.
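The auto-vectorisable fallback being discussed can be sketched as a plain element-wise loop (a simplified stand-in, not the PR's actual code): the compiler may vectorise this on its own, while the intrinsics path (e.g. transpose_matrix_sse4) replaces it on matching targets.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Element-wise in-place transpose of a square byte matrix. A plain
// double loop like this is a candidate for compiler auto-vectorisation;
// the dedicated SIMD intrinsics version is hand-tuned instead.
template <std::size_t dim>
constexpr void transpose(std::array<std::array<std::int8_t, dim>, dim> & matrix)
{
    for (std::size_t i = 0; i < dim; ++i)
        for (std::size_t j = 0; j < i; ++j)
            std::swap(matrix[i][j], matrix[j][i]);
}
```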
template <Simd target_simd_t, Simd source_simd_t>
constexpr target_simd_t upcast_signed(source_simd_t const & src)
{
    if constexpr (simd_traits<source_simd_t>::max_length == 16) // SSE4
This works for now, but I'm not really a fan of the current design. It does not check whether the current architecture really supports SSE4, AVX2 and AVX512.
It will fail ungracefully if you create a simd vector that has AVX512 size, but the architecture does not include AVX512.
Well, I hope you don't mind that I really don't care about corner cases right now. We don't even have a proper testing system for this yet. Not sure if you plan to add one sometime soon, but it was already quite a bit of work to test everything properly by hand. In general, the whole design can/should be adapted to the SIMD proposal, but this is not yet relevant. We can make it safe once we have the algorithms.
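The dispatch-on-vector-size pattern being criticised can be sketched as follows (simplified; the length thresholds mirror the review comment, everything else is illustrative). As noted, it selects purely on vector width at compile time and does not verify that the running CPU actually provides the instruction set.

```cpp
#include <string_view>

// Compile-time backend selection keyed on the SIMD vector's maximum
// byte length, as in the if-constexpr chain above. No runtime CPU
// feature detection happens here.
template <int max_length>
constexpr std::string_view chosen_backend()
{
    if constexpr (max_length == 16)      // 128-bit vectors -> SSE4
        return "sse4";
    else if constexpr (max_length == 32) // 256-bit vectors -> AVX2
        return "avx2";
    else if constexpr (max_length == 64) // 512-bit vectors -> AVX512
        return "avx512";
    else
        return "scalar fallback";
}

static_assert(chosen_backend<32>() == "avx2");
```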
        debug_stream << "\n\n";
    }
    return 0;
}
no output? it would be helpful to provide output.
You mean a file containing the output? Or a comment with the output?
Either would be fine
{
    auto & it = cached_iter[i];
    max_simd_type & tmp = matrix[pos];
    tmp = simd::fill<max_simd_type>(~0);
Why not fill it here with the padding value and omit the ~0 semantics?
Because the padding value is based on the scalar type of the target vector size, which might be bigger than one byte.
force-pushed from 0328b07 to e1ef5f5
@marehr I either added all your requests or answered your comments.
Thank you, I'll have a (second) look :)
lgtm
force-pushed from e1ef5f5 to 86a55af
@marehr I know you already agreed upon everything, but I applied 99% of your suggestions. Maybe you want to still have a look?
Implements the to_simd view which does AoS to SoA transformation:
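The AoS-to-SoA idea behind the view can be sketched in scalar form (a hypothetical helper for illustration, not the view's actual code): several sequences laid out one after another are regrouped into columns, where column i holds the i-th character of every sequence, and exhausted sequences are filled with a padding value.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Scalar sketch of the AoS -> SoA transform: `lanes` input sequences
// become a vector of columns; real code performs this with SIMD loads
// and an efficient in-register transpose instead of scalar copies.
template <std::size_t lanes>
std::vector<std::array<std::int8_t, lanes>>
aos_to_soa(std::array<std::vector<std::int8_t>, lanes> const & seqs, std::int8_t padding)
{
    std::size_t max_len = 0;
    for (auto const & seq : seqs)
        max_len = std::max(max_len, seq.size());

    std::vector<std::array<std::int8_t, lanes>> columns(max_len);
    for (std::size_t i = 0; i < max_len; ++i)
        for (std::size_t lane = 0; lane < lanes; ++lane)
            columns[i][lane] = i < seqs[lane].size() ? seqs[lane][i] : padding;
    return columns;
}
```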