
Feature/view to simd #1190

Merged — 6 commits merged into seqan:master, Aug 21, 2019

Conversation

@rrahn (Contributor) commented Jul 15, 2019

Implements the to_simd view which does AoS to SoA transformation:

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
// Naive implementation
to_simd_naive<std::vector<dna4>, simd_type_t<int8_t>>        7223 ns         7196 ns        86953 value=65.3887M
to_simd_naive<std::vector<dna4>, simd_type_t<int16_t>>       1938 ns         1934 ns       371708 value=279.524M
to_simd_naive<std::vector<dna4>, simd_type_t<int32_t>>       1078 ns         1074 ns       646675 value=486.3M
to_simd_naive<std::vector<dna4>, simd_type_t<int64_t>>        673 ns          671 ns       948484 value=713.26M
to_simd_naive<std::deque<dna4>, simd_type_t<int8_t>>        24870 ns        24838 ns        28445 value=21.3906M
to_simd_naive<std::deque<dna4>, simd_type_t<int16_t>>        7647 ns         7594 ns        90960 value=68.4019M
to_simd_naive<std::deque<dna4>, simd_type_t<int32_t>>        2087 ns         2083 ns       332964 value=250.389M
to_simd_naive<std::deque<dna4>, simd_type_t<int64_t>>        1140 ns         1138 ns       619266 value=465.688M

// View implementation
to_simd<std::vector<dna4>, simd_type_t<int8_t>>              2326 ns         2320 ns       305367 value=229.636M
to_simd<std::vector<dna4>, simd_type_t<int16_t>>             1462 ns         1459 ns       479160 value=360.328M
to_simd<std::vector<dna4>, simd_type_t<int32_t>>              638 ns          637 ns      1075946 value=809.111M
to_simd<std::vector<dna4>, simd_type_t<int64_t>>              348 ns          347 ns      2019136 value=1.51839G
to_simd<std::deque<dna4>, simd_type_t<int8_t>>               9592 ns         9568 ns        72257 value=54.3373M
to_simd<std::deque<dna4>, simd_type_t<int16_t>>              5360 ns         5351 ns       128215 value=96.4177M
to_simd<std::deque<dna4>, simd_type_t<int32_t>>              1573 ns         1568 ns       459113 value=345.253M
to_simd<std::deque<dna4>, simd_type_t<int64_t>>               786 ns          784 ns       774791 value=582.643M
to_simd<std::list<dna4>, simd_type_t<int8_t>>               13080 ns        13053 ns        53440 value=40.1869M
to_simd<std::list<dna4>, simd_type_t<int16_t>>               4550 ns         4539 ns       152027 value=114.324M
to_simd<std::list<dna4>, simd_type_t<int32_t>>               1053 ns         1049 ns       668098 value=502.41M
to_simd<std::list<dna4>, simd_type_t<int64_t>>                847 ns          845 ns       934093 value=702.438M

@rrahn rrahn requested a review from marehr July 15, 2019 17:39
@rrahn rrahn force-pushed the feature/view_to_simd branch 2 times, most recently from 44d82c9 to 66b1684 Compare July 15, 2019 22:29
@codecov bot commented Jul 15, 2019

Codecov Report

Merging #1190 into master will decrease coverage by 0.07%.
The diff coverage is 91.93%.

@@            Coverage Diff             @@
##           master    #1190      +/-   ##
==========================================
- Coverage   96.86%   96.79%   -0.08%     
==========================================
  Files         212      213       +1     
  Lines        8274     8403     +129     
==========================================
+ Hits         8015     8134     +119     
- Misses        259      269      +10
Impacted Files Coverage Δ
include/seqan3/core/simd/simd_algorithm.hpp 100% <100%> (ø) ⬆️
include/seqan3/core/simd/view_to_simd.hpp 90.47% <90.47%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a0354a...86a55af.

@rrahn rrahn force-pushed the feature/view_to_simd branch from 66b1684 to c6de8be Compare July 16, 2019 15:33
@rrahn (Contributor, Author) commented Jul 18, 2019

@marehr ping

@marehr (Member) left a comment:

Review until now

@@ -29,36 +29,36 @@ namespace seqan3::detail
// error: invalid use of incomplete type ‘struct incomplete::template_type<int>’
// requires std::Same<decltype(a - b), simd_t>;
template <typename simd_t>
SEQAN3_CONCEPT Simd = requires (simd_t a, simd_t b)
SEQAN3_CONCEPT Simd = requires (std::remove_reference_t<simd_t> a, std::remove_reference_t<simd_t> b)
@marehr (Member):
a and b should be without std::remove_reference_t<> because the operations (like a == b, a != b) should still be valid for reference types. And from a user's perspective, one should be able to expect that an expression a == b works for the given type.

Suggested change
SEQAN3_CONCEPT Simd = requires (std::remove_reference_t<simd_t> a, std::remove_reference_t<simd_t> b)
SEQAN3_CONCEPT Simd = requires (simd_t a, simd_t b)

The rest of the std::remove_reference_t<> in this concept are okay.

On a side note what happens with const simds? Should we introduce a Writable/MutableSimd concept?

@rrahn (Contributor, Author):

I thought about it and, honestly, I think we need to distinguish between them. Also, for many operations we seldom update a vector in place but always get a new one returned. That's at least my feeling about how we use it in the algorithms.

include/seqan3/core/simd/concept.hpp (resolved comment)
* \see seqan3::detail::is_native_builtin_simd_v
*/
template <typename builtin_simd_t>
constexpr bool is_native_builtin_simd_v = is_native_builtin_simd<builtin_simd_t>::value;
@marehr (Member):

Why not evaluate it here as a lambda function? That would be more readable.

@rrahn (Contributor, Author):

We can also just define the bool constants if we don't need it as a type anyway. Otherwise it follows the STL way of providing unary type traits.

include/seqan3/core/simd/view_to_simd.hpp (two resolved comments, outdated)
return this_view->padding_value;
}
else
{ // only increment if not at end.
@marehr (Member):

Suggested change
{ // only increment if not at end.
{ // only increment if not at end.

// Thus, for the 8 sequences we need to load two times 16 consecutive bytes to fill the matrix.
// This quadratic byte matrix can be transposed efficiently with simd instructions.
constexpr int8_t max_size = simd_traits<max_simd_t>::length;
constexpr int8_t num_chunks = max_size / chunk_size;
@marehr (Member):

This is what you called chunks_per_load:

Suggested change
constexpr int8_t num_chunks = max_size / chunk_size;
constexpr int8_t num_chunks = chunks_per_load;

// To fill the 16x16 matrix we need four 8x8 matrices.
// Thus, for the 8 sequences we need to load two times 16 consecutive bytes to fill the matrix.
// This quadratic byte matrix can be transposed efficiently with simd instructions.
constexpr int8_t max_size = simd_traits<max_simd_t>::length;
@marehr (Member):

Suggested change
constexpr int8_t max_size = simd_traits<max_simd_t>::length;
constexpr int8_t max_size = simd_traits<simd_t>::max_length;

@rrahn rrahn force-pushed the feature/view_to_simd branch 3 times, most recently from a1eee15 to 0328b07 Compare August 8, 2019 11:57
@rrahn (Contributor, Author) commented Aug 8, 2019

@marehr ok, I think I addressed all your issues so far. Ready for the next ones 😏

@rrahn rrahn mentioned this pull request Aug 8, 2019
@rrahn rrahn requested a review from marehr August 8, 2019 14:15
@marehr (Member) left a comment:

Puh, I think the high-level design seems fine, but under the hood it is pretty messy.

include/seqan3/core/simd/simd_algorithm.hpp (resolved comment, outdated)
{
detail::transpose_matrix_sse4(matrix);
}
else // Element wise transpose matrix which is possibly auto vectorised.
@marehr (Member):

I imagine you didn't test that

@rrahn (Contributor, Author):

I tested everything! For SSE4, AVX2, AVX512, and no extension at all.

@marehr (Member):

I meant that it auto vectorises.

@rrahn (Contributor, Author):

Yes, I tested with auto-vectorisation, and in fact the intrinsics version was roughly 20% faster. So I decided to add it, but kept the auto-vectorisation available for the larger instruction sets for now.

template <Simd target_simd_t, Simd source_simd_t>
constexpr target_simd_t upcast_signed(source_simd_t const & src)
{
if constexpr (simd_traits<source_simd_t>::max_length == 16) // SSE4
@marehr (Member):

This works for now, but I'm not really a fan of the current design. It does not check whether the current architecture really supports SSE4, AVX2, and AVX512.

It will fail ungracefully if you create a simd vector that has AVX512 size, but the architecture does not include AVX512.

@rrahn (Contributor, Author):

Well, I hope you don't mind that I really don't care about corner cases right now. We don't even have a proper testing system for this at the moment. I am not sure if you plan to add one soon, but it was already quite a bit of work to test everything properly by hand. In general, the whole design can/should be adapted to the SIMD proposal, but that is not yet relevant. We can make it safe once we have the algorithms.

include/seqan3/core/simd/view_to_simd.hpp (resolved comment, outdated)
debug_stream << "\n\n";
}
return 0;
}
@marehr (Member):

No output? It would be helpful to provide the output.

@rrahn (Contributor, Author):

You mean a file containing the output? Or a comment with the output?

@marehr (Member):

Either would be fine

include/seqan3/core/simd/view_to_simd.hpp (four resolved comments, two outdated)
{
auto & it = cached_iter[i];
max_simd_type & tmp = matrix[pos];
tmp = simd::fill<max_simd_type>(~0);
@marehr (Member):

Why not fill it here with the padding value and omit the ~0 semantics?

@rrahn (Contributor, Author):

Because the padding value is based on the scalar type of the target vector, which might be bigger than one byte.

@rrahn rrahn force-pushed the feature/view_to_simd branch from 0328b07 to e1ef5f5 Compare August 14, 2019 12:31
@rrahn rrahn requested a review from marehr August 14, 2019 12:31
@rrahn (Contributor, Author) commented Aug 14, 2019

@marehr I have either addressed all of your requests or answered your comments.

@marehr (Member) commented Aug 14, 2019

Thank you, I'll have a (second) look :)

@marehr (Member) left a comment:

lgtm

@rrahn rrahn force-pushed the feature/view_to_simd branch from e1ef5f5 to 86a55af Compare August 20, 2019 12:50
@rrahn
Copy link
Contributor Author

rrahn commented Aug 20, 2019

@marehr I know you already approved everything, but I applied 99% of your suggestions. Maybe you still want to have a look?

@rrahn rrahn merged commit 6f15cc5 into seqan:master Aug 21, 2019
@rrahn rrahn deleted the feature/view_to_simd branch September 3, 2019 07:53