Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable sse2/avx2 if available #7

Merged
merged 5 commits into from
Nov 7, 2023
Merged

Enable sse2/avx2 if available #7

merged 5 commits into from
Nov 7, 2023

Conversation

mulimoen
Copy link
Owner

@mulimoen mulimoen commented Nov 6, 2023

No description provided.

@mulimoen
Copy link
Owner Author

mulimoen commented Nov 6, 2023

@LDeakin Does this PR work for you in getting the intrinsics version of shuffles? This requires RUSTFLAGS=-C target-feature=+avx2,+sse2 and will not be set automatically as it would in CMake

@mulimoen mulimoen force-pushed the bugfix/enable_intrinsics branch from f1fce0a to 07cf938 Compare November 6, 2023 22:32
@mulimoen mulimoen force-pushed the bugfix/enable_intrinsics branch from 07cf938 to a15c3b2 Compare November 6, 2023 22:36
@LDeakin
Copy link
Contributor

LDeakin commented Nov 6, 2023

Yes, that does the trick! Although, I don't even need to set RUSTFLAGS=-C target-feature=+avx2,+sse2.

@LDeakin
Copy link
Contributor

LDeakin commented Nov 6, 2023

Update: with RUSTFLAGS="-C target-feature=+avx2,+sse2", I get a crash. This does not occur on the main branch, so this seems like an issue with blosc-c.

@mulimoen
Copy link
Owner Author

mulimoen commented Nov 6, 2023

Yeah, I see the crash when using avx2, I will fix this

@LDeakin
Copy link
Contributor

LDeakin commented Nov 6, 2023

You probably need these compile flags https://github.com/Blosc/c-blosc/blob/bdbb700787a2485a756eea63ced268f9a00f6370/blosc/CMakeLists.txt#L157-L192

@mulimoen
Copy link
Owner Author

mulimoen commented Nov 6, 2023

Need to pass these flags to cc as this is not done automatically, rust-lang/cc-rs#268. I'll see what I can do tomorrow

@LDeakin
Copy link
Contributor

LDeakin commented Nov 6, 2023

3290ebd works and I've tested it with a suite of external benchmarks, LGTM.

@mulimoen mulimoen merged commit a472285 into main Nov 7, 2023
4 checks passed
@mulimoen
Copy link
Owner Author

mulimoen commented Nov 7, 2023

Thanks for picking up that this crate did not use SSE2/AVX @LDeakin. I will release a new version later today

@mulimoen mulimoen deleted the bugfix/enable_intrinsics branch November 7, 2023 07:21
@LDeakin
Copy link
Contributor

LDeakin commented Nov 7, 2023

No worries, thanks for addressing it so quickly!

@mulimoen mulimoen mentioned this pull request Nov 7, 2023
LDeakin added a commit to LDeakin/zarrs that referenced this pull request Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants