remote_cache/digest: add benchmark for sha256-simd #4547
Provide a setup to compare minio/sha256-simd (Apache 2.0 license) performance vs the Go standard library "crypto/sha256".

The `sha256-simd` library comes with 2 modes:
- without server: automatically detects CPU features
- with server: requires AVX512 CPU features

The ARM64 support is not tested.

Running the benchmark against our remote executor yields:

```
==================== Test output for //server/remote_cache/digest:simd_bench_test:
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
BenchmarkSIMDDigestCompute/without_SIMD/1-30                      255240        5042 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_no_server/1-30               234526        5190 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_with_server/1-30                388    36804140 ns/op
BenchmarkSIMDDigestCompute/without_SIMD/10-30                      10000      118668 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_no_server/10-30               26204       64872 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_with_server/10-30               100    62445228 ns/op
BenchmarkSIMDDigestCompute/without_SIMD/100-30                     10000      193471 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_no_server/100-30              20247      135334 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_with_server/100-30              100    64685802 ns/op
BenchmarkSIMDDigestCompute/without_SIMD/1000-30                    14314      188163 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_no_server/1000-30             10000      176901 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_with_server/1000-30             100   212289431 ns/op
BenchmarkSIMDDigestCompute/without_SIMD/10000-30                    9067      658089 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_no_server/10000-30            10000      721403 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_with_server/10000-30            100   234613900 ns/op
BenchmarkSIMDDigestCompute/without_SIMD/100000-30                   2685     1577976 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_no_server/100000-30            1924     1079974 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_with_server/100000-30           100   146595705 ns/op
BenchmarkSIMDDigestCompute/without_SIMD/1000000-30                   312     9117083 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_no_server/1000000-30            298    13086220 ns/op
BenchmarkSIMDDigestCompute/with_SIMD_with_server/1000000-30           56   211401036 ns/op
PASS
================================================================================
```
func hasherWithServer() hash.Hash {
	server := sha256simd.NewAvx512Server()
	return sha256simd.NewAvx512(server)
}
In real usage, would we reuse this server across requests? I wonder if the server
should be declared as a top-level var instead of creating a new server on every iteration.
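The difference between the two approaches in this comment can be sketched as follows. This is an illustration only: `fakeServer`, `newFakeServer`, and `newHasher` are hypothetical stand-ins and not part of `sha256-simd`, and the standard library hasher is used so the sketch runs without extra dependencies.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"sync"
)

// fakeServer is a hypothetical stand-in for an expensive shared resource
// such as the AVX512 server; it is not part of any real library.
type fakeServer struct{}

var (
	serversCreated int // counts constructions, for illustration only
	sharedOnce     sync.Once
	sharedServer   *fakeServer
)

// newFakeServer records each construction. The counter is not
// goroutine-safe; this sketch runs on a single goroutine.
func newFakeServer() *fakeServer {
	serversCreated++
	return &fakeServer{}
}

// newHasher stands in for a constructor that binds a hasher to a server.
func newHasher(_ *fakeServer) hash.Hash { return sha256.New() }

// hasherPerCall mirrors the reviewed code: a new server for every hasher.
func hasherPerCall() hash.Hash { return newHasher(newFakeServer()) }

// hasherShared creates the server once and reuses it for every hasher.
func hasherShared() hash.Hash {
	sharedOnce.Do(func() { sharedServer = newFakeServer() })
	return newHasher(sharedServer)
}

func main() {
	for i := 0; i < 3; i++ {
		hasherPerCall()
	}
	fmt.Println("servers created per call:", serversCreated)

	serversCreated = 0
	for i := 0; i < 3; i++ {
		hasherShared()
	}
	fmt.Println("servers created when shared:", serversCreated)
}
```

With a shared server, the construction cost is paid once instead of on every benchmark iteration, which is the situation the question is asking about.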
https://github.com/minio/sha256-simd/blob/master/README.md#support-for-avx512
> Due to this different way of scheduling, we decided to use an explicit method to instantiate the AVX512 version.
> Essentially one or more AVX512 processing servers (Avx512Server) have to be created whereby each server can hash over 3 GB/s on a single core. An hash.Hash object (Avx512Digest) is then instantiated using one of these servers and used in the regular fashion:
I think the expectation here is to create 1 server for each core? There are not a lot of examples 🤔
Since the doc mentioned speed up for cases

Overall, the constraints of (1) a bigger message size, (2) a 1-1 mapping between servers and CPU cores, and (3) message padding for alignment make it quite unattractive for our use case. Gonna close this for now.
After digging into this a bit more, it seems like the CPU we have on GCP, at least for our executor, does not include Intel's SHA extension.
And the minio/sha256-simd code has this clause: https://github.com/minio/sha256-simd/blob/6096f891a77bfe490cbea7a424c821b5fdb92849/cpuid_other.go#L27

So when we use

The AVX512 implementation is mostly targeted toward hashing bigger files/messages and thus is not suitable for our use case for now.

The ARM64 implementation could be attractive for ARM64 executors (Linux / macOS) down the line, but my benchmark on an M1 laptop does not show a big speed-up.

Pushed my latest local setup to the branch so future me / other folks could replicate the experiment.
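One way to confirm which extensions a Linux x86 executor exposes is to inspect the `flags` line of `/proc/cpuinfo`, where the SHA extension appears as `sha_ni` and the AVX512 foundation as `avx512f`. The sketch below is not how `sha256-simd` performs its detection; `hasCPUFlag` is a hypothetical helper that only uses the standard library.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasCPUFlag reports whether flag appears on a "flags" line of
// /proc/cpuinfo-style text (x86 Linux format). Hypothetical helper.
func hasCPUFlag(cpuinfo, flag string) bool {
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok || strings.TrimSpace(key) != "flags" {
			continue
		}
		for _, f := range strings.Fields(val) {
			if f == flag {
				return true
			}
		}
	}
	return false
}

func main() {
	raw, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		// Not Linux, or /proc is unavailable.
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	for _, flag := range []string{"sha_ni", "avx512f"} {
		fmt.Printf("%s: %v\n", flag, hasCPUFlag(string(raw), flag))
	}
}
```

Running this on the executor image would show whether the hardware-accelerated path is even available before benchmarking it.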
Related issues: N/A