Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hashtree crate with feature, add Criterion benchmarks #161

Closed
wants to merge 7 commits into from

Conversation

mempirate
Copy link

@mempirate mempirate commented Sep 6, 2024

Implements a new hash_chunks method that is used everywhere and will vary on implementation based on the features enabled.

Related to #156 but further optimizations are possible.

Notes

  • All hashing is now abstracted away over a single hash_chunks function, whose implementation differs based on the enabled features. I've tried adding a generic hasher trait and using dynamic dispatch, but this became pretty messy. I think this is cleaner.
  • Had to upgrade rustc to 1.74.0 to get Criterion working

Benchmarks

sha2 (default)

hash_tree_root_sha256   time:   [49.648 µs 49.814 µs 50.056 µs]
                        change: [-0.3203% +0.5809% +1.7785%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking generate_proof_sha256: Warming up for 3.0000 s
generate_proof_sha256   time:   [811.00 ms 815.02 ms 819.58 ms]
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

sha2-asm

hash_tree_root_sha256-asm
                        time:   [10.242 µs 10.252 µs 10.263 µs]
                        change: [+0.3393% +0.5229% +0.6984%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

Benchmarking generate_proof_sha256-asm: Warming up for 3.0000 s
generate_proof_sha256-asm
                        time:   [132.81 ms 133.67 ms 135.02 ms]
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe

hashtree

hash_tree_root_hashtree time:   [8.2075 µs 8.2200 µs 8.2385 µs]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

Benchmarking generate_proof_hashtree: Warming up for 3.0000 s
generate_proof_hashtree time:   [111.12 ms 111.86 ms 112.93 ms]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

In these benchmarks, hashtree is ~16% faster at generating proofs and ~20% faster at calculating hash tree roots compared to sha2-asm.

@mempirate mempirate marked this pull request as ready for review September 6, 2024 10:41
@mempirate
Copy link
Author

mempirate commented Sep 6, 2024

With the latest commit getting rid of heap allocation for chunks and replacing it with an array, benchmarks were even better for hashtree:

hash_tree_root_hashtree time:   [7.8244 µs 7.8345 µs 7.8479 µs]
                        change: [-4.8425% -4.6332% -4.4083%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

Benchmarking generate_proof_hashtree: Warming up for 3.0000 s
generate_proof_hashtree time:   [97.430 ms 97.634 ms 97.872 ms]
                        change: [-13.578% -12.719% -12.096%] (p = 0.00 < 0.05)
                        Performance has improved.

@mempirate mempirate changed the title Support hashtree crate with feature Support hashtree crate with feature, add Criterion benchmarks Sep 7, 2024
@potuz
Copy link

potuz commented Sep 9, 2024

I think this shouldn't close #156 this PR creates a wrapper that hashes two chunks, not necessarily contiguous with hashtree. As such it only benefits from the padding block prescheduling of hashtree but none of the vectorized/parallel pipelining. This is why you are getting only 16%--20% benefits from it. When hashing two chunks at the time you also incur on copies and allocations for every chunk you hash, something that is utterly unnecessary in most situations.

The approach to actually gain from this library is to hash one entire layer at the time, by passing all the layer, contiguously allocated, to the chunks slice in the function https://github.com/prysmaticlabs/hashtree/blob/main/examples/basic_usage.rs#L36

You can hash a Merkle tree of base 2^{N+1} by allocating only 2^N for the first layer, and then rewriting that allocated layer on successive runs, by passing the same slice as chunks and out.

@mempirate
Copy link
Author

@potuz agreed this doesn't resolve the whole issue. I will follow this up with another PR for the other performance improvements you mentioned.

@mempirate
Copy link
Author

Closing in favor of #172 #169

@mempirate mempirate closed this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants