Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: change hashing, curve and cipher libraries to @noble #322

Merged
merged 4 commits into from
Jul 17, 2023

Conversation

mpetrunic
Copy link
Member

@mpetrunic mpetrunic commented Apr 27, 2023

resolves #321
resolves #255

Signed-off-by: Marin Petrunic <[email protected]>
@mpetrunic
Copy link
Member Author

mpetrunic commented Apr 27, 2023

@paulmillr x25519 curve is a lot slower than stablelib implementation. It causes ~35% worse benchmark performance.

I've created #323 to update hashing libs. They seems to be pretty much the same in terms of performance

@paulmillr
Copy link

paulmillr commented Apr 27, 2023

Not sure why 35%, from my low-level benchmark of x25519 it's 21%:

noble x 1,510 ops/sec @ 661μs/op
stablelib x 1,828 ops/sec @ 546μs/op

Nevertheless, the slowdown is easily explained.

Stablelib is extremely hard to audit because it uses unrolled loops with 5x? code. My philosophy is that cryptography should be easy to audit and reason about.

@mpetrunic
Copy link
Member Author

Not sure why 35%, from my low-level benchmark of x25519 it's 21%:

Probably because it's benchmarking negotiating connections between two peers and that negotiation is not parallel so x25519 methods are called on both source and destination peer sequentially

@mpetrunic
Copy link
Member Author

@ChainSafe/lodestar ^^

@twoeths
Copy link
Contributor

twoeths commented May 4, 2023

lodestar has to struggle a lot due to performance issue, we have a lot of benchmark tests there as part of CI, everything is "performance first"

in this case I'd go with #323 if it has comparable performance to the latest release, please "~35% worse benchmark performance" is not acceptable to us 🙏

@paulmillr
Copy link

hkdf and sha256 are faster or similar to what you're having right now.

@wemeetagain
Copy link
Member

This PR is worth dusting off considering #321 (comment)

The lodestar case shouldn't be a blocker here since we override noise crypto implementation there anyways.

@mpetrunic mpetrunic changed the title fix: change hashing and curve libraries to @noble fix: change hashing, curve and cipher libraries to @noble Jul 11, 2023
@mpetrunic
Copy link
Member Author

@paulmillr @wemeetagain I'm hitting errors:

    RangeError: Error occurred during XX handshake: start offset of Uint32Array should be a multiple of 4
      at new Uint32Array (<anonymous>)
      at u32 (file:///home/mpetrunic/Projects/ChainSafe/libp2p/js-libp2p-noise/node_modules/@noble/ciphers/src/utils.ts:13:3)
      at file:///home/mpetrunic/Projects/ChainSafe/libp2p/js-libp2p-noise/node_modules/@noble/ciphers/src/_salsa.ts:149:17
      at Object.decrypt (file:///home/mpetrunic/Projects/ChainSafe/libp2p/js-libp2p-noise/node_modules/@noble/ciphers/src/chacha.ts:227:16)
      at Object.chaCha20Poly1305Decrypt (file:///home/mpetrunic/Projects/ChainSafe/libp2p/js-libp2p-noise/src/crypto/js.ts:58:5

On other notes, memory wise, it would be better if we could pass destination Uint8Array to chacha-poly1305 decrypt to save on memory allocation.

TODO: I could try to cache chachapoly function^^

@paulmillr
Copy link

it would be better if we could pass destination Uint8Array

This makes sense and pure-non-poly-chacha has this feature.

Chapoly doesn't. Chapoly ciphertext is 16 bytes longer than plaintext, which means we can't reuse plaintext u8a, for one. So, it's not radically simpler. Also i'm not sure if it will be much more performant.

@paulmillr
Copy link

Regarding the error, i've cloned the repository, but it tries to access bafybeievsa472emtpzcl5yjykw2mnqq7xav2aisk4rxrokhhtda6s3puau.ipfs.w3s.link on npm install, which my setup blocks, so if you'll want me to help, i'll need a proper small reproducible test case.

@mpetrunic
Copy link
Member Author

it would be better if we could pass destination Uint8Array

This makes sense and pure-non-poly-chacha has this feature.

Chapoly doesn't. Chapoly ciphertext is 16 bytes longer than plaintext, which means we can't reuse plaintext u8a, for one. So, it's not radically simpler. Also i'm not sure if it will be much more performant.

I agree for encryption it doesn't make sense but when decrypting we are passing ciphertext as destination not plaintext

@mpetrunic
Copy link
Member Author

Regarding the error, i've cloned the repository, but it tries to access bafybeievsa472emtpzcl5yjykw2mnqq7xav2aisk4rxrokhhtda6s3puau.ipfs.w3s.link on npm install, which my setup blocks, so if you'll want me to help, i'll need a proper small reproducible test case.

Here you go: paulmillr/noble-ciphers#2

@paulmillr
Copy link

fixed now

Signed-off-by: Marin Petrunic <[email protected]>
@mpetrunic mpetrunic marked this pull request as ready for review July 17, 2023 10:09
@mpetrunic mpetrunic requested a review from a team as a code owner July 17, 2023 10:09
@mpetrunic
Copy link
Member Author

it would be better if we could pass destination Uint8Array

This makes sense and pure-non-poly-chacha has this feature.
Chapoly doesn't. Chapoly ciphertext is 16 bytes longer than plaintext, which means we can't reuse plaintext u8a, for one. So, it's not radically simpler. Also i'm not sure if it will be much more performant.

I agree for encryption it doesn't make sense but when decrypting we are passing ciphertext as destination not plaintext

@paulmillr do you have plans of enabling that?

@wemeetagain maybe worth checking perf of assemblyscript version vs pure js. Although I imagine not reusing buffers when decrypting are gonna be blocker^^

In any case, this should be ready to go.

@mpetrunic
Copy link
Member Author

fixed now

tnx!

@wemeetagain
Copy link
Member

Lodestar will probably keep using our own implementation, as you mentioned, we rely on perf benefits of avoiding allocation as described in #242

Would definitely love to bench against noble (esp if it gets this feature too)

@wemeetagain wemeetagain merged commit 5b4f4f2 into master Jul 17, 2023
@wemeetagain wemeetagain deleted the change-hashes-curve-libs branch July 17, 2023 15:24
@paulmillr
Copy link

chainsafe/as-chacha20poly1305 is slower than noble-ciphers on meaningful (1KB+) inputs. Proof (will commit benchmark to ciphers repo):

chacha20_poly1305 (encrypt, 32B)
├─node x 521,104 ops/sec @ 1μs/op
├─stable x 572,737 ops/sec @ 1μs/op
├─chainsafe x 1,144,164 ops/sec @ 874ns/op
├─noble x 463,392 ops/sec @ 2μs/op
└─micro x 157,430 ops/sec @ 6μs/op
chacha20_poly1305 (decrypt, 32B)
├─node x 555,864 ops/sec @ 1μs/op
├─stable x 563,063 ops/sec @ 1μs/op
├─chainsafe x 1,044,932 ops/sec @ 957ns/op
├─noble x 446,030 ops/sec @ 2μs/op
└─micro x 155,714 ops/sec @ 6μs/op
chacha20_poly1305 (encrypt, 64B)
├─node x 498,256 ops/sec @ 2μs/op
├─stable x 523,834 ops/sec @ 1μs/op
├─chainsafe x 871,839 ops/sec @ 1μs/op ± 1.42% (min: 958ns, max: 2ms)
├─noble x 441,306 ops/sec @ 2μs/op
└─micro x 131,233 ops/sec @ 7μs/op
chacha20_poly1305 (decrypt, 64B)
├─node x 549,752 ops/sec @ 1μs/op ± 1.39% (min: 1μs, max: 7ms)
├─stable x 517,063 ops/sec @ 1μs/op
├─chainsafe x 919,963 ops/sec @ 1μs/op ± 1.49% (min: 916ns, max: 6ms)
├─noble x 439,174 ops/sec @ 2μs/op
└─micro x 134,282 ops/sec @ 7μs/op
chacha20_poly1305 (encrypt, 1KB)
├─node x 330,141 ops/sec @ 3μs/op ± 3.28% (min: 2μs, max: 1ms)
├─stable x 104,253 ops/sec @ 9μs/op
├─chainsafe x 135,427 ops/sec @ 7μs/op
├─noble x 140,567 ops/sec @ 7μs/op
└─micro x 18,770 ops/sec @ 53μs/op
chacha20_poly1305 (decrypt, 1KB)
├─node x 320,512 ops/sec @ 3μs/op ± 3.71% (min: 2μs, max: 1ms)
├─stable x 102,912 ops/sec @ 9μs/op
├─chainsafe x 138,773 ops/sec @ 7μs/op
├─noble x 139,918 ops/sec @ 7μs/op
└─micro x 18,782 ops/sec @ 53μs/op
chacha20_poly1305 (encrypt, 8KB)
├─node x 115,008 ops/sec @ 8μs/op ± 2.24% (min: 6μs, max: 380μs)
├─stable x 14,781 ops/sec @ 67μs/op
├─chainsafe x 18,317 ops/sec @ 54μs/op
├─noble x 23,499 ops/sec @ 42μs/op
└─micro x 2,561 ops/sec @ 390μs/op
chacha20_poly1305 (decrypt, 8KB)
├─node x 104,536 ops/sec @ 9μs/op ± 2.06% (min: 7μs, max: 373μs)
├─stable x 14,791 ops/sec @ 67μs/op
├─chainsafe x 18,672 ops/sec @ 53μs/op
├─noble x 23,473 ops/sec @ 42μs/op
└─micro x 2,576 ops/sec @ 388μs/op
chacha20_poly1305 (encrypt, 1MB)
├─node x 1,363 ops/sec @ 733μs/op
├─stable x 120 ops/sec @ 8ms/op
├─chainsafe x 149 ops/sec @ 6ms/op
├─noble x 195 ops/sec @ 5ms/op
└─micro x 20 ops/sec @ 49ms/op
chacha20_poly1305 (decrypt, 1MB)
├─node x 1,315 ops/sec @ 760μs/op
├─stable x 120 ops/sec @ 8ms/op
├─chainsafe x 153 ops/sec @ 6ms/op
├─noble x 196 ops/sec @ 5ms/op
└─micro x 20 ops/sec @ 49ms/op

@twoeths
Copy link
Contributor

twoeths commented Jul 23, 2023

Thanks @paulmillr for the benchmark, it looks great. Lodestar would definitely switch to noble if it supports an optional param dst?: Uint8Array in its decrypt function here https://github.com/ChainSafe/js-libp2p-noise/pull/322/files#diff-c3ec8eb75188633a6a283334b7ca30f367c52c6bbf0af26d18e367f6b8ef6d4eR55

cc @wemeetagain @dapplion

@paulmillr
Copy link

@tuyennhv why? What's the need?

noble implements dst for pure salsa / chacha, because there you can use the same uintarray:

length(chacha(key, nonce, plaintext)) === length(plaintext)

In chacha-poly, length(plaintext) is 12-24 bytes smaller than ciphertext length, which means you can't reuse. If you were to use the plaintext uintarray for ciphertext, that would throw an error. If you were to use the ciphertext uintarray for plaintext, that would work, but your 12-24 remaining bytes would still be ciphertext. It's a mess.

@twoeths
Copy link
Contributor

twoeths commented Jul 25, 2023

@paulmillr it's because having to allocate new memory in open() function caused performance issue in Lodestar as described in #242

I know it's not ideal to mix plaintext with ciphertext, it took us a lot of time to review and test, finally we decided to get that PR in given a lot of performance issues we've had to deal with.

in the end I think that's not a concern of a library, you only need to provide an optional dst?: Uint8Array parameter, the application should know if it wants to use that parameter or not.

@paulmillr
Copy link

@tuyennhv if the performance difference is really that big, as with the 242 pull request you've mentioned, I can definitely add the dst.

I assume you'll do similar CPU tests before noble deployment.

@paulmillr
Copy link

@tuyennhv dst is live in 0.2.0

@mpetrunic
Copy link
Member Author

@tuyennhv dst is live in 0.2.0

It's not on npm yet?

@paulmillr
Copy link

@mpetrunic hmm, seems like GitHub CI auto-publish did not go through. I've pinged it one more time and now it's on NPM.

@twoeths
Copy link
Contributor

twoeths commented Aug 1, 2023

@tuyennhv dst is live in 0.2.0

@paulmillr thanks for this. When I add noble to benchmark, not sure why I get different results, the test is here ChainSafe/as-chacha20poly1305#7

according to this, noble is 5% faster than @chainsafe/as-chacha20poly1305 starting from 8kb. However a lot of data in lodestar is <= 512 bytes so we'll put this on hold if the benchmark is consistent to yours.

@paulmillr
Copy link

paulmillr commented Aug 1, 2023

Because your benchmark software is 1000 lines of code, and each additional feature could influence the result? How could you be sure that running inside of mocha context does not influence V8 garbage collection strategy, or just-in-time compiler? You should try this instead:

git clone [email protected]:paulmillr/noble-ciphers.git
cd noble-ciphers
npm install && npm run build && npm run bench:install
cd benchmark
node aead.js

and using micro-bmark

we'll put this on hold if the benchmark is consistent to yours

This is not just about perf (even though the benchmarks are wrong).

You are using wasm, wasm is unsupported on many platforms, which makes your library unsupported on many platforms. Remember ChainSafe/ssz#313: metamask was not able to use your wasm sha256 library.

@mpetrunic
Copy link
Member Author

@mpetrunic hmm, seems like GitHub CI auto-publish did not go through. I've pinged it one more time and now it's on NPM.

Seems like types aren't updated: paulmillr/noble-ciphers#4

@twoeths
Copy link
Contributor

twoeths commented Aug 2, 2023

@paulmillr I followed your instruction, the result is quite similar to my benchmark: noble starts getting better at 8kb:

==== chacha20_poly1305 ====
chacha20_poly1305 (encrypt, 64B)
├─node x 131,113 ops/sec @ 7μs/op ± 2.48% (min: 5μs, max: 12ms)
├─stable x 124,595 ops/sec @ 8μs/op
├─chainsafe x 464,900 ops/sec @ 2μs/op ± 2.19% (min: 1μs, max: 3ms)
├─noble x 169,233 ops/sec @ 5μs/op
└─micro x 56,535 ops/sec @ 17μs/op
chacha20_poly1305 (decrypt, 64B)
├─node x 148,016 ops/sec @ 6μs/op ± 2.89% (min: 5μs, max: 12ms)
├─stable x 136,128 ops/sec @ 7μs/op
├─chainsafe x 553,097 ops/sec @ 1μs/op ± 3.74% (min: 1μs, max: 4ms)
├─noble x 158,780 ops/sec @ 6μs/op
└─micro x 58,224 ops/sec @ 17μs/op
chacha20_poly1305 (encrypt, 1KB)
├─node x 102,438 ops/sec @ 9μs/op ± 4.26% (min: 7μs, max: 14ms)
├─stable x 31,598 ops/sec @ 31μs/op
├─chainsafe x 95,978 ops/sec @ 10μs/op ± 2.87% (min: 9μs, max: 10ms)
├─noble x 79,270 ops/sec @ 12μs/op
└─micro x 8,725 ops/sec @ 114μs/op
chacha20_poly1305 (decrypt, 1KB)
├─node x 86,820 ops/sec @ 11μs/op ± 7.70% (min: 7μs, max: 18ms)
├─stable x 35,469 ops/sec @ 28μs/op
├─chainsafe x 94,616 ops/sec @ 10μs/op ± 2.10% (min: 9μs, max: 9ms)
├─noble x 77,537 ops/sec @ 12μs/op
└─micro x 8,854 ops/sec @ 112μs/op
chacha20_poly1305 (encrypt, 8KB)
├─node x 45,179 ops/sec @ 22μs/op ± 4.43% (min: 16μs, max: 4ms)
├─stable x 4,873 ops/sec @ 205μs/op
├─chainsafe x 14,050 ops/sec @ 71μs/op
├─noble x 14,790 ops/sec @ 67μs/op ± 1.37% (min: 60μs, max: 6ms)
└─micro x 1,203 ops/sec @ 830μs/op
chacha20_poly1305 (decrypt, 8KB)
├─node x 39,901 ops/sec @ 25μs/op ± 3.29% (min: 17μs, max: 2ms)
├─stable x 5,637 ops/sec @ 177μs/op
├─chainsafe x 14,617 ops/sec @ 68μs/op
├─noble x 15,088 ops/sec @ 66μs/op ± 1.54% (min: 60μs, max: 5ms)
└─micro x 1,217 ops/sec @ 821μs/op
chacha20_poly1305 (encrypt, 1MB)
├─node x 545 ops/sec @ 1ms/op ± 6.65% (min: 1ms, max: 23ms)
├─stable x 39 ops/sec @ 25ms/op
├─chainsafe x 114 ops/sec @ 8ms/op
├─noble x 130 ops/sec @ 7ms/op
└─micro x 9 ops/sec @ 107ms/op
chacha20_poly1305 (decrypt, 1MB)
├─node x 567 ops/sec @ 1ms/op ± 4.08% (min: 1ms, max: 12ms)
├─stable x 46 ops/sec @ 21ms/op
├─chainsafe x 118 ops/sec @ 8ms/op
├─noble x 132 ops/sec @ 7ms/op
└─micro x 9 ops/sec @ 104ms/op

@paulmillr
Copy link

Interesting -- thanks for the test. I wonder why my m2 has parity on 1kb.

Still, it's wasm (ChainSafe/ssz#313).

@twoeths
Copy link
Contributor

twoeths commented Aug 2, 2023

If noble or any libraries, at some point, is consistently better than as-chacha20poly1305 then I'll use it for Lodestar which is not a library but an application (an ethereum consensus client).
A Lodestar full node is supposed to run on NodeJS environment so we only care about performance here

@paulmillr
Copy link

This is not about lodestar - this is about libp2p.

In ssz, you've added noble as default, but kept using native, performant version for lodestar. Seems like a good idea.

@mpetrunic
Copy link
Member Author

This is not about lodestar - this is about libp2p.

In ssz, you've added noble as default, but kept using native, performant version for lodestar. Seems like a good idea.

It's same here, noble (or previously stablelib) is default but lodestar uses as-chacha20poly1305^^

@paulmillr
Copy link

Great! 👏 So we're good here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Security: Replace js crypto packages with noble Migrate to as-chacha20poly1305
4 participants