Use as-chacha20poly1305 #4787

Merged: 4 commits, merged Nov 24, 2022

twoeths (Contributor) commented Nov 21, 2022

Motivation

The chacha20poly1305Decrypt function is expensive: on a Goerli node with 1000 validators, it takes ~10% of CPU time.
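For context, chacha20poly1305Decrypt performs ChaCha20-Poly1305 AEAD decryption: it recomputes the 16-byte Poly1305 tag over the associated data and ciphertext, rejects the message if the tag does not match, and only then returns the plaintext. The sketch below illustrates that operation using Node's built-in `crypto` module only; it is not the API of as-chacha20poly1305 or stablelib, and the helpers `nonceFromCounter`, `encrypt` and `decrypt` are hypothetical. It also assumes, although the PR does not say so, that the function decrypts libp2p Noise transport frames, so it follows the Noise ChaChaPoly conventions: a 96-bit nonce made of 4 zero bytes followed by a little-endian 64-bit counter, with the tag appended to the ciphertext.

```typescript
import { createCipheriv, createDecipheriv } from "crypto";
import type { CipherGCM, DecipherGCM } from "crypto";

// Poly1305 authentication tag length in bytes (Node's default for chacha20-poly1305).
const TAG_LENGTH = 16;

// Hypothetical helper: Noise-style 96-bit nonce = 4 zero bytes + 64-bit little-endian counter.
// The counter is a JS number here, so it is only exact up to 2^53.
function nonceFromCounter(counter: number): Buffer {
  const nonce = Buffer.alloc(12);
  nonce.writeUInt32LE(counter >>> 0, 4); // low 32 bits
  nonce.writeUInt32LE(Math.floor(counter / 2 ** 32), 8); // high bits
  return nonce;
}

// Hypothetical helper: encrypt, then append the 16-byte tag to the ciphertext.
function encrypt(key: Uint8Array, counter: number, ad: Uint8Array, plaintext: Uint8Array): Buffer {
  // @types/node versions type this mode differently; CipherGCM exposes the same setAAD/getAuthTag methods.
  const cipher = createCipheriv("chacha20-poly1305", key, nonceFromCounter(counter)) as unknown as CipherGCM;
  cipher.setAAD(ad);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([body, cipher.getAuthTag()]);
}

// Hypothetical helper: split off the tag, authenticate, and return the plaintext.
// decipher.final() throws if the tag does not match (wrong key, nonce, AD, or tampered data).
function decrypt(key: Uint8Array, counter: number, ad: Uint8Array, ciphertextWithTag: Uint8Array): Buffer {
  if (ciphertextWithTag.length < TAG_LENGTH) throw new Error("ciphertext shorter than tag");
  const body = ciphertextWithTag.subarray(0, ciphertextWithTag.length - TAG_LENGTH);
  const tag = ciphertextWithTag.subarray(ciphertextWithTag.length - TAG_LENGTH);
  const decipher = createDecipheriv("chacha20-poly1305", key, nonceFromCounter(counter)) as unknown as DecipherGCM;
  decipher.setAAD(ad);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}
```

Under that assumption, every encrypted frame a node receives passes through this tag check and decryption, so the function's share of CPU time grows with the volume of incoming messages.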

Description

Closes #4652

Results after a 3-day deployment:

  • The new profile shows that chacha20poly1305Decrypt now takes 3.4% of CPU time (down from ~10%) with the same number of RPC messages received.

Screen Shot 2022-11-21 at 09 52 07

  • There is some random variation across the metrics, but the main difference attributable to this PR is lower CPU time on all nodes (the GC pause time rate is only slightly lower).

Screen Shot 2022-11-21 at 09 55 41

  • The number of RPC messages (and of mesh peers) is the same on all nodes.

Screen Shot 2022-11-21 at 09 58 14

1121_lg1k_as-chacha20poly1305.cpuprofile.zip

github-actions bot commented Nov 21, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
| Benchmark suite | Current: 41dc272 | Previous: 3373687 | Ratio (Current / Previous) |
| --- | --- | --- | --- |
| getPubkeys - index2pubkey - req 1000 vs - 250000 vc | 1.8615 ms/op | 2.5305 ms/op | 0.74 |
| getPubkeys - validatorsArr - req 1000 vs - 250000 vc | 75.479 us/op | 91.341 us/op | 0.83 |
| BLS verify - blst-native | 1.8581 ms/op | 2.2514 ms/op | 0.83 |
| BLS verifyMultipleSignatures 3 - blst-native | 3.8059 ms/op | 4.4575 ms/op | 0.85 |
| BLS verifyMultipleSignatures 8 - blst-native | 8.1917 ms/op | 9.5944 ms/op | 0.85 |
| BLS verifyMultipleSignatures 32 - blst-native | 29.664 ms/op | 34.717 ms/op | 0.85 |
| BLS aggregatePubkeys 32 - blst-native | 39.272 us/op | 46.078 us/op | 0.85 |
| BLS aggregatePubkeys 128 - blst-native | 152.71 us/op | 174.44 us/op | 0.88 |
| getAttestationsForBlock | 89.825 ms/op | 103.81 ms/op | 0.87 |
| isKnown best case - 1 super set check | 520.00 ns/op | 480.00 ns/op | 1.08 |
| isKnown normal case - 2 super set checks | 506.00 ns/op | 466.00 ns/op | 1.09 |
| isKnown worse case - 16 super set checks | 512.00 ns/op | 484.00 ns/op | 1.06 |
| CheckpointStateCache - add get delete | 9.2340 us/op | 10.466 us/op | 0.88 |
| validate gossip signedAggregateAndProof - struct | 4.2655 ms/op | 5.2672 ms/op | 0.81 |
| validate gossip attestation - struct | 2.0271 ms/op | 2.4525 ms/op | 0.83 |
| pickEth1Vote - no votes | 2.0990 ms/op | 2.4053 ms/op | 0.87 |
| pickEth1Vote - max votes | 20.533 ms/op | 22.592 ms/op | 0.91 |
| pickEth1Vote - Eth1Data hashTreeRoot value x2048 | 11.068 ms/op | 13.014 ms/op | 0.85 |
| pickEth1Vote - Eth1Data hashTreeRoot tree x2048 | 20.954 ms/op | 23.932 ms/op | 0.88 |
| pickEth1Vote - Eth1Data fastSerialize value x2048 | 1.5587 ms/op | 1.8968 ms/op | 0.82 |
| pickEth1Vote - Eth1Data fastSerialize tree x2048 | 14.600 ms/op | 15.926 ms/op | 0.92 |
| bytes32 toHexString | 1.0850 us/op | 1.1460 us/op | 0.95 |
| bytes32 Buffer.toString(hex) | 709.00 ns/op | 807.00 ns/op | 0.88 |
| bytes32 Buffer.toString(hex) from Uint8Array | 991.00 ns/op | 1.0800 us/op | 0.92 |
| bytes32 Buffer.toString(hex) + 0x | 717.00 ns/op | 794.00 ns/op | 0.90 |
| Object access 1 prop | 0.36800 ns/op | 0.42600 ns/op | 0.86 |
| Map access 1 prop | 0.29200 ns/op | 0.33800 ns/op | 0.86 |
| Object get x1000 | 17.003 ns/op | 20.628 ns/op | 0.82 |
| Map get x1000 | 1.0020 ns/op | 1.1600 ns/op | 0.86 |
| Object set x1000 | 121.25 ns/op | 139.81 ns/op | 0.87 |
| Map set x1000 | 71.877 ns/op | 78.706 ns/op | 0.91 |
| Return object 10000 times | 0.36860 ns/op | 0.42370 ns/op | 0.87 |
| Throw Error 10000 times | 6.0258 us/op | 6.8672 us/op | 0.88 |
| fastMsgIdFn sha256 / 200 bytes | 4.0990 us/op | 5.0080 us/op | 0.82 |
| fastMsgIdFn h32 xxhash / 200 bytes | 525.00 ns/op | 607.00 ns/op | 0.86 |
| fastMsgIdFn h64 xxhash / 200 bytes | 683.00 ns/op | 908.00 ns/op | 0.75 |
| fastMsgIdFn sha256 / 1000 bytes | 13.130 us/op | 15.235 us/op | 0.86 |
| fastMsgIdFn h32 xxhash / 1000 bytes | 676.00 ns/op | 798.00 ns/op | 0.85 |
| fastMsgIdFn h64 xxhash / 1000 bytes | 813.00 ns/op | 882.00 ns/op | 0.92 |
| fastMsgIdFn sha256 / 10000 bytes | 112.23 us/op | 130.03 us/op | 0.86 |
| fastMsgIdFn h32 xxhash / 10000 bytes | 2.4760 us/op | 2.7550 us/op | 0.90 |
| fastMsgIdFn h64 xxhash / 10000 bytes | 1.8210 us/op | 2.1290 us/op | 0.86 |
| enrSubnets - fastDeserialize 64 bits | 2.6960 us/op | 3.0430 us/op | 0.89 |
| enrSubnets - ssz BitVector 64 bits | 772.00 ns/op | 955.00 ns/op | 0.81 |
| enrSubnets - fastDeserialize 4 bits | 375.00 ns/op | 434.00 ns/op | 0.86 |
| enrSubnets - ssz BitVector 4 bits | 761.00 ns/op | 919.00 ns/op | 0.83 |
| prioritizePeers score -10:0 att 32-0.1 sync 2-0 | 95.482 us/op | 109.63 us/op | 0.87 |
| prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 | 125.48 us/op | 161.62 us/op | 0.78 |
| prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 | 213.85 us/op | 278.06 us/op | 0.77 |
| prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 | 474.96 us/op | 452.23 us/op | 1.05 |
| prioritizePeers score 0:0 att 64-1 sync 4-1 | 460.54 us/op | 536.12 us/op | 0.86 |
| RateTracker 1000000 limit, 1 obj count per request | 187.90 ns/op | 239.91 ns/op | 0.78 |
| RateTracker 1000000 limit, 2 obj count per request | 140.69 ns/op | 182.05 ns/op | 0.77 |
| RateTracker 1000000 limit, 4 obj count per request | 117.40 ns/op | 154.91 ns/op | 0.76 |
| RateTracker 1000000 limit, 8 obj count per request | 107.83 ns/op | 139.43 ns/op | 0.77 |
| RateTracker with prune | 4.6000 us/op | 5.0690 us/op | 0.91 |
| array of 16000 items push then shift | 2.8720 us/op | 3.4359 us/op | 0.84 |
| LinkedList of 16000 items push then shift | 18.754 ns/op | 20.344 ns/op | 0.92 |
| array of 16000 items push then pop | 242.55 ns/op | 273.62 ns/op | 0.89 |
| LinkedList of 16000 items push then pop | 17.618 ns/op | 19.227 ns/op | 0.92 |
| array of 24000 items push then shift | 3.9678 us/op | 4.6040 us/op | 0.86 |
| LinkedList of 24000 items push then shift | 22.434 ns/op | 23.649 ns/op | 0.95 |
| array of 24000 items push then pop | 217.31 ns/op | 254.11 ns/op | 0.86 |
| LinkedList of 24000 items push then pop | 19.992 ns/op | 20.603 ns/op | 0.97 |
| intersect bitArray bitLen 8 | 11.436 ns/op | 13.244 ns/op | 0.86 |
| intersect array and set length 8 | 169.40 ns/op | 190.28 ns/op | 0.89 |
| intersect bitArray bitLen 128 | 71.984 ns/op | 84.377 ns/op | 0.85 |
| intersect array and set length 128 | 2.2381 us/op | 2.6330 us/op | 0.85 |
| Buffer.concat 32 items | 1.9310 ns/op | 2.1940 ns/op | 0.88 |
| pass gossip attestations to forkchoice per slot | 6.0370 ms/op | 4.7354 ms/op | 1.27 |
| computeDeltas | 6.6992 ms/op | 7.0719 ms/op | 0.95 |
| computeProposerBoostScoreFromBalances | 921.16 us/op | 1.1230 ms/op | 0.82 |
| altair processAttestation - 250000 vs - 7PWei normalcase | 3.9857 ms/op | 4.4491 ms/op | 0.90 |
| altair processAttestation - 250000 vs - 7PWei worstcase | 5.9805 ms/op | 6.7130 ms/op | 0.89 |
| altair processAttestation - setStatus - 1/6 committees join | 209.78 us/op | 254.22 us/op | 0.83 |
| altair processAttestation - setStatus - 1/3 committees join | 403.03 us/op | 487.07 us/op | 0.83 |
| altair processAttestation - setStatus - 1/2 committees join | 560.03 us/op | 697.00 us/op | 0.80 |
| altair processAttestation - setStatus - 2/3 committees join | 723.59 us/op | 877.71 us/op | 0.82 |
| altair processAttestation - setStatus - 4/5 committees join | 997.89 us/op | 1.2419 ms/op | 0.80 |
| altair processAttestation - setStatus - 100% committees join | 1.1824 ms/op | 1.4256 ms/op | 0.83 |
| altair processBlock - 250000 vs - 7PWei normalcase | 29.305 ms/op | 32.298 ms/op | 0.91 |
| altair processBlock - 250000 vs - 7PWei normalcase hashState | 42.593 ms/op | 43.484 ms/op | 0.98 |
| altair processBlock - 250000 vs - 7PWei worstcase | 82.249 ms/op | 109.89 ms/op | 0.75 |
| altair processBlock - 250000 vs - 7PWei worstcase hashState | 102.01 ms/op | 115.01 ms/op | 0.89 |
| phase0 processBlock - 250000 vs - 7PWei normalcase | 3.7098 ms/op | 4.0298 ms/op | 0.92 |
| phase0 processBlock - 250000 vs - 7PWei worstcase | 45.755 ms/op | 53.608 ms/op | 0.85 |
| altair processEth1Data - 250000 vs - 7PWei normalcase | 819.88 us/op | 988.27 us/op | 0.83 |
| Tree 40 250000 create | 861.09 ms/op | 920.21 ms/op | 0.94 |
| Tree 40 250000 get(125000) | 295.64 ns/op | 335.40 ns/op | 0.88 |
| Tree 40 250000 set(125000) | 3.0133 us/op | 2.8010 us/op | 1.08 |
| Tree 40 250000 toArray() | 34.518 ms/op | 37.436 ms/op | 0.92 |
| Tree 40 250000 iterate all - toArray() + loop | 34.472 ms/op | 37.666 ms/op | 0.92 |
| Tree 40 250000 iterate all - get(i) | 114.37 ms/op | 132.36 ms/op | 0.86 |
| MutableVector 250000 create | 18.971 ms/op | 23.123 ms/op | 0.82 |
| MutableVector 250000 get(125000) | 13.414 ns/op | 18.729 ns/op | 0.72 |
| MutableVector 250000 set(125000) | 688.96 ns/op | 696.63 ns/op | 0.99 |
| MutableVector 250000 toArray() | 7.5758 ms/op | 8.4567 ms/op | 0.90 |
| MutableVector 250000 iterate all - toArray() + loop | 7.8176 ms/op | 8.7291 ms/op | 0.90 |
| MutableVector 250000 iterate all - get(i) | 3.4396 ms/op | 4.1293 ms/op | 0.83 |
| Array 250000 create | 7.2547 ms/op | 7.8409 ms/op | 0.93 |
| Array 250000 clone - spread | 3.8494 ms/op | 4.0853 ms/op | 0.94 |
| Array 250000 get(125000) | 1.5930 ns/op | 1.6940 ns/op | 0.94 |
| Array 250000 set(125000) | 1.5440 ns/op | 1.6710 ns/op | 0.92 |
| Array 250000 iterate all - loop | 167.85 us/op | 201.42 us/op | 0.83 |
| effectiveBalanceIncrements clone Uint8Array 300000 | 87.611 us/op | 97.792 us/op | 0.90 |
| effectiveBalanceIncrements clone MutableVector 300000 | 1.1820 us/op | 1.2220 us/op | 0.97 |
| effectiveBalanceIncrements rw all Uint8Array 300000 | 252.82 us/op | 301.16 us/op | 0.84 |
| effectiveBalanceIncrements rw all MutableVector 300000 | 225.28 ms/op | 232.06 ms/op | 0.97 |
| phase0 afterProcessEpoch - 250000 vs - 7PWei | 188.61 ms/op | 221.76 ms/op | 0.85 |
| phase0 beforeProcessEpoch - 250000 vs - 7PWei | 102.42 ms/op | 107.91 ms/op | 0.95 |
| altair processEpoch - mainnet_e81889 | 517.12 ms/op | 595.18 ms/op | 0.87 |
| mainnet_e81889 - altair beforeProcessEpoch | 154.83 ms/op | 157.82 ms/op | 0.98 |
| mainnet_e81889 - altair processJustificationAndFinalization | 24.038 us/op | 36.297 us/op | 0.66 |
| mainnet_e81889 - altair processInactivityUpdates | 11.581 ms/op | 13.013 ms/op | 0.89 |
| mainnet_e81889 - altair processRewardsAndPenalties | 97.028 ms/op | 110.79 ms/op | 0.88 |
| mainnet_e81889 - altair processRegistryUpdates | 4.4880 us/op | 7.3820 us/op | 0.61 |
| mainnet_e81889 - altair processSlashings | 834.00 ns/op | 1.3610 us/op | 0.61 |
| mainnet_e81889 - altair processEth1DataReset | 938.00 ns/op | 1.7580 us/op | 0.53 |
| mainnet_e81889 - altair processEffectiveBalanceUpdates | 2.2826 ms/op | 2.7777 ms/op | 0.82 |
| mainnet_e81889 - altair processSlashingsReset | 6.2710 us/op | 12.586 us/op | 0.50 |
| mainnet_e81889 - altair processRandaoMixesReset | 7.8010 us/op | 12.049 us/op | 0.65 |
| mainnet_e81889 - altair processHistoricalRootsUpdate | 1.1190 us/op | 2.5450 us/op | 0.44 |
| mainnet_e81889 - altair processParticipationFlagUpdates | 3.8040 us/op | 8.9150 us/op | 0.43 |
| mainnet_e81889 - altair processSyncCommitteeUpdates | 952.00 ns/op | 2.4070 us/op | 0.40 |
| mainnet_e81889 - altair afterProcessEpoch | 202.23 ms/op | 235.45 ms/op | 0.86 |
| phase0 processEpoch - mainnet_e58758 | 615.07 ms/op | 624.21 ms/op | 0.99 |
| mainnet_e58758 - phase0 beforeProcessEpoch | 217.70 ms/op | 281.07 ms/op | 0.77 |
| mainnet_e58758 - phase0 processJustificationAndFinalization | 22.194 us/op | 39.250 us/op | 0.57 |
| mainnet_e58758 - phase0 processRewardsAndPenalties | 135.79 ms/op | 163.10 ms/op | 0.83 |
| mainnet_e58758 - phase0 processRegistryUpdates | 13.414 us/op | 16.854 us/op | 0.80 |
| mainnet_e58758 - phase0 processSlashings | 878.00 ns/op | 1.5930 us/op | 0.55 |
| mainnet_e58758 - phase0 processEth1DataReset | 1.0530 us/op | 2.0460 us/op | 0.51 |
| mainnet_e58758 - phase0 processEffectiveBalanceUpdates | 2.1533 ms/op | 2.6450 ms/op | 0.81 |
| mainnet_e58758 - phase0 processSlashingsReset | 6.3650 us/op | 12.291 us/op | 0.52 |
| mainnet_e58758 - phase0 processRandaoMixesReset | 6.0290 us/op | 13.923 us/op | 0.43 |
| mainnet_e58758 - phase0 processHistoricalRootsUpdate | 911.00 ns/op | 2.2260 us/op | 0.41 |
| mainnet_e58758 - phase0 processParticipationRecordUpdates | 5.8750 us/op | 13.764 us/op | 0.43 |
| mainnet_e58758 - phase0 afterProcessEpoch | 161.44 ms/op | 193.51 ms/op | 0.83 |
| phase0 processEffectiveBalanceUpdates - 250000 normalcase | 2.6847 ms/op | 3.2241 ms/op | 0.83 |
| phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 | 3.5001 ms/op | 4.2969 ms/op | 0.81 |
| altair processInactivityUpdates - 250000 normalcase | 40.917 ms/op | 48.311 ms/op | 0.85 |
| altair processInactivityUpdates - 250000 worstcase | 51.735 ms/op | 60.659 ms/op | 0.85 |
| phase0 processRegistryUpdates - 250000 normalcase | 11.258 us/op | 17.813 us/op | 0.63 |
| phase0 processRegistryUpdates - 250000 badcase_full_deposits | 412.99 us/op | 598.33 us/op | 0.69 |
| phase0 processRegistryUpdates - 250000 worstcase 0.5 | 223.30 ms/op | 253.40 ms/op | 0.88 |
| altair processRewardsAndPenalties - 250000 normalcase | 137.07 ms/op | 165.99 ms/op | 0.83 |
| altair processRewardsAndPenalties - 250000 worstcase | 88.025 ms/op | 98.668 ms/op | 0.89 |
| phase0 getAttestationDeltas - 250000 normalcase | 13.690 ms/op | 15.303 ms/op | 0.89 |
| phase0 getAttestationDeltas - 250000 worstcase | 13.800 ms/op | 17.307 ms/op | 0.80 |
| phase0 processSlashings - 250000 worstcase | 5.2804 ms/op | 6.9887 ms/op | 0.76 |
| altair processSyncCommitteeUpdates - 250000 | 283.28 ms/op | 351.49 ms/op | 0.81 |
| BeaconState.hashTreeRoot - No change | 499.00 ns/op | 697.00 ns/op | 0.72 |
| BeaconState.hashTreeRoot - 1 full validator | 63.864 us/op | 81.626 us/op | 0.78 |
| BeaconState.hashTreeRoot - 32 full validator | 544.06 us/op | 913.80 us/op | 0.60 |
| BeaconState.hashTreeRoot - 512 full validator | 8.4037 ms/op | 7.7019 ms/op | 1.09 |
| BeaconState.hashTreeRoot - 1 validator.effectiveBalance | 78.587 us/op | 97.117 us/op | 0.81 |
| BeaconState.hashTreeRoot - 32 validator.effectiveBalance | 1.1726 ms/op | 1.4508 ms/op | 0.81 |
| BeaconState.hashTreeRoot - 512 validator.effectiveBalance | 15.335 ms/op | 18.503 ms/op | 0.83 |
| BeaconState.hashTreeRoot - 1 balances | 62.408 us/op | 72.718 us/op | 0.86 |
| BeaconState.hashTreeRoot - 32 balances | 559.96 us/op | 695.41 us/op | 0.81 |
| BeaconState.hashTreeRoot - 512 balances | 5.8095 ms/op | 7.0610 ms/op | 0.82 |
| BeaconState.hashTreeRoot - 250000 balances | 78.891 ms/op | 112.37 ms/op | 0.70 |
| aggregationBits - 2048 els - zipIndexesInBitList | 39.506 us/op | 42.161 us/op | 0.94 |
| regular array get 100000 times | 67.426 us/op | 80.810 us/op | 0.83 |
| wrappedArray get 100000 times | 67.523 us/op | 80.329 us/op | 0.84 |
| arrayWithProxy get 100000 times | 32.757 ms/op | 38.466 ms/op | 0.85 |
| ssz.Root.equals | 478.00 ns/op | 545.00 ns/op | 0.88 |
| byteArrayEquals | 468.00 ns/op | 544.00 ns/op | 0.86 |
| shuffle list - 16384 els | 11.238 ms/op | 13.198 ms/op | 0.85 |
| shuffle list - 250000 els | 165.37 ms/op | 190.08 ms/op | 0.87 |
| processSlot - 1 slots | 12.513 us/op | 14.707 us/op | 0.85 |
| processSlot - 32 slots | 1.7381 ms/op | 2.0648 ms/op | 0.84 |
| getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei | 642.76 us/op | 489.88 us/op | 1.31 |
| getCommitteeAssignments - req 1 vs - 250000 vc | 5.3178 ms/op | 6.1772 ms/op | 0.86 |
| getCommitteeAssignments - req 100 vs - 250000 vc | 7.3342 ms/op | 8.5534 ms/op | 0.86 |
| getCommitteeAssignments - req 1000 vs - 250000 vc | 7.7726 ms/op | 9.2389 ms/op | 0.84 |
| RootCache.getBlockRootAtSlot - 250000 vs - 7PWei | 10.800 ns/op | 11.590 ns/op | 0.93 |
| state getBlockRootAtSlot - 250000 vs - 7PWei | 1.1094 us/op | 1.3667 us/op | 0.81 |
| computeProposers - vc 250000 | 16.952 ms/op | 19.673 ms/op | 0.86 |
| computeEpochShuffling - vc 250000 | 170.17 ms/op | 194.90 ms/op | 0.87 |
| getNextSyncCommittee - vc 250000 | 280.06 ms/op | 342.15 ms/op | 0.82 |

by benchmarkbot/action

dapplion (Contributor) commented

Results look great! Thank you so much for the full implementation and thorough testing.

twoeths marked this pull request as ready for review on November 23, 2022 03:49
twoeths requested a review from a team as a code owner on November 23, 2022 03:49
dapplion (Contributor) left a comment

LGTM!

twoeths force-pushed the tuyen/as-chacha20poly1305 branch from a3dec62 to b0ffa6a on November 24, 2022 02:23
twoeths merged commit a4c0edd into unstable on Nov 24, 2022
twoeths deleted the tuyen/as-chacha20poly1305 branch on November 24, 2022 03:21
Successfully merging this pull request may close these issues:

  • Performance issue with stablelib
2 participants