Increases account hash's stack buffer to hold 200 bytes of data #56
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff            @@
##           master     #56     +/-   ##
========================================
- Coverage    81.8%   81.8%   -0.1%
========================================
  Files         837     837
  Lines      225922  225922
========================================
- Hits       184955  184944     -11
- Misses      40967   40978     +11
Nice work. Lgtm. Thanks!
lgtm
It is counter-intuitive to me that copying the slice of data and then hashing it is faster than hashing it in place without the copy.
Here's the original PR that added the buffer: solana-labs#33788. In it @HaoranYi notes that the hasher works better with a larger buffer to work on. Since the current PR is an improvement over
Internally, blake3 copies the data into its local buffer and uses SIMD to compute the hash.
Here are the benchmark results for the following scenarios. All runs were on a dev box that can run mnb.
Trends:
Results:
patch for [B]
@@ -6122,27 +6122,20 @@ impl AccountsDb {
// allocate a buffer on the stack that's big enough
// to hold a token account or a stake account
const META_SIZE: usize = 8 /* lamports */ + 8 /* rent_epoch */ + 1 /* executable */ + 32 /* owner */ + 32 /* pubkey */;
- const DATA_SIZE: usize = 200; // stake acounts are 200 B and token accounts are 165-182ish B
+ const DATA_SIZE: usize = 0;
const BUFFER_SIZE: usize = META_SIZE + DATA_SIZE;
let mut buffer = SmallVec::<[u8; BUFFER_SIZE]>::new();
// collect lamports, rent_epoch into buffer to hash
buffer.extend_from_slice(&lamports.to_le_bytes());
buffer.extend_from_slice(&rent_epoch.to_le_bytes());
+ hasher.update(&buffer);
- if data.len() > DATA_SIZE {
- // For larger accounts whose data can't fit into the buffer, update the hash now.
- hasher.update(&buffer);
- buffer.clear();
-
- // hash account's data
- hasher.update(data);
- } else {
- // For small accounts whose data can fit into the buffer, append it to the buffer.
- buffer.extend_from_slice(data);
- }
+ // hash account's data
+ hasher.update(data);
// collect exec_flag, owner, pubkey into buffer to hash
+ buffer.clear();
buffer.push(executable.into());
buffer.extend_from_slice(owner.as_ref());
buffer.extend_from_slice(pubkey.as_ref());
patch for [C]
@@ -6122,32 +6122,21 @@ impl AccountsDb {
// allocate a buffer on the stack that's big enough
// to hold a token account or a stake account
const META_SIZE: usize = 8 /* lamports */ + 8 /* rent_epoch */ + 1 /* executable */ + 32 /* owner */ + 32 /* pubkey */;
- const DATA_SIZE: usize = 200; // stake acounts are 200 B and token accounts are 165-182ish B
+ const DATA_SIZE: usize = 0;
const BUFFER_SIZE: usize = META_SIZE + DATA_SIZE;
let mut buffer = SmallVec::<[u8; BUFFER_SIZE]>::new();
// collect lamports, rent_epoch into buffer to hash
buffer.extend_from_slice(&lamports.to_le_bytes());
buffer.extend_from_slice(&rent_epoch.to_le_bytes());
-
- if data.len() > DATA_SIZE {
- // For larger accounts whose data can't fit into the buffer, update the hash now.
- hasher.update(&buffer);
- buffer.clear();
-
- // hash account's data
- hasher.update(data);
- } else {
- // For small accounts whose data can fit into the buffer, append it to the buffer.
- buffer.extend_from_slice(data);
- }
-
- // collect exec_flag, owner, pubkey into buffer to hash
buffer.push(executable.into());
buffer.extend_from_slice(owner.as_ref());
buffer.extend_from_slice(pubkey.as_ref());
hasher.update(&buffer);
+ // hash account's data
+ hasher.update(data);
+
AccountHash(Hash::new_from_array(hasher.finalize().into()))
}
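As a rough illustration of how the patches relate to the baseline: a streaming hasher produces the same digest whether the same bytes arrive in one `update` call or in several, so patch [B] (which keeps the field order) should hash to the same value as the single-buffer path, while patch [C] moves the data after all metadata and would therefore generally produce a different digest, which suggests it is for benchmarking only. The sketch below assumes a stand-in 64-bit FNV-1a hasher instead of blake3 and a plain `Vec` instead of `SmallVec`; the `Account` struct and function names are hypothetical and not taken from the codebase.

```rust
/// Stand-in streaming hasher (64-bit FNV-1a). This is NOT blake3; it only
/// shares the property that `update` can be called repeatedly.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf29ce484222325)
    }

    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finalize(&self) -> u64 {
        self.0
    }
}

/// Hypothetical container for the fields that get hashed.
struct Account<'a> {
    lamports: u64,
    rent_epoch: u64,
    data: &'a [u8],
    executable: bool,
    owner: [u8; 32],
    pubkey: [u8; 32],
}

/// Baseline path for small accounts: copy everything into one buffer,
/// then make a single `update` call.
fn hash_single_update(a: &Account) -> u64 {
    let mut buffer = Vec::new(); // stands in for the SmallVec stack buffer
    buffer.extend_from_slice(&a.lamports.to_le_bytes());
    buffer.extend_from_slice(&a.rent_epoch.to_le_bytes());
    buffer.extend_from_slice(a.data);
    buffer.push(u8::from(a.executable));
    buffer.extend_from_slice(&a.owner);
    buffer.extend_from_slice(&a.pubkey);
    let mut hasher = Fnv1a::new();
    hasher.update(&buffer);
    hasher.finalize()
}

/// Patch [B] shape: same byte order, but the data is never copied,
/// so the hasher sees three `update` calls.
fn hash_patch_b(a: &Account) -> u64 {
    let mut hasher = Fnv1a::new();
    let mut buffer = Vec::new();
    buffer.extend_from_slice(&a.lamports.to_le_bytes());
    buffer.extend_from_slice(&a.rent_epoch.to_le_bytes());
    hasher.update(&buffer);
    hasher.update(a.data);
    buffer.clear();
    buffer.push(u8::from(a.executable));
    buffer.extend_from_slice(&a.owner);
    buffer.extend_from_slice(&a.pubkey);
    hasher.update(&buffer);
    hasher.finalize()
}
```

With any sample account, `hash_single_update` and `hash_patch_b` return the same value, so the difference between the two paths is purely one of cost (copying versus extra hasher calls), not of output.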
Problem
When hashing an account, we create a buffer on the stack to reduce the number of separate calls into the hasher. The buffer is currently 128 bytes. The metadata fields occupy 81 bytes, which leaves 47 bytes for the account's data.
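The arithmetic behind those numbers can be checked with a small sketch. The field sizes match the `META_SIZE` comment in the code; the individual constant names and the `data_capacity` helper are hypothetical and exist only for this illustration.

```rust
// Sizes, in bytes, of the fixed metadata fields that are hashed.
const LAMPORTS: usize = 8;
const RENT_EPOCH: usize = 8;
const EXECUTABLE: usize = 1;
const OWNER: usize = 32;
const PUBKEY: usize = 32;

const META_SIZE: usize = LAMPORTS + RENT_EPOCH + EXECUTABLE + OWNER + PUBKEY;

/// Bytes left over for account data in a stack buffer of `buffer_size` bytes.
/// Hypothetical helper; assumes `buffer_size >= META_SIZE`.
const fn data_capacity(buffer_size: usize) -> usize {
    buffer_size - META_SIZE
}
```

Here `META_SIZE` evaluates to 81 and `data_capacity(128)` to 47, matching the figures above.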
On mnb today there are approximately 431 million accounts. Of those, approximately 388 million are token accounts, which is 90% of all accounts. We should ensure the stack buffer can hold the account data for a token account.
Token accounts have a size of 165 bytes [1]. This size will grow, though: ATA and Token22 extensions will be larger too. Jon estimates 170-182 bytes is "safe" [2].
Also, there are ~1 million stake accounts. They are 200 bytes.
We could size the stack buffer to hold 200 bytes, which will cover the vast majority of accounts. Currently, the vast majority of calls to hash an account have to perform a heap allocation.
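To make the sizing argument concrete, here is a minimal sketch of the fits-or-not decision, modeled on the `data.len() > DATA_SIZE` check in the hashing code. The `HashPath` enum and `hash_path` function are hypothetical names used only for illustration.

```rust
/// Which route an account's data takes through the hashing code.
#[derive(Debug, PartialEq)]
enum HashPath {
    /// Data fits in the stack buffer next to the metadata.
    Buffered,
    /// Data is too large for the buffer and is handled separately.
    Streamed,
}

/// Mirrors the `data.len() > DATA_SIZE` check: `data_size` is the number
/// of buffer bytes reserved for account data.
fn hash_path(data_len: usize, data_size: usize) -> HashPath {
    if data_len > data_size {
        HashPath::Streamed
    } else {
        HashPath::Buffered
    }
}
```

With 47 bytes reserved for data, a 165-byte token account and a 200-byte stake account both take the `Streamed` route; with 200 bytes reserved, both are `Buffered`, as is the 182-byte upper estimate for token accounts.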
Summary of Changes
Increases account hash's stack buffer to hold 200 bytes of data. This improves perf for hashing both stake accounts and spl token accounts by ~5%.
Benchmark Results
I ran the account hashing benchmark on this PR and on master multiple times to get stable results. Here's a representative result.
First with this PR (stack buffer is 200 bytes)
Second with master (stack buffer is 47 bytes)
My takeaways:
Footnotes
1. https://github.com/solana-labs/solana-program-library/blob/e651623033fca7997ccd21e55d0f2388473122f9/token/program/src/state.rs#L134
2. https://discord.com/channels/428295358100013066/977244255212937306/1212827836281524224