Increases account hash's stack buffer to hold 200 bytes of data #56

Merged · 1 commit merged into anza-xyz:master on Mar 5, 2024

Conversation

@brooksprumo commented Mar 4, 2024

Problem

When hashing an account, we create a buffer on the stack to reduce the number of separate calls into the hasher. The buffer is currently 128 bytes. The metadata fields occupy 81 bytes, which leaves 47 bytes for the account's data.

On mnb today there are approximately 431 million accounts. Of those, approximately 388 million are token accounts, i.e. 90% of all accounts. We should ensure the stack buffer can hold the account data for a token account.

Token accounts have a size of 165 bytes [1]. That size will grow, though: ATA and Token22 extensions will be larger too. Jon estimates that 170-182 bytes is "safe" [2].

Also, there are ~1 million stake accounts, which are 200 bytes each.

If we size the stack buffer to hold 200 bytes of data, it will cover the vast majority of accounts. Currently, the vast majority of calls to hash an account have to perform a heap allocation.
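
As a rough sketch of the arithmetic (the constant names mirror the patches quoted later in this thread; OLD_BUFFER_SIZE is just an illustrative name for the current 128-byte buffer):

```rust
// Metadata fields that are always hashed, per account.
const META_SIZE: usize = 8  /* lamports */
    + 8  /* rent_epoch */
    + 1  /* executable */
    + 32 /* owner */
    + 32 /* pubkey */; // = 81 bytes

// Today's 128-byte stack buffer leaves 128 - 81 = 47 bytes for account data.
const OLD_BUFFER_SIZE: usize = 128;
const OLD_DATA_ROOM: usize = OLD_BUFFER_SIZE - META_SIZE; // 47

// Sizing for 200 bytes of data covers token accounts (165-182ish bytes)
// and stake accounts (200 bytes), at the cost of a 281-byte stack buffer.
const NEW_DATA_SIZE: usize = 200;
const NEW_BUFFER_SIZE: usize = META_SIZE + NEW_DATA_SIZE; // 281

fn main() {
    println!("meta: {META_SIZE} B, old data room: {OLD_DATA_ROOM} B, new buffer: {NEW_BUFFER_SIZE} B");
}
```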

Summary of Changes

Increases the account hash's stack buffer to hold 200 bytes of data. This improves hashing performance for both stake accounts and SPL token accounts by ~5%.
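
A self-contained sketch of the buffered hashing path, reconstructed from the patch context quoted later in this thread (illustrative: the real code lives in AccountsDb and returns the repo's AccountHash type; blake3 and smallvec stand in directly here):

```rust
use blake3::Hasher;
use smallvec::SmallVec;

// Sketch of the buffered account-hashing path. The real code returns the
// repo's AccountHash type; this sketch returns blake3::Hash directly.
fn hash_account_sketch(
    lamports: u64,
    rent_epoch: u64,
    data: &[u8],
    executable: bool,
    owner: &[u8; 32],  // Pubkey bytes in the real code
    pubkey: &[u8; 32], // Pubkey bytes in the real code
) -> blake3::Hash {
    const META_SIZE: usize = 8 /* lamports */ + 8 /* rent_epoch */ + 1 /* executable */ + 32 /* owner */ + 32 /* pubkey */;
    const DATA_SIZE: usize = 200; // stake accounts are 200 B; token accounts are 165-182ish B
    const BUFFER_SIZE: usize = META_SIZE + DATA_SIZE;

    let mut hasher = Hasher::new();
    // SmallVec keeps the bytes on the stack as long as the total stays <= BUFFER_SIZE.
    let mut buffer = SmallVec::<[u8; BUFFER_SIZE]>::new();

    // collect lamports, rent_epoch into the buffer
    buffer.extend_from_slice(&lamports.to_le_bytes());
    buffer.extend_from_slice(&rent_epoch.to_le_bytes());

    if data.len() > DATA_SIZE {
        // Larger account: its data can't fit, so flush the metadata gathered so
        // far and hash the data directly, without copying it into the buffer.
        hasher.update(&buffer);
        buffer.clear();
        hasher.update(data);
    } else {
        // Small account (the common case): append the data to the stack buffer.
        buffer.extend_from_slice(data);
    }

    // collect exec_flag, owner, pubkey into the buffer
    buffer.push(executable.into());
    buffer.extend_from_slice(owner);
    buffer.extend_from_slice(pubkey);
    hasher.update(&buffer);

    hasher.finalize()
}
```

The branch keeps large accounts out of the stack buffer entirely, so growing DATA_SIZE only changes which accounts take the single-update path.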

Benchmark Results

I ran the account hashing benchmark on this PR and on master multiple times to get stable results. Here's a representative result.

First with this PR (stack buffer is 200 bytes)

hash_account/data_size/0                                                                            
                        time:   [202.56 ns 202.64 ns 202.72 ns]
                        thrpt:  [381.05 MiB/s 381.20 MiB/s 381.35 MiB/s]
hash_account/data_size/165                                                                            
                        time:   [344.77 ns 345.00 ns 345.26 ns]
                        thrpt:  [679.49 MiB/s 680.01 MiB/s 680.46 MiB/s]
hash_account/data_size/200                                                                            
                        time:   [411.29 ns 411.79 ns 412.32 ns]
                        thrpt:  [649.93 MiB/s 650.78 MiB/s 651.57 MiB/s]
hash_account/data_size/1024                                                                             
                        time:   [1.4292 µs 1.4295 µs 1.4302 µs]
                        thrpt:  [736.85 MiB/s 737.17 MiB/s 737.37 MiB/s]
hash_account/data_size/1048576                                                                            
                        time:   [269.75 µs 269.78 µs 269.80 µs]
                        thrpt:  [3.6198 GiB/s 3.6202 GiB/s 3.6205 GiB/s]
hash_account/data_size/10485760                                                                             
                        time:   [2.6642 ms 2.6648 ms 2.6654 ms]
                        thrpt:  [3.6638 GiB/s 3.6647 GiB/s 3.6655 GiB/s]

Second with master (stack buffer is 47 bytes)

hash_account/data_size/0                                                                            
                        time:   [205.01 ns 205.17 ns 205.37 ns]
                        thrpt:  [376.14 MiB/s 376.50 MiB/s 376.80 MiB/s]
                 change:
                        time:   [+0.9893% +1.1179% +1.2289%] (p = 0.00 < 0.05)
                        thrpt:  [-1.2140% -1.1055% -0.9796%]
                        Change within noise threshold.
hash_account/data_size/165                                                                            
                        time:   [362.42 ns 362.56 ns 362.72 ns]
                        thrpt:  [646.80 MiB/s 647.07 MiB/s 647.32 MiB/s]
                 change:
                        time:   [+5.0576% +5.1817% +5.3143%] (p = 0.00 < 0.05)
                        thrpt:  [-5.0461% -4.9265% -4.8141%]
                        Performance has regressed.
hash_account/data_size/200                                                                            
                        time:   [435.20 ns 435.33 ns 435.46 ns]
                        thrpt:  [615.40 MiB/s 615.58 MiB/s 615.77 MiB/s]
                 change:
                        time:   [+5.3151% +5.4726% +5.6149%] (p = 0.00 < 0.05)
                        thrpt:  [-5.3164% -5.1887% -5.0469%]
                        Performance has regressed.
hash_account/data_size/1024                                                                             
                        time:   [1.4255 µs 1.4261 µs 1.4270 µs]
                        thrpt:  [738.48 MiB/s 738.96 MiB/s 739.28 MiB/s]
                 change:
                        time:   [-0.2172% -0.1373% -0.0381%] (p = 0.00 < 0.05)
                        thrpt:  [+0.0381% +0.1374% +0.2176%]
                        Change within noise threshold.
hash_account/data_size/1048576                                                                            
                        time:   [270.51 µs 270.59 µs 270.75 µs]
                        thrpt:  [3.6071 GiB/s 3.6092 GiB/s 3.6104 GiB/s]
                 change:
                        time:   [+0.2766% +0.3051% +0.3489%] (p = 0.00 < 0.05)
                        thrpt:  [-0.3477% -0.3041% -0.2758%]
                        Change within noise threshold.
hash_account/data_size/10485760                                                                             
                        time:   [2.6608 ms 2.6614 ms 2.6622 ms]
                        thrpt:  [3.6682 GiB/s 3.6694 GiB/s 3.6703 GiB/s]
                 change:
                        time:   [-0.1625% -0.1292% -0.0874%] (p = 0.00 < 0.05)
                        thrpt:  [+0.0875% +0.1294% +0.1628%]
                        Change within noise threshold.

My takeaways:

  • Both the 165-byte and 200-byte benchmarks are about 5% faster with this PR. This makes sense to me, and it's what this PR is targeting.
  • The sizes above 200 bytes (1 KiB, 1 MiB, 10 MiB) perform the same. This makes sense to me because both master and this PR need to make a heap allocation for those larger data sizes.
  • The 0-byte benchmark is the same, within the noise (I ran this many times, and often the change was less than 0.1%). This also makes sense, as neither master nor this PR needs to make a heap allocation.
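
For reference, output in this shape can be produced by a Criterion harness along the following lines (a hedged sketch, not the repo's actual bench file; hash_account_stub and the throughput accounting are illustrative, though counting metadata plus data bytes matches the MiB/s figures above):

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

const META_SIZE: usize = 81; // metadata bytes hashed alongside the data

// Illustrative stand-in for the real account-hashing routine under test.
fn hash_account_stub(data: &[u8]) -> blake3::Hash {
    let mut hasher = blake3::Hasher::new();
    hasher.update(&[0u8; META_SIZE]); // placeholder metadata bytes
    hasher.update(data);
    hasher.finalize()
}

fn bench_hash_account(c: &mut Criterion) {
    let mut group = c.benchmark_group("hash_account");
    for data_size in [0usize, 165, 200, 1024, 1 << 20, 10 << 20] {
        let data = vec![0u8; data_size];
        // Byte throughput gives the MiB/s / GiB/s numbers; counting
        // metadata + data bytes matches the reported figures.
        group.throughput(Throughput::Bytes((META_SIZE + data_size) as u64));
        group.bench_with_input(BenchmarkId::new("data_size", data_size), &data, |b, data| {
            b.iter(|| hash_account_stub(data))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_hash_account);
criterion_main!(benches);
```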

Footnotes

  1. https://github.com/solana-labs/solana-program-library/blob/e651623033fca7997ccd21e55d0f2388473122f9/token/program/src/state.rs#L134

  2. https://discord.com/channels/428295358100013066/977244255212937306/1212827836281524224

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.8%. Comparing base (3f9a7a5) to head (a40f09c).

Additional details and impacted files
@@            Coverage Diff            @@
##           master      #56     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         837      837             
  Lines      225922   225922             
=========================================
- Hits       184955   184944     -11     
- Misses      40967    40978     +11     

@brooksprumo force-pushed the hash/stack-buffer-size branch from a40f09c to dbe3a71 on March 4, 2024 at 19:00
@brooksprumo force-pushed the hash/stack-buffer-size branch from dbe3a71 to b3be4e6 on March 5, 2024 at 12:44
@brooksprumo marked this pull request as ready for review on March 5, 2024 at 13:39
@HaoranYi left a comment

Nice work. Lgtm. Thanks!

@jeffwashington left a comment

lgtm
It is counter-intuitive to me that copying the slice of data and then hashing it is faster than hashing it in place without the copy.

@brooksprumo (Author)

> lgtm It is counter-intuitive to me that copying the slice of data and then hashing it is faster than hashing it in place without the copy.

Here's the original PR that added the buffer: solana-labs#33788. In it, @HaoranYi notes that the hasher works better when given a larger buffer to work on.

Since the current PR is an improvement over master, I'll merge it. I'll also benchmark never copying the data into the buffer and always hashing it with a separate call. Lastly, I'll benchmark changing the hashing order so the account data comes last: the metadata fields are hashed together first, and the data second.

@brooksprumo merged commit 1e133bc into anza-xyz:master on Mar 5, 2024
46 checks passed
@brooksprumo deleted the hash/stack-buffer-size branch on March 5, 2024 at 17:02
@HaoranYi commented Mar 5, 2024

Internally, blake3 copies the data into its local buffer and uses SIMD to compute the hash.
I think multiple updates with small pieces of data will not use SIMD effectively. By aggregating the data into a buffer, we can use SIMD more effectively. I think it is best to choose a buffer size that is a multiple of the SIMD width (a multiple of 64 bytes).
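
A small illustrative sketch of the property this relies on: blake3's incremental updates hash the concatenation of everything fed in, so buffering only changes how many bytes the hasher sees per update call (and how well it can use SIMD), not the resulting digest.

```rust
fn main() {
    let meta = [1u8; 81];  // stand-in for the metadata fields
    let data = [2u8; 165]; // stand-in for a token account's data

    // Many small updates: the hasher sees tiny pieces at a time.
    let mut piecewise = blake3::Hasher::new();
    piecewise.update(&meta[..8]);   // e.g. lamports
    piecewise.update(&meta[8..16]); // e.g. rent_epoch
    piecewise.update(&meta[16..]);  // executable, owner, pubkey
    piecewise.update(&data);

    // One buffered update: same bytes in the same order, one big chunk.
    let mut buffered = blake3::Hasher::new();
    let mut buffer = Vec::with_capacity(meta.len() + data.len());
    buffer.extend_from_slice(&meta);
    buffer.extend_from_slice(&data);
    buffered.update(&buffer);

    // Identical digests; only the per-update chunk size (and SIMD utilization) differs.
    assert_eq!(piecewise.finalize(), buffered.finalize());
}
```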

@brooksprumo (Author) commented Mar 5, 2024

Here are the benchmark results for the following scenarios. All were run on a dev box that can run mnb.

  • [A] Runs 0-3: this PR
  • [B] Runs 4-8: not copying into a buffer ever
  • [C] Runs 9-15: reordering the hashing so that all metadata fields are copied into the buffer and hashed first, then the account data is hashed afterwards (without copying it into the buffer)

Trends:

  • Hashing accounts with data size 0 showed the biggest differences. Interestingly, [B] was the slowest. Maybe hashing zero bytes as its own update actually has a non-zero cost?
  • Data sizes of 200 and 1K showed some improvement with [C]. I'm not sure if this is significant enough to pursue.
  • Data sizes of 1M and 10M were all about even. This implies to me that reading the data itself accounts for the bulk of the time.

Results:

data size 0

[benchmark history chart for data size 0]

data size 165 and 200

[benchmark history charts for data sizes 165 and 200]

data size 1K, 1M, and 10M

[benchmark history charts for data sizes 1K, 1M, and 10M]

patch for [B]

@@ -6122,27 +6122,20 @@ impl AccountsDb {
         // allocate a buffer on the stack that's big enough
         // to hold a token account or a stake account
         const META_SIZE: usize = 8 /* lamports */ + 8 /* rent_epoch */ + 1 /* executable */ + 32 /* owner */ + 32 /* pubkey */;
-        const DATA_SIZE: usize = 200; // stake acounts are 200 B and token accounts are 165-182ish B
+        const DATA_SIZE: usize = 0;
         const BUFFER_SIZE: usize = META_SIZE + DATA_SIZE;
         let mut buffer = SmallVec::<[u8; BUFFER_SIZE]>::new();
 
         // collect lamports, rent_epoch into buffer to hash
         buffer.extend_from_slice(&lamports.to_le_bytes());
         buffer.extend_from_slice(&rent_epoch.to_le_bytes());
+        hasher.update(&buffer);
 
-        if data.len() > DATA_SIZE {
-            // For larger accounts whose data can't fit into the buffer, update the hash now.
-            hasher.update(&buffer);
-            buffer.clear();
-
-            // hash account's data
-            hasher.update(data);
-        } else {
-            // For small accounts whose data can fit into the buffer, append it to the buffer.
-            buffer.extend_from_slice(data);
-        }
+        // hash account's data
+        hasher.update(data);
 
         // collect exec_flag, owner, pubkey into buffer to hash
+        buffer.clear();
         buffer.push(executable.into());
         buffer.extend_from_slice(owner.as_ref());
         buffer.extend_from_slice(pubkey.as_ref());

patch for [C]

@@ -6122,32 +6122,21 @@ impl AccountsDb {
         // allocate a buffer on the stack that's big enough
         // to hold a token account or a stake account
         const META_SIZE: usize = 8 /* lamports */ + 8 /* rent_epoch */ + 1 /* executable */ + 32 /* owner */ + 32 /* pubkey */;
-        const DATA_SIZE: usize = 200; // stake acounts are 200 B and token accounts are 165-182ish B
+        const DATA_SIZE: usize = 0;
         const BUFFER_SIZE: usize = META_SIZE + DATA_SIZE;
         let mut buffer = SmallVec::<[u8; BUFFER_SIZE]>::new();
 
         // collect lamports, rent_epoch into buffer to hash
         buffer.extend_from_slice(&lamports.to_le_bytes());
         buffer.extend_from_slice(&rent_epoch.to_le_bytes());
-
-        if data.len() > DATA_SIZE {
-            // For larger accounts whose data can't fit into the buffer, update the hash now.
-            hasher.update(&buffer);
-            buffer.clear();
-
-            // hash account's data
-            hasher.update(data);
-        } else {
-            // For small accounts whose data can fit into the buffer, append it to the buffer.
-            buffer.extend_from_slice(data);
-        }
-
-        // collect exec_flag, owner, pubkey into buffer to hash
         buffer.push(executable.into());
         buffer.extend_from_slice(owner.as_ref());
         buffer.extend_from_slice(pubkey.as_ref());
         hasher.update(&buffer);
 
+        // hash account's data
+        hasher.update(data);
+
         AccountHash(Hash::new_from_array(hasher.finalize().into()))
     }

codebender828 pushed a commit to codebender828/agave that referenced this pull request Oct 3, 2024