Increases account hash's stack buffer to hold 200 bytes of data #56

Merged · 1 commit merged into anza-xyz:master on Mar 5, 2024

Conversation

@brooksprumo commented Mar 4, 2024

Problem

When hashing an account, we create a buffer on the stack to reduce the number of separate calls into the hasher. The buffer is currently 128 bytes. The metadata fields occupy 81 bytes, which leaves 47 bytes for the account's data.

On mnb today there are approximately 431 million accounts. Of those, approximately 388 million are token accounts, i.e. 90% of all accounts. We should ensure the stack buffer can hold the account data for a token account.

Token accounts have a size of 165 bytes [1]. That size will grow, though: ATA and Token22 extensions will be larger too. Jon estimates that 170-182 bytes is "safe" [2].

Also, there are ~1 million stake accounts, which are 200 bytes each.

If we size the stack buffer to hold 200 bytes of data, it will cover the vast majority of accounts. Currently, the vast majority of calls to hash an account have to perform a heap allocation.
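
As a rough sketch of the arithmetic (the constant names mirror the patches quoted later in this thread; OLD_BUFFER_SIZE is just an illustrative name for the current 128-byte buffer):

```rust
// Metadata fields that are always hashed, per account.
const META_SIZE: usize = 8  /* lamports */
    + 8  /* rent_epoch */
    + 1  /* executable */
    + 32 /* owner */
    + 32 /* pubkey */; // = 81 bytes

// Today's 128-byte stack buffer leaves 128 - 81 = 47 bytes for account data.
const OLD_BUFFER_SIZE: usize = 128;
const OLD_DATA_ROOM: usize = OLD_BUFFER_SIZE - META_SIZE; // 47

// Sizing for 200 bytes of data covers token accounts (165-182ish bytes)
// and stake accounts (200 bytes), at the cost of a 281-byte stack buffer.
const NEW_DATA_SIZE: usize = 200;
const NEW_BUFFER_SIZE: usize = META_SIZE + NEW_DATA_SIZE; // 281

fn main() {
    println!("meta: {META_SIZE} B, old data room: {OLD_DATA_ROOM} B, new buffer: {NEW_BUFFER_SIZE} B");
}
```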

Summary of Changes

Increases the account hash's stack buffer to hold 200 bytes of data. This improves hashing performance for both stake accounts and SPL token accounts by ~5%.
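
A self-contained sketch of the buffered hashing path, reconstructed from the patch context quoted later in this thread (illustrative: the real code lives in AccountsDb and returns the repo's AccountHash type; blake3 and smallvec stand in directly here):

```rust
use blake3::Hasher;
use smallvec::SmallVec;

// Sketch of the buffered account-hashing path. The real code returns the
// repo's AccountHash type; this sketch returns blake3::Hash directly.
fn hash_account_sketch(
    lamports: u64,
    rent_epoch: u64,
    data: &[u8],
    executable: bool,
    owner: &[u8; 32],  // Pubkey bytes in the real code
    pubkey: &[u8; 32], // Pubkey bytes in the real code
) -> blake3::Hash {
    const META_SIZE: usize = 8 /* lamports */ + 8 /* rent_epoch */ + 1 /* executable */ + 32 /* owner */ + 32 /* pubkey */;
    const DATA_SIZE: usize = 200; // stake accounts are 200 B; token accounts are 165-182ish B
    const BUFFER_SIZE: usize = META_SIZE + DATA_SIZE;

    let mut hasher = Hasher::new();
    // SmallVec keeps the bytes on the stack as long as the total stays <= BUFFER_SIZE.
    let mut buffer = SmallVec::<[u8; BUFFER_SIZE]>::new();

    // collect lamports, rent_epoch into the buffer
    buffer.extend_from_slice(&lamports.to_le_bytes());
    buffer.extend_from_slice(&rent_epoch.to_le_bytes());

    if data.len() > DATA_SIZE {
        // Larger account: its data can't fit, so flush the metadata gathered so
        // far and hash the data directly, without copying it into the buffer.
        hasher.update(&buffer);
        buffer.clear();
        hasher.update(data);
    } else {
        // Small account (the common case): append the data to the stack buffer.
        buffer.extend_from_slice(data);
    }

    // collect exec_flag, owner, pubkey into the buffer
    buffer.push(executable.into());
    buffer.extend_from_slice(owner);
    buffer.extend_from_slice(pubkey);
    hasher.update(&buffer);

    hasher.finalize()
}
```

The branch keeps large accounts out of the stack buffer entirely, so growing DATA_SIZE only changes which accounts take the single-update path.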

Benchmark Results

I ran the account hashing benchmark on this PR and on master multiple times to get stable results. Here's a representative result.

First with this PR (stack buffer is 200 bytes)

hash_account/data_size/0                                                                            
                        time:   [202.56 ns 202.64 ns 202.72 ns]
                        thrpt:  [381.05 MiB/s 381.20 MiB/s 381.35 MiB/s]
hash_account/data_size/165                                                                            
                        time:   [344.77 ns 345.00 ns 345.26 ns]
                        thrpt:  [679.49 MiB/s 680.01 MiB/s 680.46 MiB/s]
hash_account/data_size/200                                                                            
                        time:   [411.29 ns 411.79 ns 412.32 ns]
                        thrpt:  [649.93 MiB/s 650.78 MiB/s 651.57 MiB/s]
hash_account/data_size/1024                                                                             
                        time:   [1.4292 µs 1.4295 µs 1.4302 µs]
                        thrpt:  [736.85 MiB/s 737.17 MiB/s 737.37 MiB/s]
hash_account/data_size/1048576                                                                            
                        time:   [269.75 µs 269.78 µs 269.80 µs]
                        thrpt:  [3.6198 GiB/s 3.6202 GiB/s 3.6205 GiB/s]
hash_account/data_size/10485760                                                                             
                        time:   [2.6642 ms 2.6648 ms 2.6654 ms]
                        thrpt:  [3.6638 GiB/s 3.6647 GiB/s 3.6655 GiB/s]

Second with master (stack buffer is 47 bytes)

hash_account/data_size/0                                                                            
                        time:   [205.01 ns 205.17 ns 205.37 ns]
                        thrpt:  [376.14 MiB/s 376.50 MiB/s 376.80 MiB/s]
                 change:
                        time:   [+0.9893% +1.1179% +1.2289%] (p = 0.00 < 0.05)
                        thrpt:  [-1.2140% -1.1055% -0.9796%]
                        Change within noise threshold.
hash_account/data_size/165                                                                            
                        time:   [362.42 ns 362.56 ns 362.72 ns]
                        thrpt:  [646.80 MiB/s 647.07 MiB/s 647.32 MiB/s]
                 change:
                        time:   [+5.0576% +5.1817% +5.3143%] (p = 0.00 < 0.05)
                        thrpt:  [-5.0461% -4.9265% -4.8141%]
                        Performance has regressed.
hash_account/data_size/200                                                                            
                        time:   [435.20 ns 435.33 ns 435.46 ns]
                        thrpt:  [615.40 MiB/s 615.58 MiB/s 615.77 MiB/s]
                 change:
                        time:   [+5.3151% +5.4726% +5.6149%] (p = 0.00 < 0.05)
                        thrpt:  [-5.3164% -5.1887% -5.0469%]
                        Performance has regressed.
hash_account/data_size/1024                                                                             
                        time:   [1.4255 µs 1.4261 µs 1.4270 µs]
                        thrpt:  [738.48 MiB/s 738.96 MiB/s 739.28 MiB/s]
                 change:
                        time:   [-0.2172% -0.1373% -0.0381%] (p = 0.00 < 0.05)
                        thrpt:  [+0.0381% +0.1374% +0.2176%]
                        Change within noise threshold.
hash_account/data_size/1048576                                                                            
                        time:   [270.51 µs 270.59 µs 270.75 µs]
                        thrpt:  [3.6071 GiB/s 3.6092 GiB/s 3.6104 GiB/s]
                 change:
                        time:   [+0.2766% +0.3051% +0.3489%] (p = 0.00 < 0.05)
                        thrpt:  [-0.3477% -0.3041% -0.2758%]
                        Change within noise threshold.
hash_account/data_size/10485760                                                                             
                        time:   [2.6608 ms 2.6614 ms 2.6622 ms]
                        thrpt:  [3.6682 GiB/s 3.6694 GiB/s 3.6703 GiB/s]
                 change:
                        time:   [-0.1625% -0.1292% -0.0874%] (p = 0.00 < 0.05)
                        thrpt:  [+0.0875% +0.1294% +0.1628%]
                        Change within noise threshold.

My takeaways:

  • Both the 165-byte and 200-byte benchmarks are about 5% faster with this PR. This makes sense to me, and it's what this PR is targeting.
  • The sizes above 200 bytes (1 KiB, 1 MiB, 10 MiB) perform the same. This makes sense to me because both master and this PR need to make a heap allocation for those larger data sizes.
  • The 0-byte benchmark is the same, within the noise (I ran this many times, and often the change was less than 0.1%). This also makes sense, as neither master nor this PR needs to make a heap allocation.
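
For reference, output in this shape can be produced by a Criterion harness along the following lines (a hedged sketch, not the repo's actual bench file; hash_account_stub and the throughput accounting are illustrative, though counting metadata plus data bytes matches the MiB/s figures above):

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

const META_SIZE: usize = 81; // metadata bytes hashed alongside the data

// Illustrative stand-in for the real account-hashing routine under test.
fn hash_account_stub(data: &[u8]) -> blake3::Hash {
    let mut hasher = blake3::Hasher::new();
    hasher.update(&[0u8; META_SIZE]); // placeholder metadata bytes
    hasher.update(data);
    hasher.finalize()
}

fn bench_hash_account(c: &mut Criterion) {
    let mut group = c.benchmark_group("hash_account");
    for data_size in [0usize, 165, 200, 1024, 1 << 20, 10 << 20] {
        let data = vec![0u8; data_size];
        // Byte throughput gives the MiB/s / GiB/s numbers; counting
        // metadata + data bytes matches the reported figures.
        group.throughput(Throughput::Bytes((META_SIZE + data_size) as u64));
        group.bench_with_input(BenchmarkId::new("data_size", data_size), &data, |b, data| {
            b.iter(|| hash_account_stub(data))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_hash_account);
criterion_main!(benches);
```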

Footnotes

  1. https://github.com/solana-labs/solana-program-library/blob/e651623033fca7997ccd21e55d0f2388473122f9/token/program/src/state.rs#L134

  2. https://discord.com/channels/428295358100013066/977244255212937306/1212827836281524224

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.8%. Comparing base (3f9a7a5) to head (a40f09c).

Additional details and impacted files
@@            Coverage Diff            @@
##           master      #56     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         837      837             
  Lines      225922   225922             
=========================================
- Hits       184955   184944     -11     
- Misses      40967    40978     +11     

@brooksprumo force-pushed the hash/stack-buffer-size branch from a40f09c to dbe3a71 on March 4, 2024 at 19:00
@brooksprumo force-pushed the hash/stack-buffer-size branch from dbe3a71 to b3be4e6 on March 5, 2024 at 12:44
@brooksprumo marked this pull request as ready for review on March 5, 2024 at 13:39
@HaoranYi left a comment

Nice work. Lgtm. Thanks!

@jeffwashington left a comment

lgtm
It is counter-intuitive to me that copying the slice of data and then hashing it is faster than hashing it in place without the copy.

@brooksprumo (Author)

> lgtm It is counter-intuitive to me that copying the slice of data and then hashing it is faster than hashing it in place without the copy.

Here's the original PR that added the buffer: solana-labs#33788. In it, @HaoranYi notes that the hasher works better when given a larger buffer to work on.

Since the current PR is an improvement over master, I'll merge it. I'll also benchmark never copying the data into the buffer and always hashing it with a separate call. Lastly, I'll benchmark changing the hashing order so the account data comes last: the metadata fields are hashed together first, and the data second.

@brooksprumo merged commit 1e133bc into anza-xyz:master on Mar 5, 2024
46 checks passed
@brooksprumo deleted the hash/stack-buffer-size branch on March 5, 2024 at 17:02
@HaoranYi commented Mar 5, 2024

Internally, blake3 copies the data into its local buffer and uses SIMD to compute the hash.
I think multiple updates with small pieces of data will not use SIMD effectively. By aggregating the data into a buffer, we can use SIMD more effectively. I think it is best to choose a buffer size that is a multiple of the SIMD width (a multiple of 64 bytes).
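
A small illustrative sketch of the property this relies on: blake3's incremental updates hash the concatenation of everything fed in, so buffering only changes how many bytes the hasher sees per update call (and how well it can use SIMD), not the resulting digest.

```rust
fn main() {
    let meta = [1u8; 81];  // stand-in for the metadata fields
    let data = [2u8; 165]; // stand-in for a token account's data

    // Many small updates: the hasher sees tiny pieces at a time.
    let mut piecewise = blake3::Hasher::new();
    piecewise.update(&meta[..8]);   // e.g. lamports
    piecewise.update(&meta[8..16]); // e.g. rent_epoch
    piecewise.update(&meta[16..]);  // executable, owner, pubkey
    piecewise.update(&data);

    // One buffered update: same bytes in the same order, one big chunk.
    let mut buffered = blake3::Hasher::new();
    let mut buffer = Vec::with_capacity(meta.len() + data.len());
    buffer.extend_from_slice(&meta);
    buffer.extend_from_slice(&data);
    buffered.update(&buffer);

    // Identical digests; only the per-update chunk size (and SIMD utilization) differs.
    assert_eq!(piecewise.finalize(), buffered.finalize());
}
```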

@brooksprumo (Author) commented Mar 5, 2024

Here are the benchmark results for the following scenarios. All were run on a dev box that can run mnb.

  • [A] Runs 0-3: this PR
  • [B] Runs 4-8: not copying into a buffer ever
  • [C] Runs 9-15: reordering the hashing so that all metadata fields are copied into the buffer and hashed first, then the account data is hashed afterwards (without copying it into the buffer)

Trends:

  • Hashing accounts with data size 0 showed the biggest differences. Interestingly, [B] was the slowest. Maybe hashing zero bytes as its own update actually has a non-zero cost?
  • Data sizes of 200 and 1K showed some improvement with [C]. I'm not sure if this is significant enough to pursue.
  • Data sizes of 1M and 10M were all about even. This implies to me that reading the data itself accounts for the bulk of the time.

Results:

data size 0

[benchmark history chart for data size 0]

data size 165 and 200

[benchmark history charts for data sizes 165 and 200]

data size 1K, 1M, and 10M

[benchmark history charts for data sizes 1K, 1M, and 10M]

patch for [B]

@@ -6122,27 +6122,20 @@ impl AccountsDb {
         // allocate a buffer on the stack that's big enough
         // to hold a token account or a stake account
         const META_SIZE: usize = 8 /* lamports */ + 8 /* rent_epoch */ + 1 /* executable */ + 32 /* owner */ + 32 /* pubkey */;
-        const DATA_SIZE: usize = 200; // stake acounts are 200 B and token accounts are 165-182ish B
+        const DATA_SIZE: usize = 0;
         const BUFFER_SIZE: usize = META_SIZE + DATA_SIZE;
         let mut buffer = SmallVec::<[u8; BUFFER_SIZE]>::new();
 
         // collect lamports, rent_epoch into buffer to hash
         buffer.extend_from_slice(&lamports.to_le_bytes());
         buffer.extend_from_slice(&rent_epoch.to_le_bytes());
+        hasher.update(&buffer);
 
-        if data.len() > DATA_SIZE {
-            // For larger accounts whose data can't fit into the buffer, update the hash now.
-            hasher.update(&buffer);
-            buffer.clear();
-
-            // hash account's data
-            hasher.update(data);
-        } else {
-            // For small accounts whose data can fit into the buffer, append it to the buffer.
-            buffer.extend_from_slice(data);
-        }
+        // hash account's data
+        hasher.update(data);
 
         // collect exec_flag, owner, pubkey into buffer to hash
+        buffer.clear();
         buffer.push(executable.into());
         buffer.extend_from_slice(owner.as_ref());
         buffer.extend_from_slice(pubkey.as_ref());

patch for [C]

@@ -6122,32 +6122,21 @@ impl AccountsDb {
         // allocate a buffer on the stack that's big enough
         // to hold a token account or a stake account
         const META_SIZE: usize = 8 /* lamports */ + 8 /* rent_epoch */ + 1 /* executable */ + 32 /* owner */ + 32 /* pubkey */;
-        const DATA_SIZE: usize = 200; // stake acounts are 200 B and token accounts are 165-182ish B
+        const DATA_SIZE: usize = 0;
         const BUFFER_SIZE: usize = META_SIZE + DATA_SIZE;
         let mut buffer = SmallVec::<[u8; BUFFER_SIZE]>::new();
 
         // collect lamports, rent_epoch into buffer to hash
         buffer.extend_from_slice(&lamports.to_le_bytes());
         buffer.extend_from_slice(&rent_epoch.to_le_bytes());
-
-        if data.len() > DATA_SIZE {
-            // For larger accounts whose data can't fit into the buffer, update the hash now.
-            hasher.update(&buffer);
-            buffer.clear();
-
-            // hash account's data
-            hasher.update(data);
-        } else {
-            // For small accounts whose data can fit into the buffer, append it to the buffer.
-            buffer.extend_from_slice(data);
-        }
-
-        // collect exec_flag, owner, pubkey into buffer to hash
         buffer.push(executable.into());
         buffer.extend_from_slice(owner.as_ref());
         buffer.extend_from_slice(pubkey.as_ref());
         hasher.update(&buffer);
 
+        // hash account's data
+        hasher.update(data);
+
         AccountHash(Hash::new_from_array(hasher.finalize().into()))
     }

codebender828 pushed a commit to codebender828/agave that referenced this pull request Oct 3, 2024