file based append vecs #1394

jeffwashington · 2024-05-16T21:44:21Z

Problem

Working on getting rid of mmaps for append vecs.
mmaps are poorly managed by the Linux kernel. File I/O for cold accounts will work more efficiently.

Summary of Changes

Add the ability, via a CLI arg, to access storages using file I/O.

Fixes #

codecov-commenter · 2024-05-16T23:49:18Z

Codecov Report

Attention: Patch coverage is 88.04348%, with 55 lines in your changes missing coverage. Please review.

Project coverage is 82.7%. Comparing base (e01278e) to head (9919eee).
Report is 4 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master    #1394     +/-   ##
=========================================
- Coverage    82.7%    82.7%   -0.1%     
=========================================
  Files         872      871      -1     
  Lines      369436   369655    +219     
=========================================
+ Hits       305779   305959    +180     
- Misses      63657    63696     +39

accounts-db/src/buffered_reader.rs

accounts-db/src/file_io.rs

alessandrod · 2024-06-12T13:00:30Z

accounts-db/src/file_io.rs

+    valid_len: usize,
+) -> std::io::Result<usize> {
+    let mut offset = start_offset;
+    let mut start_read = 0;


nit: I'd call this buffer_offset, or even better, remove the variable altogether and rebind buffer, moving it forward with buffer = &mut buffer[read_this_time..];

renamed to buffer_offset
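
For reference, the rebinding alternative from the nit above could look roughly like this. This is a minimal sketch, not the code merged in this PR: the function name `read_until_full` and the generic `Read` parameter are assumptions made for illustration.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not from the PR): fill `buffer` from `reader`,
/// stopping early at end of input. Returns the number of bytes read.
fn read_until_full<R: Read>(reader: &mut R, mut buffer: &mut [u8]) -> io::Result<usize> {
    let mut total_bytes_read = 0;
    while !buffer.is_empty() {
        match reader.read(buffer) {
            // End of input: nothing more to read.
            Ok(0) => break,
            Ok(read_this_time) => {
                total_bytes_read += read_this_time;
                // Rebind the slice so it always starts at the next unfilled
                // byte; no separate buffer_offset variable is needed.
                buffer = &mut buffer[read_this_time..];
            }
            // Retry reads that were interrupted by a signal.
            Err(err) if err.kind() == io::ErrorKind::Interrupted => continue,
            Err(err) => return Err(err),
        }
    }
    Ok(total_bytes_read)
}
```

The trade-off is small: renaming keeps the index arithmetic visible, while rebinding removes one piece of mutable state from the loop.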

alessandrod · 2024-06-12T13:25:38Z

accounts-db/src/file_io.rs

+    let mut start_read = 0;
+    let mut total_bytes_read = 0;
+    if start_offset >= valid_len {
+        return Ok(0);


Why not make this an error? In fact if you take a Range this can't happen at all

The caller doesn't have to track the valid length of the file, i.e. know how many bytes in the file are valid. This check is how we know to stop reading. There are more bytes in the file, but they aren't valid.

Yes I get the invalid padding stuff in files.

This function at the moment has two callers: get_stored_account_meta_callback and read_more_buffer.

In get_stored_account_meta_callback it's not clear to me that offset should ever be > self.len()? In which case I think it would be better to assert or handle the condition explicitly. Right now I think read_into_buffer will return Ok(0), then Self::get_type() will return None reading from the empty slice. I'd rather make the condition explicit.

read_more_buffer is called by BufferedReader::read. read is already asserting something, and it looks like it could assert the offset > len condition too, or handle it.

I'm your guest in this area of the code, so ultimately up to you. But I personally prefer seeing the happy path clearly separated from the edge conditions, with the edge conditions isolated at the boundaries, so that the inner code can make stronger invariants and have less complexity.

I can appreciate all of that. I believe what we have is correct. I think what you're discussing is a refactoring we could do afterwards. There are only 2 callers as you said. This feature is 3 months out from use and is cli opt-in atm.
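
To make the refactoring discussed in this thread concrete, here is a minimal sketch of the reviewer's suggestion: check the offset once at the boundary, then hand the inner code a `Range` it can rely on. It is not what was merged (the merged code returns `Ok(0)`), and both function names, plus the use of an in-memory slice in place of the file, are assumptions for illustration.

```rust
use std::io;
use std::ops::Range;

/// Hypothetical boundary check: convert (start_offset, valid_len) into a
/// non-empty range, or report the out-of-range offset as an explicit error.
fn valid_read_range(start_offset: usize, valid_len: usize) -> io::Result<Range<usize>> {
    if start_offset >= valid_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("offset {start_offset} is past the valid length {valid_len}"),
        ));
    }
    Ok(start_offset..valid_len)
}

/// Hypothetical inner read on the happy path. `bytes` stands in for the
/// file contents; the range is assumed to have been validated already.
fn read_range(bytes: &[u8], range: Range<usize>, buffer: &mut [u8]) -> usize {
    debug_assert!(range.start < range.end && range.end <= bytes.len());
    // Copy no more than the range holds and no more than the buffer fits.
    let len = range.len().min(buffer.len());
    buffer[..len].copy_from_slice(&bytes[range.start..range.start + len]);
    len
}
```

With this split, a caller such as get_stored_account_meta_callback would decide explicitly what an out-of-range offset means instead of inferring it from an empty read.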

alessandrod · 2024-06-12T13:32:51Z

accounts-db/src/file_io.rs

+            }
+            Ok(bytes_read_this_time) => {
+                total_bytes_read += bytes_read_this_time;
+                if total_bytes_read + start_offset >= valid_len {


Perf-wise, it would be better to clamp the buffer before starting the while loop, rather than throwing away excess data after reading it.

please tell me how to do what you are asking.

something like this https://github.com/alessandrod/solana/blob/7a74d5e047433ded5455d5cdf83da274196ed961/accounts-db/src/file_io.rs#L38

"something like this"

I mean clamping. I can imagine you'd like the range to be passed in instead of 2 separate fields. I'd prefer to do that after. Perhaps it adds clarity or simplifies. Easy enough to change in isolation without another 10 comments on this pr. ;-)

I am clamping in that commit?

buffer = &mut buffer[..file_offsets.len().min(buffer_len)];

This makes sure we never read more than requested, and I updated a test that was checking that we read too much :P

oh. I didn't know what you meant by clamping. I was thinking of something like 'pin' since you mentioned performance.

I propose what we have is correct and safe (and only enabled by cli arg). I imagine this function and api could be better. I'm awaiting perf results from you. I'd like to make changes to this fn in a different pr. Is that ok?
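
The clamping discussed in this thread amounts to shrinking the destination slice before the read loop starts, so the loop cannot read past the requested range. The sketch below builds on the one-line snippet quoted above; the function name `read_clamped`, the generic `Read` parameter, and the assumption that the reader is already positioned at `file_offsets.start` are illustrative, not taken from the linked commit.

```rust
use std::io::{self, Read};
use std::ops::Range;

/// Hypothetical helper: read the bytes covered by `file_offsets` into
/// `buffer`. The reader is assumed to be positioned at `file_offsets.start`.
fn read_clamped<R: Read>(
    reader: &mut R,
    file_offsets: Range<usize>,
    mut buffer: &mut [u8],
) -> io::Result<usize> {
    let buffer_len = buffer.len();
    // Clamp once, up front: never ask for more bytes than the range holds.
    buffer = &mut buffer[..file_offsets.len().min(buffer_len)];

    let mut total_bytes_read = 0;
    while !buffer.is_empty() {
        match reader.read(buffer) {
            Ok(0) => break, // end of input
            Ok(read_this_time) => {
                total_bytes_read += read_this_time;
                buffer = &mut buffer[read_this_time..];
            }
            Err(err) if err.kind() == io::ErrorKind::Interrupted => continue,
            Err(err) => return Err(err),
        }
    }
    Ok(total_bytes_read)
}
```

Because the slice is clamped before the loop, bytes beyond the range are never requested, so there is nothing to discard afterwards and the rest of the caller's buffer is left untouched.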

alessandrod · 2024-06-12T17:18:38Z

accounts-db/src/append_vec.rs

+                    } else {
+                        // not enough was read from file to get `data`
+                        assert!(data_len <= MAX_PERMITTED_DATA_LENGTH, "{data_len}");
+                        let mut data = vec![0u8; data_len as usize];


Why can't this code use the new buffered reader?

We aren't scanning the whole file ahead in this case, and we don't want to read any more than we need to.
In reality, this code pre-dated the buffered reader, but I'm still not sure the buffered reader is the right tool here. We don't want to choose a 10M buffer, big enough to hold the whole account plus data, if we only need 400 bytes (or 4k). So we will still want 2 buffers. This led me to think of this as 1-2 specifically sized reads.
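
The "1-2 specifically sized reads" idea can be sketched as follows. This is an illustration under stated assumptions, not the append vec code: the record layout (an 8-byte little-endian length followed by the data), the constants, and the function name `read_record` are all hypothetical, and `MAX_DATA_LEN` merely stands in for MAX_PERMITTED_DATA_LENGTH.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

// All three constants are assumptions chosen for this sketch.
const HEADER_LEN: usize = 8; // u64 little-endian data length
const SMALL_READ_LEN: usize = 64; // size of the first, speculative read
const MAX_DATA_LEN: u64 = 10 * 1024 * 1024; // stand-in for MAX_PERMITTED_DATA_LENGTH

/// Hypothetical reader: returns the data of the record starting at `offset`,
/// or `None` if not even a header could be read there.
fn read_record(file: &mut File, offset: u64) -> io::Result<Option<Vec<u8>>> {
    // First read: small and fixed-size, enough for the header and, for
    // small records, the data as well.
    let mut first = Vec::with_capacity(SMALL_READ_LEN);
    file.seek(SeekFrom::Start(offset))?;
    Read::by_ref(&mut *file)
        .take(SMALL_READ_LEN as u64)
        .read_to_end(&mut first)?;
    if first.len() < HEADER_LEN {
        return Ok(None);
    }

    let mut len_bytes = [0u8; HEADER_LEN];
    len_bytes.copy_from_slice(&first[..HEADER_LEN]);
    let data_len = u64::from_le_bytes(len_bytes);
    assert!(data_len <= MAX_DATA_LEN, "{data_len}");
    let data_len = data_len as usize;

    if HEADER_LEN + data_len <= first.len() {
        // The first read already holds all of `data`.
        return Ok(Some(first[HEADER_LEN..HEADER_LEN + data_len].to_vec()));
    }

    // Not enough was read to get `data`: do a second read sized exactly to
    // it. For simplicity this re-reads the bytes the first read covered.
    let mut data = vec![0u8; data_len];
    file.seek(SeekFrom::Start(offset + HEADER_LEN as u64))?;
    file.read_exact(&mut data)?;
    Ok(Some(data))
}
```

The point of the design is that the common small case costs one small read, and the large case allocates exactly `data_len` bytes rather than keeping a worst-case buffer around for every lookup.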

HaoranYi

lgtm

brooksprumo

Let's get it into master and iterate on smaller pieces.

* file based storages
* pr feedback
* add comments
* add tests for buffered reader and file reader
* fix windows clippy
* pr: update tests for jeff comments
* pr feedback
* fix a bug - read at least default min bytes
* Revert "fix a bug - read at least default min bytes"
  This reverts commit 52ccd8e.
* add test coverages flush, reset, reopen functions for append_vec.rs
* one more dead code
* renames
* renames
* renames
* wee wah wee wah grammar police
* rename
* add debug assert
* use const
* const
* reorder
* reorders and renames

---------

Co-authored-by: HaoranYi <[email protected]>

jeffwashington force-pushed the 4my10_full_thing_rebasing_bk3 branch 15 times, most recently from 47023ba to 713f80f Compare May 20, 2024 02:34

jeffwashington force-pushed the 4my10_full_thing_rebasing_bk3 branch 9 times, most recently from f42d035 to f443a78 Compare June 3, 2024 13:55

jeffwashington force-pushed the 4my10_full_thing_rebasing_bk3 branch 5 times, most recently from 83c0d5e to 7997037 Compare June 5, 2024 21:10

one more dead code

ddb9ae3

brooksprumo reviewed Jun 12, 2024

brooksprumo self-requested a review June 12, 2024 13:45

alessandrod reviewed Jun 12, 2024

jeffwashington added 8 commits June 12, 2024 10:28

renames

a99e825

renames

8b85d40

renames

8b231b9

wee wah wee wah grammar police

d5c369a

rename

be3e499

add debug assert

2a43adb

use const

dd79242

const

4cbb54f

jeffwashington requested a review from alessandrod June 12, 2024 15:47

jeffwashington added 2 commits June 12, 2024 11:01

reorder

8e56d98

reorders and renames

34b8378

alessandrod reviewed Jun 12, 2024

HaoranYi reviewed Jun 12, 2024

jeffwashington requested review from alessandrod and HaoranYi June 12, 2024 18:53

HaoranYi approved these changes Jun 12, 2024

brooksprumo reviewed Jun 12, 2024

jeffwashington requested a review from brooksprumo June 12, 2024 20:15

brooksprumo reviewed Jun 12, 2024

jeffwashington requested a review from brooksprumo June 12, 2024 20:20

brooksprumo approved these changes Jun 12, 2024

jeffwashington merged commit 1bd9bd1 into anza-xyz:master Jun 12, 2024
40 checks passed

HaoranYi mentioned this pull request Jun 13, 2024

accounts-db: get rid of const in tests #1732

Merged

This was referenced Jul 10, 2024

Optimizes AppendVec::scan_pubkeys() when using file io #2077

Merged

Optimizes AppendVec::get_account_sizes() when using file io #2083

Merged
