file based append vecs #1394
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##           master    #1394    +/-  ##
=========================================
- Coverage    82.7%    82.7%   -0.1%
=========================================
  Files         872      871      -1
  Lines      369436   369655    +219
=========================================
+ Hits       305779   305959    +180
- Misses      63657    63696     +39
accounts-db/src/file_io.rs (outdated diff)

        valid_len: usize,
    ) -> std::io::Result<usize> {
        let mut offset = start_offset;
        let mut start_read = 0;
Nit: I'd call this `buffer_offset`, or better yet remove the variable altogether and rebind `buffer`, moving it forward with `buffer = &mut buffer[read_this_time..];`.
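A minimal sketch of the rebinding pattern being suggested, with an illustrative signature and a Unix positional read; this is not the actual file_io.rs code (which also caps reads at `valid_len`), just the slice-rebind idea in isolation:

```rust
use std::fs::File;
use std::io::Result;
use std::os::unix::fs::FileExt; // Unix positional reads (read_at)

// Sketch only: advance through the destination by shrinking the slice
// instead of tracking a separate `buffer_offset` index.
fn read_into_buffer_sketch(
    file: &File,
    start_offset: usize,
    mut buffer: &mut [u8],
) -> Result<usize> {
    let mut offset = start_offset as u64;
    let mut total_bytes_read = 0;
    while !buffer.is_empty() {
        let read_this_time = file.read_at(buffer, offset)?;
        if read_this_time == 0 {
            break; // EOF before the buffer was filled
        }
        offset += read_this_time as u64;
        total_bytes_read += read_this_time;
        // Rebind `buffer` to the unfilled tail; no separate index needed.
        buffer = &mut buffer[read_this_time..];
    }
    Ok(total_bytes_read)
}
```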
renamed to buffer_offset
    let mut start_read = 0;
    let mut total_bytes_read = 0;
    if start_offset >= valid_len {
        return Ok(0);
Why not make this an error? In fact, if you take a Range this can't happen at all.
The caller doesn't have to track the valid length of the file, i.e. know how many bytes in the file are valid. This is how we know to stop reading: there are more bytes in the file, but they aren't valid.
Yes, I get the invalid-padding stuff in files.
This function at the moment has two callers: get_stored_account_meta_callback and read_more_buffer.
In get_stored_account_meta_callback it's not clear to me that offset should ever be > self.len(), in which case I think it would be better to assert or handle the condition explicitly. Right now I think read_into_buffer will return Ok(0), then Self::get_type() will return None reading from the empty slice. I'd rather make the condition explicit.
read_more_buffer is called by BufferedReader::read. read is already asserting something, and it looks like it could assert the offset > len condition too, or handle it.
I'm a guest in this area of the code, so ultimately it's up to you. But I personally prefer seeing the happy path clearly separated from the edge conditions, with the edge conditions isolated at the boundaries, so that the inner code can rely on stronger invariants and carry less complexity.
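A hedged sketch of the Range-based alternative being floated here, with hypothetical names (the function under review keeps the start_offset/valid_len pair): the caller clamps up front, so the out-of-range case is handled at the boundary rather than inside the read path.

```rust
use std::fs::File;
use std::io::Result;
use std::ops::Range;
use std::os::unix::fs::FileExt;

// Hypothetical alternative signature: the caller passes the valid byte range
// it wants, so a start offset past the valid length is unrepresentable here
// and gets handled (or asserted) at the call site instead.
fn read_range_into_buffer(
    file: &File,
    file_range: Range<usize>,
    buffer: &mut [u8],
) -> Result<usize> {
    let to_read = file_range.len().min(buffer.len());
    // One positional read for brevity; a full version would loop until the
    // clamped length is read or EOF is hit.
    file.read_at(&mut buffer[..to_read], file_range.start as u64)
}
```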
I can appreciate all of that. I believe what we have is correct. I think what you're describing is a refactoring we could do afterwards. There are only 2 callers, as you said. This feature is 3 months out from use and is CLI opt-in at the moment.
accounts-db/src/file_io.rs (outdated diff)

            }
            Ok(bytes_read_this_time) => {
                total_bytes_read += bytes_read_this_time;
                if total_bytes_read + start_offset >= valid_len {
Perf-wise, here it would be better to clamp the buffer before starting the while loop, rather than throwing away excess data after reading it.
please tell me how to do what you are asking.
Something like this.
I mean clamping. I can imagine you'd like the range to be passed in instead of 2 separate fields. I'd prefer to do that afterwards; perhaps it adds clarity or simplifies. It's easy enough to change in isolation without another 10 comments on this PR. ;-)
I am clamping in that commit?
buffer = &mut buffer[..file_offsets.len().min(buffer_len)];
This makes sure we never read more than requested, and I updated a test that was checking that we read too much :P
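For readers who can't see the referenced commit, a minimal sketch of what clamping up front looks like, with illustrative names and a single exact-length read standing in for the real read loop; this is not the actual change, just the shape of it:

```rust
use std::fs::File;
use std::io::Result;
use std::ops::Range;
use std::os::unix::fs::FileExt;

// Sketch: shrink the destination slice to the requested range before any
// I/O, so the read can never pull in more bytes than the caller asked for.
fn read_clamped(file: &File, file_offsets: Range<usize>, buffer: &mut [u8]) -> Result<usize> {
    let buffer_len = buffer.len();
    let buffer = &mut buffer[..file_offsets.len().min(buffer_len)];
    // Exact-length positional read for brevity; it errors with UnexpectedEof
    // if the file ends before the clamped length is available.
    file.read_exact_at(buffer, file_offsets.start as u64)?;
    Ok(buffer.len())
}
```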
oh. I didn't know what you meant by clamping. I was thinking of something like 'pin' since you mentioned performance.
I propose that what we have is correct and safe (and only enabled by a CLI arg). I imagine this function and its API could be better. I'm awaiting perf results from you. I'd like to make changes to this function in a different PR. Is that ok?
sure
    } else {
        // not enough was read from file to get `data`
        assert!(data_len <= MAX_PERMITTED_DATA_LENGTH, "{data_len}");
        let mut data = vec![0u8; data_len as usize];
Why can't this code use the new buffered reader?
We aren't scanning the whole file ahead in this case; we don't want to read any more than we need to.
In reality, this code pre-dated the buffered reader, but I'm still not sure the buffered reader is the right fit here. We don't want to choose a 10M buffer just so it could hold the whole account plus data when we only need 400 bytes (or 4k), so we would still want two buffers. This led me to think of this as one or two specifically sized reads.
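A rough sketch of that "one or two specifically sized reads" shape. The constant and the meta_len/data_len parameters are hypothetical; the real code parses the account meta out of the first read rather than being handed the lengths:

```rust
use std::fs::File;
use std::io::Result;
use std::os::unix::fs::FileExt;

// Hypothetical first-read size: enough for the fixed-size account meta plus
// typical small account data.
const FIRST_READ_LEN: usize = 4096;

// Sketch: read a small, fixed-size chunk first; only if the account's data
// did not fit, issue a second read sized exactly to what is still missing.
fn read_account_bytes(file: &File, offset: u64, meta_len: usize, data_len: usize) -> Result<Vec<u8>> {
    let wanted = meta_len + data_len;
    let mut bytes = vec![0u8; wanted.min(FIRST_READ_LEN)];
    file.read_exact_at(&mut bytes, offset)?;
    if wanted > bytes.len() {
        // Second read: fetch only the remaining bytes, never a huge buffer.
        let already = bytes.len();
        bytes.resize(wanted, 0);
        file.read_exact_at(&mut bytes[already..], offset + already as u64)?;
    }
    Ok(bytes)
}
```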
lgtm
Let's get it into master and iterate on smaller pieces.
* file based storages
* pr feedback
* add comments
* add tests for buffered reader and file reader
* fix windows clippy
* pr: update tests for jeff comments
* pr feedback
* fix a bug - read at least default min bytes
* Revert "fix a bug - read at least default min bytes" (this reverts commit 52ccd8e)
* add test coverage for flush, reset, reopen functions in append_vec.rs
* one more dead code
* renames
* renames
* renames
* wee wah wee wah grammar police
* rename
* add debug assert
* use const
* const
* reorder
* reorders and renames

---------

Co-authored-by: HaoranYi <[email protected]>
Problem
Working on getting rid of mmaps for append vecs.
mmaps are poorly managed by the Linux kernel; file I/O for cold accounts will work more efficiently.
Summary of Changes
Add the ability, via a CLI arg, to access storages using file I/O.
Fixes #