Improve concat kernel capacity estimation #3546
Conversation
```rust
        buf_len - offset.to_usize().unwrap()
    })
    .sum()

fn binary_capacity<T: ByteArrayType>(arrays: &[&dyn Array]) -> Capacities {
```
I'm pretty chuffed with the GenericByteArray abstraction 😄
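For readers without the full diff, here is a sketch of what a capacity estimate built on `GenericByteArray` might look like; the body is reconstructed around the signature above, assuming arrow's `Capacities::Binary(item_capacity, Option<byte_capacity>)` variant is the target, and is not copied verbatim from the PR:

```rust
use arrow_array::types::ByteArrayType;
use arrow_array::{Array, GenericByteArray};
use arrow_buffer::ArrowNativeType;
use arrow_data::transform::Capacities;

/// Sketch: estimate the item and byte capacities needed to concatenate
/// a set of byte arrays (Binary, LargeBinary, Utf8, LargeUtf8).
fn binary_capacity<T: ByteArrayType>(arrays: &[&dyn Array]) -> Capacities {
    let mut item_capacity = 0;
    let mut bytes_capacity = 0;
    for array in arrays {
        let a = array
            .as_any()
            .downcast_ref::<GenericByteArray<T>>()
            .unwrap();

        // The number of value bytes is the distance between the first and
        // last offsets, which is also correct for sliced arrays.
        let offsets = a.value_offsets();
        bytes_capacity += offsets[offsets.len() - 1].as_usize() - offsets[0].as_usize();
        item_capacity += a.len();
    }
    Capacities::Binary(item_capacity, Some(bytes_capacity))
}
```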
Looks good to me -- bonus points for the comments explaining the expected values. Very nice 👌 overall
```rust
    .unwrap();

let offsets = a.value_offsets();
bytes_capacity += offsets[offsets.len() - 1].as_usize() - offsets[0].as_usize();
```
That is pretty clever. Do you need to check that `offsets.len() > 0` (aka that this is a non-zero-length array)?
No, because this is guaranteed by `GenericByteArray`.
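For the curious, that guarantee can be spot-checked: a `GenericByteArray` always stores `len + 1` offsets, so even an empty array has one. A minimal, hypothetical check:

```rust
use arrow_array::StringArray;

fn main() {
    // Even a zero-length array stores a single offset (0), so indexing
    // offsets[offsets.len() - 1] can never go out of bounds.
    let empty = StringArray::from(Vec::<&str>::new());
    assert_eq!(empty.value_offsets(), &[0]);
}
```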
```rust
let data = a.data();
assert_eq!(data.buffers()[0].len(), 420);
assert_eq!(data.buffers()[0].capacity(), 448); // Nearest multiple of 64
assert_eq!(data.buffers()[1].len(), 315);
```
315 = len("foo") + len("bingo") + len("bongo") + len("lorem") + len("") ?
```rust
let a = concat(&[&a, &b]).unwrap();
let data = a.data();
assert_eq!(data.buffers()[0].len(), 420);
```
Is this 420 because it is 100 offsets + 4 --> 104, and then each takes 4 bytes --> 416 --> round up to a multiple of 64?
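For reference, an array of n elements stores n + 1 offsets; assuming the concatenated result has 104 elements, the i32 offsets buffer needs (104 + 1) × 4 = 420 bytes, and 448 is that rounded up to the next multiple of 64. A quick check:

```rust
fn main() {
    let n = 104; // assumed element count of the concatenated array
    let offset_bytes = (n + 1) * std::mem::size_of::<i32>(); // n + 1 offsets
    assert_eq!(offset_bytes, 420);

    let rounded = (offset_bytes + 63) / 64 * 64; // round up to a multiple of 64
    assert_eq!(rounded, 448);
}
```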
Benchmark runs are scheduled for baseline = 96831de and contender = 56dfad0. 56dfad0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #.
Rationale for this change
This improves the estimation of the necessary capacity for Binary and LargeBinary, whilst also improving the capacity estimation for ByteArray slices in general.
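As an illustration of the sliced case (a hypothetical example, assuming a recent arrow-rs where `Array::slice` returns a typed array): a slice shares its parent's values buffer, so only the bytes spanned by the slice's first and last offsets should count toward the estimate, not the whole parent buffer.

```rust
use arrow::array::{Array, StringArray};
use arrow::compute::concat;

fn main() {
    let a = StringArray::from(vec!["hello", "world", "arrow"]);
    // The slice shares the parent's values buffer, but only "world" and
    // "arrow" (10 bytes) should count toward the capacity estimate.
    let s = a.slice(1, 2);
    let out = concat(&[&s, &s]).unwrap();
    assert_eq!(out.len(), 4);
}
```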
What changes are included in this PR?
Are there any user-facing changes?