-
Notifications
You must be signed in to change notification settings - Fork 851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ipc reading code into methods on ArrayReader
#7006
Conversation
.map(|_| reader.next_buffer()) | ||
.collect::<Result<Vec<_>, _>>()?; | ||
create_primitive_array( | ||
impl ArrayReader<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This illustrates the point of this PR: remove the need to pass require_alignment
as an argument.
It does this by this code into a method on ArrayReader
which means require_alignment
is now passed as a field.
require_alignment
is not a problem, but skip_validation
is, for the reasons @tustvold states on #6938 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should rename ArrayReader to RecordBatchDecoder (there is a trait already called RecordBatchReader), and this isn't actually doing anything to do with io::Read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea -- I will do so in a follow on PR
field: &Field, | ||
variadic_counts: &mut VecDeque<i64>, | ||
) -> Result<ArrayRef, ArrowError> { | ||
let reader = self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I aliased self --> reader to minimize the diff. I can remove this rename as a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably also be good to move it to the rest of the impl at the same time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a few other methods that need to be moved into ArrayReader
(e.g. anything that has a require_alignment
flag
} | ||
|
||
impl<'a> ArrayReader<'a> { | ||
/// Create a reader for decoding arrays from an encoded [`RecordBatch`] | ||
fn try_new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is refactored code from read_record_batch_impl
metadata: &MetadataVersion, | ||
require_alignment: bool, | ||
) -> Result<RecordBatch, ArrowError> { | ||
let buffers = batch.buffers().ok_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code in this method was split across ArrayReader::try_new
and ArrayReader::read_record_batch
56e9e07
to
81d0146
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, I did have a thought that this might make it harder to "safely" encapsulate the unsafe unchecked flag, but I think we could do something like
mod private {
/// A boolean flag that cannot be mutated outside of unsafe code
pub struct UnsafeFlag(bool);
impl UnsafeFlag {
#[inline]
pub unsafe fn set(&mut self, val: bool) {
self.0 = val;
}
#[inline]
pub fn get(&self) -> bool {
return self.0
}
}
}
.map(|_| reader.next_buffer()) | ||
.collect::<Result<Vec<_>, _>>()?; | ||
create_primitive_array( | ||
impl ArrayReader<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should rename ArrayReader to RecordBatchDecoder (there is a trait already called RecordBatchReader), and this isn't actually doing anything to do with io::Read
field: &Field, | ||
variadic_counts: &mut VecDeque<i64>, | ||
) -> Result<ArrayRef, ArrowError> { | ||
let reader = self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably also be good to move it to the rest of the impl at the same time
|
||
let options = RecordBatchOptions::new().with_row_count(Some(self.batch.length() as usize)); | ||
|
||
let schema = Arc::clone(&self.schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this will now always clone the schema even if it is projected, this is almost certainly irrelevant but something I noticed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point -- this Arc clone comes from the fact that self.create_array
below takes &mut self
so the code can't also have self.schema
borrowed immutably as well
As it happens once per RecordBatch
I agree with your assesment that this will be irrelevant in practice
This is a great idea; I tried it out in ArrayDataBuilder and I think it looks quite good 👌 |
Thanks again @tustvold |
Note this PR includes a bunch of whitespace changes that make it look like a much larger change than it is
Viewing the diff by ignoring whitespace makes the structure more clear I think
Which issue does this PR close?
Rationale for this change
@totoroyyb and I are working on adding another flag to the ipc reader code that allows disabling validation during read (see #6938). If the flag is true it will be
unsafe
.The current structure of the code as a bunch of free functions makes it impossible to pass the parameter down without having to mark all the inner functions unsafe, which is not correct.
What changes are included in this PR?
This PR refactors the code that currently takes an
ArrayReader
as a parameterand makes it a method on the function. This has the nice property that
we don't have to pass
require_alignment
as a flag anymore and it sets us up very nicelyto be able to add
disable_validation
in a follow on PRAre there any user-facing changes?
There are no intended changes in this PR -- all changes are to internal structures only.