support from_slice for binary, string, and boolean array types #1589

Merged
merged 2 commits into from
Jan 17, 2022
31 changes: 5 additions & 26 deletions datafusion/src/from_slice.rs
@@ -43,12 +43,7 @@ where
     S: AsRef<[T::Native]>,
 {
     fn from_slice(slice: S) -> Self {
-        let slice = slice.as_ref();
-        let array_data = ArrayData::builder(T::DATA_TYPE)
-            .len(slice.len())
-            .add_buffer(Buffer::from_slice_ref(&slice));
-        let array_data = unsafe { array_data.build_unchecked() };
-        Self::from(array_data)
+        Self::from_iter_values(slice.as_ref().iter().cloned())
     }
 }

@@ -59,6 +54,9 @@ where
     S: AsRef<[I]>,
     I: AsRef<[u8]>,
 {
+    /// convert a slice of byte slices into a binary array (without nulls)
+    ///
+    /// implementation details: here the Self::from_vec can be called but not without another copy
     fn from_slice(slice: S) -> Self {
         let slice = slice.as_ref();
Contributor

I wonder if this could call from_iter_values rather than replicate the code in DataFusion?

I haven't tried but perhaps something like

fn from_slice(slice: S) -> Self {
    GenericBinaryArray::from_iter_values(slice.as_ref().iter())
}

?

https://docs.rs/arrow/7.0.0/arrow/array/struct.GenericStringArray.html#method.from_iter_values
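
A minimal std-only sketch of the delegation pattern suggested above, so it can be tried without any particular arrow release. `ToyBinaryArray` and its `from_iter_values` are hypothetical stand-ins, not arrow's types, and the trait shape is an assumption, since the diff only shows the impl bounds. The point it illustrates: `S` is only known to be `AsRef<[I]>`, so the slice has to be borrowed with `as_ref()` before it can be iterated.

```rust
/// Hypothetical stand-in for a binary array (not arrow's type):
/// it simply owns a copy of every element.
struct ToyBinaryArray {
    values: Vec<Vec<u8>>,
}

impl ToyBinaryArray {
    /// Hypothetical iterator-based constructor for non-null values.
    fn from_iter_values<It, T>(iter: It) -> Self
    where
        It: IntoIterator<Item = T>,
        T: AsRef<[u8]>,
    {
        let values = iter.into_iter().map(|item| item.as_ref().to_vec()).collect();
        ToyBinaryArray { values }
    }
}

/// Assumed shape of the trait; the diff only shows the impl bounds.
trait FromSlice<S, I>
where
    S: AsRef<[I]>,
{
    fn from_slice(slice: S) -> Self;
}

impl<S, I> FromSlice<S, I> for ToyBinaryArray
where
    S: AsRef<[I]>,
    I: AsRef<[u8]>,
{
    fn from_slice(slice: S) -> Self {
        // Delegate to the iterator constructor instead of building buffers by hand.
        // `iter()` yields `&I`, which is `AsRef<[u8]>` through std's blanket impl for references.
        Self::from_iter_values(slice.as_ref().iter())
    }
}
```

With this in place, a call such as `ToyBinaryArray::from_slice(vec![b"ab".to_vec(), b"cde".to_vec()])` copies each element once. Whether the real `GenericBinaryArray` offers the same constructor in a given arrow version is something to check against its documentation.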

Member Author

@jimexist jimexist Jan 17, 2022

I guess this change is necessary:

         let mut offsets = Vec::with_capacity(slice.len() + 1);
@@ -88,26 +86,7 @@ where
     I: AsRef<str>,
 {
     fn from_slice(slice: S) -> Self {
-        let slice = slice.as_ref();
-        let mut offsets =
-            MutableBuffer::new((slice.len() + 1) * std::mem::size_of::<OffsetSize>());
-        let mut values = MutableBuffer::new(0);
-
-        let mut length_so_far = OffsetSize::zero();
-        offsets.push(length_so_far);
-
-        for s in slice {
-            let s = s.as_ref();
-            length_so_far += OffsetSize::from_usize(s.len()).unwrap();
-            offsets.push(length_so_far);
-            values.extend_from_slice(s.as_bytes());
-        }
-        let array_data = ArrayData::builder(OffsetSize::DATA_TYPE)
-            .len(slice.len())
-            .add_buffer(offsets.into())
-            .add_buffer(values.into());
-        let array_data = unsafe { array_data.build_unchecked() };
-        Self::from(array_data)
+        Self::from_iter_values(slice.as_ref().iter())
     }
 }
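
For readers unfamiliar with the layout that the deleted lines in this hunk build by hand, here is a minimal std-only sketch of the same offsets-plus-values scheme. `build_string_buffers` and `value_at` are hypothetical helper names, plain `Vec`s stand in for arrow's `MutableBuffer`, and `i32` stands in for `OffsetSize`; none of this is arrow's API.

```rust
use std::convert::TryFrom; // already in the prelude from edition 2021 onward

/// Build the two buffers of a string array without nulls:
/// `offsets` has `slice.len() + 1` entries and `values` holds all bytes back to back.
fn build_string_buffers<I: AsRef<str>>(slice: &[I]) -> (Vec<i32>, Vec<u8>) {
    let mut offsets = Vec::with_capacity(slice.len() + 1);
    let mut values = Vec::new();

    let mut length_so_far: i32 = 0;
    offsets.push(length_so_far);

    for s in slice {
        let s = s.as_ref();
        // Like `OffsetSize::from_usize(..).unwrap()` in the diff: panic if the length does not fit.
        length_so_far += i32::try_from(s.len()).expect("string too long for i32 offsets");
        offsets.push(length_so_far);
        values.extend_from_slice(s.as_bytes());
    }
    (offsets, values)
}

/// Read element `i` back: it occupies the byte range `offsets[i]..offsets[i + 1]`.
fn value_at<'a>(offsets: &[i32], values: &'a [u8], i: usize) -> &'a str {
    let start = offsets[i] as usize;
    let end = offsets[i + 1] as usize;
    std::str::from_utf8(&values[start..end]).expect("values were built from valid UTF-8")
}
```

For the input `["ab", "", "cde"]` the offsets are `[0, 2, 2, 5]` and the values buffer is the bytes of `abcde`; the empty string shows up as two equal neighbouring offsets. The PR drops this hand-written bookkeeping in favour of calling `from_iter_values`.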
