Towards #75: Further refactor for using GATs + clippy fixes for rust 1.65 #76
Changes from 2 commits
8d5f8dd
783fe65
6192b8c
e021b8b
eec346b
00a66a6
ff6ba00
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,36 +7,36 @@ use crate::field::*; | |
|
||
#[doc(hidden)] | ||
/// Type whose reference can be used to create an iterator. | ||
pub trait IterRef { | ||
pub trait RefIntoIterator: Sized { | ||
/// Iterator type. | ||
type Iter<'a>: Iterator | ||
type Iterator<'a>: Iterator | ||
where | ||
Self: 'a; | ||
|
||
/// Converts `&self` into an iterator. | ||
fn iter_ref(&self) -> Self::Iter<'_>; | ||
fn ref_into_iter(&self) -> Self::Iterator<'_>; | ||
} | ||
|
||
impl<T> IterRef for T | ||
impl<T> RefIntoIterator for T | ||
where | ||
for<'a> &'a T: IntoIterator, | ||
{ | ||
type Iter<'a> = <&'a T as IntoIterator>::IntoIter where Self: 'a; | ||
type Iterator<'a> = <&'a T as IntoIterator>::IntoIter<> where Self: 'a; | ||
|
||
#[inline] | ||
fn iter_ref(&self) -> Self::Iter<'_> { | ||
fn ref_into_iter(&self) -> Self::Iterator<'_> { | ||
self.into_iter() | ||
} | ||
} | ||
|
||
/// Implemented by [`ArrowField`] that can be deserialized from arrow | ||
pub trait ArrowDeserialize: ArrowField + Sized { | ||
/// The `arrow2::Array` type corresponding to this field | ||
type ArrayType: ArrowArray; | ||
type ArrayType: RefIntoIterator; | ||
|
||
/// Deserialize this field from arrow | ||
fn arrow_deserialize( | ||
v: <<Self::ArrayType as IterRef>::Iter<'_> as Iterator>::Item, | ||
v: <<Self::ArrayType as RefIntoIterator>::Iterator<'_> as Iterator>::Item, | ||
) -> Option<<Self as ArrowField>::Type>; | ||
|
||
#[inline] | ||
|
@@ -48,23 +48,36 @@ pub trait ArrowDeserialize: ArrowField + Sized { | |
/// something like for<'a> &'a T::ArrayType: IntoIterator<Item=Option<E>>, | ||
/// However, the E parameter seems to confuse the borrow checker if it's a reference. | ||
fn arrow_deserialize_internal( | ||
@aldanor Cleaning this up will need a bit more thought. Elements in arrow2 arrays are optional, since they can be nullable. Therefore when converting from
Option/internal stuff aside, being able to take
Thanks for the feedback @aldanor. I started this discussion in arrow2 (jorgecarleitao/arrow2#1273). Let's see where that lands. I don't think this should hold up wrapping up GATs and your generics change. We can make another pass after those changes before the v0.4.0 release.
||
v: <<Self::ArrayType as IterRef>::Iter<'_> as Iterator>::Item, | ||
v: <<Self::ArrayType as RefIntoIterator>::Iterator<'_> as Iterator>::Item, | ||
) -> <Self as ArrowField>::Type { | ||
Self::arrow_deserialize(v).unwrap() | ||
} | ||
} | ||
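The split between `arrow_deserialize` and `arrow_deserialize_internal` in the trait above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `Deserialize` trait fixed to nullable `i32` slots; it is not the crate's actual trait, only a stand-in that mirrors the two method bodies visible in this diff.

```rust
/// Hypothetical, simplified stand-in for `ArrowDeserialize`.
/// Every slot arrives as `Option<&i32>` because array elements may be null.
trait Deserialize: Sized {
    /// Fallible form: `None` means the slot was null.
    fn deserialize(v: Option<&i32>) -> Option<Self>;

    /// Infallible form used while iterating; the default panics on a null slot.
    fn deserialize_internal(v: Option<&i32>) -> Self {
        Self::deserialize(v).unwrap()
    }
}

impl Deserialize for i32 {
    fn deserialize(v: Option<&i32>) -> Option<Self> {
        v.copied()
    }
}

/// Nullable fields override the internal form so that a null slot maps to
/// `None` instead of panicking, mirroring the `Option<T>` impl in the diff.
impl<T: Deserialize> Deserialize for Option<T> {
    fn deserialize(v: Option<&i32>) -> Option<Self> {
        Self::deserialize_internal(v).map(Some)
    }

    fn deserialize_internal(v: Option<&i32>) -> Self {
        T::deserialize(v)
    }
}

fn main() {
    let slots = [Some(1), None, Some(3)];
    let values: Vec<Option<i32>> = slots
        .iter()
        .map(|s| <Option<i32> as Deserialize>::deserialize_internal(s.as_ref()))
        .collect();
    assert_eq!(values, vec![Some(1), None, Some(3)]);
}
```

In this sketch, deserializing the same slots as plain `i32` would panic on the null entry, which is the behaviour the default `unwrap()` implies.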
|
||
/// Internal trait used to support deserialization and iteration of structs, and nested struct lists | ||
/// | ||
/// Trivial pass-thru implementations are provided for arrow2 arrays that auto-implement IterRef. | ||
/// | ||
/// The derive macro generates implementations for typed struct arrays. | ||
#[doc(hidden)] | ||
pub trait ArrowArray: IterRef { | ||
type BaseArrayType: Array; | ||
|
||
// Returns a typed iterator to the underlying elements of the array from an untyped Array reference. | ||
fn iter_from_array_ref(b: &dyn Array) -> <Self as IterRef>::Iter<'_>; | ||
#[inline] | ||
#[doc(hidden)] | ||
/// For internal use only | ||
/// | ||
/// TODO: this can be removed up by using arrow2::array::StructArray and | ||
/// arrow2::array::UnionArray to perform the iteration for unions and structs | ||
/// which should be possible if structs and unions are deserialized via scalars. | ||
@ncpenke Could you please expand on this? E.g. what needs to be done and what exactly does "deserialized via scalars" mean/imply?
@aldanor The However, they're also used to represent a single
||
/// | ||
/// Helper to return an iterator for elements from a [`arrow2::array::Array`]. | ||
/// | ||
/// Overridden by struct and enum arrays generated by the derive macro, to | ||
/// downcast to the arrow2 array type. | ||
fn arrow_array_ref_into_iter( | ||
Merging
Removing it seemed simpler, but I think I'll have to bring it back in a different way to cleanly iterate through structs and enums.
||
array: &dyn Array, | ||
) -> Option<<Self::ArrayType as RefIntoIterator>::Iterator<'_>> | ||
where | ||
Self::ArrayType: 'static, | ||
Does this have to be 'static? Or just within the lifetime of &dyn?
I tried with a named lifetime. I'm not fully sure, but I think
Ohh, right, because Might even add a comment to avoid future confusion if you'd like:
|
||
{ | ||
Some( | ||
array | ||
.as_any() | ||
.downcast_ref::<Self::ArrayType>()? | ||
.ref_into_iter(), | ||
) | ||
} | ||
} | ||
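On the `'static` question raised in the thread above: `std::any::Any` is only implemented for `'static` types, so any `downcast_ref::<T>()` call forces `T: 'static`. The sketch below shows this with hypothetical stand-ins (`Array`, `Int32Array`, `BooleanArray`, `downcast`, `sum_i32`) rather than the real arrow2 types, and returns `None` on a type mismatch in the same way `arrow_array_ref_into_iter` does.

```rust
use std::any::Any;

/// Hypothetical stand-in for `arrow2::array::Array`.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

struct Int32Array(Vec<Option<i32>>);

#[allow(dead_code)]
struct BooleanArray(Vec<Option<bool>>);

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for BooleanArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// `T: 'static` is required here because `downcast_ref` needs `T: Any`,
/// and `Any` is only implemented for `'static` types.
fn downcast<T: 'static>(array: &dyn Array) -> Option<&T> {
    array.as_any().downcast_ref::<T>()
}

/// Sums the non-null values, or returns `None` if the array has another type.
fn sum_i32(array: &dyn Array) -> Option<i32> {
    let typed = downcast::<Int32Array>(array)?;
    Some(typed.0.iter().flatten().sum())
}

fn main() {
    let ints = Int32Array(vec![Some(1), None, Some(3)]);
    let bools = BooleanArray(vec![Some(true)]);
    assert_eq!(sum_i32(&ints), Some(4));
    assert_eq!(sum_i32(&bools), None);
}
```

Removing the `'static` bound from `downcast` makes the sketch fail to compile, which matches the observation that a named lifetime did not work.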
|
||
// Macro to facilitate implementation for numeric types and numeric arrays. | ||
|
@@ -78,23 +91,6 @@ macro_rules! impl_arrow_deserialize_primitive { | |
v.map(|t| *t) | ||
} | ||
} | ||
|
||
impl_arrow_array!(PrimitiveArray<$physical_type>); | ||
}; | ||
} | ||
|
||
macro_rules! impl_arrow_array { | ||
($array:ty) => { | ||
impl ArrowArray for $array { | ||
type BaseArrayType = Self; | ||
|
||
fn iter_from_array_ref(b: &dyn Array) -> <Self as IterRef>::Iter<'_> { | ||
b.as_any() | ||
.downcast_ref::<Self::BaseArrayType>() | ||
.unwrap() | ||
.iter_ref() | ||
} | ||
} | ||
}; | ||
} | ||
|
||
|
@@ -107,17 +103,26 @@ where | |
|
||
#[inline] | ||
fn arrow_deserialize( | ||
v: <<Self::ArrayType as IterRef>::Iter<'_> as Iterator>::Item, | ||
v: <<Self::ArrayType as RefIntoIterator>::Iterator<'_> as Iterator>::Item, | ||
) -> Option<<Self as ArrowField>::Type> { | ||
Self::arrow_deserialize_internal(v).map(Some) | ||
} | ||
|
||
#[inline] | ||
fn arrow_deserialize_internal( | ||
v: <<Self::ArrayType as IterRef>::Iter<'_> as Iterator>::Item, | ||
v: <<Self::ArrayType as RefIntoIterator>::Iterator<'_> as Iterator>::Item, | ||
) -> <Self as ArrowField>::Type { | ||
<T as ArrowDeserialize>::arrow_deserialize(v) | ||
} | ||
|
||
fn arrow_array_ref_into_iter( | ||
array: &dyn Array, | ||
) -> Option<<Self::ArrayType as RefIntoIterator>::Iterator<'_>> | ||
where | ||
Self::ArrayType: 'static, | ||
{ | ||
<T as ArrowDeserialize>::arrow_array_ref_into_iter(array) | ||
} | ||
} | ||
|
||
impl_arrow_deserialize_primitive!(u8); | ||
|
@@ -140,8 +145,6 @@ impl<const PRECISION: usize, const SCALE: usize> ArrowDeserialize for I128<PRECI | |
} | ||
} | ||
|
||
impl_arrow_array!(PrimitiveArray<i128>); | ||
|
||
impl ArrowDeserialize for String { | ||
type ArrayType = Utf8Array<i32>; | ||
|
||
|
@@ -221,10 +224,10 @@ where | |
T: ArrowDeserialize + ArrowEnableVecForType + 'static, | ||
{ | ||
use std::ops::Deref; | ||
v.map(|t| { | ||
arrow_array_deserialize_iterator_internal::<<T as ArrowField>::Type, T>(t.deref()) | ||
.collect::<Vec<<T as ArrowField>::Type>>() | ||
}) | ||
Some( | ||
arrow_array_deserialize_iterator_internal::<<T as ArrowField>::Type, T>(v?.deref())? | ||
.collect::<Vec<<T as ArrowField>::Type>>(), | ||
) | ||
} | ||
|
||
// Blanket implementation for Vec | ||
|
@@ -261,16 +264,6 @@ where | |
} | ||
} | ||
|
||
impl_arrow_array!(BooleanArray); | ||
impl_arrow_array!(Utf8Array<i32>); | ||
impl_arrow_array!(Utf8Array<i64>); | ||
impl_arrow_array!(BinaryArray<i32>); | ||
impl_arrow_array!(BinaryArray<i64>); | ||
impl_arrow_array!(FixedSizeBinaryArray); | ||
impl_arrow_array!(ListArray<i32>); | ||
impl_arrow_array!(ListArray<i64>); | ||
impl_arrow_array!(FixedSizeListArray); | ||
|
||
/// Top-level API to deserialize from Arrow | ||
pub trait TryIntoCollection<Collection, Element> | ||
where | ||
|
@@ -288,40 +281,42 @@ where | |
} | ||
|
||
/// Helper to return an iterator for elements from a [`arrow2::array::Array`]. | ||
fn arrow_array_deserialize_iterator_internal<'a, Element, Field>( | ||
b: &'a dyn Array, | ||
) -> impl Iterator<Item = Element> + 'a | ||
fn arrow_array_deserialize_iterator_internal<Element, Field>( | ||
b: &dyn Array, | ||
) -> Option<impl Iterator<Item = Element> + '_> | ||
where | ||
Field: ArrowDeserialize + ArrowField<Type = Element> + 'static, | ||
{ | ||
<<Field as ArrowDeserialize>::ArrayType as ArrowArray>::iter_from_array_ref(b) | ||
.map(<Field as ArrowDeserialize>::arrow_deserialize_internal) | ||
Some( | ||
<Field as ArrowDeserialize>::arrow_array_ref_into_iter(b)? | ||
.map(<Field as ArrowDeserialize>::arrow_deserialize_internal), | ||
) | ||
} | ||
|
||
/// Returns a typed iterator to a target type from an `arrow2::Array` | ||
pub fn arrow_array_deserialize_iterator_as_type<'a, Element, ArrowType>( | ||
arr: &'a dyn Array, | ||
) -> arrow2::error::Result<impl Iterator<Item = Element> + 'a> | ||
pub fn arrow_array_deserialize_iterator_as_type<Element, ArrowType>( | ||
arr: &dyn Array, | ||
) -> arrow2::error::Result<impl Iterator<Item = Element> + '_> | ||
where | ||
Element: 'static, | ||
ArrowType: ArrowDeserialize + ArrowField<Type = Element> + 'static, | ||
{ | ||
if &<ArrowType as ArrowField>::data_type() != arr.data_type() { | ||
// TODO: use arrow2_convert error type here and include more detail | ||
Err(arrow2::error::Error::InvalidArgumentError( | ||
"Data type mismatch".to_string(), | ||
)) | ||
} else { | ||
Ok(arrow_array_deserialize_iterator_internal::< | ||
Element, | ||
ArrowType, | ||
>(arr)) | ||
arrow_array_deserialize_iterator_internal::<Element, ArrowType>(arr).ok_or_else(|| | ||
// TODO: use arrow2_convert error type here and include more detail | ||
arrow2::error::Error::InvalidArgumentError("Schema mismatch".to_string())) | ||
} | ||
} | ||
|
||
/// Return an iterator that deserializes an [`Array`] to an element of type T | ||
pub fn arrow_array_deserialize_iterator<'a, T>( | ||
arr: &'a dyn Array, | ||
) -> arrow2::error::Result<impl Iterator<Item = T> + 'a> | ||
pub fn arrow_array_deserialize_iterator<T>( | ||
arr: &dyn Array, | ||
) -> arrow2::error::Result<impl Iterator<Item = T> + '_> | ||
where | ||
T: ArrowDeserialize + ArrowField<Type = T> + 'static, | ||
{ | ||
|
The reason this was `Iter` is because this is the common naming e.g. in the standard library (this way you can infer what kind of a symbol it is without going to hunt for its definition): `std::option::Iter`, `std::result::Iter`, etc, always a `*Iter` if it's a type, `*Iterator` if it's a trait, like `DoubleEndedIterator`.
Definitely +1 on `RefIntoIterator`, but I think `Iter<'_>` is more consistent with the commonly used naming scheme?

Yes, absolutely. I'll rework this PR based on your feedback, and either include this change in this PR or next one.
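
For reference, a minimal sketch of what the suggested naming could look like: the trait keeps the new `RefIntoIterator` name while the generic associated type goes back to `Iter<'a>`. This combines names from the diff with the reviewer's suggestion and is an assumption about the eventual shape, not the merged code.

```rust
/// Type whose reference can be used to create an iterator.
pub trait RefIntoIterator: Sized {
    /// Iterator type, named `Iter` like `std::option::Iter` and friends.
    type Iter<'a>: Iterator
    where
        Self: 'a;

    /// Converts `&self` into an iterator.
    fn ref_into_iter(&self) -> Self::Iter<'_>;
}

/// Blanket impl for every `T` whose reference is iterable.
impl<T> RefIntoIterator for T
where
    for<'a> &'a T: IntoIterator,
{
    type Iter<'a> = <&'a T as IntoIterator>::IntoIter where Self: 'a;

    #[inline]
    fn ref_into_iter(&self) -> Self::Iter<'_> {
        self.into_iter()
    }
}

fn main() {
    let v = vec![1, 2, 3];
    let total: i32 = v.ref_into_iter().sum();
    assert_eq!(total, 6);
}
```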