Generic bytes dictionary builder #3426

Merged
merged 5 commits into from
Jan 3, 2023
17 changes: 13 additions & 4 deletions arrow-array/src/builder/generic_bytes_dictionary_builder.rs
@@ -123,10 +123,7 @@ where
pub fn new_with_dictionary(
keys_capacity: usize,
dictionary_values: &GenericByteArray<T>,
- ) -> Result<Self, ArrowError>
- where
- <T as ByteArrayType>::Native: AsRef<<T as ByteArrayType>::Native> + AsRef<[u8]>,
- {
+ ) -> Result<Self, ArrowError> {
let state = ahash::RandomState::default();
let dict_len = dictionary_values.len();

@@ -352,6 +349,12 @@ fn get_bytes<'a, K: ArrowNativeType, T: ByteArrayType>(
pub type StringDictionaryBuilder<K> =
GenericByteDictionaryBuilder<K, GenericStringType<i32>>;

+ /// Array builder for `DictionaryArray` that stores large Strings. For example to map a set of byte indices
+ /// to String values. Note that the use of a `HashMap` here will not scale to very large
+ /// arrays or result in an ordered dictionary.
+ pub type LargeStringDictionaryBuilder<K> =
+ GenericByteDictionaryBuilder<K, GenericStringType<i64>>;

/// Array builder for `DictionaryArray` that stores binary. For example to map a set of byte indices
/// to binary values. Note that the use of a `HashMap` here will not scale to very large
/// arrays or result in an ordered dictionary.
@@ -390,6 +393,12 @@ pub type StringDictionaryBuilder<K> =
pub type BinaryDictionaryBuilder<K> =
GenericByteDictionaryBuilder<K, GenericBinaryType<i32>>;

+ /// Array builder for `DictionaryArray` that stores large binary. For example to map a set of byte indices
+ /// to binary values. Note that the use of a `HashMap` here will not scale to very large
+ /// arrays or result in an ordered dictionary.
+ pub type LargeBinaryDictionaryBuilder<K> =
+ GenericByteDictionaryBuilder<K, GenericBinaryType<i64>>;

#[cfg(test)]
mod tests {
use super::*;
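
For readers new to dictionary builders, the idea behind these aliases can be sketched without arrow-rs. The block below is a simplified, hypothetical model: the name `SketchDictionaryBuilder`, its fields, and the fixed `u32` key type are assumptions made for illustration, not the crate's implementation. Each distinct byte value is stored once and every append records only its key, so the dictionary ends up in first-seen order rather than sorted, as the doc comments note. The `Large*` aliases added here differ from the existing ones only in using `i64` rather than `i32` offsets for the stored values.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified dictionary builder (not the arrow-rs type).
struct SketchDictionaryBuilder {
    /// Maps each distinct value to the key assigned on first insertion.
    dedup: HashMap<Vec<u8>, u32>,
    /// Distinct values, in first-seen order.
    values: Vec<Vec<u8>>,
    /// One entry per appended element; `None` marks a null.
    keys: Vec<Option<u32>>,
}

impl SketchDictionaryBuilder {
    fn new() -> Self {
        Self {
            dedup: HashMap::new(),
            values: Vec::new(),
            keys: Vec::new(),
        }
    }

    /// Appends a value and returns its dictionary key, reusing the key
    /// if the same bytes were appended before.
    fn append(&mut self, value: impl AsRef<[u8]>) -> u32 {
        let bytes: &[u8] = value.as_ref();
        if let Some(&key) = self.dedup.get(bytes) {
            self.keys.push(Some(key));
            return key;
        }
        let key = self.values.len() as u32;
        self.values.push(bytes.to_vec());
        self.dedup.insert(bytes.to_vec(), key);
        self.keys.push(Some(key));
        key
    }

    /// Appends a null element; the dictionary itself is unchanged.
    fn append_null(&mut self) {
        self.keys.push(None);
    }
}
```

Appending `"abc"`, `"def"`, `"abc"` in this sketch yields keys 0, 1, 0 and a two-entry dictionary.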
2 changes: 1 addition & 1 deletion arrow-array/src/types.rs
@@ -713,7 +713,7 @@ pub trait ByteArrayType: 'static + Send + Sync + bytes::ByteArrayTypeSealed {
/// Type for representing its equivalent rust type i.e
/// Utf8Array will have native type has &str
/// BinaryArray will have type as [u8]
- type Native: bytes::ByteArrayNativeType + AsRef<[u8]> + ?Sized;
+ type Native: bytes::ByteArrayNativeType + AsRef<Self::Native> + AsRef<[u8]> + ?Sized;
/// "Binary" or "String", for use in error messages
const PREFIX: &'static str;
/// Datatype of array elements
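
Moving the `AsRef<Self::Native>` bound onto the associated type is what lets `new_with_dictionary` drop its `where` clause: bounds declared on an associated type are available to every generic user of the trait. A minimal sketch of that effect, using a hypothetical stand-in trait `ByteType` (the real `ByteArrayType` has more items and a sealed supertrait; `Utf8`, `Binary`, and `total_bytes` are likewise invented for illustration):

```rust
// Hypothetical stand-in for `ByteArrayType`; only the associated-type
// bounds mirror the change in this file.
trait ByteType {
    type Native: AsRef<Self::Native> + AsRef<[u8]> + ?Sized;
}

struct Utf8;
impl ByteType for Utf8 {
    type Native = str;
}

struct Binary;
impl ByteType for Binary {
    type Native = [u8];
}

// No `where T::Native: AsRef<[u8]>` clause is needed: the bound is implied
// by the trait definition.
fn total_bytes<T: ByteType>(values: &[&T::Native]) -> usize {
    let mut total = 0;
    for &v in values {
        // The type annotation selects `AsRef<[u8]>` over `AsRef<T::Native>`.
        let bytes: &[u8] = v.as_ref();
        total += bytes.len();
    }
    total
}
```

The cost is that any call site relying on a single `AsRef` impl for inference now has to name the target type.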
11 changes: 7 additions & 4 deletions arrow-select/src/take.rs
@@ -592,7 +592,10 @@ where

let s = array.value(index);

- length_so_far += T::Offset::from_usize(s.as_ref().len()).unwrap();
+ length_so_far += T::Offset::from_usize(
+ <<T as ByteArrayType>::Native as AsRef<[u8]>>::as_ref(s).len(),
+ )
+ .unwrap();
values.extend_from_slice(s.as_ref());
*offset = length_so_far;
}
@@ -609,7 +612,7 @@ where
})?;

if array.is_valid(index) {
- let s = array.value(index).as_ref();
+ let s: &[u8] = array.value(index).as_ref();
Contributor comment: Does this technically mean this is an API change? That's kind of unintentional, but not necessarily a problem 😅


length_so_far += T::Offset::from_usize(s.len()).unwrap();
values.extend_from_slice(s.as_ref());
@@ -627,7 +630,7 @@ where
ArrowError::ComputeError("Cast to usize failed".to_string())
})?;

- let s = array.value(index).as_ref();
+ let s: &[u8] = array.value(index).as_ref();

length_so_far += T::Offset::from_usize(s.len()).unwrap();
values.extend_from_slice(s);
@@ -647,7 +650,7 @@ where
})?;

if array.is_valid(index) && indices.is_valid(i) {
- let s = array.value(index).as_ref();
+ let s: &[u8] = array.value(index).as_ref();

length_so_far += T::Offset::from_usize(s.len()).unwrap();
values.extend_from_slice(s);
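
The `&[u8]` annotations in these hunks are needed because `T::Native` now has two `AsRef` impls in scope, so a bare `.as_ref()` no longer determines its target type. The loop itself can be sketched outside arrow-rs as follows. This is a simplified model under stated assumptions: `take_bytes` is a hypothetical helper, the input is a plain slice instead of a `GenericByteArray`, offsets are fixed to `i32`, and nulls are ignored.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the take loop: gathers the selected values
/// into a contiguous byte buffer plus an offsets vector.
fn take_bytes<N>(array: &[&N], indices: &[usize]) -> (Vec<i32>, Vec<u8>)
where
    N: AsRef<N> + AsRef<[u8]> + ?Sized,
{
    let mut offsets = Vec::with_capacity(indices.len() + 1);
    let mut values = Vec::new();
    let mut length_so_far: i32 = 0;
    offsets.push(length_so_far);

    for &index in indices {
        // Annotating the binding selects the `AsRef<[u8]>` impl. Calling
        // `.as_ref().len()` directly would be ambiguous under these bounds.
        let s: &[u8] = array[index].as_ref();
        length_so_far += i32::try_from(s.len()).expect("offset overflow");
        values.extend_from_slice(s);
        offsets.push(length_so_far);
    }
    (offsets, values)
}
```

Taking indices `[2, 0, 2]` from `["a", "bcd", "ef"]` in this sketch produces the bytes of `"efaef"` with offsets `[0, 2, 3, 5]`.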