Add dictionary array support for substring function #1665
/// let error = substring(&array, 0, Some(5)).unwrap_err().to_string();
/// assert!(error.contains("invalid utf-8 boundary"));
/// ```
pub fn substring(array: &dyn Array, start: i64, length: Option<u64>) -> Result<ArrayRef> {
I moved this func to the beginning of the file, before all other non-public ones, for better readability.
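The doc example above expects an "invalid utf-8 boundary" error, which suggests the kernel slices strings by byte offsets and must reject a slice that would split a multi-byte character. The sketch below is a hypothetical, std-only illustration of that check; byte_substring and its clamping rules are assumptions loosely based on the documented start/length parameters, not arrow's actual implementation.

```rust
/// Hypothetical byte-based substring (not arrow's code). Assumed semantics:
/// a negative `start` counts back from the end, offsets are clamped to the
/// string, and `length == None` means "to the end".
fn byte_substring(s: &str, start: i64, length: Option<u64>) -> Result<String, String> {
    let len = s.len() as i64;
    // Resolve the start offset and clamp it into [0, len].
    let begin = if start >= 0 { start.min(len) } else { (len + start).max(0) };
    // Resolve the end offset; cap `length` so the cast to i64 cannot wrap.
    let end = match length {
        Some(l) => begin.saturating_add(l.min(i64::MAX as u64) as i64).min(len),
        None => len,
    };
    let (b, e) = (begin as usize, end as usize);
    // Slicing a &str off a char boundary panics, so report an error instead.
    if !s.is_char_boundary(b) || !s.is_char_boundary(e) {
        return Err(format!("invalid utf-8 boundary: {}..{}", b, e));
    }
    Ok(s[b..e].to_string())
}
```

Checking is_char_boundary up front turns what would otherwise be a slicing panic into a recoverable Err, which is what the doc example asserts on.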
DataType::Dictionary(kt, _) => {
substring_dict!(
kt,
Int8: Int8Type,
We could make this shorter via concat_idents! (e.g., concat_idents!($t, Type)), but it's only available on nightly.
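To show what the explicit Int8: Int8Type pairs are for, here is a std-only sketch of a macro that turns each variant/marker pair into one match arm. KeyType, the marker structs, key_width and dispatch_key! are hypothetical stand-ins for arrow's DataType and Int8Type family; this is not the body of substring_dict!.

```rust
// Hypothetical stand-ins for arrow's dictionary key types (not the arrow crate).
#[derive(Debug, Clone, Copy)]
enum KeyType {
    Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64,
}

// One marker type per key type, each naming its native integer type.
trait KeyMarker {
    type Native;
}
struct Int8Type;
struct Int16Type;
struct Int32Type;
struct Int64Type;
struct UInt8Type;
struct UInt16Type;
struct UInt32Type;
struct UInt64Type;
impl KeyMarker for Int8Type { type Native = i8; }
impl KeyMarker for Int16Type { type Native = i16; }
impl KeyMarker for Int32Type { type Native = i32; }
impl KeyMarker for Int64Type { type Native = i64; }
impl KeyMarker for UInt8Type { type Native = u8; }
impl KeyMarker for UInt16Type { type Native = u16; }
impl KeyMarker for UInt32Type { type Native = u32; }
impl KeyMarker for UInt64Type { type Native = u64; }

// Generic worker; a real kernel would process the dictionary here.
// This one just reports the key width in bytes.
fn key_width<K: KeyMarker>() -> usize {
    std::mem::size_of::<K::Native>()
}

// Expands each `Variant: Marker` pair into one match arm.
macro_rules! dispatch_key {
    ($kt:expr, $($variant:ident: $marker:ty),+ $(,)?) => {
        match $kt {
            $(KeyType::$variant => key_width::<$marker>(),)+
        }
    };
}

fn width_of(kt: KeyType) -> usize {
    dispatch_key!(
        kt,
        Int8: Int8Type, Int16: Int16Type, Int32: Int32Type, Int64: Int64Type,
        UInt8: UInt8Type, UInt16: UInt16Type, UInt32: UInt32Type, UInt64: UInt64Type,
    )
}
```

Because stable macro_rules! cannot paste Int8 and Type into Int8Type, each pair has to be spelled out; concat_idents! would remove that repetition, but only on nightly.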
@@ -954,6 +992,56 @@ mod tests {
without_nulls_generic_string::<i64>()
}

#[test]
fn dictionary() -> Result<()> {
Suggested change
- fn dictionary() -> Result<()> {
+ fn test_substring_dictionary() -> Result<()> {
I think it's not necessary to add the test_ prefix to Rust tests, since they are already under the tests module. The substring here also seems redundant, since the full test name compute::kernels::substring::tests::dictionary already contains it.
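The point above can be checked with module_path!(), which expands to the enclosing module's path prefixed by the crate name; libtest reports each test by the same path relative to the crate root. The module tree below is a hypothetical mirror of compute::kernels::substring::tests.

```rust
// Hypothetical module tree mirroring compute::kernels::substring::tests.
pub mod compute {
    pub mod kernels {
        pub mod substring {
            pub mod tests {
                // Expands to "<crate>::compute::kernels::substring::tests".
                pub fn path() -> &'static str {
                    module_path!()
                }

                // `cargo test` lists this as
                // compute::kernels::substring::tests::dictionary,
                // so a test_substring_ prefix would only repeat the path.
                #[test]
                fn dictionary() {}
            }
        }
    }
}
```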
Looks good to me. A few minor comments.
Codecov Report
@@            Coverage Diff             @@
##           master    #1665      +/-   ##
==========================================
+ Coverage   83.10%   83.16%   +0.05%
==========================================
  Files         193      193
  Lines       55864    56039     +175
==========================================
+ Hits        46425    46603     +178
+ Misses       9439     9436       -3
pub fn substring(array: &dyn Array, start: i64, length: Option<u64>) -> Result<ArrayRef> { |
Just a nit: maybe we could make length an Option<u32>, because the longest length will not exceed 1<<31 - 1 (for LargeBinaryArray and LargeStringArray).
Hmm, I think this is not quite related to this PR. I can open another one for that change.
/// ```
///
/// # Error
/// - The function errors when the passed array is not a \[Large\]String array or \[Large\]Binary array.
We could also update this to say that Dictionary arrays with [Large]String/[Large]Binary values are accepted here.
Good catch. Updated.
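The updated # Error section amounts to an input type check along these lines. DataType here is a hypothetical, trimmed-down mirror of arrow's enum and check_substring_input is an illustrative name; the dictionary key type is not validated in this sketch.

```rust
// Hypothetical, trimmed-down mirror of a few arrow DataType variants.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Int32,
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    // (key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

// True for the plain types the docs list: [Large]String and [Large]Binary.
fn is_string_or_binary(dt: &DataType) -> bool {
    matches!(
        dt,
        DataType::Utf8 | DataType::LargeUtf8 | DataType::Binary | DataType::LargeBinary
    )
}

// Accepts plain [Large]String/[Large]Binary types, or a Dictionary whose
// values are one of them; everything else is rejected with an error.
fn check_substring_input(dt: &DataType) -> Result<(), String> {
    match dt {
        d if is_string_or_binary(d) => Ok(()),
        DataType::Dictionary(_, values) if is_string_or_binary(values) => Ok(()),
        other => Err(format!("substring does not support {:?}", other)),
    }
}
```

Matching the dictionary's value type explicitly, rather than recursing, keeps nested dictionaries out of the accepted set.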
Thank you @sunchao ❤️
Merged, thanks @sunchao @HaoYang670 @alamb
Which issue does this PR close?
Closes #1656.
Rationale for this change
Currently the substring kernel only supports "plain" arrays, not dictionary-encoded ones. With a dictionary array, the computation could be much more efficient, since it only needs to be applied to the dictionary values.
What changes are included in this PR?
This PR adds dictionary array support to the substring kernel.
Are there any user-facing changes?
No
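A minimal std-only sketch of the rationale for this change: apply the substring once to each distinct dictionary value and reuse the keys as-is. DictArray and substring_dict are hypothetical names, and the sketch takes an unsigned byte start for brevity, unlike the kernel's i64 start.

```rust
// Hypothetical, std-only model of a dictionary-encoded string array:
// `keys[i]` indexes into `values`, and `None` marks a null slot.
#[derive(Debug, Clone, PartialEq)]
struct DictArray {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

// Substrings each distinct value once and reuses the keys unchanged.
// `str::get` returns None for a range that splits a UTF-8 character,
// which is surfaced as an error instead of a panic.
fn substring_dict(arr: &DictArray, start: usize, len: usize) -> Result<DictArray, String> {
    let values = arr
        .values
        .iter()
        .map(|v| {
            // Clamp the byte range to the value so only boundary errors remain.
            let end = start.saturating_add(len).min(v.len());
            let begin = start.min(end);
            v.get(begin..end)
                .map(str::to_string)
                .ok_or_else(|| format!("invalid utf-8 boundary in {:?}", v))
        })
        .collect::<Result<Vec<String>, String>>()?;
    Ok(DictArray { keys: arr.keys.clone(), values })
}
```

With five rows but only two distinct values, this sketch performs two substring operations; the null slot and the repeated keys need no work at all.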