-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate dictionary key in TypedDictionaryArray (#2578) #2589
Conversation
arrow/src/array/array_dictionary.rs
Outdated
|
||
// As dictionary keys are only verified for non-null indexes | ||
// we must check the value is within bounds | ||
match value_idx < self.dictionary.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As pointed out by @crepererum the fact the value is in a dictionary in the first place likely indicates that operations on it are likely to be non-trival (i.e. it is likely a string). Therefore the cost of this bound check is likely irrelevant, especially since because of #1918 we are doing checked conversion to usize anyway.
@@ -491,21 +490,30 @@ where | |||
K: ArrowPrimitiveType, | |||
V: Sync + Send, | |||
&'a V: ArrayAccessor, | |||
<&'a V as ArrayAccessor>::Item: Default, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we don't currently support dictionaries containing complex types such as ListArray, etc... (neither does C++) nor even implement ArrayAccessor for those types, this restriction is unlikely to matter in practice
3a697de
to
edfb526
Compare
// we must check the value is within bounds | ||
match value_idx < self.values.len() { | ||
true => self.values.value_unchecked(value_idx), | ||
false => Default::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. This is less or zero performance impact.
Benchmark runs are scheduled for baseline = c64ca4f and contender = c692b25. c692b25 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2578
Closes #2564
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?