Add freeze_with_dictionary API to MutableArrayData #2915
cc @sunchao
///
/// # Safety
///
/// As this doesn't validate the provided dictionary `ArrayData` values, the input
I'm not entirely sure this will work as expected, MutableArrayData has some pretty hard assumptions when it comes to dictionaries - it concatenates the dictionary arrays, and computes new keys based on this assumption
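To make the assumption described above concrete, here is a minimal, self-contained sketch of concatenating dictionaries by appending their value arrays and shifting keys. It is a toy model only: `ToyDictionary`, `concat_dictionaries`, and `decode` are hypothetical names invented for illustration, and none of this is the actual arrow-rs implementation.

```rust
/// Toy model of a dictionary-encoded column (hypothetical type, not arrow-rs):
/// `keys[i]` is an index into `values`.
#[derive(Debug, Clone, PartialEq)]
struct ToyDictionary {
    keys: Vec<usize>,
    values: Vec<String>,
}

/// Concatenate dictionaries by appending each input's value array and
/// shifting that input's keys by the number of values already written.
fn concat_dictionaries(inputs: &[ToyDictionary]) -> ToyDictionary {
    let mut keys = Vec::new();
    let mut values = Vec::new();
    for input in inputs {
        // Every key of this input must be offset past the earlier value arrays.
        let offset = values.len();
        keys.extend(input.keys.iter().map(|k| k + offset));
        values.extend(input.values.iter().cloned());
    }
    ToyDictionary { keys, values }
}

/// Decode a dictionary back into its logical values.
fn decode(dict: &ToyDictionary) -> Vec<String> {
    dict.keys.iter().map(|&k| dict.values[k].clone()).collect()
}
```

In this model the recomputed keys are only valid against the concatenated value array. Attaching a different value array afterwards would leave the shifted keys pointing at the wrong entries, or out of bounds, which appears to be the concern raised in this comment.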
This is for the values of dictionaries. What MutableArrayData does is simply concatenate the value arrays using another MutableArrayData. The caller must ensure that it provides the correct concatenated dictionary value array.
I'm not sure that is correct - https://github.com/apache/arrow-rs/blob/master/arrow-data/src/transform/mod.rs#L499
Hm? That is also where I looked. For dictionaries, it uses another MutableArrayData to extend from their value arrays, doesn't it?
Just tried it. I can use … It just feels that this way is somewhat inefficient, as it needs to freeze and make an array, then take the value array, copy it, and attach it back. For multiple dictionaries, there is one more issue, as internally …
I'm closing this as |
I think I am missing something. I don't think there is a way to avoid copying and attaching back - this is what MutableArrayData does also? I did create #1981 a while back to track allowing mutation of arrays, but currently there isn't any support for this?
This is the step I don't understand. What is this operation you are implementing here? Why are you using MutableArrayData for this at all?
Oh, no, copying is unavoidable. I meant that I need to create an array first and take the value array from it. I feel that this is not as smooth in usage as letting MutableArrayData handle it, as in the proposed API. Just my preference.
We generally reuse array allocations. But for certain operations, we need to hold on and wait for all arrays before we can do the operation, so we need to copy the arrays there; otherwise they will be overwritten. But when MutableArrayData extends from a DictionaryArray, it doesn't copy the value array (if there is only one DictionaryArray).
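The difference between sharing and copying the value array, as described in this comment, can be illustrated with a small self-contained sketch. This is a toy model under stated assumptions: `SharedDictionary`, `slice_sharing_values`, and `slice_copying_values` are hypothetical names for illustration and do not correspond to arrow-rs APIs.

```rust
use std::sync::Arc;

/// Toy dictionary whose value array sits behind a reference-counted pointer
/// (hypothetical type, not arrow-rs).
#[derive(Debug, Clone)]
struct SharedDictionary {
    keys: Vec<usize>,
    values: Arc<Vec<String>>,
}

/// Copy a range of keys but keep pointing at the same value allocation.
/// This models the single-dictionary case described in the comment.
/// Panics if `start + len` exceeds the number of keys.
fn slice_sharing_values(src: &SharedDictionary, start: usize, len: usize) -> SharedDictionary {
    SharedDictionary {
        keys: src.keys[start..start + len].to_vec(),
        values: Arc::clone(&src.values),
    }
}

/// Copy a range of keys and deep-copy the values, so the result no longer
/// depends on the source allocation.
fn slice_copying_values(src: &SharedDictionary, start: usize, len: usize) -> SharedDictionary {
    SharedDictionary {
        keys: src.keys[start..start + len].to_vec(),
        values: Arc::new(src.values.as_ref().clone()),
    }
}
```

In this model, a result produced by `slice_sharing_values` still refers to the source's value allocation, so reusing that allocation for later data would affect it, while the result of `slice_copying_values` is independent.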
Which issue does this PR close?
Closes #2914.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?