feat(python, rust): arrow large/view types passthrough, rust default engine #2738
@rtyler there are still a couple of test failures, so I will take another pass at this tomorrow!
@aersam hey! Could you perhaps take a look at the refactored casting logic :) I reintroduced some old code since we now have to allow large/view types passthrough.
98cef9b to e04661d
Looking at this, I feel our conversion code is getting a bit excessive, and we eventually have to step back and see if we can find a cleaner solution to all of this. I do not, however, have a better idea ready :).
My main question is what our minimal Python / pyarrow support now looks like, given that we are not testing this anymore. Should there not be some test for this?
Yup, the same goes for the writer. With this passthrough change we allow more flexibility between utf8/binary/list flavours, but as a side effect we are now less flexible on writing. Take, for example, a batch that is int64 written to a table with int32. Before, this "might" have worked; now it won't, because we can't merge those schemas. Not sure what to do here. We could add this functionality to the schema merge by simply checking whether something is a supertype, but that doesn't sit right.
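To make the trade-off concrete, here is a minimal sketch of the supertype check mentioned above, using plain strings for integer types. The names `INT_WIDTH`, `merge_int_types` and the `allow_widening` flag are hypothetical and not part of delta-rs; this only illustrates the idea, not how the schema merge is actually implemented.

```python
# Hypothetical sketch: integer type names mapped to their bit width.
INT_WIDTH = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def merge_int_types(table_type: str, batch_type: str, allow_widening: bool = False) -> str:
    """Merge the table's column type with the incoming batch's type.

    Strict mode (the default) only accepts identical types. With
    allow_widening=True the wider of the two types (the supertype) wins.
    """
    if table_type == batch_type:
        return table_type
    if not allow_widening:
        raise TypeError(f"cannot merge {table_type} with {batch_type}")
    # Pick whichever type has the larger bit width.
    return max(table_type, batch_type, key=INT_WIDTH.__getitem__)
```

In this sketch, `merge_int_types("int32", "int64", allow_widening=True)` returns `"int64"`, meaning the merged schema takes the wider type rather than the batch being cast down to the table's type.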
Hmm, I'll put this back in draft and think of a redesign.
0df9df4 to 9db4e16
I've only partially reviewed this but recognize the work @ion-elgreco has put in. This also touches a lot of other stuff we're currently looking at, so I would rather iterate on it in main to avoid monster conflicts should more tuning be needed.
9db4e16 to 002fbe4
Looking good. I left some very minor questions, so I did not want to hit approve, as that would auto-merge, and I wanted to leave the option to comment on these.
let input_schema = match schema {
    Some(schema) => schema,
    None => snapshot.input_schema()?,
};
This is interesting, since there also seem to be some issues with the wrapped schema in the pruning logic, which I have yet to fully grok. Why did we need to change this here?
If we take the arrow_schema, it always wraps partition values in a dictionary. Somewhere down the line, where it divides by partition values, it would fail because a slice to get a value from a DictionaryArray will always return None.
pub(crate) fn merge_delta_type(
    left: &DeltaDataType,
    right: &DeltaDataType,
) -> Result<DeltaDataType, ArrowError> {
This may come in handy when we do type widening as well :).
:)
NORMAL = "NORMAL"
LARGE = "LARGE"
PASSTHROUGH = "PASSTHROUGH"
should we add some comments on what the behavior of these modes is?
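One way to address this would be to document each mode on the enum itself. The sketch below is an assumption based on the PR description (normal/large cast to a single flavour, passthrough leaves types untouched). The class name `ConversionMode`, the helper `convert_type_name`, the `_FLAVOURS` table and the string type names are all illustrative and are not the library's API.

```python
from enum import Enum


class ConversionMode(Enum):
    """Hypothetical documented version of the three modes."""

    # Assumed: cast large/view flavours to the regular types
    # (e.g. large_string -> string).
    NORMAL = "NORMAL"
    # Assumed: cast regular/view flavours to the large types
    # (e.g. string -> large_string).
    LARGE = "LARGE"
    # Leave every type as the caller provided it; no casting.
    PASSTHROUGH = "PASSTHROUGH"


# Flavours of the same logical type, keyed by illustrative type names.
_FLAVOURS = {
    "string": ("string", "large_string", "string_view"),
    "binary": ("binary", "large_binary", "binary_view"),
}


def convert_type_name(type_name: str, mode: ConversionMode) -> str:
    """Return the type name a column would have after applying `mode`."""
    if mode is ConversionMode.PASSTHROUGH:
        return type_name
    for base, flavours in _FLAVOURS.items():
        if type_name in flavours:
            return base if mode is ConversionMode.NORMAL else f"large_{base}"
    # Types without large/view flavours are unaffected.
    return type_name
```

Under these assumptions, a `string_view` column stays `string_view` in `PASSTHROUGH` mode and becomes `string` in `NORMAL` mode, which is the casting step the passthrough change is meant to avoid.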
Description
Allows large/view types to be passed through during write, and prevents unnecessary, potentially costly casting that could fail.
In the Pyarrow engine only the normal/large modes can be used. In the Rust engine we always pass through, since we now allow passthrough throughout the codebase. This notion can be removed once the pyarrow engine is fully deprecated.
Can be merged after: #2727
Related issues
StringView and BinaryView in CDataInterface apache/arrow-rs#6171