-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: improve record batch partitioning #1396
Conversation
@@ -410,6 +403,18 @@ pub(crate) fn divide_by_partition_values( | |||
Ok(partitions) | |||
} | |||
|
|||
fn lexsort_to_indices(arrays: &[ArrayRef]) -> UInt32Array { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not following why this function is being used here instead of the previous import from arrow::compute
as far as I can tell the only difference is this version doesn't take the optional limit
parameter?
😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version uses arrow-row rather then the "regular" compute kernels. https://github.com/apache/arrow-rs/blob/77aa8f5b2645a91724048f5c1d644c6b52880028/arrow-ord/src/sort.rs#L1081-L1082
the implementation is actually lifted from a comment in the docs :) https://github.com/apache/arrow-rs/blob/77aa8f5b2645a91724048f5c1d644c6b52880028/arrow-row/src/lib.rs#L105-L117
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -410,6 +403,18 @@ pub(crate) fn divide_by_partition_values( | |||
Ok(partitions) | |||
} | |||
|
|||
fn lexsort_to_indices(arrays: &[ArrayRef]) -> UInt32Array { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn lexsort_to_indices(arrays: &[ArrayRef]) -> UInt32Array { | |
/* | |
* This version uses arrow-row rather then the "regular" compute kernels. | |
*/ | |
fn lexsort_to_indices(arrays: &[ArrayRef]) -> UInt32Array { |
# Description Some time ago, the arrow-row crate was published, specifically with use cases like multi-column ordering / sorting in mind. We need to do this whenever we write data to a table to partition batches according to their partition values. In this PR we adopt this. As a drive-by I also updated the imports in the module to import from the respective sub-crates. # Related Issue(s) <!--- For example: - closes #106 ---> # Documentation <!--- Share links to useful documentation --->
Description
Some time ago, the arrow-row crate was published, specifically with use cases like multi-column ordering / sorting in mind. We need to do this whenever we write data to a table to partition batches according to their partition values. In this PR we adopt this. As a drive-by I also updated the imports in the module to import from the respective sub-crates.
Related Issue(s)
Documentation