feat: Sort kernel for RunArray #3695
I intend to review this fully tomorrow morning.
I just noticed a key issue in this code, so I'm changing this to draft.
Looks good to me; sorry for taking so long to review. I left some minor comments.
Benchmark runs are scheduled for baseline = e753dea and contender = ebe6f53. ebe6f53 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Part of #3520
Rationale for this change
See the issue description.
What changes are included in this PR?
Built on top of the yet-to-be-merged PR #3681.
- sort_run_to_indices for RunArray
- sort_run for RunArray
- sort_kernel benchmark
Sorting a run array to indices is very slow if the goal is an output run array: the sorted indices are logical indices, which have to be encoded back into a run array. The function sort_run instead rearranges the runs based on their sorted values, so it produces the output run array faster.
How much faster is sort_run compared to sort_run_to_indices?
The benchmark results show that sort_run produces the output run array in roughly the same time that sort_run_to_indices takes to produce only the indices.
What's the catch?
sort_run only rearranges the runs; for efficiency, it does not re-encode them. For example, the input RunArray { run_ends = [2,4,6,8], values = [1,2,1,2] } results in the output RunArray { run_ends = [2,4,6,8], values = [1,1,2,2] }, not RunArray { run_ends = [4,8], values = [1,2] }. The output of sort_run_to_indices can be used to re-encode the RunArray.
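To make this trade-off concrete, here is a minimal sketch on plain Rust slices rather than the arrow-rs RunArray type. The helpers sort_runs, sort_to_logical_indices and encode_from_indices are hypothetical names for illustration, not the functions added in this PR; the sketch assumes run ends are exclusive i32 offsets and that sorting is ascending and stable.

```rust
// Hypothetical helpers on plain slices; NOT the arrow-rs API.
// Assumption: run_ends[i] is the exclusive logical end of run i,
// and values[i] is the value repeated by run i.

/// Rearranges whole runs by value without merging equal neighbours,
/// mirroring the behaviour described for sort_run.
fn sort_runs(run_ends: &[i32], values: &[i32]) -> (Vec<i32>, Vec<i32>) {
    // Physical run indices, stably sorted by run value.
    let mut order: Vec<usize> = (0..values.len()).collect();
    order.sort_by_key(|&i| values[i]);

    let mut new_ends = Vec::with_capacity(order.len());
    let mut new_values = Vec::with_capacity(order.len());
    let mut end = 0;
    for &i in &order {
        // Run length = its end minus the previous run's end.
        let start = if i == 0 { 0 } else { run_ends[i - 1] };
        end += run_ends[i] - start;
        new_ends.push(end);
        new_values.push(values[i]);
    }
    (new_ends, new_values)
}

/// Produces sorted *logical* indices: one index per element, not per run.
fn sort_to_logical_indices(run_ends: &[i32], values: &[i32]) -> Vec<u32> {
    let mut order: Vec<usize> = (0..values.len()).collect();
    order.sort_by_key(|&i| values[i]);
    let mut indices = Vec::new();
    for &i in &order {
        let start = if i == 0 { 0 } else { run_ends[i - 1] };
        // Every logical position covered by run i, in ascending order.
        indices.extend(start as u32..run_ends[i] as u32);
    }
    indices
}

/// Re-encodes logical indices into runs, merging adjacent equal values.
fn encode_from_indices(run_ends: &[i32], values: &[i32], indices: &[u32]) -> (Vec<i32>, Vec<i32>) {
    let mut new_ends: Vec<i32> = Vec::new();
    let mut new_values: Vec<i32> = Vec::new();
    for (pos, &logical) in indices.iter().enumerate() {
        // Physical run holding this logical index: first run whose end exceeds it.
        let run = run_ends.partition_point(|&e| e <= logical as i32);
        let v = values[run];
        if new_values.last() == Some(&v) {
            // Same value as the current output run: extend it.
            *new_ends.last_mut().unwrap() = pos as i32 + 1;
        } else {
            new_ends.push(pos as i32 + 1);
            new_values.push(v);
        }
    }
    (new_ends, new_values)
}

fn main() {
    let run_ends = [2, 4, 6, 8];
    let values = [1, 2, 1, 2];

    let (ends, vals) = sort_runs(&run_ends, &values);
    println!("{:?} {:?}", ends, vals); // [2, 4, 6, 8] [1, 1, 2, 2]

    let indices = sort_to_logical_indices(&run_ends, &values);
    println!("{:?}", indices); // [0, 1, 4, 5, 2, 3, 6, 7]

    let (ends, vals) = encode_from_indices(&run_ends, &values, &indices);
    println!("{:?} {:?}", ends, vals); // [4, 8] [1, 2]
}
```

In this sketch, sort_runs does work proportional to the number of physical runs, whereas the indices path materializes one index per logical element and then needs a second pass to re-encode, which is what merges the equal neighbouring runs.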
Are there any user-facing changes?
Users will get a new sort function for RunArray.