-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Optimize union equality comparison #26648
Comments
hello @pitrou I would like to work on this |
@mcshawn10 Feel free to start working on it and submit a PR when your work is ready! |
### Rationale for this change #26648 proposes an optimization in checking sparse array equality by detecting contiguous runs, this PR implements that change ### What changes are included in this PR? previously, sparse array comparison was checked one by one, in this change, contiguous runs are detected and compared by checking equality of current and previous child_ids ### Are these changes tested? already covered by existing unit tests ### Are there any user-facing changes? no * GitHub Issue: #26648 Lead-authored-by: shawn <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Issue resolved by pull request 45384 |
Thank you for your contribution @mcshawn10 ! |
happy to! just so I am understanding right: per benchmarking, the throughput is for |
Yes, roughly. |
Currently, union array comparison in
ArrayRangeEqual
computes child equality over single union elements. This adds a large per-element comparison overhead. At least for sparse unions, it may be beneficial to detect contiguous runs of child ids and run child comparisons on entire runs.Reporter: Antoine Pitrou / @pitrou
Note: This issue was originally created as ARROW-10698. Please see the migration documentation for further details.
The text was updated successfully, but these errors were encountered: