Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Optimize union equality comparison #26648

Closed
asfimport opened this issue Nov 23, 2020 · 7 comments
Closed

[C++] Optimize union equality comparison #26648

asfimport opened this issue Nov 23, 2020 · 7 comments

Comments

@asfimport
Copy link
Collaborator

Currently, union array comparison in ArrayRangeEqual computes child equality over single union elements. This adds a large per-element comparison overhead. At least for sparse unions, it may be beneficial to detect contiguous runs of child ids and run child comparisons on entire runs.

Reporter: Antoine Pitrou / @pitrou

Note: This issue was originally created as ARROW-10698. Please see the migration documentation for further details.

@mcshawn10
Copy link
Contributor

hello @pitrou I would like to work on this

@pitrou
Copy link
Member

pitrou commented Jan 28, 2025

@mcshawn10 Feel free to start working on it and submit a PR when your work is ready!

@mcshawn10
Copy link
Contributor

hi @pitrou , I have submitted my PR - could you take a look at it? 🙏🏾 thanks!! #45384

pitrou added a commit that referenced this issue Feb 4, 2025
### Rationale for this change

#26648 proposes an optimization in checking sparse array equality by detecting contiguous runs, this PR implements that change

### What changes are included in this PR?

previously, sparse array comparison was checked one by one, in this change, contiguous runs are detected and compared by checking equality of current and previous child_ids

### Are these changes tested?

already covered by existing unit tests

### Are there any user-facing changes?

no

* GitHub Issue: #26648

Lead-authored-by: shawn <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
@pitrou
Copy link
Member

pitrou commented Feb 4, 2025

Issue resolved by pull request 45384
#45384

@pitrou pitrou added this to the 20.0.0 milestone Feb 4, 2025
@pitrou pitrou closed this as completed Feb 4, 2025
@pitrou
Copy link
Member

pitrou commented Feb 4, 2025

Thank you for your contribution @mcshawn10 !

@mcshawn10
Copy link
Contributor

Thank you for your contribution @mcshawn10 !

happy to! just so I am understanding right: per benchmarking, the throughput is for ArrayRangeEqualsSparseUnion comparisons is about 2x more efficient?

@pitrou
Copy link
Member

pitrou commented Feb 4, 2025

happy to! just so I am understanding right: per benchmarking, the throughput is for ArrayRangeEqualsSparseUnion comparisons is about 2x more efficient?

Yes, roughly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants