Add option to skip sorting in hashdiff for improved performance #45

Merged: 2 commits, Sep 20, 2024


alex-mirkin
Contributor

Added an option to skip sorting the results in hashdiff, which improves performance when there is a large number of differences.
When the option is enabled, entries with the same key but different column values may not appear adjacent in the output.
This change was discussed in #41.
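
To illustrate the trade-off described above, here is a minimal, self-contained sketch. It does not use the project's code: `diff_rows` and its `skip_sort` flag are hypothetical names chosen for this example, and rows are assumed to be `(key, value)` tuples.

```python
from typing import List, Tuple

Row = Tuple[int, str]          # (key, value); an assumed row shape for this sketch
DiffEntry = Tuple[str, Row]    # ("-", row) = only in table 1, ("+", row) = only in table 2


def diff_rows(rows1: List[Row], rows2: List[Row], skip_sort: bool = False) -> List[DiffEntry]:
    """Hypothetical stand-in for a diff step; not the project's implementation."""
    set1, set2 = set(rows1), set(rows2)
    # Collect differences in the order the rows were read.
    diff = [("-", r) for r in rows1 if r not in set2]
    diff += [("+", r) for r in rows2 if r not in set1]
    if not skip_sort:
        # Stable sort by key: the "-" and "+" entries for a key end up adjacent.
        diff.sort(key=lambda entry: entry[1][0])
    return diff


table1 = [(3, "c"), (1, "a"), (2, "b")]
table2 = [(3, "C"), (1, "a"), (2, "B")]

sorted_diff = diff_rows(table1, table2)
unsorted_diff = diff_rows(table1, table2, skip_sort=True)

print(sorted_diff)
# [('-', (2, 'b')), ('+', (2, 'B')), ('-', (3, 'c')), ('+', (3, 'C'))]
print(unsorted_diff)
# [('-', (3, 'c')), ('-', (2, 'b')), ('+', (3, 'C')), ('+', (2, 'B'))]
```

Both calls return the same entries. Skipping the sort only saves the sorting cost, and the `-`/`+` pair for a given key is no longer guaranteed to be adjacent.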

@alex-mirkin
Contributor Author

Could you advise on what the tests should look like?
I’m not entirely sure about the best approach to test this functionality within the existing testing framework.

@erezsh
Owner

erezsh commented Sep 14, 2024

You could place your test in test_diff_tables. You can run a diff that only diffs one segment, so you can check that the order is preserved (assuming the data is not already sorted).

I think you can ensure one segment by setting a very low bisection factor (1 or 2) and a high bisection threshold.
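
A rough sketch of the shape such a test could take, written as plain assert-style test functions. This is only an illustration under stated assumptions: `diff_single_segment` is a hypothetical stand-in for a diff run that covers the whole table in one segment, and the bisection settings and the real test fixtures in test_diff_tables are not modeled here.

```python
from typing import List, Tuple

Row = Tuple[int, str]
DiffEntry = Tuple[str, Row]


def diff_single_segment(rows1: List[Row], rows2: List[Row], skip_sort: bool = False) -> List[DiffEntry]:
    """Hypothetical stand-in for a one-segment diff; not the project's API."""
    set1, set2 = set(rows1), set(rows2)
    diff = [("-", r) for r in rows1 if r not in set2]
    diff += [("+", r) for r in rows2 if r not in set1]
    if not skip_sort:
        diff.sort(key=lambda entry: entry[1][0])  # stable sort by key
    return diff


# Deliberately unsorted input, so the test can tell the two modes apart.
ROWS1 = [(5, "e"), (1, "a"), (4, "d"), (2, "b")]
ROWS2 = [(5, "E"), (1, "a"), (4, "D"), (2, "b")]


def test_order_preserved_when_sort_is_skipped():
    diff = diff_single_segment(ROWS1, ROWS2, skip_sort=True)
    keys = [row[0] for _sign, row in diff]
    # Keys come back in source order, not in ascending order.
    assert keys == [5, 4, 5, 4]


def test_default_output_is_sorted_by_key():
    diff = diff_single_segment(ROWS1, ROWS2)
    keys = [row[0] for _sign, row in diff]
    assert keys == [4, 4, 5, 5]


def test_both_modes_report_the_same_entries():
    with_sort = diff_single_segment(ROWS1, ROWS2)
    without_sort = diff_single_segment(ROWS1, ROWS2, skip_sort=True)
    assert sorted(with_sort) == sorted(without_sort)
```

The key point is the unsorted fixture: if the input were already ordered by key, the first test would pass whether or not the sort was actually skipped.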

@alex-mirkin alex-mirkin marked this pull request as draft September 19, 2024 19:11
@alex-mirkin alex-mirkin marked this pull request as ready for review September 19, 2024 19:20
@erezsh erezsh merged commit a0047e8 into erezsh:master Sep 20, 2024
6 checks passed
@erezsh
Owner

erezsh commented Sep 20, 2024

Thanks for the PR!
