Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python, rust): allow in pushdowns in early_filter #2807

Merged

Conversation

ion-elgreco
Copy link
Collaborator

Description

is_in filters can be used now to do partition pruning, I also moved all the filter logic into it's own module.

Related issues

@ion-elgreco ion-elgreco force-pushed the fix/missing_isin_in_early_filter branch from 9531181 to 70d4f92 Compare August 21, 2024 12:21
@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Aug 21, 2024
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@ion-elgreco ion-elgreco changed the title fix(python, rust): allow in pushdowns in early_filter fix(python, rust): allow in pushdowns in early_filter Aug 21, 2024
@rtyler
Copy link
Member

rtyler commented Aug 21, 2024

@ion-elgreco I pulled this into my CI infrastructure, and also have some bigger tables at $WORKPLACE that @roeap and I were doing some performance analysis with. I will take a look and report back with a review.

@JonasDev1
Copy link
Contributor

In the mod.rs are also tests related to the filters, maybe you can move them also to the filter.rs

@ion-elgreco
Copy link
Collaborator Author

ion-elgreco commented Aug 21, 2024

In the mod.rs are also tests related to the filters, maybe you can move them also to the filter.rs

Ah yes, missed those :)

@JonasDev1
Copy link
Contributor

A test with a condition like target.col in (source.col1, source.col2) would be interesting

@ion-elgreco
Copy link
Collaborator Author

A test with a condition like target.col in (source.col1, source.col2) would be interesting

Hmm I will add some more tests, by the way I wonder if we want placeholder exprs if a user already explicitly passed a predicate for that partition column.

If I do s.partition and t.partition and t.partition in (3,4), I will get multiple placeholders saying t.partition=3 or t.partition=4 and t.partition in (3,4), result will be the same but I wonder if this would make it less efficient in terms of query runtime

@ion-elgreco ion-elgreco marked this pull request as draft August 22, 2024 06:04
@ion-elgreco
Copy link
Collaborator Author

I've put it in draft until I add some more tests in Rust

@JonasDev1
Copy link
Contributor

A test with a condition like target.col in (source.col1, source.col2) would be interesting

Hmm I will add some more tests, by the way I wonder if we want placeholder exprs if a user already explicitly passed a predicate for that partition column.

If I do s.partition and t.partition and t.partition in (3,4), I will get multiple placeholders saying t.partition=3 or t.partition=4 and t.partition in (3,4), result will be the same but I wonder if this would make it less efficient in terms of query runtime

The placeholder expr could be more restrictive than the explicit expr. I think performance should not be affected

@omkar-foss
Copy link
Contributor

omkar-foss commented Aug 29, 2024

A test with a condition like target.col in (source.col1, source.col2) would be interesting

Hi! I've added a Rust test that covers above mentiond condition here in this PR.

Also added a few more Rust unit tests for try_construct_early_filter().

@ion-elgreco ion-elgreco force-pushed the fix/missing_isin_in_early_filter branch from 9cdaf1e to b5ecfbc Compare August 29, 2024 18:44
@ion-elgreco ion-elgreco marked this pull request as ready for review August 29, 2024 18:49
@ion-elgreco ion-elgreco enabled auto-merge August 30, 2024 18:48
@rtyler
Copy link
Member

rtyler commented Aug 30, 2024

I have done some manual testing with this pull request and I can say that at least it did not adversely impact performance 😄 . I will rebase this branch, merge it, and then we can go from there

@rtyler rtyler force-pushed the fix/missing_isin_in_early_filter branch from b5ecfbc to ef9971e Compare August 30, 2024 19:06
@ion-elgreco ion-elgreco added this pull request to the merge queue Aug 30, 2024
Merged via the queue into delta-io:main with commit 04b8637 Aug 30, 2024
21 checks passed
@ion-elgreco ion-elgreco deleted the fix/missing_isin_in_early_filter branch August 30, 2024 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
4 participants