Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix detection of parquet filter pushdown #910

Merged
merged 1 commit into from
Feb 29, 2024

Conversation

phofl
Copy link
Collaborator

@phofl phofl commented Feb 29, 2024

we don't want to use are_co_aligned here, it's way too permissive.

    # Don't push down if replace is in there
    x = df[df.replace(5, 50).a == 5]["b"]
    y = x.optimize(fuse=False)
    assert isinstance(y.expr, Filter)
    assert len(y.compute()) == 0

This shouldn't move the filters inside of read_parquet

Another lessons learned for me personally is that we shouldn't use it in hot-paths in the current implementation

@phofl phofl requested a review from fjetter February 29, 2024 13:43
@fjetter
Copy link
Member

fjetter commented Feb 29, 2024

is this still needed with #909 (comment) ?

@phofl phofl changed the title Improve performance of parquet filter pushdown Fix detection of parquet filter pushdown Feb 29, 2024
@phofl
Copy link
Collaborator Author

phofl commented Feb 29, 2024

PR title is bad (sorry), this is actually more of a bug fix that brings a performance improvement as well. Currently we are too eagerly pushing down filters into the read_parquet call (see the example in the top post). are_co_aligned will allow all kinds of operations between read_parquet and the actual predicate, which will produce incorrect results

Copy link
Member

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@fjetter fjetter merged commit 2d380c7 into dask:main Feb 29, 2024
7 checks passed
@phofl phofl deleted the parquet_filter_performance branch February 29, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants