Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve err msg on use of non-partitioned column #1221

Merged
merged 6 commits into from
Mar 24, 2023

Conversation

marijncv
Copy link
Contributor

@marijncv marijncv commented Mar 12, 2023

Description

Updates the error message when trying to use a partition filter with a column that is not partitioned.

Code to trigger the error (adapted from #1219):

from datetime import date

import numpy as np
import numpy.random
import pandas as pd
from deltalake import DeltaTable, write_deltalake

nrows = 9
data = pd.DataFrame(
    {
        "year": np.repeat(2023, nrows),
        "month": np.repeat([1, 2, 3], nrows / 3),
        "values": numpy.random.normal(size=nrows),
    }
)
table_path = "tables/monthly"
write_deltalake(
    table_path,
    data,
    partition_by=["year", "month"],
    mode="overwrite",
)

nrows = 4
new_data = pd.DataFrame(
    {
        "year": np.repeat(2023, nrows),
        "month": np.repeat([4], nrows),
        "values": numpy.random.normal(size=nrows) + 10,
    }
)

write_deltalake(
    table_path,
    new_data,
    mode="overwrite",
    partition_filters=[("values", "=", "1")],
)
DeltaTable(table_path).to_pandas()

Related Issue(s)

closes #1218

Documentation

@github-actions github-actions bot added binding/rust Issues for the Rust crate rust labels Mar 12, 2023
Signed-off-by: Marijn Valk <[email protected]>
@marijncv marijncv marked this pull request as ready for review March 13, 2023 05:44
Signed-off-by: Marijn Valk <[email protected]>
@github-actions github-actions bot added the binding/python Issues for the Python package label Mar 13, 2023
Signed-off-by: Marijn Valk <[email protected]>
for f in filters {
if !current_metadata.partition_columns.contains(&f.key.into()) {
let column = f.key.to_string();
return Err(DeltaTableError::ColumnNotPartitioned {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, the original implementation returns all columns that aren't partitioned, but this implementation does an early return, which results in a different behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @houqp, good point! I tried to restore to the original behavior in the latest commit

Signed-off-by: Marijn Valk <[email protected]>
Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took a bit to get to reading this. Looks good :)

@wjones127 wjones127 merged commit 4e0b74e into delta-io:main Mar 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate rust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve error message when filtering on non-existant partition columns
3 participants