Option to check column order when comparing polars dataframes #3206

Merged 3 commits on Apr 25, 2022

physinet
Contributor

This adds check_column_order as a keyword argument to pl.testing.assert_frame_equal to allow comparison of DataFrames with columns in an arbitrary order.

github-actions bot added the python label (Related to Python Polars) on Apr 21, 2022
@ritchie46
Member

ritchie46 commented Apr 21, 2022

How would this work? Which column is compared to which then?

Right, I now see the keyword in that function. Can we rename that to check_column_names?

@physinet
Contributor Author

physinet commented Apr 21, 2022

I think check_column_order (as the variable was originally named) makes more sense. If False, the name of the column still has to match the values in that column, but the order of columns can be arbitrary.

check_column_names to me implies that as long as the values are in the correct positions, the column names do not matter. For example, does that imply these dataframes should be asserted as equal?

df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pl.DataFrame({"b": [1, 2], "a": [3, 4]})

This is not the case that I'm hoping to cover with this keyword argument.
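To make the two readings concrete, here is a minimal sketch in plain Python. It is not the polars implementation: `frames_equal` and the dict-based `Frame` type are hypothetical stand-ins for illustration, with each "frame" modelled as an insertion-ordered dict of column name to values, and `df3` is an extra frame added for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for a DataFrame: column name -> values.
# Python dicts preserve insertion order, which models column order.
Frame = Dict[str, List[int]]


def frames_equal(left: Frame, right: Frame, check_column_order: bool = True) -> bool:
    """Illustrative helper only; not part of polars."""
    if check_column_order:
        # Column names must match position by position.
        if list(left) != list(right):
            return False
    else:
        # The same set of names must be present, in any order.
        if set(left) != set(right):
            return False
    # In both modes, values are compared by column name, never by position.
    return all(left[name] == right[name] for name in left)


df1 = {"a": [1, 2], "b": [3, 4]}
df2 = {"b": [1, 2], "a": [3, 4]}  # same values by position, names swapped
df3 = {"b": [3, 4], "a": [1, 2]}  # same name-to-values pairing, columns reordered
```

Under this sketch, `frames_equal(df1, df3, check_column_order=False)` holds, while `df1` and `df2` compare unequal in either mode because the values attached to each name differ.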

@physinet
Contributor Author

physinet commented Apr 21, 2022

FWIW, the equivalent pandas argument is named check_like: https://github.com/pandas-dev/pandas/blob/v1.4.2/pandas/_testing/asserters.py#L1196

@ritchie46
Member

> I think check_column_order (as the variable was originally named) makes more sense. If False, the name of the column still has to match the values in that column, but the order of columns can be arbitrary.
>
> check_column_names to me implies that as long as the values are in the correct positions, the column names do not matter. For example, does that imply these dataframes should be asserted as equal?
>
> df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
> df2 = pl.DataFrame({"b": [1, 2], "a": [3, 4]})
>
> This is not the case that I'm hoping to cover with this keyword argument.

To me it implies that the column names must be correct and in the same order. The other name confused me: how can we check whether the combination of columns is correct? It does not make much sense to me.

@physinet
Contributor Author

Changed to check_column_names - makes sense enough and the docstring should help clarify.

@ritchie46
Member

Thanks @physinet

@ritchie46 ritchie46 merged commit 929ec8c into pola-rs:master Apr 25, 2022
moritzwilksch pushed a commit to moritzwilksch/polars that referenced this pull request May 29, 2022
Labels
python Related to Python Polars