Provide more details in error message around concat_rows #1023

Closed
LostKobrakai opened this issue Nov 20, 2024 · 2 comments · Fixed by #1057

Comments

@LostKobrakai

Given the following dataframes:

[
  #Explorer.DataFrame<
    Polars[1134 x 4]
    gtin string […]
    series string […]
    program string […]
    program_color string […]
  >,
  #Explorer.DataFrame<
    Polars[1520 x 4]
    gtin string […]
    series string […]
    program null […]
    series_color string […]
  >
]

I got

[error] ** (ArgumentError) dataframes must have the same columns
    (explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1

This led me to believe that the null vs string column type was the issue, while it was actually the different *_color columns.

The error message could be better, and the concat_rows docs could call out that typecasting works between null and other column types.

@billylanchantin
Member

Small clarification (we chatted on Slack):

The error message could be improved by calling out which columns specifically didn't match. Something like:

** (ArgumentError) dataframes must have the same columns

  * Left DataFrame has these columns not present in the right DataFrame:

      ["program_color"]

  * Right DataFrame has these columns not present in the left DataFrame:

      ["series_color"]

    (explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1

where internally we'd do something like:

left_cols = left_df |> names() |> MapSet.new()
right_cols = right_df |> names() |> MapSet.new()

mismatched_cols = MapSet.symmetric_difference(left_cols, right_cols)

in_left_only = left_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
in_right_only = right_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
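For illustration, here is a minimal, self-contained sketch of how that message could be assembled. It is not Explorer's actual implementation: the `ColumnMismatch` module and its `diff/2` and `message/2` functions are hypothetical names, and it works on plain lists of column names rather than on DataFrames. It also uses `MapSet.difference/2` directly, which should yield the same two sets as intersecting each side with the symmetric difference.

```elixir
defmodule ColumnMismatch do
  # Hypothetical helper, not part of Explorer.
  # Takes two lists of column names and returns the names unique to each side.
  def diff(left_names, right_names) do
    left = MapSet.new(left_names)
    right = MapSet.new(right_names)

    %{
      # Sorted so the output is deterministic regardless of MapSet ordering.
      in_left_only: left |> MapSet.difference(right) |> Enum.sort(),
      in_right_only: right |> MapSet.difference(left) |> Enum.sort()
    }
  end

  # Builds an error message in the shape proposed above.
  def message(left_names, right_names) do
    %{in_left_only: in_left_only, in_right_only: in_right_only} =
      diff(left_names, right_names)

    """
    dataframes must have the same columns

      * Left DataFrame has these columns not present in the right DataFrame:

          #{inspect(in_left_only)}

      * Right DataFrame has these columns not present in the left DataFrame:

          #{inspect(in_right_only)}
    """
  end
end

# Column names taken from the two dataframes in the original report.
left = ["gtin", "series", "program", "program_color"]
right = ["gtin", "series", "program", "series_color"]

IO.puts(ColumnMismatch.message(left, right))
```

In the real code path the names would presumably come from `names/1` on each DataFrame, and the message would be passed to `raise ArgumentError`.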

@viniciussbs
Contributor

This led me to believe that the null vs string column type was the issue, while it was actually the different *_color columns.

Just as feedback: I had the same issue a few days ago and came to the same wrong conclusion about null vs string. It was just a singular vs plural column name, though. 😅
