Provide more details in error message around `concat_rows` #1023

LostKobrakai · 2024-11-20T17:04:04Z

Given the following dataframes:

[
  #Explorer.DataFrame<
    Polars[1134 x 4]
    gtin string […]
    series string […]
    program string […]
    program_color string […]
  >,
  #Explorer.DataFrame<
    Polars[1520 x 4]
    gtin string […]
    series string […]
    program null […]
    series_color string […]
  >
]

I got the following error:

[error] ** (ArgumentError) dataframes must have the same columns
    (explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1

This led me to believe that the null vs. string column type was the issue, when it was actually the different *_color columns.

The error message could be more specific, and the concat_rows docs could call out that type casting works between null and other column types.


billylanchantin · 2024-11-20T17:51:15Z

Small clarification (we chatted on Slack):

The error message could be improved by calling out which columns specifically didn't match. Something like:

** (ArgumentError) dataframes must have the same columns

  * Left DataFrame has these columns not present in the right DataFrame:

      ["program_color"]

  * Right DataFrame has these columns not present in the left DataFrame:

      ["series_color"]

    (explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1

where internally we'd do something like:

left_cols = left_df |> names() |> MapSet.new()
right_cols = right_df |> names() |> MapSet.new()

mismatched_cols = MapSet.symmetric_difference(left_cols, right_cols)

in_left_only = left_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
in_right_only = right_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
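To show how those two lists could feed into the proposed message, here is a minimal, self-contained sketch. It assumes only plain lists of column names (for example, what `names/1` returns) and uses stdlib `MapSet`. The module `ConcatRowsErrorSketch` and the function `mismatch_message/2` are hypothetical illustrations, not Explorer's actual implementation. `MapSet.difference/2` gives the same one-sided lists as the symmetric-difference-plus-intersection approach above, in one step per side.

```elixir
defmodule ConcatRowsErrorSketch do
  # Hypothetical helper, not part of Explorer's API.
  # Takes the column names of two dataframes as lists of strings and
  # returns nil when the sets match, or an error message otherwise.
  def mismatch_message(left_names, right_names) do
    left_cols = MapSet.new(left_names)
    right_cols = MapSet.new(right_names)

    # Columns present on one side only; sorted for a deterministic message.
    in_left_only = left_cols |> MapSet.difference(right_cols) |> Enum.sort()
    in_right_only = right_cols |> MapSet.difference(left_cols) |> Enum.sort()

    if in_left_only == [] and in_right_only == [] do
      nil
    else
      """
      dataframes must have the same columns

        * Left DataFrame has these columns not present in the right DataFrame:

            #{inspect(in_left_only)}

        * Right DataFrame has these columns not present in the left DataFrame:

            #{inspect(in_right_only)}
      """
    end
  end
end
```

With the dataframes from the original report, `ConcatRowsErrorSketch.mismatch_message(["gtin", "series", "program", "program_color"], ["gtin", "series", "program", "series_color"])` would list `["program_color"]` and `["series_color"]`, and the result could be passed to `raise ArgumentError, ...`. Column order is ignored because only set membership is compared.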

viniciussbs · 2025-01-18T14:33:37Z

> This led me to believe that the null vs. string column type was the issue, when it was actually the different *_color columns.

Just as feedback: I had the same issue a few days ago and came to the same wrong conclusion about null vs. string. It was just a singular vs. plural column name, though. 😅

billylanchantin mentioned this issue on Jan 18, 2025 in "Better concat_rows error messages" #1057 (merged).

billylanchantin closed this as completed in #1057 on Jan 19, 2025.
