Dialyzer complains about `ungroup` with no column names #1055

viniciussbs · 2025-01-16T15:22:23Z

Hi! In a closed issue about ungroup, I mentioned that I was getting an error when calling ungroup with no column names. The actual problem is different: the call works, but Dialyzer complains about it.

The issue

This works:

# summarise/2 is a macro, so the module must be required
require Explorer.DataFrame, as: DF

[
  %{date: ~D[2025-01-14], id: 7, price: 4200},
  %{date: ~D[2025-01-14], id: 19, price: 5000},
  %{date: ~D[2025-01-15], id: 7, price: 4000},
  %{date: ~D[2025-01-15], id: 19, price: 5000}
]
|> DF.new()
|> DF.group_by(:date)
|> DF.summarise(price: min(price))
|> DF.ungroup() # no column names specified
|> DF.print()

+--------------------------------------------+
| Explorer DataFrame: [rows: 2, columns: 2]  |
+------------------------+-------------------+
|          date          |       price       |
|         <date>         |       <s64>       |
+========================+===================+
| 2025-01-14             | 4200              |
+------------------------+-------------------+
| 2025-01-15             | 4000              |
+------------------------+-------------------+

But Dialyzer complains:

lib/explorer_app.ex:4:no_return
Function test/0 has no local return.
________________________________________________________________________________
done (warnings were emitted)
Halting VM with exit status 2

If you pass column names to ungroup:

|> DF.ungroup(:date)

Dialyzer stops complaining:

done (passed successfully)

Repo reproducing the issue:

https://github.com/viniciussbs/explorer-app/tree/ungroup-and-dialyzer

billylanchantin · 2025-01-16T16:11:10Z

Great demo of the issue, thanks!

@philss I believe adding Range.t() to the @spec solves the issue:

@spec ungroup(
        df :: DataFrame.t(),
        groups_or_group :: column_names() | column_name() | Range.t()
      ) ::
        DataFrame.t()
def ungroup(df, groups \\ ..)

But I'm also wondering if we even want .. in the first place. If the only range we support is .., should we instead have nil? Or is that weird semantically? Usually nil means "none" but here we'd be using it to mean "all".

Apologies if the reasoning behind .. was discussed elsewhere and I missed it.
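To make the spec question concrete, here is a minimal, self-contained sketch of the pattern. The module `SpecSketch`, its types, and the plain-map "data frame" are hypothetical stand-ins, not Explorer's real definitions. The assumption is that the default argument generates an `ungroup/1` that forwards `..` to `ungroup/2`, so the spec has to admit a range for that call to satisfy the contract.

```elixir
defmodule SpecSketch do
  # Hypothetical stand-ins for Explorer's column types.
  @type column_name :: atom() | String.t()
  @type column_names :: [column_name()]

  # Range.t() is listed so that the default `..` is a legal argument.
  # Without it, the generated ungroup/1 would pass a value the spec excludes.
  @spec ungroup(map(), column_names() | column_name() | Range.t()) :: map()
  def ungroup(df, groups \\ ..)

  # `..` is shorthand for the full range 0..-1//1: drop every group.
  def ungroup(df, %Range{first: 0, last: -1, step: 1}), do: %{df | groups: []}

  # A list of names: drop only those groups.
  def ungroup(df, names) when is_list(names),
    do: %{df | groups: df.groups -- Enum.map(names, &to_string/1)}

  # A single name: treat it as a one-element list.
  def ungroup(df, name), do: ungroup(df, [name])
end

# A plain map stands in for a grouped data frame in this sketch.
df = %{groups: ["b", "c"]}
SpecSketch.ungroup(df)      # default `..`, removes all groups
SpecSketch.ungroup(df, :b)  # removes only "b"
```

The sketch only handles the full range; what a non-empty range should mean is a separate question.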

josevalim · 2025-01-16T16:21:21Z

Can we access group by index anywhere? I don't think we can, so I would agree we should not support ranges. Maybe introduce :all to denote all of them? Do we use :all or another atom anywhere else?

billylanchantin · 2025-01-16T16:25:52Z

@josevalim I think the original #1035 was about removing :all. We could of course add it back (and make it actually work), but I think the hesitation is that any atom we choose could also be a valid column name.

josevalim · 2025-01-16T16:51:57Z

So I'd say: let's support ranges even though it probably would have very little application in practice. At least it is consistent with all other APIs that accept column names.

billylanchantin · 2025-01-16T17:02:09Z

I think we can make that work. One complication though: does the range represent the column indices or the group indices?

For ungroup(df, ..) the difference is moot. But for non-empty ranges the meaning is unclear:

import Explorer.DataFrame

df = new(a: [1, 1, 1], b: [2, 2, 2], c: [3, 3, 3])

grouped = group_by(df, 1..2) # groups by [:b, :c]

ungrouped = ungroup(grouped, 0..1) # ungroups [:b, :c] or just [:b]?

josevalim · 2025-01-16T17:18:50Z

Gut feeling says it applies to groups, don't groups become the first columns once ungrouped anyway?

billylanchantin · 2025-01-16T17:24:48Z

> Gut feeling says it applies to groups

I agree. Otherwise you'd have to find the location of the grouped column in df.names to use this feature. I'll just need to make it clear in the docs.

> don't groups become the first columns once ungrouped anyway?

It doesn't appear so.

iex> import Explorer.DataFrame
Explorer.DataFrame

iex> df = new(a: [1, 1, 1], b: [2, 2, 2], c: [3, 3, 3])
#Explorer.DataFrame<
  Polars[3 x 3]
  a s64 [1, 1, 1]
  b s64 [2, 2, 2]
  c s64 [3, 3, 3]
>

iex> grouped = group_by(df, 1..2) # groups by [:b, :c]
#Explorer.DataFrame<
  Polars[3 x 3]
  Groups: ["b", "c"]
  a s64 [1, 1, 1]
  b s64 [2, 2, 2]
  c s64 [3, 3, 3]
>

iex> ungrouped = ungroup(grouped, ..)
#Explorer.DataFrame<
  Polars[3 x 3]
  a s64 [1, 1, 1]
  b s64 [2, 2, 2]
  c s64 [3, 3, 3]
>

iex> ungrouped.names
["a", "b", "c"]

josevalim · 2025-01-16T17:29:02Z

Sounds good then with groups only!
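A small sketch of what "groups only" would mean for a range argument, assuming the range is sliced against the list of group names rather than against `df.names`. The module `GroupRangeSketch` and its function `groups_for/2` are hypothetical and not part of Explorer; only `Enum.slice/2` is a real call here.

```elixir
defmodule GroupRangeSketch do
  # Hypothetical helper: resolves a range against the group list,
  # not against the full list of column names.
  def groups_for(groups, %Range{} = range), do: Enum.slice(groups, range)
end

names = ["a", "b", "c"]  # column order of the data frame
groups = ["b", "c"]      # groups after group_by(df, 1..2)

# Read as group indices, 0..1 selects both groups.
GroupRangeSketch.groups_for(groups, 0..1)

# 0..0 selects only the first group, "b".
GroupRangeSketch.groups_for(groups, 0..0)

# The full range `..` selects every group.
GroupRangeSketch.groups_for(groups, ..)

# Read as column indices instead, the same 0..1 would point at "a" and "b",
# and "a" is not a group at all.
Enum.slice(names, 0..1)
```

Under this reading the caller never has to look up where a grouped column sits in `df.names`.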

billylanchantin · 2025-01-16T20:36:41Z

@viniciussbs If you're able, please double check that this actually got fixed on main. Thanks!

viniciussbs · 2025-01-17T23:25:49Z

@billylanchantin So,

This didn't compile:

{:explorer,
 git: "https://github.com/elixir-explorer/explorer.git",
 branch: "main",
 submodules: true}

Then I tried this suggestion from the README, but it didn't compile either:

{:explorer,
 git: "https://github.com/elixir-explorer/explorer.git",
 branch: "main",
 submodules: true,
 system_env: %{"EXPLORER_BUILD" => "1"}},
{:rustler, ">= 0.0.0"}

Then I installed Rust via asdf to get cargo, and the build complained about missing cmake. So I installed cmake, and now it looks like it's compiling. I'm not sure whether it's stuck, because it started compiling more than 20 minutes ago.

Compiling crate explorer in release mode (native/explorer)

   ... compiled a bunch of stuff...

   Compiling explorer v0.1.0 (/Users/vinicius/repo/explorer_app/deps/explorer/native/explorer)

Am I doing something wrong or does it take too long to compile?

Also, should the README mention that the developer needs Rust/cargo installed? At first I believed rustler was enough, but it's not.

billylanchantin · 2025-01-17T23:53:28Z

@viniciussbs Thanks for checking!

> Am I doing something wrong or does it take too long to compile?

I seem to recall long compile times for the first compile too 😬

> Also, should the README mention that the developer needs Rust/cargo installed? At first I believed rustler was enough, but it's not.

Yes, it's mentioned in the contributing guidelines: https://github.com/elixir-explorer/explorer?tab=readme-ov-file#contributing. I do apologize, though: I forgot that all of this would be required for you to check that the fix landed on main.

If it keeps giving you trouble please don't feel like you need to persevere! I'm pretty sure the situation is fixed, I was just hoping for a double check. But it's not necessary :)

viniciussbs · 2025-01-18T14:27:40Z

@billylanchantin forgot to mention here, but it compiled.

Finished `release` profile [optimized] target(s) in 35m 50s

After that, mix dialyzer finished successfully.

done (passed successfully)

billylanchantin mentioned this issue Jan 16, 2025

Fix ungroup spec #1056

Merged

billylanchantin closed this as completed in #1056 Jan 16, 2025
