Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dialyzer complains about ungroup with no colum names #1055

Closed
viniciussbs opened this issue Jan 16, 2025 · 12 comments · Fixed by #1056
Closed

Dialyzer complains about ungroup with no colum names #1055

viniciussbs opened this issue Jan 16, 2025 · 12 comments · Fixed by #1056

Comments

@viniciussbs
Copy link
Contributor

Hi! I've mentioned in a closed issue about ungroup that I was experiencing an error when calling ungroup with no column names. But actually, the issue is different: it works, but Dialyzer complains about it.

The issue

This works:

[
  %{date: ~D[2025-01-14], id: 7, price: 4200},
  %{date: ~D[2025-01-14], id: 19, price: 5000},
  %{date: ~D[2025-01-15], id: 7, price: 4000},
  %{date: ~D[2025-01-15], id: 19, price: 5000}
]
|> DF.new()
|> DF.group_by(:date)
|> DF.summarise(price: min(price))
|> DF.ungroup() # no column names specified
|> DF.print()
+--------------------------------------------+
| Explorer DataFrame: [rows: 2, columns: 2]  |
+------------------------+-------------------+
|          date          |       price       |
|         <date>         |       <s64>       |
+========================+===================+
| 2025-01-14             | 4200              |
+------------------------+-------------------+
| 2025-01-15             | 4000              |
+------------------------+-------------------+

But Dialyzer complains:

lib/explorer_app.ex:4:no_return
Function test/0 has no local return.
________________________________________________________________________________
done (warnings were emitted)
Halting VM with exit status 2

Then you pass column names in ungroup:

|> DF.ungroup(:date)

And Dialyzer stops complaining:

done (passed successfully)

Repo reproducing the issue:

https://github.com/viniciussbs/explorer-app/tree/ungroup-and-dialyzer

@billylanchantin
Copy link
Member

Great demo of the issue, thanks!

@philss I believe adding Range.t() to the @spec solves the issue:

@spec ungroup(
        df :: DataFrame.t(),
        groups_or_group :: column_names() | column_name() | Range.t()
      ) ::
        DataFrame.t()
def ungroup(df, groups \\ ..)

But I'm also wondering if we even want .. in the first place. If the only range we support is .., should we instead have nil? Or is that weird semantically? Usually nil means "none" but here we'd be using it to mean "all".

Apologizes if the reasoning behind .. was discussed elsewhere and I missed it.

@josevalim
Copy link
Member

Can we access group by index anywhere? I don't think we can, so I would agree we should not support ranges. Maybe introduce :all to denote all of them? Do we use :all or another atom anywhere else?

@billylanchantin
Copy link
Member

@josevalim I think the original #1035 was about removing :all. We could of course add it back (and make it actually work), but I think the hesitation is that any atom we choose could also be a valid column name.

@josevalim
Copy link
Member

So I'd say: let's support ranges even though it probably would have very little application in practice. At least it is consistent with all other APIs that accept column names.

@billylanchantin
Copy link
Member

I think we can make that work. One complication though: does the range represent the column indices or the group indices?

For ungroup(df, ..) the difference is moot. But for non-empty ranges the meaning is unclear:

import Explorer.DataFrame

df = new(a: [1, 1, 1], b: [2, 2, 2], c: [3, 3, 3])

grouped = group_by(df, 1..2) # groups by [:b, :c]

ungrouped = ungroup(df, 0..1) # ungroups [:b, :c] or just [:b]?

@josevalim
Copy link
Member

josevalim commented Jan 16, 2025

Gut feeling says it applies to groups, don't groups become the first columns once ungrouped anyway?

@billylanchantin
Copy link
Member

billylanchantin commented Jan 16, 2025

Gut feeling says it applies to groups

I agree. Otherwise you'd have to find the location of the grouped column in df.names to use this feature. I'll just need to make it clear in the docs.

don't groups become the first columns once ungrouped anyway?

It doesn't appear so.

iex> import Explorer.DataFrame
Explorer.DataFrame

iex> df = new(a: [1, 1, 1], b: [2, 2, 2], c: [3, 3, 3])
#Explorer.DataFrame<
  Polars[3 x 3]
  a s64 [1, 1, 1]
  b s64 [2, 2, 2]
  c s64 [3, 3, 3]
>

iex> grouped = group_by(df, 1..2) # groups by [:b, :c]
#Explorer.DataFrame<
  Polars[3 x 3]
  Groups: ["b", "c"]
  a s64 [1, 1, 1]
  b s64 [2, 2, 2]
  c s64 [3, 3, 3]
>

iex> ungrouped = ungroup(df, ..)
#Explorer.DataFrame<
  Polars[3 x 3]
  a s64 [1, 1, 1]
  b s64 [2, 2, 2]
  c s64 [3, 3, 3]
>

iex> ungrouped.names
["a", "b", "c"]

@josevalim
Copy link
Member

Sounds good then with groups only!

@billylanchantin
Copy link
Member

@viniciussbs If you're able, please double check that this actually got fixed on main. Thanks!

@viniciussbs
Copy link
Contributor Author

viniciussbs commented Jan 17, 2025

@billylanchantin So,

This didn't compile:

{:explorer,
 git: "https://github.com/elixir-explorer/explorer.git",
 branch: "main",
 submodules: true}

Then I've tried this suggestion from README, but it didn't compile:

{:explorer,
 git: "https://github.com/elixir-explorer/explorer.git",
 branch: "main",
 submodules: true,
 system_env: %{"EXPLORER_BUILD" => "1"}},
{:rustler, ">= 0.0.0"}

Then I installed Rust via asdf to have cargo and it complained about missing cmake. So I've installed cmake and it looks like it's compiling. I'm not shure if it's stuck, because it started compiling more than 20 minutes ago.

Compiling crate explorer in release mode (native/explorer)

   ... compiled a bunch of stuf...

   Compiling explorer v0.1.0 (/Users/vinicius/repo/explorer_app/deps/explorer/native/explorer)

Am I doing something wrong or does it take too long to compile?

Also, should the REAME mention that the developer needs Rust/cargo installed? I believed rustler was enough, at first, but it's not.

@billylanchantin
Copy link
Member

@viniciussbs Thanks for checking!

Am I doing something wrong or does it take too long to compile?

I seem to recall long compile times for the first compile too 😬

Also, should the REAME mention that the developer needs Rust/cargo installed? I believed rustler was enough, at first, but it's not.

Yes it's mentioned in the contributing guidelines: https://github.com/elixir-explorer/explorer?tab=readme-ov-file#contributing. Though I do apologize, I forgot that this would all be required for you to check that this was fixed on main.

If it keeps giving you trouble please don't feel like you need to persevere! I'm pretty sure the situation is fixed, I was just hoping for a double check. But it's not necessary :)

@viniciussbs
Copy link
Contributor Author

@billylanchantin forgot to mention here, but it compiled.

Finished `release` profile [optimized] target(s) in 35m 50s

After that, mix dialyzer finished successfully.

done (passed successfully)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants