Panic: polars, sort on a list #180

Open
cchudant opened this issue Jan 12, 2023 · 3 comments
Labels
A-server Area: Server C-bug Category: Bug

Comments

@cchudant
Contributor

Repro:

import polars as pl
from bastionlab.polars.policy import Policy, TrueRule, Log

repro_df = pl.DataFrame({ "hello": [1, 2, 3], "world": [1, 2, 3] })
# `connection` is assumed to be an already-open BastionLab connection
repro_rdf = connection.client.polars.send_df(
    repro_df,
    policy=Policy(safe_zone=TrueRule(), unsafe_handling=Log(), savable=False),
)

(
    repro_rdf
    .groupby(pl.col("hello"))
    .agg(pl.col("world"))  # with .agg(pl.col("world").sum()) instead, the query works
    .sort(pl.col("world"))
).collect().fetch()

This results in:

thread 'tokio-runtime-worker' panicked at 'this operation is not implemented/valid for this dtype: List(Int64)', /home/cchudant/.cargo/registry/src/github.com-1ecc6299db9ec823/polars-core-0.25.1/src/series/series_trait.rs:427:9

The panic happens inside polars itself, so I'm not sure how to proceed here.

cchudant added the A-server (Area: Server) and C-bug (Category: Bug) labels on Jan 12, 2023
@dhalf
Contributor

dhalf commented Jan 13, 2023

I believe the operation you're trying to perform does not really make sense. The aggregation step does not apply an aggregation function, so for each group you get the list of its "world" values (dtype list[i64]). You then try to sort by that same column, which fails because polars doesn't know how to sort lists.

>>> import polars as pl
>>> df = pl.DataFrame({ "hello": [1,2,3], "world": [1,2,3]})
>>> df.lazy().groupby(pl.col("hello")).agg(pl.col("world")).collect()
shape: (3, 2)
┌───────┬───────────┐
│ hello ┆ world     │
│ ---   ┆ ---       │
│ i64   ┆ list[i64] │
╞═══════╪═══════════╡
│ 2     ┆ [2]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ [3]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1     ┆ [1]       │
└───────┴───────────┘
>>> df.lazy().groupby(pl.col("hello")).agg(pl.col("world")).sort(pl.col("world")).collect()
thread '<unnamed>' panicked at 'this operation is not implemented/valid for this dtype: List(Int64)', /home/runner/work/polars/polars/polars/polars-core/src/series/series_trait.rs:427:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dhalf/Documents/bastionlab/env/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py", line 920, in collect
    return pli.wrap_df(ldf.collect())
pyo3_runtime.PanicException: this operation is not implemented/valid for this dtype: List(Int64)
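To make the distinction concrete, here is a small pure-Python model of what a group-then-aggregate step produces. It is not polars: `group_agg` is a hypothetical helper written only for illustration. Without a reducer, each group keeps its values as a list (the analogue of the list[i64] column above); with a reducer such as `sum` (the analogue of `.agg(pl.col("world").sum())`), each group collapses to a scalar, and sorting by a scalar column is well defined.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional


def group_agg(keys, values, reducer: Optional[Callable] = None) -> Dict:
    """Hypothetical model of groupby(keys).agg(values).

    Without a reducer, every group keeps its raw values as a list,
    mirroring the list[i64] column polars builds for .agg(pl.col("world")).
    With a reducer, each group is collapsed to a single scalar.
    """
    groups: Dict[Hashable, List] = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    if reducer is None:
        return dict(groups)
    return {key: reducer(vals) for key, vals in groups.items()}


# Same data as the repro above.
hello = [1, 2, 3]
world = [1, 2, 3]

as_lists = group_agg(hello, world)              # {1: [1], 2: [2], 3: [3]}
as_sums = group_agg(hello, world, reducer=sum)  # {1: 1, 2: 2, 3: 3}

# Sorting rows by the reduced, scalar "world" column is well defined.
rows_sorted = sorted(as_sums.items(), key=lambda row: row[1])
```

Note that plain Python can compare lists, so this model does not reproduce the panic; it only shows why the column's type changes depending on whether an aggregation function is applied.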

@dhalf
Contributor

dhalf commented Jan 13, 2023

That said, we can probably improve error reporting so that this surfaces as a clear error rather than a panic.
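One possible approach, sketched here under stated assumptions, is a pre-flight check that validates sort keys against the result schema before the query runs, so the client gets a readable message instead of a worker-thread panic. The names `check_sort_keys` and `UnsupportedSortKeyError` are hypothetical (neither BastionLab nor polars provides them), and the schema is modeled as a plain dict of dtype strings in polars' display notation rather than read from a real query plan.

```python
from typing import Iterable, Mapping


class UnsupportedSortKeyError(ValueError):
    """Hypothetical error raised when a sort key names an unsortable column."""


def check_sort_keys(schema: Mapping[str, str], sort_keys: Iterable[str]) -> None:
    """Reject sort keys that are missing from the schema or have a list dtype.

    Assumption: `schema` maps column names to dtype strings written the way
    polars prints them, e.g. {"hello": "i64", "world": "list[i64]"}.
    """
    for key in sort_keys:
        if key not in schema:
            raise UnsupportedSortKeyError(
                f"sort key {key!r} is not a column of the frame"
            )
        dtype = schema[key]
        if dtype.startswith("list["):
            raise UnsupportedSortKeyError(
                f"cannot sort by column {key!r} of dtype {dtype}; "
                "reduce it to a scalar inside .agg() first, e.g. with .sum()"
            )


# Schema of the frame produced by .agg(pl.col("world")) in the repro.
schema_after_agg = {"hello": "i64", "world": "list[i64]"}

check_sort_keys(schema_after_agg, ["hello"])  # scalar column: passes silently
try:
    check_sort_keys(schema_after_agg, ["world"])
except UnsupportedSortKeyError as err:
    print(err)
```

Running the check before handing the plan to polars means the invalid query is rejected with an actionable message, and the worker thread never reaches the code path that panics.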

@cchudant
Contributor Author

Should I close this issue and reopen it in polars instead, then?
