Support for Decimal series? #4104

Closed
j-a-m-l opened this issue Jul 21, 2022 · 18 comments

@j-a-m-l

j-a-m-l commented Jul 21, 2022

Since Arrow2 provides support for the Decimal data type, is there any plan to allow it for Polars series?

@pengyizeng

x2

@gussen

gussen commented Oct 10, 2022

I am not familiar with the Arrow format and the inner workings of Polars, but I guess that adding proper support for decimal is significant work. In the meantime, a probably easier thing to do is to convert decimals with scale 0 and precision 1-9 to Int32, decimals with scale 0 and precision 10-18 to Int64, and everything else to Text. Polars could then read files containing decimals, do a lot of analysis with the existing Int and Text functionality, and of course analyze all the other fields.
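The fallback mapping suggested above can be written as a small lookup. This is a minimal sketch, not Polars code: the helper name `fallback_dtype` is hypothetical, and it returns dtype labels as plain strings ("Utf8" being the Polars name for the text type at the time). A scale-0 decimal of precision p holds integers up to 10^p - 1, so it fits Int32 while p ≤ 9 and Int64 while p ≤ 18; at p = 19 the maximum 10^19 - 1 already exceeds 2^63 - 1, so the sketch sends it to the text fallback.

```python
I32_MAX = 2**31 - 1
I64_MAX = 2**63 - 1


def fallback_dtype(precision: int, scale: int) -> str:
    """Pick a stand-in dtype label for a Decimal(precision, scale) column.

    Hypothetical helper: the labels are illustrative strings, not Polars objects.
    """
    if scale == 0:
        # Largest magnitude a scale-0 decimal of this precision can hold.
        max_value = 10**precision - 1
        if max_value <= I32_MAX:
            return "Int32"  # precision 1-9
        if max_value <= I64_MAX:
            return "Int64"  # precision 10-18
    # Fractional digits or too many integer digits: keep the exact text.
    return "Utf8"


for p, s in [(9, 0), (18, 0), (19, 0), (10, 2)]:
    print(f"Decimal({p}, {s}) -> {fallback_dtype(p, s)}")
```

Deriving the cutoffs from the integer bounds, rather than hard-coding precision ranges, keeps the boundary cases (9/10 and 18/19) honest.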

@thomasaarholt
Contributor

At the least, it would be nice if pl.read_parquet automatically cast from the decimal type to a float type, as pl.from_arrow seems to do, instead of just throwing an error:

one_row.zip

import polars as pl
import pyarrow.parquet
data = pyarrow.parquet.read_table("one_row.parquet")
data["pred_mean"].type # Decimal128Type(decimal128(38, 0))
df = pl.from_arrow(data) 
df["pred_mean"].dtype # polars.datatypes.Float64


df = pl.read_parquet("one_row.parquet")
# InvalidOperationError: Cannot create polars series from Decimal(38, 0) type

@WillenCrypto

Same issue here. Please support decimal!

@sa-
Contributor

sa- commented Dec 26, 2022

I was trying out Polars' new Delta reader and ran into this error using pl.scan_delta(...):

Arrow datatype Decimal(18, 5) not supported by Polars. You probably need to activate that data-type feature

I've installed polars with pip install polars[all] so it's unlikely that the feature is missing. It would be nice if polars could support this.

@rossmechanic

+1!

@houyingkun

+1!

@plaflamme
Contributor

Is there a rundown of what it would take to add decimal support to pola-rs? A previous comment mentioned this might be a large effort, but perhaps if it's broken down into manageable pieces, the community could take this on?

I understand that breaking this down is also a bunch of work in itself, so I totally understand that it hasn't been done or prioritized yet. Still, providing some guidance to new contributors might be a way to get this done at a lower bar than doing the work itself.

Also, as a side note, I would recommend requiring the user to explicitly enable the conversion to f64, e.g. pl.read_parquet("foo.parquet", decimal_as_f64=True), and failing when it is not enabled. I think it's surprising that pola-rs does this by default (logging isn't a bad idea, but it can easily be missed in some deployment scenarios), since it can lead to issues when computations need to be exact (money or otherwise).
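The exactness concern can be shown with Python's standard decimal module alone. In this sketch, `read_decimal_column` and its `decimal_as_f64` argument are hypothetical stand-ins for the opt-in flag proposed above, not an existing Polars API: the conversion raises unless the caller opts in, and the sums show what a silent f64 conversion gives up.

```python
from decimal import Decimal


def read_decimal_column(values, decimal_as_f64=False):
    """Hypothetical reader step: refuse the lossy f64 conversion unless opted in."""
    if not decimal_as_f64:
        raise TypeError(
            "decimal column found; pass decimal_as_f64=True to accept f64 conversion"
        )
    return [float(v) for v in values]


amounts = [Decimal("0.10"), Decimal("0.20")]

# Exact decimal arithmetic keeps the cent-level result.
exact_total = sum(amounts)
print(exact_total)  # 0.30

# Opting in to f64 reintroduces binary rounding error.
as_floats = read_decimal_column(amounts, decimal_as_f64=True)
print(sum(as_floats))  # 0.30000000000000004

# A Decimal(38, 0) value does not survive a round trip through f64.
big = Decimal("12345678901234567890123456789012345678")
print(Decimal(float(big)) == big)  # False

# Without the opt-in, the conversion fails loudly instead of silently.
try:
    read_decimal_column(amounts)
except TypeError as err:
    print("refused:", err)
```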

@rossmechanic

Agreed with @plaflamme. I'd be happy to contribute to some of the work to support decimal type. It's the only thing keeping us from moving from pandas to polars.

@ritchie46
Member

I can see if I can take some time for this in the coming weeks.

@plaflamme
Contributor

@ritchie46 That's great news! Please note that I'd be happy to help out this effort if there's a way to break it down. I can also help testing.

@ritchie46
Member

That would be great. The first step would be adding ChunkedArray<Int128Type>, which means adding that logic to the existing Numeric traits and types. This should be added under a dtype-i128 feature flag.

We could merge this first and then we'll have to look what the best next steps are.
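Arrow's Decimal128 type stores each value as a signed 128-bit integer, with the precision and scale kept in the type, which is why an i128 ChunkedArray is a natural foundation. The following sketch uses only Python's standard decimal module to show that encoding; the helpers `to_unscaled` and `from_unscaled` are hypothetical, not Polars or Arrow functions. A value v with scale s is stored as the integer v × 10^s, and precision p requires that integer's magnitude to stay below 10^p. Since Decimal128 allows at most 38 digits and 10^38 - 1 < 2^127 - 1, every valid value fits in an i128.

```python
from decimal import Decimal, localcontext

I128_MAX = 2**127 - 1
MAX_PRECISION = 38  # Arrow's Decimal128 allows precision 1..38


def to_unscaled(value: Decimal, precision: int, scale: int) -> int:
    """Encode `value` as the integer stored for Decimal(precision, scale)."""
    if not 1 <= precision <= MAX_PRECISION:
        raise ValueError("precision must be between 1 and 38")
    with localcontext() as ctx:
        ctx.prec = 80  # plenty of digits, so scaleb never rounds
        shifted = value.scaleb(scale)  # move the decimal point `scale` places right
        if shifted != shifted.to_integral_value():
            raise ValueError(f"{value} has more than {scale} fractional digits")
        unscaled = int(shifted)
    if abs(unscaled) >= 10**precision:
        raise ValueError(f"{value} needs more than {precision} digits")
    # 10**38 - 1 < 2**127 - 1, so any value passing the checks fits in an i128.
    assert abs(unscaled) <= I128_MAX
    return unscaled


def from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Decode a stored integer back into a Decimal with the given scale."""
    with localcontext() as ctx:
        ctx.prec = 80
        return Decimal(unscaled).scaleb(-scale)


print(to_unscaled(Decimal("123.45"), precision=5, scale=2))  # 12345
print(from_unscaled(12345, scale=2))  # 123.45
```

Keeping the scale in the type rather than per value is what lets decimal addition and comparison reduce to plain i128 arithmetic once two columns share a scale.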

@plaflamme
Contributor

Alright, here's a draft PR; feedback would be greatly appreciated.

stinodego added this to the Python Polars 1.0.0 milestone on Mar 3, 2023
@arachnegl

I believe Polars supports Decimals since https://github.com/pola-rs/polars/releases/tag/py-0.16.10

I have added #7566 relating to the current implementation.

@martin-wiebusch-thg

In polars[all] v0.16.13 I still get the following when trying to load a parquet file with decimals in it:

Arrow datatype Decimal(38, 9) not supported by Polars. You probably need to activate that data-type feature.

How do I activate the feature?

@naarkhoo

naarkhoo commented Mar 20, 2023

I get the same error with 0.16.13 when reading with scan_parquet:
thread '<unnamed>' panicked at 'Arrow datatype Decimal(2, 1) not supported by Polars. You probably need to activate that data-type feature.', /Users/runner/work/polars/polars/polars/polars-core/src/datatypes/field.rs:153:19

@alexander-beedie
Collaborator

alexander-beedie commented Mar 27, 2023

@martin-wiebusch-thg, @naarkhoo:

pl.Config.activate_decimals()

Still in early development/testing, hence the need to explicitly activate (for now).

@alexander-beedie
Collaborator

alexander-beedie commented Mar 27, 2023

Now that the feature exists (in an early/development state), I'm closing this issue in favour of more specific issues/requirements found with the implementation...👍
