Inconsistent Decimal to float type casting in pl.from_dicts() & pl.DataFrame() #8641

Closed
2 tasks done
scur-iolus opened this issue May 2, 2023 · 6 comments
Labels
bug Something isn't working python Related to Python Polars

Comments

@scur-iolus
scur-iolus commented May 2, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

The two examples below should be self-explanatory and easier to read than a description of the issue.

The problem is the same whether the conversion (Decimal to float) is explicit or implicit, cf. examples below.

I am aware that the support for Decimal is an "experimental work-in-progress feature" (see #4104).

Reproducible example

import polars as pl
from decimal import Decimal

# example 1
pl.DataFrame([{"a": Decimal("1.0000000001")}])

# example 2
pl.Config.activate_decimals()

pl.from_dicts(
    data=[
        {"a": Decimal("1.0000000001")},
    ],
    schema={"a": pl.Float64},
)

Expected behavior

In both examples, I was expecting to get something like:

a: f64
1.0

But I got:

a: f64
1.0000e10

The value is off by a factor of 10^10: instead of roughly 1.0, I got roughly 1.0e10.

Installed versions

Polars 0.17.11

---Version info---
Polars: 0.17.11
Index type: UInt32
Platform: Linux-5.15.0-71-generic-x86_64-with-glibc2.31
Python: 3.11.3 (main, Apr  5 2023, 14:15:06) [GCC 9.4.0]
---Optional dependencies---
numpy: 1.24.3
pandas: 2.0.1
pyarrow: <not installed>
connectorx: <not installed>
deltalake: <not installed>
fsspec: <not installed>
@scur-iolus scur-iolus added bug Something isn't working python Related to Python Polars labels May 2, 2023
@scur-iolus scur-iolus changed the title Inconsistent Decimal to float conversion in pl.from_dicts Inconsistent Decimal to float cast in pl.from_dicts() & pl.DataFrame() May 2, 2023
@scur-iolus scur-iolus changed the title Inconsistent Decimal to float cast in pl.from_dicts() & pl.DataFrame() Inconsistent Decimal to float type casting in pl.from_dicts() & pl.DataFrame() May 2, 2023
@ronaldrichter

I observed a similar issue when constructing a DataFrame with Decimal values in row orientation and without activate_decimals():

import polars as pl
from decimal import Decimal

pl.DataFrame([[Decimal("0.5")]], orient="row")

shape: (1, 1)
┌──────────┐
│ column_0 │
│ ---      │
│ f64      │
╞══════════╡
│ 5.0      │
└──────────┘

The decimal scale seems to get lost during the conversion to float. I'm not sure how complicated it would be to fix this issue, but maybe it would be possible to raise an exception in this case for now.
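
For illustration, here is a stdlib-only sketch of what "losing the scale" would look like. It assumes (this is a guess about the cause, not something confirmed in this thread) that the affected code path converted the Decimal's unscaled integer straight to a float without dividing by 10**scale. The helper name `unscaled_and_scale` is hypothetical and is not a Polars API.

```python
from decimal import Decimal


def unscaled_and_scale(value: Decimal) -> tuple[int, int]:
    """Split a finite Decimal into (unscaled integer, scale).

    Hypothetical helper for illustration only; not part of Polars.
    """
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    # Decimal stores a base-10 exponent; the scale is its negation.
    return unscaled, -exponent


for text in ("0.5", "12.45", "1.0000000001"):
    unscaled, scale = unscaled_and_scale(Decimal(text))
    scale_dropped = float(unscaled)       # assumed buggy path: scale ignored
    scale_applied = unscaled / 10**scale  # scale applied before use
    print(f"{text}: dropped={scale_dropped!r} applied={scale_applied!r}")
```

Under that assumption, dropping the scale turns 0.5 into 5.0, 12.45 into 1245.0 and 1.0000000001 into 10000000001.0, which matches the values reported in this issue.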

@lmmentel

lmmentel commented Sep 17, 2023

I'm also experiencing very inconsistent behavior with Decimal.

Just changing the "orient" flag gives different processing of Decimal; consider this example:

import polars as pl
from decimal import Decimal

pl.DataFrame([[Decimal("12.45")]], orient="row")

which gives:

shape: (1, 1)
┌──────────┐
│ column_0 │
│ ---      │
│ f64      │
╞══════════╡
│ 1245.0   │
└──────────┘

but

pl.DataFrame([[Decimal("12.45")]], orient="col")

gives

shape: (1, 1)
┌────────────┐
│ column_0   │
│ ---        │
│ decimal[2] │
╞════════════╡
│ 12.45      │
└────────────┘

So it seems conversion to float goes wrong. Would be great to have it fixed since it leads to pretty severely wrong results.

I'm on linux with polars 0.18.15

@alexander-beedie
Collaborator

alexander-beedie commented Sep 17, 2023

@scur-iolus: I notice you checked the box that says "I have confirmed this bug exists on the latest version of Polars.", but your installed version is 0.17.11, which is about 4 months behind the current release - this bug was fixed some time ago; please install the latest version and you'll find that your example now works as expected ;)

@ronaldrichter: Looks like you're also on an older version; please install the latest release and you'll see the correct result.

@lmmentel: Your scaling issue was essentially the same as the others and was also fixed, but the dtype difference observed when initialising with row vs col orientation looks like a real issue - can you open a separate issue for it? I'm going to close this one as the scaling bug was fixed some time ago; tag me in the new issue and I can take a look tomorrow - thanks! (Note that you can also just set pl.Config.activate_decimals(True) to solve it for now; we're going to make the more comprehensive Decimal support active by default shortly, so this discrepancy will disappear soon).
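
For anyone stuck on an affected version who wants f64 columns anyway, one library-independent option is to convert the Decimal values to float in plain Python before handing the rows to Polars. This is only a minimal sketch: the helper name `decimals_to_floats` is hypothetical, and converting to float gives up Decimal's exact representation.

```python
from decimal import Decimal
from typing import Any


def decimals_to_floats(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the rows with every Decimal value converted to float.

    Hypothetical helper for illustration only; not part of Polars.
    """
    return [
        {key: float(val) if isinstance(val, Decimal) else val for key, val in row.items()}
        for row in rows
    ]


rows = [{"a": Decimal("1.0000000001"), "b": "x"}]
clean = decimals_to_floats(rows)
print(clean)
```

The cleaned rows contain only plain Python floats and untouched non-Decimal values, so they can then be passed to pl.from_dicts() or pl.DataFrame() as in the original examples.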

@scur-iolus
Author

@alexander-beedie you probably did not notice that I opened this issue 5 months ago, at that time 0.17.11 was indeed the latest version 😉 But I wasn't aware it had been fixed, thank you!

@alexander-beedie
Collaborator

alexander-beedie commented Sep 17, 2023

> @alexander-beedie you probably did not notice that I opened this issue 5 months ago, at that time 0.17.11 was indeed the latest version 😉 But I wasn't aware it had been fixed, thank you!

Oh! Quite right, my bad... odd that it showed up on the first page of the Issues list - that threw me off; I must have accidentally sorted or filtered it by something 😅

@lmmentel

> @lmmentel: Your scaling issue was essentially the same as the others and was also fixed, but the dtype difference observed when initialising with row vs col orientation looks like a real issue - can you open a separate issue for it? I'm going to close this one as the scaling bug was fixed some time ago; tag me in the new issue and I can take a look tomorrow - thanks! (Note that you can also just set pl.Config.activate_decimals(True) to solve it for now; we're going to make the more comprehensive Decimal support active by default shortly, so this discrepancy will disappear soon).

Sure thing. I can confirm that with 0.19.3 the decimals are parsed correctly, but they are still converted to either decimal or float depending on the orient parameter. I've submitted a separate issue, #11194.

Thanks for your help on this 👍
