Handling of decimals in scientific notation #2221

Closed
nixent opened this issue Feb 26, 2024 · 4 comments
Labels: bug (Something isn't working), on-hold (Issues and Pull Requests that are on hold for some reason)

Comments

nixent commented Feb 26, 2024

Environment

Delta-rs version: 0.17.0
Binding: Python


Bug

When reading a Parquet file that has columns of type decimal(32,16), some of which contain zero values, calling write_deltalake fails with:
Exception: Parser error: can't parse the string value 0E-16 to decimal

According to 22171, when the scale of a decimal type is greater than 6, a zero value is rendered in scientific notation.
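
For illustration, the same formatting rule can be observed with Python's standard decimal module alone. This is a minimal sketch, and `zero_with_scale` is a hypothetical helper name. Whether this string conversion is the actual source of the 0E-16 value inside write_deltalake is an assumption, not something confirmed in this issue.

```python
from decimal import Decimal


def zero_with_scale(scale: int) -> Decimal:
    """Build a decimal zero that carries `scale` fractional digits (hypothetical helper)."""
    # Tuple form is (sign, digits, exponent); the exponent is the negated scale.
    return Decimal((0, (0,), -scale))


for scale in (2, 6, 7, 16):
    zero = zero_with_scale(scale)
    # str() switches to exponent notation once the scale exceeds 6,
    # while format(..., "f") always stays in plain fixed-point notation.
    print(scale, str(zero), format(zero, "f"))
```

With a scale of 16, str() yields 0E-16, which matches the string in the error message; the "f" format keeps the value in plain notation.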

What happened:
write_deltalake writes the Delta table to disk, and the decimal columns with zero values are written as 0; however, it is unclear whether the metadata is written correctly.
What you expected to happen:
write_deltalake writes the Delta table without error.

nixent added the bug label Feb 26, 2024

nixent commented Feb 26, 2024

Same issue as #2193.

rtyler added the binding/python label Mar 7, 2024

neo4py commented Mar 13, 2024

It appears that zero values are written successfully if the batch also contains records with non-zero values. If the batch contains only one record, however, the zero value is not written. In both cases the error is the same:
Parser error: can't parse the string value 0.0 to decimal.

ion-elgreco self-assigned this Mar 24, 2024

ion-elgreco (Collaborator) commented

This is caused by the upstream JSON parser in arrow-rs, which does not support parsing decimals written in scientific notation.
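
To make concrete what accepting scientific notation involves, here is a rough Python sketch of turning a decimal string in either notation into the unscaled integer that a fixed-scale decimal column stores. It is not arrow-rs code and does not describe how the upstream parser is implemented; `to_unscaled` is a hypothetical name, and only finite values are handled.

```python
from decimal import Decimal


def to_unscaled(text: str, scale: int) -> int:
    """Convert a finite decimal string (plain or exponent form) to an unscaled integer."""
    # Decimal(text) parses both "0.0" and "0E-16" exactly, without rounding.
    sign, digits, exponent = Decimal(text).as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    # Shift the digits so that the value is expressed at the target scale.
    shift = exponent + scale
    if shift >= 0:
        unscaled *= 10 ** shift
    else:
        unscaled, remainder = divmod(unscaled, 10 ** -shift)
        if remainder:
            raise ValueError(f"{text} has more fractional digits than scale {scale}")
    return -unscaled if sign else unscaled


print(to_unscaled("0E-16", 16))
print(to_unscaled("0.0", 16))
print(to_unscaled("1.5E+1", 2))
```

Integer arithmetic is used instead of Decimal.scaleb so that values with up to 32 digits are not rounded to the default context precision.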

ion-elgreco added the on-hold label and removed the binding/python label Mar 24, 2024

leo-schick commented

The scientific-notation decimal issue occurs not only when writing data, but also when I try to read the schema from a Delta table. I have the following code (see also here):

    deltaTable = DeltaTable(file_uri, storage_options=deltalake_storage_options(storage))
    pyarrow_schema = deltaTable.schema().to_pyarrow()
    return pyarrow_schema_to_sqlalchemy_table(pyarrow_schema, name=table_name, schema=schema_name, metadata=metadata)

This results in the error:
Parser error: can't parse the string value 0E-16 to decimal

This error did not occur in version 0.14.0, but since 0.15.0 my code fails with it whenever the table has a decimal column.
