DataType::Decimal Non-Compliant? #1779
Comments
To add to the confusion
Decimal support appears to have been added back in apache/arrow#8640. Perhaps @nevi-me, @alamb or @jorgecarleitao have some recollection of what is going on here?
I don't remember any details -- the discrepancy is likely my ignorance.
Sorry, I am probably misreading this - I am not following what is the non-compliance. The Python documentation exemplifies:
(so, 7 digits, 3 of them are decimal) we offer the example
The spec is https://github.com/apache/arrow/blob/master/format/Schema.fbs#L185 (copied for readability below)
Aren't they all aligned?
The Python docs continue
which also seems aligned?
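To make the "7 digits, 3 of them are decimal" reading concrete, here is a minimal sketch in plain Rust. It assumes precision means the total number of digits in the stored (unscaled) integer; `digit_count` and `fits_precision` are hypothetical helper names for illustration, not part of any Arrow crate.

```rust
/// Number of decimal digits in the unscaled integer, ignoring the sign.
/// Hypothetical helper, not an Arrow API.
fn digit_count(unscaled: i128) -> u32 {
    unscaled.unsigned_abs().to_string().len() as u32
}

/// True if `unscaled` fits a decimal type with the given precision,
/// assuming precision counts all digits of the unscaled integer.
/// The scale does not enter this check: it only says where the
/// decimal point sits within those digits.
fn fits_precision(unscaled: i128, precision: u32) -> bool {
    digit_count(unscaled) <= precision
}
```

Under this assumption, a type with precision 7 and scale 3 stores 1234.567 as the integer 1234567, so `fits_precision(1234567, 7)` holds while `fits_precision(12345678, 7)` does not.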
Thank you for the link, that is very unfortunately worded but explains where this confusion has originated from. My interpretation of those comments would be that the represented values would have If one takes the Python and C++ implementations as correct, what they are actually saying is that the mantissa has To highlight the distinction, consider the number Ultimately I think there are at least these concrete issues that should be fixed:
I think "Fixed Width Decimal" is a fairly common concept and is used for some specialized use cases (like currency), and I think the Arrow spec basically follows the standard practice. https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Representation Like @jorgecarleitao, I don't see the problem.
I think the better question is "how would these be represented in different Decimal types" For You could also represent
I do not think that is possible. The reason being that
I agree
I am not sure about this -- I think the documentation could be clarified but I think talking about scale/exponent/mantissa for fixed width representation will be very confusing as they are floating point concepts.
Definitely worth discussing -- I suspect some people would like
👍
Apologies, this appears to have been my confusion. We should create a separate ticket to track making scale negative and clarifying the docs.
Describe the bug
DataType::Decimal is defined as
This appears to be at odds with both the C++ and Python implementations (I can't actually find the specification).
These define it as
i.e.
unscaledValue * 10^(-scale)
In particular, with the current Rust definition it is unclear how to represent numbers with more than 38 digits, either because of leading or trailing 0s.
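For illustration, the `unscaledValue * 10^(-scale)` reading can be sketched in plain Rust as below. This is only a sketch: `format_decimal` is a hypothetical helper rather than an Arrow API, and it assumes the unscaled value is an `i128` and the scale is a signed `i32`.

```rust
/// Render `unscaled * 10^(-scale)` as a plain decimal string.
/// Hypothetical helper for illustration, not part of any Arrow crate.
fn format_decimal(unscaled: i128, scale: i32) -> String {
    let sign = if unscaled < 0 { "-" } else { "" };
    let digits = unscaled.unsigned_abs().to_string();

    if scale <= 0 {
        // Zero or negative scale: the value is an integer with
        // `-scale` trailing zeros appended to the stored digits.
        if unscaled == 0 {
            return "0".to_string();
        }
        let zeros = "0".repeat(scale.unsigned_abs() as usize);
        return format!("{}{}{}", sign, digits, zeros);
    }

    let scale = scale as usize;
    // Left-pad with zeros so at least one digit precedes the decimal point.
    let padded = if digits.len() <= scale {
        format!("{}{}", "0".repeat(scale + 1 - digits.len()), digits)
    } else {
        digits
    };
    // The last `scale` digits form the fractional part.
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    format!("{}{}.{}", sign, int_part, frac_part)
}
```

With this reading, `format_decimal(1234567, 3)` gives `"1234.567"`. A one-digit unscaled value also covers both cases mentioned above: `format_decimal(1, -38)` is a 1 followed by 38 trailing zeros, and `format_decimal(1, 40)` has 39 zeros between the decimal point and the final 1.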
To Reproduce
Inspect code
Expected behavior
We should conform to the other Arrow implementations
Additional context
Noticed whilst reviewing apache/datafusion#2680
The parquet logical type is similarly defined - https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal