[C++][Python] Support pretty printing of float16 #36753

Open
datapythonista opened this issue Jul 18, 2023 · 6 comments

@datapythonista

datapythonista commented Jul 18, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Seems like the representation of float16 values is wrong:

>>> import numpy
>>> import pyarrow
>>> pyarrow.array([numpy.float16(1)], type=pyarrow.float16())
<pyarrow.lib.HalfFloatArray object at 0x7fbc2d212b00>
[
  15360
]

Instead of showing 1. as the value, the integer 15360 is shown.

Tried with pyarrow 12.0.0.

Component(s)

Python

Related:

@jorisvandenbossche changed the title from "Incorrect representation of float16" to "[C++][Python] Support pretty printing of float16" on Jul 18, 2023
@jorisvandenbossche
Member

Yes, this is simply not implemented, which makes the output very confusing. You can confirm through other means that the values actually stored are fine:

>>> arr = pyarrow.array([numpy.float16(1)], type=pyarrow.float16())
>>> arr[0]
<pyarrow.HalfFloatScalar: 1.0>
>>> arr.to_numpy()
array([1.], dtype=float16)

In general, float16 has only limited support in pyarrow. For example, casting to other types is not yet implemented either (#32802 for casting to strings, and #20213).

Specifically for the repr, this uses the PrettyPrinter defined in Arrow C++, whose implementation actually has a note about float16 support:

Status WriteDataValues(const HalfFloatArray& array) {
  // XXX do not know how to format half floats yet
  StringFormatter<Int16Type> formatter{array.type().get()};
  return WritePrimitiveValues(array, &formatter);
}

So we explicitly fall back to printing it as int16, because formatting float16 is not easy to do. As a result, you get the same output as you would from .view(np.int16) in numpy.
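To illustrate where 15360 comes from, here is a minimal sketch using only Python's standard `struct` module (not pyarrow or Arrow C++ code). It packs a value as an IEEE 754 half float and reinterprets the same two bytes as a signed 16-bit integer, which should be equivalent to the int16 view described above. The helper names `half_bits_as_int16` and `int16_as_half` are made up for this example.

```python
import struct


def half_bits_as_int16(value: float) -> int:
    """Pack `value` as a little-endian half float ("e"), then read the
    same two bytes back as a signed 16-bit integer ("h")."""
    return struct.unpack("<h", struct.pack("<e", value))[0]


def int16_as_half(bits: int) -> float:
    """Inverse direction: treat a signed 16-bit integer as half-float bits."""
    return struct.unpack("<e", struct.pack("<h", bits))[0]


# 1.0 in half precision is sign=0, exponent=01111, mantissa=0 -> 0x3C00
print(half_bits_as_int16(1.0))  # 15360
print(int16_as_half(15360))     # 1.0
```

So the repr in the report is showing the raw bit pattern 0x3C00 as an integer, not a corrupted value.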

@jorisvandenbossche
Member

Some recent discussion about float16 support: #22806

@datapythonista
Author

To me personally, if it's not trivial to display the right value, it would make more sense to fall back to something like <float16 at 0x1234> than to show a wrong value (the bits interpreted as a different, arbitrary type).

@pitrou
Member

pitrou commented Aug 22, 2023

@benibus FYI

@pitrou
Member

pitrou commented Jan 26, 2024

We should revive this now that we do have a half-float library available.

@felipecrv
Contributor

felipecrv commented Jun 27, 2024

Here is the comment where I promise a fix:
#32802 (comment)

@felipecrv felipecrv self-assigned this Jun 27, 2024