[C++][Python] Support pretty printing of float16 #36753

datapythonista · 2023-07-18T14:18:35Z

Describe the bug, including details regarding any error messages, version, and platform.

Seems like the representation of float16 values is wrong:

>>> import numpy
>>> import pyarrow
>>> pyarrow.array([numpy.float16(1)], type=pyarrow.float16())
<pyarrow.lib.HalfFloatArray object at 0x7fbc2d212b00>
[
  15360
]

Instead of showing 1.0 as the value, the integer 15360 is shown.

Tried with pyarrow 12.0.0.

Component(s)

Python

[C++] vendor a half precision floating point library #22806

jorisvandenbossche · 2023-07-18T16:45:32Z

Yes, this is simply not implemented, which makes the output very confusing. You can see through other means that the actual stored values are fine:

>>> arr = pyarrow.array([numpy.float16(1)], type=pyarrow.float16())
>>> arr[0]
<pyarrow.HalfFloatScalar: 1.0>
>>> arr.to_numpy()
array([1.], dtype=float16)

In general, float16 has only limited support in pyarrow. For example, casting to other types is not yet implemented either (#32802 for casting to strings, #20213).

Specifically for the repr, this uses the PrettyPrinter defined in Arrow C++, which actually has a note about float16 support in its implementation:

arrow/cpp/src/arrow/pretty_print.cc

Lines 223 to 227 in f990406

    
           Status WriteDataValues(const HalfFloatArray& array) { 
        
             // XXX do not know how to format half floats yet 
        
             StringFormatter<Int16Type> formatter{array.type().get()}; 
        
             return WritePrimitiveValues(array, &formatter); 
        
           }

So we explicitly fall back to printing it as int16 because formatting float16 is not easy to do (so you get the same output as you would get from .view(np.int16) in numpy).
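To make the int16 fallback concrete, here is a minimal sketch using only the Python standard library. It reinterprets the two bytes of an IEEE 754 half-precision value as a signed 16-bit integer, which is roughly what the printer ends up displaying. The helper names `half_bits_as_int16` and `int16_as_half` are made up for this illustration and are not part of pyarrow or numpy.

```python
import struct


def half_bits_as_int16(value: float) -> int:
    """Pack `value` as a float16 and reinterpret the same 2 bytes as int16.

    Hypothetical helper for illustration only.
    """
    # "<e" is the struct format for a little-endian IEEE 754 half float,
    # "<h" is a little-endian signed 16-bit integer.
    return struct.unpack("<h", struct.pack("<e", value))[0]


def int16_as_half(bits: int) -> float:
    """Inverse direction: reinterpret an int16 bit pattern as a float16."""
    return struct.unpack("<e", struct.pack("<h", bits))[0]


# 1.0 as float16 has the bit pattern 0x3C00, which is 15360 as an integer.
print(half_bits_as_int16(1.0))   # 15360
print(int16_as_half(15360))      # 1.0
```

So the 15360 in the repr is the raw bit pattern of 1.0, and the stored data itself is not corrupted.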

jorisvandenbossche · 2023-07-18T16:47:54Z

Some recent discussion about float16 support: #22806

datapythonista · 2023-07-18T16:52:42Z

To me personally, if it's not trivial to show the right value, it'd make more sense to fall back to something like <float16 at 0x1234> than to a wrong value (interpreting the bits as a different, arbitrary type).

pitrou · 2023-08-22T16:28:49Z

@benibus FYI

pitrou · 2024-01-26T13:33:58Z

We should revive this now that we do have a half-float library available.

felipecrv · 2024-06-27T14:40:16Z

Comment where I'm promising a fix:
#32802 (comment)

datapythonista added the Type: bug label Jul 18, 2023

github-actions bot added Component: Python and removed Type: bug labels Jul 18, 2023

jorisvandenbossche changed the title ~~Incorrect representation of float16~~ [C++][Python] Support pretty printing of float16 Jul 18, 2023

jorisvandenbossche added the Component: C++ label Jul 18, 2023

mariosasko mentioned this issue Feb 19, 2024

[Python][C++] Data corruption when initializing from float16 NumPy array or pandas Series #40106

Closed

felipecrv self-assigned this Jun 27, 2024
