Panic when reading feather file with 16-bit floating point column #3533

Closed
JakobGM opened this issue May 30, 2022 · 6 comments · Fixed by #3940
Labels
bug Something isn't working

Comments

@JakobGM
Contributor

JakobGM commented May 30, 2022

What language are you using?

Python.

What version of polars are you using?

The latest version, as of this writing, version 0.13.39.

What operating system are you using polars on?

macOS ARM and Ubuntu ARM.

What language version are you using?

Python 3.9.

Describe your bug.

Polars panics when reading a Feather file containing a 16-bit floating point column with polars.read_ipc().

What are the steps to reproduce the behavior?

import pandas as pd
import polars as pl

# Create a feather file with a 16-bit floating point column
pandas_df = pd.DataFrame({"column": [1.0]}, dtype="float16")
pandas_df.to_feather("test.ftr")

# Reading this 16-bit column makes polars panic
polars_df = pl.read_ipc("test.ftr")

What is the actual behavior?

Polars panics with the following exception:

  File ".../polars/internals/construction.py", line 569, in arrow_to_pydf
    pydf = PyDataFrame.from_arrow_record_batches(tbl.to_batches())
pyo3_runtime.PanicException: internal error: entered unreachable code

What is the expected behavior?

It would be nice if polars automatically upcast float16 columns to float32, or at least let the user request the upcast explicitly, so that the column can be represented in polars.

@JakobGM JakobGM added the bug Something isn't working label May 30, 2022
@ghuls
Collaborator

ghuls commented May 30, 2022

Rust does not support float16 natively, which is why arrow2 does not support float16 either.

In theory it could be implemented with the half crate: https://docs.rs/half/latest/half/
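To illustrate what a software float16 implementation has to do when the language has no native type, here is a minimal Python sketch that decodes an IEEE 754 binary16 bit pattern by hand. It is not the code of the `half` crate or of arrow2; the function name `half_bits_to_float` is made up for this example.

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 binary16 pattern into a Python float.

    Hypothetical helper for illustration only.
    """
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exponent = (bits >> 10) & 0x1F  # 5 exponent bits, bias 15
    mantissa = bits & 0x3FF         # 10 fraction bits

    if exponent == 0:
        # Zero or subnormal: there is no implicit leading 1.
        return sign * mantissa * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent: infinity if the fraction is 0, otherwise NaN.
        return sign * float("inf") if mantissa == 0 else float("nan")
    # Normal number: implicit leading 1 in front of the fraction.
    return sign * (1 + mantissa / 1024) * 2.0 ** (exponent - 15)


# Cross-check against Python's built-in half-precision codec (struct format "e").
reference = struct.unpack("<e", struct.pack("<H", 0x3C00))[0]
print(half_bits_to_float(0x3C00), reference)
```

The decoded value is held in a wider float, which is essentially the upcast discussed in this thread.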

@ritchie46
Member

Note to self. I should stop stripping binaries. Then those panics will be a lot more useful. Now we cannot tell if the panic is in arrow2 or in polars. I agree that we should upcast to f32 here.

@ghuls
Collaborator

ghuls commented May 30, 2022

Not sure if this report is still 100% accurate, but it seems that pyarrow also can't do much with float16.

https://issues.apache.org/jira/browse/ARROW-13762

import pyarrow as pa
import numpy as np


In [44]: pa.array(np.array([1, 2.0], dtype='float32')).cast(pa.float64())
Out[44]: 
<pyarrow.lib.DoubleArray object at 0x7f8372f1ca00>
[
  1,
  2
]

In [45]: pa.array(np.array([1, 2.0], dtype='float32')).cast(pa.float16())
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-45-abe3f5f71404> in <module>
----> 1 pa.array(np.array([1, 2.0], dtype='float32')).cast(pa.float16())

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.cast()

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/compute.py in cast(arr, target_type, safe)
    373     else:
    374         options = CastOptions.unsafe(target_type)
--> 375     return call_function("cast", [arr], options)
    376 
    377 

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/_compute.pyx in pyarrow._compute.call_function()

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/_compute.pyx in pyarrow._compute.Function.call()

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: Unsupported cast from float to halffloat using function cast_half_float

In [46]: pa.array(np.array([1, 2.0], dtype='float16')).cast(pa.float16())
Out[46]: 
<pyarrow.lib.HalfFloatArray object at 0x7f8372f1f7c0>
[
  15360,
  16384
]

In [47]: pa.array(np.array([1, 2.0], dtype='float16')).cast(pa.float32())
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-47-ac152adc06f1> in <module>
----> 1 pa.array(np.array([1, 2.0], dtype='float16')).cast(pa.float32())

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.cast()

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/compute.py in cast(arr, target_type, safe)
    373     else:
    374         options = CastOptions.unsafe(target_type)
--> 375     return call_function("cast", [arr], options)
    376 
    377 

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/_compute.pyx in pyarrow._compute.call_function()

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/_compute.pyx in pyarrow._compute.Function.call()

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/software/anaconda3/envs/polars_test/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: Unsupported cast from halffloat to float using function cast_float
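A side note on `Out[46]` above: the printed values 15360 and 16384 match the raw binary16 bit patterns of 1.0 and 2.0, which suggests that this pyarrow version displays the stored bits rather than the decoded numbers. The correspondence can be checked with the standard library alone; `float16_bits` is a name made up for this sketch.

```python
import struct


def float16_bits(value: float) -> int:
    """Return the raw 16-bit pattern of `value` encoded as IEEE 754 binary16.

    Hypothetical helper for illustration only.
    """
    # "e" is the struct format for half precision, "H" for an unsigned 16-bit int.
    return struct.unpack("<H", struct.pack("<e", value))[0]


for value in (1.0, 2.0):
    bits = float16_bits(value)
    print(f"{value} -> {bits} (0x{bits:04X})")
# prints:
# 1.0 -> 15360 (0x3C00)
# 2.0 -> 16384 (0x4000)
```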

@ghuls
Collaborator

ghuls commented May 30, 2022

> Note to self. I should stop stripping binaries. Then those panics will be a lot more useful. Now we cannot tell if the panic is in arrow2 or in polars. I agree that we should upcast to f32 here.

For me it shows this with the package from pip:

In [3]: polars_df = pl.read_ipc("test.ftr", use_pyarrow=True)
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /github/home/.cargo/git/checkouts/arrow2-945af624853845da/f7c3daf/src/datatypes/mod.rs:240:24
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
<ipython-input-3-469b61d6a200> in <module>
----> 1 polars_df = pl.read_ipc("test.ftr", use_pyarrow=True)

~/software/anaconda3/envs/ctxcore/lib/python3.8/site-packages/polars/io.py in read_ipc(file, columns, n_rows, use_pyarrow, memory_map, storage_options, row_count_name, row_count_offset, rechunk, **kwargs)
    833 
    834             tbl = pa.feather.read_table(data, memory_map=memory_map, columns=columns)
--> 835             return DataFrame._from_arrow(tbl, rechunk=rechunk)
    836 
    837         return DataFrame._read_ipc(

~/software/anaconda3/envs/ctxcore/lib/python3.8/site-packages/polars/internals/frame.py in _from_arrow(cls, data, columns, rechunk)
    441         DataFrame
    442         """
--> 443         return cls._from_pydf(arrow_to_pydf(data, columns=columns, rechunk=rechunk))
    444 
    445     @classmethod

~/software/anaconda3/envs/ctxcore/lib/python3.8/site-packages/polars/internals/construction.py in arrow_to_pydf(data, columns, rechunk)
    567             pydf = pli.DataFrame._from_pandas(tbl.to_pandas())._df
    568         else:
--> 569             pydf = PyDataFrame.from_arrow_record_batches(tbl.to_batches())
    570     else:
    571         pydf = pli.DataFrame([])._df

PanicException: internal error: entered unreachable code

@benjaminrwilson

Hi @ritchie46, how difficult would FP16 integration in polars be with this latest PR: jorgecarleitao/arrow2#1051?

@ritchie46
Member

> Hi @ritchie46, how difficult would FP16 integration in polars be with this latest PR: jorgecarleitao/arrow2#1051?

I don't want the extra code bloat and complexity of yet another type (that can be represented by f32/f64).
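The claim that float16 data can be represented by f32 can be checked exhaustively, since there are only 65,536 binary16 bit patterns. The sketch below uses only Python's `struct` module and says nothing about how polars or arrow2 perform the conversion; the function names are made up for this example.

```python
import math
import struct


def widen_half_to_single(bits: int) -> tuple[float, float]:
    """Decode a binary16 pattern, then round-trip the value through binary32.

    Hypothetical helper for illustration only.
    """
    half_value = struct.unpack("<e", struct.pack("<H", bits))[0]
    single_value = struct.unpack("<f", struct.pack("<f", half_value))[0]
    return half_value, single_value


def count_lossy_patterns() -> int:
    """Count binary16 patterns whose value changes when stored as binary32."""
    lossy = 0
    for bits in range(0x10000):
        half_value, single_value = widen_half_to_single(bits)
        if math.isnan(half_value):
            # NaN never compares equal to itself; only require that it stays NaN.
            if not math.isnan(single_value):
                lossy += 1
        elif half_value != single_value:
            lossy += 1
    return lossy


print(count_lossy_patterns())  # prints 0
```

Upcasting on read therefore loses no information; the cost is that the column takes twice the memory and no longer round-trips as float16 when written back.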
