Error loading parquet file created by pandas. polars==0.13.1. Working in polars==0.10.5 #2654

Closed
keiv-fly opened this issue Feb 15, 2022 · 6 comments

@keiv-fly
Contributor

Are you using Python or Rust?

Python.

What version of polars are you using?

Error in polars==0.13.1. No error in polars==0.10.5

What operating system are you using polars on?

Windows 10

Describe your bug.

Reading a Parquet file created by pandas fails.

What are the steps to reproduce the behavior?

>>> import pandas as pd
>>> df_temp = pd.DataFrame({"a":['V', 'V', 'V', 'V', 'V', 'V', 'V', 'V', 'V', 'V', 'V', 'V', 'V','V', None, None, None, None, None, None]})
>>> df_temp.to_parquet("a.parquet", index=False)
>>> import polars as pl
>>> pl.read_parquet("a.parquet")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\serge\anaconda3\envs\py39\lib\site-packages\polars\io.py", line 871, in read_parquet
    return DataFrame._read_parquet(
  File "C:\Users\serge\anaconda3\envs\py39\lib\site-packages\polars\internals\frame.py", line 537, in _read_parquet
    self._df = PyDataFrame.read_parquet(
RuntimeError: Any(ArrowError(OutOfSpec("validity mask length must match the number of values")))

A workaround that works: pass "use_pyarrow=True" to pl.read_parquet.
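
The workaround could be wrapped in a small helper, sketched below. This is only an illustration and not part of polars: the name `read_parquet_with_fallback` and its `read_parquet` parameter are hypothetical, and catching every exception is an assumption (in polars==0.13.1 the failure surfaced as a RuntimeError). The only polars API it relies on is `pl.read_parquet` and its `use_pyarrow` argument.

```python
from typing import Any, Callable, Optional


def read_parquet_with_fallback(
    path: str,
    read_parquet: Optional[Callable[..., Any]] = None,
) -> Any:
    """Try the native Parquet reader first; on failure, retry with use_pyarrow=True."""
    if read_parquet is None:
        # Imported lazily so the helper can also be exercised with a stand-in reader.
        import polars as pl

        read_parquet = pl.read_parquet
    try:
        return read_parquet(path)
    except Exception:
        # Assumption: any failure of the native reader is worth retrying via
        # pyarrow. In polars==0.13.1 the error was raised as a RuntimeError.
        return read_parquet(path, use_pyarrow=True)
```

With polars and pyarrow installed, `read_parquet_with_fallback("a.parquet")` should return the DataFrame through the pyarrow path on an affected version, and through the native reader once the fix is released.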

@ritchie46
Member

I believe this one is fixed upstream already. I will release tomorrow.

@keiv-fly
Contributor Author

Thanks for the info. I tried to install from GitHub master, but pip reported that it is the same version and did not install anything:

(py39) C:\Users\serge>pip install -U git+https://github.com/pola-rs/polars.git#subdirectory=py-polars
Collecting git+https://github.com/pola-rs/polars.git#subdirectory=py-polars
  Cloning https://github.com/pola-rs/polars.git to c:\users\serge\appdata\local\temp\pip-req-build-gyjw43za
  Running command git clone -q https://github.com/pola-rs/polars.git 'C:\Users\serge\AppData\Local\Temp\pip-req-build-gyjw43za'
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Requirement already satisfied: typing_extensions>=4.0.0 in c:\users\serge\anaconda3\envs\py39\lib\site-packages (from polars==0.13.1) (4.1.1)
Requirement already satisfied: numpy>=1.16.0 in c:\users\serge\anaconda3\envs\py39\lib\site-packages (from polars==0.13.1) (1.21.0)
WARNING: You are using pip version 21.1.2; however, version 22.0.3 is available.
You should consider upgrading via the 'c:\users\serge\anaconda3\envs\py39\python.exe -m pip install --upgrade pip' command.

(py39) C:\Users\serge>python
Python 3.9.4 (default, Apr  9 2021, 11:43:21) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars as pl
>>> pl.__version__
'0.13.1'
>>>

@jorgecarleitao
Collaborator

Sorry about this, I did a large refactor in arrow2 for parquet and the coverage was subpar. Fixed in jorgecarleitao/arrow2#844

@keiv-fly
Contributor Author

No problem. I believe Rust is the right language for a Python data transformation library. But the amount of functionality implemented in pandas and other libraries is huge. So I believe finding bugs is also a way to contribute.

@ritchie46
Member

> So I believe finding bugs is also a way to contribute.

Indeed. Thanks for your contribution. 😉

I will release tonight. 👍

@ritchie46
Member

Fixed and release on the way.

3 participants