read_ipc_stream - Read Multiple DataFrames from File, use_pyarrow=True Works, but Default False Raises Error #20816

Open
2 tasks done
ruoyu0088 opened this issue Jan 21, 2025 · 1 comment
Labels
accepted Ready for implementation bug Something isn't working P-low Priority: low python Related to Python Polars

Comments

@ruoyu0088

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# Create some DataFrames
dfs = []
for i in range(3):
    df = pl.DataFrame({
        "a": [i * 10 + j for j in range(3)],
        "b": [f"chunk_{i}_{j}" for j in range(3)]
    })
    dfs.append(df)

# Write DataFrames to a file using write_ipc_stream
with open('stream_test.arrow', 'wb') as f:
    for df in dfs:
        df.write_ipc_stream(f)

# Read DataFrames from the file with use_pyarrow=True
with open('stream_test.arrow', 'rb') as f:
    df1 = pl.read_ipc_stream(f, use_pyarrow=True)
    df2 = pl.read_ipc_stream(f, use_pyarrow=True)
    df3 = pl.read_ipc_stream(f, use_pyarrow=True)

# Read DataFrames from the file with use_pyarrow=False (raises OSError)
with open('stream_test.arrow', 'rb') as f:
    df1 = pl.read_ipc_stream(f)
    df2 = pl.read_ipc_stream(f)
    df3 = pl.read_ipc_stream(f)

Log output

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[31], line 20
     18 with open('stream_test.arrow', 'rb') as f:
     19     df1 = pl.read_ipc_stream(f)
---> 20     df2 = pl.read_ipc_stream(f)
     21     df3 = pl.read_ipc_stream(f)

File C:\micromamba\envs\cad\Lib\site-packages\polars\_utils\deprecation.py:92, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     87 @wraps(function)
     88 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     89     _rename_keyword_argument(
     90         old_name, new_name, kwargs, function.__qualname__, version
     91     )
---> 92     return function(*args, **kwargs)

File C:\micromamba\envs\cad\Lib\site-packages\polars\_utils\deprecation.py:92, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     87 @wraps(function)
     88 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     89     _rename_keyword_argument(
     90         old_name, new_name, kwargs, function.__qualname__, version
     91     )
---> 92     return function(*args, **kwargs)

File C:\micromamba\envs\cad\Lib\site-packages\polars\io\ipc\functions.py:287, in read_ipc_stream(source, columns, n_rows, use_pyarrow, storage_options, row_index_name, row_index_offset, rechunk)
    284             df = df.slice(0, n_rows)
    285         return df
--> 287 return _read_ipc_stream_impl(
    288     data,
    289     columns=columns,
    290     n_rows=n_rows,
    291     row_index_name=row_index_name,
    292     row_index_offset=row_index_offset,
    293     rechunk=rechunk,
    294 )

File C:\micromamba\envs\cad\Lib\site-packages\polars\io\ipc\functions.py:312, in _read_ipc_stream_impl(source, columns, n_rows, row_index_name, row_index_offset, rechunk)
    309     columns = [columns]
    311 projection, columns = parse_columns_arg(columns)
--> 312 pydf = PyDataFrame.read_ipc_stream(
    313     source,
    314     columns,
    315     projection,
    316     n_rows,
    317     parse_row_index_args(row_index_name, row_index_offset),
    318     rechunk,
    319 )
    320 return wrap_df(pydf)

OSError: failed to fill whole buffer

Issue description

When the code above reads the file with the default use_pyarrow=False, the first read_ipc_stream call succeeds and the second one raises:

OSError: failed to fill whole buffer
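
For context, "failed to fill whole buffer" is the message Rust's standard library attaches to an UnexpectedEof error from Read::read_exact: the reader asked for a fixed number of bytes and the source ended first. The sketch below is a plain-Python analogy, not Polars code. The read_exact helper is hypothetical, and the idea that the first call leaves the file position at the end of the file is an assumption this report does not confirm.

```python
import io


def read_exact(stream, n):
    """Read exactly n bytes or raise, mirroring Rust's Read::read_exact."""
    data = stream.read(n)
    if len(data) != n:
        raise OSError("failed to fill whole buffer")
    return data


# Two 4-byte records stored back to back, standing in for two IPC streams.
stream = io.BytesIO(b"AAAABBBB")

# Assumption for illustration only: the first reader consumes the whole
# source instead of stopping at the end of its own record.
first = stream.read()

# The next fixed-size read finds nothing left and fails.
try:
    read_exact(stream, 4)
    outcome = "ok"
except OSError as exc:
    outcome = str(exc)
```

If that assumption holds, checking f.tell() after the first pl.read_ipc_stream(f) call in the example above would show whether the reader stopped at the end of the first stream or ran to the end of the file.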

Expected behavior

Both use_pyarrow=False and use_pyarrow=True should return the same results.

Installed versions

--------Version info---------
Polars:              1.20.0
Index type:          UInt32
Platform:            Windows-10-10.0.26100-SP0
Python:              3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:40:50) [MSC v.1937 64 bit (AMD64)]
LTS CPU:             False

----Optional dependencies----
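Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               5.5.0
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          3.1.0
connectorx           <not installed>
deltalake            0.24.0
fastexcel            0.12.0
fsspec               2024.10.0
gevent               24.11.1
google.auth          <not installed>
great_tables         <not installed>
matplotlib           3.9.3
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             <not installed>
pandas               2.2.3
pyarrow              18.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           2.0.36
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           3.2.0
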
@ruoyu0088 ruoyu0088 added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jan 21, 2025
@nameexhaustion nameexhaustion added accepted Ready for implementation P-medium Priority: medium and removed needs triage Awaiting prioritization by a maintainer labels Jan 21, 2025
@github-project-automation github-project-automation bot moved this to Ready in Backlog Jan 21, 2025
@nameexhaustion
Collaborator

Note: this does not reproduce on macOS, but it does on Windows.

@nameexhaustion nameexhaustion added P-low Priority: low and removed P-medium Priority: medium labels Jan 21, 2025