read_ipc and scan_ipc use more memory than needed.
#17369
Labels
bug, needs triage, python
Checks
Reproducible example
Prepare a big file:
Then read the file and measure memory usage
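The snippet for this step is not preserved here. A minimal sketch of one way to take the measurement, assuming a Unix-like system and peak resident set size as the metric. The names `peak_rss_bytes`, `overhead_ratio` and `report` are hypothetical helpers, not taken from the original report:

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Peak resident set size of this process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kibibytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def overhead_ratio(peak_bytes: int, baseline_bytes: int, data_bytes: int) -> float:
    """Peak memory attributable to the load, relative to the data size."""
    return (peak_bytes - baseline_bytes) / data_bytes


def report(path: str) -> None:
    """Load an IPC file and print peak memory relative to the dataframe size."""
    # Imported here so the helpers above work without polars installed.
    import polars as pl

    baseline = peak_rss_bytes()  # peak before loading, after importing polars
    df = pl.read_ipc(path)
    peak = peak_rss_bytes()
    data = df.estimated_size()  # in-memory size of the dataframe, in bytes
    ratio = overhead_ratio(peak, baseline, data)
    print(f"data: {data / 2**20:.0f} MiB, peak/data: {ratio:.2f}")
```

Calling `report` on the prepared file in a fresh process should print a ratio close to 1 if memory usage were the data size plus a constant. A value near 1.5 would correspond to the behavior described below.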
Log output
No response
Issue description
Peak memory usage is well above 2 GB, the size of the loaded data.
Testing with different file sizes shows that peak usage is roughly 1.5x the dataframe size on polars 1.0.0. On an earlier version, 0.20.6, which I no longer run (so I don't know which pyarrow version it used), it was around 2x. This is related to #3360, an older issue that was supposedly resolved.
Expected behavior
I would expect the memory usage to be equal to the file size plus some constant term.
Installed versions