This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
Parquet writer stalls at a certain column size for Utf8 dtypes.
#1292
Labels: no-changelog
Continued from pola-rs/polars#3845, but reproduced using only arrow2 code extracted from the examples.

Invoking the example below with
cargo run --release 150_000_000
finished in 6 seconds. But when we invoke it with
cargo run --release 175_000_000
the program freezes completely. Memory use keeps increasing slowly and ends up far higher than we'd expect by extrapolating from the previous example (some sort of user stack growing?). I killed the process after 5 minutes, as it didn't seem to be finishing.
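For anyone who wants to time the two row counts side by side, here is a minimal sketch of a harness that uses only the standard library. It is not the reproduction from this issue: `parse_rows` and `make_strings` are hypothetical helpers, the string contents are made up, and the actual arrow2 Parquet write is left out. It only shows one way to accept a row count written as `150_000_000` (which `str::parse` rejects unless the underscores are stripped) and to time a stage.

```rust
use std::time::Instant;

/// Parse a row count such as "150_000_000".
/// `str::parse` does not accept underscores, so they are stripped first.
fn parse_rows(arg: &str) -> Option<usize> {
    arg.replace('_', "").parse().ok()
}

/// Build `rows` short strings as stand-in data for a Utf8 column.
/// Hypothetical data: the column contents used in the issue are not known here.
fn make_strings(rows: usize) -> Vec<String> {
    (0..rows).map(|i| (i % 1_000).to_string()).collect()
}

fn main() {
    // Fall back to a small row count when no (valid) argument is given.
    let rows = std::env::args()
        .nth(1)
        .and_then(|a| parse_rows(&a))
        .unwrap_or(1_000);

    let start = Instant::now();
    let column = make_strings(rows);
    // Total number of payload bytes across all strings.
    let bytes: usize = column.iter().map(|s| s.len()).sum();
    // A real reproduction would hand `column` to the Parquet writer here
    // and time that call instead.
    println!("{rows} rows, {bytes} bytes, built in {:?}", start.elapsed());
}
```

Running it with a small count first, then with the two counts from this report, gives a baseline for how long building the data takes on its own, separate from the write.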
Two things I found:
The stall only occurs with Utf8 columns, not with numerical types.
There is a stacktrace in the issue upstream.
Reproducible example
GDB stacktrace
If we ask gdb for a stacktrace at the moment the program freezes, we get the following trace. This agrees with the culprits found in the flamegraph.