
Panic on compression when writing large parquet files. #3407

Closed
SimonSchneider opened this issue May 16, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@SimonSchneider
Contributor

What language are you using?

Rust

Which feature gates did you use?

latest:
polars = { git = "https://github.com/pola-rs/polars", rev = "91e089b920b0928eb1dbfa0c1bd7ba0935e92f8f", features = ["lazy", "is_in", "parquet", "json", "strings", "sort_multiple", "round_series"] }

Describe your bug.

Writing "large" parquet files fails

What are the steps to reproduce the behavior?

    use std::io::Cursor;

    use polars::prelude::*;

    #[test]
    fn write_large_parquet() -> Result<()> {
        // A single i32 column of 1000 values is enough to trigger the failure.
        let n = 1000;
        let vec: Vec<i32> = (0..n).collect();
        let mut df = DataFrame::new(vec![Series::new("num", vec)])?;
        let mut buf = Cursor::new(Vec::new());
        ParquetWriter::new(&mut buf).finish(&mut df)?;
        Ok(())
    }

What is the actual behavior?

With n = 100 the test passes, but with n = 1000 it fails with:

running 1 test
Error: External format error: underlying IO error: Compression failed

I believe this was fixed in jorgecarleitao/parquet2#140, but Arrow and Polars would have to be updated to pick it up. I'm not sure this is actually the cause, though.

Feel free to close this issue if you don't think it gives value to track this on the Polars repo.

@SimonSchneider added the bug Something isn't working label May 16, 2022
@ritchie46
Member

Yes, this should already be fixed. @jorgecarleitao issued a new parquet2 release a few days ago. It should be picked up automatically if you remove your lockfile.

@SimonSchneider
Contributor Author

SimonSchneider commented May 16, 2022

I'll give it a try. I must have missed that, and wasn't aware that the lock file might need to be removed.

@SimonSchneider
Contributor Author

It was indeed the lockfile that had to be removed; sorry about the false alarm. I was not aware that the lockfile could cause issues like that.

@ritchie46
Member

Was not aware that the lockfile could cause issues like that.

No worries. A lockfile pins exact versions and does not automatically pick up the latest patch releases. I believe `cargo update` can also help.
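For reference, a lighter-weight alternative to deleting `Cargo.lock` is to ask Cargo to re-resolve just the affected crate. This is a sketch assuming `parquet2` is the transitive dependency that needs the patch release:

```shell
# Bump only parquet2 to the newest semver-compatible version,
# leaving the rest of the lockfile untouched.
cargo update -p parquet2

# Or regenerate the entire lockfile, which is what removing
# Cargo.lock effectively does on the next build:
rm Cargo.lock
cargo build
```

`cargo update -p` is usually preferable in a larger workspace, since it avoids churning the resolved versions of unrelated dependencies.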
