This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Writing a Dictionary encoded array with snappy compression leads to pyarrow error #516

Closed
ritchie46 opened this issue Oct 9, 2021 · 1 comment · Fixed by #523
Labels
bug Something isn't working no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@ritchie46
Collaborator

Reading a parquet file that was written with a dictionary-encoded array and snappy compression fails in pyarrow with an OSError saying the snappy-compressed data is corrupt:

tests/test_io.py:26: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/miniconda3/lib/python3.9/site-packages/polars/io.py:539: in read_parquet
    pa.parquet.read_table(
/opt/miniconda3/lib/python3.9/site-packages/pyarrow/parquet.py:1895: in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
/opt/miniconda3/lib/python3.9/site-packages/pyarrow/parquet.py:1744: in read
    table = self._dataset.to_table(
pyarrow/_dataset.pyx:465: in pyarrow._dataset.Dataset.to_table
    ???
pyarrow/_dataset.pyx:3075: in pyarrow._dataset.Scanner.to_table
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: Corrupt snappy compressed data.
@ritchie46 ritchie46 changed the title Writing a Dictionary encoded array leads to pyarrow error Writing a Dictionary encoded array with snappy compression leads to pyarrow error Oct 9, 2021
@jorgecarleitao jorgecarleitao transferred this issue from jorgecarleitao/parquet2 Oct 10, 2021
@jorgecarleitao
Owner

Thanks! (Migrated to arrow2).

The writer forgot to compress the dictionary page. I have a refactor of the parquet API that avoids this, but it still needs some work.
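
For readers wondering why an uncompressed dictionary page surfaces as a "corrupt" error on the read side: in Parquet the codec is declared once per column chunk, so a reader decompresses every page in the chunk, the dictionary page included. Below is a minimal sketch of that failure mode, not the arrow2 or pyarrow code. It assumes `zlib` from the Python standard library as a stand-in for snappy, and `write_chunk` / `read_chunk` and the two-page layout are hypothetical names made up for illustration.

```python
import zlib
from typing import List


def write_chunk(dictionary: bytes, data: bytes, compress_dictionary: bool) -> List[bytes]:
    """Return [dictionary_page, data_page] for a chunk whose declared codec is zlib.

    compress_dictionary=False mimics the bug: the dictionary page is written
    raw even though the chunk claims to be compressed.
    """
    dict_page = zlib.compress(dictionary) if compress_dictionary else dictionary
    return [dict_page, zlib.compress(data)]


def read_chunk(pages: List[bytes]) -> List[bytes]:
    """A reader trusts the chunk's codec and decompresses every page."""
    return [zlib.decompress(page) for page in pages]


# Hypothetical payloads: dictionary values and the indices that point into them.
dictionary = b"apple\x00banana\x00cherry"
indices = bytes([0, 1, 1, 2, 0])

# Correct writer: both pages are compressed, so the round trip succeeds.
good = read_chunk(write_chunk(dictionary, indices, compress_dictionary=True))

# Buggy writer: the reader tries to decompress raw dictionary bytes and fails.
try:
    read_chunk(write_chunk(dictionary, indices, compress_dictionary=False))
    buggy_error = None
except zlib.error as exc:
    buggy_error = exc
```

With the real snappy codec the same mismatch would be expected to show up as the `OSError: Corrupt snappy compressed data.` in the traceback above, since the reader is handed bytes that were never snappy-compressed.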

@jorgecarleitao jorgecarleitao added the bug Something isn't working label Oct 10, 2021
@jorgecarleitao jorgecarleitao added the no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog label Oct 29, 2021