This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Implement dictionary required for binary parquet data #418

Closed
Dandandan opened this issue Sep 17, 2021 · 2 comments · Fixed by #419
Labels
no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@Dandandan
Collaborator

I got this error when trying out arrow2 on Parquet data via DataFusion / TPC-H benchmark.

Error: ArrowError(External("", ArrowError(External("", Execution("Arrow error: Not yet implemented: Decoding \"RleDictionary\"-encoded, dictionary-encoded required V1 pages is not yet implemented for Binary")))))
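For context, the missing path is the one that materializes a dictionary-encoded binary column when the column is required, i.e. has no nulls and therefore no validity bitmap. The sketch below only illustrates that idea and is not arrow2's implementation: `BinaryColumn` and `decode_required_binary` are hypothetical names, the dictionary entries are example values, and the indices are assumed to be already unpacked from the RLE / bit-packed page data.

```rust
/// Arrow-style variable-length binary column. A required column has no
/// nulls, so no validity bitmap is stored. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
struct BinaryColumn {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

/// Materializes a dictionary-encoded required binary column.
/// `dict` holds the dictionary page entries; `indices` holds one already
/// decoded dictionary index per row (the RLE / bit-packing step is assumed done).
fn decode_required_binary(dict: &[&[u8]], indices: &[u32]) -> Result<BinaryColumn, String> {
    let mut offsets = Vec::with_capacity(indices.len() + 1);
    let mut values = Vec::new();
    offsets.push(0i32);
    for &index in indices {
        // Every row must point at an existing dictionary entry.
        let entry = dict
            .get(index as usize)
            .ok_or_else(|| format!("dictionary index {} out of range", index))?;
        values.extend_from_slice(entry);
        offsets.push(values.len() as i32);
    }
    Ok(BinaryColumn { offsets, values })
}

fn main() {
    // Example dictionary and row indices, chosen only for illustration.
    let dict: [&[u8]; 3] = [b"AIR", b"MAIL", b"SHIP"];
    let indices = [0u32, 2, 2, 1];
    let column = decode_required_binary(&dict, &indices).unwrap();
    println!("{:?}", column);
}
```

The optional (nullable) case differs only in that definition levels decide which rows consume an index and which become nulls.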

@jorgecarleitao
Owner

Datafusion only benchmarks with ideal data: who uses required fields in parquet in real life? ^_^

@Dandandan
Collaborator Author

> Datafusion only benchmarks with ideal data: who uses required fields in parquet in real life? ^_^

LOL 😆

Not joking - I think a lot of IRL data can be, especially exports from relational databases.
Also, new tech like Delta Lake now has constraints (e.g. NOT NULL), although I'm not sure whether that changes the Parquet output. Hopefully that means everyone is finally going to have good data quality 😂

@jorgecarleitao added the no-changelog label on Oct 7, 2021