Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to read async #33

Merged
merged 8 commits into from
Aug 9, 2021
Merged

Add support to read async #33

merged 8 commits into from
Aug 9, 2021

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Aug 2, 2021

Closes #32

Backward incompatible changes

Thrift code generation had a disruptive change from 0.13 (which parquet-format depends on) to 0.15 (latest thrift): thrift enums are no longer generated into rust enums in Camel case, but to structs with CAPITAL cases. Since this library uses enums for compile-time validation of the parameters (e.g. Compression), and because enums are the de facto way in rust of handling these, this PR had to bridge some of the APIs exposed by the code generation.

This also allowed to review all dependencies on the parquet-format and adjust accordingly. Therefore:

  • Encoding is now only exposed in encoding::Encoding
  • CompressionCodec was renamed to Compression and is only exposed in compression::Compression
  • There is now a trait DataPageHeaderExt to access some attributes from the code-generated DataPageHeader from parquet-format as this crates's enums (i.e. it performs the conversion automatically so that we do not have to worry about that).

Other notes

This replaces thrift and parquet-format by parquet-format-async-temp that contains a minimal thrift library to handle sync and async, together with the parquet-generated code against it using the compiler version that supports async. That crate is basically apache/thrift#2426 and apache/thrift#2426 in a single crate and exists because thrift has 2 releases every year or so and the PR has been 6 days without feedback, suggesting some delay on that front. Once these land, we migrate back to parquet-format-rs and thrift.

@jorgecarleitao jorgecarleitao marked this pull request as draft August 2, 2021 05:27
@jorgecarleitao jorgecarleitao changed the title Add support to read async metadata Add support to read async Aug 8, 2021
@jorgecarleitao
Copy link
Owner Author

update: this now includes reading of both metadata and pages (i.e. data).

@jorgecarleitao jorgecarleitao marked this pull request as ready for review August 8, 2021 15:51
@codecov-commenter
Copy link

codecov-commenter commented Aug 8, 2021

Codecov Report

Merging #33 (d36a11a) into main (fd93650) will decrease coverage by 0.94%.
The diff coverage is 54.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
- Coverage   68.79%   67.84%   -0.95%     
==========================================
  Files          61       62       +1     
  Lines        3326     3427     +101     
==========================================
+ Hits         2288     2325      +37     
- Misses       1038     1102      +64     
Impacted Files Coverage Δ
src/encoding/mod.rs 50.00% <ø> (ø)
src/error.rs 18.75% <0.00%> (ø)
src/lib.rs 76.73% <ø> (ø)
src/metadata/row_metadata.rs 42.85% <ø> (ø)
src/metadata/sort.rs 0.00% <ø> (ø)
src/page/page_dict/mod.rs 36.84% <ø> (ø)
src/read/compression.rs 55.93% <ø> (ø)
src/schema/types/basic_type.rs 100.00% <ø> (ø)
src/schema/types/parquet_type.rs 38.33% <ø> (ø)
src/schema/types/spec.rs 61.29% <ø> (ø)
... and 30 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd93650...d36a11a. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for async read
2 participants