Improved bitpacking #176
Codecov Report
@@ Coverage Diff @@
## main #176 +/- ##
==========================================
+ Coverage 85.29% 85.32% +0.03%
==========================================
Files 78 82 +4
Lines 7916 8110 +194
==========================================
+ Hits 6752 6920 +168
- Misses 1164 1190 +26
@tustvold, you may wish to port the packing part to parquet. If that is OK with you, the packing and unpacking functions that we wrote could live in a separate crate. I think they are well encapsulated, and anyone who needs this encoding elsewhere could benefit from them. I, for one, would like to use them in https://github.com/DataEngineeringLabs/orc-format, since ORC also has bitpacked runs.
This PR ports apache/arrow-rs#2278 to parquet2. Credit for the design and implementation of the unpacking path goes to @tustvold; it is 5-10% faster than the bitpacking crate 🚀

Additionally, it adds the corresponding packing code path, thereby completely replacing the dependency on bitpacking. It also adds some traits that allow code to be written via generics.

A curious observation is that, with this PR, parquet2 no longer executes unsafe code (bitpacking had some) 🎉

Backward-incompatible changes:
- renamed parquet2::encoding::bitpacking to parquet2::encoding::bitpacked
- parquet2::encoding::bitpacked::Decoder now has a generic parameter (output type)
- parquet2::encoding::bitpacked::Decoder::new's second parameter is now a usize
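For readers unfamiliar with the encoding, the packing and unpacking paths described above can be illustrated with a minimal sketch. This is not the parquet2 implementation: `pack` and `unpack` are hypothetical helpers, fixed to `u32`, that work one bit at a time instead of using the unrolled, generic code ported from arrow-rs. The sketch assumes the least-significant-bit-first layout used by Parquet's bit-packed runs, and it uses only safe Rust.

```rust
/// Packs each value into a `num_bits`-wide field, least-significant bit first.
/// Hypothetical helper for illustration only; not the parquet2 API.
fn pack(values: &[u32], num_bits: usize) -> Vec<u8> {
    assert!(num_bits <= 32, "a u32 holds at most 32 bits");
    // Round the total bit count up to a whole number of bytes.
    let mut out = vec![0u8; (values.len() * num_bits + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        // Every value is expected to fit in `num_bits` bits.
        debug_assert!(num_bits == 32 || v >> num_bits == 0);
        for b in 0..num_bits {
            if (v >> b) & 1 == 1 {
                // Absolute position of this bit in the output stream.
                let pos = i * num_bits + b;
                out[pos / 8] |= 1u8 << (pos % 8);
            }
        }
    }
    out
}

/// Reads `len` values of `num_bits` bits each back out of `packed`.
/// Hypothetical helper for illustration only; not the parquet2 API.
fn unpack(packed: &[u8], num_bits: usize, len: usize) -> Vec<u32> {
    assert!(num_bits <= 32, "a u32 holds at most 32 bits");
    assert!(packed.len() * 8 >= len * num_bits, "input is too short");
    (0..len)
        .map(|i| {
            let mut v = 0u32;
            for b in 0..num_bits {
                let pos = i * num_bits + b;
                let bit = (packed[pos / 8] >> (pos % 8)) & 1;
                v |= (bit as u32) << b;
            }
            v
        })
        .collect()
}
```

With this layout, `pack(&[0, 1, 2, 3, 4, 5, 6, 7], 3)` yields the three bytes `0x88, 0xC6, 0xFA`, and `unpack` with the same bit width and a length of 8 recovers the original values. A production implementation would process whole blocks of values with specialised code per bit width, which is where the speed-up reported in this PR comes from.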