refactor: try abandon internal parquet2 patches #6067
this PR will be postponed for a while, trying to port the |
Is this PR ready for review? @dantengsky
Not yet. Hopefully, it will be ready next week. But if anything is blocked by this, I'd like to split this PR into two.
This pull request's title does not fulfill the requirements. @dantengsky please update it 🙏. The title should contain one of the following tags:
Oh, sorry, I just found that I mistakenly marked this PR as ready for review.
Pull `ArrayRef` out of namespace `arrow::array`
This PR seems ready now. @dantengsky Are there other issues that need addressing?
The stateful test has not passed yet :-( (due to silly conditional compilation flags that I set up in our internal parquet2 repo)
Sorry, reverting back to draft (there are typos to be corrected).
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
partially fixes:
All internal patches except the "legacy lz4" one are abandoned.
Before jorgecarleitao/parquet2#95 ("Fixed LZ4"), parquet2 had an implementation of LZ4 that is not compatible with the C++ / Java implementations. IMO, it is not reasonable (even if feasible) to ask upstream to keep it.
Unfortunately, at least some stateful test data is compressed with the non-standard "legacy" lz4.
Thus this PR still uses a patched version (in repo datafuse-extras). I suggest abandoning this non-standard "legacy" lz4 patch in another PR (after we have made sure that no such legacy data exists).
arrow2 upgraded to 0.12

1.) `type ArrayRef = Arc<dyn Array>` of the former `arrow2` has been removed; `Box<dyn Array>` is used instead.
So, in databend, type `ArrayRef` is redefined as `Box<dyn Array>` (and the refactors were done by following rustc).
Not sure if the name is still suitable, any suggestions are welcome.
2.) replace deprecated `null_count` with `unset_bits`
3.) `serialize_batch` now returns `Result<_>` instead of a tuple; related code is adjusted accordingly.

parquet2 upgraded to 0.14
Changelog
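
For reviewers who want a feel for the three arrow2-related changes listed above, here is a minimal sketch of their shape. It does not use arrow2 itself: the `Array` trait, `Int32Array`, `to_boxed`, and `serialize_lengths` are hypothetical stand-ins invented for illustration, and only the alias `ArrayRef = Box<dyn Array>` and the name `unset_bits` come from this PR.

```rust
use std::fmt::Debug;

/// Stand-in for arrow2's `Array` trait, reduced to what this sketch needs.
pub trait Array: Debug {
    fn len(&self) -> usize;
    /// Hypothetical helper: clone the value behind the trait object.
    fn to_boxed(&self) -> Box<dyn Array>;
}

/// Toy array: `validity[i] == false` marks slot `i` as null.
#[derive(Debug, Clone)]
pub struct Int32Array {
    pub values: Vec<i32>,
    pub validity: Vec<bool>,
}

impl Int32Array {
    /// Item 2: count the unset validity bits (the null slots).
    pub fn unset_bits(&self) -> usize {
        self.validity.iter().filter(|set| !**set).count()
    }
}

impl Array for Int32Array {
    fn len(&self) -> usize {
        self.values.len()
    }

    fn to_boxed(&self) -> Box<dyn Array> {
        Box::new(self.clone())
    }
}

/// Item 1: the alias names an owned `Box` rather than a shared `Arc`.
pub type ArrayRef = Box<dyn Array>;

/// Item 3: a hypothetical serializer that reports failure through `Result`
/// instead of returning a tuple. It writes each column length as 4
/// little-endian bytes and rejects batches with mismatched lengths.
pub fn serialize_lengths(columns: &[ArrayRef]) -> Result<Vec<u8>, String> {
    let rows = match columns.first() {
        Some(first) => first.len(),
        None => return Err("empty batch".to_string()),
    };
    let mut out = Vec::with_capacity(columns.len() * 4);
    for column in columns {
        if column.len() != rows {
            return Err(format!("expected {} rows, got {}", rows, column.len()));
        }
        out.extend_from_slice(&(column.len() as u32).to_le_bytes());
    }
    Ok(out)
}
```

The practical difference for callers is that copying a boxed `ArrayRef` copies the underlying data in this sketch, whereas cloning an `Arc` only bumps a reference count, which is part of why the name `ArrayRef` may no longer be the best fit.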
Related Issues
Fixes #6064