ARROW2: support avro #1061
Comments
Is this about the dict issue or is it something else? |
This is something else, we need to port your avro implementation to https://github.com/jorgecarleitao/arrow2/tree/main/src/io/avro for #68 ;) |
@Igosuki if you are interested in taking on this work, please coordinate with @jorgecarleitao . |
@houqp will do |
The approach I took in Since the avro format is relatively simple to read from, I just implemented a reader from bytes directly to arrow. I did not implement it for all types, only for the basic ones, but the idea stands. So, if I understood correctly, the goal is to generalize the parser to more types. Which ones are needed here? |
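To illustrate why reading Avro straight from bytes is tractable, here is a minimal sketch of decoding a single Avro `long`, which the Avro binary encoding stores as a zigzag-encoded variable-length integer. This is not the arrow2 code; the function name `decode_avro_long` and its return shape are hypothetical, chosen only for this example.

```rust
/// Hypothetical helper (not part of arrow2): decodes one Avro `long`
/// from the front of `buf`.
///
/// Avro writes `int` and `long` values as zigzag-encoded varints:
/// each byte carries 7 payload bits, least-significant group first,
/// and the high bit signals that another byte follows.
///
/// Returns the decoded value and the number of bytes consumed, or
/// `None` if the input ends before the varint terminates or the
/// varint is longer than 10 bytes.
fn decode_avro_long(buf: &[u8]) -> Option<(i64, usize)> {
    let mut acc: u64 = 0;
    // A 64-bit value needs at most 10 groups of 7 bits.
    for (i, &byte) in buf.iter().enumerate().take(10) {
        acc |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            // Undo zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
            let value = ((acc >> 1) as i64) ^ -((acc & 1) as i64);
            return Some((value, i + 1));
        }
    }
    None
}
```

The same primitive is used for length prefixes (for example, a string is a `long` byte length followed by that many UTF-8 bytes), so a reader for the basic types can be built mostly out of this one function plus slicing.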
I think struct and list are the main ones: https://github.com/apache/arrow-datafusion/blob/5cc4e9f53fab29e81ea7c98baac8ce277a0cb54a/datafusion/src/avro_to_arrow/arrow_array_reader.rs#L530 |
There are enum, array, dictionary (map), and logical types; these are almost
the same as in Parquet, as Parquet was meant to be a columnar version of
Avro.
|
ahhh, now it makes sense why they are so similar! Thanks a lot for sharing that. I've implemented the remaining types here: jorgecarleitao/arrow2#493 . We are missing |
@jorgecarleitao Nice one, seems like you got the encoding part almost rounded up |
@Igosuki I believe you have completed the migration for all Avro-related code, right? Is there anything left over for this issue? |
No, besides the patch I mentioned which enables arrays to be null https://github.com/jorgecarleitao/arrow2/blob/main/src/io/avro/read/deserialize.rs#L101 |
Thanks @Igosuki for all your work on Avro :) |
Avro integration needs to be fixed.