io::json enhancements #1178
Thank you for this issue and for the thorough explanation. 🙇 I think this makes a lot of sense indeed! Answering the questions:
Hey @jorgecarleitao, we've integrated and tested the work on our side, and I'd like to start contributing back. Here are the changes: main...WallarooLabs:arrow2:main

Would you prefer this as one large (cleaned-up) PR, or would you prefer me to rebase our changes and submit them in smaller chunks? If we go the smaller route, there are ten commits for super small changes and three PRs for medium-sized reviews...
Wow, awesome!!! Either is fine. The easiest option for documentation purposes is to split it according to the main GitHub labels (which correspond to entries in the changelog):
so it is easier to document what happened to the crate when people update. Looking through it, I think I only have minor comments on naming - trait About the comment
one way to address it is to create an auxiliary
Work has been crazy, but I've finally gotten around to repackaging this. I believe it's a backwards-compatible new feature; I'm not sure if I should slap the
Closed by #1275
Hey!
We'd like to add some functionality to `io::json`, and I wanted to check if it would make sense as a contribution to the project.

In particular, we'd like to add support for the `records`-oriented output of the `to_json` and `read_json` Pandas methods. We probably also want to add support for the `split` orientation in the near future, but we'd like to start with `records`.

This is the rough path I've worked out to do so:
1. A `json::write::RecordSerializer` alongside `json::write::Serializer`. Instead of outputting a batch of arrays, this takes a `Chunk` and outputs record blobs in the transposed format.
2. An `infer_records_schema` method in `json::read::infer_schema`. This would infer a `Schema` from the first row of the input JSON in Pandas `records` format.
3. A `json::read::deserialize_from_records` that takes the above `Schema` and outputs the correct `Chunk`. This is the hardest part to untangle so far. My current rough plan of attack is to turn the `deserialize_xxx` methods into `deserialize_into_xxx` methods that take a `&mut` mutable array as a parameter instead of outputting finalized arrays. We can then "extend" the right data into each array per record, avoiding copies as we go. This is starting to feel like a pretty invasive change, in contrast to the previous two.

I'm still figuring the last part out. I don't feel fully stuck, but I'm worried about the scope of the changes. Here are my questions:
`io::json` module? I see other modules for other types of JSON in there (`ndjson`, `json_integration`), but I'm not sure what the cutoff is for a new `io` module. This feels closer to the original `io::json` than either of those two formats?

Finally: thanks for working so hard on this crate. It's been hugely useful for us at Wallaroo.
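Step 1 of the plan is a `RecordSerializer` that takes a `Chunk` and writes it out transposed. Below is a minimal, std-only Rust sketch of that idea. It is not arrow2's API. `Column` and `write_records` are hypothetical stand-ins for arrow2's arrays, `Chunk`, and serializer, and only 64-bit integer and string columns are handled. The output has the shape Pandas produces for `DataFrame.to_json(orient="records")`: a JSON array with one object per row, keyed by column name.

```rust
/// Hypothetical, simplified stand-in for an Arrow column (not arrow2's `Array`).
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int64(v) => v.len(),
            Column::Utf8(v) => v.len(),
        }
    }

    /// Appends the value at `row` as a JSON literal (`null` for missing values).
    fn write_value(&self, row: usize, out: &mut String) {
        match self {
            Column::Int64(v) => match v[row] {
                Some(x) => out.push_str(&x.to_string()),
                None => out.push_str("null"),
            },
            Column::Utf8(v) => match &v[row] {
                Some(s) => write_json_string(s, out),
                None => out.push_str("null"),
            },
        }
    }
}

/// Minimal JSON string escaping: quotes, backslashes and control characters.
fn write_json_string(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Transposes column-major data into a `records`-style JSON array:
/// one object per row, with one key per column.
pub fn write_records(names: &[&str], columns: &[Column]) -> String {
    assert_eq!(names.len(), columns.len(), "one name per column");
    let rows = columns.first().map_or(0, Column::len);
    let mut out = String::from("[");
    for row in 0..rows {
        if row > 0 {
            out.push(',');
        }
        out.push('{');
        for (i, (name, column)) in names.iter().zip(columns).enumerate() {
            if i > 0 {
                out.push(',');
            }
            write_json_string(name, &mut out);
            out.push(':');
            column.write_value(row, &mut out);
        }
        out.push('}');
    }
    out.push(']');
    out
}
```

For example, columns `a = [1, null]` and `b = ["x", "y"]` serialize to `[{"a":1,"b":"x"},{"a":null,"b":"y"}]`. Because the output is written row by row, every column is visited once per row. That is the inherent cost of the transposed layout.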
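Step 2 is `infer_records_schema`, which infers a `Schema` from the first row. It could look roughly like the hedged sketch below, which uses hypothetical types. `Value`, `Record`, `DataType`, and `Field` are simplified stand-ins. A real implementation would start from parsed JSON, such as `serde_json::Value`, and produce arrow2's own `DataType` and `Field`. Nested values are left out.

```rust
/// Hypothetical, simplified stand-in for a parsed JSON scalar.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// One row of a `records`-oriented document: (key, value) pairs in input order.
pub type Record = Vec<(String, Value)>;

/// Simplified logical types (hypothetical, much smaller than arrow2's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Null,
    Boolean,
    Int64,
    Float64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub is_nullable: bool,
}

/// Maps a single JSON value to the logical type it suggests.
fn data_type_of(value: &Value) -> DataType {
    match value {
        Value::Null => DataType::Null,
        Value::Bool(_) => DataType::Boolean,
        Value::Int(_) => DataType::Int64,
        Value::Float(_) => DataType::Float64,
        Value::Str(_) => DataType::Utf8,
    }
}

/// Infers a schema from the first record only, keeping the key order of that row.
/// Returns `None` when there are no records to inspect.
pub fn infer_records_schema(records: &[Record]) -> Option<Vec<Field>> {
    let first = records.first()?;
    Some(
        first
            .iter()
            .map(|(name, value)| Field {
                name: name.clone(),
                data_type: data_type_of(value),
                // JSON rows may omit a key or hold `null`, so every field is nullable.
                is_nullable: true,
            })
            .collect(),
    )
}
```

Only the first record is inspected. As a result, a column that is `null` in that row is inferred as `Null`, and keys that appear only in later rows are dropped. Scanning more rows and merging the observed types would avoid both problems, at the cost of extra work before deserialization starts.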
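The `deserialize_into_xxx` idea in step 3 can be sketched in the same spirit. The approach is to allocate one growable column per schema field up front. Then walk the records once and push each value into its column through a `&mut` reference, so no intermediate per-record arrays are built. `MutableColumn`, `deserialize_into`, and `deserialize_from_records` are hypothetical names. Plain `Vec`s stand in for arrow2's mutable arrays, such as `MutablePrimitiveArray` and `MutableUtf8Array`.

```rust
/// Hypothetical, simplified stand-in for a parsed JSON scalar.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Str(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Int64,
    Utf8,
}

/// Hypothetical growable columns, standing in for arrow2's mutable arrays.
#[derive(Debug, PartialEq)]
pub enum MutableColumn {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

impl MutableColumn {
    fn with_capacity(data_type: DataType, capacity: usize) -> Self {
        match data_type {
            DataType::Int64 => MutableColumn::Int64(Vec::with_capacity(capacity)),
            DataType::Utf8 => MutableColumn::Utf8(Vec::with_capacity(capacity)),
        }
    }
}

/// `deserialize_into`-style helper: appends one value to an existing column
/// instead of building and returning a finished array.
fn deserialize_into(column: &mut MutableColumn, value: Option<&Value>) -> Result<(), String> {
    match (column, value) {
        // A missing key or an explicit `null` both become a null slot.
        (MutableColumn::Int64(v), None | Some(Value::Null)) => v.push(None),
        (MutableColumn::Utf8(v), None | Some(Value::Null)) => v.push(None),
        (MutableColumn::Int64(v), Some(Value::Int(x))) => v.push(Some(*x)),
        (MutableColumn::Utf8(v), Some(Value::Str(s))) => v.push(Some(s.clone())),
        (_, Some(other)) => return Err(format!("unexpected value {other:?}")),
    }
    Ok(())
}

/// Builds one column per schema field in a single pass over the records,
/// looking each field up by name so key order may differ between rows.
pub fn deserialize_from_records(
    schema: &[(&str, DataType)],
    records: &[Vec<(String, Value)>],
) -> Result<Vec<MutableColumn>, String> {
    let mut columns: Vec<MutableColumn> = schema
        .iter()
        .map(|(_, dt)| MutableColumn::with_capacity(*dt, records.len()))
        .collect();
    for record in records {
        for ((name, _), column) in schema.iter().zip(columns.iter_mut()) {
            let value = record.iter().find(|(key, _)| key == name).map(|(_, v)| v);
            deserialize_into(column, value)?;
        }
    }
    Ok(columns)
}
```

In this sketch, a missing key becomes a null. A value whose JSON type does not match the schema is reported as an error instead of being coerced. Both are choices made here for illustration, not behavior taken from arrow2.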