Add support to read JSON #712
Comments
Yes, definitely in scope. We currently already support writing to JSON like that. IMO we should aim to drop the
Definitely agree. Serde is awesome because of its generic targets, but hard to optimize in my experience.
If there were support for arrays: a lot of libraries support an optional indent parameter. Would it be a lot of extra effort to add something like this? Python has
@universalmind303, yes, definitely in scope also. It needs some piping of the indent level for struct arrays, but IMO nothing blocking.
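As a rough illustration of the optional indent parameter discussed above, here is a minimal sketch using only the Rust standard library. The function name `write_json_array` and its signature are hypothetical and are not part of arrow2. It takes records that are already serialized and only indents the top level; piping the indent level down into nested struct arrays, as mentioned above, is not attempted here.

```rust
/// Joins already-serialized JSON records into one JSON array.
/// `indent = None` gives a compact single line; `Some(n)` puts each record
/// on its own line, prefixed by `n` spaces.
/// Hypothetical helper for illustration; not part of arrow2's API.
pub fn write_json_array(records: &[&str], indent: Option<usize>) -> String {
    let mut out = String::from("[");
    for (i, record) in records.iter().enumerate() {
        // Records are separated by a comma rather than a newline.
        if i > 0 {
            out.push(',');
        }
        // With an indent, every record starts on a fresh, indented line.
        if let Some(n) = indent {
            out.push('\n');
            out.push_str(&" ".repeat(n));
        }
        out.push_str(record);
    }
    // Put the closing bracket on its own line when indenting a non-empty array.
    if indent.is_some() && !records.is_empty() {
        out.push('\n');
    }
    out.push(']');
    out
}
```

Using an `Option<usize>` keeps the compact output as the default while letting callers opt in to a human-readable layout.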
Arrow supports both JSON lines and delimited JSON -- in case it is helpful for API design or implementation:
Thanks Andrew!
I think that both implement the same writer. The main difference between them at this point is #709
My random thoughts on this situation:

Reading

For historical reasons our parser splits the file into lines via

When a file can't be split into chunks, we can't read it in chunks. This means that we can't separate IO from CPU and need to read the whole file at once (e.g. via

[
{"a": 1, "b": 2},
{"a": 2, "b": 20},
{
"a": 3,
"b": 30
},
{"a": 4, "b": 40}
]

splitting it using the

i.e. we broke the 3rd record into pieces, thus making individual lines invalid JSON. What we want in this case is to be able to split it in something like

AFAIK this is not supported by

The quick hack is to read the whole thing into

IMO, for something like this, we need an API similar to that of the

This would allow us to perform minimal CPU work on read, and support the same APIs we already have for pure CPU work:
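The record-level split described in this comment can be sketched with the standard library alone. This is an illustration under stated assumptions, not the approach arrow2 or serde uses: the name `split_records` is hypothetical, the input is assumed to be a well-formed top-level JSON array, and only object or array elements are returned (top-level scalars are skipped). The scanner tracks nesting depth and string state so that braces inside strings and records spanning several lines do not break the split.

```rust
/// Splits the text of a top-level JSON array into one slice per record,
/// without deserializing the records themselves.
/// Hypothetical helper for illustration; assumes well-formed input.
pub fn split_records(input: &str) -> Vec<&str> {
    let mut records = Vec::new();
    let mut depth = 0usize; // nesting depth of `{` / `[`
    let mut in_string = false; // true while inside a JSON string
    let mut escaped = false; // true right after a backslash in a string
    let mut start = 0usize; // byte offset where the current record begins

    for (i, b) in input.bytes().enumerate() {
        if in_string {
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => {
                depth += 1;
                // Depth 1 is the outer array; depth 2 opens a record.
                if depth == 2 {
                    start = i;
                }
            }
            b'}' | b']' => {
                // Closing back from depth 2 ends the current record.
                if depth == 2 {
                    records.push(&input[start..=i]);
                }
                depth = depth.saturating_sub(1);
            }
            _ => {}
        }
    }
    records
}
```

Each returned slice is a complete JSON value, so the slices could be handed to a deserializer separately, which is the kind of minimal work on read that the comment asks for.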
Addressed by #712 (quick hack of the above comment).
Nice, seems like a good starting point to improve upon.
Currently we write and read JSON lines.
https://jsonlines.org/
I believe it would be a minor modification to also be able to read and write JSON. We could wrap the JSON Lines values in an array and separate them with a comma (,) instead of a new line char.

E.g. now we write:
JSON Lines
JSON
Would this fit the scope of arrow2, as this is something different than what pyarrow does? I don't mean that we should drop the JSON Lines functionality, but that we also allow reading and writing JSON.
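The proposed change from JSON Lines to a JSON array can be sketched as a plain text transformation. This is a minimal illustration using only the Rust standard library; the name `json_lines_to_array` is hypothetical, and the sketch assumes every non-empty line is already a valid JSON value.

```rust
/// Converts JSON Lines text (one JSON value per line) into a JSON array by
/// wrapping the values in `[` and `]` and joining them with `,`.
/// Hypothetical helper for illustration; not part of arrow2's API.
pub fn json_lines_to_array(ndjson: &str) -> String {
    let records: Vec<&str> = ndjson
        .lines()
        .map(str::trim)
        // Skip blank lines, e.g. the one produced by a trailing newline.
        .filter(|line| !line.is_empty())
        .collect();
    format!("[{}]", records.join(","))
}
```

The reverse direction is harder, because a record inside a JSON array may itself span several lines, so it cannot be recovered by splitting on newlines.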