feat(python): Parse JSON data in Utf8 to polars dtype #6885
I was looking for a native replacement for a simple `apply(json.loads)` UDF that also worked well on lazy frames. I saw `str.json_path_match`, but I really wanted a parsed struct (or whatever dtype) back, not a string value.

It looks like some initial work on this started back in #3413 and got partially exposed in #5140. A private `Utf8Chunked.json_extract` helper was added, but it was never fully exposed publicly on the Python Series or Expr APIs. So this PR exposes it.

The API optionally supports dtype inference on eager frames and series. When used on a lazy frame, leaving the dtype at its default (unknown) correctly raises an error. Additionally, a nice feature over `apply(json.loads)` is that a partial dtype can be supplied to omit struct keys you're not interested in decoding. This seems to be a nice property of using `read_ndjson` under the hood.

How does the name `json_extract` still sound? That's the existing name of the internal function, so I just went with it. `parse_json` might have been my first instinct, but I'm deferring to the maintainers' preference.
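
To make the partial-dtype point above concrete, here is a rough plain-Python sketch of the semantics being described. It is not the polars implementation and does not call polars at all. The helper name `parse_json_column` and its `keys` parameter are hypothetical, and filling a missing key with `None` is an assumption made for the illustration.

```python
import json
from typing import Any, List, Optional, Sequence


def parse_json_column(
    values: Sequence[Optional[str]],
    keys: Optional[Sequence[str]] = None,
) -> List[Any]:
    """Parse each JSON string; optionally keep only the requested object keys.

    Hypothetical stand-in for the behaviour discussed in this PR:
    - keys=None mimics a full decode, like apply(json.loads).
    - keys=[...] mimics supplying a partial struct dtype, so fields
      that are not listed are dropped from the result.
    """
    out: List[Any] = []
    for raw in values:
        if raw is None:
            # Null inputs stay null instead of raising.
            out.append(None)
            continue
        parsed = json.loads(raw)
        if keys is not None and isinstance(parsed, dict):
            # Assumption: a requested key missing from the object becomes None.
            parsed = {k: parsed.get(k) for k in keys}
        out.append(parsed)
    return out


rows = ['{"a": 1, "b": "x"}', None, '{"a": 2, "b": "y"}']
full = parse_json_column(rows)               # every key decoded
only_a = parse_json_column(rows, keys=["a"])  # "b" is omitted
```

The difference from the UDF approach is that the projection here happens in Python after a full parse, whereas the PR describes getting it natively by passing a partial dtype.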