
Load JSON object from string with nested array. #3404

Closed
Yevgnen opened this issue May 16, 2022 · 7 comments · Fixed by #3406
Labels
bug Something isn't working


@Yevgnen

Yevgnen commented May 16, 2022

What language are you using?

Python

Which feature gates did you use?

Load JSON object from string with nested array.

Have you tried latest version of polars?

Yes

What version of polars are you using?

0.13.34

What operating system are you using polars on?

macOS 12.3.1

What language version are you using

Python 3.8.13

Describe your bug.

Applying json.loads to a string column containing text such as '[{"x": 1, "y": 2}, {"x": 3, "y": 4}]' raises a PanicException.

What are the steps to reproduce the behavior?

import json

import polars as pl

df = pl.DataFrame({"text": ['[{"x": 1, "y": 2}, {"x": 3, "y": 4}]']})
df.select(pl.col('text').apply(json.loads))

What is the actual behavior?

thread '<unnamed>' panicked at 'not implemented for dtype Object("object")', /Users/runner/work/polars/polars/polars/polars-core/src/chunked_array/builder/list.rs:362:5
stack backtrace:
   0:        0x130af73f6 - _rust_eh_personality
   1:        0x12fee407b - _BrotliDecoderVersion
   2:        0x130acba2c - _rust_eh_personality
   3:        0x130af840d - _rust_eh_personality
   4:        0x130af93e8 - _rust_eh_personality
   5:        0x130af8ed4 - _rust_eh_personality
   6:        0x130af8e49 - _rust_eh_personality
   7:        0x130af8e05 - _rust_eh_personality
   8:        0x130c24543 - _rust_eh_personality
   9:        0x1300badf0 - _rust_eh_personality
  10:        0x12f59ff8c - <unknown>
  11:        0x12f64e9a9 - <unknown>
  12:        0x12f882b45 - _PyInit_polars
  13:        0x1014f3ea6 - _method_vectorcall_VARARGS_KEYWORDS
  14:        0x1015c4d6f - _call_function
  15:        0x1015c18e3 - __PyEval_EvalFrameDefault
  16:        0x1015c5c8b - __PyEval_EvalCodeWithName
  17:        0x1014eb35b - __PyFunction_Vectorcall
  18:        0x1014edb4c - _method_vectorcall
  19:        0x1015c4d6f - _call_function
  20:        0x1015c1a3c - __PyEval_EvalFrameDefault
  21:        0x1015c5c8b - __PyEval_EvalCodeWithName
  22:        0x1014eb35b - __PyFunction_Vectorcall
  23:        0x1014eabd4 - _PyVectorcall_Call
  24:        0x12f40dafb - <unknown>
  25:        0x12f5cb8b1 - <unknown>
  26:        0x12f5cc7c9 - <unknown>
  27:        0x12f4179f1 - <unknown>
  28:        0x1308ecd32 - _rust_eh_personality
  29:        0x13097c2b7 - _rust_eh_personality
  30:        0x130975c42 - _rust_eh_personality
  31:        0x13097d1f0 - _rust_eh_personality
  32:        0x130caabcb - _rust_eh_personality
  33:        0x130a53836 - _rust_eh_personality
  34:        0x130a53360 - _rust_eh_personality
  35:        0x130afa5da - _rust_eh_personality
  36:     0x7ff80122b4e1 - __pthread_start
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
File ~/.direnv/python-3.8.13/lib/python3.8/site-packages/polars/internals/expr.py:1547, in Expr.apply.<locals>.wrap_f(x)
   1546 def wrap_f(x: "pli.Series") -> "pli.Series":  # pragma: no cover
-> 1547     return x.apply(f, return_dtype=return_dtype)

File ~/.direnv/python-3.8.13/lib/python3.8/site-packages/polars/internals/series.py:2576, in Series.apply(self, func, return_dtype)
   2574 else:
   2575     pl_return_dtype = py_type_to_dtype(return_dtype)
-> 2576 return wrap_s(self._s.apply_lambda(func, pl_return_dtype))

PanicException: not implemented for dtype Object("object")
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Input In [15], in <cell line: 1>()
----> 1 df.select(pl.col('text').apply(json.loads))

File ~/.direnv/python-3.8.13/lib/python3.8/site-packages/polars/internals/frame.py:4475, in DataFrame.select(self, exprs)
   4433 def select(
   4434     self: DF,
   4435     exprs: Union[
   (...)
   4440     ],
   4441 ) -> DF:
   4442     """
   4443     Select columns from this DataFrame.
   4444 
   (...)
   4472 
   4473     """
   4474     return (
-> 4475         self.lazy()
   4476         .select(exprs)  # type: ignore
   4477         .collect(no_optimization=True, string_cache=False)
   4478     )

File ~/.direnv/python-3.8.13/lib/python3.8/site-packages/polars/internals/lazy_frame.py:586, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, string_cache, no_optimization, slice_pushdown)
    576     projection_pushdown = False
    578 ldf = self._ldf.optimization_toggle(
    579     type_coercion,
    580     predicate_pushdown,
   (...)
    584     slice_pushdown,
    585 )
--> 586 return self._dataframe_class._from_pydf(ldf.collect())

PanicException: Unwrapped panic from Python code

What is the expected behavior?

No error should be raised.

Yevgnen added the bug label on May 16, 2022
@cjermain
Contributor

It looks like the inferred type is an object instead of a list of structs. I'm working on a Rust solution for parsing JSON (#3373 and jorgecarleitao/arrow2#989) that will work with json_path_match. Once those changes land, you should be able to deserialize directly. In the meantime, I suggest passing the DataType explicitly to .apply.
https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.DataFrame.apply.html?highlight=apply#polars.DataFrame.apply
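To make "a list of structs instead of an object" concrete, here is a stdlib-only sketch of the nested type that one parsed row corresponds to. The function infer_dtype and the dtype spellings are hypothetical, loosely modeled on polars names; this is not polars's actual inference code. For brevity it inspects only the first list element, and any value without a mapping falls back to Object, the dtype named in the panic message.

```python
import json


def infer_dtype(value):
    """Hypothetical: map a parsed JSON value to a polars-style dtype name."""
    if value is None:
        return "Null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "Boolean"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        return "Utf8"
    if isinstance(value, list):
        # Simplification: only the first element decides the inner type.
        inner = infer_dtype(value[0]) if value else "Null"
        return f"List({inner})"
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {infer_dtype(v)}" for k, v in value.items())
        return f"Struct({fields})"
    # No native mapping: fall back to a generic Python object dtype.
    return "Object"


row = '[{"x": 1, "y": 2}, {"x": 3, "y": 4}]'
print(infer_dtype(json.loads(row)))  # prints List(Struct(x: Int64, y: Int64))
```

The point of the sketch is that json.loads returns a Python list of dicts, which has a natural nested columnar type; treating it as an opaque object is what leads the list builder into the unimplemented Object path.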

@Yevgnen
Author

Yevgnen commented May 16, 2022

Hi, thanks for the quick response.

I tried df.select(pl.col('text').apply(json.loads, pl.datatypes.List)), but it returns null:

shape: (1, 1)
┌──────┐
│ text │
│ ---  │
│ str  │
╞══════╡
│ null │
└──────┘

@Yevgnen
Author

Yevgnen commented May 17, 2022

Thanks for your work.

I have one more question. The following code works:

import json

import polars as pl

df = pl.DataFrame(
    {"text": ['[{"x": 1, "y": 2}]', '[{"x": 1, "y": 2}, {"x": 3, "y": 4}]']}
)
df.select(pl.col("text").apply(json.loads))

while the following code fails:

import json

import polars as pl

df = pl.DataFrame(
    {"text": ['[{"x": 1, "y": 2}, {"x": 3, "y": 4}]', "[]"]}
)
df.select(pl.col("text").apply(json.loads))

Is [] not supported?

@cjermain
Contributor

I can't reproduce the issue -- try using the latest release.

@Yevgnen
Author

Yevgnen commented May 23, 2022

This has been fixed in #3433, but the following code still does not work:

import json

import polars as pl

df = pl.DataFrame(
    {"text": ['[]', '[{"x": 1, "y": 2}, {"x": 3, "y": 4}]', '[{"x": 1, "y": 2}]']}
)
df.select(pl.col("text").apply(json.loads))

Note that the [] is the first element.

@cjermain
Contributor

Thanks for raising this. I think the issue is that polars is not correctly combining the schemas. Can you raise this as a separate issue so it gets more visibility?
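A minimal sketch of what "combining the schemas" could mean, assuming a hypothetical tuple-based dtype representation (infer_dtype, merge and column_dtype are illustrative names, not polars internals). Null is treated as "not known yet", so the empty list in the first row contributes List(Null), which then merges with the struct types seen in later rows instead of fixing the column type on the first row alone.

```python
import json


def infer_dtype(value):
    """Hypothetical: infer a nested dtype such as ("List", ("Struct", {...}))."""
    if value is None:
        return "Null"
    if isinstance(value, bool):  # bool before int: bool subclasses int
        return "Boolean"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        return "Utf8"
    if isinstance(value, list):
        inner = "Null"
        for item in value:  # merge every element, not just the first
            inner = merge(inner, infer_dtype(item))
        return ("List", inner)
    if isinstance(value, dict):
        return ("Struct", {k: infer_dtype(v) for k, v in value.items()})
    raise TypeError(f"unsupported JSON value: {value!r}")


def merge(a, b):
    """Combine two inferred dtypes; "Null" means 'unknown so far'."""
    if a == "Null":
        return b
    if b == "Null":
        return a
    if a == b:
        return a
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == "List":
        return ("List", merge(a[1], b[1]))
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == "Struct":
        fields = dict(a[1])  # copy so the input schema is not mutated
        for name, dtype in b[1].items():
            fields[name] = merge(fields.get(name, "Null"), dtype)
        return ("Struct", fields)
    raise TypeError(f"cannot merge {a!r} and {b!r}")


def column_dtype(rows):
    """Fold the per-row schemas of a column of JSON strings into one dtype."""
    dtype = "Null"
    for text in rows:
        dtype = merge(dtype, infer_dtype(json.loads(text)))
    return dtype


rows = ['[]', '[{"x": 1, "y": 2}, {"x": 3, "y": 4}]', '[{"x": 1, "y": 2}]']
print(infer_dtype(json.loads(rows[0])))  # ('List', 'Null')
print(column_dtype(rows))  # ('List', ('Struct', {'x': 'Int64', 'y': 'Int64'}))
```

Inferring from the first row alone would stop at ('List', 'Null') and carry no struct fields, which matches the reported symptom that only the "[] first" ordering fails. Conflicting scalar types (for example Int64 vs Float64) simply raise here; a real implementation would need a supertype rule.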

@Yevgnen
Author

Yevgnen commented May 23, 2022

Done! #3478
