LazyFrame.schema fails with "called `Option::unwrap()` on a `None` value" #16442

Closed
2 tasks done
douglas-raillard-arm opened this issue May 23, 2024 · 6 comments
Labels
A-panic Area: code that results in panic exceptions bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@douglas-raillard-arm
Contributor

douglas-raillard-arm commented May 23, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

This is not a true reproducer, but the snippet mirrors the structure of the offending code (irrelevant columns have been left out):

import polars as pl

df = pl.LazyFrame(dict(
    a=[None], # Null
    b=[1], # Int64
))

df = df.with_columns(pl.col('a').cast(pl.Categorical))
df = df.rename({'b': 'b2'})

print(df.schema)

Important notes

  • holding a reference to the LazyFrame from before with_columns() seems to hide the issue, so ref-counting is possibly involved (might be a coincidence)
  • If print(df.drop('b2').schema) or print(df.drop('a').schema) is used at the end, the issue goes away, so it's something about those columns specifically
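The per-column check described in the notes above can be automated when a failing LazyFrame is at hand, for example in a debugger session. The following is a minimal sketch, not part of Polars: the helper name `probe_column_drops` is hypothetical, and it only assumes that the object passed in offers `.drop()` and `.schema` like a LazyFrame does.

```python
from typing import Any, Dict, Iterable


def probe_column_drops(lf: Any, columns: Iterable[str]) -> Dict[str, str]:
    """Report, per column, whether `lf.drop(column).schema` succeeds.

    `lf` is expected to behave like a polars LazyFrame: only `.drop()` and
    `.schema` are used. The helper itself is hypothetical, not a Polars API.
    """
    results: Dict[str, str] = {}
    for name in columns:
        try:
            # Accessing the property is enough to trigger schema resolution.
            lf.drop(name).schema
        except (KeyboardInterrupt, SystemExit):
            # Never swallow interpreter-level control flow.
            raise
        except BaseException as exc:
            # pyo3's PanicException derives from BaseException, not Exception.
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = "ok"
    return results
```

Calling `probe_column_drops(df, ["a", "b2"])` on the failing frame would be expected to report "ok" for both columns if the observation in the notes holds, which narrows the problem to the combination of the two columns rather than either one alone.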

Log output

This is the output of our failing test. Multiple dataframes are manipulated, so the verbose log of polars probably shows a bunch of irrelevant operations:


dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
file < 128 rows, no statistics determined
no. of chunks: 1 processed by: 1 threads.
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
file < 128 rows, no statistics determined
no. of chunks: 1 processed by: 1 threads.
file < 128 rows, no statistics determined
no. of chunks: 1 processed by: 1 threads.
file < 128 rows, no statistics determined
no. of chunks: 1 processed by: 1 threads.
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
file < 128 rows, no statistics determined
no. of chunks: 1 processed by: 1 threads.
thread '<unnamed>' panicked at crates/polars-plan/src/logical_plan/optimizer/cluster_with_columns.rs:169:87:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0:     0x7fbaa4b9a5f8 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc65a86809eb3aa65
   1:     0x7fbaa2498feb - core::fmt::write::hcd5b8dd8febb96a0
   2:     0x7fbaa4b6913e - std::io::Write::write_fmt::hc422b42d0849f877
   3:     0x7fbaa4b9f659 - std::sys_common::backtrace::print::h286fd4354e2ba39e
   4:     0x7fbaa4b9ef69 - std::panicking::default_hook::{{closure}}::hf9d4b516f8220f92
   5:     0x7fbaa4ba0115 - std::panicking::rust_panic_with_hook::h124d9722759d43e1
   6:     0x7fbaa4b9f9aa - std::panicking::begin_panic_handler::{{closure}}::h1123a3c792c1da95
   7:     0x7fbaa4b9f939 - std::sys_common::backtrace::__rust_end_short_backtrace::h33bd6640824974d0
   8:     0x7fbaa4b9f926 - rust_begin_unwind
   9:     0x7fbaa1355082 - core::panicking::panic_fmt::hea6c49867823d75c
  10:     0x7fbaa1355154 - core::panicking::panic::hfd7eccb65c6169e0
  11:     0x7fbaa1355518 - core::option::unwrap_failed::h86dc8dafdfc76144
  12:     0x7fbaa4754edc - polars_plan::logical_plan::optimizer::optimize::hb3b0736923ab863b
  13:     0x7fbaa37cf70b - polars_lazy::frame::LazyFrame::schema::he0004db7eedc884a
  14:     0x7fbaa233ff7e - polars::lazyframe::PyLazyFrame::__pymethod_schema__::hc6a208d444c5bf17
  15:     0x7fbaa1b9ad67 - pyo3::impl_::trampoline::trampoline::h74d1ea8ded88a273
  16:           0x5aa099 - _PyEval_EvalFrameDefault
  17:           0x65970e - _PyFunction_Vectorcall
  18:           0x5e37a0 - <unknown>
  19:           0x5f11d9 - <unknown>
  20:           0x64c0b9 - <unknown>
  21:           0x65a258 - _PyObject_Call
  22:           0x5abb40 - _PyEval_EvalFrameDefault
  23:           0x657c78 - <unknown>
  24:           0x65a29f - _PyObject_Call
  25:           0x5abb40 - _PyEval_EvalFrameDefault
  26:           0x657c78 - <unknown>
  27:           0x65a29f - _PyObject_Call
  28:           0x5abb40 - _PyEval_EvalFrameDefault
  29:           0x65970e - _PyFunction_Vectorcall
  30:           0x65cfe8 - _PyObject_FastCallDictTstate
  31:           0x65d0da - _PyObject_Call_Prepend
  32:           0x73116d - <unknown>
  33:           0x65a258 - _PyObject_Call
  34:           0x5abb40 - _PyEval_EvalFrameDefault
  35:           0x657c78 - <unknown>
  36:           0x65a29f - _PyObject_Call
  37:           0x5abb40 - _PyEval_EvalFrameDefault
  38:           0x5ba119 - <unknown>
  39:           0x5a8ac1 - _PyEval_EvalFrameDefault
  40:           0x582182 - <unknown>
  41:           0x54bbab - <unknown>
  42:           0x5acceb - _PyEval_EvalFrameDefault
  43:           0x657856 - <unknown>
  44:           0x65a218 - _PyObject_Call
  45:           0x5abb40 - _PyEval_EvalFrameDefault
  46:           0x65970e - _PyFunction_Vectorcall
  47:           0x5e37a0 - <unknown>
  48:           0x5f11d9 - <unknown>
  49:           0x65a6a5 - _PyObject_MakeTpCall
  50:           0x5a7b8f - _PyEval_EvalFrameDefault
  51:           0x657c78 - <unknown>
  52:           0x65a29f - _PyObject_Call
  53:           0x5abb40 - _PyEval_EvalFrameDefault
  54:           0x657c78 - <unknown>
  55:           0x65a29f - _PyObject_Call
  56:           0x5abb40 - _PyEval_EvalFrameDefault
  57:           0x657c78 - <unknown>
  58:           0x65a29f - _PyObject_Call
  59:           0x5abb40 - _PyEval_EvalFrameDefault
  60:           0x659d0e - _PyFunction_Vectorcall
  61:           0x5e37a0 - <unknown>
  62:           0x5f11d9 - <unknown>
  63:           0x65a6a5 - _PyObject_MakeTpCall
  64:           0x5a7b8f - _PyEval_EvalFrameDefault
  65:           0x657c78 - <unknown>
  66:           0x65a29f - _PyObject_Call
  67:           0x5abb40 - _PyEval_EvalFrameDefault
  68:           0x65970e - _PyFunction_Vectorcall
  69:           0x65cfe8 - _PyObject_FastCallDictTstate
  70:           0x65d0da - _PyObject_Call_Prepend
  71:           0x73116d - <unknown>
  72:           0x65a6a5 - _PyObject_MakeTpCall
  73:           0x5a7b8f - _PyEval_EvalFrameDefault
  74:           0x65970e - _PyFunction_Vectorcall
  75:           0x65cfe8 - _PyObject_FastCallDictTstate
  76:           0x65d0da - _PyObject_Call_Prepend
  77:           0x73116d - <unknown>
  78:           0x65a258 - _PyObject_Call
  79:           0x5abb40 - _PyEval_EvalFrameDefault
  80:           0x65970e - _PyFunction_Vectorcall
  81:           0x65cfe8 - _PyObject_FastCallDictTstate
  82:           0x65d0da - _PyObject_Call_Prepend
  83:           0x73116d - <unknown>
  84:           0x65a6a5 - _PyObject_MakeTpCall
  85:           0x5a7b8f - _PyEval_EvalFrameDefault
  86:           0x65970e - _PyFunction_Vectorcall
  87:           0x65cfe8 - _PyObject_FastCallDictTstate
  88:           0x65d0da - _PyObject_Call_Prepend
  89:           0x73116d - <unknown>
  90:           0x65a6a5 - _PyObject_MakeTpCall
  91:           0x5a7b8f - _PyEval_EvalFrameDefault
  92:           0x65970e - _PyFunction_Vectorcall
  93:           0x65cfe8 - _PyObject_FastCallDictTstate
  94:           0x65d0da - _PyObject_Call_Prepend
  95:           0x73116d - <unknown>
  96:           0x65a6a5 - _PyObject_MakeTpCall
  97:           0x5a7b8f - _PyEval_EvalFrameDefault
  98:           0x710209 - PyEval_EvalCode
  99:           0x6e782b - <unknown>
 100:           0x6e78c6 - <unknown>
 101:           0x6e7a6f - <unknown>
 102:           0x6e8495 - _PyRun_SimpleFileObject
 103:           0x6ec4e7 - _PyRun_AnyFileObject
 104:           0x6dfec1 - Py_RunMain
 105:           0x6e013d - Py_BytesMain
 106:     0x7fbac9c11083 - __libc_start_main
                               at /build/glibc-e2p3jK/glibc-2.31/csu/../csu/libc-start.c:308:16
 107:           0x6878fe - _start
 108:                0x0 - <unknown>
FAILED
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

self = <tests.test_trace.TestNestedTraceView testMethod=test_time_range_subscript>

    def test_time_range_subscript(self):
        expected_duration = 4.0
    
>       trace = Trace(
            self.trace_path,
            plat_info=self.plat_info,
            events=self.events,
            normalize_time=False,
            parser=TxtTraceParser.from_txt_file,
        )[76.402065:80.402065]

tests/test_trace.py:435: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
lisa/trace.py:6012: in __init__
    view = self._view_from_user_kwargs(*args, **kwargs)
lisa/trace.py:6061: in _view_from_user_kwargs
    view = trace.get_view(**view_kwargs)
lisa/trace.py:2747: in get_view
    view = _TraceViewBase.make_view(self, **kwargs)
lisa/trace.py:3104: in make_view
    view = _PreloadEventsTraceView(
lisa/trace.py:3408: in __init__
    preloaded = trace._preload_events(events)
lisa/trace.py:3460: in _preload_events
    preloaded = self.base_trace._preload_events(mapping.keys())
lisa/trace.py:3131: in _preload_events
    return self.base_trace._preload_events(*args, **kwargs)
lisa/trace.py:5083: in _preload_events
    df_map = self._load_cache_raw_df(
lisa/trace.py:5618: in _load_cache_raw_df
    df_from_trace = self._load_raw_df(events_to_load)
lisa/trace.py:5834: in _load_raw_df
    df_map = self._parse_raw_events(events)
lisa/trace.py:5665: in _parse_raw_events
    with self._get_parser(events) as parser:
/usr/lib/python3.12/contextlib.py:137: in __enter__
    return next(self.gen)
lisa/trace.py:5457: in cm
    parser = self._parser(
lisa/utils.py:3589: in wrapper
    self = meth(**kwargs)
lisa/utils.py:2015: in wrapper
    return wrapped(*args, **kwargs)
lisa/trace.py:1524: in from_txt_file
    return cls(lines=f, **kwargs)
lisa/utils.py:3456: in __call__
    return cls._make_instance(*args, **kwargs)
lisa/utils.py:3589: in wrapper
    self = meth(**kwargs)
lisa/utils.py:3608: in _make_instance
    return super(cls.__class__, cls.__class__).__call__(cls, *args, **kwargs)
lisa/utils.py:2015: in wrapper
    return wrapped(*args, **kwargs)
lisa/trace.py:1426: in __init__
    events_df, skeleton_df, time_range, available_events = self._eagerly_parse_lines(
lisa/trace.py:1730: in _eagerly_parse_lines
    df = self._postprocess_df(
lisa/trace.py:1940: in _postprocess_df
    if isinstance(df.schema.get('span'), (pl.String, pl.Binary, pl.Categorical)):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <LazyFrame at 0x7FB96C7E0C20>

    @property
    def schema(self) -> OrderedDict[str, DataType]:
        """
        Get a dict[column name, DataType].
    
        Examples
        --------
        >>> lf = pl.LazyFrame(
        ...     {
        ...         "foo": [1, 2, 3],
        ...         "bar": [6.0, 7.0, 8.0],
        ...         "ham": ["a", "b", "c"],
        ...     }
        ... )
        >>> lf.schema
        OrderedDict({'foo': Int64, 'bar': Float64, 'ham': String})
        """
>       return OrderedDict(self._ldf.schema())
E       pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value

.lisa-venv-3.12/lib/python3.12/site-packages/polars/lazyframe/frame.py:452: PanicException
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /work/projects/lisa/.lisa-venv-3.12/lib/python3.12/site-packages/polars/lazyframe/frame.py(452)schema()
-> return OrderedDict(self._ldf.schema())
(Pdb) 

Issue description

Polars calls `Option::unwrap()` on a `None` value while resolving the schema of the LazyFrame, which panics.
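On the Python side, the panic surfaces as `pyo3_runtime.PanicException`, as the traceback above shows. That exception type derives from `BaseException` rather than `Exception`, so an ordinary `except Exception:` clause does not intercept it. The sketch below illustrates a defensive wrapper for calling code; the name `schema_or_none` is hypothetical, and matching on the class name is an assumption made so that the example does not need to import anything from pyo3. It is a stopgap for callers, not a fix for the underlying bug.

```python
from typing import Any, Optional


def schema_or_none(lf: Any) -> Optional[dict]:
    """Return dict(lf.schema), or None if resolving it raises a Rust panic.

    Hypothetical helper: `lf` only needs a `.schema` mapping property.
    """
    try:
        return dict(lf.schema)
    except Exception:
        # Ordinary Python errors are not hidden by this wrapper.
        raise
    except BaseException as exc:
        # Assumption: the panic type is identified by its class name.
        if type(exc).__name__ == "PanicException":
            return None
        # KeyboardInterrupt, SystemExit, etc. propagate unchanged.
        raise
```

Returning `None` keeps the caller running, at the cost of hiding the panic message, so logging the exception before returning may be preferable in real code.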

Expected behavior

df.schema should return the schema in all circumstances instead of panicking.

Installed versions

--------Version info---------
Polars:               0.20.28
Index type:           UInt32
Platform:             Linux-5.15.0-105-generic-x86_64-with-glibc2.31
Python:               3.12.3 (main, Apr 27 2024, 19:00:26) [GCC 9.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              16.1.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@douglas-raillard-arm douglas-raillard-arm added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels May 23, 2024
@stinodego stinodego added the A-panic Area: code that results in panic exceptions label May 23, 2024
@stinodego
Member

Thanks for the report. I cannot reproduce the panic with your code. Could you try to come up with an MRE?

@douglas-raillard-arm
Contributor Author

douglas-raillard-arm commented May 23, 2024

I'm currently trying, so far no success. I tried dropping the Null column, pickling and unpickling and the issue disappears.

Also found something fishy on the offending LazyFrame: dropping the Null column "span" hides the issue:

(Pdb) df.drop('span').schema
OrderedDict({'Time': Int64, '__comm': Categorical(ordering='physical'), '__pid': Int64, '__cpu': Int64, 'overutilized': Boolean})

but serializing the result then fails:

df.drop('span').serialize()
*** pyo3_runtime.PanicException: not implemented for dtype Null

which is the same error as the one raised by:

pl.LazyFrame(dict(a=[None])).serialize()

So somehow the Null column does not really disappear when it is dropped. This can be reproduced, so I'll open another ticket; I'm not sure whether it is related.

EDIT: I added a comment regarding that Null serialize problem on issue 15150

@cmdlineluser
Contributor

The error is coming from cluster_with_columns which is a new optimization.

thread '<unnamed>' panicked at crates/polars-plan/src/logical_plan/optimizer/cluster_with_columns.rs:169:87:

There are a couple of new issues just opened regarding this: #16436

(They are probably all the same underlying problem.)

@douglas-raillard-arm
Contributor Author

@stinodego I could not make an MRE, but I have a pdb session open with the issue so I can run some snippets on that dataframe if needed.

@coastalwhite
Collaborator

Fixed by #16443.

@douglas-raillard-arm
Contributor Author

I built the PR and I cannot reproduce the issue anymore, thanks!
