The columns directive for pl.read_ipc appears to have a regression in one of the recent updates; if a non-default column order is provided, the column names/data can load in an order that is neither the original order, nor the requested order. (A somewhat similar issue was previously fixed by #3591).
What are the steps to reproduce the behavior?
import polars as pl

df = pl.DataFrame(
    data=[
        ['x', 123, 4.5, 'misc'],
        ['y', 456, 10.0, 'other'],
        ['z', 789, 10.0, 'value'],
    ],
    columns=['a', 'b', 'c', 'd'],
)
print( df )
# ┌─────┬─────┬──────┬───────┐
# │ a   ┆ b   ┆ c    ┆ d     │
# │ --- ┆ --- ┆ ---  ┆ ---   │
# │ str ┆ i64 ┆ f64  ┆ str   │
# ╞═════╪═════╪══════╪═══════╡
# │ x   ┆ 123 ┆ 4.5  ┆ misc  │
# │ y   ┆ 456 ┆ 10.0 ┆ other │
# │ z   ┆ 789 ┆ 10.0 ┆ value │
# └─────┴─────┴──────┴───────┘

# save frame data to feather/ipc file
df.write_ipc( 'test.feather' )

# load back in requested (different) column order: data gets scrambled
dx = pl.read_ipc( 'test.feather', columns=['a','c','d','b'] )
print( dx )
# ┌─────┬───────┬─────┬──────┐
# │ a   ┆ c     ┆ d   ┆ b    │  << column *names* are in the requested order,
# │ --- ┆ ---   ┆ --- ┆ ---  │     but the associated column *data* is incorrect
# │ str ┆ str   ┆ i64 ┆ f64  │
# ╞═════╪═══════╪═════╪══════╡     col 'b' should have i64 data, not f64
# │ x   ┆ misc  ┆ 123 ┆ 4.5  │     col 'c' should have f64 data, not str
# │ y   ┆ other ┆ 456 ┆ 10.0 │     col 'd' should have str data, not i64
# │ z   ┆ value ┆ 789 ┆ 10.0 │
# └─────┴───────┴─────┴──────┘
What is the actual behavior?
Loaded column data is not correct.
What is the expected behavior?
Load the column data in the requested order.
dx=pl.read_ipc( 'test.feather', columns=['a','c','d','b'] )
print( dx )
# ┌─────┬──────┬───────┬─────┐
# │ a   ┆ c    ┆ d     ┆ b   │
# │ --- ┆ ---  ┆ ---   ┆ --- │
# │ str ┆ f64  ┆ str   ┆ i64 │
# ╞═════╪══════╪═══════╪═════╡
# │ x   ┆ 4.5  ┆ misc  ┆ 123 │
# │ y   ┆ 10.0 ┆ other ┆ 456 │
# │ z   ┆ 10.0 ┆ value ┆ 789 │
# └─────┴──────┴───────┴─────┘
Actually this still seems to be misbehaving (in a slightly different way?)
Using the same DataFrame as above ::
# save frame data to feather/ipc file in column-default order
df.write_ipc( 'test.feather' )

# load back in requested (different) column order
dx = pl.read_ipc( 'test.feather', columns=['a','c','d','b'] )
print( dx )
# ┌─────┬───────┬─────┬──────┐
# │ a   ┆ d     ┆ b   ┆ c    │  <<< columns not in requested order
# │ --- ┆ ---   ┆ --- ┆ ---  │      (though associated with the correct datatype)
# │ str ┆ str   ┆ i64 ┆ f64  │
# ╞═════╪═══════╪═════╪══════╡
# │ x   ┆ misc  ┆ 123 ┆ 4.5  │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ y   ┆ other ┆ 456 ┆ 10.0 │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ z   ┆ value ┆ 789 ┆ 10.0 │
# └─────┴───────┴─────┴──────┘
What language/platform are you using?
Python 3.9, macOS 12.4, Polars 0.13.46