polars can't read timestamp[s] typed columns in parquet files made by pyarrow
#2543
Comments
@jorgecarleitao is this maybe a conversion error in arrow2/parquet2?
import polars as pl
import pyarrow as pa
import pyarrow.parquet as pq

seconds = [1600000000, 1700000000]
# write a single timestamp[s] column with pyarrow
pt = pa.table([
    pa.array(seconds, type=pa.timestamp("s")),
], names=["datetime[s]"])
pq.write_table(pt, "test.parquet")

df = pl.read_parquet("test.parquet")
# undo the conversion done by polars (values are assumed to come back in milliseconds)
seconds_read = df.to_series().cast(int) // 1000
# print the per-row difference; every line should be 0
for a, b in zip(seconds_read, seconds):
    print(a - b)
assert seconds_read.to_list() == seconds
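The `// 1000` in the snippet above assumes polars hands the values back as milliseconds since the epoch. Parquet's TIMESTAMP logical type only defines millisecond, microsecond, and nanosecond units, so a `timestamp[s]` column cannot be stored at second resolution as-is, and some unit conversion has to happen on the write and read paths. Below is a minimal stdlib-only sketch of the arithmetic a correct round trip should satisfy. The helper names are hypothetical and not part of polars or pyarrow.

```python
from datetime import datetime, timezone

seconds = [1600000000, 1700000000]


def seconds_to_millis(values):
    """Scale epoch seconds up to epoch milliseconds (hypothetical helper)."""
    return [v * 1000 for v in values]


def millis_to_seconds(values):
    """Scale epoch milliseconds back down to epoch seconds (hypothetical helper)."""
    return [v // 1000 for v in values]


# What a lossless seconds -> milliseconds -> seconds round trip looks like.
millis = seconds_to_millis(seconds)
roundtrip = millis_to_seconds(millis)

# Render the values as UTC datetimes so a wrong unit is easy to spot by eye.
as_datetimes = [datetime.fromtimestamp(s, tz=timezone.utc) for s in roundtrip]

for original, restored, dt in zip(seconds, roundtrip, as_datetimes):
    print(original, restored, dt.isoformat())
```

If a reader applies the scale factor twice, or not at all, the rendered dates land far outside the expected years, which is usually quicker to notice than comparing raw integers.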
Writing / reading arrow2
Writing and reading timestamp with:
import polars as pl

seconds = [1600000000, 1700000000]
df = pl.DataFrame({
    "time": seconds
}).with_column(pl.col("time").cast(pl.Datetime))
# write with polars, then read the same file back
df.to_parquet("test.parquet")
df = pl.read_parquet("test.parquet")
assert df.to_series().cast(int).to_list() == seconds
looking into it
Done in jorgecarleitao/arrow2#803. Thanks for the ping!
Thanks for the fix!😉
This was referenced Feb 6, 2022
Are you using Python or Rust?
Python
What version of polars are you using?
polars-0.12.20
What operating system are you using polars on?
Linux (Debian 11)
Describe your bug.
Polars cannot accurately read datetimes from Parquet files created with timestamp[s] columns in pyarrow. I have not been able to determine whether this is a problem in polars or in arrow2, but since several Parquet readers other than polars do not have the problem, I do not think it lies in the official Arrow library. I apologize if this is not the appropriate repository for this report.
What are the steps to reproduce the behavior?
What is the expected behavior?
The timestamps should be read back as the exact times that were written, as other tools such as pyarrow do.
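One way to test this expectation is to compare the raw integers read back against the seconds that were written, and see whether a single unit factor explains them. The sketch below assumes the raw epoch integers have already been pulled out of the frame (for example with a cast to int, as in the reproduction snippet in this thread). `UNIT_FACTORS` and `infer_unit` are hypothetical names used only for illustration.

```python
# Factors that map epoch seconds onto each candidate time unit.
UNIT_FACTORS = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def infer_unit(expected_seconds, observed):
    """Return the unit whose factor maps every expected second onto the
    observed value, or None if no single factor fits (hypothetical helper)."""
    if len(expected_seconds) != len(observed):
        return None
    for unit, factor in UNIT_FACTORS.items():
        if all(o == e * factor for e, o in zip(expected_seconds, observed)):
            return unit
    return None


expected = [1600000000, 1700000000]

# Values that are exactly the expected seconds scaled to milliseconds.
unit_for_millis = infer_unit(expected, [1600000000000, 1700000000000])

# Values that match no clean factor point at a real conversion error.
unit_for_garbage = infer_unit(expected, [1600000000123, 1700000000000])

print(unit_for_millis, unit_for_garbage)
```

A result of `None` means the values were not merely reported in a different unit, which is the case worth reporting upstream.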