'read_csv' returns wrong number of records #6865
Comments
Thanks for the bug report. Just to clarify: from the obfuscated data, it appears that you are loading logged/multi-line SQL queries? As a temporary workaround, you could do the following to get the data into polars:

import csv
import polars as pl

# newline="" lets the csv module handle newlines embedded in quoted fields
with open("issue.csv", "r", newline="") as f:
    iter_csv = csv.reader(f)
    columns = next(iter_csv)  # the first record is the header
    df = pl.DataFrame(
        data=iter_csv,
        schema=columns,
    )
# ┌──────────┬────────────────────┬─────────────────────┐
# │ type ┆ dbname ┆ dump │
# │ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ str │
# ╞══════════╪════════════════════╪═════════════════════╡
# │ database ┆ connections-prod ┆ -- │
# │ ┆ ┆ x x x.x_x_x_x_x_x x │
# │ ┆ ┆ x x.x, │
# │ ┆ ┆ ... │
# │ database ┆ content-prod ┆ -- │
# │ ┆ ┆ -- x x x │
# │ ┆ ┆ -- │
# │ ┆ ┆ │
# │ ┆ ┆ -- x x x x 11.16... │
# │ database ┆ notifications-prod ┆ -- │
# │ ┆ ┆ -- x x x │
# │ ┆ ┆ -- │
# │ ┆ ┆ │
# │ ┆ ┆ -- x x x x 11.16... │
# │ database ┆ users-prod ┆ -- │
# │ ┆ ┆ -- x x x │
# │ ┆ ┆ -- │
# │ ┆ ┆ │
# │ ┆ ┆ -- x x x x 11.16... │
# └──────────┴────────────────────┴─────────────────────┘
Yes, these are multi-line SQL queries generated by pg_dumpall.
Right, polars succeeds if we set n_threads=1. This does work:
pl.read_csv("/home/ritchie46/Downloads/issue.csv", n_threads=1)
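To check that the single-threaded read really returns every record, the polars row count can be compared against a baseline from the standard csv module. The following is only a rough sketch: the helper names `count_records_stdlib` and `count_records_polars` are made up for illustration, and it assumes the file has a single header row.

```python
import csv


def count_records_stdlib(path: str) -> int:
    """Count data records (header excluded) with the quoting-aware csv module."""
    # newline="" keeps newlines inside quoted fields intact for the parser.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for _ in reader)


def count_records_polars(path: str, n_threads: int = 1) -> int:
    """Count rows as read by polars; n_threads=1 is the workaround from this thread."""
    # Imported here so the stdlib baseline stays usable without polars installed.
    import polars as pl

    return pl.read_csv(path, n_threads=n_threads).height
```

If `count_records_polars("issue.csv")` equals `count_records_stdlib("issue.csv")`, the workaround is reading the same number of records as the reference parser.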
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
I compared the result with Python's standard csv library, and the number of records read by polars did not match.
The attached file is actually a database dump that I have already obscured, but the output is still the same.
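For context, a file like this has many more physical lines than CSV records, because each quoted dump field spans several lines. The sketch below illustrates that gap with the standard library only. `SAMPLE` is a small synthetic stand-in, not the attached file, and the function names are made up for illustration. It does not reproduce the polars behaviour itself; it only shows why a reader that is not quoting-aware at every line boundary could arrive at a different count.

```python
import csv
import io

# Synthetic stand-in for the attached dump: two records, each with a
# quoted multi-line third field (assumed shape, not the real data).
SAMPLE = (
    "type,dbname,dump\n"
    'database,connections-prod,"--\nselect 1;\n"\n'
    'database,users-prod,"--\n-- comment\n\nselect 2;"\n'
)


def count_physical_lines(text: str) -> int:
    """Count lines as a naive newline split would, ignoring CSV quoting."""
    return len(text.splitlines())


def count_csv_records(text: str) -> int:
    """Count data records (header excluded) with the quoting-aware parser."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)


print(count_physical_lines(SAMPLE))  # 8 physical lines, header included
print(count_csv_records(SAMPLE))     # 2 data records
```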
issue.csv
Reproducible example
Output:
---Version info---
Polars: 0.16.4
Index type: UInt32
Platform: Linux-5.15.0-60-generic-x86_64-with
Python: 3.9.16 (main, Dec 10 2022, 13:47:19)
[GCC 10.3.1 20210424]
---Optional dependencies---
pyarrow:
pandas:
numpy:
fsspec:
connectorx:
xlsx2csv:
deltalake:
matplotlib: