Strings with commas and >= 64 character length breaks quoting with write_csv #3232

Closed
miXwui opened this issue Apr 26, 2022 · 0 comments · Fixed by jorgecarleitao/arrow2#965
Labels
bug Something isn't working

miXwui commented Apr 26, 2022

What language are you using?

Tested on both Python and Node.js

Have you tried latest version of polars?

Yes

What version of polars are you using?

Python: 0.13.25
Node.js: 0.4.1

What operating system are you using polars on?

Fedora 36 Beta/Linux

What language version are you using?

Python 3.10.4
Node v16.13.1

Describe your bug.

Any string that is 64 or more characters long and contains a comma (,) is quoted incorrectly by write_csv: the opening quote is dropped while the closing quote is still written.

What are the steps to reproduce the behavior?

import polars as pl

# >= 64 character string with comma breaks
df_64_char = pl.DataFrame(
    {
        "col1": ["foo"],
        "col2": ["bar,123456789012345678901234567890123456789012345678901234567890"],
    }
)

df_64_char.write_csv("broken_64_output.csv")

What is the actual behavior?

col1,col2
foo,bar,123456789012345678901234567890123456789012345678901234567890"
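
To show why this output is malformed, here is a minimal sketch that parses the line above with Python's standard csv module (not polars). The variable names are just for illustration. Because the opening quote is missing, the comma inside the value is treated as a field separator, so the record ends up with three fields under a two-column header.

```python
import csv
import io

digits = "1234567890" * 6  # the 60-digit tail used in the report

# The output reported above: the opening quote is missing, the closing one is not.
broken = "col1,col2\nfoo,bar," + digits + '"\n'

header, record = list(csv.reader(io.StringIO(broken)))

# The header has two columns, but the data row splits into three fields,
# and the stray closing quote ends up inside the last field.
print(len(header), len(record))
print(record)
```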

What is the expected behavior?

col1,col2
foo,"bar,123456789012345678901234567890123456789012345678901234567890"

Strings containing commas that are 63 characters or shorter are quoted correctly:

import polars as pl

# <= 63 character string works
df_63_char = pl.DataFrame(
    {
        "col1": ["foo"],
        "col2": ["ba,123456789012345678901234567890123456789012345678901234567890"],
    }
)

df_63_char.write_csv("working_63_output.csv")

working_63_output.csv

col1,col2
foo,"ba,123456789012345678901234567890123456789012345678901234567890"

Converting the polars DataFrame to a pandas DataFrame and then exporting to CSV via pandas also produces correct output:

import polars as pl
import pandas as pd

df_64_char = pl.DataFrame(
    {
        "col1": ["foo"],
        "col2": ["bar,123456789012345678901234567890123456789012345678901234567890"],
    }
)

dfp = df_64_char.to_pandas()
dfp.to_csv("pandas_working_64_output.csv", index=False)

pandas_working_64_output.csv

col1,col2
foo,"bar,123456789012345678901234567890123456789012345678901234567890"

What do you think polars should have done?

Correctly quote strings of any length, including those that contain any number of commas.
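
As a rough reference for that behavior, the sketch below uses only Python's standard csv module, not polars. It writes comma-containing values of several lengths around the 64-character boundary with minimal quoting and checks that each one reads back unchanged. The helper names write_minimal and check_lengths are hypothetical and exist only for this example. A regression test for write_csv could presumably apply the same round-trip check to the file polars writes.

```python
import csv
import io


def write_minimal(rows):
    """Serialize rows with the stdlib writer, quoting only where needed."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def check_lengths(lengths):
    """Map each length to whether a comma-containing value of that length round-trips."""
    results = {}
    for n in lengths:
        value = "a," + "x" * (n - 2)  # n characters in total, one comma
        rows = [["col1", "col2"], ["foo", value]]
        text = write_minimal(rows)
        results[n] = list(csv.reader(io.StringIO(text))) == rows
    return results


# Lengths on both sides of the 64-character boundary from this report.
print(check_lengths(range(62, 67)))
```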
