feature request: allow NaN handling to be specified per column in read_csv() #20877

jowagner · 2018-04-30T09:41:11Z

Setting na_values=[], keep_default_na=False seems to be the way to go to read data with string columns. (The default behaviour is to stay, according to a comment in issue #15669.) However, if the data also contains number columns, the user may want to process NaNs in those columns, for example:

import pandas
import io
pandas.read_csv(io.StringIO("""col1,col2
1.23,NA
NA,NB
"""), dtype=str, na_values=[], keep_default_na=False)

   | col1 | col2
-- | ---- | ----
 0 | 1.23 | NA
 1 | NA   | NB

The parameters should be extended so that one can specify the NaN treatment for each column, or better, for subsets of columns. I see that @HHest also made this suggestion in a comment in issue #15669.

WillAyd · 2018-04-30T18:23:31Z

Unless I'm missing something in your request, this is already supported and mentioned in the read_csv documentation:

In [3]: import pandas
   ...: import io
   ...: pandas.read_csv(io.StringIO("""col1,col2
   ...: 1.23,NA
   ...: NA,NB
   ...: """), dtype=str, na_values={'col1': ['NA'], 'col2': []},
   ...:     keep_default_na=False)

Out[3]: 
   col1 col2
0  1.23   NA
1   NaN   NB
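
For the mixed case raised in the original request (a number column next to a string column), the same dict form of na_values can be built for just a subset of columns. The following is a minimal sketch, not an official recipe: the names CSV_TEXT and numeric_cols are made up for illustration, and it assumes that columns left out of the dict get no NA markers at all once keep_default_na=False is set.

```python
import io

import pandas as pd

# Same sample data as in the thread.
CSV_TEXT = """col1,col2
1.23,NA
NA,NB
"""

# Hypothetical list of the columns whose "NA" strings should become NaN.
numeric_cols = ["col1"]

# Build the per-column mapping only for that subset. col2 is not listed,
# and keep_default_na=False disables the built-in marker list, so the
# string "NA" in col2 is kept as ordinary text.
na_map = {name: ["NA"] for name in numeric_cols}

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    na_values=na_map,
    keep_default_na=False,
)

print(df["col1"].isna().tolist())  # [False, True]
print(df["col2"].tolist())         # ['NA', 'NB']
```

Because no dtype is forced here, col1 is parsed as a numeric column with a missing value in the second row, while col2 keeps both of its strings.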

jowagner · 2018-05-01T06:52:43Z

You are right. Thanks.

At least now we have your code example here on a page with keywords that people are likely to search when they have the same problem.

I'll add clarification to my documentation issue #20875.

jowagner closed this as completed May 1, 2018

OzzyXu mentioned this issue Jan 1, 2024

Ticker name "NA" makes the exists_qlib_data function report errors. microsoft/qlib#1720
