write_csv (and any other in the write_delim family) can't handle NaN values #1082
I don't think the readr behavior is likely to change. However, you can convert the NaNs to normal NAs yourself prior to writing if this is the behavior you prefer.

``` r
dat <- tibble::tibble(
  a = c(NA, 1:10),
  b = c(NA, 11:20),
  c = c(NA, 21:25, NaN, 27:30),
  d = c(NA, 31:35, NA, 37:40)
)

dat[] <- lapply(dat, function(x) { x[is.nan(x)] <- NA; x })
dat
#> # A tibble: 11 x 4
#>        a     b     c     d
#>    <int> <int> <dbl> <int>
#>  1    NA    NA    NA    NA
#>  2     1    11    21    31
#>  3     2    12    22    32
#>  4     3    13    23    33
#>  5     4    14    24    34
#>  6     5    15    25    35
#>  7     6    16    NA    NA
#>  8     7    17    27    37
#>  9     8    18    28    38
#> 10     9    19    29    39
#> 11    10    20    30    40
```

Created on 2020-03-13 by the reprex package (v0.3.0)
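For readers who prefer a tidyverse style, the same NaN-to-NA conversion can be written with `mutate(across(...))`. This is a sketch of an equivalent workaround, not part of the original comment; it assumes dplyr >= 1.0 for `across()`, and restricts the replacement to numeric columns because `is.nan()` errors on non-numeric input:

``` r
library(dplyr)

dat <- tibble::tibble(
  c = c(NA, 21:25, NaN, 27:30),
  label = letters[1:11]
)

# Replace NaN with NA in every numeric column, leaving others untouched
dat <- dat %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.nan(.x), NA)))
```

After this, `write_csv(dat, ...)` emits the `na` placeholder for the former NaN cells just as it does for ordinary NAs.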
Of course it's possible to work around this. But that assumes at least one extra step prior to every write operation: either just going ahead and doing an adjustment pass over the whole data like you described, or first checking whether any fix is needed with something like

``` r
any(vapply(dat, function(x) any(is.nan(x)), logical(1)))
# or
purrr::map_lgl(dat, ~ any(is.nan(.x))) %>% any()
```

Given that there would be additional steps if the above returns TRUE, in most cases I may as well just run something like your lapply code regardless. But if I as a user end up needing to consistently write custom code so another function can be trustworthy and viable, isn't that a strong argument that the function has something that needs addressing?

I'm guessing that it has something to do with NaN being an ISO-defined type, the fact that this is all being passed to C for speed, and a situation where internally implementing something like what you described prior to the `.Call()` would undermine the speed case. If so, I get the thinking. The additional speed is nice, as are the other features like sensible defaults and pipe suitability. Still, it's just really strange to see an R function trying to improve on a basic workflow task while introducing such a low-level deviation from R-like behavior (and, more practically, deviating from the explicit behavior of its predecessor). It's a deviation that isn't even documented: I had to do my own manual investigation when my files weren't readable by another program (Mplus in my case). At the absolute least, surely someone could cut down on future users' confusion with an update to the help page?
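One way to avoid repeating that check and fix before every write is a small user-side wrapper that sanitizes NaNs and then delegates to `write_csv`. The helper name `write_csv_no_nan` is hypothetical; this is a sketch of the workaround, not anything readr provides:

``` r
library(readr)

# Hypothetical helper: convert NaN to NA in all numeric columns,
# then hand off to readr::write_csv with any extra arguments.
write_csv_no_nan <- function(x, path, ...) {
  x[] <- lapply(x, function(col) {
    if (is.numeric(col)) {
      col[is.nan(col)] <- NA
    }
    col
  })
  readr::write_csv(x, path, ...)
}

dat <- tibble::tibble(a = c(1, NaN, 3))
write_csv_no_nan(dat, tempfile(fileext = ".csv"))
```

The cost is a full pass over the data on every write, which is exactly the speed trade-off discussed above.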
Hi, working with readr version 1.3.1. In this and previous versions, I've found that whenever there are NaNs in the data, `write_csv` presents no options for handling them. They are always written out as NaN, no matter what is passed to the parameter `na`.

Within R, of course, NaNs function as a type of NA, and can sometimes emerge organically from normal operations where "true" NAs are present, as shown in the reprex. The complementary reading functions within R (`read_csv` and `read.csv`) can recognize the NaNs and thus maintain them and any expected behavior, but this presents problems for moving data to any outside environment where the user would expect that data to be marked as missing.

I could see preserving NaNs for some users if that avoids information loss in their data, but I'd love to at least have a choice of behavior. And if I'm being opinionated, I think the reasonable default behavior would handle NaNs uniformly with all the other classes/types of NA, fitting the expectations of most users who are reasoning from the fact that `is.na(NaN)` evaluates to `TRUE`.