read_csv() creates "spec" attributes which don't get updated nor lost by some tibble transformations #934
Comments
As for the "making a reprex" side of this ... I would define a small tibble inline:
library(tidyverse)
tbl <- tibble(
A = c("a", "b", "c"),
B = 1:3
)
## or
tbl <- tribble(
~A, ~B,
"a", 1,
"b", 2,
"c", 3
)
For larger tibbles, there are nicer ways than
x <- read_csv(readr_example("mtcars.csv"))
#> Parsed with column specification:
#> cols(
#> mpg = col_double(),
#> cyl = col_double(),
#> disp = col_double(),
#> hp = col_double(),
#> drat = col_double(),
#> wt = col_double(),
#> qsec = col_double(),
#> vs = col_double(),
#> am = col_double(),
#> gear = col_double(),
#> carb = col_double()
#> )
x2 <- x %>%
select(mpg, cyl, disp) %>%
head(3)
Yes, it is true that attributes defined at read time persist after data manipulation.
str(x2)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 3 variables:
#> $ mpg : num 21 21 22.8
#> $ cyl : num 6 6 4
#> $ disp: num 160 160 108
#> - attr(*, "spec")=
#> .. cols(
#> .. mpg = col_double(),
#> .. cyl = col_double(),
#> .. disp = col_double(),
#> .. hp = col_double(),
#> .. drat = col_double(),
#> .. wt = col_double(),
#> .. qsec = col_double(),
#> .. vs = col_double(),
#> .. am = col_double(),
#> .. gear = col_double(),
#> .. carb = col_double()
#> .. )
datapasta and deparse both offer ways to get nice code for
x2_tribble_source <- datapasta::tribble_construct(x2)
cat(x2_tribble_source)
#> tibble::tribble(
#> ~mpg, ~cyl, ~disp,
#> 21, 6, 160,
#> 21, 6, 160,
#> 22.8, 4, 108
#> )
x2_df_source <- datapasta::df_construct(x2)
cat(x2_df_source)
#> data.frame(
#> mpg = c(21, 21, 22.8),
#> cyl = c(6, 6, 4),
#> disp = c(160, 160, 108)
#> )
Created on 2018-12-04 by the reprex package (v0.2.1.9000)
Thank you very much Jenny for your thoughts.
Of course. Me too. The only reason I used
The problem wasn't the size of the tibble, but using
Ah! Great! I was playing with
deparse::deparsec(tbl)
#> tibble(A = c("a", "b", "c"), B = 1:3)
is certainly a lot nicer than my previous
Definitely updating my workshop right now! All that said, and even if, with those great alternatives to
Anyway, thank you very sincerely. This has been very helpful for me.
Tibble methods preserving additional attributes is very new; it seems to have been added in tidyverse/tibble@2cabe6d#diff-ccca386aac53cf0029fb15ebff8901d5, which is not yet on CRAN. The original behavior was that they were lost as soon as you performed a manipulation. Anyway, spec is meant to store how the data was originally read by readr, not how it currently looks, so even if it is preserved by further manipulations I don't think there is an issue.
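To illustrate the point that spec describes how the data was read rather than how it currently looks, here is a minimal sketch using readr's spec() accessor on the bundled mtcars.csv example from earlier in this thread. The variable names x and s are just for illustration, and whether the attribute survives later manipulation depends on the installed tibble and readr versions, so this only inspects the freshly read object.

```r
library(readr)

# Read the example file that ships with readr (same file as earlier in the thread).
x <- read_csv(readr_example("mtcars.csv"))

# spec() returns the column specification recorded at read time.
s <- spec(x)

# The specification lists every column that was parsed from the file,
# regardless of what is later selected or renamed.
names(s$cols)
```

The same object is what str() prints under attr(*, "spec").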
Continuing with the "reprex tips" re: workshop, both
read.csv(text = "A,B\na,1\nb,2\nc,3")
#> A B
#> 1 a 1
#> 2 b 2
#> 3 c 3
readr::read_csv("A,B\na,1\nb,2\nc,3")
#> # A tibble: 3 x 2
#> A B
#> <chr> <dbl>
#> 1 a 1
#> 2 b 2
#> 3 c 3
Created on 2018-12-05 by the reprex package (v0.2.1.9000)
You might want to favour datapasta over deparse, because datapasta is on CRAN.
This is going truly far afield, but you can use
readr::read_csv(glue::trim("
A,B
a,1
b,2
c,3
"))
#> # A tibble: 3 x 2
#> A B
#> <chr> <dbl>
#> 1 a 1
#> 2 b 2
#> 3 c 3
Created on 2018-12-05 by the reprex package (v0.2.1)
To ensure the spec is dropped once they are subset. Fixes #934
Weird... here is the output of
tibble * 1.4.2 2018-01-22 [2] CRAN (R 3.5.0)
I don't think I am running the devel version... Anyway, it was not important as you said and thanks for fixing it!
Thank you Jenny for the additional tips.
Both are amazing. But I was thinking of using deparse for my workshop actually... After playing with both for a bit, I thought that it was particularly clean and simple. Of all the options, this was the one I had settled on:
tbl <- tibble::tibble(
A = c("a", "b", "c"),
B = 1:3
)
deparse::deparsec(tbl)
#> tibble(A = c("a", "b", "c"), B = 1:3)
I won't use this for my workshop, but this can be handy to create toy examples quickly. Thanks 🙂
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
read_csv() (and probably the other read_delim functions, though I haven't tested it) creates "spec" attributes which do not get updated nor lost when the tibble gets transformed and thus can end up being in total mismatch with the tibble they are associated with.
This does not create any problem, but it makes for very weird outputs to str() and creates unnecessarily lengthy outputs to dput().
Example:
I read a .csv file with read_csv() to create a tibble. I change its names, I select one variable out, I replace the remaining variables with vectors of different types and I end up with this tibble:
So now, this is my tibble:
And it gives these str() and dput() outputs:
(Now the "spec" attribute has a wrong number of variables, wrong variable names, and wrong variable types 🤣 )
(That's a long dput() output for such a simple tibble 😳 )
Of course, using the result of dput() to create the object in the first place in this example makes it look very circular and kind of silly, but I have to do that to demonstrate the idea without forcing you to download a .csv file or create one. But this is not just a theoretical point: a real life and very common scenario where this kicks in is this:
You create a tibble with read_csv(), you want to create a reprex, so you transform your tibble to make it very simple, then you use dput() to create the data of your very basic tibble and you end up with a ton of silly attributes in the output. Of course, you can simply get rid of the "spec" attribute from the result of dput() and all is good:
gives the same tibble and it is easy enough to do. And even if you don't, those wrong attributes don't get in the way of anything. So not a big deal. But because it is somewhat hard to get new users to create good reproducible examples, it is a little quirk that doesn't help with explaining dput().
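As a possible alternative to hand-editing the dput() output, the "spec" attribute can be removed from the object itself before calling dput(). The sketch below is not the code from the issue: it assumes a small inline CSV (the same literal used elsewhere in this thread) and a hypothetical variable name tbl. Depending on the readr version, other attributes or classes added by readr may still appear in the dput() output.

```r
library(readr)

# Build a small tibble from literal CSV text (stand-in for a real file).
tbl <- read_csv("A,B\na,1\nb,2\nc,3")

# Drop the read-time column specification; the data itself is untouched.
attr(tbl, "spec") <- NULL

# The "spec" entry no longer appears in the deparsed object.
dput(tbl)
```

Setting an attribute to NULL is plain base R, so this works the same way on any object carrying a "spec" attribute.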