I'd like to be able to add an ID column to the output of map_dfr that's not defined by the names of the result, but rather is specified in the .id argument.
This was inspired by a Stack Overflow question in which the OP wants the min and max of each column of a data frame, returned as rows, with a column labeling the min row and the max row.
library(purrr)
library(dplyr, warn.conflicts=FALSE)
## this is the idea - but I wish we didn't need the `mutate`
map_dfr(mtcars, ~c(min(.), max(.))) %>%
  mutate(stat= c("min", "max")) %>%
  select(stat, everything())
#> # A tibble: 2 x 12
#>   stat    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 min    10.4     4  71.1    52  2.76  1.51  14.5     0     0     3     1
#> 2 max    33.9     8 472     335  4.93  5.42  22.9     1     1     5     8

## I would love if a named list given to `.id` would add values
## unfortunately, the .id argument seems to be silently ignored in this case
map_dfr(mtcars, ~c(min(.), max(.)), .id=list(stat= c("min", "max")))
#> # A tibble: 2 x 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  10.4     4  71.1    52  2.76  1.51  14.5     0     0     3     1
#> 2  33.9     8 472     335  4.93  5.42  22.9     1     1     5     8

## This surprised me - if `.f` returns a named vector, the output is long, not wide. Undocumented feature?
map_dfr(mtcars, ~c("min"= min(.), "max"= max(.)))
#> # A tibble: 11 x 2
#>      min    max
#>    <dbl>  <dbl>
#>  1 10.4   33.9
#>  2  4      8
#>  3 71.1  472
#>  4 52    335
#>  5  2.76   4.93
#>  6  1.51   5.42
#>  7 14.5   22.9
#>  8  0      1
#>  9  0      1
#> 10  3      5
#> 11  1      8

## When `.f` returns a named vector, the `.id` arg is necessary to make sense of the output
map_dfr(mtcars, ~c("min"= min(.), "max"= max(.)), .id="stat")
#> # A tibble: 11 x 3
#>    stat    min    max
#>    <chr> <dbl>  <dbl>
#>  1 mpg   10.4   33.9
#>  2 cyl    4      8
#>  3 disp  71.1  472
#>  4 hp    52    335
#>  5 drat   2.76   4.93
#>  6 wt     1.51   5.42
#>  7 qsec  14.5   22.9
#>  8 vs     0      1
#>  9 am     0      1
#> 10 gear   3      5
#> 11 carb   1      8
This starts to feel a bit like code golf, in the sense that I'm not sure how easy this code would be to revisit and modify. But here is one more way to get what you want:
library(tidyverse)
list(min=min, max=max) %>%
  map_dfr(~ map_dfc(mtcars, .x), .id="stat")
#> # A tibble: 2 x 12
#>   stat    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 min    10.4     4  71.1    52  2.76  1.51  14.5     0     0     3     1
#> 2 max    33.9     8 472     335  4.93  5.42  22.9     1     1     5     8
Created on 2020-08-26 by the reprex package (v0.3.0.9001)
If you want one row per summary stat, then what you're really mapping over is min() and max().
The convention of using .id to extract column values from names is now implemented pretty deeply in vctrs, and I don't think we want to expand that convention at the moment.
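For readers following along, here is a minimal sketch of that names-to-.id convention, using dplyr::bind_rows() directly rather than map_dfr(). The variable names stats and result are only illustrative, and building the one-row data frames with lapply() is an assumption made for the example, not something proposed in this thread.

```r
library(dplyr, warn.conflicts = FALSE)

# Two one-row data frames stored in a named list. The list names
# ("min" and "max") are the only source of values for the `.id` column.
stats <- list(
  min = as.data.frame(lapply(mtcars, min)),
  max = as.data.frame(lapply(mtcars, max))
)

# `.id = "stat"` creates a column called `stat` filled from names(stats).
result <- bind_rows(stats, .id = "stat")
result$stat
```

Because the labels come from the names of the input, changing which summaries appear (or how they are labeled) means changing the named list, not the .id argument.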
Created on 2020-08-26 by the reprex package (v0.3.0)
I think it would be intuitive and useful if specifying .id = list(stat = c("min", "max")) created a column named stat taking on the values "min" and "max".