map_dfr() column binds vectors #472

Closed
garrettgman opened this issue Mar 9, 2018 · 8 comments · Fixed by #912
Labels
feature (a feature request or enhancement), map 🗺️

@garrettgman
Member

map_dfr() does the same thing as map_dfc() when given a list of vectors: it column binds the vectors into a data frame.

library(purrr)
list_of_vecs <- list(a = c(1, 1, 1, 1), b = c(2, 2, 2, 2), c = c(3, 3, 3, 3))
list_of_vecs %>% map_dfr(~ .x)
## # A tibble: 4 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     2     3
## 2     1     2     3
## 3     1     2     3
## 4     1     2     3

I'd expect map_dfr() to row bind the vectors into a data frame with one long column, or to throw an error.
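
One way to get the "one long column" result described above is sketched here. It assumes purrr 1.0.0 or later, which provides list_rbind(); the column names `name` and `value` are illustrative choices, not something from this thread.

```r
library(purrr)
library(tibble)

list_of_vecs <- list(a = c(1, 1, 1, 1), b = c(2, 2, 2, 2), c = c(3, 3, 3, 3))

# Wrap each vector in a one-column tibble so that every element is a data frame.
pieces <- map(list_of_vecs, ~ tibble(value = .x))

# Row bind the pieces; names_to stores the list names in a new column.
long <- list_rbind(pieces, names_to = "name")
long
```

Because every element is made a data frame explicitly, the row binding no longer depends on how bare vectors are interpreted.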

@jennybc
Member

jennybc commented Mar 9, 2018

Past thread re: difficulty of row binding: #179

hadley added the feature and map 🗺️ labels on May 5, 2018

@lionel-
Member

lionel- commented Nov 29, 2018

> I'd expect map_dfr() to row bind the vectors into a data frame with one long column, or to throw an error.

Perhaps taking each vector as a row would be more natural? They'd need consistent internal names.

Right now they are taken as columns because bind_rows() and bind_cols() have somewhat sloppy semantics and interpret lists as data frames. This will need to wait on the new vctrs-based tools.

lionel- added this to the vctrs milestone on Nov 29, 2018

@jennybc
Member

jennybc commented Nov 29, 2018

tibble is also in a holding pattern on a related matter. I might add a discussion of this to my existing slot in Monday's group meeting.

@CottonRockwood

My expectation would be that map_dfr() would row bind the results if they are vectors, while map_dfc() would combine the vectors as columns.

DavisVaughan mentioned this issue on Jun 18, 2019

@hadley
Member

hadley commented Jan 24, 2020

Should switch to vec_rbind() and vec_cbind() and see what breaks.
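
For reference, a minimal sketch of how vctrs::vec_rbind() treats atomic vectors. It illustrates only the vctrs function on its own; it is not the change that was eventually made in purrr, and the object names `rows` and `ragged` are illustrative.

```r
library(vctrs)

# Named atomic vectors are treated as rows, matched up by their inner names.
rows <- vec_rbind(c(a = 1, b = 2), c(a = 3, b = 4))
rows

# When the inner names differ, the missing cells are filled with NA.
ragged <- vec_rbind(c(a = 1), c(b = 2))
ragged
```

This matches the earlier suggestion in the thread that each vector could be taken as a row, provided the vectors have consistent internal names.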

@leungi

leungi commented Apr 13, 2020

Faced similar issues when trying to get a tidy tibble from a vector of file paths off fs::dir_ls().

Ended up with purrr::map() + tibble::enframe() + tidyr::unnest() combo.

> library(dplyr)
> 
> file_ls <- fs::dir_ls("./raw_data/Guidelines/",
 regexp = "*.md"
 )
> 
> file_ls %>%
 .[1:2] %>%
 purrr::map_dfr(readr::read_file) %>%
 bind_rows(.id = "doc_id")
# A tibble: 1 x 3
  doc_id `./raw_data/Guidelines/ `./raw_data/Guidelines/
  <chr>  <chr>                                   <chr>                                 
  1      **Coiled~                **Operation~
> 
> file_ls %>%
 .[1:2] %>%
 purrr::map(readr::read_file) %>% 
 tibble::enframe() %>%
 tidyr::unnest()
# A tibble: 2 x 2
  name                                       value                                     
  <chr>                                      <chr>                                     
  ./raw_data/Guidelines/                     **Coiled~
  ./raw_data/Guidelines/                     **Operation~

@iagogv3

iagogv3 commented Apr 24, 2020

I have a problem that I believe is related to this issue. Let me know if it is. I start with:

mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")

Let me test a variation. I am using the CRAN version of dplyr, so I still use group_nest() instead of nest_by(); with later versions this will probably be easier to do:

mtcars %>%
  group_nest(cyl) %>%
  mutate(model = purrr::map(.data$data, ~lm(mpg ~ wt, data = .x)),
         smodl = purrr::map(.data$model, summary),
         ramod = purrr::map_dbl(.data$smodl, "r.squared"),
         aramod = purrr::map_dbl(.data$smodl, "adj.r.squared"))

My question is: why should one have to call map_dbl() twice inside mutate() when it should be possible to call map_dfc() or map_dfr() once inside bind_cols()?

What follows is my attempt. First, I define three functions and rewrite the previous code using them:

rlm.sq <- function(slmod){
  slmod$r.squared
}
arlm.sq <- function(slmod){
  slmod$adj.r.squared
}
frlm.sq <- function(slmod){
  data.frame(r = slmod$r.squared, a =  slmod$adj.r.squared)
}
mtcars %>%
  group_nest(cyl) %>%
  mutate(model = purrr::map(.data$data, ~lm(mpg ~ wt, data = .x)),
         smodl = purrr::map(.data$model, summary),
         ramod = purrr::map_dbl(.data$smodl, ~rlm.sq(.x)),
         aramod = purrr::map_dbl(.data$smodl, ~arlm.sq(.x)))
# next, a test
mtcars %>%
  group_nest(cyl) %>%
  mutate(model = purrr::map(.data$data, ~lm(mpg ~ wt, data = .x)),
         smodl = purrr::map(.data$model, summary)) %>%
  slice(1) %>%
  pull(smodl) %>%
  magrittr::extract2(1) %>%
  frlm.sq()

It works, but when I try

mtcars %>%
  group_nest(cyl) %>%
  mutate(model = purrr::map(.data$data, ~lm(mpg ~ wt, data = .x)),
         smodl = purrr::map(.data$model, summary)) %>%
         bind_cols(purrr::map_dfr(.data$smodl, ~frlm.sq(.x)))

I do not get any result. Maybe I should use map_dfc instead of map_dfr?

mtcars %>%
  group_nest(cyl) %>%
  mutate(model = purrr::map(.data$data, ~lm(mpg ~ wt, data = .x)),
         smodl = purrr::map(.data$model, summary)) %>%
         bind_cols(purrr::map_dfc(.data$smodl, ~frlm.sq(.x)))

Same result.

Actually, my real code uses neither lm() nor frlm.sq(); instead of .data$smodl, I have a data frame that I summarise into 3 scalar variables in addition to the grouping variables.

Thank you!
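
A possible explanation and workaround, offered as a sketch rather than a confirmed diagnosis: in the failing calls, `%>%` passes the data frame to bind_cols(), and `.data$smodl` is then evaluated outside a data-masking verb such as mutate(), where the `.data` pronoun is not available. Storing the intermediate result and referring to its column directly avoids this. The example assumes purrr 1.0.0 or later for list_rbind(); the names `models` and `result` are illustrative.

```r
library(dplyr)
library(purrr)

# Same helper as in the comment above: one-row data frame per model summary.
frlm.sq <- function(slmod) {
  data.frame(r = slmod$r.squared, a = slmod$adj.r.squared)
}

# Fit and summarise one model per group, keeping the result in a variable.
models <- mtcars %>%
  group_nest(cyl) %>%
  mutate(
    model = map(data, ~ lm(mpg ~ wt, data = .x)),
    smodl = map(model, summary)
  )

# Build one row per summary, then bind those rows alongside the nested data.
stats <- list_rbind(map(models$smodl, frlm.sq))
result <- bind_cols(models, stats)
result
```

Here map() is called once and both statistics arrive as columns, which is what the two map_dbl() calls produced separately.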

@hadley
Member

hadley commented Sep 2, 2022

Summary of weirdness:

library(purrr)
c(a = 1, b = 2) %>% map_dfr(~ .x)
#> # A tibble: 1 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2
c(a = 1, b = 2) %>% map_dfc(~ .x)
#> # A tibble: 1 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2

1:2 %>% map_dfr(~ .x)
#> Error in `dplyr::bind_rows()`:
#> ! Argument 1 must be a data frame or a named atomic vector.

#> Backtrace:
#>     ▆
#>  1. ├─1:2 %>% map_dfr(~.x)
#>  2. └─purrr::map_dfr(., ~.x)
#>  3.   └─dplyr::bind_rows(res, .id = .id)
#>  4.     └─rlang::abort(glue("Argument {i} must be a data frame or a named atomic vector."))
1:2 %>% map_dfc(~ .x)
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> # A tibble: 1 × 2
#>    ...1  ...2
#>   <int> <int>
#> 1     1     2

1:2 %>% map_dfc(~ setNames(list(.x), letters[.x]))
#> # A tibble: 1 × 2
#>       a     b
#>   <int> <int>
#> 1     1     2
1:2 %>% map_dfr(~ setNames(list(.x), letters[.x]))
#> # A tibble: 2 × 2
#>       a     b
#>   <int> <int>
#> 1     1    NA
#> 2    NA     2

Created on 2022-09-02 with reprex v2.0.2
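
For comparison, a sketch of the last pair of examples written with explicit binding functions. It assumes purrr 1.0.0 or later, which provides list_rbind() and list_cbind(); both require every element to be a data frame, so the direction of the bind is stated rather than inferred. The name `pieces` is illustrative.

```r
library(purrr)
library(tibble)

# Each element is a one-row tibble with a single column named "a" or "b".
pieces <- map(1:2, ~ as_tibble(setNames(list(.x), letters[.x])))

# Row binding: two rows, with NA where a piece lacks a column.
by_row <- list_rbind(pieces)
by_row

# Column binding: one row, with the columns placed side by side.
by_col <- list_cbind(pieces)
by_col
```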


7 participants