map_dfc() fails when the result contains list columns #376
The problem is that `dplyr::bind_cols()` splices list inputs, so each element of a list becomes its own column:

```r
dplyr::bind_cols(y = list(3, 4))
#> # A tibble: 1 x 2
#>      V1    V2
#>   <dbl> <dbl>
#> 1     3     4
dplyr::bind_cols(list(y = 3:4, z = 4:5))
#> # A tibble: 2 x 2
#>       y     z
#>   <int> <int>
#> 1     3     4
#> 2     4     5
```

So list-columns are spliced as well, unless we wrap all inputs:

```r
dplyr::bind_cols(x = 1:2, y = list(3, 4))
#> Error: Argument 2 must be length 2, not 1
dplyr::bind_cols(list(x = 1:2, y = list(3, 4)))
#> # A tibble: 2 x 2
#>       x y
#>   <int> <list>
#> 1     1 <dbl [1]>
#> 2     2 <dbl [1]>
```

These functions need to be replaced with versions that use explicit splicing. In the meantime, we can wrap list-columns in lists to protect them. This might break existing code, but I think it's worth moving towards consistent handling of lists in data frames.
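To illustrate what explicit splicing looks like, here is a minimal sketch using `vctrs::vec_cbind()` (the function used later in this thread), assuming a recent vctrs release. A named bare list is kept as a single list-column, and a list is only spliced into separate arguments when you ask for it with `!!!`. The inputs are made up for illustration.

```r
library(vctrs)

# A named bare list is treated as one vector, so it becomes a single list-column
df <- vec_cbind(x = 1:2, y = list(3, 4))

# Splicing only happens when requested explicitly with `!!!`:
# each element of `cols` is passed to vec_cbind() as its own named argument
cols <- list(x = 1:2, y = list(3, 4))
df_spliced <- vec_cbind(!!!cols)

str(df)
```

Because splicing is opt-in, the caller (not the binding function) decides whether a list is "many columns" or "one list-column".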
Will need to wait until a vctrs replacement for …
I think we can probably now switch from … For example, with `vctrs::vec_cbind()`:

```r
library(purrr)

nested <- list(
  col1 = list(
    c("Apple", "Banana"),
    c("Orange")
  ),
  col2 = list(
    c("Baseball", "Soccer"),
    c("Football")
  )
)
str(vctrs::vec_cbind(!!!map(nested, map, sprintf, fmt = "I like %s")))
#> 'data.frame':	2 obs. of  2 variables:
#>  $ col1:List of 2
#>   ..$ : chr  "I like Apple" "I like Banana"
#>   ..$ : chr "I like Orange"
#>  $ col2:List of 2
#>   ..$ : chr  "I like Baseball" "I like Soccer"
#>   ..$ : chr "I like Football"
```

Created on 2022-08-24 by the reprex package (v2.0.1)
The key idea is to introduce a new family of "combining" functions: `list_c()`, `list_rbind()`, and `list_cbind()`. They replace `flatten_lgl()`, `flatten_int()`, `flatten_dbl()`, and `flatten_chr()` (now `list_c()`), `flatten_dfc()` (now `list_cbind()`), and `flatten_dfr()` (now `list_rbind()`). The new functions are straightforward wrappers around vctrs functions, but somehow they feel natural in purrr to me.

This leaves `flatten()`, which had a rather idiosyncratic interface. It has been replaced by `list_flatten()`, which always removes a single layer of list hierarchy (and nothing else). While working on this I realised that this was actually what `splice()` did, so overall this feels like a major improvement in naming consistency.

With those functions in place, we can deprecate `map_dfr()` and `map_dfc()`, which are really "flat" map functions because they combine, rather than simplify, the results. They have never really belonged with `map_int()` and friends, which satisfy the invariant `length(map(.x, .f)) == length(.x)` and require `.f` to return a length-1 result; `map_dfr()` and `map_dfc()` do neither. This also strongly implies that `flat_map()` would just be `map_c()` and is thus not necessary.

* Fixes #376 by deprecating `map_dfc()`
* Fixes #405 by clearly ruling against `map_c()`
* Fixes #472 by deprecating `map_dfr()`
* Fixes #575 by introducing `list_c()`, `list_rbind()`, and `list_cbind()`
* Fixes #757 by deprecating `flatten_dfr()` and `flatten_dfc()`
* Fixes #758 by introducing `list_rbind()` and `list_cbind()`
* Part of #900
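A minimal sketch of how the combining functions stand in for the old helpers, assuming purrr >= 1.0.0 (where `list_c()`, `list_rbind()`, `list_cbind()`, and `list_flatten()` are exported) and vctrs are installed. The inputs are made up for illustration; note that `list_cbind()` expects every element to already be a data frame.

```r
# Assumes purrr >= 1.0.0 for list_c(), list_rbind(), list_cbind(), list_flatten()
library(purrr)

# list_c(): combine a list of vectors into one vector (replaces flatten_int() etc.)
v <- list_c(list(1:2, 3:4))

# map() + list_rbind(): row-bind a list of data frames (replaces map_dfr())
rows <- list_rbind(map(1:3, function(i) data.frame(i = i, sq = i^2)))

# map() + list_cbind(): column-bind a list of data frames (replaces map_dfc()).
# vctrs::data_frame() keeps list(3, 4) as one list-column instead of splicing it.
wide <- list_cbind(list(
  data.frame(a = 1:2),
  vctrs::data_frame(y = list(3, 4))
))

# list_flatten(): remove exactly one level of list hierarchy
flat <- list_flatten(list(1, list(2, list(3))))
```

In each case the combining step is separate from the mapping step, so `map()` keeps its one-output-per-input invariant.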
Sometimes we have a data.frame-like list and want to apply some function and harvest the result as a data.frame. `map_dfc()` is quite useful for this purpose. But it fails if the result contains list columns.

Is it possible to get the result as a data.frame like below?

(As the error indicates, this seems to come from `cbind_all()`. Should I file this issue in dplyr's repo?)
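For the original question, one possible approach under the replacement described in the pull request is sketched below, assuming purrr >= 1.0.0 and vctrs are installed. It reuses the `nested` example from the thread; wrapping each mapped column in a one-column data frame via `vctrs::data_frame()` before calling `list_cbind()` is an illustrative choice of mine, not something the thread confirms.

```r
# Assumes purrr >= 1.0.0 (for list_cbind()) and vctrs
library(purrr)

nested <- list(
  col1 = list(c("Apple", "Banana"), c("Orange")),
  col2 = list(c("Baseball", "Soccer"), c("Football"))
)

# Turn each mapped column into a one-column data frame holding a list-column.
# vctrs::data_frame() keeps a list as a single column instead of splicing it.
pieces <- imap(nested, function(col, nm) {
  out <- vctrs::data_frame(value = map(col, sprintf, fmt = "I like %s"))
  names(out) <- nm
  out
})

# unname() so list_cbind() simply binds the unnamed data frames side by side
res <- list_cbind(unname(pieces))
str(res)
```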