Plan for `flatten()`, `simplify()` and friends #900
Comments
I think it'd be worth clarifying the notion of "flattening". In other languages and in rlang, flattening is a tool for nested structures that transforms
library(purrr)
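list_flatten <- function(x) {
  stopifnot(vctrs::vec_is_list(x))
  # Wrap non-list elements in a list so every element is one level deep
  is_nested <- map_lgl(x, vctrs::vec_is_list)
  x[!is_nested] <- map(x[!is_nested], list)
  # Remove exactly one layer of list hierarchy
  unlist(x, recursive = FALSE)
}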
str(list_flatten(list(1:2, list(3))))
#> List of 2
#> $ : int [1:2] 1 2
#> $ : num 3
str(list_flatten(list(list(1), list(2, 3), list(list(4)))))
#> List of 4
#> $ : num 1
#> $ : num 2
#> $ : num 3
#> $ :List of 1
#> ..$ : num 4
str(list_flatten(list(data.frame(x = 1), data.frame(y = 2))))
#> List of 2
#> $ :'data.frame': 1 obs. of 1 variable:
#> ..$ x: num 1
#> $ :'data.frame': 1 obs. of 1 variable:
#> ..$ y: num 2

Created on 2022-08-29 by the reprex package (v2.0.1)
There are two possible directions we could take to replace
In either case we want to avoid both the name and the interface of the existing
I think
[1] I prefer
Concrete proposal to consider: #909
The key idea is to introduce a new family of "combining" functions: `list_c()`, `list_rbind()`, and `list_cbind()`. `list_c()` replaces `flatten_lgl()`, `flatten_int()`, `flatten_dbl()`, and `flatten_chr()`; `list_cbind()` replaces `flatten_dfc()`; and `list_rbind()` replaces `flatten_dfr()`. The new functions are straightforward wrappers around vctrs functions, but somehow feel natural in purrr to me.

This leaves `flatten()`, which had a rather idiosyncratic interface. It has been replaced by `list_flatten()`, which always removes a single layer of list hierarchy (and nothing else). While working on this I realised that this was actually what `splice()` did, so overall this feels like a major improvement in naming consistency.

With those functions in place we can deprecate `map_dfr()` and `map_dfc()`, which are actually "flat" map functions because they combine, rather than simplify, the results. They have never actually belonged with `map_int()` and friends because they don't satisfy the invariant that `length(map(.x, .f)) == length(.x)` and that `.f` must return a length-1 result. This also strongly implies that `flat_map()` would just be `map_c()` and is thus not necessary.

* Fixes #376 by deprecating `map_dfc()`
* Fixes #405 by clearly ruling against `map_c()`
* Fixes #472 by deprecating `map_dfr()`
* Fixes #575 by introducing `list_c()`, `list_rbind()`, and `list_cbind()`
* Fixes #757 by deprecating `flatten_dfr()` and `flatten_dfc()`
* Fixes #758 by introducing `list_rbind()` and `list_cbind()`
* Part of #900
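To make the distinction between "combining" and flattening concrete, here is a minimal sketch. It assumes a purrr release that ships the functions proposed above (1.0.0 or later); the inputs `x`, `nested`, and `dfs` are made up for illustration.

```r
library(purrr)  # assumption: purrr >= 1.0.0, where the list_*() functions exist

x <- list(1:2, 3:5)

# map() returns one output element per input element
stopifnot(length(map(x, rev)) == length(x))

# list_c() combines the elements into a single vector,
# so the result is generally not the same length as the input
combined <- list_c(x)

# list_flatten() removes exactly one layer of list hierarchy;
# the doubly nested list(4) survives as a list
nested <- list(1, list(2, 3), list(list(4)))
flat <- list_flatten(nested)
str(flat)

# list_rbind() and list_cbind() combine lists of data frames
dfs <- list(data.frame(x = 1), data.frame(x = 2))
rows <- list_rbind(dfs)
cols <- list_cbind(list(data.frame(x = 1), data.frame(y = 2)))
```

The point of the sketch is only that combining changes the length of the result while `map()` does not, which is why the `map_dfr()`/`map_dfc()` variants sit awkwardly next to `map_int()` and friends.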
I suggest we at least mark `as_vector()`, `simplify()`, `simplify_all()`, `flatten()`, and `flatten_*()` as questioning and remove them from the function index. The semantics of these functions are not super clear, and I'm not sure it's worth completely nailing them down because many of the problems they were originally designed to solve have moved into tidyr. Their use in the purrr docs reveals few compelling use cases:

* `as_vector()` is only used in its own examples.
* `simplify()` (a version of `as_vector()` that returns `x` if it can't simplify) isn't used at all.
* `simplify_all()` is used with the output of `transpose()`. The most compelling use case is with `safely()` to get lists of errors and results. But I suspect we could tackle this a different way, perhaps with a `list_transpose()` inspired by the work we've done in `unnest_longer()`/`unnest_wider()` or possibly powered by `vec_simplify()`.
* `flatten()` is only used in its own examples. I can see `flatten()` being useful in conjunction with `map()` but it might be easier to create the `flat_map()` functions directly.

Long term, if we do discover that `flatten()` and `simplify()` are useful enough to bother fully fleshing out their semantics, I'd suggest deprecating them and introducing new `list_flatten()` and `list_simplify()`.

Additionally, it might be worth deprecating `flatten_dfc()` and `flatten_dfr()` as they have poor semantics (#648, #757). They are similar to both `dplyr::bind_rows(x)` and `vctrs::vec_rbind(!!!x)`. If we think it's unappealing to recommend a vctrs function we could add `list_rbind <- function(x) vctrs::vec_rbind(!!!x)` (and similar for `list_cbind()`). I think we'd probably require individual elements to be data frames.
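As a rough sketch of the wrapper suggested above, with the "elements must be data frames" requirement added: the name `list_rbind_sketch()` and the error message are hypothetical, chosen to avoid implying this is how purrr implements `list_rbind()`.

```r
# Hypothetical helper for illustration only; not purrr's implementation.
list_rbind_sketch <- function(x) {
  stopifnot(vctrs::vec_is_list(x))

  # Assumption from the proposal: every element must be a data frame
  is_df <- vapply(x, is.data.frame, logical(1))
  if (!all(is_df)) {
    stop("Every element of `x` must be a data frame.", call. = FALSE)
  }

  # vec_rbind() takes dynamic dots, so the list can be spliced with !!!
  vctrs::vec_rbind(!!!x)
}

dfs <- list(data.frame(x = 1), data.frame(x = 2, y = "a"))
out <- list_rbind_sketch(dfs)

# A non-data-frame element is rejected rather than silently coerced
bad <- tryCatch(list_rbind_sketch(list(data.frame(x = 1), 2)), error = function(e) e)
```

Checking the element types up front keeps the semantics narrow, which is the main complaint about `flatten_dfr()` and `flatten_dfc()` in this issue.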