block method to assemble multi-dimensional arrays #46003
base: master
I haven't tested all cases, but it looks like we can do this with: awfulstack(a::AbstractArray{<:AbstractArray}) = Base.hvncat(size(a), false, a...) |
@N5N3 That looks nice. In my opinion, this would ideally be a thin layer over Another way I have in mind would be having an iterator that goes through all the values in the correct order, and in the end this gets collected and reshaped. |
#21672 In May 2017 "Despacito" was topping the charts |
I now realized there may actually be demand for two different things. This function would now be capable to work as the inverse of I imagine it would be possible to infer the height from the array type. |
… lengths at each dimension. in the end we collect and reshape everything
I have now modified Tuples would now not be traversed, as well as RGB{x}, only abstractarrays or anything with |
Not sure there may be any support for this in Base, so I'm making it a little external package. https://github.com/nlw0/ArrayAssemblers.jl |
And when exactly does this do something different to the |
The It looks like
Julia's array syntax supports this in the same way, concatenating rows first. This
|
OK. The penalty for splatting is certainly not as bad as it was... [Edit -- collapsed old benchmarks]
julia> vm = [rand(10,10) for _ in 1:1000];
julia> m1 = @btime reduce(vcat, $vm);
min 77.375 μs, mean 129.687 μs (2 allocations, 781.30 KiB)
julia> m2 = @btime vcat($vm...);
min 118.708 μs, mean 166.544 μs (6 allocations, 804.94 KiB)
julia> m3 = @btime awfulstack($vm);
min 57.842 ms, mean 61.589 ms (19960 allocations, 382.39 MiB)
julia> m4 = @btime awfulstack_hvn($vm);
min 427.166 μs, mean 1.331 ms (33 allocations, 868.80 KiB)
julia> m1 == m2 == m3 == m4
true |
My understanding is that splatting might cause issues beyond performance, such as stack overflows, and should be avoided for large inputs. I don't know if that has changed. |
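For readers comparing the two spellings discussed here, a minimal sketch follows. The data is made up for illustration; both calls are ordinary Base functions, and the only point shown is that they produce the same result, not how they compare in cost.

```julia
# Illustrative data: a vector of small matrices (sizes chosen arbitrarily).
vm = [fill(i, 2, 3) for i in 1:4]

# Splatting passes every element of vm as a separate argument.
splatted = vcat(vm...)

# reduce(vcat, ...) receives the collection itself, so nothing is splatted.
reduced = reduce(vcat, vm)
```

With very long inputs the reduce form avoids building a call with thousands of arguments, which is the concern raised in this comment.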
I've done more tests with the |
IIUC this is intended as exploration and not to be merged, if that is the case, it's nice to mark it as a draft and/or tag it with |
It's intended to be merged. |
Oops! My mistake. |
Why the name |
"lol" means "list of lists". "cat" follows the existing pattern of cat vcat hvncat etc. I think this is a good name but I really don't mind changing. I could make a list of alternatives, but I'd rather listen first to ideas from other people if that's not a good name. I'm more concerned just with having the functionality available, which I believe is very important. |
When I saw the name lolcat, the first thing that came to mind was the meme (https://en.wikipedia.org/wiki/Lolcat); the Python package lolcat (and I believe the Ruby package by the same name) are both for creating such memes. I thought that the name was a joke and so assumed the PR was not intended to be merged.
|
You got me, "list" is a more generic term, and the function actually takes anything that is linearly iterable, such as generators. I'd rather avoid using the term array or vector since it implies those Julia data-structures, even though it always produces an Array. "List" might be too specific, though, since it's not just strictly linear collections, it actually works on anything "map-able" such as multidimensional arrays, eg
It's a recursive form of |
Maybe one interesting point about these functions is that the first, |
This is tragic, point taken. If |
This is often called Another common building block is something A simple example of how the latter is useful:
tbl = [(name=1, measurements=[1, 2, 3]), ...]
flatmap(obj -> obj.measurements, (obj, meas) -> (; obj.name, x=meas), tbl)
# [(name=1, x=1), (name=1, x=2), (name=1, x=3), ...] |
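A runnable sketch of the two-function form used in this example is given below. The name flatmap_sketch and the second table row are assumptions made for illustration; this is not an existing Base function, only one way such a helper could be written with a nested comprehension.

```julia
# Hypothetical helper; the name flatmap_sketch is an assumption, not an existing API.
# f maps each outer element to an inner collection;
# g combines the outer element with each inner element.
flatmap_sketch(f, g, xs) = [g(x, y) for x in xs for y in f(x)]

# Illustrative table; the second row is invented for this example.
tbl = [(name = 1, measurements = [1, 2, 3]), (name = 2, measurements = [4])]

rows = flatmap_sketch(obj -> obj.measurements,
                      (obj, meas) -> (name = obj.name, x = meas),
                      tbl)
```

The comprehension with two `for` clauses is what does the flattening: the inner collection may depend on the outer element, and the result is a single flat vector.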
Indeed, the main motivation for this PR is just to have
Thanks, I'm sure to take something from it if you don't mind, if necessary. I have no idea what will be the minimal requirements to push this, though. I'd be satisfied with very little.
Seems interesting, but I think it might make it impossible to use |
I'm not sure why it's not viable apart from being funny, but I'd point out the term "list of lists" is pretty common in general, especially in sparse arrays, where for whatever reason the abbreviation "LIL" was used in SuiteSparse and caught on... I think there's even more need to deal with this kind of data-structure, of implicit trees stored as lists of lists, but I'm really not sure what to call it other than that. I've even created this other small package because of it. But in the case here the idea is to have a dedicated function, without the need for a special wrapping class... https://github.com/nlw0/ArrayTrees.jl And btw I'd not propose |
Added a couple more examples. I'm still not entirely sure, but
The nested Extending on the issue of do-syntax, thinking back on the example from @aplavin it would look like
By the way, I don't think you can do Anyways, my point is just to make clear this is about bringing important existing language features that today are available pretty much exclusively through comprehensions, and offer them through functions, which on top of that can be used with do-syntax. I'll accept whatever name I need to in order to have these features available. I'm partial to |
Another proposal to bring flatmap functionality to Base is #45985, where we also remove the offensive new verb |
I have now realized |
Yes to recursion,
No to anyone being offended by #45985, but pushing two independent things into one PR makes it much less likely that reviewers look closely. Here too, separating (or deleting) the recursive one to focus on this |
I have removed the second method. It would still be great to have an iterator-based |
By "other functions", do you mean
Kinda, but it allocates an extra array for each And it's far from intuitive what happens here, takes some time to parse visually. This operation is pretty common in data processing, so I enjoy having a concise form to write it. Given that other widely used languages also have the function with the same semantics, it must be quite general.
You mean, with
That PR mixes up two completely unrelated changes (rename Iterators.flatmap and add Base.flatten). I think |
Before about 1.5, it was stricter than the version with
On the original PR, some people objected that this was too trivial a feature to bother giving a whole new name, since it just composes Is (All off-topic really, sorry!) |
Well, More composable and DRY alternative to these methods would be something like julia> @btime sum(-, 1:10^5)
713.820 ns (0 allocations: 0 bytes)
-5000050000
julia> @btime sum(mapview(-, 1:10^5))
698.014 ns (0 allocations: 0 bytes)
-5000050000 Then, no
I'm aware of a wide range of languages/libraries that have None seem to have |
I do use the pattern
Most existing function calls can be replaced by their underlying implementation, it's called "inlining", and it's also the basis of lambda calculus. On the other hand, a wise man once said programming is about building useful abstractions. I think this is a useful one that I'm bound to use more and more, and I'm trying to share the thought with my fellow Julia programmers. If everybody thinks it's dumb, then that's it. About the relation between flatten and hcat, you guys are telling me the two functions I proposed are unrelated. You're missing the point: I'm precisely trying to highlight how they are actually related, trying to figure out what is in common between those functions. Looking for some fundamental truth here. When you do hcat(vector_of_vectors), it's the same thing as flattening, and then reshaping. That's pretty much what I'm looking for: an "ultimate flatmap", something that will be able to deal with multi-dimensional arrays, compose efficiently with iterators, and allow use with do-syntax so we can write multi-line blocks and be able to use variables in the surrounding scope. I'm not really sure what it's going to look like in the end, but I do miss it. |
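The claimed relation between hcat and flatten-then-reshape can be checked on a small case. The sketch below assumes all inner vectors have the same length; the variable names and the offset value are illustrative only.

```julia
vs = [[1, 2], [3, 4], [5, 6]]

# Concatenating the vectors as columns...
by_hcat = reduce(hcat, vs)

# ...matches flattening every element and reshaping to (inner length, count).
by_flatten = reshape(collect(Iterators.flatten(vs)), length(first(vs)), length(vs))

# The same kind of assembly written with do-syntax: the block can span
# several lines and refer to variables from the enclosing scope.
offset = 10
from_do = mapreduce(hcat, 1:3) do i
    [i, i + offset]
end
```

The equivalence relies on Julia's column-major layout: the flattened elements fill the reshaped matrix one column at a time, which is the order hcat produces.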
This is an alternative implementation to #43334. This PR was created to investigate a slight variation on the original proposal. We refer to the present function as awfulstack to avoid ambiguities during any discussions.
There has been no investigation regarding the performance of awfulstack whatsoever.
awfulstack was implemented based on reduce and cat, iterating over the dimensions of the input array from major to minor, in a DFS kind of recursion.
With a simple vector of vectors, awfulstack implements reduce(vcat, data). A 1xN array of vectors will result in reduce(hcat, data).
awfulstack cannot be the simple inverse of eachcol or eachrow, since these functions produce vector-of-vectors with no dimensional information. Although it's easy to adapt the data to recover the original matrix by using e.g. reshape or permutedims.
The input is strictly treated as a block-matrix, or block-array.
awfulstack can deal with non-uniform sub-array sizes and with concatenating arrays into higher dimensions (2x2 ++ 2x2 -> 2x2x2).
"flatmap" functionality is also available.
All credit goes to #43334, I only made this PR because there are a few details I felt important to investigate, and I find it easier to talk over code than to just talk. I hope the community finds this a worthy exploration.