Currently, when sorting a collection with sort(collection; by=transformation), the by callable can be called multiple times on the same element (essentially whenever it is compared to another element within the sort algorithm).
MWE
using Random

struct Counter
    d::Dict
end
Counter() = Counter(Dict{Any, Int}())

updatecount(counter::Counter, x) = counter.d[x] = get(counter.d, x, 0) + 1
getcount(counter::Counter, x) = get(counter.d, x, 0)

function counted_transformation(x; counter)
    updatecount(counter, x)
    return x
end

@show collection = shuffle(1:10)
counter = Counter()
sort(collection; by=x->counted_transformation(x; counter=counter))
@show [getcount(counter, x) for x in collection]
As you can see, for some elements by has been called 7 or 8 times.
This can be quite expensive if the by transformation involves some non-trivial computation.
IMHO, it would be preferable if sort transforms these values only once before sorting.
At a high level, this could look like the following (it allocates a lot, so it is not a good workaround):
# single transformation sort
function sort_custom(collection; by=identity, kwargs...)
    collection_with_transformation = map(x->x=>by(x), collection)
    sort!(collection_with_transformation; by=x->last(x))
    return map(x -> first(x), collection_with_transformation)
end
lassepe changed the title from "sort(collection; by=transformation) should only invoke transformation once per element" to "sort(collection; by=transformation) should invoke transformation only once per element" on Mar 1, 2020
This is just a space vs. time tradeoff. Often by is a very simple function. I think the right way to do this is sortperm, or a variation that sorts two arrays together using one for the keys. I don't believe we have that function in Base but it might be in a package.
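The sortperm approach suggested in this comment could be sketched roughly as follows. This is only an illustration: `sort_by_once`, `counted`, and `calls` are hypothetical names made up for this example and are not part of Base. The sketch assumes the collection is an indexable vector and trades one extra array of keys for a single `by` call per element.

```julia
# Hypothetical helper (not part of Base): evaluates `by` exactly once per element.
function sort_by_once(collection; by=identity, kwargs...)
    sortkeys = map(by, collection)        # one call to `by` per element
    perm = sortperm(sortkeys; kwargs...)  # permutation that sorts the precomputed keys
    return collection[perm]               # reorder the original elements
end

# Count how often the transformation runs (assumed example transformation: negation).
calls = Ref(0)
counted(x) = (calls[] += 1; -x)

xs = [3, 1, 2]
result = sort_by_once(xs; by=counted)
@show result calls[]
```

Here the transformation runs once per element regardless of how many comparisons the sort algorithm performs, at the cost of allocating the key array and the permutation.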
Okay. I agree that this case may be rare enough that Base need not provide it. I do believe, though, that the fact that by is called more than once per element should be documented (i.e. by should not have any side effects). This first bit me when passing a function to by that sampled from a distribution parameterized by the element (i.e. a non-deterministic transformation from element to value). In that case, all sorts of other things break in the sort method (which is okay, because non-deterministic transformations are a very esoteric use case).