Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort(collection; by=transformation) should invoke transformation only once per element #34947

Closed
lassepe opened this issue Mar 1, 2020 · 2 comments · Fixed by #48387
Closed
Labels
docs This change adds or pertains to documentation sorting Put things in order

Comments

@lassepe
Copy link
Contributor

lassepe commented Mar 1, 2020

Currently, when sorting a collection with sort(collection; by=transformation), the by callable can be called multiple times on the same element (essentially whenever it is compared to another element within the sort algorithm).

MWE

using Random
struct Counter
    d::Dict
end
Counter() = Counter(Dict{Any, Int}())
updatecount(counter::Counter, x) = counter.d[x] = get(counter.d, x, 0) + 1
getcount(counter::Counter, x) = get(counter.d, x, 0)

function counted_transformation(x; counter)
    updatecount(counter, x)
    return x
end

@show collection = shuffle(1:10)
counter = Counter()
sort(collection; by=x->counted_transformation(x; counter=counter))
@show [getcount(counter, x) for x in collection]

Output:

collection = shuffle(1:10) = [2, 6, 9, 10, 3, 5, 7, 8, 1, 4]
[getcount(counter, x) for x = collection] = [3, 7, 8, 7, 7, 6, 6, 5, 8, 7]

As you can see, for some elements by has been called 7 or 8 times.
This can be quite expensive if the by transformation involves some non-trivial computation.
IMHO, it would be preferable if sort transforms these values only once before sorting.

From the high-level, this could look like this (this allocates a bunch, so this is not a good workaround)

# single transformation sort
function sort_custom(collection; by=identity, kwargs...)
    collection_with_transformation = map(x->x=>by(x), collection)
    sort!(collection_with_transformation; by=x->last(x))
    return map(x -> first(x), collection_with_transformation)
end
@lassepe lassepe changed the title sort(collection; by=transformation) should only invoke transformation once per element sort(collection; by=transformation) should invoke transformation only once per element Mar 1, 2020
@JeffBezanson
Copy link
Member

This is just a space vs. time tradeoff. Often by is a very simple function. I think the right way to do this is sortperm, or a variation that sorts two arrays together using one for the keys. I don't believe we have that function in Base but it might be in a package.

@lassepe
Copy link
Contributor Author

lassepe commented Mar 1, 2020

Okay. I agree that this case may be rare enough that it should maybe not be provided by Base. Though, I believe, the fact that by is called more than once per element should be documented (i.e. by should not have any side-effects). This first bit when passing a method to by that involved sampling from a distribution parameterized by the element (i.e. non-deterministic transformation from element to a value). In that case, all sorts of other things break in the sort method (which is okay because non-deterministic transformations are a very esoteric use case).

@StefanKarpinski StefanKarpinski added the docs This change adds or pertains to documentation label Mar 1, 2020
@LilithHafner LilithHafner added the sorting Put things in order label Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This change adds or pertains to documentation sorting Put things in order
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants