Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unstable sort to libcore #1884

Merged
merged 9 commits into from Mar 16, 2017
Merged

Add unstable sort to libcore #1884

merged 9 commits into from Mar 16, 2017

Conversation

ghost
Copy link

@ghost ghost commented Feb 3, 2017

Every other systems programming language has a fast, non-allocating, unstable sort in standard library. Rust still doesn't. This is a proposal to add one to libcore.

The proposed implementation is very fast, faster than std::slice::sort and std::sort in C++ by a wide margin.

This topic was discussed before in the issue #790

Rendered

**Q: How much faster can unstable sort be?**<br>
A: Sorting 64-bit integers using [pdqsort][stjepang-pdqsort] (an
unstable sort implementation) is **40% faster** than using `slice::sort`.
Detailed benchmarks are [here](https://github.com/stjepang/pdqsort#extensive-benchmarks).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is this compared against the recent improvements to the existing sort implementation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

@sfackler
Copy link
Member

sfackler commented Feb 3, 2017

The conflation of API stability and order stability is a bit unfortunate, but I agree that trying to name this something other than unstable_sort would probably be even more confusing.

👍

@sfackler sfackler added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Feb 3, 2017
@mgattozzi
Copy link
Contributor

As long as it's clearly documented I would think unstable_sort would be fine, even if the name seems counter to Rust's normal goals of safety. Having this in the std library would be a huge boon to those looking to use it.

@arthurprs
Copy link

👍 overall.

@alexcrichton
Copy link
Member

This is an excellent RFC, thank you for taking the time to write it up and be so thorough @stjepang!

I wonder if we could consider perhaps sort_unstable as the name? That way it'd group nicely with sort in the documentation today and you could easily scan over and see it.

Other than that, the only question I'd have is to clarify if you have an idea in mind for the concrete implementation of this API in libcore. Are you thinking pdqsort or perhaps some other implementation?

@liigo
Copy link
Contributor

liigo commented Feb 5, 2017

I think many QAs in motivation section are out of topic. eg. those about 'stable sort'.

@ghost
Copy link
Author

ghost commented Feb 5, 2017

Other than that, the only question I'd have is to clarify if you have an idea in mind for the concrete implementation of this API in libcore. Are you thinking pdqsort or perhaps some other implementation?

Exactly - proposed implementation lives in the pdqsort crate.

I wonder if we could consider perhaps sort_unstable as the name? That way it'd group nicely with sort in the documentation today and you could easily scan over and see it.

Makes sense, even though sort_unstable doesn't roll off the tongue as smoothly, I think. That said, I'm pretty ambivalent and wouldn't be against such naming scheme.

Let's see what others think - can you please vote this comment:

  • +1 if you prefer sort_unstable
  • -1 if you prefer unstable_sort
  • :/ if you like both

@nagisa
Copy link
Member

nagisa commented Feb 6, 2017

IMO unstable_sort is not exactly a good name as it does not sound like a verb/action. That is, slice.sort reads a lot like doing some action on a slice. unstable_sort in that sense does not sound verb-y at all. Instead it is a dry description of algorithm. One could still read sort_unstable as if it was an action action e.g. if read as slice.sort (unstably).

@alexcrichton
Copy link
Member

@stjepang oops thanks for the clarification! I clearly need to read more closely :)

@clarfonthey
Copy link

clarfonthey commented Feb 6, 2017

I think it'd be good to also specialise slice::sort to an unstable sort for types for which stability is known to make no difference, i.e. integers and str. That way, users of the regular sort still reap the benefits of this implementation.

@SimonSapin
Copy link
Contributor

types for which stability is known to make no difference, i.e. integers and str.

(Stability of sorting [&str] does make an observable difference with str::as_ptr.)

@ghost
Copy link
Author

ghost commented Feb 6, 2017

I think it'd be good to also specialise slice::sort to an unstable sort for types for which stability is known to make no difference, i.e. integers and str.

I strongly disagree. Please take a look at this surprising benchmark in C++:

$ clang++ -std=c++14 -O3 -x c++ - <<EOF
> #include <algorithm>
> int a[int(1e7)];
> int main() {
>     for (int i = 0; i < 1e7; ++i) a[i] = i % 2500000;
>     std::sort(a, a + int(1e7));
>     return 0;
> }
> EOF
$ time ./a.out

real    0m0.616s
user    0m0.601s
sys     0m0.013s
$ clang++ -std=c++14 -O3 -x c++ - <<EOF
> #include <algorithm>
> int a[int(1e7)];
> int main() {
>     for (int i = 0; i < 1e7; ++i) a[i] = i % 2500000;
>     std::stable_sort(a, a + int(1e7));
>     return 0;
> }
> EOF
$ time ./a.out

real    0m0.241s
user    0m0.216s
sys     0m0.023s

In C++, std::stable_sort is several times faster than std::sort on this particular case! How is that possible? The reason is that they use totally different algorithms. While std::sort is generally faster, there are still some exceptional cases where std::stable_sort wins. If std::stable_sort was specialized for integers, then users wouldn't reap the benefits - in fact, they might be worse off! The same story applies to slice::sort (timsort) vs pdqsort in Rust.

By specializing slice::sort, you take the choice away from users. I'd strongly prefer giving users sorts with clear, straightforward, unsurprising performance characteristics. If we specialize slice::sort for integers, does that mean sorting struct Meters(u64)s will be wildly different than sorting u64s, even though they're the same thing? Users might be bewildered by the difference in performance, and rightly so.

Note that by not specializing we're giving users a choice. We're providing two different sorts with consistent, predictable, and reliable performance. If they want speed, or in other words, if they don't care about stability, then high performance sort is a few keystrokes away: just add "unstable".

Unstable sort is generally much faster than stable sort, with a few exceptions. This will be clearly explained in the documentations. Users who understand the difference will choose the benefits they want to reap by themselves. The difference will be explained here, under "Current implementation".

The whole reason why the default slice::sort in Rust is stable in the first place is that we eschew surprises. Let's not ruin this goal now. :)

@ghost
Copy link
Author

ghost commented Feb 7, 2017

Thanks everyone for your inputs! It's pretty clear now that renaming unstable_sort to sort_unstable is the way to go. I've updated the RFC accordingly.

@sfackler
Copy link
Member

It seems like his has reached a steady state and there isn't too much controversy over the approach, so I'll FCP to merge!

@rfcbot fcp merge

@rfcbot
Copy link
Collaborator

rfcbot commented Feb 15, 2017

Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@jongiddy
Copy link

Why is libcore the appropriate place to put this, rather than keeping it as a separate crate?

@ghost
Copy link
Author

ghost commented Feb 16, 2017

@jongiddy

After rewriting slice::sort for better performance, I received this feedback:

  1. Why don't we have radix sort?
  2. Are we now as fast as C++?
  3. Should we have sort_unstable?

After implementing pdqsort, the feedback was:

  1. I never cared about stability. Will use this in the future.
  2. Is it worth adding this to std lib?
  3. Would be nice to have unstable sort. Only pay for what you use.

There was also a question on StackOverflow complaining that the fact that slice::sort allocates doesn't fit the advertised "zero cost abstraction" bill.

These complaints are understandable considering that the default sorts in C, C++, Swift, Go, and D don't allocate nor sacrifice speed.

Generally speaking, I'm in favor of keeping libstd and libcore small, but sorting is such a common operation. Today, almost every call to slice::sort you can find in any Rust code is a substandard solution. This is a sign of a missing built-in feature.

@cristicbz
Copy link

@stjepang Regarding specialisation: how about adding sort_stable which is always stable, and making sort specialised. I'd love it if sort did the right thing if it can.

Also as far as I can tell your example isn't really about stable vs unstable sort, it's more about particular sort implementations; you can always do better on a specific example by picking a specific sort impl. Since the method names don't say timsort and pdqsort (and you can't rely on their perf characteristics to be backwards-compatibly preserved), it isn't really correct to pick sort_stable vs sort_unstable to see what's faster. Is that correct?

@ghost
Copy link
Author

ghost commented Feb 20, 2017

@cristicbz

Regarding specialisation: how about adding sort_stable which is always stable, and making sort specialised. I'd love it if sort did the right thing if it can.

The problem with specialization is that it's a half-baked solution. Suppose you specialize sort for [u64] so that it falls back to sort_unstable behind the scenes. Now you create a struct Meters(u64). Sorting [Meters] is much slower because it's not specialized, even though it's essentially the same thing as [u64]. Again, someone will complain that this doesn't fit the zero-cost abstractions bill. :)

Also as far as I can tell your example isn't really about stable vs unstable sort, it's more about particular sort implementations; you can always do better on a specific example by picking a specific sort impl.

Well, stability is a constraint. Because sort is stable and performs a strictly harder task, it will be slower than unstable_sort, by definition. Unfortunately, writing an unstable sort that will be strictly faster than our current sort implementation in practice is pretty difficult. A few rare cases where that is not true always pop up.

Since the method names don't say timsort and pdqsort (and you can't rely on their perf characteristics to be backwards-compatibly preserved), it isn't really correct to pick sort_stable vs sort_unstable to see what's faster. Is that correct?

No, sort_unstable is faster (except in several exceptional cases). If you want speed and don't care about stability, then by all means use sort_unstable. Let me put it this way: among BTreeMap and HashMap, what is the correct choice if you care about performance? Obviously, the answer is HashMap. It has both theoretical and practical advantage in performance. However, some situations where BTreeMap is faster can still crop up:

$ echo "fn main() { let mut m = std::collections::HashMap::new(); for x in 0..5_000_000 { m.insert(x, x); } }" | rustc -O - && time ./rust_out
real    0m0.961s
$ echo "fn main() { let mut m = std::collections::BTreeMap::new(); for x in 0..5_000_000 { m.insert(x, x); } }" | rustc -O - && time ./rust_out
real    0m0.669s

Interestingly, in this case the names of these data structures actually do represent particular implementations (they are not called Map and OrderedMap). However, since we already have the sort method and need one that isn't constrained by stability, it will be called sort_unstable. We probably don't want to set the algorithm choice in stone. After all, our merge sort recently became a timsort and we didn't have to rename the function. :)

@cristicbz
Copy link

@stjepang

Now you create a struct Meters(u64). Sorting [Meters] is much slower because it's not specialized, even though it's essentially the same thing as [u64].

It depends on how exactly you do this, I was imagining something like an trait UnstableSortSafe {} with #[derive(Ord)] struct X { member1: Type1, member2: Type2 } adding an impl UnstableSortSafe for X where Type1: UnstableSortSafe, Type2: UnstableSortSafe {}.

What do you think?

Well, stability is a constraint. Because sort is stable and performs a strictly harder task, it will be slower than unstable_sort, by definition

I agree! I think maybe I wasn't clear. It's a safe bet that unstable_sort is faster than stable_sort except for a handful of pathological cases, therefore making the default unstable_sort where stability doesn't matter is the right call (hence my support for the marker trait).

The counterpoint I was making to "specialising-removes-choice" is that the particular pathological cases are not generally applicable to stable-vs-unstable, but depend on the particular implementations of stable_sort and unstable_sort.

If you hit one of these pathological cases, switching to stable_sort is not the best idea (since its implementation may change such that the particular case becomes slower than unstable_sort). Instead you should use an algorithm with a fixed implementation which you know performs well on your data.

@ghost
Copy link
Author

ghost commented Feb 21, 2017

@cristicbz

You made some very good points, but I'm still not in favor of specialization :)

The counterpoint I was making to "specialising-removes-choice" is that the particular pathological cases are not generally applicable to stable-vs-unstable, but depend on the particular implementations of stable_sort and unstable_sort.

Absolutely, I agree.

You might want to take a look at rust-lang/rust#38524. So, in this issue a discussion was brought up about what kind of guarantees we should or shouldn't make about the sort algorithm. It's a really difficult call. This culminated with @aturon's conclusion:

We discussed the question of normativity for std docs in the lang team meeting today. There are good arguments in both directions. But by and large, most items we add to the standard library provide behavior which is intended to be reliable -- changes to observable behavior are extremely rare in std.

I think our current discussion is similar to that one. There are good arguments both ways, but ultimately I think we shouldn't make too hard guarantees (let's use the stable/unstable dichotomy rather than timsort/pdqsort), while at the same time documenting particular implementations and making them clear and reliable (but not guaranteeing them). Although specialization improves performance, it makes it less reliable and more surprising.

If you hit one of these pathological cases, switching to stable_sort is not the best idea (since its implementation may change such that the particular case becomes slower than unstable_sort).

Again, let me quote @aturon here:

FWIW, part of the calculation here is that people are likely to rely on observable behavior whether documented/guaranteed or not. So we're likely to be prevented from making behavioral changes just to avoid breakage in practice, let alone because of what the docs happen to say.

Whatever guarantees we make, people will rely on observable behavior. Let's keep it simple and allow users to rely on those implementations if they wish. Let's not try to outsmart them with specialization.

That's my take. Hope it makes some sense.

@cristicbz
Copy link

That's a reasonable view: if C++ is any guide, observable behaviour is almost as set in stone as the specifics of the standard. I can see that it's a trade-off, and I'm really happy about getting unstable sort in the standard library regardless :)

@bluss
Copy link
Member

bluss commented Feb 21, 2017

To clarify the message about zero cost abstractions:

1 .This feature item means that in Rust zero-cost abstractions are possible to express
2. This does not mean that every abstraction is zero cost, or that every abstraction can be zero cost.

@alexcrichton alexcrichton self-assigned this Feb 28, 2017
@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 7, 2017

OFFTOPIC


@stjepang

There was also a question on StackOverflow complaining that the fact that slice::sort allocates doesn't fit the advertised "zero cost abstraction" bill.

Often I find my self doing multiple stable sorts in a loop over items of the same type and I always wondered if it would be possible to "split" sort into two functions (not really split, sort would still be there :D) :

  • (&mut [T]).sort_with_buffer(&mut [T]) that takes a buffer of a given length (possibly zero; also, adding sort_with_buffer_by_key and friends). That is, a stable sort that would work without memory allocation and put this in core,
  • and a std::sort_get_suitable_buffer([T]) -> Box<[T]> (with a better name? I like it to start with sort so that is grouped together, but obtain_suitable_sort_buffer sounds better), that gives me a suitable buffer to use with sort_with_buffer.

That way I could:

  • stable sort in core! without memory allocations! (slow!)
  • stable sort in core, using a buffer on the stack, or static memory! (fast!)
  • reuse buffer between stable sort calls! (faster that what we currently have!)
  • reuse buffer between stable sort of sequences of different types (using unsafe! yay! or not! the power?!)

Would it make sense to purse this in a different follow-up RFC?

@arthurprs
Copy link

@gnzlbg really good points, worth discussing in a separated thread.

@ghost
Copy link
Author

ghost commented Mar 7, 2017

@gnzlbg

The sorting machinery you are proposing makes sense, but I believe it's way outside the scope of this RFC, and also outside the focus of libcore and libstd. Unless all that can be covered under an elegant API, it's better to provide it as a separate crate.

If this RFC gets implemented, we'll have six (!!) methods for sorting. This is already a large number, although I believe a necessary number nonetheless - hence the RFC. But having even more of them would be really getting out of hand, don't you think? :)

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 7, 2017

@stjepang

But having even more of them would be really getting out of hand, don't you think? :)

You are arguing here that we should provide an unstable sorting algorithm that never allocates in libcore, but somehow are arguing as well that providing an stable sorting algorithm that never allocates is out-of-scope for libcore?

Cleaning up the libstd implementation of merge_sort, and splitting the logic in "sorting a buffer" and "allocating a suitable buffer" is probably worth doing anyways (and is how all C++ implementations are structured, since they need to call std::get_temporary_buffer. Once merge_sort's logic is split into two separate functions, it remains an issue of making them public under suitable names, and maybe moving one of them to libcore.

But this increases the API of libstd with at most two functions that arguably should already be there, and this saves users for having to reimplement stable sort for embedded applications (which is arguably a non-trivial task).

I don't know, I see a net win :)

@ghost
Copy link
Author

ghost commented Mar 7, 2017

@gnzlbg

But this increases the API of libstd with at most two functions

If I understood correctly, you are proposing to add at least 3 new functions, right?

You are arguing here that we should provide an unstable sorting algorithm that never allocates in libcore, but somehow are arguing as well that providing an stable sorting algorithm that never allocates is out-of-scope for libcore?

Well, kind of. :) It's a matter of balancing the costs and benefits.

A stable sort in libcore would (as proposed) introduce yet another sort_* / sort_*_by / sort_*_by_key triple, which is pretty unfortunate. I consider those methods "nice to have". High cost (3 functions), little benefit (niche use cases). If the cost were lower, I wouldn't be against this idea.

Unstable sort is not only sometimes nice to have, but the correct choice for the vast majority of sorting needs. Just because it is so sorely needed is why introducing a new triple of methods is forgivable. I consider those methods "essential". High cost (3 functions), high benefit (almost all use cases).

I'd like to reiterate that the methods you are proposing overall make sense. They would certainly fill a gap in libcore and make it more complete. It's just that they introduce a lot of functions for niche use cases - that's the only reason why I'm hesistant in accepting the idea.

If you could design a cleaner/smaller API, I'd be totally up for it. Perhaps Rust should get some blame here - if we had default and optional arguments, maybe we could do something along the lines of:

fn sort(&mut self, stable: bool = true, buffer: Option<&mut [u8]> = None) where Self::Item: Ord {
    // ...
}

Does that make sense? What does the libs team think?

@clarfonthey
Copy link

Also an important thing to note is that libcore will be able to have a stable sort if #1909 gets accepted.

@ghost
Copy link
Author

ghost commented Mar 7, 2017

@clarcharr Are you suggesting that stable sorting in libcore could allocate the buffer on the stack rather than on the heap? If so, I'm afraid that's not a viable option - it could easily overflow the stack.

@clarfonthey
Copy link

@stjepang: it could overflow the stack for large slices, but it'd still be worthwhile to have an upper limit and document it as panicking on no_std.

We can still just do [Default::default(); 4096] on stable now and that size covers a majority of use cases.

@ghost
Copy link
Author

ghost commented Mar 7, 2017

@clarcharr Hmm, there are two problems with that:

  1. [Default::default(); 4096] can be unexpectably too large, e.g. if Default::default() itself returns a really large struct, or even an array.
  2. Sorting more than 4096 objects is not that uncommon. E.g. files in a directory, words in a text document, or HTML elements in a web page can easily outnumber that limit. It'd be unfortunate for such an important function to panic on medium-sized slices and longer.

We need something more robust. Admittedly, designing a clean and uncompromising sort interface is challenging. :)

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 7, 2017

@stjepang

If I understood correctly, you are proposing to add at least 3 new functions, right?

A stable sort in libcore would (as proposed) introduce yet another sort_* / sort_by / sort_by_key triple,

The minimal viable set is to add two single functions to lib core:

  • (&mut [T]).sort_with_buffer_by(buffer: &mut [T], comp: Comp).
  • (&[T]).sort_buffer_len() -> usize (currently (&[T]).len() / 2).

Anything else can be build efficiently on top of that, and doing so is trivial (at least when compared to the cost of implementing your own stable sorting algorithm). I think that also having a:

  • (&mut [T]).sort_with_buffer_by_key(buffer: &mut [T], key: Key)

would be nice, but since it is a two-liner, we don't need to add it to libcore if there are real concerns about API bloat.

I actually don't really care if these are available in std:: or not, as you say, and I agree with you, these are not required often.

The issue IMO is that when you need them, reimplementing a correct stable sort algorithm that is both performant and correct is a titanic task. We can spare this task to our users by providing the building blocks for the hard parts in libcore.

Whether you then want to stable sort with a Vec<T>, Box<[T; 4096]>, a [T; 256] on the stack or static memory, or something else (e.g. sort by key), can be trivially build on top with ~3-5 lines of code.

In particular, this code is already in std::, we would just need to refactor std::merge_sort into two functions, and move them to lib core, and build std::merge_sort using them (in ~3 lines of code).

@clarcharr

Also an important thing to note is that libcore will be able to have a stable sort if #1909 gets accepted.

Note that the default buffer length for stable sort in std:: is length / 2. That can easily overflow the stack, or we would need to choose a smaller length to avoid that. It also does not solve the problem of, e.g., wanting to reuse the same buffer across std::sort calls, or I am missing something?


@stjepang

fn sort(&mut self, stable: bool = true, buffer: Option<&mut [u8]> = None)

I think that we would at least need an extra parameter to tell sort whether it can heap allocate or not... It also puts all the options on everybody's faces. I don't know, I think that just refactoring std::merge_sort into two functions, providing those in libcore, and then build std::merge_sort with heap allocation on top of them, would be enough.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 7, 2017

I opened an internal threads for this, the discussion here is derailing, and we should just merge this RFC and discuss future extensions somewhere else: https://internals.rust-lang.org/t/pre-pre-rfc-stable-sorting-building-blocks-in-libcore/4928

@brson
Copy link
Contributor

brson commented Mar 8, 2017

Stable sorting, although a good default, is very rarely useful.

Please remove or change this text. It is self-contradictory and I believe incorrect. There are tradeoffs between stable and unstable sorting. This is a hostile way to begin an RFC.

I'm fine with the technical content of this RFC. Thanks @stjepang.

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 8, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Mar 8, 2017
@cristicbz
Copy link

@brson I think that's not contradictory. It's often the case of conservative defaults: they're not useful most of the time, but when they are it would be an expensive / hard to track problem; that still makes it the right default. SipHash is another example of this.

@ghost
Copy link
Author

ghost commented Mar 8, 2017

@brson Sorry, I should've worded that sentence better. I gave the motivation section another try so hopefully it's clearer and more balanced now.

What I wanted to say is: stability as a property is rarely desired in practice, and some other characteristics are desired more often.

So far I only gave anecdotal evidence for that statement, but if someone is seeking empirical evidence, here's an easy way to find some: simply search C++ code on GitHub for std::sort and std::stable_sort (the former gives 3M hits, and the latter 173K hits).

@alexcrichton
Copy link
Member

It's been a week now since FCP start (not sure where fcpbot is) and nothing major has coming up, so I'm going to merge!

Thanks again for the RFC @stjepang!

@alexcrichton
Copy link
Member

Tracking issue: rust-lang/rust#40585

@ghost ghost deleted the unstable-sort branch March 16, 2017 21:01
@ghost ghost restored the unstable-sort branch May 12, 2017 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.