
cargo could use a persistent hash map in its resolver #26

Closed · Eh2406 opened this issue May 31, 2018 · 10 comments

Eh2406 commented May 31, 2018

Hi,

Cargo's resolver has a long-standing TODO:
"We should switch to persistent hash maps if we can..."
So when I watched your talk at RustFest Paris, I thought I'd make the introductions. If/when this crate is ready, perhaps a PR there?

The comment is somewhat out of date. Thanks to a lot of profiling, most things are wrapped in Rc/Arc or are bespoke persistent data structures. So I suspect this will improve ergonomics more than speed, but I'd be happy to be wrong.


bodil commented Jun 6, 2018

Heh, in fact the original OrdMap was written exactly because we needed it for a constraint solver. I'm guessing persistent hash maps would be a huge win on very large constraint sets, but mostly equivalent on more normally sized problems.

For me, it'd be really interesting to see whether the API is there yet for real-world problems like this. I might actually give it a go before I tag the next major release; there's probably a lot of valuable learning to be had from doing this before I settle on a stable API.


Eh2406 commented Jun 6, 2018

Using Rc has improved performance in the past, so persistence matters, but we probably have most of the benefit already. We have also made algorithmic improvements, so I don't know of any real-world cases where the resolver is the bottleneck. I'd mostly love to be using a library for ergonomics; all the Rcs and custom Rc wrappers make the code harder to follow.

> I might actually give it a go before I tag the next major release

I'd be happy to help any way I can! For most projects, I'd add a dependency without asking the dependency for permission. But for inclusion in the official Rust distribution, I just wanted to make sure you are willing to get the extra attention that may bring.

> there's probably a lot of valuable learning to be had from doing this before I settle on a stable API.

I have asked other persistence libraries about this in the past, and they found it a useful exercise. They decided that their API was not quite ready.
Specifically, they wanted to:

  • Use make_mut to avoid clones when the ref count is one. (I think you already have this.)
  • Add mutable methods to the data structures. (I think you already have this.)
  • Skip boxing keys and values in another Arc for cheaply cloneable things. (Similar to your "Improving performance for Copy keys and values" #12, I think.)
  • Allow the use of Rc internally instead of Arc for single-threaded uses.
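For context, the make_mut item in the list above is the standard copy-on-write pattern; here is a minimal std-only sketch of how it avoids clones when the ref count is one:

```rust
use std::rc::Rc;

fn main() {
    let mut a: Rc<Vec<i32>> = Rc::new(vec![1, 2, 3]);
    let b = Rc::clone(&a); // ref count is now 2

    // With another holder alive, make_mut must clone the Vec first.
    Rc::make_mut(&mut a).push(4);
    assert_eq!(*a, vec![1, 2, 3, 4]);
    assert_eq!(*b, vec![1, 2, 3]); // b still sees the old value

    drop(b);
    // With the ref count back to 1, make_mut mutates in place: no clone.
    Rc::make_mut(&mut a).push(5);
    assert_eq!(*a, vec![1, 2, 3, 4, 5]);
}
```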


bodil commented Jun 6, 2018

Yes, basically I've got your first two items in released versions already, and the other two are already done in unreleased code. And yes, this is a full-time work project for me currently; I can definitely cope with the extra attention. 😊


Eh2406 commented Jun 6, 2018

Wonderful to hear! So I did a quick audit of clones in the resolver, just to remind myself what is worth changing. The important clones happen here, when constructing a BacktrackFrame. The biggest component of that is the Context (although the BinaryHeap<DepsFrame> may be worth looking at as well).
Which brings us back to the original comment: Activations is a huge HashMap that is expensive to clone.

I'd start with Activations and see where to go from there.

Another approach is a quick audit of Rcs in the resolver. For example, RcList could be replaced with a ConsList. And similarly, most uses of Rc could be replaced with less bespoke structures.
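Not Cargo's actual types, but a minimal sketch of why those clones matter: when the big maps inside Context sit behind Rc (or a persistent map), cloning a BacktrackFrame is a ref-count bump rather than a deep copy. The field shapes below are hypothetical:

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical shapes, loosely modelled on the resolver's Context and
// BacktrackFrame; the real types live in cargo's resolver.
#[derive(Clone)]
struct Context {
    // Behind Rc, Context::clone bumps a ref count instead of
    // deep-copying the whole activations table.
    activations: Rc<HashMap<String, Vec<u64>>>,
}

#[derive(Clone)]
struct BacktrackFrame {
    context: Context,
}

fn main() {
    let frame = BacktrackFrame {
        context: Context { activations: Rc::new(HashMap::new()) },
    };
    let snapshot = frame.clone(); // cheap: shares the same table
    assert!(Rc::ptr_eq(
        &frame.context.activations,
        &snapshot.context.activations
    ));
}
```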


Eh2406 commented Jun 21, 2018

Working from master, the diff to get Activations moved over is very small (congratulations on that!), and it comes with a ~5% improvement.

Edit:
resolve_features was almost as straightforward to convert, and was approximately a performance wash.
(The fact that code like set.insert(...); modifies the data in place with std but silently does nothing with im took me some time to debug. May I recommend either reserving the names used by std for methods that do exactly the same thing, or at least adding #[must_use] to the immutable variants.)

Converting links was more difficult, as I did not find a replacement for std::HashSet::insert: im::HashSet::insert returns the new copy, and im::HashSet::insert_mut does not return an indicator of whether the key was already present. (The workaround was to do the hashing/lookup twice: check contains, then insert.)
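A sketch of that workaround (the helper name insert_was_new is mine, and it is written over std's HashSet so it runs standalone): hash once for contains, then again for insert, to recover the "was it new?" bit that an insert returning nothing does not give you.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Hypothetical helper illustrating the workaround: when a set's insert
// does not report whether the value was already present, check
// `contains` first. Costs two hashes/lookups per insertion.
fn insert_was_new<T: Hash + Eq>(set: &mut HashSet<T>, value: T) -> bool {
    let was_new = !set.contains(&value);
    set.insert(value);
    was_new
}

fn main() {
    let mut links = HashSet::new();
    assert!(insert_was_new(&mut links, "foo-sys")); // first time: new
    assert!(!insert_was_new(&mut links, "foo-sys")); // duplicate
    assert_eq!(links.len(), 1);
}
```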


Eh2406 commented Jun 25, 2018

I was trying to get a sense of the overhead of using im for small cases versus its better asymptotics for big cases. To investigate, I made a criterion benchmark that keeps a stack of all the versions of a HashSet as items are added one at a time. I think this is the best case for immutable data structures, as using std and clone should be O(stack_size^2) and im should be O(stack_size).

I started with a usize as the value. The shape is as predicted, and the crossover is in the low 600s.
[benchmark graph: copy]
Then I realized that is unfair to im, as usize is Copy, so its clone is an almost-free memcpy. So I tried with Rc<usize>, whose clone costs one cache miss. The crossover is in the mid 200s.
[benchmark graph: rc]

#[macro_use]
extern crate criterion;
extern crate im;

use criterion::{Criterion, ParameterizedBenchmark};
use std::rc::Rc;

fn bench(c: &mut Criterion) {
    let parameters = vec![500, 600, 700, 800];
    let benchmark = ParameterizedBenchmark::new("std-HashSet",
                                                |b, &size| {
                                                    b.iter(|| {
                                                        let mut vec = Vec::with_capacity(size);
                                                        let mut seq = std::collections::HashSet::new();
                                                        for i in 0..size {
                                                            seq.insert(i);
                                                            vec.push(seq.clone());
                                                        }
                                                    })
                                                }, parameters)
        .with_function("im-HashSet",
                       |b, &size| {
                           b.iter(|| {
                               let mut vec = Vec::with_capacity(size);
                               let mut seq = im::HashSet::new();
                               for i in 0..size {
                                   seq.insert_mut(i);
                                   vec.push(seq.clone());
                               }
                           })
                       });

    c.bench("clone", benchmark);

    let parameters = vec![200, 300, 400, 500];
    let benchmark = ParameterizedBenchmark::new("std-HashSet",
                                                |b, &size| {
                                                    b.iter(|| {
                                                        let mut vec = Vec::with_capacity(size);
                                                        let mut seq = std::collections::HashSet::new();
                                                        for i in 0..size {
                                                            seq.insert(Rc::new(i));
                                                            vec.push(seq.clone());
                                                        }
                                                    })
                                                }, parameters)
        .with_function("im-HashSet",
                       |b, &size| {
                           b.iter(|| {
                               let mut vec = Vec::with_capacity(size);
                               let mut seq = im::HashSet::new();
                               for i in 0..size {
                                   seq.insert_mut(Rc::new(i));
                                   vec.push(seq.clone());
                               }
                           })
                       });

    c.bench("clone-rc", benchmark);
}

criterion_group!(benches, bench);
criterion_main!(benches);

So given this benchmark, I think for Cargo:

  • activations should be faster as an im structure (and that is what we see in the experiment).
  • resolve_features should be faster with the outer HashMap using im but the inner HashSet using Rc and std. It is common to have more than 200 crates in a dependency tree, but rare to have 600 features in a crate.
  • links should be faster with std, as it is very uncommon to have more than 200 sys crates.


bodil commented Jul 5, 2018

Update here: I've stabilised the API and am about ready to release - I didn't actually have this thread in mind when working on it, but it looks like I've accidentally managed to address the rough bits you mentioned, probably because I've modelled the final API very closely on std::collections.

I've run the API through a couple of local projects, and it seems sound enough, and looking at the Cargo code I think a port should be very straightforward, so I think I'm going to go ahead and tag the release after agonising about it for just a little while longer.


Eh2406 commented Jul 5, 2018

Thanks, the new API looks good! I really like the guaranteed drop-in-for-std style; it makes trying im very easy! ❤️

One small thing: features are global and additive. So if I rely on im types being Send or Sync, then I don't set no_arc and all is well. But if one of my dependencies later adds im with no_arc, then I will stop compiling. Or, for that matter, if any other crate in the same project's dependency tree does.
It would be best if Arc/Rc were a property of each instance of an im type, but I think we would need HKT to do that correctly.
Another option is to switch the default, so we would have a use_arc feature. If I opt out of it and am forced back in by another user of im, I will be slower than expected but would still compile.
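A sketch of the additive-features hazard in Cargo.toml terms (the crate layout and version numbers here are made up for illustration):

```toml
# my_crate/Cargo.toml: relies on im types being Send + Sync,
# so it deliberately does NOT enable no_arc.
[dependencies]
im = "11"

# some_dep/Cargo.toml, elsewhere in the same dependency tree:
# [dependencies]
# im = { version = "11", features = ["no_arc"] }
#
# Cargo unifies features across the whole tree, so no_arc gets turned
# on for my_crate's copy of im too, and my_crate stops compiling.
```

With the default flipped to an opt-in use_arc feature, the unified build would still compile for everyone, only slower for the crates that wanted Rc.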


bodil commented Jul 9, 2018

Good call on the feature, I've changed it to arc: 35e91e2.

bors added a commit to rust-lang/cargo that referenced this issue Nov 21, 2018
Persistent data structures by im-rs

There has been a long-standing "TODO: We should switch to persistent hash maps if we can" in the resolver. This PR introduces a dependency on [im-rs](bodil/im-rs#26) to provide persistent hash maps. It then uses `im_rc::OrdSet` to store the priority list of dependencies instead of `std::BinaryHeap`, as cloning the heap was one of the most expensive things we did. In Big O terms these changes are very big wins; in practice the improvement is small. This is due to a number of factors: `std::collections` are very fast, N is usually only in the hundreds, we only clone when the borrow checker insists on it, and we use `Rc` and other tricks to avoid cloning.

I would advocate that we accept this PR, as:
- Things are only going to get more complicated in the future. It would be nice to have Big O on our side.
- When developing a new feature, finding each place to add `Rc` should be an afterthought, not absolutely required before merge. (cc #6129, which is stuck on this kind of per-tick performance problem as well as other things.)
- As the code gets more complex, making sure we never break any of the tricks becomes harder. It would be easier to work on this if such mistakes were marginal instead of show stoppers. (cc #5130, a bug report of one of these being removed and affecting `./x.py build` enough to trigger an investigation.)
- The resolver is already very complicated, with a lot of moving parts to keep in your head at the same time, and this will only get worse as we add features. There is a long list of tricks that were *critical* before and are worth `~5%` after; it would be nice to consider whether each one is worth the code complexity. (I can list some of the ones I have tried to remove but that are still small wins, if that would help the discussion.)

Overall, I think it is worth doing now, but I can see that it is not a slam dunk.

Eh2406 commented Nov 21, 2018

Cargo is now using im-rs. Congratulations, you are part of the rust distribution!
