Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement get_many_mut #238

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

NiklasJonsson
Copy link

std::collections::HashMap is adding get_many_mut in rust-lang/rust#97601 and I'm working on replacing some uses of that hash map with this one where some of the code uses get_many_mut (currently only a single location in rustc) so I would appreciate it if indexmap::IndexMap could provide the same interface.

This code uses a decent amount of unsafe compared to the rest of the file and I saw the comment in lib.rs on having almost all unsafe code in raw.rs but I couldn't figure out how to move parts of my implementation there as it is mostly dealing with initializing the array. I tried to use array::map to avoid all unsafe but couldn't quite get it to work with the lifetimes and the lambda.

If you want this change, I'll also write some docs.

@cuviper
Copy link
Member

cuviper commented Jul 15, 2022

I think ideally we would want two things for this, especially the first:

  1. Stabilizing HashMap::get_many_mut so we can be sure to match that API as much as possible.
  2. Stabilizing <[T]>::get_many_mut (still new in Add slice methods for indexing via an array of indices. rust-lang/rust#83608) to deal with all the unsafe aspects. We might also want to wrap that into our own get_many_index_mut, just like we have get_index and get_index_mut for slice-like indexing.

I tried to use array::map to avoid all unsafe but couldn't quite get it to work with the lifetimes and the lambda.

I suspect part of this is because you should be using a raw pointer to offset/index each of the elements, otherwise you have the main &mut [T] aliasing each of the &mut T you pull out while you're working. And of course, using raw pointers will still need unsafe to form the final references.

@NiklasJonsson
Copy link
Author

Stabilizing HashMap::get_many_mut so we can be sure to match that API as much as possible.

Yeah, this makes a lot of sense. Is it okay to just leave the PR open until then?

Stabilizing <[T]>::get_many_mut

Haven't seen this before but it would indeed fit pretty well. It seems to have been open for quite a while without resolution though. Also, if they opt for a result-based return value, that'd not match the hashmap api (as it is now) so we'd have to do conversions in the implementation.

I suspect part of this is because you should be using a raw pointer to offset/index each of the elements, otherwise you have the main &mut [T] aliasing each of the &mut T you pull out while you're working. And of course, using raw pointers will still need unsafe to form the final references.

Yeah, exactly, the lambda captures the whole array lifetime mutably and can't be invoked multiple times. I did not consider pointer arithmetic though, I'll experiment a bit and see how it looks. Probably similar to the <[T]>::get_many_mut examples 🙂

src/map.rs Outdated
// SAFETY: OK to discard mutable borrow lifetime as each index is unique
#[allow(unsafe_code)]
unsafe {
&mut *(&mut entries[i].value as *mut V)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still creates aliasing &mut between entries and the previous iterations, which is not allowed. I suggest using self.as_entries_mut().as_mut_ptr() to start with a pointer, then &mut (*entries_ptr.add(i)).value for each index.

We probably need a manual bounds-check too, but that can be part of the loop scanning for duplicates. In theory, get_index_of should always be in-bounds, but better safe than sorry...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how it can create aliasing &mut for entries. I only get a &mut to a specific element of entries (only the value member of that element) and while that references has a lifetime of entries, it is not mutably borrowing all of entries, no? Maybe I'm misunderstand reference/lifetime semantics and what is allowed. Using the pointer doesn't affect the guarantee of the function: each &mut reference that comes out of this function is unique so what difference will it make?

We probably need a manual bounds-check too...

Yeah that makes sense if pointers are the way to go but it should still panic rather than return None, I assume?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I did run miri on the tests like:

# rustup +nightly component add miri
cargo clean
cargo +nightly miri test

but I know miri doesn't catch all cases of undefined behaviour.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe I understand. The variable entries is &mut [Bucket<K, V>] so it is a mutable reference the whole slice but the array out also holds mutable references to individual elements, which alias with entries. Is this what you are referring to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's it. I'm a little surprised that miri doesn't see any problem, but I think it's because slice indexing is an intrinsic operation in MIR, so miri doesn't really see any use of the entire entries. If you force it to go through the IndexMut trait, entries.index_mut(i), then it does complain:

test map::tests::many_mut_multi_success ... error: Undefined Behavior: trying to reborrow <390898> for SharedReadOnly permission at alloc159164[0xc], but that tag does not exist in the borrow stack for this location
    --> /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/cmp.rs:1531:27
     |
1531 |             PartialEq::eq(*self, *other)
     |                           ^^^^^
     |                           |
     |                           trying to reborrow <390898> for SharedReadOnly permission at alloc159164[0xc], but that tag does not exist in the borrow stack for this location
     |                           this error occurs as part of a reborrow at alloc159164[0xc..0x10]
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <390898> was created by a retag at offsets [0xc..0x10]
    --> src/map.rs:502:19
     |
502  |           let out = indices.map(|i| {
     |  ___________________^
503  | |             // SAFETY: OK to discard mutable borrow lifetime as each index is unique
504  | |             #[allow(unsafe_code)]
505  | |             unsafe {
506  | |                 &mut *(&mut entries.index_mut(i).value as *mut V)
507  | |             }
508  | |         });
     | |__________^
help: <390898> was later invalidated at offsets [0x0..0x40]
    --> src/map.rs:506:29
     |
506  |                 &mut *(&mut entries.index_mut(i).value as *mut V)
     |                             ^^^^^^^^^^^^^^^^^^^^
     = note: backtrace: [...]
    --> src/map/tests.rs:449:5
     |
449  |     assert_eq!(map.get_many_mut([&1, &1123]), Some([&mut 10, &mut 100]));
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We probably need a manual bounds-check too...

Yeah that makes sense if pointers are the way to go but it should still panic rather than return None, I assume?

Yeah, if get_index_of returns a bad index, then we have internal errors, so a panic is fine.

@cuviper
Copy link
Member

cuviper commented Jul 15, 2022

Is it okay to just leave the PR open until then?

Yeah, that's fine. I should probably make a label for this...

@cuviper cuviper added the waiting-for-std Changes waiting for stabilization in the standard library, so we can match the API. label Jul 15, 2022
Avoid mutable aliasing by using a pointer to entries
src/map.rs Outdated
for idx in indices {
let idx = idx?;
if idx >= len {
panic!("Index is out of range! Got '{idx}' but length is '{len}'")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One drawback of sharing the code with get_many_index_mut is that we have to duplicate the in-bounds check here to provide the panic behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, since this is internal "shouldn't happen" territory, I would also be fine with letting such bugs fall through to get_many_index_mut -> None. Maybe just add a debug_assert!(idx < len) for good measure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sure, that makes sense!

@jyn514
Copy link

jyn514 commented Jul 16, 2022

I think ideally we would want two things for this, especially the first: Stabilizing HashMap::get_many_mut so we can be sure to match that API as much as possible.

Can IndexMap add it under a nightly or unstable opt-in cargo feature? Then people will have to opt in to breaking changes, and you can document that changes to this API won't result in a new major version.

(Or if you don't feel strongly, you can just make a new major release if the HashMap API changes for some reason.)

@cuviper
Copy link
Member

cuviper commented Jul 19, 2022

I don't like using nightly/unstable features, in part because it's an extra maintenance burden, but also because the feature may be opted-in on your behalf, arbitrarily deep in your dependency tree.

I'd rather not set us up for needing a semver bump either. If this is something that folks really need, let's apply that pressure on std stabilization. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting-for-std Changes waiting for stabilization in the standard library, so we can match the API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants