Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safety #765

Merged
merged 47 commits into from
Sep 24, 2023
Merged

Safety #765

merged 47 commits into from
Sep 24, 2023

Conversation

Imberflur
Copy link
Contributor

@Imberflur Imberflur commented Mar 1, 2023

Fixes #647

See API changes section for description.

Checklist

  • I've added tests for all code changes and additions (where applicable)
  • I've added a demonstration of the new feature to one or more examples
  • I've updated the book to reflect my changes
  • Usage of new public items is shown in the API docs

Todo

  • Add more to PR description
  • Fix this Safety #765 (comment)
  • LendJoin uses nougat since I started working on it before GATs stabilized. I kept it because I thought it might be useful to add more iterator combinators (some combinators are currently impossible to implement with actual GATs). However, I think it would probably be prudent to remove this and use regular GATs for simplicity. So I wanted to remove nougat but it turns out that I can't implement for_each without it (without either running into requiring Self: 'static due to for<'_> hrtb or not being able to have lifetimes in the GAT (LendJoin::Type<'next>) that aren't 'next and aren't 'static.
  • May want to create a quick 0.19.0 release first? See 0.19.0 Release? #764
  • Compare performance to make sure no significant regressions were introduced (and to satisfy my curiosity)
    • Benchmarks (I compared several versions of specs using ecs_bench_suite, there is maybe a 3% increase in add_remove, there was a significant increase (10+%) in deletion times due to shred::MetaTable changes, so I introduce a nightly feature that enables a more efficient implementation using ptr_metadata, fetch heavy code shows improvements due to the use of atomic_refcell in shred instead of the custom atomic cell implementation)
    • Veloren server
  • Publish new versions of hibitset and shred to crates.io and use those instead of git dependencies.
  • Update changelog

API changes

This is a breaking change.

This PR fixes several soundness issues that I encountered when working on #737. These include:

  • The Join implementation for specs::storage::Entries allows creating aliasing &mut references to the underlying storage (i.e. when not using the JoinIter as a lending/streaming iterator).
  • The Join implementation and API of specs::storage::RestrictedStorage similarly can create mutable aliasing references to storages as well as to a specific component.
  • The Join implementation of Storage allowed DerefFlaggedStorage to create aliasing &mut references to the internal events channel.
  • JoinIter::get allows creating aliasing mutable refs, see JoinIter::get allows mutable aliasing without the user writing any unsafe code. #647.
  • Parallel iteration creates aliasing &mut Join::Value references.
  • Under stacked borrows certain Join::get implementations were invalidating previously returned mutable references (e.g. by creating a slice reference to the whole underlying storage).

Many (but not all) of these issues stem from artificially lengthening the lifetime of the &mut Join::Value within Join::get implementations. We now avoid this and leverage alternative mechanisms including interior mutability and lending iteration.

In more detail, we:

  • Introduce LendJoin trait or lending (aka streaming) iteration of joined values.
  • Make some things like Entries now only implement LendJoin (i.e. remove unsound Join implementations).
  • Implement LendJoin for everything that implements Join(manually, not a blanket impl since there are difference in the implementation).
  • Move JoinIter::get to JoinLendIter::get.
  • Add the RepeatableLendGet trait (which JoinLendIter::get requires). This is to facilitate destructive implementations of LendJoin where getting the same index more than once would be unsound (i.e. literally just the owned implementation of LendJoin for ChangeSet which removes values as it iterates).
  • Refactor ParJoin trait to not rely on Join implementation so we have additional flexibility to use different implementations and keep &mut Join::Value in Join::get.
  • Make Join an unsafe trait. The new LendJoin trait is also unsafe to implement.
  • Storage no longer implements mutable Joins (the non-lending variant) for all UnprotectedStorages. Instead this implementation requires that the storage implements SharedGetMutStorage. The trait which provides a shared_get_mut(&self, id: Index) method. Notice this takes a shared reference. Storages that can soundly implement this wrap their components in UnsafeCell to allow constructing a mutable reference from a shared reference.
  • Revamp many pieces of safety documentation.
  • Add #![deny(unsafe_op_in_unsafe_fn)] to make it easier to identify and document unsafe operations.

To try to catch any remaining UB, I ran the available tests under Miri . This also identified some issues in dependencies for which I have submitted PRs to fix:

(We need to publish new versions of these to crates.io)

This PR adds Miri to the CI.

Additionally, specs exposed a nightly cargo feature that enabled additional APIs using GATs. Since GATs are now stabilized, I bumped the MSRV to be able to eliminate this feature and remove a bunch of cfg complexity.

Also I introduced AccessMut trait which is similar to DerefMut except it requires explicit use. The associated typeUnprotectedStorage::AccessMut<'_> now requires AccessMut instead of DerefMut. This is to faciliate my work in #737 where I am exploring a flagged storage type that makes generating modification events more explicit. A blanket implementation of AccessMut for anything implementing DerefMut is included.

@Imberflur Imberflur marked this pull request as ready for review March 1, 2023 03:50
src/storage/mod.rs Outdated Show resolved Hide resolved
@Imberflur Imberflur mentioned this pull request Jun 10, 2023
Imberflur added 27 commits July 23, 2023 17:08
This is the first step for addressing a soundness issue where parallel joins
create aliasing mutable references to the storage and where in regular joins for
some storages previously returned references will be invalidated by calling
`get_mut` (at least under stacked borrows afaict).

The internals of each storage are adjusted to store components within a
`SyncUnsafeCell` to allow handing out mutable references. `SyncUnsafeCell` is a
wrapper of `UnsafeCell` that provides `Sync` by default.

Other various details:
* Edge cases with `as` casts on 16-bit and 32-bit platforms addressed to avoid
  UB in `UnprotectedStorage` impls.
* Safety documentation added to unsafe code usage within `UnprotectedStorage` impls.
* Safety documentation added to unsafe impls of DistinctStorage (only in
  storages.rs)
* Started introduction of `#[deny(unsafe_op_in_unsafe_fn)]` lint in various
  modules.
* Safety requirements on `UnprotectedStorage::get/get_mut` updated.
* `NullStorage` internals updated to handle ZSTs better (including properly
  dropping them when `clean` is called and not dropping them in `insert`) and
  not require `T: Default`.
* In `Storage::insert` add the `id` to the mask after calling `inner.insert()`
  to protect against unwinding from the `insert` call.
…hange how

`UnprotectedStorage::clean` works.

* `clean` now always clears all components even if dropping the storage would
  have dropped them automatically (this helps address an edge case with
  `DenseVecStorage::insert` overflowing a `u32`).
* Add safety requirement to `clean` that indicates the caller should ensure
  the mask has been cleared even if unwinding occurs from the `clean` call.
* Ensured uses of `clean` met this requirement.
* Also continue expanding application of `unsafe_op_in_unsafe_fn` lint.
* Fixed typo from previous commit where FlaggedStorage::get was using get_mut
  internally.
…oned since

we most likely want to transition this to a streaming only storage (which is
thus given &mut access).
Implementations and uses of these traits are not yet changed. However, hopefully
this is the final form needed to have storages that only support
lending/streaming joins while also allowing some storages to support regular
`Iterator` like joins and parallel joins as well as allowing storages where
`UnprotectedStorage<T>::AccessMut` doesn't implement `DerefMut<Target =T>` (like
a planned variant of the flagged storage).
…ed/used:

(NOTE: nothing compiling yet since implementation of these traits have not been
updated)

* Introduced `LendJoin` trait that is like the lending iterator version of
  `Join`. This is useful for types that need to return aliasing mutable
  references from `get` calls with distinct `id`s (e.g. `Entries`,
  `DerefFlaggedStorage`, `RestrictStoraged`). `LendJoin` uses `nougat` crate to
  provide a GAT based API on stable rust.
* Removed unsound `JoinIter::get`/`JoinIter::get_unchecked` but these methods
  are present on `JoinLendIter` where they can be soundly implemented.
* Since there is a single `MaybeJoin` type used for all joins, the convenient
  `.maybe()` method was moved to `LendJoin` which should be the common
  denominator of implemented join traits (if we put this method on multiple
  traits, rust might start wondering which one you want to call, which isn't
  convenient...).
* `ParJoin` trait is no an longer empty trait that relies on the implementation
  in `Join`. The new `ParJoin::get` takes a shared reference so the
  `ParallelIterator` implementation no longer creates aliasing exclusive
  references to call `Join::get`.
* `Join` is now an `unsafe` trait to require that the mask/values returned from
  `Join::open` are properly associated.
* Extended application of `deny(unsafe_op_in_unsafe_fn)` to the `join` module
  and added safety documentation to calls to unsafe functions there.
* Removed `Clone` implementation for `JoinIter<J> where J::Mask: Clone,
  J::Value: Clone`. Nothing, in `Join::get` safety requirements implies that
  this is safe, in the cases where this is safe, the user can just call
  `.join()` twice for similar effect.

Other misc changes:

* `BitAnd` helper trait and `MaybeJoin` struct moved to their own files to
  declutter `join/mod.rs`.
…hese traits

and add LendJoin implementations.

Compiles again!!!

* Several Join implementors where commented out (marked with `D-TODO`) so that I
  can update them in a separate batch. Want to make sure the changes were
  working first.
* Remove `where Self: 'next'` bound from `LendJoin::Type<'next>'` since this was
  causing issues and an unnecessary bound.
* Fix several other errors related to usage of `LendJoin`'s GAT.
* Fix other misc errors from the last few commits
* `deny(unsafe_op_in_unsafe_fn)` now covers the whole crate.
* Add safety comments to unsafe code used in `Generation` methods.

Still need to:
* Implement `SharedGetAccessMutStorage` for relevant storages.
* Update commented out types that implement `Join`.
* Update some safety comments.
`SharedGetAccessMutStorage` -> `SharedGetMutStorage`, rename
`shared_get_access_mut` -> `shared_get_mut`.
* Start work on implementing LendJoin and safely re-implementing Join for
  `&ChangeSet`, `&mut ChangeSet`, and `ChangeSet`.
* Add `AccessMut` trait as a replacement for a few cases that were using
  `DerefMut` (since we don't want to require that
  `UnprotectedStorage::AccessMut<'a>' has to implement `DerefMut`). IIRC the
  cases were originally missed because they are behind feature flags.
* Modify `SharedGetMutOnly` to also be generic over the storage type so that we
  don't have to require `T: Component` (since we were getting the storage type
  from the associated `Component::Storage`). IIRC this is to support use in
  `ChangeSet<T>` which doesn't require `T: Component`.
…ngeSet<T>

where iterating it removes items.

* Added additional requirement to Join::get/LendJoin::get that it can not be
  called multiple times with the same ID.
* Added unsafe `RepeatableLendGet` trait to allow opt-out of this requirement so
  that a safe `JoinLendIter::get` method can remain exposed.
* Updated relevant safety comments for uses/impls of `LendJoin::get`.
* TODO for next commit: update all uses/impls of `Join::get` to ensure they
  correspond with the requirement changes.
* Replace `Join` impl with `LendJoin` (to avoid creating aliasing mutable
  references to the storage).
* Create new `Storage::not_present_insert` method that requires that the `id`
  not be present in the mask. This is used by both `Storage::insert` and
  `VacantEntry::insert` so we can centralize documenting the safety of calling
  `UnprotectedStorage::insert` and the handling of potential unwinding from
  `BitSet::add`.
…lated

changes:

* SharedGetMutOnly::get_mut changed from method to associated function to make
  its use more apparent (e.g. compared to calling UnprotectedStorage::get_mut).
* New requirement added to ParJoin trait implementation to facilitate
  callers of ParJoin::get that need to ensure they don't repeat indices.
* `ShareGetMutStorage::shared_get_mut` requirements tweaked to allow calling
  this in conjuction with `UnprotectedStorage::get` when the `id`s used don't
  overlap. This facilitates `Join`/`ParJoin` impls for `RestrictedStorage` which
  can allow getting a component for one entity mutably while immutably getting
  the component for another entity at the same time.
* Marker types used for restricted storage implementation replaced with
  producing distinct types for different types of joins: `PairedStorageRead`
  (for any read only join), `PariedStorageWriteExclusive` (for mutable
  LendJoin), and `PairedStorageWriteShare` (for mutable Join/ParJoin).
* Renamed `PairedStorage` (which was replaced with the 3 types above) methods
  `get_unchecked`/`get_unchecked_mut` to `get`/`get_mut` since `unchecked` often
  is used to indicate some safety requirement hasn't been checked which isn't
  the case here. Renamed existing `get`/`get_mut` methods to
  `get_other`/`get_mut_other`.
* Other misc changes that were missed in previous commits.
Cargo.toml Outdated Show resolved Hide resolved
src/bitset.rs Outdated Show resolved Hide resolved
@Imberflur
Copy link
Contributor Author

Exciting news! I finally got a chance to update veloren to use this and profile it and I don't see any particular regressions. Might try to post some tracy pictures later.

I think this should be ready to merge soon.

@Imberflur
Copy link
Contributor Author

Here are some profiling results in veloren, "this trace" is before changes here and "external trace" is after. I focused on two systems. "character_behavior" which has a join over a lot of component types and "phys" which has multiple joins over fewer component types of which some joins are parallel ones. It seems like there are no regressions and potentially a slight improvement (the profiling conditions have room to be more strictly controlled so I would not trust that this improvement is as significant as it appears here).

image

image

.github/workflows/deny.yml Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor Author

@Imberflur Imberflur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some notes that will hopefully add some helpful context for reviewers

Comment on lines +81 to +94
miri:
name: "Miri"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install Miri
run: |
rustup toolchain install nightly --component miri
rustup override set nightly
cargo miri setup
- name: Install latest nextest release
uses: taiki-e/install-action@nextest
- name: Test with Miri
run: ./miri.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI now includes running miri.

Comment on lines +120 to +121
if comp.get().condition < 5 {
let mut comp = comp.get_mut();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed get_unchecked -> get on the paired storage types since it could imply that an important check is missing, but it is just that this gets the component value for the current entity in a join so it doesn't need to check that it is present in the storage. Paired storages also allow getting the component for a different entity than the current one, that was renamed to get_other.

Copy link

@cpetig cpetig Sep 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _unchecked suffix always reminds me of unsafe access (e.g. https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked ), so this rename is good.

Update: I later realized that get indeed remains unsafe, and the restriction that the item must exists sounds very similar to the unchecked variant of slice. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I later realized that get indeed remains unsafe, and the restriction that the item must exists sounds very similar to the unchecked variant of slice.

The method discussed here has always been safe. There are other unsafe methods with the same name that this PR touches, so maybe you are thinking of one of those.

Comment on lines +223 to +244
unsafe impl<T> LendJoin for ChangeSet<T> {
type Mask = BitSet;
type Type<'next> = T;
type Value = DenseVecStorage<T>;

unsafe fn open(self) -> (Self::Mask, Self::Value) {
(self.mask, self.inner)
}

unsafe fn get<'next>(value: &'next mut Self::Value, id: Index) -> Self::Type<'next> {
// NOTE: This impl is the main reason that `RepeatableLendGet` exists
// since it moves the value out of the backing storage and thus can't
// be called multiple times with the same ID!
//
// SAFETY: Since we require that the mask was checked, an element for
// `id` must have been inserted without being removed. Note, this
// removes the element without effecting the mask. However, the caller
// is also required to not call this multiple times with the same `id`
// value and mask instance. Because `open` takes ownership we don't have
// to update the mask for futures uses since the `ChangeSet` is
// consumed.
unsafe { value.remove(id) }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we have the unsafe RepeatableLendGet marker trait is for this case where we can't get the same spot twice because this removes the value.

@@ -0,0 +1,57 @@
use hibitset::{BitSetAnd, BitSetLike};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/join/bit_and.rs

is just factoring some existing code out of the main join module

Comment on lines +222 to +232
/// Calls a closure on each entity in the join.
pub fn for_each(mut self, mut f: impl FnMut(LendJoinType<'_, J>)) {
self.keys.for_each(|idx| {
// SAFETY: Since `idx` is yielded from `keys` (the mask), it is
// necessarily a part of it. `LendJoin` requires that the iterator
// doesn't repeat indices and we advance the iterator for each `get`
// call in all methods that don't require `RepeatableLendGet`.
let item = unsafe { J::get(&mut self.values, idx) };
f(item);
})
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the method that can't work without nougat (at least with my best efforts).

/// Used by the framework to quickly join components.
pub trait UnprotectedStorage<T>: TryDefault {
/// The wrapper through with mutable access of a component is performed.
#[cfg(feature = "nightly")]
type AccessMut<'a>: DerefMut<Target = T>
type AccessMut<'a>: AccessMut<Target = T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched this from DerefMut to a new AccessMut trait to allow types that are automatically deferenceable. This is for a flagged storage variant that I'm planning that would be more explicit about when a component is marked as mutated. Most cases aren't effected by this unless they are generic over the component type. AccessMut also still has Deref as a super trait like DerefMut does.

Comment on lines +860 to +862
/// Used by the framework to mutably access components in contexts where
/// exclusive access to the storage is not possible.
pub trait SharedGetMutStorage<T>: UnprotectedStorage<T> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base UnprotectedStorage trait means that we can implement LendJoin for it. SharedGetMutStorage enables implementing Join. And finally DistinctStorage enables ParJoin.

Comment on lines 431 to 256
pub struct PairedStorageWriteShared<'rf, C: Component> {
index: Index,
storage: SharedGetOnly<'rf, C, C::Storage>,
}

impl<'rf, 'st, C, S, B, Restrict> PairedStorage<'rf, 'st, C, S, B, Restrict>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the additional complexity of adding the LendJoin case it was much easier to have several separate paired storage types rather than using generic marker parameters.

/// });
/// b.get_mut();
/// ```
fn _dummy() {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compile_fail test to check for mistakes

Comment on lines -218 to -220
impl<T> UnprotectedStorage<T> for NullStorage<T>
where
T: Default,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NullStorage no longer requires T: Default, since we can soundly conjure up ZSTs provided that they represent instances that this has previously taken ownership of

tests/no_parallel.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@xMAC94x xMAC94x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hard to check such a massive PR. i had a look through all of it and am positive that its an improvement. LGTM

@Imberflur
Copy link
Contributor Author

@xMAC94x thanks for taking a look ❤️

I will wait till this weekend in case anyone else is interested in reviewing. If anyone interested needs more time just let me know.

@Imberflur Imberflur merged commit 437e720 into amethyst:master Sep 24, 2023
15 of 19 checks passed
@Imberflur Imberflur deleted the safety branch September 24, 2023 23:54
@zesterer
Copy link
Contributor

I just wanted to say: thanks for pushing this all the way to completion :) I would have reviewed, but I've been busy with other things recently. Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JoinIter::get allows mutable aliasing without the user writing any unsafe code.
4 participants