Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global and local pinning #31

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
247 changes: 247 additions & 0 deletions text/2018-04-08-global-and-local-pinning.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,247 @@
# Summary

Introduce the concept of *global pinning*, which allows one to pin the current thread without going through a thread-local handle.

In addition to that, we'll slightly tweak the general interface of `crossbeam-epoch` in order to fix several reported problems with it.

# Motivation

The following problems with the current `Collector`/`Handle`/`Guard` interface have been reported:

1. Users might mistakenly expect `Handle::clone()` to register a brand-new handle rather than returning a reference to the same handle. The method can be dangerously misleading.

2. Handles are not `Send`. This is an annoying problem and it's not obvious how to cheat around it correctly when you really have to.

3. Handles are not `Sync`. Sometimes it'd be nice to call `pin()` without registering a handle for each thread. In `SkipMap`, we need to pin inside `self.collector`, but creating a handle per-thread *and* per-instance is difficult. In `elfmalloc` there are situations where having to register handles is a nuisance.

4. `epoch::pin()` will panic if the thread-local handle is destroyed. Fortunately, `LocalKey::try_with` will be soon stabilized in Rust 1.26. But how do we work around the panics then - what should we do exactly in case the handle is destroyed?

# Detailed design

### Global pinning

Let's introduce something I've been referring to as *anonymous pinning* in the past, and has been briefly explained in [this comment](https://github.com/crossbeam-rs/crossbeam-epoch/pull/21#discussion_r144353198).

The idea is to have two atomic counters in the `Collector` (more concretely, in the `Global` struct). They count how many `Guards` are currently pinned in the global epoch and the epoch preceding it:

```rust
pin_counters: [AtomicUsize; 2],
```

We introduce a new method `Collector::pin()`, which is just like `Handle::pin()`:

```rust
impl Collector {
fn pin(&self) -> Guard;
}
```

We'll have to distinguish between guards pinned globally and locally. Since we're turning `Guard` into an `enum`, let's also add a special case for the unprotected guard:

```rust
enum Guard {
Global(*const Global),
Local(*const Local),
Unprotected,
}
```

Now, the `Global::pin()` method might be implemented like this:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean Collector::pin()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, Collector::pin() would simply call Global::pin(), just like Handle::pin() now calls Local::pin(). It's the same thing :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Oh, fsr I hadn't realized Global is also a type and not only a variant.)


```rust
fn pin(&self) -> Guard {
// 0 if the epoch is even, or 1 if the epoch is odd.
let parity = self.epoch.load(Ordering::Relaxed).parity();

self.pin_counters[parity].fetch_add(1, Ordering::Relaxed);
atomic::fence(Ordering::SeqCst);

Guard::Global(self)
}
```

Of course, unpinning will simply decrement the counter.

**@jeehoonkang** [raised a concern](https://github.com/crossbeam-rs/crossbeam-epoch/pull/21#discussion_r144719080) with correctness of this method of pinning.

Consider this. What happens if we read a stale value of `self.epoch` and increment a counter just after the epoch advanced one step forward, and execute the fence just before the epoch is advanced one step forward again? That means we're effectively two epochs behind, while other threads might think we're zero epochs behind.

I think the problem is easily fixable by changing when a bag becomes expired. Rather than defining a bag is expired if it's at least 2 epochs old, we should say it's expired if it's at least 3 epochs old. That's it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please forgive my ignorance, but as I understand, the benefit of using parity over a single counter is that with parity the epoch can still be advanced by 1 when there is a global pin present, right? If a larger counter is used, say mod 4, and then the epoch age required for destruction extended to 5, can the epoch advance by up to 3 with a global pin present? I'm not really proposing anything like that, but I'm curious about the relationship between the number of counters and the minimal free-able epoch age.


Finally, an important insight ought to be made about the difference between global (through `Collector`) and local (through `Handle`) pinning. Think of `Collector::pin()` as the *simple & easy* way of using the GC, and consider `Handle::pin()` just an optimized variant that avoids contention by writing to a special thread-local structure (a handle). In that sense, handles are a completely optional optimization, nothing more!

### Adjustments to handles

Let's remove `Handle::clone()` because it's misleading. It hasn't proved to be very useful so far, and is only used by `epoch::default_handle()`. Moreover, `epoch::default_handle()` hasn't been useful either so let's remove it, too.

On the other hand, `epoch::default_collector()` really is useful, and is used by `SkipMap` to initialize `self.collector` (of type `Collector`) to the default collector.

The omission of `epoch::default_handle()` isn't a big problem since we already have `epoch::pin()` and `epoch::flush()`. Note that `epoch::pin()` is a much better option than `epoch::default_handle().pin()` anyway - it can automatically fall back to global pinning if the thread-local handle is destroyed:

```rust
fn pin() {
HANDLE.try_with(|h| h.pin()).unwrap_or_else(|| COLLECTOR.pin())
}
```

Next, we implement `Send` for `Handle` and mark `Handle::pin()` as unsafe:

```rust
unsafe impl Send for Handle {}

impl Handle {
unsafe fn pin(&self) -> Guard;
}
```

The caller of `Handle::pin()` must pinky promise that the handle will not be sent to another thread while the `Guard` (or its clones) is alive. Unless, of course, the guard (and its clones) is simultaneously sent to the same thread, too.

### Remove `is_pinned()`

Another small tweak we'll do is remove `epoch::is_pinned()` and `Handle::is_pinned()`. These functions are only used by `crossbeam-deque` to make sure a fence is executed even if the thread is already pinned.

Instead, we'll add functions `epoch::pin_fence()`, `Handle::pin_fence()`, and `Collector::pin_fence()`, which are equivalent to `pin()`, except they guarantee to execute a `SeqCst` fence.

# Drawbacks

* `epoch::default_handle()` is gone. But it doesn't seem to be useful and we have `epoch::pin()` and `epoch::flush()` anyway. We can reintroduce it in the future if a need actually comes up.

* `Handle::clone()` is gone. However, nobody is using it and when it is used, it's used mistakenly. In short, it comes with practically zero benefits and a serious drawback.

* `Handle::pin()` is now unsafe. See the *Alternatives* section for what we can do about it.

# Alternatives

### Add safe `Handle::pin_with()` and remove `Guard::clone()`

We can add a safe alternative to `Guard::pin()`:

```rust
impl Handle {
fn pin_with<F, R>(&self, f: F) -> R
where
F: FnOnce(&mut Guard) -> R;
}
```

This function is safe because it borrows the handle for the duration of pinning, which means it cannot be sent to another thread in the meantime. However, the user mustn't be able to clone the `Guard` and send the handle to another thread afterwards, as in this example:

```rust
let guard = handle.pin_with(|guard| guard.clone());
thread::spawn(move || {
handle.pin_with(|_| ());
});
drop(guard); // Oops, the handle now belongs to another thread!
```

Therefore, we'd have to remove the `Clone` impl for `Guard`.

### Make `Handle::pin()` safe, but add a lifetime to `Guard`

Another alternative is to add a lifetime to `Guard` so that it becomes `Guard<'a>`. The lifetime `'a` borrows the `Handle` and therefore prevents it from being moved at all, which means it cannot be sent to another thread while such a guard exists.

This way we don't have to remove `Handle::clone()`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this alternative, but if we do go with it, I suggest to remove Handle::clone() anyway, since as you mentioned it is not clear what the method should do.


Note that `Collector::pin()` and `epoch::pin()` simply return a `Guard<'static>`:

```rust
impl Collector {
// We're not using a `Handle` - there's nothing to borrow, so we can use `'static`.
fn pin(&self) -> Guard<'static>;
}

impl Handle {
// The returned guard borrows `self`, preventing it from being sent to another thread.
fn pin<'a>(&'a self) -> Guard<'a>;
}

// This function uses the default thread-local handle, but it cannot be sent to another thread,
// so we don't even have to borrow it.
fn pin() -> Handle<'static>;
```

The main drawback of this solution is that the lifetime might be at times annoying, but note that the lifetime can be simply elided in most real-world situations.

### Two kinds of handles: `SharedHandle` and `LocalHandle`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any drawbacks to this alternative besides it being somewhat more complex to implement and use?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, that's the only drawback.


We could generalize the concept of global pinning and have a type of handle that can be shared between threads. It would work just like global pinning and holding two counters:

```rust
struct SharedHandle { // Send + Sync
shared: *const internal::Shared,
}

mod internal {
struct Shared {
// ...
pin_counters: [AtomicUsize; 2],
}
}

struct LocalHandle { ... } // Send

impl Collector {
fn register_shared(&self) -> SharedHandle;
fn register_local(&self) -> LocalHandle;
}

impl SharedHandle {
fn pin(&self) -> Guard;
}

impl LocalHandle {
unsafe fn pin(&self) -> Guard;
}
```

Note that now we don't need to have `Collector::pin()` and `Collector::flush()` because one can just create a `SharedHandle` and use it for global pinning.

This approach has one more important benefit. Consider how we might use global pinning in `SkipMap`:

```rust
struct SkipMap<K, V> {
// ...

collector: Collector,
}

impl SkipMap<K, V> {
fn pin(&self) -> Guard {
self.collector.pin()
}

fn insert(&self, k: K, v: V) -> Entry<K, V> {
let guard = self.pin();
// ...
}
}
```

Updating the same set of counters in `self.collector.pin()` will cause contention and slow things down. How much it actually affects performance needs to be measured, but it is a concern.

A possible optimization is to create an array of distinct shared handles and choose one depending on the current thread, thus reducing the unwanted effects of contention:

```rust
struct SkipMap<K, V> {
// ...

collector: Collector,
handles: [SharedHandle; 16],
}

impl SkipMap<K, V> {
fn pin(&self) -> Guard {
let id = thread_id();
self.collector[id % 16].pin()
}

fn insert(&self, k: K, v: V) -> Entry<K, V> {
let guard = self.pin();
// ...
}
}
```

# Unresolved questions

None so far.