Stacked Borrows violation in macOS RwLock #121626

RalfJung · 2024-02-26T08:58:12Z

Today's run of the std test suite in Miri failed on macOS:

  3.314536   test sync::rwlock::tests::frob ... error: Undefined Behavior: trying to retag from <42368726> for SharedReadOnly permission at alloc9742021[0x0], but that tag does not exist in the borrow stack for this location
  0.000053      --> /home/runner/work/miri-test-libstd/miri-test-libstd/rust-src-patched/library/core/src/ptr/non_null.rs:401:18
  0.000015       |
  0.000009   401 |         unsafe { &*self.as_ptr().cast_const() }
  0.000010       |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009       |                  |
  0.000010       |                  trying to retag from <42368726> for SharedReadOnly permission at alloc9742021[0x0], but that tag does not exist in the borrow stack for this location
  0.000008       |                  this error occurs as part of retag at alloc9742021[0x0..0x30]
  0.000009       |
  0.000009       = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  0.000009       = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
  0.000010   help: <42368726> was created by a SharedReadOnly retag at offsets [0x0..0x8]
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:348:32
  0.000009       |
  0.000009   348 |                 let mut next = ptr::from_ref(&node)
  0.000009       |                                ^^^^^^^^^^^^^^^^^^^^
  0.000011   help: <42368726> was later invalidated at offsets [0x8..0x10] by a write access
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:347:17
  0.000008       |
  0.000009   347 |                 node.prev = AtomicLink::new(None);
  0.000009       |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009       = note: BACKTRACE (of the first span):
  0.000009       = note: inside `core::ptr::NonNull::<sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/runner/work/miri-test-libstd/miri-test-libstd/rust-src-patched/library/core/src/ptr/non_null.rs:401:18: 401:46
  0.000009   note: inside `sys::locks::rwlock::queue::add_backlinks_and_find_tail`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:265:17
  0.000009       |
  0.000009   265 |                 next.as_ref().prev.set(Some(current));
  0.000009       |                 ^^^^^^^^^^^^^
  0.000008   note: inside `sys::locks::rwlock::queue::RwLock::unlock_queue`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:487:33
  0.000009       |
  0.000008   487 |             let tail = unsafe { add_backlinks_and_find_tail(to_node(state)) };
  0.000009       |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::lock_contended`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:383:25
  0.000009       |
  0.000008   383 |                         self.unlock_queue(next);
  0.000009       |                         ^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::write`
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:312:13
  0.000009       |
  0.000009   312 |             self.lock_contended(true)
  0.000009       |             ^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sync::rwlock::RwLock::<()>::write`
  0.000009      --> std_miri_test/../library/std/src/sync/rwlock.rs:361:13
  0.000009       |
  0.000009   361 |             self.inner.write();
  0.000009       |             ^^^^^^^^^^^^^^^^^^
  0.000010   note: inside closure
  0.000011      --> std_miri_test/../library/std/src/sync/rwlock/tests.rs:37:26
  0.000009       |
  0.000010   37  |                     drop(r.write().unwrap());
  0.000008       |                          ^^^^^^^^^

The diff to the last successful run is this, but since this is a concurrency test, there's a high chance that the bug already existed before and only surfaces randomly. So #110211 is the most likely culprit.

So far I haven't managed to reproduce this outside the context of the standard library test suite.

The text was updated successfully, but these errors were encountered:

RalfJung · 2024-02-26T09:37:06Z

Looking at the code, what seems to be happening is that one iteration through the loop we create a shared reference here

rust/library/std/src/sys/locks/rwlock/queue.rs

Lines 348 to 350 in b58f647

    
           let mut next = ptr::from_ref(&node) 
        
               .map_addr(|addr| addr | QUEUED | (state.addr() & LOCKED)) 
        
               as State;

I am surprised that the diagnostic says "created by a SharedReadOnly retag at offsets [0x0..0x8]". This type does have interior mutability, so parts of the retag should be SharedReadWrite. This is the type in question:

rust/library/std/src/sys/locks/rwlock/queue.rs

Lines 179 to 187 in b58f647

    
           #[repr(align(8))] 
        
           struct Node { 
        
               next: AtomicLink, 
        
               prev: AtomicLink, 
        
               tail: AtomicLink, 
        
               write: bool, 
        
               thread: OnceCell<Thread>, 
        
               completed: AtomicBool, 
        
           }

Everything except the bool here is an UnsafeCell. So how can there be 8 bytes of SharedReadOnly (i.e., outside UnsafeCell)?

miri-test-libstd uses -Zrandomize-layout. Maybe that has something to do with it. Maybe that's why we couldn't reproduce it yet.

RalfJung · 2024-02-26T09:49:04Z

So with random layout, I guess the bool field comes first and then an 8-aligned field follows, meaning we have 8 bytes of "not interior mutable" memory at the start. So far, so good.

Then apparently later it got invalidated at offsets 8 to 16 here

rust/library/std/src/sys/locks/rwlock/queue.rs

Line 347 in b58f647

node.prev = AtomicLink::new(None);

So that pointer must be the second field.
Then later we are using the original shared reference again and 💥 .

I think this is morally equivalent to the following:

use std::cell::UnsafeCell;

fn main() {
    let mut c = UnsafeCell::new(42);
    let ptr = c.get();
    c = UnsafeCell::new(13);
    unsafe { ptr.read(); }
}

This is rejected by Miri with Stacked Borrows but accepted with Tree Borrows.

The "blessed" way to do this is to never mix direct accesses to a local with accesses through derived pointers. Fixing that in queue.rs shouldn't be too hard.

tree borrows: add a test to sb_fails This is something that happens in the wild (rust-lang/rust#121626), so TB accepting this is good. Let's make sure we notice if this ever changes.

RalfJung · 2024-02-26T10:38:50Z

From what @joboet said, the actual bug is that there's anyone even still using the old reference when we are doing the write that does the invalidation.

According to the Miri log, we have these events in this order:

A reference gets created and turned into a raw pointer

rust/library/std/src/sys/locks/rwlock/queue.rs

Lines 348 to 350 in b58f647

    
           let mut next = ptr::from_ref(&node) 
        
               .map_addr(|addr| addr | QUEUED | (state.addr() & LOCKED)) 
        
               as State;

The memory that reference points to is written to via direct access to the local variable

rust/library/std/src/sys/locks/rwlock/queue.rs

Line 347 in b58f647

node.prev = AtomicLink::new(None);

Someone uses the pointer derived from the reference

rust/library/std/src/sys/locks/rwlock/queue.rs

Line 265 in b58f647

next.as_ref().prev.set(Some(current));

We have a full backtrace only for the last event.

Backtrace

  0.000009   note: inside `sys::locks::rwlock::queue::add_backlinks_and_find_tail`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:265:17
  0.000009       |
  0.000009   265 |                 next.as_ref().prev.set(Some(current));
  0.000009       |                 ^^^^^^^^^^^^^
  0.000008   note: inside `sys::locks::rwlock::queue::RwLock::unlock_queue`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:487:33
  0.000009       |
  0.000008   487 |             let tail = unsafe { add_backlinks_and_find_tail(to_node(state)) };
  0.000009       |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::lock_contended`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:383:25
  0.000009       |
  0.000008   383 |                         self.unlock_queue(next);
  0.000009       |                         ^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::write`
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:312:13
  0.000009       |
  0.000009   312 |             self.lock_contended(true)
  0.000009       |             ^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sync::rwlock::RwLock::<()>::write`
  0.000009      --> std_miri_test/../library/std/src/sync/rwlock.rs:361:13
  0.000009       |
  0.000009   361 |             self.inner.write();
  0.000009       |             ^^^^^^^^^^^^^^^^^^
  0.000010   note: inside closure
  0.000011      --> std_miri_test/../library/std/src/sync/rwlock/tests.rs:37:26
  0.000009       |
  0.000010   37  |                     drop(r.write().unwrap());
  0.000008       |                          ^^^^^^^^^

apiraino · 2024-02-26T16:05:27Z

AFAIK T-libs doesn't use the I-prioritize label, so probably fine to remove it.

@rustbot label -I-prioritize

RalfJung · 2024-02-26T16:24:13Z

I did managed to reproduce this on my machine once with that flag (with seed 2852 on Miri bbabee144ea12b210a7be9488f0eeed9efe3e24d), but when I run the testcase again with that seed it doesn't happen again.

This is the test:

use rand::Rng;
use std::sync::mpsc::channel;
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    const N: u32 = 10;
    const M: usize = if cfg!(miri) { 100 } else { 1000 };

    let r = Arc::new(RwLock::new(()));

    let (tx, rx) = channel::<()>();
    for _ in 0..N {
        let tx = tx.clone();
        let r = r.clone();
        thread::spawn(move || {
            let mut rng = rand::thread_rng();
            for _ in 0..M {
                if rng.gen_bool(1.0 / (N as f64)) {
                    drop(r.write().unwrap());
                } else {
                    drop(r.read().unwrap());
                }
            }
            drop(tx);
        });
    }
    drop(tx);
    let _ = rx.recv();
}

Full command I used:

CARGO_EXTRA_FLAGS="--profile=release" MIRI_SEED_START=2189 MIRI_SEED_END=4096 MIRIFLAGS="-Zrandomize-layout -Zmiri-strict-provenance -Zmiri-disable-isolation" ./miri many-seeds ./miri run --dep rwlock.rs --target aarch64-apple-darwin

So... it uses actual randomness, and thus of course it does not reproduce even with the same seed. :/ (Disabling isolation means we give the program access to host randomness, which will be used to initialize the thread_rng.)

RalfJung · 2024-02-26T16:25:31Z

The backtrace was a bit different this time:

error: Undefined Behavior: trying to retag from <527506> for SharedReadWrite permission at alloc280832[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18
    |
401 |         unsafe { &*self.as_ptr().cast_const() }
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                  |
    |                  trying to retag from <527506> for SharedReadWrite permission at alloc280832[0x0], but that tag does not exist in the borrow stack for this location
    |                  this error occurs as part of retag at alloc280832[0x0..0x28]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <527506> was created by a SharedReadWrite retag at offsets [0x0..0x21]
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
help: <527506> was later invalidated at offsets [0x8..0x10] by a write access
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `std::ptr::NonNull::<std::sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18: 401:46
    = note: inside `std::sys::locks::rwlock::queue::add_backlinks_and_find_tail` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:257:26: 257:42
    = note: inside `std::sys::locks::rwlock::queue::RwLock::unlock_queue` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:487:33: 487:76
    = note: inside `std::sys::locks::rwlock::queue::RwLock::unlock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:468:28: 468:51
    = note: inside `std::sys::locks::rwlock::queue::RwLock::read_unlock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:440:22: 440:50
    = note: inside `std::sys::locks::rwlock::queue::RwLock::read_unlock` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:416:36: 416:69
    = note: inside `<std::sync::RwLockReadGuard<'_, ()> as std::ops::Drop>::drop` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:715:13: 715:42
    = note: inside `std::ptr::drop_in_place::<std::sync::RwLockReadGuard<'_, ()>> - shim(Some(std::sync::RwLockReadGuard<'_, ()>))` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:515:1: 515:56
    = note: inside `std::mem::drop::<std::sync::RwLockReadGuard<'_, ()>>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/mem/mod.rs:992:24: 992:25
note: inside closure
   --> rwlock.rs:22:21
    |
22  |                     drop(r.read().unwrap());
    |                     ^^^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

So the entry point of the bad use of the pointer is read this time, not write. But from unlock_queue onwards it's the same.

RalfJung · 2024-02-26T20:45:18Z

I finally found some seeds. :) 2998, 5308, 7680. On Miri bbabee144ea12b210a7be9488f0eeed9efe3e24d:

$ MIRIFLAGS="-Zmiri-seed=2998 -Zmiri-track-weak-memory-loads -Zmiri-track-pointer-tag=669222" ./miri run --dep rwlock.rs --target aarch64-apple-darwin
[...]
note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:758:5
    |
758 |     r
    |     ^ created tag 669222 with SharedReadOnly/SharedReadWrite permission for frozen/non-frozen parts (pointee type std::sys::locks::rwlock::queue::Node) at alloc322898[0x0..0x28] derived from <669221>
    |
    = note: inside `std::ptr::from_ref::<std::sys::locks::rwlock::queue::Node>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:758:5: 758:6
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:348:32: 348:52
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:203:16
    |
203 |     fn prepare(&mut self) {
    |                ^^^^^^^^^ popped tracked tag for item [SharedReadOnly for <669222>] due to Unique retag from <690408> (that retag created <690409>)
    |
    = note: inside `std::sys::locks::rwlock::queue::Node::prepare` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:203:16: 203:25
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:340:17: 340:31
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:346:17
    |
346 |                 node.next.0 = AtomicPtr::new(state.mask(MASK).cast());
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ popped tracked tag for item [SharedReadWrite for <669222>] due to Write access for <571708>
    |
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:346:17: 346:70
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:347:17
    |
347 |                 node.prev = AtomicLink::new(None);
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ popped tracked tag for item [SharedReadWrite for <669222>] due to Write access for <571708>
    |
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:347:17: 347:50
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

error: Undefined Behavior: trying to retag from <669222> for SharedReadWrite permission at alloc322898[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18
    |
401 |         unsafe { &*self.as_ptr().cast_const() }
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                  |
    |                  trying to retag from <669222> for SharedReadWrite permission at alloc322898[0x0], but that tag does not exist in the borrow stack for this location
    |                  this error occurs as part of retag at alloc322898[0x0..0x28]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <669222> was created by a SharedReadWrite retag at offsets [0x0..0x21]
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
help: <669222> was later invalidated at offsets [0x8..0x10] by a write access
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `std::ptr::NonNull::<std::sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18: 401:46
    = note: inside `std::sys::locks::rwlock::queue::add_backlinks_and_find_tail` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:265:17: 265:30
    = note: inside `std::sys::locks::rwlock::queue::RwLock::unlock_queue` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:487:33: 487:76
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:383:25: 383:48
    = note: inside `std::sys::locks::rwlock::queue::RwLock::read` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:295:13: 295:39
    = note: inside `std::sync::RwLock::<()>::read` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:273:13: 273:30
note: inside closure
   --> rwlock.rs:22:26
    |
22  |                     drop(r.read().unwrap());
    |                          ^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

RalfJung · 2024-03-02T15:03:58Z

Turns out this diff fixes all the 3 seeds that I found above:

--- a/library/std/src/sys/locks/rwlock/queue.rs
+++ b/library/std/src/sys/locks/rwlock/queue.rs
@@ -317,7 +317,7 @@ pub fn write(&self) {
     fn lock_contended(&self, write: bool) {
         let update = if write { write_lock } else { read_lock };
         let mut node = Node::new(write);
-        let mut state = self.state.load(Relaxed);
+        let mut state = self.state.load(Acquire);
         let mut count = 0;
         loop {
             if let Some(next) = update(state) {

Alternatively, this diff also does it:

--- a/library/std/src/sys/locks/rwlock/queue.rs
+++ b/library/std/src/sys/locks/rwlock/queue.rs
@@ -286,7 +286,7 @@ pub const fn new() -> RwLock {
 
     #[inline]
     pub fn try_read(&self) -> bool {
-        self.state.fetch_update(Acquire, Relaxed, read_lock).is_ok()
+        self.state.fetch_update(Acquire, Acquire, read_lock).is_ok()
     }
 
     #[inline]

So I think something is wrong with the memory orderings used somewhere. I can't tell which of these state accesses should be strengthened though. (And there might more 1-line diffs that suffice, I didn't try all the Relaxed.)

I also found some other suspicious orderings, namely RMW operations where the success case has a weaker ordering than the failure case:

rust/library/std/src/sys/locks/rwlock/queue.rs

Line 405 in b58f647

match self.state.fetch_update(Release, Acquire, |state| {

rust/library/std/src/sys/locks/rwlock/queue.rs

Lines 492 to 497 in b58f647

    
           match self.state.compare_exchange_weak( 
        
               state, 
        
               state.mask(!QUEUE_LOCKED), 
        
               Release, 
        
               Acquire, 
        
           ) {

rust/library/std/src/sys/locks/rwlock/queue.rs

Line 537 in b58f647

self.state.compare_exchange_weak(state, UNLOCKED, Release, Acquire)

IMO these justify a comment explaining why the orderings as so surprising, but they don't seem involved in this bug, at least not in the cases we have found so far.

print thread name in miri error backtraces; add option to track read/write accesses This came up while debugging rust-lang/rust#121626. It didn't end up being useful there but still seems like good tools to have around.

RalfJung · 2024-03-03T07:30:42Z

Turn out I went down the wrong rabbit hole. Changing these orders means the program took a different path somewhere earlier in the execution so we didn't hit the bad interleaving any more. I found a seed that reproduces this issue even without weak memory emulation, so the orderings can't be the problem:

Miri 9272474b755dca7dfea4f96fd8344dd12e223ae8, -Zmiri-seed=11408 -Zmiri-disable-weak-memory-emulation

$ ./miri run --dep rwlock.rs --target aarch64-apple-darwin -Zmiri-seed=11408 -Zmiri-disable-weak-memory-emulation -Zmiri-track-pointer-tag=1389599
note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:758:5
    |
758 |     r
    |     ^ created tag 1389599 with SharedReadOnly/SharedReadWrite permission for frozen/non-frozen parts (pointee type std::sys::locks::rwlock::queue::Node) at alloc702309[0x0..0x28] derived from <1389598>
    |
    = note: BACKTRACE on thread `unnamed-9`:
    = note: inside `std::ptr::from_ref::<std::sys::locks::rwlock::queue::Node>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:758:5: 758:6
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:348:32: 348:52
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:203:16
    |
203 |     fn prepare(&mut self) {
    |                ^^^^^^^^^ popped tracked tag for item [SharedReadOnly for <1389599>] due to Unique retag from <1399547> (that retag created <1399548>)
    |
    = note: BACKTRACE on thread `unnamed-9`:
    = note: inside `std::sys::locks::rwlock::queue::Node::prepare` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:203:16: 203:25
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:340:17: 340:31
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:346:17
    |
346 |                 node.next.0 = AtomicPtr::new(state.mask(MASK).cast());
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ popped tracked tag for item [SharedReadWrite for <1389599>] due to Write access for <1272013>
    |
    = note: BACKTRACE on thread `unnamed-9`:
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:346:17: 346:70
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:347:17
    |
347 |                 node.prev = AtomicLink::new(None);
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ popped tracked tag for item [SharedReadWrite for <1389599>] due to Write access for <1272013>
    |
    = note: BACKTRACE on thread `unnamed-9`:
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:347:17: 347:50
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

error: Undefined Behavior: trying to retag from <1389599> for SharedReadWrite permission at alloc702309[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18
    |
401 |         unsafe { &*self.as_ptr().cast_const() }
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                  |
    |                  trying to retag from <1389599> for SharedReadWrite permission at alloc702309[0x0], but that tag does not exist in the borrow stack for this location
    |                  this error occurs as part of retag at alloc702309[0x0..0x28]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <1389599> was created by a SharedReadWrite retag at offsets [0x0..0x21]
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
help: <1389599> was later invalidated at offsets [0x8..0x10] by a write access
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
    = note: BACKTRACE (of the first span) on thread `unnamed-9`:
    = note: inside `std::ptr::NonNull::<std::sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18: 401:46
    = note: inside `std::sys::locks::rwlock::queue::add_backlinks_and_find_tail` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:265:17: 265:30
    = note: inside `std::sys::locks::rwlock::queue::RwLock::unlock_queue` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:487:33: 487:76
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:383:25: 383:48
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

What makes this particularly strange is that everything relevant seems to happen on the same thread... not sure what to make of that.

…=RalfJung tree borrows: add a test to sb_fails This is something that happens in the wild (rust-lang#121626), so TB accepting this is good. Let's make sure we notice if this ever changes.

…RalfJung print thread name in miri error backtraces; add option to track read/write accesses This came up while debugging rust-lang#121626. It didn't end up being useful there but still seems like good tools to have around.

joboet · 2024-03-03T15:21:27Z

I found the bug! It's an ABA problem. When a new node is added, its thread basically executes:

let next = state.load();
node.next = next;
state.compare_exchange(next, &node);

In the failing executions, the thread gets interrupted for a long time, long enough that the thread at the head of the queue gets woken up (invalidating all shared references to it) and gets readded to the queue (with a new shared reference). The CAS succeeds, because the pointer has the same bitpattern but once we try to perform an access through the pointer, we obviously incur UB.

This is very hard to fix, since the bug can occur anytime a Node allocation has the same address as a previous one. At the moment, the only solution I see is to use lossy provenance and make miri guess the right borrow permissions.

RalfJung · 2024-03-03T22:14:13Z

Nice catch!

Closing in favor of #121950.

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 26, 2024

RalfJung mentioned this issue Feb 26, 2024

tree borrows: add a test to sb_fails rust-lang/miri#3327

Merged

RalfJung mentioned this issue Feb 26, 2024

rwlock: avoid Stacked Borrows violation #121630

Closed

RalfJung mentioned this issue Feb 26, 2024

try to reproduce nightly error rust-lang/miri-test-libstd#42

Closed

rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 26, 2024

rustbot removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 26, 2024

RalfJung mentioned this issue Mar 2, 2024

print thread name in miri error backtraces; add option to track read/write accesses rust-lang/miri#3338

Merged

joboet mentioned this issue Mar 3, 2024

ABA-problem with pointer provenance in lockless queues #121950

Open

RalfJung closed this as completed Mar 3, 2024

RalfJung mentioned this issue Mar 19, 2024

Update seize to 0.3 jonhoo/flurry#123

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stacked Borrows violation in macOS RwLock #121626

Stacked Borrows violation in macOS RwLock #121626

RalfJung commented Feb 26, 2024

RalfJung commented Feb 26, 2024

RalfJung commented Feb 26, 2024

RalfJung commented Feb 26, 2024

apiraino commented Feb 26, 2024

RalfJung commented Feb 26, 2024 •

edited

Loading

RalfJung commented Feb 26, 2024 •

edited

Loading

RalfJung commented Feb 26, 2024

RalfJung commented Mar 2, 2024 •

edited

Loading

RalfJung commented Mar 3, 2024 •

edited

Loading

joboet commented Mar 3, 2024

RalfJung commented Mar 3, 2024

Stacked Borrows violation in macOS RwLock #121626

Stacked Borrows violation in macOS RwLock #121626

Comments

RalfJung commented Feb 26, 2024

RalfJung commented Feb 26, 2024

RalfJung commented Feb 26, 2024

RalfJung commented Feb 26, 2024

apiraino commented Feb 26, 2024

RalfJung commented Feb 26, 2024 • edited Loading

RalfJung commented Feb 26, 2024 • edited Loading

RalfJung commented Feb 26, 2024

RalfJung commented Mar 2, 2024 • edited Loading

RalfJung commented Mar 3, 2024 • edited Loading

joboet commented Mar 3, 2024

RalfJung commented Mar 3, 2024

RalfJung commented Feb 26, 2024 •

edited

Loading

RalfJung commented Feb 26, 2024 •

edited

Loading

RalfJung commented Mar 2, 2024 •

edited

Loading

RalfJung commented Mar 3, 2024 •

edited

Loading