Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[discussion] Peripheral singletons are not multi-core friendly #149

Open
japaric opened this issue Jun 19, 2019 · 2 comments
Open

[discussion] Peripheral singletons are not multi-core friendly #149

japaric opened this issue Jun 19, 2019 · 2 comments

Comments

@japaric
Copy link
Member

japaric commented Jun 19, 2019

Background

Private Peripheral Bus (PPB)

All the peripherals in the cortex_m::Peripherals struct sit on the Private
Peripheral Bus and have addresses of the form 0xE00x_xxxx. Most of these
peripherals are interfaces to internal resources -- there's one instance of
those resources per core. Examples of internal PPB peripherals are the
NVIC, MPU, SYST (system timer), ITM, etc. The rest of peripherals are
interfaces to external resources meaning that all cores access the same
resources through these interfaces. An example of an external resource is the
TPIU.

The bottom line here is that interacting with a singleton like CPUID on one
core is different from interacting with it from another core even though they
are supposed to be the same singleton. For example, the CPUID.base.read()
operation can return different values depending on which core is executed.

Multi-core API

In RTFM land we are exploring multi-core applications in two modes: homogeneous
mode and heterogeneous mode. Of course, it's very early days so we still don't
know what the ecosystem will adopt but these APIs let us show the problems with
the peripheral singletons.

Homogeneous

In this mode a binary crate is compiled using a single (compilation) target and
a single ELF image is produced. The image contains the entry points for all
cores and static variables have global visibility (visible to all cores) by
default.

#![no_std]
#![no_main]

// visible to all cores
static X: AtomicBool = AtomicBool::new(false);

// core #0 user entry point
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
    // ..
}

// core #1 user entry point
#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
    // ..
}

If this program is compiled for the thumbv8m.main-none-eabi then the resulting
image can be flashed, for example, on a 2x Cortex-M33 device.

Heterogeneous

In this mode a binary crate is compiled for multiple (compilation) targets so
multiple ELF images are produced. There's one image for each core and each image
contains the entry point and static variables for that core. There's a special
linker section named .shared used to share static variables between cores:
make them visible to all cores. One opts into this .shared section using the
#[shared] attribute; the default is that static variables are visible only
to the core where it was defined.

#![no_std]
#![no_main]

// visible to all cores
#[shared]
static X: AtomicBool = AtomicBool::new(false);

// visible only to core #0
#[cfg(core = "0")]
static Y: AtomicBool = AtomicBool::new(false);

// each core has a *copy* of this static variable
static Z: AtomicBool = AtomicBool::new(false);

// core #0 user entry point
#[cfg(core = "0")]
#[no_mangle]
unsafe extern "C" fn main() -> ! {
    // ..
}

// core #1 user entry point
#[cfg(core = "1")]
#[no_mangle]
unsafe extern "C" fn main() -> ! {
    // ..
}

If this program is compiled for the thumbv7em-none-eabihf and
thumbv6m-none-eabi targets then the resulting image can be flashed on a
Cortex-M4F + Cortex-M0+ device, for example.

Issues

A. Send is wrong

By definition, Send means that it's (memory) safe to transfer ownership of a
resource from one thread / core to another. In the case of these peripheral
singletons transferring them from one core to another is wrong because that
changes the meaning of the value.

// homogeneous mode

use cortex_m::peripheral::DWT;

// used as a channel
static X: spin::Mutex<Option<DWT>> = Mutex::new(None);

#[no_mangle]
fn main_0() -> ! {
    let p: cortex_m::Peripherals = ..;

    // this refers to core #0's DWT
    let dwt = p.DWT;
    *X.lock() = Some(dwt);

    // ..
}

#[no_mangle]
fn main_1() -> ! {
    loop {
        if let Some(x) = X.lock().take() {
            // now this refers to core _#1_'s DWT
            let dwt: DWT = x;
        }
    }

    // ..
}

The other issue with Send is that makes it possible to break the singleton
invariant: one core can send another instance of e.g. DWT to a core that
already has one such instance.

B. take() is unsound

The current implementation of cortex_m::Peripherals::take looks like this:

static mut CORE_PERIPHERALS: bool = false;

impl Peripherals {
    pub fn take() -> Option<Self> {
        interrupt::free(|_| {
            if unsafe { CORE_PERIPHERALS } {
                None
            } else {
                Some(unsafe { Peripherals::steal() })
            }
        })
    }

}

This is unsound in homogeneous multi-core mode because interrupt::free doesn't
synchronize multi-core access to static mut variables; it only synchronizes
accesses from the same core.

Using AtomicBool.compare_swap or a similar API would make this multi-core
memory safe but that would not work on ARMv6-M because that CAS API doesn't
exist on thumbv6m-none-eabi.

C. take() is wrong

The following program panics in homogeneous multi-core mode but should work.

#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
    let p = cortex_m::Peripherals::take().unwrap();
    let now = p.DWT.cyccnt.read();

    // ..
}

#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
    let p = cortex_m::Peripherals::take().unwrap();
    let now = p.DWT.cyccnt.read();

    // ..
}

This panics because both Peripherals::take are accessing the same guard.
However, it is OK for each core to access its own DWT peripheral /
cycle counter instance.

Potential countermeasures

!Send

To avoid issue (A) we could remove the Send implementation from all the
peripheral singletons. The downside of this approach is we would also forbid
sending a peripheral singleton from main or a interrupt handler to another
within the same core.

LocalSend

Another alternative to avoid (A) is to remove the Send implementation from the
singletons and instead implement a new LocalSend trait that means safe to send
within execution contexts running on the same core. Then frameworks like RTFM
can require the LocalSend for message passing within one core and the Send
trait for cross-core message passing.

The downside of this approach is that it requires using the nightly channel
because auto traits, which are required to bridge Send and LocalSend,
are unstable.

// crate: local-send
pub unsafe auto trait LocalSend {}

// all cross-core Send-safe types are also core-local Send safe
unsafe impl<T> LocalSend for T where T: Send {}

// crate: cortex-m
pub struct DWT {
    _not_send: PhantomData<*mut ()>,
}

impl !Send for DWT {}
unsafe impl LocalSend for DWT {}

Core-local take

One way to deal with (B) and (C) is to have one guard static variable per
core. Assuming a dual core system take would be written as follows:

// crate: cortex-m
// for core #0
static mut PERIPHERALS0: bool = false;
// for core #1
static mut PERIPHERALS1: bool = false;

impl Peripherals {
   pub fn take() -> Option<Self> {
       interrupt::free(|_| unsafe {
          let guard = if core_id() == 0 {
              &mut PERIPHERALS0
          } else {
              &mut PERIPHERALS1
          };

          if *guard {
              None
          } else {
              *guard = true;
              Some(Peripherals { .. })
          }
       })
   }
}

fn core_id() -> u8 {
    // returns `0` on core #0
    // returns `1` on core #1
}

// crate: app
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
    let p = Peripherals::take().unwrap();
    // ..
}

#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
    let p = Peripherals::take().unwrap();
    // ..
}

The problem is implementing core_id in homogeneous mode. AFAICT, there's no
Cortex-M memory mapped register that returns a "core id" (cf. RISC-V mhartid);
nor there is a processor register that can be used to hold a "core id" (cf.
RISC-V registers: x3 (global pointer) and x4 (thread pointer)) -- though one
could use the usually unused PSP (Process Stack Pointer) register for this
purpose.

Global singletons

A completely different approach to peripheral access that avoids the three
aforementioned issues is a global singleton API. For example:

// crate: cortex-m
use bare_metal::CriticalSection;

// A global singleton
pub struct DWT;

// the actual peripheral
pub struct DWT_ {
    _not_send_or_sync: PhantomData<*mut ()>,
    pub cyccnt: CYCCNT,
    // .. other registers ..
}

impl DWT_ {
    // NOTE private
    unsafe fn new() -> Self {
        ..
    }
}

impl DWT {
    /// Grants temporary, synchronized access to the DWT peripheral
    pub fn borrow(cs: &CriticalSection, f: impl FnOnce(&DWT_)) {
        unsafe { f(&DWT_::new()) }
    }

    /// Grants temporary, unsynchronized access to the DWT peripheral
    pub unsafe fn borrow_unchecked(f: impl FnOnce(&DWT_)) {
        f(&DWT_::new())
    }
}

// crate: app
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
    interrupt::free(|cs| {
        DWT::borrow(cs, |dwt| {
            // accesses its own DWT
            dwt.cyccnt.write(0);
        });
    });

    // ..
}

#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
    interrupt::free(|cs| {
        DWT::borrow(cs, |dwt| {
            // accesses its own DWT
            dwt.cyccnt.write(0);
        });
    });

    // ..
}

The downside of this approach is that because there's no ownership it's hard to
build abstractions on top of peripherals. One could add a panicky API to seal
peripherals:

impl DWT {
    /// Seals this peripheral; all future calls to `borrow` will panic
    ///
    /// This function panics if it's called twice
    pub fn seal(cs: &CriticalSection) {
        // ..
    }
}

Ownership could be emulated by first sealing the peripheral and then having the
abstraction access the peripheral exclusively through the borrow_unchecked
API.

pub struct Timer {
    // not Send because this semantically owns `SYST_` which is also not `Send`
    _not_send: PhantomData<* mut()>,
}

impl Timer {
    pub fn new() -> Self {
        // this operation effectively turns this type into an owned singleton
        SYST::seal();

        Self { _private: () }
    }

    pub fn set_timeout(&mut self, dur: Duration) {
         unsafe {
            SYST::borrow_unchecked(|syst| {
                // ..
            })
         }
    }
}

Internal vs external resources

In the case of external resources like the TPIU I think we want to keep the
existing owned singleton / take API, with the semantics that only one core can
take these peripherals, because more than one core should not access these
resources in an unsynchronized fashion.

@thejpster
Copy link
Contributor

AFAICT, there's no Cortex-M memory mapped register that returns a "core id"

Yeah, on the homogenous dual Cortex-M4 part of my Beagleboard X15 demo, I found the TI example using the Cortex-M4 Peripheral ID 0 register to determine which core was running. See https://github.com/cambridgeconsultants/rust-beagleboardx15-demo/blob/master/bare-metal/ipu-demo/src/main.rs#L435. Unfortunately it's vendor defined as to what they put in there :/ Perhaps it's something we can bring out in the chip crates, instead of being able to solve it for all Cortex-M cores.

The 'core local take' would work if the lock variable was in a core-specific address space, like the peripheral, right? So, like a thread-local variable. I think the AM5728 had a block of RAM for each core, each mapped to the same address range, but I don't know if that's a universal thing or just an oddity of this implementation. You'd also need to ensure each core initialised its own memory correctly.

@jonas-schievink
Copy link
Contributor

I think the current implementations of Send are already correct. In standard Rust, Send models transferring data between threads within the same process (which have access to all of its resources), which is akin to moving between interrupt handlers running on the same core, or between cores that share a single set of resources (that is, all memory and peripherals).

Since most (all?) multi-core MCUs have core-local peripherals or memory, transferring objects between the cores is more similar to cross-process communication in standard Rust and should not be modeled with Send, but a custom trait.

adamgreig pushed a commit that referenced this issue Jan 12, 2022
149: reject duplicate `static mut` variables r=therealprof a=japaric

after #140 landed the entry, exception and interrupt attributes started
accepting code like this:

``` rust
 #[entry]
fn main() -> ! {
    static mut FOO: u32 = 0;
    static mut FOO: i32 = 0;
}
```

because that code expands into:

``` rust
fn main() -> ! {
    let FOO: &'static mut u32 = unsafe {
        static mut FOO: u32 = 0;
        &mut FOO
    };
    // shadows previous variable
    let FOO: &'static mut u32 = unsafe {
        static mut FOO: i32 = 0;
        &mut FOO
    };
}
```

this commit adds a check that rejects `static mut`s with duplicated names to
these three attributes.

Co-authored-by: Jorge Aparicio <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants