Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bus Fault with Xilinx UART Lite #45302

Closed
yashi opened this issue May 3, 2022 · 0 comments · Fixed by #45460
Closed

Bus Fault with Xilinx UART Lite #45302

yashi opened this issue May 3, 2022 · 0 comments · Fixed by #45460
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@yashi
Copy link
Collaborator

yashi commented May 3, 2022

Describe the bug

Xilinx AXI UART Lite v2.0 has the following clause for both RX and TX FIFO (emphasis mine):

When a read request is issued to an empty FIFO, a bus error (SLVERR) is generated and the result is undefined.
When a write request is issued while the FIFO is full, a bus error (SLVERR) is generated and the data is not written into the FIFO.

To avoid this we have xlnx_uartlite_read_status(dev) & STAT_REG_RX_FIFO_VALID_DATA and xlnx_uartlite_read_status(dev) & STAT_REG_TX_FIFO_FULL but this is not enough for multi-threaded app.

Consider this:

Scheduler App 1 App 2
xlnx_uartlite_poll_out()
xlnx_uartlite_read_status()
& STAT_REG_TX_FIFO_FULL => true
swap
xlnx_uartlite_poll_out()
xlnx_uartlite_read_status()
& STAT_REG_TX_FIFO_FULL => true
swap
xlnx_uartlite_write_tx_fifo()
swap
xlnx_uartlite_write_tx_fifo()

Two apps might be not enough but the running samples/philosophers generates the error messages below.

To Reproduce

  1. rm -rf build && west build -b arty_a7_arm_designstart_m3 zephyr/samples/philosophers -- -DDEBUG_PRINTF=1 -DCONFIG_LOG=y
  2. west flash
  3. wait for a few minutes

Expected behavior

samples/philosophers runs without a bus error.

Impact
showstopper for apps with high uart usage

Logs and console output

Philosopher 1 [P: 2]  THINKING [  75 ms ] 
Philosopher 0 [P: 3]   EATING  [  50 ms ] 
Philosopher 0 [P: 3]    DROPPED ONE FORK   
Philosopher 0 [P: 3]  THINKING [  75 ms ] 
Philosopher 1 [P: 2]        STARVING       
Philosopher 1 [P: 2]    HOLDING ONE FORK   
Philosopher 2 [P: 1]    DROPPED ONE FORK   
Philosopher 2 [P: 1]  THINKING [  175 ms Philosopher 3 [P: 0]        STARVING     Philosopher 4 [C:-1]    DROPPED ONE FORK   
Philosopher 4 [C:-1]  THINKING [  400 ms ] 
[00:35:44.314,000] <err> os: ***** BUS FAULT *****
[00:35:44.314,000] <err> os:   Imprecise data bus error
[00:35:44.314,000] <err> os: r0/a1:  0x00000000  r1/a2:  0x00000002  r2/a3:  0x00007be8
[00:35:44.314,000] <err> os: r3/a4:  0x00002cb5 r12/ip:  0x00000000 r14/lr:  0x00002ccb
[00:35:44.314,000] <err> os:  xpsr:  0x41004000
[00:35:44.314,000] <err> os: Faulting instruction address (r15/pc): 0x00002cda
[00:35:44.314,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:35:44.314,000] <err> os: Current thread: 0x7510 (Philosopher 3)
[00:35:44.338,000] <err> os: Halting system

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Zephyr SDK
  • Commit SHA: zephyr-v3.0.0-3283-g83c79d10510

Additional context

It's easier to disable the whole interrupt by irq_lock() / irq_unlock() while in poll_out and poll_in, or more precisely right before calling xlnx_uartlite_read_status() to after xlnx_uartlite_write_tx_fifo(dev, c);. But that might be too long for the system interrupts to be disabled, especially with a slow baud rate.

Should we disable the scheduler instead?

Or does it make sense to do something like:

while (true) {
    key = irq_lock();
    if (!(read_status() & STAT_REG_TX_FIFO_FULL)) {
        write_tx_fifo();
        irq_unlock(key);
        break;
    }
    irq_unlock(key)
}
@yashi yashi added the bug The issue is a bug, or the PR is fixing a bug label May 3, 2022
@mbolivar-nordic mbolivar-nordic added the priority: low Low impact/importance bug label May 3, 2022
yashi added a commit to yashi/zephyr that referenced this issue May 12, 2022
Xilinx AXI UART Lite v2.0[1] has the following clause for both RX and TX
FIFO respectively:

    When a read request is issued to an empty FIFO, a bus error (SLVERR) is
    generated and the result is undefined.

    When a write request is issued while the FIFO is full, a bus
    error (SLVERR) is generated and the data is not written into the FIFO.

To protect this, we have:

    xlnx_uartlite_read_status(dev) & STAT_REG_RX_FIFO_VALID_DATA, and
    xlnx_uartlite_read_status(dev) & STAT_REG_TX_FIFO_FULL

but these are not enough for multi-threaded apps.  Consider two threads
calling poll_out(), it is always possible for a thread to be swapped out
right after reading the status register, the other thread fill the TX FIFO,
and the original thread is swapped back to write more data to the FIFO
because previously read status doesn't indicate the FIFO is full.

To close this race condition, this commit uses a spinlock for each FIFO.
This ensures that only one thread accesses the FIFO even for SMP cases.

This closes zephyrproject-rtos#45302.

[1] https://docs.xilinx.com/v/u/en-US/pg142-axi-uartlite

Signed-off-by: Yasushi SHOJI <[email protected]>
nashif pushed a commit that referenced this issue May 12, 2022
Xilinx AXI UART Lite v2.0[1] has the following clause for both RX and TX
FIFO respectively:

    When a read request is issued to an empty FIFO, a bus error (SLVERR) is
    generated and the result is undefined.

    When a write request is issued while the FIFO is full, a bus
    error (SLVERR) is generated and the data is not written into the FIFO.

To protect this, we have:

    xlnx_uartlite_read_status(dev) & STAT_REG_RX_FIFO_VALID_DATA, and
    xlnx_uartlite_read_status(dev) & STAT_REG_TX_FIFO_FULL

but these are not enough for multi-threaded apps.  Consider two threads
calling poll_out(), it is always possible for a thread to be swapped out
right after reading the status register, the other thread fill the TX FIFO,
and the original thread is swapped back to write more data to the FIFO
because previously read status doesn't indicate the FIFO is full.

To close this race condition, this commit uses a spinlock for each FIFO.
This ensures that only one thread accesses the FIFO even for SMP cases.

This closes #45302.

[1] https://docs.xilinx.com/v/u/en-US/pg142-axi-uartlite

Signed-off-by: Yasushi SHOJI <[email protected]>
laxiLang pushed a commit to laxiLang/zephyr that referenced this issue May 30, 2022
Xilinx AXI UART Lite v2.0[1] has the following clause for both RX and TX
FIFO respectively:

    When a read request is issued to an empty FIFO, a bus error (SLVERR) is
    generated and the result is undefined.

    When a write request is issued while the FIFO is full, a bus
    error (SLVERR) is generated and the data is not written into the FIFO.

To protect this, we have:

    xlnx_uartlite_read_status(dev) & STAT_REG_RX_FIFO_VALID_DATA, and
    xlnx_uartlite_read_status(dev) & STAT_REG_TX_FIFO_FULL

but these are not enough for multi-threaded apps.  Consider two threads
calling poll_out(), it is always possible for a thread to be swapped out
right after reading the status register, the other thread fill the TX FIFO,
and the original thread is swapped back to write more data to the FIFO
because previously read status doesn't indicate the FIFO is full.

To close this race condition, this commit uses a spinlock for each FIFO.
This ensures that only one thread accesses the FIFO even for SMP cases.

This closes zephyrproject-rtos#45302.

[1] https://docs.xilinx.com/v/u/en-US/pg142-axi-uartlite

Signed-off-by: Yasushi SHOJI <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants