Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: os: ring_buffer: recent changes cause UART shell to fail on qemu_cortex_a9 #43818

Closed
ibirnbaum opened this issue Mar 15, 2022 · 4 comments
Closed
Assignees
Labels
area: ARM ARM (32-bit) Architecture area: Base OS Base OS Library (lib/os) bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@ibirnbaum
Copy link
Member

Description

After a recent rebase from 2.7.99 and afterwards taking some time to first built an application containing the shell with UART backend, it now turns out that the shell has become non-functional via the UART backend on qemu_cortex_a9, more specifically the input side, interrupt-driven output still works as it did before.

The first interrupt-driven reception of a typed character received via UART works, the character is being printed on the command line. Before and after the reception of this one character, the system will not report any errors or behave erratically if the shell is left alone for a longer period of time, so the issue isn't driven by some system component's timing. However, erratic behaviour starts once a second character is received. The received character is not printed on the command line, instead, the warning message "shell_uart: RX ring buffer full." is displayed repeatedly. From that point on, this warning is displayed several times whenever a character is received. No further characters are added to the command line, but successful reception by the device driver is hinted at via the warning messages which only appear once a character is typed.

Of all the components involved, the UART device driver, the UART shell backend, and the os/lib/ring_buffer component used by the UART shell backend, only the latter has been modified recently. Using a JTAG debugger, it can be observed that from the reception of the 2nd character on, the size/offset calculations for the sh_uart->rx_ringbuf return results that don't look plausible.

For example, during the first run of ring_buf_put_claim for the rx_ringbuf, the free space in the buffer is calculated as 64 bytes, which is the value returned to the caller uart_rx_handle. During the 2nd run, the free space suddenly decreases to 1 byte, and a 'put_base adjustment' marked as 'unlikely' is performed every single time. In all runs of ring_buf_put_claim, the array index for the caller to write to, buffer[put_head-base], is calculated as 0, which doesn't seem plausible for a 64-byte buffer. The buffer's behaviour also seems strange in conjunction with the while-loop exit condition in uart_rx_handle in subsys/shell/backends/shell_uart.c (simplified):

do {
len = ring_buf_put_claim;
if (len > 0) rd_len = uart_fifo_read;
ring_buf_put_finish;
} while (rd_len && (rd_len == len));

If the buffer returns 1 in ring_buf_put_claim starting with the 2nd execution after the first execution returned 64, as long as 1 byte each is received from the UART via uart_fifo_read, this loop would continue until no data was left. Yet, as the received character isn't processed within the loop, the call to ring_buf_put_claim returns 0 during the 2nd run of the loop, leading to the "buffer full" warning message being printed and breaking out of the loop, as both len and rd_len evaluate to 0 once the while-statement is reached. For rd_len > 1, the loop will exit if len == 1, and during the first run, where the buffer returns len as 64, the loop exits after the first run (and 1 byte received) unless 64 bytes are received via the first uart_fifo_read call.

In conclusion, I believe that something isn't right about the value returned by ring_buf_put_claim and how it's calculated. Those values don't seem plausible and there is no reason for a full buffer warning and the console's input side stalling after the reception of 1 byte of data.

I could verify that upon first access, all members of the rx_buffer are zeroed except the (valid) buffer address pointed to by .buffer, and the .size member is correctly initialized to 64. I can provide the intermediate values of each run of ring_buf_put_claim/ring_buf_put_finish if this is helpful.

To Reproduce
Build a sample including the shell for qemu_cortex_a9 with standard settings except for disabling optimization in the compiler settings. My latest test build which produced the error was a build of samples/net/sockets/echo_server.

Expected behavior
Interrupt-driven reception of input bytes for the command line should work via the UART shell backend.

Impact
UART-based console is effectively unusable and limited to diagnostic output only.

Environment (please complete the following information):
Built on a Linux host, Ubuntu 18.04, Zephyr SDK 0.13.2, Zephyr 3.0.0-446-g3470ace98d4d

Additional context
This behaviour is also reproducible on a non-public Zynq-7000 based target hardware, which makes it highly unlikely that the issue is related to the Xilinx UART or how it's simulated in qemu_cortex_a9.

@ibirnbaum ibirnbaum added the bug The issue is a bug, or the PR is fixing a bug label Mar 15, 2022
@nashif
Copy link
Member

nashif commented Mar 16, 2022

@npitre FYI

@npitre
Copy link
Collaborator

npitre commented Mar 16, 2022

What Zephyr version are you using? In particular, what commit ID?

This looks like the same bug as #43205. If so the fix is in #43226 and
applied to the main branch already.

I just tried the following (not to bother with a networking setup):

west build -b qemu_cortex_a9 samples/subsys/shell/devmem_load

and the shell works as expected. Please confirm if this works for you.

@carlescufi
Copy link
Member

@ibirnbaum could you please give us additional info as per @npitre's comments?

@carlescufi carlescufi added priority: low Low impact/importance bug area: ARM ARM (32-bit) Architecture area: Base OS Base OS Library (lib/os) labels Mar 23, 2022
@ibirnbaum
Copy link
Member Author

Sorry for the late reply, I''ve been bugfixing other stuff...
It has in fact turned out that I was behind the commit that fixed the issue for all platforms including qemu_cortex_a9.
The observed behaviour is gone, the console works as it should, closing this issue as it no longer exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture area: Base OS Base OS Library (lib/os) bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants