tests: kernel: scheduler: Test from kernel.scheduler.slice_perthread fails on some nrf platforms #43975

Closed
PerMac opened this issue Mar 18, 2022 · 5 comments
Labels: area: Kernel, bug, priority: medium, Waiting for response


@PerMac
Member

PerMac commented Mar 18, 2022

Describe the bug
The test case test_priority_cooperative fails (it hangs without producing any output) on:
- nrf52dk_nrf52832
- nrf5340dk_nrf5340_cpuapp
- nrf5340dk_nrf5340_cpuapp_ns
- nrf9160dk_nrf9160
As a result, the test scenario kernel.scheduler.slice_perthread (tests/kernel/sched/schedule_api/) fails with a timeout.

To Reproduce
Steps to reproduce the behavior:

  1. Run scripts/twister -s tests/kernel/sched/schedule_api/kernel.scheduler.slice_perthread --platform nrf52dk_nrf52832 --device-serial /dev/ttyACM0 --device-testing --verbose --inline-logs --west-flash
  2. See the error

Expected behavior
Scenario passes

Impact
Not clear

Logs and console output

INFO    - 1/1 nrf52dk_nrf52832          tests/kernel/sched/schedule_api/kernel.scheduler.slice_perthread FAILED Timeout (device 187.135s)
INFO    - /home/maciej/zephyrproject2/zephyr/twister-out/nrf52dk_nrf52832/tests/kernel/sched/schedule_api/kernel.scheduler.slice_perthread/handler.log
ERROR   - *** Booting Zephyr OS build zephyr-v3.0.0-1332-g574e166e1357  ***
Running test suite threads_scheduling
===================================================================
START - test_bad_priorities
PASS - test_bad_priorities in 0.1 seconds
===================================================================
START - test_priority_cooperative

INFO    - /home/maciej/zephyrproject2/zephyr/twister-out/nrf52dk_nrf52832/tests/kernel/sched/schedule_api/kernel.scheduler.slice_perthread/handler.log

INFO    - 0 of 1 test configurations passed (0.00%), 1 failed, 0 skipped with 0 warnings in 213.11 seconds

Environment (please complete the following information):

  • OS: Ubuntu 18.04
  • Toolchain: Zephyr SDK
  • Commit SHA: zephyr-v3.0.0-1332-g574e166e1357

Additional context
The test passes on nrf52840dk_nrf52840, nrf5340dk_nrf5340_cpunet and nrf9160dk_nrf9160_ns.

@PerMac PerMac added the bug The issue is a bug, or the PR is fixing a bug label Mar 18, 2022
@nashif nashif added priority: medium Medium impact/importance bug area: Kernel labels Mar 18, 2022
@andyross
Contributor

Hm... that is an extremely simple test case. It just spawns a thread at a lower priority, verifies that it doesn't preempt, sleeps, then checks that it ran. It's kinda hard to believe we really have a hardware-specific edge case here, even more so one that causes a silent hang. I'm thinking more like "board-specific Kconfig breaking assumptions in the test". Can someone with hardware go in and add some instrumentation (e.g. just printk("at %s:%d\n", __func__, __LINE__); before each step in the test) and see what's going on? Also: what happens if you comment out this test case? Are other cases in the same test broken on this hardware, or just this one?

Absent that, I guess I need to get my hands on the hardware. Can someone link me the most appropriate kit to order?
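
The instrumentation suggested above could look roughly like the sketch below. It is only an illustration: TRACE_PRINT, TRACE_HERE and example_steps are hypothetical names, and the printf() fallback is an assumption made so the snippet also builds on a host; on a Zephyr target TRACE_PRINT would be defined as printk.

```c
#include <stdio.h>

/*
 * Assumption: on a Zephyr target TRACE_PRINT would be defined as printk;
 * here it falls back to printf() so the sketch also builds on a host.
 */
#ifndef TRACE_PRINT
#define TRACE_PRINT printf
#endif

/* Hypothetical marker macro: reports the current function and line. */
#define TRACE_HERE() TRACE_PRINT("at %s:%d\n", __func__, __LINE__)

/* Hypothetical stand-in for a test body, with one marker before each step. */
int example_steps(void)
{
    int markers = 0;

    TRACE_HERE();   /* before spawning the thread */
    markers++;
    TRACE_HERE();   /* before the sleep */
    markers++;
    TRACE_HERE();   /* after the sleep; never printed if the sleep hangs */
    markers++;

    return markers;
}
```

The last marker that shows up on the console brackets the step that hangs.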

@PerMac
Member Author

PerMac commented Mar 21, 2022

Thanks @andyross for the tips. This seems to be the only test causing issues; the others pass when it is removed. It also seems that the hang occurs at the line zassert_true(last_prio == k_thread_priority_get(k_current_get()), NULL); k_sleep(K_MSEC(100)); (https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/kernel/sched/schedule_api/src/test_sched_priority.c#L63): I get the output just before it but none after it.
I don't think I can help with what to order, but I can help with debugging: I can test your ideas on the hw. You can also ping me on Discord if you'd like to sync up.

@andyross
Contributor

This may take a while. :) The first thing I'd check would be to remove the other test cases from the test suite (just comment them out, etc...) to eliminate the possibility that this is an interaction with some previous test on those boards.

Next is generally "double all the stack sizes" (e.g. "grep STACK build/zephyr/.config" and make them all bigger in prj.conf) to rule out stack overflows. It's becoming less and less common, but this remains a super cheap test that catches bugs that are otherwise really opaque.
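
As a rough illustration of the "double all the stack sizes" step, the sketch below turns one line of build/zephyr/.config into a prj.conf override with twice the value. double_stack_line is a hypothetical helper, and the assumption that every numeric CONFIG_*STACK* option is a stack size is a simplification; review the output before using it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given one line from build/zephyr/.config, write a
 * prj.conf override with the value doubled into `out`.
 * Returns 1 if the line is a decimal CONFIG_*STACK* setting, 0 otherwise.
 */
int double_stack_line(const char *line, char *out, size_t out_len)
{
    const char *eq = strchr(line, '=');
    const char *hit = strstr(line, "STACK");
    char *end;
    long value;

    /* Only "CONFIG_<name>=<value>" lines whose name mentions STACK. */
    if (strncmp(line, "CONFIG_", 7) != 0 || eq == NULL) {
        return 0;
    }
    if (hit == NULL || hit > eq) {
        return 0;
    }

    /* Skip non-numeric values such as =y or ="string". */
    value = strtol(eq + 1, &end, 10);
    if (end == eq + 1 || value <= 0) {
        return 0;
    }
    /* Reject trailing text (e.g. hex values like 0x400). */
    if (*end != '\0' && *end != '\n' && *end != '\r') {
        return 0;
    }

    snprintf(out, out_len, "%.*s=%ld", (int)(eq - line), line, value * 2);
    return 1;
}
```

For example, the input line CONFIG_MAIN_STACK_SIZE=1024 yields CONFIG_MAIN_STACK_SIZE=2048, while a boolean such as CONFIG_STACK_SENTINEL=y is skipped.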

FWIW: a hang in k_sleep() generally implies something got messed up in the timer driver. We might look there to see if there is an overflow condition or something on these devices (do they have a different clock configuration, maybe? I know nRF devices have the choice of a few different drivers).
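
To make the kind of overflow condition mentioned above concrete, here is a minimal sketch of elapsed-tick arithmetic on a narrow counter. It is not a diagnosis of this issue and is not taken from any Zephyr driver: the 24-bit width is an assumption based on the nRF RTC counter, and ticks_elapsed and ticks_elapsed_naive are hypothetical names.

```c
#include <stdint.h>

/* Assumption: a 24-bit free-running tick counter, as on the nRF RTC. */
#define COUNTER_BITS 24u
#define COUNTER_MASK ((1u << COUNTER_BITS) - 1u)

/* Ticks elapsed from `then` to `now`, correct across a single wraparound. */
uint32_t ticks_elapsed(uint32_t now, uint32_t then)
{
    return (now - then) & COUNTER_MASK;
}

/* Naive version for comparison: yields a huge value once the counter wraps. */
uint32_t ticks_elapsed_naive(uint32_t now, uint32_t then)
{
    return now - then;
}
```

If a timeout were computed with the naive form, a sleep that spans the wrap could appear to need billions of ticks, which from the outside looks like a hang.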

Oh, and again: if you have a reference to a board I can order from DigiKey/Mouser or wherever that would be helpful.

@nashif nashif added the Waiting for response Waiting for author's response label Apr 11, 2022
@andyross
Contributor

Process demands we close this for now, pending response. Please reopen if it persists.

@PerMac
Member Author

PerMac commented Apr 19, 2022

@andyross Sorry for the hassle and for not replying. I wanted to collect more input about the failures, but I couldn't reproduce them locally, and in the meantime we have been restructuring our hw CI. If I manage to reproduce the error, I will come back to you with more details. Thanks and sorry again.
