Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cortexm_common: Make no_thread_idle compatible with cortex-m0 #14557

Merged
merged 2 commits into from
Sep 23, 2020

Conversation

bergzand
Copy link
Member

@bergzand bergzand commented Jul 20, 2020

Contribution description

This PR modifies the cortexm thread specific bits to allow running the PendSV IRQ permanently at a lower priority than the other IRQs. With this, the no_thread_idle feature is extended to cover the cortex-m0(+).

Interrupts are disabled during sched_run to prevent the scheduler bits from being modified during the scheduling algorithm. They are briefly enabled after the sleep in sched_arch_idle so that pending interrupts can be serviced after the CPU returns from sleeping.

Testing procedure

Same as #14224, but now with the samr21-xpro (or other cortex-m0) included in the tests. Everything should still work as usual.

Benchmarks

Early benchmarks shows that this decreases raw thread switch performance by about 5%:

samr21-xpro on Master
2020-07-20 11:29:46,590 # START
2020-07-20 11:29:46,595 # main(): This is RIOT! (Version: 2020.10-devel-264-ge5d69-HEAD)
2020-07-20 11:29:46,597 # main starting
2020-07-20 11:29:47,598 # { "result" : 65217 }
2020-07-20 11:31:46,644 # Exiting Pyterm
samr21-xpro on this PR
2020-07-20 11:29:25,821 # START
2020-07-20 11:29:25,829 # main(): This is RIOT! (Version: 2020.10-devel-266-gcf82c3-pr/cortexm_common/pendsv_priority)
2020-07-20 11:29:25,830 # main starting
2020-07-20 11:29:26,832 # { "result" : 62176 }
2020-07-20 11:29:29,660 # Exiting Pyterm

Issues/PRs references

None

@bergzand bergzand added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Platform: ARM Platform: This PR/issue effects ARM-based platforms Area: cpu Area: CPU/MCU ports labels Jul 20, 2020
@bergzand bergzand requested a review from kaspar030 July 20, 2020 09:46
@bergzand
Copy link
Member Author

Additional benchmarks using bench_mutex_pingpong:

  • same54-xpro: 254777 to 251572
  • Nucleo-f103rb: 114078 to 113897

After investigating a bit more on the samr21-xpro, the two extra instructions in the PendSV handler drop the bench_mutex_pingpong from 65217 to 64690. Then enabling the no_thread_idle feature drops the benchmark result to 62176 due to the extra logic in sched_run(). The sched_arch_idle function doesn't seem to be called during the benchmark.

@@ -462,8 +467,6 @@ void sched_arch_idle(void)
* According to [this](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0321a/BIHJICIE.html),
* dynamically changing the priority is not supported on CortexM0(+).
*/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here is obviously outdated now

@@ -107,7 +107,7 @@ extern "C" {
* If you want to set this, define it in your `cpu_conf.h`.
*/
#ifndef CPU_CORTEXM_PRIORITY_GROUPING
#define CPU_CORTEXM_PRIORITY_GROUPING (0)
#define CPU_CORTEXM_PRIORITY_GROUPING (1)
Copy link
Contributor

@benpicco benpicco Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above comment says

A currently experimental way to make PendSV on Cortex-M to run last is to

  • make sure @ref CPU_DEFAULT_IRQ_PRIO is even (e.g. 6U),
  • set @ref CPU_CORTEXM_PENDSV_IRQ_PRIO to one higher, and
  • set this one

But when checking, most implementations (all but nrf52) will set

#define CPU_DEFAULT_IRQ_PRIO            (1U)

in cpu_conf.h, violating this advice.

(Which begs the question: Why not also move the definition of CPU_DEFAULT_IRQ_PRIO to cpu_conf_common.h?)

@bergzand
Copy link
Member Author

More measurement of this Pre-PR versus Post-PR on the samr21-xpro using bench_mutex_pingpong:

Pre Post impact
GCC 65217 62176 4.7%
GCC+LTO 68279 65217 4.5%
LLVM 64000 61224 4.3%

LLVM+LTO didn't compile here.

@kaspar030 kaspar030 self-assigned this Aug 25, 2020
Comment on lines 470 to 471
__DSB();
__ISB();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These shouldn't be necessary anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed!

@kaspar030
Copy link
Contributor

IIUC CPU_CORTEXM_PRIORITY_GROUPING sets the number of bits that are "ignored" (make two IRQs the same priority). So if set to 1, irq prio 0 and 1 would be the same, or 2 and 3.
So having CPU_DEFAULT_IRQ_PRIO at 1, CPU_CORTEXM_PRIORITY_GROUPING at 1 and CPU_CORTEXM_PENDSV_IRQ_PRIO at (CPU_DEFAULT_IRQ_PRIO + 1) should actually do what's expected (putting pendsv in a lower prio group than CPU_DEFAULT_IRQ_PRIO).

But wouldn't it make more sense to leave the grouping alone (keep at 0 bits)? Otherwise, the assumptions break if a CPU overrides CPU_DEFAULT_IRQ_PRIO to a non-even value.

@bergzand
Copy link
Member Author

IIUC CPU_CORTEXM_PRIORITY_GROUPING sets the number of bits that are "ignored" (make two IRQs the same priority). So if set to 1, irq prio 0 and 1 would be the same, or 2 and 3.
So having CPU_DEFAULT_IRQ_PRIO at 1, CPU_CORTEXM_PRIORITY_GROUPING at 1 and CPU_CORTEXM_PENDSV_IRQ_PRIO at (CPU_DEFAULT_IRQ_PRIO + 1) should actually do what's expected (putting pendsv in a lower prio group than CPU_DEFAULT_IRQ_PRIO).

You're right. The only thing here is that the __NVIC_SetPriority() already takes the number of priority bits into account. For example, the cortex-m0+ only has 4 priority levels, 0, 64, 128 and 192 (not counting the negative priorities). When setting the priority to 1, it is shifted to 64 by the __NVIC_SetPriority() function. Setting the CPU_CORTEXM_PRIORITY_GROUPING to 1 makes the lower 2 bits of the priority into the subgroup number, so this doesn't affect anything, luckily.

But wouldn't it make more sense to leave the grouping alone (keep at 0 bits)?

Yes!

If we always want to have the PendSV interrupt at the highest IRQ priority value, we can set it to 255 and let the hardware truncate it to the max value.

@bergzand bergzand force-pushed the pr/cortexm_common/pendsv_priority branch from 9062901 to 56ff5f8 Compare August 25, 2020 11:47
@bergzand
Copy link
Member Author

rebased!

@bergzand bergzand force-pushed the pr/cortexm_common/pendsv_priority branch from 9ab3b3e to 99d4338 Compare September 12, 2020 19:38
@bergzand
Copy link
Member Author

Rebased again to current master

@kaspar030
Copy link
Contributor

Please squash. I'm giving this another test run, codewise, LGTM.

@@ -317,7 +317,10 @@ void __attribute__((naked)) __attribute__((used)) isr_pendsv(void) {
#endif
"push {lr} \n" /* push exception return code */

/* Disable interrupts during scheduling */
"cpsid i \n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you move this comment after the next line? or would it get too long?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded it a bit to make it fit :)

@bergzand bergzand force-pushed the pr/cortexm_common/pendsv_priority branch from d14f2bd to 3e3f066 Compare September 22, 2020 08:52
@bergzand
Copy link
Member Author

Squashed!

Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

This extends the rather hacky previous no-idle code to a more straight forward implementation that also works on smaller Cortex-M.

Those take a ~5% performance hit in synthetic context switching benchmarks.
The benefits are:

  • saving 256b of idle stack RAM
  • faster thread-sleep-thread transitions (if switching to the same thread)
  • with PendSV at low priority, all Cortex-M benefit when multiple threads are triggering PendSV, as sched_run would only run once

I consider the benefits outweighing the performance drop. And, re-enabling the idle thread is still possible by disabling the no-idle-thread feature.

@kaspar030 kaspar030 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: run tests If set, CI server will run tests on hardware for the labeled PR labels Sep 22, 2020
@bergzand
Copy link
Member Author

Amended in the Kconfig selector 🙄

@benpicco
Copy link
Contributor

Murdock is still not happy. :(

@bergzand
Copy link
Member Author

Murdock is still not happy. :(

All run tests, I'll check if I can reproduce those locally

@bergzand
Copy link
Member Author

libhydrogen is also failing currently on master, see #14754 (comment) (and the nightlies).
I guess the stack requirement with this increased slightly and now causes an overflow with periph_timer_periodic.
utensor triggers an actual hard fault in sched_run() 😲

@bergzand
Copy link
Member Author

bergzand commented Sep 22, 2020

utensor triggers an actual hard fault in sched_run() astonished

I can reproduce this failure on current master.

I guess the stack requirement with this increased slightly and now causes an overflow with periph_timer_periodic.

This test does a printf inside the IRQ callback. It looks easy enough to change that to a bit of fmt code.

@benpicco benpicco added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Sep 23, 2020
This modifies the cortex-m thread specifics to allow running the PendSV
interrupt continuously at lower priority and removes the priority
modifications during the interrupt itself. Interrupts are disabled
during the scheduling itself, but enabled briefly after the sleep to
ensure that they are handled if activated during the scheduling or the
sleep.
@bergzand bergzand force-pushed the pr/cortexm_common/pendsv_priority branch from da2fb6c to eb73515 Compare September 23, 2020 09:02
@bergzand
Copy link
Member Author

Rebased this to include the commits from #15061 (merged) in here. Makes testing a bit easier

@bergzand
Copy link
Member Author

All build failures should now be unrelated to this PR.

@kaspar030 kaspar030 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Sep 23, 2020
@kaspar030
Copy link
Contributor

Removed "CI: run tests" to prevent unrelated build failures, the samr21-xpro:llvm tests also fail on master.

@kaspar030
Copy link
Contributor

&go.

@kaspar030 kaspar030 merged commit 97a0ba3 into RIOT-OS:master Sep 23, 2020
@bergzand bergzand deleted the pr/cortexm_common/pendsv_priority branch September 23, 2020 09:52
@bergzand
Copy link
Member Author

Awesome, thanks for the review!

@MrKevinWeiss
Copy link
Contributor

Ping @kaspar030 and @bergzand

It appears the ba58273 broke stdin on the saml10-xpro.

I don't know exactly how to fix it... If you guys understand then you and use the HiL CI to test and verify the fix otherwise I would suggest a revert.

This was caught by the nightlies in the HiL CI (far too late though)

@bergzand
Copy link
Member Author

It appears the ba58273 broke stdin on the saml10-xpro.

Blegh, thanks for reporting! I strongly suspect something in the power management of the saml10-xpro. At least that's the first spot I would look for this.

@benpicco
Copy link
Contributor

Can confirm, when I remove FEATURES_PROVIDED += no_idle_thread stdin works again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants