drivers: timer: nrf_rtc_timer: NRF boards take a long time to boot application in CONFIG_TICKLESS_KERNEL=n mode after OTA update #45315
Labels: bug, platform: nRF, priority: low, Stale
Observed on the nRF9160 running a non-tickless kernel. After an OTA update, it took about 12 seconds from reset to MCUboot's "Jumping to the first image slot" log message. It then consistently takes a similar amount of time again until the Zephyr "*** Booting Zephyr..." log message is printed.
This issue occurs because the tick interrupts aren't firing for that same period of time. Once the first interrupt has fired, the subsequent ones fire at the correct rate and the scheduler and everything else starts working correctly.
The root cause is in the sys_clock_driver_init function in nrf_rtc_timer.c. The driver kicks off an asynchronous NRF_RTC_TASK_CLEAR and then calls the compare_set function to set the CC for the first tick interrupt. In my case the value of COUNTER at this time is 400000, so with a tick rate of 128 Hz (CYC_PER_TICK = 256) it sets CC to 400256 (counter() + CYC_PER_TICK). However, very shortly afterwards the clear task completes and COUNTER is reset to 0. Only once the counter climbs back up to 400256 does the compare event fire, triggering the first tick interrupt.
In effect, the application startup time is extended by however long it has been since the last RTC reset. For most boots this is short enough not to be noticeable, but after a long OTA operation it is very noticeable.
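To make the arithmetic concrete, here is a minimal sketch of the delay described above. It is not driver code: the helper names are hypothetical, and it assumes a 24-bit COUNTER clocked at 32768 Hz and a clear that lands immediately after CC is written.

```c
#include <stdint.h>

/* Assumption: RTC clocked at 32768 Hz (LFCLK, prescaler 0). */
#define RTC_FREQ_HZ  32768u
/* Assumption: COUNTER is 24 bits wide. */
#define COUNTER_MASK 0x00FFFFFFu

/*
 * Hypothetical helper: cycles between the clear and the first compare
 * match when CC was programmed as counter_at_init + cyc_per_tick just
 * before the clear took effect.
 */
static uint32_t first_tick_delay_cycles(uint32_t counter_at_init,
                                        uint32_t cyc_per_tick)
{
    return (counter_at_init + cyc_per_tick) & COUNTER_MASK;
}

/* Hypothetical helper: convert RTC cycles to whole milliseconds. */
static uint32_t cycles_to_ms(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 1000u) / RTC_FREQ_HZ);
}
```

With the values from this report, first_tick_delay_cycles(400000, 256) is 400256 cycles, which cycles_to_ms turns into roughly 12.2 seconds, matching the observed delay.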
This is technically also a problem for builds with CONFIG_TICKLESS_KERNEL=y, though there the initial call to compare_set isn't as important, and I'm guessing subsequent ones generally lose the race with the NRF_RTC_TASK_CLEAR.
In light of that, the right fix in my opinion would be to wait until the NRF_RTC_TASK_CLEAR is done before continuing in sys_clock_driver_init, though I don't know if there is a clean way to do that.
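One possible shape for such a wait, sketched against a fake RTC so it can run off-target. Everything here is an assumption for illustration: the fake_rtc model, the function names, and the idea of polling COUNTER until it drops below its pre-clear value. On hardware the reads would be register accesses, and whether polling is acceptable this early in boot would need checking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the RTC peripheral, for illustration only. */
struct fake_rtc {
    uint32_t counter;       /* current COUNTER value (held still here) */
    bool clear_pending;     /* clear triggered but not yet applied */
    uint32_t clear_latency; /* reads remaining before the clear lands */
};

static void fake_rtc_trigger_clear(struct fake_rtc *rtc)
{
    rtc->clear_pending = true;
}

/* Each read advances the model by one step toward applying the clear. */
static uint32_t fake_rtc_read(struct fake_rtc *rtc)
{
    if (rtc->clear_pending) {
        if (rtc->clear_latency == 0) {
            rtc->counter = 0;
            rtc->clear_pending = false;
        } else {
            rtc->clear_latency--;
        }
    }
    return rtc->counter;
}

/*
 * Trigger the clear, then poll until COUNTER reads lower than it did
 * before. Returns false if that never happens within max_polls reads.
 */
static bool rtc_clear_and_wait(struct fake_rtc *rtc, uint32_t max_polls)
{
    uint32_t before = fake_rtc_read(rtc);

    fake_rtc_trigger_clear(rtc);
    if (before == 0) {
        /* Already at 0: a late clear cannot add a long delay. */
        return true;
    }
    for (uint32_t i = 0; i < max_polls; i++) {
        if (fake_rtc_read(rtc) < before) {
            return true;
        }
    }
    return false;
}
```

The bounded poll count is a deliberate choice in this sketch: if the clear never becomes visible, init can fall back to the current behavior instead of hanging.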
Alternatively (for non-tickless) you could just set the initial_timeout to CYC_PER_TICK as an absolute time (256 in my example), since you know the clear is pending or has possibly just occurred. The heuristics in compare_set and set_absolute_alarm prevent you from doing this as-is, so you'd need to call set_comparator and friends directly. Ironically, if you use set_absolute_alarm, it sees that the delta is larger than COUNTER_HALF_SPAN, assumes the target time has already passed, and overwrites it with counter() + 2. This in effect mimics the original behavior, since the COUNTER is about to get cleared and you'll end up needing to wait until it reaches 400002.
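The half-span check can be illustrated with a simplified model. This is not the driver's set_absolute_alarm, only the one decision described above, and effective_cc is a hypothetical name. It assumes a 24-bit counter, so COUNTER_HALF_SPAN is 2^23.

```c
#include <stdint.h>

/* Assumption: 24-bit RTC counter. */
#define COUNTER_SPAN      (1u << 24)
#define COUNTER_MAX       (COUNTER_SPAN - 1u)
#define COUNTER_HALF_SPAN (COUNTER_SPAN / 2u)

/* Wrapping subtraction within the 24-bit counter range. */
static uint32_t counter_sub(uint32_t a, uint32_t b)
{
    return (a - b) & COUNTER_MAX;
}

/*
 * Simplified model of the heuristic: a target more than half a span
 * "ahead" of now is treated as already in the past and replaced with
 * now + 2.
 */
static uint32_t effective_cc(uint32_t now, uint32_t abs_target)
{
    if (counter_sub(abs_target, now) > COUNTER_HALF_SPAN) {
        return (now + 2u) & COUNTER_MAX;
    }
    return abs_target & COUNTER_MAX;
}
```

With now = 400000 and an absolute target of 256, the wrapped delta is 2^24 - 399744, well above the half span, so the model returns 400002. If the clear had already landed (say now = 100), the same target of 256 would be accepted unchanged.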