Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

K_ESSENTIAL option doesn't have any effect on k_create_thread #45545

Closed
xnaveira opened this issue May 11, 2022 · 7 comments · Fixed by #45813
Closed

K_ESSENTIAL option doesn't have any effect on k_create_thread #45545

xnaveira opened this issue May 11, 2022 · 7 comments · Fixed by #45813
Assignees
Labels
area: Kernel bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug

Comments

@xnaveira
Copy link
Contributor

Describe the bug
In our application we spawn a thread at innit that will be running a loop during all the lifecycle of the app. If that thread stops execution or exits for any reason we want the whole system to restart. To achieve this we configure the thread with the option K_ESSENTIAL which according to the documentation:

This option tags the thread as an essential thread. This instructs the kernel to treat the termination or aborting of the thread as a fatal system error.

What we observe is that configuring the K_ESSENTIAL option has no effect whatsoever and that if the thread stops the rest of the app keeps running.

To Reproduce
Here a simple app to reproduce the problem:

#include <zephyr.h>
#include <sys/printk.h>


#define LOOP2_STACK_SIZE 1024
#define LOOP2_PRIORITY 1

struct k_thread loop2_thread;
K_THREAD_STACK_DEFINE(loop2_area, LOOP2_STACK_SIZE);


void loop2(void *one, void *two, void *three) {
	int i = 0;
	while (1)
	{
		printk("LOOP 2\n");
		i++;
		if (i > 3)
		{
			printk("LOOP 2 EXITING\n");
			return;
		}	
		k_sleep(K_SECONDS(1));
	}
	
}


void main(void)
{
	printk("Hello World! %s\n", CONFIG_BOARD);

        k_tid_t loop2_thread_id = k_thread_create(&loop2_thread, loop2_area,
                                                   K_THREAD_STACK_SIZEOF(loop2_area),
                                                   loop2,
                                                   NULL, NULL, NULL,
                                                   LOOP2_PRIORITY, K_ESSENTIAL, K_NO_WAIT);

	while (1) {
		printk("LOOP 1\n");
		k_sleep(K_SECONDS(2));
	}
	
}

Expected behavior
When the spawned thread exits the app should be restarted (or halted depending on the platform)

Impact
This is a good fit for our application since the devices that are running it are spread geographically and adds robustness to the design.

Environment (please complete the following information):
We have observed this behavior when testing on qemu_cortex as well as on our hardware board which is a Nordic Semiconductors nrf9160 based one.
We are using Nordic SDK 1.9.1 which ships with Zephyr OS v2.7.99

@xnaveira xnaveira added the bug The issue is a bug, or the PR is fixing a bug label May 11, 2022
@andyross
Copy link
Contributor

I originally assumed this got dropped in the rework of thread abort that happened about a year ago. But here's the commit that removes the old implementation, and it's not present there either (it's just an assert): c0c8cb0

So I guess we've never done this right per the docs. Oops. Should be a simple fix.

@xnaveira
Copy link
Contributor Author

So if I understand correctly, this option was dropped? Is there a similar facility in the kernel that we could use?

@andyross
Copy link
Contributor

No, we just never did it the way we documented it. I don't see why we shouldn't match the docs, your use of K_ESSENTIAL as a kind of watchdog feature seems very reasonable to me, and the added code is literally as simple as if(z_is_thread_essential(t)) k_panic(); in the right spot on the abort path.

@xnaveira
Copy link
Contributor Author

Oh I see. Happy to test a patch if you send it my way :D

@xnaveira
Copy link
Contributor Author

Is there any work going to be done around this?

@carlescufi
Copy link
Member

@andyross could you please open a Pull Request with the fix?

@carlescufi carlescufi added the priority: high High impact/importance bug label May 17, 2022
andyross pushed a commit to andyross/zephyr that referenced this issue May 19, 2022
Documentation specifies that aborting/terminating/exiting essential
threads is a system panic condition, but we didn't actually implement
that and allowed it as for other threads. At least one app wants to
exploit this documented behavior as a "watchdog" kind of condition,
and that seems reasonable.  Do what we say we're supposed to do.

This also includes a small fix to a test, which seemed like it was
written to exercise exactly this condition.  Except that it failed to
detect whether or not a system fatal error was actually signaled and
was (incorrectly) indicating "success".  Check that we actually enter
the handler.

Fixes zephyrproject-rtos#45545

Signed-off-by: Andy Ross <[email protected]>
@andyross
Copy link
Contributor

Fix up. Just to quibble with priority though: this is a silly thing to mark "high". The obvious workaround to a lack of system panic on exiting essential threads is just to panic at the end of the thread function. :) Or if you want to catch async aborts, join a thread to it and panic in that.

carlescufi pushed a commit that referenced this issue May 20, 2022
Documentation specifies that aborting/terminating/exiting essential
threads is a system panic condition, but we didn't actually implement
that and allowed it as for other threads. At least one app wants to
exploit this documented behavior as a "watchdog" kind of condition,
and that seems reasonable.  Do what we say we're supposed to do.

This also includes a small fix to a test, which seemed like it was
written to exercise exactly this condition.  Except that it failed to
detect whether or not a system fatal error was actually signaled and
was (incorrectly) indicating "success".  Check that we actually enter
the handler.

Fixes #45545

Signed-off-by: Andy Ross <[email protected]>
laxiLang pushed a commit to laxiLang/zephyr that referenced this issue May 30, 2022
Documentation specifies that aborting/terminating/exiting essential
threads is a system panic condition, but we didn't actually implement
that and allowed it as for other threads. At least one app wants to
exploit this documented behavior as a "watchdog" kind of condition,
and that seems reasonable.  Do what we say we're supposed to do.

This also includes a small fix to a test, which seemed like it was
written to exercise exactly this condition.  Except that it failed to
detect whether or not a system fatal error was actually signaled and
was (incorrectly) indicating "success".  Check that we actually enter
the handler.

Fixes zephyrproject-rtos#45545

Signed-off-by: Andy Ross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Kernel bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants