kernel timeout_list NULL pointer access #36266
Comments
Loopback is supported, just set |
Can be replicated with native_posix
|
Note that because 0 is a valid fd, the check in the tcpserver thread must treat only negative return values as errors.
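A minimal sketch of that kind of check, assuming the usual BSD-socket convention that socket() and accept() return a negative value on failure. The helper name fd_is_valid is hypothetical and is not taken from the application:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: reports whether a value returned by a
 * socket()/accept() style call is a usable descriptor.
 * Only negative values signal failure; 0 is a legitimate descriptor,
 * so a test such as "fd > 0" would wrongly reject it.
 */
static bool fd_is_valid(int fd)
{
    return fd >= 0;
}
```

In the server thread this would be used as `if (!fd_is_valid(fd)) { /* handle error */ }` right after the call that produced the descriptor.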
As almost every limit in Zephyr is determined at build time, some networking limits need to be increased because the application creates quite a few sockets. So I compiled the application like this in order to increase the socket count and related limits.
This helped a bit, although I am still seeing error prints in the console and eventually there is a crash like this.
I also needed to apply the fix at #36284. |
The client thread is flooding the system, which is causing some issues in TCP. Everything starts to work much better if I add
after the socket close in tcpclient thread. |
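As an illustration of throttling the client, here is a minimal sketch of a retry loop that pauses after each close. It is written against host POSIX sockets so that it builds outside Zephyr. The function name connect_with_pause, its parameters, and the use of nanosleep() are assumptions made for this example; in the Zephyr application a kernel sleep such as k_msleep() would presumably play that role.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to connect to ip:port up to max_attempts times,
 * sleeping pause_ms milliseconds after every failed attempt so the loop
 * does not flood the stack with back-to-back connection attempts.
 * Returns the number of failed attempts (0 if the first try succeeded),
 * or -1 if ip is not a valid IPv4 address.
 */
static int connect_with_pause(const char *ip, unsigned short port,
                              int max_attempts, long pause_ms)
{
    struct sockaddr_in addr;
    int failures = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        return -1;
    }

    for (int i = 0; i < max_attempts; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd >= 0) {
            int rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));

            close(fd); /* always release the descriptor before retrying */
            if (rc == 0) {
                return failures;
            }
        }
        failures++;

        /* Pause after the close instead of retrying immediately. */
        struct timespec ts = {
            .tv_sec = pause_ms / 1000,
            .tv_nsec = (pause_ms % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);
    }
    return failures;
}
```

The pause length is a tuning choice; the point is only that each failed attempt gives the rest of the system time to clean up before the next socket is created.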
The following is how it goes:
|
My understanding is that if the remote TCP server is unreachable (e.g. the IP address is wrong) and the TCP client in Zephyr keeps retrying the connection, then Zephyr will crash. |
A race condition occurs when tcp_conn_unref() is executed by Locking
|
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time. |
@tbursztyka do you think this issue is still applicable? |
I use the MIMXRT1060_EVK board to run Zephyr. To catch NULL pointer accesses, I use the MPU to mark ARM address 0x0 (located in ITCM, which is not used in my program) as no-access, so that any access through a NULL pointer causes an MPU fault. A NULL pointer access occurs in remove_timeout(), which is called by sys_clock_announce() in kernel/timeout.c.
"Faulting instruction address (r15/pc): 0x6000ddee" is executing next->prev = prev; in sys_dlist_remove(). The .next pointer, represented by r3, should not be NULL (0). The following is the complete assembly code of remove_timeout() in which sys_dlist_remove() is called.
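For reference, the list operation behind that faulting store can be sketched in plain C. This is a simplified stand-in for a circular doubly linked list, not Zephyr's actual sys_dlist code; the names struct dnode, dlist_init, dlist_append, dlist_remove and dlist_remove_checked are hypothetical. It shows one way a node can end up with a NULL link (being removed twice), without claiming that this is what happened in the report.

```c
#include <stddef.h>

/* Simplified stand-in for a circular doubly linked list node. */
struct dnode {
    struct dnode *next;
    struct dnode *prev;
};

/* An empty list is a head node that points to itself. */
static void dlist_init(struct dnode *list)
{
    list->next = list;
    list->prev = list;
}

/* Link node in front of the head, i.e. at the tail of the list. */
static void dlist_append(struct dnode *list, struct dnode *node)
{
    struct dnode *tail = list->prev;

    node->next = list;
    node->prev = tail;
    tail->next = node;
    list->prev = node;
}

/* Unlink node and clear its links to mark it as no longer in a list. */
static void dlist_remove(struct dnode *node)
{
    struct dnode *prev = node->prev;
    struct dnode *next = node->next;

    prev->next = next;
    next->prev = prev; /* dereferences next: faults if next is NULL */
    node->next = NULL;
    node->prev = NULL;
}

/*
 * Guarded variant: returns 0 after unlinking, or -1 if the node is
 * already unlinked, so a second removal cannot dereference NULL.
 */
static int dlist_remove_checked(struct dnode *node)
{
    if (node->next == NULL || node->prev == NULL) {
        return -1;
    }
    dlist_remove(node);
    return 0;
}
```

A guard like this only hides the symptom. If two contexts can remove the same node concurrently, the list still needs to be protected by a lock around the whole remove operation.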
To Reproduce
The following is the main.c used to reproduce the problem. It creates 32 TCP server threads and 32 TCP client threads. Each TCP client thread tries to connect to the local server. Zephyr does not support an IP loopback interface, so the TCP client connection fails and the client keeps retrying. After a few minutes, an MPU fault occurs.
Impact
This is a test designed to provoke a potential problem. A multi-threaded socket program can crash the system, as in #32564.
Environment