Bluetooth: L2CAP: Deadlock when there are no free buffers while transmitting on multiple channels #34600
Comments
@KatrineNordic, which Nordic (upstream, not NCS/sdk-zephyr) revision did you test this with?
This seems to be an issue upstream, as per a discussion with @KatrineNordic and @joerchan.
Downgrading to low since:
@Vudentz FYI, in case you have additional comments on this.
@KatrineNordic, it sounds like you've already done some digging here to find the cause. Do you have any suspicions on where in the code the bug is? Where does this queuing happen?
This issue has been marked as stale because it has been open for more than 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed, otherwise this issue will automatically be closed in 14 days. Note that you can always re-open a closed issue at any time.
This issue likely still exists. I will try to reproduce.
I have discussed this with @KatrineNordic. The issue is a missing trigger for
It is possible to end up in a situation where none of those triggers will happen, but there remain segments to be sent. Note that the "segment sent" trigger will not happen if there was no room to queue any segments from This issue can be fixed by a trigger for when room becomes available on the connection TX queue. I believe we can make
I believe this is a corner case of issue #20640. The fix for that issue appears to be based on the belief that if at least one segment is queued, the next one will be able to take the place of the previous one in the queue. I don't fully understand why this does not work, but I suspect there is a race for the queue between the segment-sent callback and the application queuing more. This is the commit with the partial fix based on that assumption: c654bcf
@alwa-nordic is following up on possible solutions.
Working on it. I managed to reproduce a deadlock (not sure yet whether it is this specific one, but it does look like it) on https://github.com/jori-nordic/zephyr/tree/l2cap-deadlock
This test reproduces more-or-less zephyrproject-rtos#34600. It has a central that connects to multiple peripherals, opens one l2cap CoC channel per connection, and transmits a few SDUs largely exceeding the MPS of the channel. In this commit, the test doesn't pass, but when it passes (after the subsequent commits), error and warning messages are expected from the stack, as this is not the happy path. We can later debate on whether these particular error messages should be downgraded to debug. Signed-off-by: Jonathan Rico <[email protected]>
This test reproduces more-or-less #34600. It has a central that connects to multiple peripherals, opens one l2cap CoC channel per connection, and transmits a few SDUs largely exceeding the MPS of the channel. In this commit, the test doesn't pass, but when it passes (after the subsequent commits), error and warning messages are expected from the stack, as this is not the happy path. We can later debate on whether these particular error messages should be downgraded to debug. Signed-off-by: Jonathan Rico <[email protected]> (cherry picked from commit 7a6872d)
Describe the bug
Transmitting many large SDUs as quickly as possible on multiple L2CAP channels quickly uses up the available TX buffers. Queuing of segments on a channel stops when there are no free buffers or when the channel has no credits left. It restarts whenever a previously queued segment has been transmitted or more credits are received; if there are still no free buffers at that point, queuing stops again right away. If this happens after all segments previously queued on the channel have already been transmitted, and there are no more credits to receive, neither trigger can fire again, so queuing of segments on that channel stops indefinitely, even after other channels release their buffers.
Expected behavior
Queuing of segments resumes as soon as free buffers become available.
Impact
Annoyance.
Additional context
It is possible to work around this by configuring enough buffers (CONFIG_BT_L2CAP_TX_BUF_COUNT or CONFIG_BT_CONN_TX_MAX) that a free buffer is always available when queuing of segments restarts.