Bluetooth: Controller: Fix spurious ISO Sync receiver stall #80775
Align the audio test Controller Kconfig values with those used for nRF53 bsim. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
Fix a spurious ISO Sync Receiver stall caused by access to an uninitialised value, a regression introduced by commit 64facee ("Bluetooth: controller: Stop Sync ISO ticker when establishment fails"). Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
Thank you
@dkalowsk @mmahadevan108 It would be very nice to get this into 4.0; otherwise we should disable the failing test case (it would be quite bad to release 4.0 with a failing BT test case, as that would prevent backporting any other BT-related fix).
Will try to test this with my PR that saw a similar problem.
Could we instead just use the nRF53 BSIM .conf file?
Will investigate as part of enabling some high-reliability CAP tests. Reusing the target conf file is challenging, as high-reliability tests need a lot more Tx and Rx buffers to handle worst-case retransmissions.
Trying out fixes and Kconfig changes here: #80788
I had a feeling it was related to uninitialized data, but valgrind didn't show anything for me. How did you find this?
I had to debug inside the lower link layer to check when reception stopped. There were calls to ticker_stop, which led me to the caller that used this uninitialized value (a recently added struct member). Since the value lives inside an Rx buffer node allocated from a pool, its contents had been assigned at some earlier point while receiving other packets.
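The pool-reuse effect described above can be illustrated with a minimal C sketch. Everything here is hypothetical: `struct rx_node`, `pool_alloc()` and `demo_stale_reuse()` are simplified stand-ins, not the real Zephyr `struct node_rx` or its memory pool. The point is only that a node handed back by a free list still carries whatever its previous user wrote. Because that memory is static storage that was written earlier, a definedness checker such as Valgrind's Memcheck plausibly has nothing to report; that is an inference about why valgrind stayed silent, not something confirmed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a Controller Rx node; this is
 * NOT the real Zephyr struct node_rx layout. */
struct rx_node {
	struct rx_node *next;  /* free-list link while the node sits in the pool */
	uint8_t type;
	uint8_t estab_failed;  /* newly added member that some producers never assign */
};

#define POOL_SIZE 2

static struct rx_node pool_mem[POOL_SIZE];
static struct rx_node *free_list;

static void pool_init(void)
{
	free_list = NULL;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		pool_mem[i].next = free_list;
		free_list = &pool_mem[i];
	}
}

/* Hands out a node without clearing it: the previous user's data remains. */
static struct rx_node *pool_alloc(void)
{
	struct rx_node *n = free_list;

	if (n != NULL) {
		free_list = n->next;
	}
	return n;
}

static void pool_free(struct rx_node *n)
{
	assert(n != NULL);
	n->next = free_list;
	free_list = n;
}

/* Returns the estab_failed value seen by a producer that only assigns 'type'. */
static uint8_t demo_stale_reuse(void)
{
	pool_init();

	/* Earlier traffic: a producer that does set the member. */
	struct rx_node *a = pool_alloc();
	a->type = 1U;
	a->estab_failed = 1U;
	pool_free(a);

	/* Later traffic: the LIFO free list hands back the same memory. */
	struct rx_node *b = pool_alloc();
	b->type = 2U;
	/* b->estab_failed was never assigned here; it still holds 1 */
	return b->estab_failed;
}
```

Calling `demo_stale_reuse()` returns 1 even though the second producer never touched `estab_failed`, which mirrors how a newly added member can pick up a value left behind by earlier packets.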
@@ -1281,6 +1282,7 @@ static void isr_rx_done(void *param)

/* Calculate and place the drift information in done event */
e->type = EVENT_DONE_EXTRA_TYPE_SYNC_ISO;
e->estab_failed = 0U;
This value isn't assigned after line 1252. Is that intentional?
Yes, intentional. Only relevant structure members are assigned for each type being passed around.
The Controller's Rx data flow (in struct node_rx) carries large overheads in two parts: a fixed overhead of unions, and a payload sized by the configurable PDU length. Generally, for a given Rx node type, not all members of the overhead part are assigned.
When adding a new struct member, typically all sources of that Rx node type in the data flow need to be inspected :-(
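One alternative to inspecting every producer is to clear the per-type union before filling it. The C sketch below shows that pattern under stated assumptions: `struct rx_hdr`, `enum rx_type`, `rx_hdr_prepare()` and `produce_sync_iso()` are hypothetical and do not reflect the real `struct node_rx`. The reply above says the Controller intentionally assigns only the relevant members, so this is not what it does. A `memset` per node costs cycles in ISR context and does not cover the PDU payload part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tagged Rx header: a fixed union of per-type members;
 * not the real struct node_rx. */
enum rx_type {
	RX_TYPE_SYNC_ISO,
	RX_TYPE_CONN,
};

struct rx_hdr {
	enum rx_type type;
	union {
		struct {
			uint8_t estab_failed;
			uint32_t drift_us;
		} sync_iso;
		struct {
			uint16_t handle;
		} conn;
	} u;
};

/* Clears the whole per-type union before a producer fills in the members
 * it knows about, so members added later default to zero, not stale data. */
static void rx_hdr_prepare(struct rx_hdr *h, enum rx_type type)
{
	assert(h != NULL);
	memset(&h->u, 0, sizeof(h->u));
	h->type = type;
}

/* A producer that, like the pre-fix code, only knows about drift_us. */
static void produce_sync_iso(struct rx_hdr *h, uint32_t drift_us)
{
	rx_hdr_prepare(h, RX_TYPE_SYNC_ISO);
	h->u.sync_iso.drift_us = drift_us;
}
```

Even if the node previously held arbitrary bytes, `estab_failed` reads as 0 after `produce_sync_iso()`. The trade-off is a small, fixed clearing cost on every node in exchange for not having to audit each producer when a member is added.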
On second thought, I need to investigate further today... #80788 passes locally but fails in CI :-(
I think this fixes the stalling issue (passes those tests locally), but it will fail other tests from my PR #80691:
This is a broadcast test, and the host always enqueues 2 SDUs per stream in the controller. I think it's OK to merge, as it fixes the issues it claims to fix, but it does not solve all issues. Will spend some more time tomorrow verifying that the host correctly enqueues the number of SDUs it expects.
@aescolar thanks for bringing it to attention. It does have an associated issue that it fixes, so as far as I can tell this can go into RC2, provided the maintainers are good with it (and it looks like Thalley is).
@aescolar @Thalley Any new PR fails CI, with different audio tests failing on re-runs of the same HEAD commit: https://github.com/zephyrproject-rtos/zephyr/actions/runs/11666290372?pr=80788. Comparing the bsim timestamps between CI and a local run (local native posix toolchain), they do not match, and I suspect the bsim timing on CI also changes between re-runs. The audio tests need bsim timing to be deterministic on every run; otherwise the Controller will experience overlapping states/roles, causing tests to fail. For an example, see the timing comparison here: #80788 (comment)
Refer to #80734 (comment). The conf change in this PR's commit addresses the inconsistency between nRF52 bsim and nRF53 bsim testing.
Fix a spurious ISO Sync Receiver stall caused by access to an
uninitialised value, a regression introduced by
commit cvinayak@64facee ("Bluetooth: controller: Stop Sync ISO
ticker when establishment fails").
Align the audio test Controller Kconfig values with those used for nRF53 bsim.
Fixes #80734.