[Bug]: NimBLE Stack Overflow #5200

Closed
todd-herbert opened this issue Oct 31, 2024 · 3 comments · Fixed by #5202
Labels
bug Something isn't working

Comments

@todd-herbert
Contributor

todd-herbert commented Oct 31, 2024

Category

BLE

Hardware

Not Applicable

Firmware Version

2.5.9.28b469d

Description

Over the past week, I've occasionally been getting stack overflow errors from NimBLE. I was working on a Heltec VME290 when I saw this. The error occurs during the initial get-config stage of connecting to my Android device. I have struggled to reproduce it reliably, as flashing a new build seems to resolve the issue, at least temporarily.

Today on Discord, a user reported a similar NimBLE stack issue when using the ATAK plugin. In their case, the canary is triggered later during use, not on the initial connection. They were able to reliably reproduce it across several different ESP32 devices. For them, the issue affects builds at least as far back as 2.5.3, which was the oldest build they tested.

If I have correctly understood the discussion on Discord, the likely solution is to increase CONFIG_BT_NIMBLE_HOST_TASK_STACK_SIZE. I'm not confident enough to know how large an increase would be appropriate, though, or what negative side effects it could have.
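One way to judge how large an increase is needed is to measure how much of the stack has ever been used. The sketch below is a host-side simulation of the high-water-mark idea, not firmware code: FreeRTOS can pre-fill a task stack with the byte 0xa5 (when stack checking or high-water-mark support is enabled), and the remaining headroom is the run of fill bytes still intact at the low end of the stack. The helper names `stack_headroom_bytes` and `simulate_usage` are hypothetical and exist only for this illustration. On a device, FreeRTOS exposes the equivalent measurement as `uxTaskGetStackHighWaterMark()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill byte FreeRTOS writes into a fresh task stack (tskSTACK_FILL_BYTE). */
#define STACK_FILL_BYTE 0xA5u

/*
 * Hypothetical helper: count the bytes at the low end of the stack that
 * still hold the fill pattern. For a stack that grows downward, this is the
 * smallest amount of free space the task has ever had. It is a heuristic:
 * task data that happens to equal the fill byte would be counted as free.
 */
static size_t stack_headroom_bytes(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return untouched;
}

/*
 * Hypothetical helper: simulate a task that has written `used` bytes,
 * starting at the top of the stack and growing toward the bottom.
 */
static void simulate_usage(uint8_t *stack, size_t size, size_t used)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
    if (used > size) {
        used = size;
    }
    memset(stack + (size - used), 0x00, used);
}
```

For example, calling `simulate_usage(buf, 4096, 3000)` on a 4096-byte buffer and then `stack_headroom_bytes(buf, 4096)` reports the bytes that were never touched. A headroom at or near zero on the real task would suggest the stack is too small for the workload.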

Relevant log output

Guru Meditation Error: Core  0 panic'ed (Unhandled debug exception). 
Debug exception reason: Stack canary watchpoint triggered (nimble_host)
Core  0 register dump:
PC      : 0x40384bdb  PS      : 0x00060036  A0      : 0x80382f30  A1      : 0x3fcc19e0  
A2      : 0x3fc9b880  A3      : 0xb33fffff  A4      : 0x0000abab  A5      : 0x00060023
A6      : 0x00060023  A7      : 0x0000cdcd  A8      : 0xb33fffff  A9      : 0xffffffff
A10     : 0x3fc9b854  A11     : 0x00000001  A12     : 0x00060021  A13     : 0x3fcc1ab0  
A14     : 0x02c9b880  A15     : 0x00ffffff  SAR     : 0x00000013  EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0x00000000  


Backtrace: 0x40384bd8:0x3fcc19e0 0x40382f2d:0x3fcc1a20 0x403814d8:0x3fcc1a50 0x403814ce:0xa5a5a5a5 |<-CORRUPTED




ELF file SHA256: bbfa50525a92fcd2

E (15278) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
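
As a reading aid for the backtrace above: each entry is a PC:SP pair, and the final entry has an SP of 0xa5a5a5a5, which matches the byte FreeRTOS uses to fill unused task stack. That is consistent with the unwinder reading stack memory that still holds the fill pattern, though it does not by itself show where the overflow happened. The following is a minimal host-side sketch that scans such a line and counts frames whose SP equals the fill word. The function name `scan_backtrace` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Four copies of the FreeRTOS stack fill byte 0xa5. */
#define FILL_WORD 0xA5A5A5A5ul

/*
 * Hypothetical helper: parse "0xPC:0xSP" pairs from a backtrace line.
 * Returns the number of pairs parsed and stores in *fill_frames how many
 * of them had an SP equal to the fill word. Parsing stops at the first
 * token that is not a PC:SP pair (for example "|<-CORRUPTED").
 */
static int scan_backtrace(const char *line, int *fill_frames)
{
    assert(line != NULL && fill_frames != NULL);
    static const char prefix[] = "Backtrace:";
    const char *p = strstr(line, prefix);
    p = (p != NULL) ? p + strlen(prefix) : line;

    int frames = 0;
    *fill_frames = 0;
    for (;;) {
        unsigned long pc = 0, sp = 0;
        int consumed = 0;
        /* The leading space in the format skips whitespace between pairs. */
        if (sscanf(p, " 0x%lx:0x%lx%n", &pc, &sp, &consumed) != 2) {
            break;
        }
        frames++;
        if (sp == FILL_WORD) {
            (*fill_frames)++;
        }
        p += consumed;
    }
    return frames;
}
```

Passing the backtrace line from this log to `scan_backtrace` yields the number of frames and how many of them carry the fill word as their SP.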
@todd-herbert todd-herbert added the bug Something isn't working label Oct 31, 2024
@thebentern
Contributor

I found a reference to someone from Espressif indicating that the stack size should be increased to 8192 when verbose logging is enabled in NimBLE. Given that our payloads are close to MTU size and we transmit frequently, that figure could be a good starting point.
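
A minimal sketch of what pinning that starting point could look like at compile time, assuming the value is in bytes and that the macro can be supplied by the build (for example as a `-D` flag). Whether the actual firmware build picks the setting up this way is an assumption; this thread does not show the mechanism used in the fix. The accessor `nimble_host_stack_size` is hypothetical.

```c
#include <assert.h>

/* Use 8192 unless the build already supplies a value (assumed: bytes). */
#ifndef CONFIG_BT_NIMBLE_HOST_TASK_STACK_SIZE
#define CONFIG_BT_NIMBLE_HOST_TASK_STACK_SIZE 8192
#endif

/* Fail the build if someone configures less than the suggested figure. */
_Static_assert(CONFIG_BT_NIMBLE_HOST_TASK_STACK_SIZE >= 8192,
               "NimBLE host task stack is below the suggested 8192");

/* Hypothetical accessor so other code can log or report the setting. */
static unsigned nimble_host_stack_size(void)
{
    return (unsigned)CONFIG_BT_NIMBLE_HOST_TASK_STACK_SIZE;
}
```

The trade-off is RAM: a larger stack for the host task is reserved whether or not it is used, so the value is worth revisiting once real headroom has been measured.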

@todd-herbert
Contributor Author

Thanks for the insight there. I didn't want to just take a stab in the dark at a PR without any real knowledge.

@thebentern
Contributor

> Thanks for the insight there. I didn't want to just take a stab in the dark at a PR without any real knowledge.

Thanks for hunting down the root cause.

2 participants