Fix the infinite restart caused by unformatted t-echo fs file system #3775

Merged 2 commits on May 3, 2024


lewisxhe
Contributor

@lewisxhe lewisxhe commented May 3, 2024

In production, we found that some T-Echo units did not boot as expected. Debugging showed that every affected unit crashed while reading files from the file system, which triggered a system assertion and caused an infinite restart loop. The exception log follows.

//\ E S H T /\ S T / C

DEBUG | ??:??:?? 3 Filesystem files:
DEBUG | ??:??:?? 3  db.proto (1966 Bytes)
DEBUG | ??:??:?? 3 Using analog input 4 for battery level
INFO  | ??:??:?? 3 Scanning for i2c devices...
DEBUG | ??:??:?? 3 Scanning for i2c devices on port 1
DEBUG | ??:??:?? 4 I2C device found at address 0x51
INFO  | ??:??:?? 4 PCF8563 RTC found
DEBUG | ??:??:?? 4 I2C device found at address 0x77
DEBUG | ??:??:?? 4 Wire.available() = 1
INFO  | ??:??:?? 4 BME-280 sensor found at address 0x77
INFO  | ??:??:?? 4 2 I2C devices found
DEBUG | ??:??:?? 4 acc_info = 0
DEBUG | ??:??:?? 4 found i2c sensor meshtastic_TelemetrySensorType_BME280
INFO  | ??:??:?? 4 Meshtastic hwvendor=7, swver=2.3.8.102a20d2
DEBUG | ??:??:?? 4 Reset reason: 0x0
DEBUG | ??:??:?? 4 Setting random seed 2134818540
INFO  | ??:??:?? 4 Initializing NodeDB
INFO  | ??:??:?? 4 Loading /prefs/db.proto
ERROR | ??:??:?? 4 assert failed C:\Users\Lewis\.platformio\packages\framework-arduinoadafruitnrf52\libraries\Ad

The fix is to try creating a file right after the file system is initialized and to check whether the file was created successfully. If it was not, the internal file system is formatted again. With this change, the affected devices boot as expected and work normally.
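The probe-then-format idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeFs`, `initWithProbe`, and the probe path `/probe.tmp` are hypothetical names, and the in-memory `FakeFs` only stands in for the internal flash file system so the logic can be exercised off-device.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the internal flash file system.
struct FakeFs {
    bool corrupted = false;                     // simulates an unformatted or corrupt volume
    std::map<std::string, std::string> files;   // path -> contents
    int formatCount = 0;                        // how many times format() ran

    bool begin() { return true; }               // mounting "succeeds" even when corrupt
    bool writeFile(const std::string &path, const std::string &data) {
        if (corrupted) return false;            // writes fail on a corrupt volume
        files[path] = data;
        return true;
    }
    bool remove(const std::string &path) { return files.erase(path) > 0; }
    void format() { files.clear(); corrupted = false; ++formatCount; }
};

// Create a probe file after init; if that fails, format once and retry.
// Returns true when the file system is usable afterwards.
bool initWithProbe(FakeFs &fs, const std::string &probePath = "/probe.tmp") {
    if (!fs.begin()) return false;
    if (fs.writeFile(probePath, "ok")) {
        fs.remove(probePath);                   // healthy volume: clean up, no format
        return true;
    }
    fs.format();                                // probe failed: reformat the volume
    if (!fs.writeFile(probePath, "ok")) return false;
    fs.remove(probePath);
    return true;
}
```

In this sketch a healthy volume keeps its existing files and is never formatted, while a volume whose probe write fails is formatted exactly once before the retry.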

By the way, this bug has existed for a long time. The last such exception was fixed in #1987. This time, checking before initialization may be the better approach, because the application layer is constantly changing.

@caveman99
Member

@lewisxhe do you think this check during init can replace #1987 ? I have a suspicion this is also triggering on valid filesystems, causing the odd loss of configurations in the node.

@lewisxhe
Contributor Author

lewisxhe commented May 3, 2024

Yes, this problem also occurs during operation, and this check can replace the PR submitted previously. My faulty devices were flashed with the mesh firmware first, and the enclosure was installed after preliminary testing, yet the problem still appeared. It has existed for a long time. Even a normally operating device can suffer file system corruption when it restarts. Several such cases turned up in T-Echo after-sales service, and all of them were finally resolved by formatting the file system.

@thebentern
Contributor

> @lewisxhe do you think this check during init can replace #1987 ? I have a suspicion this is also triggering on valid filesystems, causing the odd loss of configurations in the node.

@caveman99 I was thinking about #1987 earlier. If the issue does occur during operation while we are attempting to save, in theory we may still have all of the objects loaded in memory. Perhaps we can save and restore what we have more gracefully after we format the file system.
Testing changes to this has always been difficult, since the failure is sporadic. We can add some code to force the failure, but things will still behave in different and unexpected ways during an actual failure event.
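One way to picture the "format, then restore from memory" idea together with a forced failure is the sketch below. It does not reflect how the firmware currently saves. `FlakyStore`, `saveAll`, and the `failNextWrites` fault-injection knob are hypothetical, and, as noted above, a forced failure like this will not reproduce everything a real corruption event does.

```cpp
#include <map>
#include <string>

// Hypothetical storage backend. failNextWrites is a test-only knob that
// forces the next N writes to fail, imitating a sporadic save failure.
struct FlakyStore {
    std::map<std::string, std::string> files;   // path -> contents
    int failNextWrites = 0;
    int formatCount = 0;

    bool write(const std::string &path, const std::string &data) {
        if (failNextWrites > 0) { --failNextWrites; return false; }
        files[path] = data;
        return true;
    }
    void format() { files.clear(); ++formatCount; }
};

// Write every in-memory object. If any write fails, format the store and
// rewrite everything from the copies still held in memory.
bool saveAll(FlakyStore &store, const std::map<std::string, std::string> &inMemory) {
    auto writeEverything = [&]() {
        for (const auto &entry : inMemory) {
            if (!store.write(entry.first, entry.second)) return false;
        }
        return true;
    };
    if (writeEverything()) return true;
    store.format();                              // drop the damaged contents
    return writeEverything();                    // restore from in-memory state
}
```

The design choice here is that the in-memory map is treated as the source of truth, so a format during save does not have to mean losing configuration, provided every object is still loaded when the failure happens.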

@caveman99
Member

@thebentern @lewisxhe let's pull this PR in and have a look at the format-on-save code in a separate issue.

@thebentern
Contributor

Agreed

@caveman99 caveman99 merged commit a8c38c4 into meshtastic:master May 3, 2024
69 checks passed
@caveman99
Member

ARM runner out in space with Major Tom again?
