Gimlet SP should pull its fault pin on very serious failures #1206
Comments
Presently we assert the fault net on sequencer failures by having the hubris/drv/gimlet-seq-server/src/main.rs Line 210 in fe71189
and only stop asserting it once the sequencer server is properly running (i.e., we have reached A0 unless on a lab image): hubris/drv/gimlet-seq-server/src/main.rs Lines 502 to 504 in fe71189
Would the natural extension of this be to have system_init() in the Gimlet app's main.rs immediately pull the pin low and leave it that way until the sequencer has actually come up? Or might we want to pull it low only in the case of the init function actually failing (e.g. wrong board)?
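To make the two options in that question concrete, here is a rough host-side sketch, not Hubris code. `FaultPin`, `Policy`, `InitError`, and `sequencer_reached_a0` are hypothetical names, this `system_init` is only a stand-in for the real function, and the sketch assumes the fault net is active-low and idles high.

```rust
/// Logic level on the (assumed active-low) fault net.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    Low,
    High,
}

/// The two behaviours discussed above (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Policy {
    /// Pull the pin low immediately and hold it until the sequencer is up.
    AssertUntilSequencerUp,
    /// Pull the pin low only if init itself fails.
    AssertOnInitFailure,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InitError {
    WrongBoard,
}

/// Stand-in for the GPIO driving the fault net; assumed to idle high.
pub struct FaultPin {
    level: Level,
}

impl FaultPin {
    pub fn new() -> Self {
        Self { level: Level::High }
    }
    pub fn assert_fault(&mut self) {
        self.level = Level::Low;
    }
    pub fn release(&mut self) {
        self.level = Level::High;
    }
    pub fn level(&self) -> Level {
        self.level
    }
}

/// Stand-in for the app's init; `board_ok` models the wrong-board check.
pub fn system_init(
    pin: &mut FaultPin,
    policy: Policy,
    board_ok: bool,
) -> Result<(), InitError> {
    if policy == Policy::AssertUntilSequencerUp {
        pin.assert_fault();
    }
    if !board_ok {
        // Both policies report a failed init on the fault net.
        pin.assert_fault();
        return Err(InitError::WrongBoard);
    }
    Ok(())
}

/// Called once the sequencer reports that it is properly running.
pub fn sequencer_reached_a0(pin: &mut FaultPin) {
    pin.release();
}
```

Under the first policy, a hang anywhere between init and A0 still leaves the net asserted. Under the second, only a failure that init detects explicitly is reported.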
Currently Line 128 in fe71189
Result or something and set the fault net before panicking. However, it's occurred to me that we might also want to assert FAULT_L in the event of any kernel panic. That, in turn, makes me wonder whether an app ought to be able to provide additional code for the kernel to run inside its panic handler, so that the fault net is asserted on every kernel panic.
One way to do this could be to stuff a function pointer in a static for "extra code to do on panics", and have the kernel crate expose a function for setting it at the top of an app's entrypoint, but that feels a little messy. Another option could be
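As a rough illustration of the function-pointer-in-a-static idea, here is a minimal sketch. It is not the kernel's actual API: `PANIC_HOOK`, `set_panic_hook`, `run_panic_hook`, and the `FAULT_ASSERTED` flag standing in for the real pin are all hypothetical, and the sketch assumes a target with native atomic loads and stores.

```rust
use core::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical slot for the app-provided hook, stored as an address.
/// Zero means "no hook registered".
static PANIC_HOOK: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical registration function the kernel crate could expose;
/// an app would call it at the top of its entrypoint.
pub fn set_panic_hook(hook: fn()) {
    PANIC_HOOK.store(hook as usize, Ordering::Release);
}

/// What the kernel's panic path could call before halting.
pub fn run_panic_hook() {
    let raw = PANIC_HOOK.load(Ordering::Acquire);
    if raw != 0 {
        // SAFETY: the only nonzero value ever stored is a valid `fn()`
        // written by `set_panic_hook`.
        let hook: fn() = unsafe { core::mem::transmute::<usize, fn()>(raw) };
        hook();
    }
}

/// Stand-in for the fault net so the sketch runs on a host.
static FAULT_ASSERTED: AtomicBool = AtomicBool::new(false);

/// Example hook: on real hardware this would drive FAULT_L low.
pub fn assert_fault_net() {
    FAULT_ASSERTED.store(true, Ordering::SeqCst);
}

pub fn fault_asserted() -> bool {
    FAULT_ASSERTED.load(Ordering::SeqCst)
}
```

An app would call `set_panic_hook(assert_fault_net)` before starting the kernel. The hook has to stay very simple, since it runs while the system is already in a failed state.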
The SP has a net connected to Ignition for reporting failures up-stack. We should use it in very severe failure cases. Ideally, we would use it if the "you flashed the wrong firmware on this board" check before main triggers -- assuming that pin is in the same place on all supported revisions, of course.
More broadly we're talking about how to detect low-level boot failure, and such failures may want to pull the SP fault pin. (It's not immediately clear, because that fault pin does not go to the RoT, and the RoT is responsible for deciding to roll back an SP update. I'll file a separate bug.)