Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gimlet SP should pull its fault pin on very serious failures #1206

Open
cbiffle opened this issue Mar 14, 2023 · 2 comments
Open

Gimlet SP should pull its fault pin on very serious failures #1206

cbiffle opened this issue Mar 14, 2023 · 2 comments
Milestone

Comments

@cbiffle
Copy link
Collaborator

cbiffle commented Mar 14, 2023

The SP has a net connected to Ignition for reporting failures up-stack. We should use it in very severe failure cases. Ideally, we would use it if the "you flashed the wrong firmware on this board" check before main triggers -- assuming that pin is in the same place on all supported revisions, of course.

More broadly we're talking about how to detect low-level boot failure, and such failures may want to pull the SP fault pin. (It's not immediately clear, because that fault pin does not go to the RoT, and the RoT is responsible for deciding to roll back an SP update. I'll file a separate bug.)

@hawkw
Copy link
Member

hawkw commented Nov 4, 2024

Presently we assert the fault net on sequencer failures by having the drv-gimlet-seq-server task pull it low as soon as it starts:

sys.gpio_reset(FAULT_PIN_L);

and only stop asserting it once the sequencer server is properly running (i.e., we have reached A0 unless on a lab image):
// Clear the external fault now that we're about to start serving messages
// and fewer things can go wrong.
sys.gpio_set(FAULT_PIN_L);

Would the natural extension of this be to have system_init() in the Gimlet app's main.rs immediately pull the pin low and leave it that way until the sequencer has actually come up? Or might we want to pull it low only in the case of the init function actually failing (e.g. wrong board)?

@hawkw
Copy link
Member

hawkw commented Nov 4, 2024

Currently system_init() handles such faults by panicking:

assert_eq!(rev, expected_rev);
, which I suppose is reasonable. However, if we want to stick code for asserting the fault pin there, we might want to change the function to return a Result or something and set the fault net before panicking. However, it's occurred to me that maybe we would also want to assert FAULT_L in the event of any kernel panic, which, in turn, makes me wonder whether we ought to have a way for an app to provide additional code that the kernel should run inside of its panic handler, so that we can assert the fault net on any kernel panic.

One way to do this could be to stuff a function pointer in a static for "extra code to do on panics", and have the kernel crate expose some function for doing this at the top of an app's entrypoint, but that feels a little messy. Another option could be #[linkage="extern_weak"], but this is unstable and AFAICT, it seems kind of unclear what its eventual fate will be (see rust-lang/rust#29603)...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants