Gimlet SP should pull its fault pin on very serious failures #1206

cbiffle · 2023-03-14T20:44:58Z

The SP has a net connected to Ignition for reporting failures up-stack. We should use it in very severe failure cases. Ideally, we would use it if the "you flashed the wrong firmware on this board" check before main triggers -- assuming that pin is in the same place on all supported revisions, of course.

More broadly we're talking about how to detect low-level boot failure, and such failures may want to pull the SP fault pin. (It's not immediately clear, because that fault pin does not go to the RoT, and the RoT is responsible for deciding to roll back an SP update. I'll file a separate bug.)

hawkw · 2024-11-04T21:59:30Z

Presently we assert the fault net on sequencer failures by having the drv-gimlet-seq-server task pull it low as soon as it starts:

hubris/drv/gimlet-seq-server/src/main.rs

Line 210 in fe71189

sys.gpio_reset(FAULT_PIN_L);

and only stop asserting it once the sequencer server is properly running (i.e., we have reached A0 unless on a lab image):

hubris/drv/gimlet-seq-server/src/main.rs

Lines 502 to 504 in fe71189

    
// Clear the external fault now that we're about to start serving messages
// and fewer things can go wrong.
sys.gpio_set(FAULT_PIN_L);

Would the natural extension of this be to have system_init() in the Gimlet app's main.rs immediately pull the pin low and leave it that way until the sequencer has actually come up? Or might we want to pull it low only in the case of the init function actually failing (e.g. wrong board)?
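To make the first option concrete, here is a minimal host-side sketch of the "assert early, release once serving" lifecycle. Everything in it is hypothetical: `FaultNet`, `BootStage`, and `drive_for_stage` are illustration-only names, and the model only mirrors the polarity seen in the snippets above (`gpio_reset` drives the active-low net low, `gpio_set` releases it). It is not the real Hubris `sys` API.

```rust
/// Hypothetical boot stages, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootStage {
    /// Inside the app's early init, before the kernel starts.
    EarlyInit,
    /// The sequencer task has started but is not yet serving messages.
    SequencerStarting,
    /// The sequencer is serving messages.
    Serving,
}

/// Hypothetical model of the active-low fault net.
pub struct FaultNet {
    level_high: bool,
}

impl FaultNet {
    /// Assumption: an undriven net reads high, i.e. no fault reported.
    pub fn new() -> Self {
        Self { level_high: true }
    }

    /// Drive the net low, which asserts the fault (active low).
    pub fn gpio_reset(&mut self) {
        self.level_high = false;
    }

    /// Drive the net high, which clears the fault.
    pub fn gpio_set(&mut self) {
        self.level_high = true;
    }

    /// True while the fault is being reported up-stack.
    pub fn fault_asserted(&self) -> bool {
        !self.level_high
    }
}

/// Keep the fault asserted through every stage before `Serving`.
pub fn drive_for_stage(net: &mut FaultNet, stage: BootStage) {
    match stage {
        BootStage::EarlyInit | BootStage::SequencerStarting => net.gpio_reset(),
        BootStage::Serving => net.gpio_set(),
    }
}
```

With this shape, any failure before `Serving` leaves the net asserted without needing an explicit error path; the trade-off is that the net also reads as faulted during every normal boot until the sequencer comes up.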

hawkw · 2024-11-04T22:14:02Z

Currently system_init() handles such faults by panicking:

hubris/app/gimlet/src/main.rs

Line 128 in fe71189

assert_eq!(rev, expected_rev);

That seems reasonable. However, if we want to put code for asserting the fault pin there, we might change the function to return a Result (or similar) and set the fault net before panicking. It has also occurred to me that we may want to assert FAULT_L on any kernel panic, which in turn makes me wonder whether an app should be able to provide additional code for the kernel to run inside its panic handler.
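A rough sketch of the Result-returning variant, runnable on a host. The names here (`FaultPin`, `WrongBoard`, `check_board_rev`, `system_init_checked`, `RecordingPin`) are all made up for illustration, and the trait stands in for whatever GPIO access is actually available that early in boot; the real `system_init()` does not look like this.

```rust
/// Hypothetical stand-in for early GPIO access; not the real Hubris API.
pub trait FaultPin {
    /// Drive the active-low fault net low, reporting a fault up-stack.
    fn assert_fault(&mut self);
}

/// Error describing a board revision mismatch (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct WrongBoard {
    pub found: u8,
    pub expected: u8,
}

/// Pure check, replacing a bare `assert_eq!(rev, expected_rev)`.
pub fn check_board_rev(found: u8, expected: u8) -> Result<(), WrongBoard> {
    if found == expected {
        Ok(())
    } else {
        Err(WrongBoard { found, expected })
    }
}

/// Run the check and assert the fault net before reporting the failure,
/// so the caller can panic afterwards with the net already pulled low.
pub fn system_init_checked<P: FaultPin>(
    pin: &mut P,
    found: u8,
    expected: u8,
) -> Result<(), WrongBoard> {
    let result = check_board_rev(found, expected);
    if result.is_err() {
        pin.assert_fault();
    }
    result
}

/// Test double that records whether the fault was asserted.
#[derive(Default)]
pub struct RecordingPin {
    pub asserted: bool,
}

impl FaultPin for RecordingPin {
    fn assert_fault(&mut self) {
        self.asserted = true;
    }
}
```

The caller would still panic on `Err`, so behavior on a wrong board is unchanged apart from the fault net being asserted first. This only helps if the pin really is in the same place on every supported revision, as noted at the top of the issue.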

One way to do this could be to stuff a function pointer in a static for "extra code to do on panics", and have the kernel crate expose a function for setting it at the top of an app's entrypoint, but that feels a little messy. Another option could be #[linkage="extern_weak"], but this is unstable and, AFAICT, its eventual fate is unclear (see rust-lang/rust#29603)...
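For reference, the function-pointer-in-a-static idea could look roughly like the following. This is only a sketch: `PANIC_HOOK`, `set_panic_hook`, and `run_panic_hook` are hypothetical names and not anything the kernel crate exposes today, and `FAULT_ASSERTED` / `assert_fault_hook` stand in for actually driving the GPIO. It uses only `core`, and assumes the target supports atomic pointer loads and stores.

```rust
use core::sync::atomic::{AtomicBool, AtomicPtr, Ordering};

/// Hypothetical slot for "extra code to do on panics"; null means no hook.
static PANIC_HOOK: AtomicPtr<()> = AtomicPtr::new(core::ptr::null_mut());

/// Would be called once near the top of the app's entrypoint.
pub fn set_panic_hook(hook: fn()) {
    PANIC_HOOK.store(hook as *const () as *mut (), Ordering::Release);
}

/// Would be called from inside the kernel's panic handler.
pub fn run_panic_hook() {
    let raw = PANIC_HOOK.load(Ordering::Acquire);
    if !raw.is_null() {
        // SAFETY: the only non-null values ever stored in PANIC_HOOK come
        // from `set_panic_hook`, which only accepts a valid `fn()`.
        let hook: fn() = unsafe { core::mem::transmute::<*mut (), fn()>(raw) };
        hook();
    }
}

/// Stand-in for the real side effect (pulling the fault net low).
pub static FAULT_ASSERTED: AtomicBool = AtomicBool::new(false);

/// Example hook an app might register (hypothetical).
pub fn assert_fault_hook() {
    FAULT_ASSERTED.store(true, Ordering::SeqCst);
}
```

A hook registered this way has to be safe to run from the panic context, so it should probably be limited to a register write or two and must not panic itself.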

cbiffle mentioned this issue Mar 14, 2023

SP and RoT need to coordinate on firmware rollback decisions #1207

Open

cbiffle added this to the MVP milestone Apr 27, 2023
