Improved C3 watchdog support #47

Open
ThirteenFish opened this issue Jun 28, 2024 · 0 comments

It turns out at least one other person has thought about how to do hardware watchdogs on Linux, and there's a robust set of tooling surrounding them. We should make use of these tools, replacing our own implementation. As far as I can tell there are four parts to the next iteration: a kernel driver, userspace tooling, service-specific systemd configuration, and Oresat documentation/testing.

  • Kernel
    • There's a watchdog driver class. Since our watchdog is GPIO-driven, we probably want gpio-wdt.
    • This is probably set up through device tree? If so, we'll want to configure it there for the C3.
    • Is there a software/simulated driver that we can run as part of a VM for testing?
    • At least two driver-based watchdogs already exist on the C3: watchdog0 and watchdog1. What are they, and how can they be incorporated?
  • Userspace
    • systemd has PID 1 support for hardware watchdogs and a general scheme for system-wide watchdogs. See this blog post for an introduction and man systemd-system.conf for specific knobs to turn.
    • There's also userspace tooling like wdctl. Are there more tools?
    • watchdog, as mentioned in the blog post, is also an option. Do we want it? It monitors system health. Do we care about the things it monitors? Are there other system health indicators that we would care about?
    • Are there other userspace things or tools that I've missed?
  • Services
    • As mentioned in the blog post above, services have software watchdog support. See the manpages for systemd.service, systemd.exec, sd_watchdog_enabled and sd_event_set_watchdog (are there others too?).
    • Our mission-critical services should be covered by this. At minimum that means oresatd, uhf, and lband.
    • Since those are Python/Rust, are there sd_* bindings for those languages? Which libraries, and how would they be integrated?
    • Are there other services that should be covered by watchdogs?
  • Oresat
    • There are a lot of moving parts here, so it'd be great if there was an architecture guide and a user's guide on how all of this fits together.
      • On the architecture side: which pieces exist, how they fit together, and the rationale behind them.
      • On the user's guide side: what tools are available to configure, poke at, or disable the watchdog? What would I need to set up a new service covered by a watchdog? How would I verify that I succeeded?
    • Some kind of test environment or VM, probably using a software or simulated watchdog
    • Some kind of manual test plan or automated tests to verify that the watchdogs function the way we expect.
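
To make the Services questions above a bit more concrete, here is a rough sketch of what the service side of a systemd software watchdog could look like in Python. It does not use any sd_* binding; it speaks the notification protocol directly, relying on the NOTIFY_SOCKET, WATCHDOG_USEC and WATCHDOG_PID environment variables that systemd sets for a unit with WatchdogSec= configured. The function names (watchdog_interval_s, notify) are made up for illustration, and nothing here reflects how oresatd, uhf or lband are actually structured. Treat it as a starting point for discussion, not a decision on which library to use.

```python
import os
import socket


def watchdog_interval_s(environ=os.environ, pid=None):
    """Return a suggested ping interval in seconds, or None if no watchdog is requested.

    Follows the documented behaviour of sd_watchdog_enabled(): WATCHDOG_USEC
    holds the timeout in microseconds, and WATCHDOG_PID (when set) must match
    the process that is expected to send the pings.
    """
    usec = environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    wd_pid = environ.get("WATCHDOG_PID")
    if wd_pid and int(wd_pid) != (pid if pid is not None else os.getpid()):
        return None
    # The sd_watchdog_enabled docs recommend pinging at half the timeout.
    return int(usec) / 1_000_000 / 2


def notify(message, environ=os.environ):
    """Send one notification datagram; return False when NOTIFY_SOCKET is unset."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True
```

A service would call watchdog_interval_s() once at startup and then call notify("WATCHDOG=1") from its main loop at that interval, ideally only after checking that it is actually healthy. Because both functions take the environment as a parameter, they can be exercised in a test or VM without systemd by pointing NOTIFY_SOCKET at a throwaway Unix datagram socket.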
Projects
Status: Backlog