Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the light-client more robust to panics by using panic=unwind #519

Open
tomaka opened this issue May 4, 2023 · 8 comments
Open

Make the light-client more robust to panics by using panic=unwind #519

tomaka opened this issue May 4, 2023 · 8 comments

Comments

@tomaka
Copy link
Contributor

tomaka commented May 4, 2023

This isn't possible in Rust yet, but in principle we could use panic=unwind when compiling the wasm node. Then, the code could be tweaked in order to restart services if they die.

@tomaka tomaka added the blocked Progress on this issue requires something beyond our control label May 4, 2023
@tomaka
Copy link
Contributor Author

tomaka commented Aug 25, 2023

rust-lang/rust#111322 is included in Rust 1.72

@tomaka tomaka removed the blocked Progress on this issue requires something beyond our control label Aug 25, 2023
@tomaka
Copy link
Contributor Author

tomaka commented Aug 25, 2023

Unfortunately, using rust-lang/rust#111322 seems incompatible with using wasm32-wasi-threads for #91

Or maybe we could do threads manually instead of using wasm32-wasi-threads.

@tomaka tomaka changed the title Make the wasm-node more robust to panics by using panic=unwind Make the light-client more robust to panics by using panic=unwind Sep 3, 2023
@tomaka
Copy link
Contributor Author

tomaka commented Sep 3, 2023

Ignoring the wasm-node, I think it's a good idea to do this in situations where the light client is embedded in a program where unwinding is enabled.

One blocker is the design of the "networking events receivers". It's not possible right now to create a new receiver in case the sync/runtime/etc service dies and takes the receiver with it in its death.
And similarly if the networking service dies things are complicated and a simple receiver is maybe not the most appropriate way.

@tomaka
Copy link
Contributor Author

tomaka commented Sep 3, 2023

One issue with the latest comment is that it forces the UnwindSafe trait on Platform.
A bigger issue is (if I'm not mistaken) that async functions seem to generate futures that don't implement UnwindSafe.

@tomaka
Copy link
Contributor Author

tomaka commented Sep 9, 2023

The first thing to do would be to refactor the RuntimeService to have a clear separate between foreground and background (which is something I think should be done no matter what).

@tomaka
Copy link
Contributor Author

tomaka commented Nov 20, 2023

Here are the difficulties I've noted for each module:

  • Network service: lib.rs injects bootnodes after creating the service, so if the network service crashes and restarts it won't know of any bootnode. Also the changes conflict with Use only one network service for all chains #111.
  • Runtime service: several functions need to return Results when otherwise they always succeed
  • Transactions service: the channel that watches transactions has weird guarantees; wrap around another type to make things clearer
  • Sync service: serialize_chain_information might be problematic
  • JSON-RPC service: the whole design might need to be re-thought again in order to gracefully handle subscription tasks crashing

Another important point is that subscription IDs should be assigned by the frontend, in order to avoid race conditions.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 21, 2023

After #1376, I've tried to compile smoldot with the instructions given here: rust-lang/rust#111322

This increases the binary size (expectedly) from 2.78 MiB to 3.0 MiB
I guess the trade-off could be worth it? Not totally sure.

Also, this unfortunately requires defining the eh_personality language item, which I obviously can't do properly.

No matter what, as explained above, we still want smoldot-light-base to be able to recover in case of panic. This just wouldn't apply to the wasm node just yet.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 24, 2023

Services to make panic-resilient:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant