Make Rust Errors more idiomatic #1655
Comments
Hey, I would like to help implement this, but I'm new to Ockam. I see the error definition has an error code and domain string.
pub struct Error {
    code: u32,
    #[cfg(feature = "std")]
    domain: String,
}
What should those be when there is a |
@mcastorina thank you for picking this up 🙏 I think we need to take a deeper look at the design of our Errors so that they carry source, backtrace, cause etc. as they propagate within Ockam or surface to a user. We chose the code and domain approach because we'll need to expose them over FFI. But that shouldn't hurt the experience of Rust users. Rust Users should get source, backtrace, cause etc. |
I'm excited to contribute! :) Perhaps https://github.com/dtolnay/thiserror will be useful here. |
@mrinalwadhwa I'm not sure we should implement From<std::io::Error> for ockam::Error. That may lead to a worse error experience. E.g. in this case, when std::io::Error is thrown, from the function caller's side I would expect an error that says "Invalid forwarder address user input" or something like that, but if we implement a default conversion from std::io::Error, the caller will get something like "IO error", which is less informative. |
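To make the concern concrete, here is a minimal sketch contrasting a blanket conversion with an explicit mapping at the call site. `AppError`, `resolve_blanket`, and `resolve_explicit` are hypothetical names for illustration only; they are not part of Ockam, and the real `ockam::Error` looks different.

```rust
use std::fmt;
use std::net::{SocketAddr, ToSocketAddrs};

/// Hypothetical error type for illustration; not the real `ockam::Error`.
#[derive(Debug)]
pub enum AppError {
    /// What a blanket `From<std::io::Error>` conversion produces.
    Io(std::io::Error),
    /// A call-site-specific error that names the actual problem.
    InvalidForwarderAddress(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Io(e) => write!(f, "IO error: {}", e),
            AppError::InvalidForwarderAddress(input) => {
                write!(f, "invalid forwarder address user input: {:?}", input)
            }
        }
    }
}

// The blanket conversion under discussion.
impl From<std::io::Error> for AppError {
    fn from(e: std::io::Error) -> Self {
        AppError::Io(e)
    }
}

/// Relies on `?` and the blanket conversion: the caller only sees "IO error".
pub fn resolve_blanket(input: &str) -> Result<SocketAddr, AppError> {
    let mut addrs = input.to_socket_addrs()?;
    addrs
        .next()
        .ok_or_else(|| AppError::InvalidForwarderAddress(input.to_string()))
}

/// Maps the error explicitly: the caller learns which input was wrong.
pub fn resolve_explicit(input: &str) -> Result<SocketAddr, AppError> {
    input
        .to_socket_addrs()
        .map_err(|_| AppError::InvalidForwarderAddress(input.to_string()))?
        .next()
        .ok_or_else(|| AppError::InvalidForwarderAddress(input.to_string()))
}
```

With the blanket impl in place, nothing forces the author of `resolve_blanket` to think about what the failure means to the caller, which is the risk described above.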
As for extending Error information with stacktrace and additional data, that would be great, but I'm not very familiar with how that works and how that can or can't be compatible with our Actor framework. There's some research work to be done by someone |
@mrinalwadhwa In your example, you are writing an application so I don't think you should return Anyway, IMHO @mcastorina is right about the usefulness of |
Nice to see love for dtolnay/thiserror and dtolnay/anyhow! IMO they're excellent defaults when it comes to Rust error handling. That said, Ockam has some special needs. @mrinalwadhwa already mentioned the need to support FFI but our existing error handling strategy also has a number of important advantages for targeting embedded (
So the big question is, without losing these advantages, how can we get features like:
It would be nice to also support these on i.e. Neither thiserror nor anyhow [3] provide full In general, the situation is not great when it comes to idiomatic unified error-handling and
So maybe we should also be looking at using a collection of simple crates & code to implement our wish list individually? A couple that spring to mind are:
Can anyone suggest some more? Finally, Rust's error-handling story is starting to solidify but there are still a few big changes coming (e.g. RFC-2504) so it would be nice to not lock the library or dependent apps into anything that's going to be deprecated. [1] One change I would like to see here is to define [2] We can get [3] The anyhow crate does provide partial |
+1 to use
I agree with @antoinevg that the best thing for now is to wait. Given my recent experience (which you can read below), points 1 (backtraces) and 2 (u32 to strrepr) should be easier down the road if errors don't contain parameters. Finally, point 3 (human-readable error messages) should be feasible if the Here's my two cents on whether enum variants should contain parameters to build custom messages or not, and how this impacts the above three points. In the project I've been working on for the past two years (close to 100k lines) we use We follow a similar approach to what's done in the
pub struct Error {
    code: ErrorCode,
    friendly_message: String,
    internal_message: String,
}
where:
Then we have a bunch of
#[derive(thiserror::Error, Debug)]
pub enum CustomError {
    #[error("score below threshold [expected={}][actual={}]", _0, _1)]
    ScoreBelowThreshold(u32, u32),
    #[error("unknown error")]
    Unknown,
}
We really struggled for a while deciding how to deal with error handling. We were also new to Rust. And, honestly, we are not 100% happy with this approach, but it has proved to be pretty solid so far. Having a good error handling strategy can make a huge difference in the developer experience, that's for sure. The biggest disadvantage we've found: variants with parameters make debugging easier but are limiting if you want to use |
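For readers who have not used thiserror, the `CustomError` enum above behaves roughly like the hand-written version below. This is a std-only sketch of the observable behavior (the `Display` text and the `std::error::Error` impl); the code the derive macro actually generates differs in detail.

```rust
use std::fmt;

/// Same variants as the derive-based enum, written without any macro.
#[derive(Debug)]
pub enum CustomError {
    ScoreBelowThreshold(u32, u32),
    Unknown,
}

impl fmt::Display for CustomError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The tuple fields play the role of `_0` and `_1` in `#[error(...)]`.
            CustomError::ScoreBelowThreshold(expected, actual) => write!(
                f,
                "score below threshold [expected={}][actual={}]",
                expected, actual
            ),
            CustomError::Unknown => write!(f, "unknown error"),
        }
    }
}

// An empty impl is enough: `source()` defaults to `None`.
impl std::error::Error for CustomError {}
```

The parameters make the message useful for debugging, but they also mean a variant is no longer a plain constant, which is the trade-off the comment describes.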
I've been thinking about this a bit, and feel like I should write down my thoughts in case I'm getting something massively wrong about the I think a design like the It hopefully also would allow us to avoid allocation in many cases, see the internal Unfortunately, the approach you get with something like thiserror on an enum tends to be better if you don't have to ferry user errors through it. It also grows fairly large over time, and is hard to expose to the FFI, except manually. We could probably design our own proc macro for this (mostly for assigning error codes, which is otherwise tedious) and I do still consider that an option (I haven't made up my mind here), but... My experience is it's a bit of a headache, and in https://github.com/mozilla/application-services became a bit unwieldy. This approach feels sloppy and bad to use from user code unless you curtail that, which can be very difficult (although it's probably the cleanest and simplest to write, ironically). Also, direct use of All that said, one constraint I haven't fully worked through is serialization. In #967 the errors were reworked due to the need to be deserializable, and that... complicates the design somewhat, especially the desire to have static data. Hrm. Sorry that this is a bit disjointed, I felt like getting it down when I had the thoughts rather than waiting until morning (which may have been advisable). (This is slightly related to #1562, which is partially about FFI error ergonomics) |
Great article about https://matklad.github.io/2020/10/15/study-of-std-io-error.html |
Yep, it's a great and easily-overlooked example of Rust API design (and matklad's whole blog is top-notch, I highly recommend it). |
Having slept on it, I think there are two approaches here I hadn't really considered yesterday (and it's possible that the right solution will involve both in some capacity).
For example:
// This is a private inner type that isn't exposed outside of ockam_core.
enum Repr {
    Static { code: u32, domain: &'static str, ... },
    NonStatic { code: u32, domain: String, ... },
    // ...
}
When serializing A variant of this approach is to pack the I still have to work through my thoughts here on whether or not this actually buys us anything (aside from some perf), but it's an interesting consideration I hadn't thought of when making the writeup last night. (Sorry for all the messages, this is both one of my first issues here, and also something I think is critically important to having a good-feeling API, so I'd rather over-share my thought process than under-share it. It also serves to document some of the logic for future users) |
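A minimal sketch of how a private `Repr` like this could sit behind a public error type follows. The constructor and accessor names (`new_static`, `new_owned`, `code`, `domain`) are assumptions made for illustration and are not taken from ockam_core; the fields elided with `...` in the comment are left out rather than guessed.

```rust
/// Public error type; callers never see which representation is in use.
pub struct Error {
    repr: Repr,
}

/// Private inner type, mirroring the two variants sketched in the comment.
enum Repr {
    /// No allocation: the domain is a string literal baked into the binary.
    Static { code: u32, domain: &'static str },
    /// Owned data, e.g. for an error that was deserialized at runtime.
    NonStatic { code: u32, domain: String },
}

impl Error {
    /// Hypothetical constructor for the allocation-free case.
    pub const fn new_static(code: u32, domain: &'static str) -> Self {
        Error { repr: Repr::Static { code, domain } }
    }

    /// Hypothetical constructor for the owned case.
    pub fn new_owned(code: u32, domain: impl Into<String>) -> Self {
        Error { repr: Repr::NonStatic { code, domain: domain.into() } }
    }

    pub fn code(&self) -> u32 {
        match &self.repr {
            Repr::Static { code, .. } | Repr::NonStatic { code, .. } => *code,
        }
    }

    pub fn domain(&self) -> &str {
        match &self.repr {
            Repr::Static { domain, .. } => domain,
            Repr::NonStatic { domain, .. } => domain.as_str(),
        }
    }
}
```

Because both variants answer `code()` and `domain()` the same way, the choice between them stays an internal detail that can change without touching the public API.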
Multiple messages are great. Don't hold back .. your train of thought is insightful and I'm learning a lot. |
Some further thoughts on this came to me over the weekend. For one, I don't really know what the use of the error domain is, besides helping out the other side of the FFI, perhaps. The second takes some explanation: One hesitation I have with the In order for
// This impl is referred to later as (*1)
impl<E: std::error::Error> From<E> for MyError { ... snip ... }
Then, to allow users to interact with our errors nicely, we need to implement
// This impl is referred to later as (*2)
impl std::error::Error for MyError { ... snip ... }
Unfortunately, we can't. Consider the case above where we invoke (There are additional complexities too -- There are two solutions to this:
I've been leaning towards 1, part of this is that I don't know how we can ever have a But the downside here is that it's friction for anybody writing an That is, one way you could imagine this working is:
This still has a few issues. The main one being that now we might end up with I haven't gotten that far in experimenting here (it basically came to me over the weekend that this approach might solve more problems), but so far it seems promising. 1: I'm also not sure how to ensure users pick a good error code, though -- even as it is we probably have cases where the error code contains a I also have been imagining that maybe implementing a proc-macro that auto-assigns error domains / codes like 2: It's not as bad for |
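The conflict between (*1) and (*2) can be sketched with std alone. The standard library already provides a reflexive `impl<T> From<T> for T`, so if `MyError` implemented `std::error::Error`, the blanket impl (*1) would also cover `From<MyError> for MyError` and the compiler would reject the overlap (error E0119). The sketch below keeps (*1) and omits (*2), which is the same trade-off `anyhow::Error` makes; the `as_std` accessor and `parse_port` function are hypothetical names added for illustration.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical type-erased error; it does NOT implement `std::error::Error`.
#[derive(Debug)]
pub struct MyError {
    source: Box<dyn StdError + Send + Sync + 'static>,
}

// (*1): accept any standard error, so `?` works in functions returning `MyError`.
impl<E> From<E> for MyError
where
    E: StdError + Send + Sync + 'static,
{
    fn from(e: E) -> Self {
        MyError { source: Box::new(e) }
    }
}

// (*2) is deliberately absent. Adding
//     impl std::error::Error for MyError {}
// would make (*1) overlap with the reflexive `impl<T> From<T> for T`.

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.source)
    }
}

impl MyError {
    /// Expose the wrapped error without implementing the trait ourselves.
    pub fn as_std(&self) -> &(dyn StdError + Send + Sync + 'static) {
        &*self.source
    }
}

/// `ParseIntError` is converted into `MyError` through (*1).
pub fn parse_port(s: &str) -> Result<u16, MyError> {
    let port = s.parse::<u16>()?;
    Ok(port)
}
```

The cost is the friction mentioned in the comment: code that expects an `E: std::error::Error` cannot take `MyError` directly and has to go through something like `as_std()`.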
@thomcc I wonder what a backtrace looks like if the error happened inside a Worker or propagated to the Worker. When you debug Workers, the backtrace the debugger shows you is useless. |
I need to do some experimentation to answer that, but it's a good point. |
Over the weekend I've been revisiting the error design, which has been fairly fruitful -- I didn't really understand the issues or design constraints last time... and I think the proposals I made would not have helped. Or, they'd help a little, but would have left many of the same problems that exist in the current design [1]. (In my defense, it was my first task). Anyway, after a lot more understanding of ockam, how it's used, and its constraints, and a bunch of research... I think I have come to a good understanding about the use cases Note that... this is mostly ignoring things that I think are bigger issues that I intend to address in another write up (that I have in progress), like swallowing errors, serialization, and several more. Anyway, here's my current thinking about what
|
In #2566 @spacekookie and I overhauled the error API. While I believe that this is largely an improvement, it revealed a number of areas needing further refinement, and several aspects did not pan out as well as I had hoped:
|
I agree with most of your observations, except:
The problem I see with this approach is that we will return errors that don't mean anything to users. If you look at the breakdown from the ockam_node miro board, tons of errors are because of an internal sender problem. Throwing a tokio error at the user is extremely unergonomic, so we'll want to wrap the error with some amount of context at least. Overall, crate-local errors (or even module-local errors) should be considered part of the API, not just an afterthought. And especially not something where we should just propagate errors from layers that our users neither know nor care about
This is what the
some_tokio_func().await.map_err(|e| {
    Error::new(Origin::Node, Kind::Io, e)
        .context("reason", "some_tokio_func() is upset!")
})
(That said, I don't blame you for not realizing this -- the fact that it's key/value so that stuff like errors from workers could contain which worker type/instance caused the issue probably made this confusing) Or is there a reason that's not viable? |
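To illustrate the key/value context idea without pulling in ockam_core or tokio, here is a synchronous, std-only sketch. The `Error`, `Origin`, and `Kind` types below are simplified local stand-ins written for this example, not the real ockam_core definitions, and `get_context`, `some_io_func`, and `do_work` are hypothetical names.

```rust
use std::error::Error as StdError;
use std::fmt;

#[derive(Debug, Clone, Copy)]
pub enum Origin { Node }

#[derive(Debug, Clone, Copy)]
pub enum Kind { Io }

/// Stand-in error: origin, kind, the underlying cause, and key/value context.
#[derive(Debug)]
pub struct Error {
    origin: Origin,
    kind: Kind,
    cause: Box<dyn StdError + Send + Sync + 'static>,
    context: Vec<(String, String)>,
}

impl Error {
    pub fn new<E>(origin: Origin, kind: Kind, cause: E) -> Self
    where
        E: StdError + Send + Sync + 'static,
    {
        Error { origin, kind, cause: Box::new(cause), context: Vec::new() }
    }

    /// Builder-style: attach one key/value pair and return the error.
    pub fn context(mut self, key: &str, value: &str) -> Self {
        self.context.push((key.to_string(), value.to_string()));
        self
    }

    /// Hypothetical lookup helper for the attached context.
    pub fn get_context(&self, key: &str) -> Option<&str> {
        self.context
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.as_str())
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}/{:?}: {}", self.origin, self.kind, self.cause)?;
        for (key, value) in &self.context {
            write!(f, " [{}={}]", key, value)?;
        }
        Ok(())
    }
}

/// Stand-in for a lower-level call that fails with an I/O error.
fn some_io_func() -> std::io::Result<()> {
    Err(std::io::Error::new(std::io::ErrorKind::Other, "channel closed"))
}

/// Wraps the low-level error and says why it matters at this layer.
pub fn do_work() -> Result<(), Error> {
    some_io_func().map_err(|e| {
        Error::new(Origin::Node, Kind::Io, e)
            .context("reason", "some_io_func() is upset!")
    })
}
```

The underlying cause is kept rather than dropped, so a user sees both the layer-specific explanation and the original failure.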
I think I need to play around with the attached context API a bit more. It's certainly doable. I still think that having a per-API error facade to abstract the most common problems is a good design decision. |
I partially agree. I think you should have the underlying error available in some way -- dropping it on the floor because it seems like an implementation detail is not useful behavior for a library. And I think a lot of them are potentially relevant to users of the API. For example, serde errors:
(The use of BARE makes everything more annoying here since it has basically no ability to provide even remotely useful info...) I also think tokio errors are a weird case. Tokio errors have a few flavors:
A lot of these I feel like we are in the habit of just sending to without thought. The very vague hope was that the fact that there's easy
Ah, yeah, I think this is maybe what I mean by needing improved usage examples. It's probably pretty unclear that that's what that part is for. I had planned on integrating it myself, which would help there, but... well, yeah (thank you so much for taking it over though, seriously). That said, I think this part of the API does have some real wonkiness:
Hmm... I don't really know what you mean by a failure journal.
Yeah I mean, the need to go through traits really limits flexibility here. Same reason |
So I'm not against this, for sure. In the direct user-facing API-surface there are very likely pieces where it doesn't really make sense to use something type-erased like In the internals I'm unsure. I think we need to be considered about error handling there, and the errors aren't API-facing. If there's specific cases that need to be handled separately it makes sense though... (Hm, I guess I see how something like the I'd... really like to try and avoid making it too easy/automatic to ignore errors (at least in some of the code). |
Ah, okay. I'm pretty sure I have a solution to how to make the design not require different trait bounds on
This allows the function signatures to work without differences between Also, it does turn out that nothing needs error roundtrip serialization (when the initial work was done, it was still needed, but it has been removed since then). I had a quick meeting with @spacekookie this morning about this and it also seems that supervisors will not need it either (they need serialization of some data, but have no requirement for deserialization). This enables... major cleanups to the internals and the model. Actually, almost all the complexity in the code (and much of the complexity in the model) came from how painful serialization of some of this data is. Footnotes
|