Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blazor "Error boundaries" design proposal #30940

Closed
SteveSandersonMS opened this issue Mar 15, 2021 · 2 comments
Closed

Blazor "Error boundaries" design proposal #30940

SteveSandersonMS opened this issue Mar 15, 2021 · 2 comments
Assignees
Labels
area-blazor Includes: Blazor, Razor Components design-proposal This issue represents a design proposal for a different issue, linked in the description Done This issue has been fixed
Milestone

Comments

@SteveSandersonMS
Copy link
Member

SteveSandersonMS commented Mar 15, 2021

Summary

Provide a mechanism to catch and handle exceptions thrown within a particular UI subtree.

Motivation and goals

Currently, we say that unhandled exceptions from components are fatal, because otherwise the circuit is left in an undefined state which could lead to stability or security problems in Blazor Server. There is a mechanism for displaying a terminal error UI to the end user, on by default in the template.

However many developers regard this as inadequate, leading to many requests for "global exception handling". The main point is that it's unrealistic to guarantee that no component throws from a lifecycle method or rendering, especially given large developer teams and use of 3rd-party component libraries. When inevitable unhandled exceptions occur, people are unsatisfied with:

  • The UX being limited to "show an error and ask to reload, losing state". People want a more graceful form of error handling with as much recovery potential as possible.
  • The limitation that all error UI logic has to be in JavaScript (because we refuse to run any more .NET code after the unhandled exception)

Other SPA frameworks support more granular mechanisms for global and semi-global error handling. We would like to do something inspired by the error boundaries feature in React.

Goals:

  • Allow users to provide fallback UIs for select component subtrees in the event that there is an exception there
  • Allow users to build apps with cleaner fallback experiences in the event of unexpected errors
  • Give users more fine-grained control over how failures are handled in the app

Non-goals:

  • Changing what it means to do global exception handling. We already have ILogger for logging, and beyond that, truly unhandled exceptions mean the app is in an undefined state which is unsafe to continue.

In scope

Defining one or more "error boundary" regions within .razor code that handle exceptions from their child content. Each error boundary:

  • ... may be nested
  • ... can optionally define custom UI that will be shown if an error occurs. If not specified, we have a default UI. This needs to vary across rendering platforms (e.g., only web rendering can use HTML elements), and should be locale-independent.
  • ... can support recovery. That is, when they catch an error, it's not fatal to the circuit. The user could navigate somewhere else and continue, or the error boundary itself could even switch back into a non-errored state and render new child content.
  • ... log errors in a sensible way by default. Behaviors will vary across hosting models. For example on Blazor Server, always log detailed errors to the server ILogger, plus send DetailedErrors-respecting info to the client console.

Error boundaries are not expected to catch all possible exceptions. We're trying to cover ~90% of the most common scenarios, such as rendering and lifecycle methods. We are only trying to catch errors that we know for sure are recoverable (i.e., can't corrupt framework state). See the detailed design below for more info. In any other case, the existing mechanisms for terminating the circuit and showing the terminal error UI will continue to apply.

Out of scope

  • Anything that creates new semantics for what kinds of errors are catchable/recoverable, or new application states. See the detailed design below for info on how to reason about this.
  • Catching errors from outside the known, predefined, recoverable places. See detailed design.
  • A mechanism for exceptions to bubble up to ancestor error boundaries. We don't need to recreate concepts like exception filters or rethrowing. It's enough to catch at the closest boundary.

Risks / unknowns

Risk: Developers might think that all possible exceptions will be caught and become recoverable. This doesn't put server stability at risk, but just means the developer will be surprised when the terminal error UI still can appear.

Risk: Developers might disagree with the design about which specific error types should be recoverable.

Risk: Developers might think they don't need to customize terminal error UI any more. It may be confusing that there are multiple different error UIs to customize (one per error boundary, and a final terminal one).

Risk: If errors are recoverable, unwise use of this feature could lead to an infinite error loop. For example, an error boundary might immediately try to re-render the same child content after an error, which immediately (or asynchronously, after some async I/O) throws again.

  • Update: As a mitigation, the error boundary base component will attempt to detect infinite error loops by counting the number of errors. If it exceeds a reasonable number (which is configurable) then we'll treat it as fatal and stop.

Unknown: What happens if the error boundary itself has an unhandled error? See detailed design for a proposal. The risk here is that it's nonobvious.

Risk: If a large number of errors occur all at once within different error boundaries (e.g., if there's a separate boundary around each row in a grid, and every row throws), the log may be swamped with errors.

Risk: If we have to put in new try/catch blocks within the low-level rendering internals (and we will), this might adversely affect perf when running on WebAssembly, perhaps especially in AoT mode, because (IIRC) exceptions are implemented by calling out to JavaScript which then calls back into .NET. The cost would likely be paid by everyone, whether or not they are using error boundaries. We'll have to measure this.

Unknown: What are we putting in the default template? Should there be any error boundaries?

  • Update: As per below, the templates will only include default CSS for error boundaries, but not error boundary components themselves. Developers need to take deliberate action to opt into allowing recovery from errors in specific places.

Risk: If there is an error boundary in the default template, or as soon as a developer adds one to their site, they are implicitly promising that any code they call from their lifecycle methods can tolerate unhandled exceptions. For example, any circuit-scoped model state no longer gets discarded at the first unhandled exception. What if it gets into a broken/illegal state and the circuit is still trying to use it? Developers need to understand what they are opting into. This is the same as with any state that spans multiple requests in MVC.

Risk: Naive developers might routinely mark all their components as being an error boundary and simply ignore exceptions. Again, this doesn't put framework internal stability at risk, but means the application will just not do anything sensible to notify about or log exceptions. Maybe we should put in some kind of roadblock to make error boundary components a special thing, not just a behavior you can trivially mix into any other component - not sure how though.

  • Mitigation: Rather than letting just any component identify itself as an error boundary, the detailed design below proposes a mechanism for ensuring that error boundary components are separate things (but can still be subclassed for different rendering platforms).

Risk: Whatever default error UI we bake in, developers might take a dependency on its exact appearance and behaviors (e.g., resizing to fit content or not) forever. Will we ever be able to change the default error UI? Is it enough to document that the error UI should be customized and if you don't, you accept that it might just change in new framework versions? See detailed design for proposal.

  • Mitigation: The default error UI will be an empty <div>, with style/contents provided purely via CSS in the template, so there's no back-compatibility concern.

Examples

Page-level error boundary in MainLayout.razor:

<div class="main">
    <div class="content px-4">
        <ErrorBoundary @ref="errorBoundary">
            @Body
        </ErrorBoundary>
    </div>
</div>

@code {
    ErrorBoundary errorBoundary;

    protected override void OnParametersSet()
    {
        // On each page navigation, reset any error state
        errorBoundary?.Recover();
    }
}

Error boundary around a component that sometimes fails due to bad data:

@foreach (var item in SomeItems)
{
    <div class="item">
        <ErrorBoundary @key="item">
            <ChildContent>
                <MyItemDisplay Item="@item" />
            </ChildContent>
            <ErrorContent>
                <p class="my-error">Sorry, we can't display @item.ItemId due to a technical error</p>
            </ErrorContext>
        </ErrorBoundary>
    </div>
}
@SteveSandersonMS SteveSandersonMS added the design-proposal This issue represents a design proposal for a different issue, linked in the description label Mar 15, 2021
@SteveSandersonMS SteveSandersonMS added this to the 6.0-preview3 milestone Mar 15, 2021
@SteveSandersonMS SteveSandersonMS self-assigned this Mar 15, 2021
@SteveSandersonMS SteveSandersonMS added the area-blazor Includes: Blazor, Razor Components label Mar 15, 2021
@SteveSandersonMS
Copy link
Member Author

SteveSandersonMS commented Mar 15, 2021

Detailed design

How do we decide what kinds of errors should be recoverable? How do we avoid the risk that this feature creates new ways to corrupt framework state and introduce stability/security issues?

I propose we think about it as follows: we should only do things that are equivalent to what you could already have done in user code. There's an error-handling pattern you could already have done; it's just really inconvenient. I propose we take that pattern and automate it. Then it doesn't create any entirely new security/stability risks.

An existing error handling pattern

Imagine if a developer did the following:

  1. Define a component, <ErrorBoundary>, which takes child content. It uses a <CascadingValue> to advertise itself to descendants.
  2. In every descendant, you receive the ErrorBoundary as a cascaded parameter. And, in every lifecycle method and event handler, you wrap all your code in a try/catch. If there's an exception, you simply call some method HandleError(exception) on the ErrorBoundary and return.
  3. The ErrorBoundary ancestor receives the HandleError call and responds by re-rendering itself. This time instead of rendering ChildContent, it renders some other ErrorContent.

What about exceptions during rendering? This is debatable, but technically nothing stops you today from doing a try/catch around the rendering logic. If there's an exception, as well as calling HandleError, you could call __builder.Clear() so the builder is left in a valid state. So that's technically handleable today already.

That's pretty much it. These exceptions are no longer fatal, because the framework never sees them. The lifecycle methods and event handlers never throw. Framework internal state is not at risk in any way.

Automating this error handling pattern

Of course the primary drawback to this pattern is that it's ridiculously inconvenient to put try/catch around every single method, and there would be perf drawbacks to passing around cascaded parameters to every single component. But if we bake this pattern into Renderer, then neither the try/catch blocks nor the cascading parameters would be needed.

The key principle, then is:

  • In specific places where the framework calls user code, we should add a try/catch around the call into user code. And only around the single line of that call - not around any other framework code. This ensures it's equivalent to the developer putting a try/catch around their own code, and doesn't risk leaving any framework state in an incomplete place.
  • We only do this in places where it's OK if the user code doesn't do anything. For example, the framework doesn't care whether an event handler, lifecycle method, or rendering method does something or not. However the framework does require IServiceProvider.GetService to return something, and it does require IComponentActivator to return a valid component, so exceptions from those places aren't applicable to route to an error boundary.

What's in and what's out

We will support catching and recovery from exceptions from:

  • Lifecycle methods (OnInitialized[Async], OnParametersSet[Async], and OnAfterRenderAsync)
  • Event handlers
  • Rendering (BuildRenderTree)

Although the above is phrased in terms of ComponentBase APIs, the renderer really does it in terms of IComponent/IHandleEvent, and IHandleAfterRender, so it works with any IComponent.

We will not support catching and recovery of exceptions from:

  • Component constructors, IComponentActivator, or IComponent.Attach.
    • Reason 1: these shouldn't be recoverable because what state would we leave the parent in? This would violate the principle of not creating any new application states.
    • Reason 2: we don't have to! If we just let this bubble up to be an exception from the parent's rendering process, it can be caught and handled there.
  • Component disposal (Dispose/DisposeAsync)
    • Reason: although we could make these recoverable, it's too dangerous. If disposal logic doesn't complete, it very likely means the component is now leaking memory or unmanaged resources. This is too serious to suppress, so developers should manually use try/catch inside disposal logic if it's actually recoverable.
  • Event handlers that are arbitrary delegates. We only handle the ones where it goes through IHandleEvents and the target is an IComponent. Otherwise, we have no idea which component to blame.
  • InvokeAsync
    • Reason: exceptions from these are already nonfatal - we just return them to the caller. If in the future we create a new void-returning API like InvokeVoidAsync that does its own error handling, then we would catch those exceptions in the enclosing error boundary.
  • .NET-to-JS interop calls
    • Reason: exceptions from these are already nonfatal - we just return them to the caller. Also they aren't necessarily in the context of any particular component.
  • Any other code outside the component rendering system.
    • Reason 1: Blazor doesn't know about other exceptions, e.g., in response to a System.Timer.
    • Reason 2: How would we know which component to blame for the exception, hence which error boundary to trigger?

Error boundary components

There will need to be different error boundary components for different hosting models. This is because the default error UI has to vary (you can only use HTML elements in web rendering; you'd do something else in MBB). So there has to be some public API you can implement/subclass to define an error boundary component.

However, we don't want just any component to be able to mark itself as an error boundary, as this would likely lead to antipatterns like marking all components as error boundaries (in an attempt to create On-Error-Resume-Next-style semantics). Also at least at initially, we should impose some clear behavioral rules on what error boundaries do when there is an error.

So, I propose the following:

  • Renderer knows only about some new internal interface IErrorBoundary
  • In Microsoft.AspNetCore.Components, there's a standard base class public abstract class ErrorBoundaryBase : IErrorBoundary, IComponent. This contains the common error handling semantics that we'd expect to be the same across all rendering platforms. For example, on error it logs errors, and shows some ErrorContent or default error UI.
  • In Microsoft.AspNetcore.Components.Web, there's the actual component developers consume, public class ErrorBoundary : ErrorBoundaryBase which overrides abstract base class methods to:
    • Define the default error UI in terms of HTML elements
    • Define default error logging behaviors. See below for how the behaviors differ between WebAssembly/WebView/Server/Prerendering.

More explicitly, these types look like:

// ---------------------------------------------------------
// In Microsoft.AspNetCore.Components

namespace Microsoft.AspNetCore.Components
{
    internal interface IErrorBoundary
    {
        void HandleException(Exception exception);
    }

    public abstract class ErrorBoundaryBase : ComponentBase, IErrorBoundary
    {
        // ... implement stuff ...
        public void Recover() {  }
        protected abstract Task OnErrorAsync(Exception exception);
    }
}

// ---------------------------------------------------------
// In Microsoft.AspNetCore.Components.Web

namespace Microsoft.AspNetCore.Components.Web
{
    // This is what you actually use if you're doing web-based rendering.
    // Mobile Blazor Bindings would have its own equivalent to this that uses native UI elements for the default error UI
    public class ErrorBoundary : ErrorBoundaryBase { ... }
}

Developers can create a custom subclass of ErrorBoundary to define an alternate default error UI. Then will need to recreate the basic logic for showing either the ChildContent or the error UI, e.g.:

@inherits ErrorBoundary

@if (CurrentException is null)
{
    @ChildContent
}
else if (ErrorContent is not null)
{
    @ErrorContent(CurrentException)
}
else
{
    <div class="my-custom-error-ui">Oh no, there was a problem!</div>
}

Custom error boundary classes could include logic like a "retry" button (which calls the base class Recover method) or custom logging logic.

Alternative considered: I have considered having IErrorBoundary be public. Maybe we'll do that eventually. But I'm not keen on taking a gamble that there's no need to impose any standard expectations on what error boundaries should do in response to an error, as it could easily lead to all errors being suppressed and inefficient things like every single component also being an error boundary. The ErrorBoundaryBase is where the important infinite-error-loop detection logic lives, so it's helpful that this will always be in effect.

Semantics

Many aspects of the following are quite subtle and still open for debate.

How do we clean up the failed subtree?

We could rely on the IErrorBoundary component tearing down its child content after an error, thereby triggering all the usual disposal and cleanup logic (invoking Dispose on all components in the subtree, unregistering all their event handlers, notifying the JS client to unregister event listeners, etc.). This is something that ErrorBoundaryBase will do.

However if an IErrorBoundary fails to do that, we'd create a strange situation where "failed" components continue to live and receive events. This shouldn't put the framework state at any risk (as per the principles above) but is quite unexpected. The worst problem with it, I think, is that it would prevent us from ever making this stricter, as people may depend on failed components continuing to operate.

To avoid the risk that one day, we try to make IErrorBoundary more open but don't realise we need to guarantee the subtree must be disposed on error, I propose that when the Renderer will notify an error boundary about an error, it first forcibly discards that entire subtree (whether the IErrorBoundary wants to or not). This will ensure all the cleanup logic occurs in that subtree, independently of whatever content the error boundary will render next.

If the IErrorBoundary tries to re-render its existing ChildContent, that will be fine - it just means you now have an entirely new copy of those components.

Recovery from errors

The default ErrorBoundaryBase component will let the developer call a Recover method which makes it switch back to render its ChildContent instead of any error content. This is what makes possible the following simple and intuitive per-page error handling system:

<ErrorBoundary @ref="errorBoundary">
    @Body
</ErrorBoundary>

@code {
    ErrorBoundary errorBoundary;

    protected override void OnParametersSetAsync()
    {
        errorBoundary?.Recover();
    }
}

The end user can then click on a different page in a nav menu and carry on with their circuit. They can even come back to the same page that previously failed, and will get a fresh copy of it.

What happens if the error boundary itself fails? Can we get infinite error loops?

An IErrorBoundary needs to be able to handle errors that occur directly within its own ChildContent fragment, not just in further-down descendants. For example, you might have:

<ErrorBoundary>Some markup that might throw a nullref</ErrorBoundary>

Do you expect the error boundary to catch the nullref? Of course you do. So we can't have the rule that error boundaries only catch things from their descendants. Error boundaries themselves must continue to operate after an error in their own direct content. That's yet another reason why error boundaries should be special things and not just any old component, since we don't want just any old component to continue operating after it has an exception in its own rendering.

Does this create a risk of infinite error loops? Why yes it does. If ErrorContent throws during rendering, you'll keep re-rendering it. We can't know whether an exception comes from something in ErrorContent (versus ChildContent), because exceptions can occur after asynchronous lifecycle methods or event handlers, and hence might arrive after we've already switched into or out of an error state.

The best way to mitigate this, I think, is a simple "error counting" mechanism for detecting infinite error loops. Hopefully most developers will never encounter this, but if more than N (default = 100, but configurable) exceptions are received in between recoveries, the error boundary will treat the new error as fatal to the circuit. This is helpful because:

  1. It makes it easy to spot basic cases of infinite error loops
  2. It exposes the case where some child component could continue to trigger errors forever, e.g., in response to a timer. This would be super inefficient, so we really do want the developer to notice this and do something about it.

What gets logged, to where, and when?

For web-based rendering, I think this needs to vary across hosting models:

  • WebAssembly and WebView:
    • Immediately on error, we log detailed error info to the console.
    • Option: If the user clicks the default error content, should we log the same info again, so the user can figure out which failure is which (if there were multiple failures at once)? Maybe we don't do that, but developers who really want that behavior can put it in their own ErrorContent or ErrorComponentBase.
  • Server:
    • Immediately on error, we log detailed error info to the server-side ILogger, and DetailedErrors-respecting info to the client-side console
    • Option: If we're supporting the "click-to-relog" thing (see above), then we'd only relog to the browser console (via JS interop, regardless of Server/WebAssembly), because we don't want to spam server logs and confuse an incident investigator into thinking the error happened more than once.
  • Prerendering:
    • Immediately on error, we log detailed error info to the server-side ILogger. There is no client we can log to.
    • The user cannot click anything during prerendering.

I propose a new DI service to supply these per-hosting-model behaviors:

namespace Microsoft.AspNetCore.Components.Web
{
    public interface IErrorBoundaryLogger
    {
        ValueTask LogErrorAsync(Exception exception);
    }
}

Notice that this is specific to Microsoft.AspNetCore.Components.Web. It's not a concept in the core at all.

Default project template

After some consideration, I suggest we do not include any ErrorBoundary in the default template. This is because:

  • Developers may not realize that, if there is one, it means their per-circuit state has to be able to tolerate the user continuing after an exception. Developers should only opt into this where it makes sense for their application.
  • It may be annoying during development for every unhandled exception to remove all the UI from the screen and display some "oh noes" content instead. The developer may not want that as a global behavior across the whole app.
  • It will be nonobvious to a newcomer that there are two different error UIs to customize.

More generally, error boundaries don't make applications more robust. They just make some errors recoverable. The existing default behavior of forcing a circuit to restart / page to reload on exception is already a good behavior that's easy to reason about.

I'd be in favour of documenting how you can do per-page error boundaries as a trivial tweak in MainLayout.razor, but we should include cautions about what this entails about the user being able to continue after any lifecycle exception anywhere.

Errors after disposal

This is subtle! Since exceptions can occur asynchronously in lifecycle methods or event handlers, they might not happen until after the component (or even its enclosing error boundary) has been removed from the UI and disposed. How should we handle the errors then?

  1. We could say that there's no longer an error boundary to route them to, so they are fatal.
    • Bad choice! This means that a normally-handleable error suddenly becomes fatal to the circuit just because the user happened to navigate to a different page while some operation was in flight. This could be really common for anything like HttpClient requests that time out.
  2. We could say that since the component is gone, we don't care about its errors any more, so just ignore them.
    • Also, bad choice! Async errors are currently fatal (i.e., before this feature). If we just started ignoring them, we'd be significantly loosening the whole system by default, even for people who are not using error boundaries.
  3. We could try to behave as if the component/errorboundary had not been disposed.

I think (3) is the only good choice. This is tricky to implement because we have to keep track of more info inside the Renderer than we were doing before. Using only the IComponent Receiver on an event, we have to be able to walk its original ancestry even after disposal. It is possible to do this, fortunately. The result is:

  • If the component (but not the error boundary) is already disposed, its async exceptions trigger the error boundary into an error state in the same way as if the component had not been disposed. So, the error is logged as usual and the user sees an error state. If the developer really wants to totally suppress this, they are going to have to write some code (e.g., try/catch, or an extra layer of error boundary).
  • If the enclosing error boundary is already disposed too, then we still call the OnErrorAsync on it so it can log as usual, but the fact that it tries to switch into an error UI state is irrelevant and has no visible effect, just like any other component trying to render after its was removed from the component hierarchy. This very likely matches what the developer expects and wants.

Default error content

We don't want to bake in any human-readable default error UI because:

  • We can't know whether or not the developer wants to include a stack trace. Even when DetailedErrors is true, you still don't want to show technical details in the UI in production (e.g., on WebAssembly). Developers who really want to see stack traces in development should probably use some logic like #if DEBUG inside their ErrorContent or a custom error boundary subclass - we can't do that for them.
  • It would need to make sense in every language

So instead, the default error UI is simply: <div class="blazor-error-boundary"></div>. The default project template uses CSS to add sensible colors, text, and even a warning icon:

image

Since this is purely in CSS in user code, the framework isn't responsible for the language or any back-compatibility across versions.

@SteveSandersonMS SteveSandersonMS changed the title Error boundaries design proposal Blazor Error boundaries design proposal Mar 17, 2021
@SteveSandersonMS SteveSandersonMS changed the title Blazor Error boundaries design proposal Blazor "Error boundaries" design proposal Mar 17, 2021
@HaoK
Copy link
Member

HaoK commented Mar 17, 2021

Looks great overall, I wonder if there could be a more descriptive name other than AutoReset="true". Its not clear what is automatically getting reset or to what state, not suggesting this as a name, but its going to logically be doing something like ShowChildContentOnRecovery? And what happens if AutoReset is false (and why would someone want to do this, would they have to manually chose what to rerender?)

@ghost ghost added Done This issue has been fixed and removed Working labels Mar 22, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-blazor Includes: Blazor, Razor Components design-proposal This issue represents a design proposal for a different issue, linked in the description Done This issue has been fixed
Projects
None yet
Development

No branches or pull requests

3 participants