-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.NET 5.0 | Implement SqlException.IsTransient #649
Comments
Please consider adding some of these that EF Core's SqlServerTransientExceptionDetector does not recognize:
Also, if SqlClient does not recognize the specific error code, perhaps it can still treat some of the Database Engine Error Severities as transient; like 13 (deadlock) or 17 (out of resources). |
EF Core's SqlServerTransientExceptionDetector returns |
@KalleOlaviNiemitalo that's great, thanks for this info! Your suggestion below also makes sense to me - if there's any non-transient error in the SqlException, that means there's no use to retry since that error will happen again, so we may as well return false from IsTransient. /cc @ajcvickers for the error codes missing in the EF Core SQL Server provider. |
To be clear, I do not suggest returning For example, if an SqlException contains the following two errors, then SqlException.IsTransient would be
|
I tend to see it the other way - if there's a chance that an error could be indicating a transient problem, then that's a good reason for surfacing it as transient. From this discussion on MySQL:
See also the note in the IsTransient proposal about the property being "optimistic". So I'd say, if there's a single error in the exception which is known to not be transient by nature, the entire exception should report IsTransient=false; otherwise, it should return true. A pragmatic approach may make sense here - we should keep in mind that the use of this is to trigger retrying by an upper layer. |
In that case, SqlException won't need a list of transient error numbers at all, as it will only recognize errors that are "known to not be transient by nature". |
At least for PostgreSQL (which I maintain), the (vast) majority of errors are still non-transient, so it seems to make sense to have a list for errors that are known to be (possibly) transient. That may not be the case for SQL Server, though I'd be surprised. |
@AndriySvyryd has traditionally been the steward of which codes we treat as transient. |
I agree with this too! In the initial phase, the list of transient errors that the driver would recognize as "transient" is driven by Azure's documented error numbers: Transient Fault Error Codes as currently SqlClient retries on. The error messages documented above are good recommendations, but as client drivers work close to SQL Server/Azure Standards, only errors listed "transient" by server are handled implicitly in SqlClient. ORM frameworks like EF Core posses the flexibility to implement their own retry policy based on experiences and application errors, as it's an external extender. We will take them into discussion when we work on this activity internally in order to provide best experience to customers, but we also would need to abide by driver policies to reflect server-side transient policies. |
It doesn't sound like there is consensus on what a transient exception is and if that's the case adding the feature without a way to configure it sounds like setting ourselves up for years of issues of people saying one thing or another should be transient. To avoid that could we add something at a process level (so at provider level in the api) which lets users add whatever we end up needing to determine if an error is transient? If other providers end up needing the same functionality the implementations could be promoted to a common api in .net 6. |
I want to sidestep the question of what should be considered a transient error for now to focus on the biggest advantage of this API. In the past SQL Azure has changed the list of transient errors requiring changes to the application, EF Core or SqlClient to change the retry policy accordingly. If SqlClient implements A related useful API would be a |
But if we delegate to the server to decide if something is transient how do we know if inability to connect to the server is transient... 😀 |
I think that's giving up a bit quickly here; this is definitely something we need to discuss thoroughly in order to arrive at a good definition, but I think that's certainly possible. I don't believe transience is a concept that really varies that much, and pragmatically speaking I think most applications are going to have the same needs around this. EF Core's transient error detector could be considered a pretty good proof of this; the built-in retrying strategy identifies the codes it identifies, and we've rarely had users arguing for a different concept of transience. What we have seen, as @AndriySvyryd wrote, is changes in actual error codes. Note also that the EF Core detector has been copied and pasted for use outside of EF Core (e.g. https://gist.github.com/hyrmn/ce124e9b1f50dbf9d241390ebc8f6df3#file-sqlservertransientexceptiondetector-cs), which is exactly why I think this kind of knowledge should go into SqlClient.
The whole point of this API is for the DB provider to report to an upper layer (Polly, EF Core, LLBLGen Pro) what it considers to be transient for the general, common case. If an application wants to customize which errors are considered transient, it can simply do that at that upper layer, i.e. defining a Polly/EF Core retrying strategy that does whatever the user wants. In other words, I don't think it makes any sense for users to call a SqlClient API which changes what SqlClient reports in IsTransient, when that IsTransient API is itself called by the user (via Polly/EF Core). If you want to control what's considered transient and what isn't, that's totally fine - you can just customize your Polly/EF Core policy and leave SqlClient out of it. I also very much agree with @AndriySvyryd on the problem of changing error codes in SQL Azure. |
Ok, |
I just want to add it is very frustrating to have an exception saying "An exception has been raised that is likely due to a transient failure.", a documented |
Have you considered trying to add the feature? the codebase is in the repository. |
My concern, based on the status of the current PR, is that it's going to be a lot harder to maintain this list in SqlClient than in EF Core :( |
What about false positive results? This SQL also produces error code 121, but it's a syntax error:
I've seen many implementations which contains error code 121, but no one cares about it. I think it also affects error codes 64, 121, 233, 10053, 10054 and 10060 Checking the inner Win32Exception can be a solution, but also not perfect in the case of multiple errors. |
dotnet/runtime#34817 introduces DbException.IsTransient for .NET 5.0, database-agnostic transient error detection. Note that this is orthogonal to SqlClient's own resilient connection mechanism - IsTransient is only about surfacing error transience to external retry strategies/policies such as Polly.
Note that EF Core's SQL Server provider has a transient exception detector, this could be a starting point for implementing this within SqlClient. The goal would eventually be for EF Core to rely on the new IsTransient property rather than contain specific knowledge on SqlClient error codes.
The text was updated successfully, but these errors were encountered: