fix(DataStore): retry initial sync network failures from RemoteSyncEngine #1773
a230ac3 to 9161e3f
@@ -158,10 +158,16 @@ final class AWSInitialSyncOrchestrator: InitialSyncOrchestrator {
            return .successfulVoid
        }

        var underlyingError: Error?
        if syncErrors.contains(where: isNetworkError) {
            underlyingError = getFirstUnderlyingNetworkError(errors: syncErrors)
Are there any potential side effects of using the first network error? (i.e., can different network errors have different retry policies?)
nit: can we use .first(where predicate:) to check whether syncErrors has any network error and grab the first one in a single operation?
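A minimal sketch of the suggestion above, assuming a hypothetical SyncFailure enum in place of the real DataStoreError. It contrasts the two-pass contains(where:) plus lookup with a single first(where:) call; the names SyncFailure and isNetworkFailure are illustrative only.

```swift
// Hypothetical stand-in for the errors collected during initial sync.
enum SyncFailure: Equatable {
    case network(code: Int)
    case unauthorized
}

func isNetworkFailure(_ failure: SyncFailure) -> Bool {
    if case .network = failure { return true }
    return false
}

let failures: [SyncFailure] = [.unauthorized, .network(code: -1009), .network(code: -1001)]

// Two passes: one to test for a match, a second to locate it.
var twoPass: SyncFailure?
if failures.contains(where: isNetworkFailure) {
    for failure in failures where isNetworkFailure(failure) {
        twoPass = failure
        break
    }
}

// One pass: first(where:) returns nil when nothing matches,
// so the optional binding doubles as the existence check.
let onePass = failures.first(where: isNetworkFailure)
```

Both forms pick the same element; the single call simply avoids walking the array twice.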
Currently, it is not ideal for RemoteSyncEngine to contain URLError-based retry logic, which is why we are forced to propagate one of the network errors up to the RemoteSyncEngine. Ideally, InitialSyncOperation should be responsible for checking the network error, determining that the error is retryable, and retrying with exponential back-off. This means it would cycle on the single InitialSyncOperation trying to sync that one model. Once the network is back up, the operation would finish and move on to the next InitialSyncOperation.
This code currently assumes that the first network error is probably similar to the other network errors if they occurred at approximately the same time. Of course this assumption could be wrong, but the side effect of it being wrong, and of picking up a network error that RemoteSyncEngine decides is non-retryable, is that the sync engine will move to an errored state, as with all other non-retryable errors.
The retry policy, I believe, is determined by the implementation in RequestRetryablePolicy, which filters down to
case .notConnectedToInternet,
     .dnsLookupFailed,
     .cannotConnectToHost,
     .cannotFindHost,
     .timedOut:
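As a rough illustration of the comment above, the sketch below pairs a retryability check over the quoted URLError codes with an exponential back-off delay. The function names, the base delay, the cap, and the attempt limit are all assumptions for illustration; they are not taken from RequestRetryablePolicy.

```swift
import Foundation

// Hypothetical helper: the retryable codes mirror the case list quoted above.
func isRetryable(_ code: URLError.Code) -> Bool {
    switch code {
    case .notConnectedToInternet,
         .dnsLookupFailed,
         .cannotConnectToHost,
         .cannotFindHost,
         .timedOut:
        return true
    default:
        return false
    }
}

// Hypothetical back-off schedule: base * 2^attempt, capped at `cap`.
// Returns nil once the (assumed) attempt limit is reached.
func retryDelay(attempt: Int,
                base: TimeInterval = 1,
                cap: TimeInterval = 60,
                maxAttempts: Int = 5) -> TimeInterval? {
    guard attempt >= 0, attempt < maxAttempts else { return nil }
    return min(cap, base * pow(2, Double(attempt)))
}
```

An operation that retried itself, as proposed, would loop while isRetryable holds and retryDelay still returns a value, then surface the error.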
@@ -219,4 +225,39 @@ extension AWSInitialSyncOrchestrator {
        }
        return false
    }

    private func isNetworkError(_ error: DataStoreError) -> Bool {
        guard case let .sync(_, _, underlyingError) = error,
Is this check required if we are only looking for a network error?
This is required because the InitialSyncOrchestrator wraps the error from the operations in a DataStoreError.sync at line 120, stores it in syncErrors, and moves on to the next operation. Come to think of it, I'm not really sure why it needs to be this way, since syncErrors is just an array of [DataStoreError]. But as it currently stands, it needs to unwrap the .sync case, as is done above in isUnauthorizedError:
if case .failure(let dataStoreError) = result {
    let syncError = DataStoreError.sync(
        "An error occurred syncing \(modelSchema.name)",
        "",
        dataStoreError)
    self.syncErrors.append(syncError)
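The unwrapping step described in this reply can be sketched as follows. SketchDataStoreError is a hypothetical stand-in that only copies the three-value shape of the .sync case seen in the diff, and the sketch assumes the URLError sits directly in the underlying-error slot; the real code may need to look through additional wrapper errors.

```swift
import Foundation

// Hypothetical stand-in for DataStoreError; only the shape of .sync matters here.
enum SketchDataStoreError: Error {
    case sync(String, String, Error?)
    case unknown(String)
}

// Unwrap the .sync case, then check whether the wrapped error is a URLError.
func underlyingNetworkError(of error: SketchDataStoreError) -> URLError? {
    guard case let .sync(_, _, underlying) = error else { return nil }
    return underlying as? URLError
}

let wrapped = SketchDataStoreError.sync(
    "An error occurred syncing Post",
    "",
    URLError(.timedOut))
let extracted = underlyingNetworkError(of: wrapped)
```

Because every entry in syncErrors is wrapped this way, a predicate that skipped the guard would never see the URLError at all.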

    private func getFirstUnderlyingNetworkError(errors: [DataStoreError]) -> Error? {
        for error in errors {
            guard case let .sync(_, _, underlyingError) = error,
same comment as previous
2e581ac to 79532fe
Codecov Report
@@            Coverage Diff             @@
##             main    #1773      +/-   ##
==========================================
- Coverage   59.23%   59.18%   -0.06%
==========================================
  Files         716      716
  Lines       21916    21943      +27
==========================================
+ Hits        12982    12987       +5
- Misses       8934     8956      +22
-        if syncErrors.contains(where: isNetworkError) {
-            underlyingError = getFirstUnderlyingNetworkError(errors: syncErrors)
+        if let error = syncErrors.first(where: isNetworkError(_:)) {
+            underlyingError = getFirstUnderlyingNetworkError(error)
nit: getFirstUnderlyingNetworkError -> getUnderlyingNetworkError
Description:
This PR fixes RemoteSyncEngine to accurately determine whether it should retry based on the URLError. Previously, when errors were handled, they were ignored and a hardcoded URLError.notConnected was passed to the retryRequestAdvice method to get the advice to retry. This is a bug when the error is non-retryable, since the RemoteSyncEngine's retry attempt will immediately fall into the same errored state. For example, data decoding issues should be non-retryable, but the bug caused the sync engine to restart a number of times, up to the limit determined by the algorithm in RequestRetryablePolicy.
Changes made
URLError and start checking against a real URLError, retrieved from the underlying error. All components, specifically the sync and subscription components, are responsible for populating the underlying error that they receive from Amplify.API.
DataStoreError.sync error with the URLError as the underlying error.
Testing Initial Sync - AWSInitialSyncOrchestrator
An end-to-end flow is when the component propagates the URLError back up to RemoteSyncEngine and RemoteSyncEngine accurately determines that it should retry.
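To make the difference concrete, here is a small sketch under stated assumptions: shouldRetry stands in for the retry advice, DecodingFailure stands in for a data decoding error, and the set of retryable codes is the one quoted in the review thread. None of these names come from the Amplify codebase.

```swift
import Foundation

// Assumed retryable codes, copied from the case list quoted in the review.
let retryableCodes: Set<URLError.Code> = [
    .notConnectedToInternet, .dnsLookupFailed,
    .cannotConnectToHost, .cannotFindHost, .timedOut
]

// Hypothetical stand-in for the retry advice: retry only known network failures.
func shouldRetry(_ urlError: URLError?) -> Bool {
    guard let code = urlError?.code else { return false }
    return retryableCodes.contains(code)
}

// Behaviour described as the bug: the real error is ignored and a
// hardcoded "not connected" error is used, so every failure looks retryable.
func adviceIgnoringError(_ error: Error) -> Bool {
    shouldRetry(URLError(.notConnectedToInternet))
}

// Behaviour described as the fix: advice is based on the error actually received.
func adviceFromError(_ error: Error) -> Bool {
    shouldRetry(error as? URLError)
}

// Hypothetical non-network failure, e.g. a decoding problem.
struct DecodingFailure: Error {}
```

With the hardcoded error, a DecodingFailure is retried even though a restart cannot fix it; with the real error, only network failures lead to a retry.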
query(lastSyncTime:, nextToken:)
Testing data decoding issue
Subscriptions
The Amplify.API.subscription API uses the AppSyncRealTimeClient library for subscriptions. When the network status changes, the library internally disconnects and reconnects with exponential back-off. From Amplify.API's perspective, no error is received until all retry attempts have been exhausted, and that error is an .operationError, not a URLError. See aws-amplify/aws-appsync-realtime-client-ios#58 for more details.
Sync to Cloud Mutations
OutgoingMutationQueue will create a SyncMutationtoCloudOperation for the mutation event, which contains retry logic for network failures. See #1722 for more details.
Check points: (check or cross out if not relevant)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.