fix(datastore): retry on subscription connection error #2571
Issue #, if available: #2152
main: PR fix(datastore): retry on subscription connection error #2571 (This PR)
v1: PR fix(datastore-v1): retry on subscription connection error #2581

Description of changes:
Once DataStore is in an active state, if the network goes down and then recovers, the subscription websocket sends an error on network recovery. This error is propagated back to API/DataStore, which handles it by moving DataStore to a stopped state; it is not processed as retryable. Because DataStore remains in the stopped state, recovery is delayed until the next DataStore operation is called. The scenario is reproducible by running an iOS app on a device or a macOS app on a MacBook. Running on an iOS simulator does not have the same effect: the websockets do not disconnect, and instead the subscription event is received with a delay (~5-10 seconds) after the network recovers.
The AppSyncRealTime websocket client used by Amplify.API (the API plugin) has its own retry logic based on the classification of errors coming from the underlying websocket. In certain scenarios, however, such as an unclassified error or exhausted retries, it eventually propagates a `ConnectionProviderError.connection` error back to the caller and closes the websocket connection. In this case, the API plugin receives this terminating error and sends back `API.operationError` to DataStore.

DataStore's sync engine used to be much more aggressive in retrying the sync process. This caused retry storms on AppSync and was fixed in #1901; it was a code bug that retried on all errors, regardless of the underlying error. Fixing that bug created another issue: in this scenario, where a retry would have fixed the problem, no retry is initiated until the next explicit DataStore operation (start/save/query/delete/etc.) is called, because DataStore has already transitioned to the stopped state. When the network is turned back on, DataStore gets the `API.operationError` and stops. We were able to observe this on multiple attempts, #2152 (comment) and #2152 (comment).

- Return `ConnectionProvider.connection` errors from AppSyncRealTimeClient as an `APIError.networkError`. Convert `ConnectionProvider.connection` to a Foundation `URLError` case that makes sense, such as `URLError.networkConnectionLost`.
- In DataStore, handle the `APIError` and extract the `URLError` when it is an `APIError.networkError`, as an indication to retry the sync process. The `URLError` is passed to `RequestRetryablePolicy`, which now classifies `.networkConnectionLost` as retryable.

With this change, when Wi-Fi is turned back on after being off, DataStore receives the websocket error, restarts the sync process, and transitions to the active state successfully.
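The error-mapping and retry-classification flow described above can be sketched in Swift as follows. This is a minimal, self-contained sketch, not the actual Amplify code: `ConnectionProviderError`, `APIError`, `toAPIError`, `extractURLError`, and `RetryPolicySketch` are simplified hypothetical stand-ins (the real Amplify types have different shapes), and treating `.notConnectedToInternet` and `.timedOut` as retryable is an assumption for illustration. Only `URLError` is Foundation's real type.

```swift
import Foundation

// Hypothetical, simplified stand-in for AppSyncRealTimeClient's error type.
enum ConnectionProviderError: Error {
    case connection
    case other(String)
}

// Hypothetical, simplified stand-in for Amplify's APIError (only two cases).
enum APIError: Error {
    case networkError(String, underlyingError: Error?)
    case operationError(String, underlyingError: Error?)
}

// API plugin side: surface a terminating websocket connection error as a
// network error that wraps a Foundation URLError, instead of an operation error.
func toAPIError(_ error: ConnectionProviderError) -> APIError {
    switch error {
    case .connection:
        return .networkError("Subscription connection error",
                             underlyingError: URLError(.networkConnectionLost))
    case .other(let message):
        return .operationError(message, underlyingError: error)
    }
}

// DataStore side: pull the URLError out of a network error, if there is one.
func extractURLError(from error: APIError) -> URLError? {
    guard case let .networkError(_, underlyingError) = error else {
        return nil
    }
    return underlyingError as? URLError
}

// Hypothetical retry policy: only a small set of transient URLError codes
// trigger a sync restart; every other error still stops DataStore.
struct RetryPolicySketch {
    // .networkConnectionLost comes from the PR; the other two are assumptions.
    let retryableCodes: [URLError.Code] = [
        .networkConnectionLost,
        .notConnectedToInternet,
        .timedOut
    ]

    func shouldRetry(_ error: APIError) -> Bool {
        guard let urlError = extractURLError(from: error) else {
            return false
        }
        return retryableCodes.contains(urlError.code)
    }
}
```

Keying the retry decision on the wrapped `URLError` code, rather than retrying on every error, keeps the narrow retry behavior introduced by #1901 while still letting DataStore recover from a lost subscription connection.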
Check points: (check or cross out if not relevant)
DataStore checkpoints (check when completed)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.