#130 attempted to fix a deadlock issue while fetching credentials for aws-api calls. This fix does not work in the general case.
Dependencies
Description with failing test case
We have an SQS message processing service that runs 10 processing threads. All 10 of those threads have deadlocked on calls to aws-api. A message processing thread makes calls to our customers' AWS accounts using IAM AssumeRole. Each customer has a different IAM Role ARN and ExternalId, which results in the aws-api calls using different `CredentialsProvider`s. As stated before, these calls occur in parallel.

If you have 4 or more `CredentialsProvider`s running `fetch` at the same time, you will hit a deadlock.

I created a repro here.
I think the problem is the same as #130. Its fix only made it less likely to occur. In my repro, this is what is happening in each thread.
The fetch-creds calls go through the async-fetch-pool, which has a size of 4. If you launch N threads to run the above flow, where N >= the async-fetch-pool size, you will get a deadlock. This occurs because every pool thread is occupied by an outer AssumeRole fetch-creds call that is blocked waiting on its inner fetch-creds call, while the inner calls sit in the async-fetch-pool's queue. The queue will never get processed, since no pool thread becomes free until the outer AssumeRole fetch-creds calls finish.
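The starvation pattern described above can be shown without aws-api. The following is a minimal Java sketch, not the library's code: the class `PoolStarvationDemo`, the method `deadlocks`, and the credential strings are hypothetical stand-ins. It assumes only what is stated above, namely that an outer task and the inner task it waits on share one fixed-size pool. A latch forces every outer task to start before any inner task is submitted, which is the worst-case interleaving; in a real service that timing depends on how long the outer calls take.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationDemo {

    // Returns true if the outer tasks fail to finish within the timeout,
    // i.e. the shared pool has starved itself.
    static boolean deadlocks(int poolSize, int outerTasks) throws InterruptedException {
        // Hypothetical stand-in for a shared, fixed-size fetch pool.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Released once as many outer tasks as can run concurrently have started.
        CountDownLatch allStarted = new CountDownLatch(Math.min(poolSize, outerTasks));

        List<Future<String>> outers = new ArrayList<>();
        for (int i = 0; i < outerTasks; i++) {
            final int id = i;
            // Outer task: stands in for the AssumeRole credentials fetch.
            outers.add(pool.submit(() -> {
                allStarted.countDown();
                allStarted.await();
                // Inner task: stands in for fetching the base credentials.
                // It is submitted to the SAME pool the outer task runs on.
                Future<String> inner = pool.submit(() -> "base-creds-" + id);
                // Blocks a pool thread until the inner task has run.
                return "assumed-role-creds-from-" + inner.get();
            }));
        }

        try {
            for (Future<String> outer : outers) {
                outer.get(1, TimeUnit.SECONDS);
            }
            return false; // every outer task completed
        } catch (TimeoutException e) {
            return true;  // outer tasks are stuck waiting on queued inner tasks
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupt any blocked workers so the JVM can exit
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("3 callers, pool of 4: deadlock = " + deadlocks(4, 3));
        System.out.println("4 callers, pool of 4: deadlock = " + deadlocks(4, 4));
    }
}
```

With 3 callers one pool thread stays free to drain the queued inner tasks, so the run completes. With 4 or more callers every thread is held by an outer task and the inner tasks are never dequeued, which matches the N >= pool size condition above.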