Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Starvation: AAD Token Validation #2774

Closed
1 task done
serpentfabric opened this issue Aug 8, 2024 · 6 comments · Fixed by #2870
Closed
1 task done

[Bug] Starvation: AAD Token Validation #2774

serpentfabric opened this issue Aug 8, 2024 · 6 comments · Fixed by #2870
Labels
Bug Product is not functioning as expected P1 More important, prioritize highly
Milestone

Comments

@serpentfabric
Copy link

serpentfabric commented Aug 8, 2024

Which version of Microsoft.IdentityModel are you using?
Note that to get help, you need to run the latest version.

    <PackageVersion Include="IdentityModel" Version="7.0.0" />    
    <PackageVersion Include="Microsoft.Identity.Web" Version="2.20.0" />

Where is the issue?

  • M.IM.Validators

Is this a new or an existing app?
existing

Repro
If you send 200 simultaneous requests to an API when the applicaiton first starts then the application will be unresponsive for up to a minute while we believe it makes requests to AAD to get the metadata required to validate tokens on the request. After "warming up" the issue goes away.

I think I found the cause. And I think I found a workaround. But I reckon it's something worthy of a fix.

Cause

The server is configured to use AAD here:

builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddMicrosoftIdentityWebApi(_ => { }, identity =>
    {
        identity.TenantId = config.Authentication.TenantId;
        identity.ClientId = config.Authentication.ClientId;
        identity.Instance = config.Authentication.Instance;
    });

And when the 200 requests arrive at once, the code in AadIssuerValidator calls GetBaseConfigurationAsync which in turn calls GetConfigurationAsync which in turn fails this condition the following check the first 200 (or however many concurrent requests came in) times:

public virtual async Task<T> GetConfigurationAsync(CancellationToken cancel)
{
    if (_currentConfiguration != null && _syncAfter > DateTimeOffset.UtcNow)
    {
        return _currentConfiguration;
    }

    await _refreshLock.WaitAsync(cancel).ConfigureAwait(false);
    try
    {
        if (_syncAfter <= DateTimeOffset.UtcNow)
        {

and then the remaining 199 requests all queue up waiting for that lock...

workaround

we created a startup probe, that runs before the container accepts traffic, that runs the following code and basically hydrates the metadata using the config options

        var options = _optionsMonitor.Get(JwtBearerDefaults.AuthenticationScheme);
        var metadata = await options.ConfigurationManager!.GetConfigurationAsync(CancellationToken.None);

        var validator = _issuerValidatorFactory.GetAadIssuerValidator(options.Authority!);
        var configManagerPropertyV1 = GetConfigurationManagerPropertyInfo("ConfigurationManagerV1");
        var configManagerPropertyV2 = GetConfigurationManagerPropertyInfo("ConfigurationManagerV2");

        try
        {
            var v1Metadata = await GetConfigurationManagerConfig(configManagerPropertyV1, validator);
            var v2Metadata = await GetConfigurationManagerConfig(configManagerPropertyV2, validator);

            if (metadata.SigningKeys.Any() && v1Metadata.SigningKeys.Any() && v2Metadata.SigningKeys.Any())
            {
                return HealthCheckResult.Healthy();
            }
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy(exception: ex);
        }
        

        return HealthCheckResult.Unhealthy();
}
    /// <summary>
    /// Get oauth metadata by using reflection to access the internal <see cref="ConfigurationManager{OpenIdConnectConfiguration}"/>
    /// in <see cref="AadIssuerValidator"/>.
    /// </summary>
    /// <param name="configManagerProperty"></param>
    /// <param name="validator"></param>
    /// <returns></returns>
    private static async Task<OpenIdConnectConfiguration> GetConfigurationManagerConfig(PropertyInfo? configManagerProperty,
        AadIssuerValidator validator)
    {
        var getter = configManagerProperty!.GetGetMethod(nonPublic: true);
        var configManager = getter!.Invoke(validator, null) as ConfigurationManager<OpenIdConnectConfiguration>;
        return await configManager!.GetConfigurationAsync(CancellationToken.None);
    }

    private static PropertyInfo GetConfigurationManagerPropertyInfo(string propertyName)
    {
        return typeof(AadIssuerValidator).GetProperty(propertyName, BindingFlags.NonPublic | BindingFlags.Instance)!;
    }

Expected behavior
The metadata is already available before the application can receive the requests or it re-uses the recently fetched metadata once subsequent requests get the lock.

Actual behavior
All requests waiting for the lock re-retrieve the token's metadata/signing-keys regardless if they were fetched recently enough

Possible solution
i haven't given a lot of thought to how this system works, so i reckon my workaround isn't ideal, but i think it's enough to inspire the right kind of change?

Additional context / logs / screenshots / links to code
Here you can see the cache size growing for requests when it should just be a single one:
image
image
image

@jennyf19
Copy link
Collaborator

@serpentfabric are you seeing the same behavior on the latest version 8.0.2? In 8.0.2 we are locking on the first call to the metadata endpoint and not on all, which might impact what you are seeing. thanks!

@jennyf19 jennyf19 added More info needed Customer has been asked for more information and removed needs attention untriaged labels Aug 29, 2024
@tom-wolfe
Copy link

We're using 8.0.2 and seeing this same issue. Using all the default settings and refresh intervals, these are the kinds of logs we're seeing from a single container instance:

image

@tustyan-vcc
Copy link

We have the same problem, we see a lot of requests going towards well-known endpoint, even though Automatic Refresh Interval should be 12 hours, and RefreshInterval is 5 minutes. Looks like this happens more when the app gets a lot of incoming traffic.

We use version 8.0.2 of the library:
[15:52:49 DBG] Microsoft.IdentityModel Version: 8.0.2.0. Date 09/19/2024 13:52:49. PII logging is OFF. See https://aka.ms/IdentityModel/PII for details.

One thing we noticed in the codebase is that this line, will always set _syncAfter to 5 minutes, so AutomaticRefreshInteval of 12 hours is only used once after initial retrieval when _currentConfiguration object is still null, and then after first call to UpdateCurrentConfiguration method it will do automatic refresh every 5 minutes

_syncAfter = DateTimeUtil.Add(DateTime.UtcNow, AutomaticRefreshInterval < RefreshInterval ? AutomaticRefreshInterval : RefreshInterval);

Also here are some logs from the app, it is a single call to the endpoint that requires authenticated user, where token is malformed, but as you can see it results in 3 separate calls towards metadata endpoint:

[15:54:13 DBG] IDX20805: Obtaining information from metadata endpoint: 'https://login.microsoftonline.com/volvocars.onmicrosoft.com/v2.0/.well-known/openid-configuration'.
[15:54:13 INF] IDX10243: Reading issuer signing keys from validation parameters.
[15:54:13 INF] IDX10265: Reading issuer signing keys from configuration.
[15:54:13 INF] IDX10243: Reading issuer signing keys from validation parameters.
[15:54:13 ERR] IDX10511: Signature validation failed. Keys tried: 'Microsoft.IdentityModel.Tokens.X509SecurityKey, KeyId: 'H9nj5AOSswMphg1SFx7jaV-lB9w', InternalId: 'H9nj5AOSswMphg1SFx7jaV-lB9w'. , KeyId: H9nj5AOSswMphg1SFx7jaV-lB9w
'.
Number of keys in TokenValidationParameters: '0'.
Number of keys in Configuration: '8'.
Matched key was in 'Configuration'.
kid: 'H9nj5AOSswMphg1SFx7jaV-lB9w'.
Exceptions caught:
 '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'.
token: '[PII of type 'Microsoft.IdentityModel.JsonWebTokens.JsonWebToken' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'. See https://aka.ms/IDX10511 for details.
[15:54:13 DBG] IDX20805: Obtaining information from metadata endpoint: 'https://login.microsoftonline.com/volvocars.onmicrosoft.com/v2.0/.well-known/openid-configuration'.
[15:54:13 DBG] IDX20805: Obtaining information from metadata endpoint: 'https://login.microsoftonline.com/volvocars.onmicrosoft.com/v2.0/.well-known/openid-configuration'.
[15:54:13 ERR] Inbound Request Error
[15:54:13 DBG] IDX21811: Deserializing the string: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]' obtained from metadata endpoint into openIdConnectConfiguration object.
[15:54:13 DBG] IDX21812: Retrieving json web keys from: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'.
[15:54:13 DBG] IDX20805: Obtaining information from metadata endpoint: 'https://login.microsoftonline.com/81fa766e-a349-4867-8bf4-ab35e250a08f/discovery/v2.0/keys'.
[15:54:13 DBG] IDX21811: Deserializing the string: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]' obtained from metadata endpoint into openIdConnectConfiguration object.
[15:54:13 DBG] IDX21812: Retrieving json web keys from: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'.
[15:54:13 DBG] IDX20805: Obtaining information from metadata endpoint: 'https://login.microsoftonline.com/81fa766e-a349-4867-8bf4-ab35e250a08f/discovery/v2.0/keys'.
[15:54:13 DBG] IDX21811: Deserializing the string: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]' obtained from metadata endpoint into openIdConnectConfiguration object.
[15:54:13 DBG] IDX21812: Retrieving json web keys from: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'.
[15:54:13 DBG] IDX20805: Obtaining information from metadata endpoint: 'https://login.microsoftonline.com/81fa766e-a349-4867-8bf4-ab35e250a08f/discovery/v2.0/keys'.
[15:54:13 DBG] IDX21813: Deserializing json web keys: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'.
[15:54:13 DBG] IDX10806: Deserializing json: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]' into 'Microsoft.IdentityModel.Tokens.JsonWebKeySet'.
[15:54:13 DBG] IDX21813: Deserializing json web keys: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'.
[15:54:13 DBG] IDX10806: Deserializing json: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]' into 'Microsoft.IdentityModel.Tokens.JsonWebKeySet'.
[15:54:13 DBG] IDX21813: Deserializing json web keys: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]'.
[15:54:13 DBG] IDX10806: Deserializing json: '[PII of type 'System.String' is hidden. For more details, see https://aka.ms/IdentityModel/PII.]' into 'Microsoft.IdentityModel.Tokens.JsonWebKeySet'.

@tustyan-vcc
Copy link

tustyan-vcc commented Sep 20, 2024

And this line of code seems to be incorrect

if (Interlocked.CompareExchange(ref _configurationRetrieverState, ConfigurationRetrieverIdle, ConfigurationRetrieverRunning) != ConfigurationRetrieverRunning)

According to this comparand in on position 3, and value to set is on position 2, it will compare current state with running, and replace it with idle when they are equal. Probably is should be the other way around?
https://learn.microsoft.com/en-us/dotnet/api/system.threading.interlocked.compareexchange?view=net-8.0#system-threading-interlocked-compareexchange(system-int32@-system-int32-system-int32)
Here is the outcome of it:

_configurationRetrieverState initial value _configurationRetrieverState final value will update
Idle Idle true
Running Idle false

If case seems correct, since if current state is idle, it will run the update, and if it is running, it will not, but this code will never set it to running, and there seem to be no other place that will set _configurationRetrieverState flag to running in the code base.

@andrewhickman-aveva
Copy link

We found that after updating to version 8.0 of Microsoft.AspNetCore.Authentication.JwtBearer, we got a lot more configuration refresh requests from this line:

This occurs when receiving a token with an invalid signing key or issuer.

Prior to dotnet/aspnetcore#49541, ConfigurationManager was always null here, and this issue did not occur

@pmaytak pmaytak added this to the 8.1.1 milestone Oct 5, 2024
@pmaytak pmaytak added P1 More important, prioritize highly Bug Product is not functioning as expected and removed More info needed Customer has been asked for more information labels Oct 5, 2024
@pmaytak pmaytak closed this as completed Oct 5, 2024
@pmaytak
Copy link
Contributor

pmaytak commented Oct 5, 2024

The fix for this has been released in 8.1.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Product is not functioning as expected P1 More important, prioritize highly
Projects
None yet
6 participants