Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: InvalidOperationException thrown in QueryAliasIndex when ISession is still in "connecting" state #16069

Open
ApacheTech opened this issue May 16, 2024 · 25 comments
Labels
Milestone

Comments

@ApacheTech
Copy link

Describe the bug

When loading multiple pages of a site quickly in multiple tabs, a InvalidOperationException exception is thrown, due to calling the IOrchardHelper.GetContentItemByAliasAsync() method.

InvalidOperationException: BeginExecuteReader requires an open and available Connection. The connection's current state is connecting.

    System.Threading.Tasks.ContinuationResultTaskFromResultTask<TAntecedentResult, TResult>.InnerInvoke()
    System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, object state)
    System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, object state)
    System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref Task currentTaskSlot, Thread threadPoolThread)
    Dapper.SqlMapper.QueryAsync<T>(IDbConnection cnn, Type effectiveType, CommandDefinition command) in SqlMapper.Async.cs
    YesSql.Store.ProduceAsync<T, TState>(WorkerQueryKey key, Func<TState, Task<T>> work, TState state)
    YesSql.Services.DefaultQuery+Query<T>.FirstOrDefaultImpl()
    YesSql.Services.DefaultQuery+Query<T>.FirstOrDefaultImpl()
    AliasPartRazorHelperExtensions.GetContentItemIdByAliasAsync(IOrchardHelper orchardHelper, string alias)
    AliasPartRazorHelperExtensions.GetContentItemByAliasAsync(IOrchardHelper orchardHelper, string alias, bool latest)
    Data8.Website.ApiDocs.Components.Abstractions.MenuContentItemComponentBase.OnInitializedAsync() in MenuContentItemComponentBase.cs
    30. ContentItem = await OrchardHelper.GetContentItemByAliasAsync(Alias);

To Reproduce

My specific case uses a Blazor8 de-coupled site, global InteractiveServer, and uses GetContentItemByAliasAsync for menus, and routing.

Steps to reproduce the behaviour:

  1. Within a site that uses GetContentItemByAliasAsync for anything such as routing, menus, etc.
  2. Create a page with a link to a page that uses GetContentItemByAliasAsync.
  3. Within your browser, use "middle-click" to open several tabs using that link. My specific case uses a menu, and opening multiple tabs with middle-click, one for each item of the menu.
  4. On some tabs, you will see an Internal Server Error, with a stack-trace similar to above.

Expected behaviour

Each tab should load correctly, waiting for the YesSql session to be open and available, before executing the query.

Screenshots

image

image

Possible Solution

This method would likely need to be fleshed out to use a retry policy, or wait for the session to be open and available, before querying the database:

https://github.com/OrchardCMS/OrchardCore/blob/main/src/OrchardCore.Modules/OrchardCore.Alias/Services/AliasPartContentHandleProvider.cs#L37

Copy link
Contributor

Thank you for submitting your first issue, awesome! 🚀 We're thrilled to receive your input. If you haven't completed the template yet, please take a moment to do so. This ensures that we fully understand your feature request or bug report. A core team member will review your issue and get back to you.

If you like Orchard Core, please star our repo and join our community channels

@Piedone
Copy link
Member

Piedone commented May 16, 2024

Hmm, curious. Doesn't this look like some YesSql issue @sebastienros?

Which OC version are you using?

@ApacheTech
Copy link
Author

OrchardCore 1.8.3

@Piedone
Copy link
Member

Piedone commented May 16, 2024

Are you using SQL Server? And what's the hosting environment, SQL Azure?

@ApacheTech
Copy link
Author

Sorry, I should have said within my initial report.

Azure SQL Server, authenticating using Azure AD. Connection timeout is 30 seconds. MultipleActiveResultSets is set to true.

I'm currently running the site on IIS Express, on localhost, in Debug mode (Press Play in VS). But the same issue happens with our internal staging site hosted on IIS, on-prem, in Development environment, and our canary site, hosted on Azure AS, in Production environment.

Same issue, same symptoms, same steps.

The only change I've made is to update the Npgsql transitive package to a non-vulnerable version. Our sites are specced to ISO127001, so we promote vulnerable fourth-party packages, and update them, where possible. I have tried downgrading back to the minimum version for OrchardCore, and the same problem persists.

I'm wondering if it's a clash with Blazor; as a "session" in Blazor is the scope of the SignalR circuit, which is per-user, per-tab.

@ApacheTech
Copy link
Author

ApacheTech commented May 16, 2024

The GetContentItemByAliasAsync method is being called like this, within MenuContentItemComponentBase, from the screenshot above:

    protected override async Task OnInitializedAsync()
    {
        ContentItem = await Orchard.Content.GetContentItemByAliasAsync(Alias);
        if (ContentItem is null) return;
        var items = await ParseChildItemsAsync(ContentItem);
        MenuItems = items?.ToList();
        IsLoading = false;
    }

Orchard.Content is the injected IOrchardHelper class.

I've tried adding a Polly retry policy, but it still throws the exception, and either ignores, bypases, or otherwise breaks out of the policy, to display the same screen as in the screenshot:

    protected override async Task OnInitializedAsync()
    {
        await Policy
            .Handle<InvalidOperationException>()
            .WaitAndRetry(3, _ => TimeSpan.FromMilliseconds(500))
            .Execute(async () => ContentItem = await Orchard.Content.GetContentItemByAliasAsync(Alias));

        if (ContentItem is null) return;
        var items = await ParseChildItemsAsync(ContentItem);
        MenuItems = items?.ToList();
        IsLoading = false;
    }

An ErrorBoundary will catch the error, but there's not really an easy way to recover from it.

@Piedone
Copy link
Member

Piedone commented May 16, 2024

Thanks for the details!

This is not a solution, but can help in the interim: did you try SQL-level retries in the style of ConnectRetryCount=15;ConnectRetryInterval=5; in the connstring?

Also, with 1.8.x you shouldn't need to use MARS. That's only required if the app issues multiple queries in parallel with the same YesSql ISession, which is unsupported and shouldn't happen in OC's code, but if you do this in custom code then you'll need to remove the parallelization.

@ApacheTech
Copy link
Author

I set MARS to true because it complained when I first set up the site (on 1.18.2). I run multiple queries concurrently, but not specifically in parallel. i.e. The page is made of multiple components; the header grabs the header-menu, and site name, the footer grabs the gutter-menu, the sidebar grabs the header-menu. The main body of the page grabs the content item with the alias equal to the current page URI. Every page grabs the site name for the document title. But none of these things are done by way of Parallel, only by async/await.

Adding ConnectRetryCount=15;ConnectRetryInterval=5; does not resolve the issue.

@ApacheTech
Copy link
Author

ApacheTech commented May 16, 2024

I have a suspicion that there is a YesSql connection service somewhere set to Singleton, and then a Scoped service is consuming that connection. Likely, that Dapper connection you can see in the stack trace.

When I open multiple tabs, it is the older tabs that end up in error, not the new ones. If it was just congestion, or queuing, you would expect the new tabs performance to degrade overtime.

Tab 1 starts loading, initialises a connection. That connection gets cached as a static flyweight/singleton.
Tab 2 starts loading, initialises a connection. That connection gets cached as the same singleton.
Tab 3 starts loading, initialises a connection. That connection gets cached as the same singleton.
Tab 4 starts loading, initialises a connection. That connection gets cached as the same singleton.

At this point, Tab 1 has a cached connection that it expects to be open, and available, but it's actually only just been initialised by tab 4, and is still connecting, so it throws the expection. Tab 4 knows to wait until the connection is stable, and then renders the page as expected.

That would seem to make sense, in this scenario.

Maybe a case for AsyncLocal<T> to store the cached connection?

@ApacheTech
Copy link
Author

I have tried disabling MARS; it didn't make any difference.

@ApacheTech
Copy link
Author

I've gone through the application on a StateHasChanged purge, and I've turned off pre-rendering, for now.

I've been able to uncover a different stack trace that seems to say that it is a race condition with the Dapper connection.

One scope is caching its connection, while another scope is trying to use that same cache. What I can't tell at the moment, is who is the onus on to ensure scope safety with this? Is it the app-dev (me), the framework (OC), or the data access (YesSql/Dapper)?

[2024-05-16T15:17:40.280Z] Error: System.InvalidOperationException: BeginExecuteReader requires an open and available Connection. The connection's current state is connecting.
   at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__195_0(Task`1 result)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Dapper.SqlMapper.QueryAsync[T](IDbConnection cnn, Type effectiveType, CommandDefinition command) in /_/Dapper/SqlMapper.Async.cs:line 434
   at YesSql.Store.ProduceAsync[T,TState](WorkerQueryKey key, Func`2 work, TState state)
   at YesSql.Services.DefaultQuery.Query`1.FirstOrDefaultImpl()
   at YesSql.Services.DefaultQuery.Query`1.FirstOrDefaultImpl()
   at AliasPartRazorHelperExtensions.GetContentItemIdByAliasAsync(IOrchardHelper orchardHelper, String alias)
   at AliasPartRazorHelperExtensions.GetContentItemByAliasAsync(IOrchardHelper orchardHelper, String alias, Boolean latest)
   at Data8.Website.ApiDocs.Components.Abstractions.MenuContentItemComponentBase.OnInitializedAsync() in C:\dev\prop\Data8.Website\src\Data8.Website.ApiDocs\Components\Abstractions\MenuContentItemComponentBase.cs:line 30
   at Microsoft.AspNetCore.Components.ComponentBase.RunInitAndSetParametersAsync()
   at Microsoft.AspNetCore.Components.RenderTree.Renderer.GetErrorHandledTask(Task taskToHandle, ComponentState owningComponentState)

@MikeAlhayek
Copy link
Member

@ApacheTech it could be something in the framework.

Are you by any chance able to upgrade to 2.0? With 2.0 and the YesSql upgrade, there is an option to detect occurrent issue. If you are able to upgrade, you can configure YesSqlOptions by setting EnableThreadSafetyChecks to true.

@ApacheTech
Copy link
Author

I thought OrchardCore only went to 1.8.3 at the moment?

After further digging, this is a clash between the Blazor component lifecycle, and whatever Dapper/Yessql/OC is doing behind the scenes. If I push all calls to GetContentItemByAliasAsync into OnIntializedAsync, instead of OnParametersSetAsync, then I don't seem to get the errors... but it also breaks the component. OnInitializedAsync is only called once per component, per circuit; you need the logic within OnParametersSetAsync to be able to handle the dynamic routing.

@MikeAlhayek
Copy link
Member

I thought OrchardCore only went to 1.8.3 at the moment?

1.8.3 is production ready. 2.0 "which will be released soon", is available using the preview feed.

https://docs.orchardcore.net/en/latest/getting-started/preview-package-source/

You can review the release notes and address all the breaking changes
https://docs.orchardcore.net/en/latest/releases/2.0.0/

@ApacheTech
Copy link
Author

Well.... that's my Friday sorted.

Our GoLive is supposed to be around mid-June, early July. Is there a proposed release date for 2.0?

@Piedone
Copy link
Member

Piedone commented May 16, 2024

There's no planned date but maybe mid-June looks realistic.

We'll release once all 2.0 issues are fixed and (due to the huge fundamental changes with migrating to System.Text.Json) we've done sufficient testing with some Lombiq projects (done for the OSS ones, just some last polishing: Lombiq/Open-Source-Orchard-Core-Extensions#692) and Orchard Core Commerce (WIP: OrchardCMS/OrchardCore.Commerce#442). Perhaps also DotNest.

If the open bugs don't affect you, and you don't mind potentially finding new ones, then you can use the preview packages right now. Then, upgrading to the actual released 2.0 version will be seamless for you (unless we add new breaking changes :)).

@sebastienros
Copy link
Member

You can already try with the main branch or the nuget packages that ship nightly, no need to wait.

@sebastienros
Copy link
Member

MenuContentItemComponentBase.OnInitializedAsync() seems to be calling GetContentItemIdByAliasAsync, correct?

In this case please add a lock on this method and tell us if that fixes the issue. If that is the case then your component is invoked by multiple threads which is not supported by YesSql (or dapper, or anything with transactions/connections). You would probably need to use some locking/semaphore, or find another way to do that, or fix the component.

@sebastienros sebastienros added this to the backlog milestone May 16, 2024
@ApacheTech
Copy link
Author

MenuContentItemComponentBase.OnInitializedAsync() seems to be calling GetContentItemIdByAliasAsync, correct?

In this case please add a lock on this method and tell us if that fixes the issue. If that is the case then your component is invoked by multiple threads which is not supported by YesSql (or dapper, or anything with transactions/connections). You would probably need to use some locking/semaphore, or find another way to do that, or fix the component.

Using a SemaphoreSlim(1, 1), it does seem to resolve this issue.

Would this be just a quirk of using OrchardCore with Blazor, or is it something that can be addressed at a lower level? If it is something for the app-dev to be aware of, that's fine. I'm seeing more and more people wanting to use Blazor with OrchardCore, so this is something good to know, and pass on to new devs dipping their toes in the water.

@MikeAlhayek
Copy link
Member

@ApacheTech we welcome any PR that would improve things for Blazor. If you have an improvement, please submit a PR. I think @ns8482e and @agriffard worked with Blazor and OrchardCore.

@Piedone
Copy link
Member

Piedone commented May 16, 2024

And @Skrypt.

@ApacheTech
Copy link
Author

Was a fix for this ever added to 2.x?

@Piedone
Copy link
Member

Piedone commented Oct 23, 2024

No.

@ApacheTech
Copy link
Author

Thank you for your insightful, and detailed answer.

What is the expected ETA?

@Piedone
Copy link
Member

Piedone commented Oct 23, 2024

Please look at the conversation above, and under your PR. This isn't actionable and thus won't be worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants