Unexpected behavior when "ARR affinity" cookies is disabled #11501
To keep tenant states in sync across instances, you have to enable the Distributed Tenants feature.
@jtkech thank you! I enabled "Distributed Tenants" on the default tenant where the SaaS recipe is enabled, but it still isn't working as expected. I added 2 tenants in addition to the default one, but when I refresh the Tenants page I sometimes see only 2, sometimes all 3, and other times only the default tenant. So something isn't right; each instance seems to be reading from a different source. I could not even enable "Distributed Tenants" while the ARR cookie was disabled; I had to turn it back on and then enable the feature. It's like each instance is caching its own … Do I need to also enable other distributed cache features like …
Yes, you need a config that is as stateless as possible, including for tenant settings/configs. I saw that you use …
Not required, but good to use it.
No, we don't use it internally, but you could if you use it explicitly.
We use the distributed cache for different documents such as site settings, templates, layers, and so on; everything that uses the … Hmm, after enabling the Distributed Tenants feature, you need to stop all your instances and then restart them.
@jtkech That seems to have worked. Here is what I did:
@CrestApps Good to know, thanks! Yes, for sure; I forgot to say that to make a given Redis feature work, it should be enabled on each tenant. This can be done by using setup recipes that include it, from the app as you did, or from configuration.
@jtkech Thank you! I am using … I am actually experiencing an issue when setting up a new tenant. When I click "Setup" on a new tenant, the request processes for a few seconds and then shows a blank white screen, and the site never gets set up. Unfortunately, I can't seem to be able to view the log stream in the Azure Web Service to see what could be erroring. When I scale down to a single instance, the setup works. I'm not sure how to enable logging so that logs from all instances are written to the log stream. Also, I'm not sure what else I need to do to set up a new tenant. The POST request when setting up the new tenant returns HTTP code 400. This only happens when I am using multiple nodes, not when I am using a single instance.
For the logs, you would need a shared App_Data folder under which you could see them, or the ability to connect to a given VM to see its local files. You could try to include the related service in all the recipes that you use to set up tenants. But normally, for the tenant states themselves, you only need Redis Cache and Lock on the Default tenant of each instance; having the Cache on other tenants is still important to keep other document data in sync across instances. Hmm, does it work if you re-enable ARR's Instance Affinity? One problem I see is that when a tenant is created, we show a link that contains a secret token in the query string, based on the secret of the tenant-specific settings.
Maybe we have to wait a little so that other instances are in sync with these new tenant-specific settings, or maybe the generated token is only known by the instance that rendered the Tenants Index page, or is not yet shared through the Redis DataProtection.
@jtkech
Yes, I agree; normally that's the goal of the Redis DataProtection.
I don't think so; currently they reload data when a tenant state has changed. Here is the code where we check the distributed cache to see whether tenants were created, in which case we load data from shared sources. OrchardCore/src/OrchardCore/OrchardCore/Shell/Distributed/DistributedShellHostedService.cs Lines 139 to 152 in d187e85
Hmm, it seems that the only problem here is around the DataProtection of the Default tenant on each instance; we use the out-of-the-box ASP.NET Core Redis DataProtection. Can you retry it and wait a little after creating a new tenant? Also try to clear all your browser cookies.
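For reference, a minimal sketch of what the out-of-the-box ASP.NET Core Redis data-protection registration looks like (OrchardCore's "Redis DataProtection" feature wires this up per tenant; the connection string, key name, and application name below are placeholder assumptions, not the project's actual values):

```csharp
using Microsoft.AspNetCore.DataProtection;
using StackExchange.Redis;

var builder = WebApplication.CreateBuilder(args);

// Placeholder connection string; all instances must point at the same Redis.
var redis = ConnectionMultiplexer.Connect("your-redis:6379");

builder.Services.AddDataProtection()
    // Every instance reads/writes the same key ring in Redis, so a token
    // protected on one instance can be unprotected on another.
    .PersistKeysToStackExchangeRedis(redis, "DataProtection-Keys")
    // The application name must also match across instances, or payloads
    // produced by one instance cannot be unprotected by another.
    .SetApplicationName("my-oc-app");
```

If either the key ring or the application name differs between instances, the secret setup token generated on one node will fail validation on another, which matches the symptom described above.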
@jtkech Thank you for the feedback. So here is what I have done:
Sorry, I would need to try it but I'm quite busy, hmm, and with a paid plan to scale out to multiple instances ;) Or keep using your plan for testing, and if you give me some access I could try to debug it when I have time; as I remember there are Kudu tools, but I haven't used them for a while. Anyway, I'm interested to understand what happens. Are you using OrchardCore as packages? It would be good to use the source code for testing; e.g., in the setup controller we could comment out where it checks the secret token, see what happens, and so on step by step. Can you already check what is stored in your shared containers for shell settings/configs? Can you find any data protection keys in your Redis instance? When you are logged in and go to the Admin, do you need to log in again, e.g., when the request is sent to another instance? In the meantime, for info: looking at the code, I saw that we always use the data protection provider of the Default tenant, both to generate the token on a given instance and to check it on a given instance when setting up any tenant. The secret token is valid for 24 hours; if it doesn't match we return a 404, and if not authorized we return a 403.
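To check for data-protection keys in the Redis instance, something like the following could work (host and password are placeholders; "DataProtection-Keys" is the default Redis key used by the ASP.NET Core StackExchangeRedis provider, so adjust if a custom key was configured):

```shell
# Placeholder host/credentials; adjust for your Redis instance.
# Scan for anything that looks like a data-protection entry.
redis-cli -h your-redis-host -a your-password --scan --pattern '*DataProtection*'

# The default provider stores the key ring as XML elements in a Redis list
# named "DataProtection-Keys"; dump its contents.
redis-cli -h your-redis-host -a your-password LRANGE DataProtection-Keys 0 -1
```

If the list is empty, the instances are not persisting keys to Redis at all, which would explain tokens failing across nodes.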
In #11526 we saw that the antiforgery cookie name depends on the ContentRootPath, which may be different for the same app across different instances.
Because of the following, I'm re-opening this issue. Okay, with a single-node local K8s cluster I installed the same kind of configuration: one Redis pod, one MSSQL pod, and for now 2 OC pods, and I used … The "good news" is that I can repro the exact same behavior locally, so I will be able to investigate without having to subscribe to a paid plan on Azure during all the tests ;) The first test I did was to decorate the … Yes, maybe because when a tenant is not yet initialized it still uses, at the very beginning, some host-level registrations. I will look at it in more detail when I have time, but for now it is not so easy to debug in pod containers with tools that I haven't used for a year ;) OrchardCore/src/OrchardCore/OrchardCore/Modules/Extensions/ServiceCollectionExtensions.cs Lines 306 to 319 in 4ff4e4b
I am glad you are able to reproduce this issue locally! This is a very good first step! Thanks for recalling this issue.
@jtkech I think the problem goes beyond the setup page. The issue happens with any form submission. Did you try to perform other actions, like creating/editing content?
But now we don't have the same config; for the setup concern, I added the Redis DP (good to try if you were using the Azure DP) to the … We now use unique cookie names, which may not yet be the case for you. I also needed another update: at some point the settings VersionId was null, leading to an … I just set up 10 tenants (from different open browsers) without any issues, and then edited/published content items in different tenants; all is working fine on my side. To try it on your side, you will have to wait for the tenant-level ATF updates that I mentioned; I will do a PR soon.
@jtkech, @ns8482e seems to be experiencing a similar issue to this one, but he is running on a Kubernetes cluster, not on App Service. An example of the issue: when he enables a feature, an alert stating that the feature was enabled is displayed, but the status of the feature shows disabled when the page is reloaded after the POST request. After a few refreshes he gets the correct state of the feature. It's like the user's POST request is handled by a different node on the cluster that isn't aware of the shell reload. Do we store the IoC container in a distributed cache? Meaning, if the shell is released, do we build one container for all of the nodes? I am guessing not. If not, when we release a shell, do we have a way to ensure that all the nodes in a cluster are released? I know you worked on a PR that had to do with clustering OC internally; I'm not sure what the status of that PR is.
@MikeAlhayek did you enable any cache/reload synchronization feature like the Redis ones? Have you set up a way to share data protection keys too?
@sebastienros yes. He is using a shared directory for data protection keys, media, and the shell host. He is using Redis for distributed caching, with the following features enabled: "Redis", "Redis Bus", "Redis Cache", "Redis DataProtection", "Redis Lock". I think that when a shell release happens, nodes on the same cluster are not aware of this important event, which could be causing the disconnect.
To keep tenant states in sync we have the …
Description = "Keeps in sync tenants states, needs a distributed cache e.g. 'Redis Cache' and a stateless configuration, see: https://docs.orchardcore.net/en/latest/docs/reference/core/Shells/index.html", |
@sebastienros, @MikeAlhayek, @jtkech I have a plain OC 1.7 project with the following Program.cs, and a Docker image created and deployed using Kubernetes, with a connection to Redis.

```csharp
using OrchardCore.Logging;
using Surevelox.OrchardCore.K8S.Host;

var builder = WebApplication.CreateBuilder(args);

builder.Host.UseNLogHost();

builder.Services.AddOrchardCms()
    .AddGlobalFeatures("OrchardCore.Redis.Cache")
    .AddGlobalFeatures("OrchardCore.Redis.Lock")
    .AddGlobalFeatures("OrchardCore.Redis.Bus")
    .AddGlobalFeatures("OrchardCore.Tenants.Distributed")
    .AddSetupFeatures("OrchardCore.Redis.Cache")
    .AddSetupFeatures("OrchardCore.Redis.Lock")
    .AddSetupFeatures("OrchardCore.Redis.Bus")
    .AddSetupFeatures("OrchardCore.Tenants.Distributed");

var app = builder.Build();

app.UseOrchardCore();

app.Run();
```
Looks like you are missing …
Do I need to have it enabled if it uses a shared folder with a persistent volume claim?
Ah okay, yes normally it should work. |
Hmm, yes …
@jtkech
At some point we thought about using an event bus (we have a …).
Yes, on Azure we have ARR affinity; on another server we may need the tenant Clusters feature I was working on.
Yeah, RedisBus may do the job. I bet @ns8482e would love to take the lead and implement it to solve this issue 🤪 We could also use Azure Service Bus or Azure Event Hubs for this. If I have time, I may try it out.
@jtkech @sebastienros @MikeAlhayek created a bug #14373 |
At least we should check the tenant state in the request pipeline and not wait 10 seconds for the hosted service to update.
Okay. For info, I was wrong: there is no polling time, only an idle time of 1 second. Maybe it's worth trying without the idle time, or with a smaller one.
Describe the bug
I am running OC CMS using multiple Azure services. For this app, I am using the following services:
My app runs on 3 instances of the web service, so when a user makes a request to the website, any of 3 different instances could potentially handle it. I am testing the app's scalability and would like to disable the ARR Affinity cookie. When the ARR cookie is disabled, the app does not work as expected. It's as if the app can't handle the form POST request (i.e., the second request a user makes, to submit a web form).
To Reproduce
Steps to reproduce the behavior:
Add the following settings in the configuration section of the Azure Web Service:
a. OrchardCore__ConnectionString=ConnectionStringToAzureDatabase
b. OrchardCore__DatabaseProvider=SqlConnection
c. OrchardCore__Default__State=Uninitialized
d. OrchardCore__Default__TablePrefix=Default
e. OrchardCore__OrchardCore_DataProtection_Azure__ConnectionString=ConnectionStringToStorageAccount
f. OrchardCore__OrchardCore_DataProtection_Azure__ContainerName=dataprotection
g. OrchardCore__OrchardCore_Media_Azure__BasePath=Media
h. OrchardCore__OrchardCore_Media_Azure__ConnectionString=ConnectionStringToStorageAccount
i. OrchardCore__OrchardCore_Media_Azure__ContainerName=media-{{ ShellSettings.Name }}
j. OrchardCore__OrchardCore_Media_Azure__CreateContainer=true
k. OrchardCore__OrchardCore_Redis__Configuration=ConnectiontToTheRedisDatabase,allowAdmin=true
l. OrchardCore__OrchardCore_Shells_Azure__ConnectionString=ConnectionStringToStorageAccount
m. OrchardCore__OrchardCore_Shells_Azure__ContainerName=hostcontainer
Make sure you scale out the app service to at least 3 instances
Publish the app to the Azure Web Service using Docker containers.
Set up your default site using the SaaS recipe. Ensure that the app is working with no known errors, as it should.
Let's attempt to reproduce the issue now by disabling the ARR Affinity cookie as explained here
With the ARR cookie disabled, go to the "Tenants" section of your dashboard and add 2+ new tenants. The result will be that one or more of the tenants won't be added or won't show up.
If you now enable the ARR Affinity cookie, some of the tenants you added in the previous step may show up. Weird?
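For reference, the environment variables in step 1 can be expressed as an equivalent appsettings.json fragment, since double underscores in environment variable names map to nested configuration sections in ASP.NET Core (the connection string values remain placeholders):

```json
{
  "OrchardCore": {
    "ConnectionString": "ConnectionStringToAzureDatabase",
    "DatabaseProvider": "SqlConnection",
    "Default": {
      "State": "Uninitialized",
      "TablePrefix": "Default"
    },
    "OrchardCore_DataProtection_Azure": {
      "ConnectionString": "ConnectionStringToStorageAccount",
      "ContainerName": "dataprotection"
    },
    "OrchardCore_Media_Azure": {
      "BasePath": "Media",
      "ConnectionString": "ConnectionStringToStorageAccount",
      "ContainerName": "media-{{ ShellSettings.Name }}",
      "CreateContainer": true
    },
    "OrchardCore_Redis": {
      "Configuration": "ConnectiontToTheRedisDatabase,allowAdmin=true"
    },
    "OrchardCore_Shells_Azure": {
      "ConnectionString": "ConnectionStringToStorageAccount",
      "ContainerName": "hostcontainer"
    }
  }
}
```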
Expected behavior
When the ARR Affinity cookie is disabled, everything should work the same, since my app uses a single shared source for the shell, data protection, and distributes its cache using the Redis database.