Chance of getting 409 while renewing the primary host lock #1864
Comments
For now the solution is to add a settings.job with:
It will not scale, but at least it will not fail.
@brettsam You said "acquires the lock just before the old host releases it" - how can that happen? The lease can only be held by one. Perhaps what you're saying is something like:
However, PrimaryHostCoordinator.AcquireOrRenewLeaseAsync seems to already handle this case correctly - if it doesn't have the lease and fails to acquire it, no error is thrown.
@mathewc, the lease can be held by multiple hosts inside a single instance because we use the same leaseId (instanceId) to acquire the lease. It's the flow @brettsam mentioned:
I see, thanks for clarifying. If the problem is that the renewal fails because we already have the lock, then can't we handle that condition by ignoring that specific error?
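To make that suggestion concrete, here is a minimal sketch (not the actual PrimaryHostCoordinator code) of what ignoring that specific renewal error could look like, written against the Azure.Storage.Blobs SDK; the class, method, and the decision to return false instead of throwing are assumptions for illustration.

```csharp
// Hypothetical sketch: tolerate the specific 409 that shows up when the lease
// was released/re-acquired by another host on the same instance, instead of
// surfacing it as an error.
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs.Specialized;

public static class LeaseRenewal
{
    public static async Task<bool> TryRenewAsync(BlobLeaseClient leaseClient)
    {
        try
        {
            await leaseClient.RenewAsync();
            return true;
        }
        catch (RequestFailedException ex)
            when (ex.Status == 409 && ex.ErrorCode == "LeaseIdMismatchWithLeaseOperation")
        {
            // This is the error code seen in the logs above. Treat it as benign
            // and let the next acquire attempt take the lease again.
            return false;
        }
    }
}
```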
Work has been merged in 1.x. Need to make sure this isn't an issue in 2.0.
Related to #2326
Overnight I saw this error a number of times, but it caused enough of a delay (5-10 mins!) in the consumption of Service Bus messages that it triggered several application-level errors for us, as processing these messages in a timely fashion is critical. Currently these are v2 functions with Service Bus triggers deployed to an App Service that is scaled out to 2-N instances. Paul
Disregard my previous comment - after further investigation, these errors were just coincidental with another network issue outside of Azure.
I'm also seeing this error. This is the most recent instance:
I'm running on Functions v2, and all of the functions inside this app are HTTP-triggered and short-lived. I agree with the other people in this thread: if this isn't an issue, it shouldn't be logged as an error. Errors are something that we need to look at and not simply ignore (IMO).
Idea! Maybe use structured logging properties for the
I have a web app running Azure Functions and I am getting this every time I restart the web app or swap slots. 2020-09-15T10:01:12Z [Error] Singleton lock renewal failed for blob '.../host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2020-09-15T10:00:47.566Z (24432 milliseconds ago) with a duration of 3 milliseconds. The lease period was 15000 milliseconds.
@wpitallo any other side effects? This may happen if the primary host changes and the lease is stolen by another instance.
Not that I am aware of at this stage.
Same issue here with an Azure (~3) timer-triggered function;
The last successful lock time on this is weird. It doesn't seem to cause problems, it just fills App Insights with sev 3 errors.
Hi all, I am facing the same issue in my Azure Function app. It seems like the real problem is that in non-prod environments we are using a single blob storage account shared by multiple Azure Function apps. When a renew-lease operation is performed by one of the apps, then, because multiple apps are using the same blob storage, the "Singleton lock renewal failed for blob" error occurs. This in turn leads to a restart of the job host, and the function stops abruptly without any function error. I can conclude this based on my investigation: our production environment does not share a single blob storage account across multiple function apps, and we have never faced the "Singleton lock renewal failed for blob" error in that environment. So, as per my understanding, the solution to this error is to avoid sharing a single blob storage account between multiple Azure Function apps. TIA
I can confirm that this happens in environments with individual storage accounts for each function app as well. I always create a new storage account for every new function app I create. Sharing storage accounts introduces other problems, which is why I wouldn't recommend ever sharing storage accounts between function apps 👍
I'll be following this, since we also see this problem with a consumption plan Azure Function App.
We receive this error every time our durable function executes (we are using dedicated storage for this function and it is running in an isolated App Service Environment). Unfortunately this appears to halt the function execution after 80 seconds. So we have a durable function that will run for ~80 seconds, then we get the "Singleton lock renewal failed" message in the log traces and no further logging from our application code until it retries the durable function activity 30 minutes later. It cycles like this indefinitely, running for ~80 seconds every 30 minutes but never completing. It doesn't seem to be directly related to our application logic, because within that 80-second period the application logic can get further along (e.g. if we bump the thread count), and if we run the durable activity logic directly on a local instance of the Docker container it runs to completion. So it seems to be time-bound, and it prevents our application code from executing despite no errors in the application logic itself. We are running on version ~4, but downgrading to ~3 didn't seem to help.
I started seeing this after updating (function) apps (version ~4 / .NET 6) and updating all of the dependencies. There are only timer triggers (multiple) and Service Bus triggers (multiple), and there is one storage account. On .NET 3.1 and some older library versions I hadn't seen this logging; perhaps it's just some noise, and I haven't spotted yet whether it affects function execution in any way. Request [xxxxxxxxxxx] PUT https://name.blob.core.windows.net/azure-webjobs-hosts/locks/fname/host?comp=lease Error response [xxxxxxxxxxxx] 409 There is already a lease present. (00.0s) Those are logs of severity "Warning" in Insights.
I also started seeing this after updating (function) apps (version ~4 / .NET 6) and updating all of the dependencies. A single Service Bus trigger, 1 function, 1 storage account. I didn't see this before updating to Functions v4 and .NET 6.
This was affecting my Azure Functions v4 apps with .NET 6 and deployment slots. What I ended up discovering was that one of my swap slots was using the same storage account connection string as my production slot. Once I changed this, the lease problem stopped appearing in Application Insights.
@Bpflugrad - that's good info, thanks!
I am also seeing this error, although I am using a single storage account for a single function app. The issue appears when scaling out to multiple instances. I have just one BlobTrigger function and two TimerTrigger functions. Is there a way to avoid this error or at least make it not log as an exception in App Insights?
Same for me; we discovered this after our Log Analytics costs increased (!). Like @Bpflugrad, I found I had slots sharing the same storage account, but even after fixing this the issue remains.
Adding on: we also get this issue with any trigger type at random intervals. We tried to scale up and out, but this did not provide a solution. It is happening on multiple function apps with no correlation. We were advised that this is just noise and to ignore it, but it still shows as an error in Insights, which is not ideal.
It happens in function apps (version ~3 / .NET Core 3.1) too.
Also seeing this error (409 with 'LeaseIdMismatchWithLeaseOperation') on one of our function apps (runtime version ~4). Is there no easy way to silence these errors if they don't really matter?
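For those asking how to keep these out of Application Insights: below is a minimal sketch of an ITelemetryProcessor that drops the "Singleton lock renewal failed" traces. The class name and the message match are assumptions, and whether host-generated telemetry actually flows through custom processors depends on the Functions version and hosting model, so verify it filters these entries (or alternatively tune the relevant log level in host.json) before relying on it.

```csharp
// Hypothetical sketch: an Application Insights telemetry processor that swallows
// the noisy "Singleton lock renewal failed" traces instead of sending them.
using System;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

public class SingletonLeaseNoiseFilter : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    public SingletonLeaseNoiseFilter(ITelemetryProcessor next) => _next = next;

    public void Process(ITelemetry item)
    {
        // The renewal failure is logged as a trace; drop it and keep everything else.
        if (item is TraceTelemetry trace &&
            trace.Message != null &&
            trace.Message.StartsWith("Singleton lock renewal failed", StringComparison.Ordinal))
        {
            return;
        }

        _next.Process(item);
    }
}
```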
I see this event and then another thread starts processing the message. Any ideas how we can avoid the issue? Unless you code for idempotency, this will be a problem. For example, people will get invitation emails twice if the code doesn't exclude resending emails that were sent in the last minute. More details here: https://stackoverflow.com/questions/77070373/azure-functions-service-bus-trigger-settings-maxautolockrenewalduration-or-maxau
While investigating another issue I stumbled upon a lot of these 409s coming from renewing the primary host lock. I did some investigation and I believe that, because we do a 'fire and forget' with the host disposal, it's possible that a new host comes up and acquires the lock just before the old host releases it. That means that the next attempt to renew fails, as the lock has been released.
The error looks like:
Singleton lock renewal failed for blob 'locks/AppName/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2017-09-02T18:38:31.526Z (111006 milliseconds ago) with a duration of 57508 milliseconds. The lease period was 15000 milliseconds.
I think it's happening with something like the sequence below.
Note 1 -- this is all on a single instance, and the references to hosts are about the hosts being cycled via the ScriptHostManager.
Note 2 -- all of the host locks use the current machine id as the proposed lease id, which allows multiple hosts on the same machine to modify the lease.
Host1 -> acquire lock
Host1 -> renew lock
something triggers host recycle and creates a new host while disposing of the old host
Host2 -> acquire lock
Host1 -> release lock
Host2 -> renew lock -> throw
It's actually possible for several other combinations to happen (I've seen a race where Host1 renews right as it releases, also causing a throw).
Ultimately, these errors seem benign as they'll eventually recover and all will be well, but it causes a lot of errors in the logs (I see 100k+ 409 logs that involve the host lock over the last week).
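For illustration only, here is a minimal sketch of that sequence using the Azure.Storage.Blobs SDK. The blob, lease duration, and `proposedLeaseId` are placeholders (the real host derives the proposed lease id from the instance id); because both clients propose the same lease id, the second acquire succeeds while the first lease is still active, and the final renewal throws once the first client has released the lease.

```csharp
// Hypothetical repro of the race above: two "hosts" on one instance share the
// same proposed lease id, so the new host can acquire while the old host still
// holds the lease, and the old host's release breaks the new host's renewal.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

public static class HostLockRaceDemo
{
    // proposedLeaseId: a GUID-format string shared by both hosts on the instance.
    public static async Task RunAsync(BlobClient hostLockBlob, string proposedLeaseId)
    {
        BlobLeaseClient host1 = hostLockBlob.GetBlobLeaseClient(proposedLeaseId);
        BlobLeaseClient host2 = hostLockBlob.GetBlobLeaseClient(proposedLeaseId);

        await host1.AcquireAsync(TimeSpan.FromSeconds(15)); // Host1 -> acquire lock
        await host1.RenewAsync();                           // Host1 -> renew lock

        // Host recycle: the new host acquires with the same proposed lease id,
        // which blob storage allows while the old lease is still active.
        await host2.AcquireAsync(TimeSpan.FromSeconds(15)); // Host2 -> acquire lock

        // Fire-and-forget disposal of the old host releases the shared lease.
        await host1.ReleaseAsync();                         // Host1 -> release lock

        await host2.RenewAsync();                           // Host2 -> renew lock -> throws 409
    }
}
```

The exact 409 sub-code on that last call can vary with timing, but the shape of the failure matches the LeaseIdMismatchWithLeaseOperation errors in the log message above.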