
Discussion: Preview of improved Functions http scaling behavior #38

Closed
davidebbo opened this issue Mar 12, 2018 · 13 comments

Comments

@davidebbo
Contributor

Discussion thread for Azure/app-service-announcements#90.

@nzthiago
Member

@davidebbo - do you know if the flag is still needed? Or is it now enabled by default for all functions?

@davidebbo
Contributor Author

Great question: we actually had it enabled by default for a few days, but had to turn it off due to an issue. So for now, yes, you still need the flag. It should become the default again in another few weeks. /cc @suwatch

@Fabian-Schmidt

Does this change only apply to Consumption-based plans?
Can App Service plans also benefit from the changes?

@davidebbo
Contributor Author

@Fabian-Schmidt yes, it's only for Consumption.

@rikvandenberg

rikvandenberg commented May 14, 2018

To add to the discussion: a blog post from @JamesRandall led me here.
https://www.azurefromthetrenches.com/azure-functions-significant-improvements-in-http-trigger-scaling/

We seem to be seeing a sharp performance drop in the form of small latency peaks in our HTTP functions: responses that were consistently ~6ms suddenly increase to ~600ms.

@davidebbo
Contributor Author

@rikvandenberg please provide more details. Are you referring to cold start, or is that the response time you always see? Is this under a high-load scenario?

@rikvandenberg

@davidebbo I'll try my best to explain what we are seeing.

Intro
We have two simple Azure Functions that do the following (a simplified sketch of the distance calculation follows this list):

  1. DistanceFunction: calculates the geographical distance between one origin (lat/lon) and a maximum of 25 destination locations (lat/lon). Uses System.Device.Location.GeoCoordinate.

  2. RouteFunction: calculates the driving distance between one origin (lat/lon) and a maximum of 25 destination locations (lat/lon). It uses the Google Maps API and caches the results in a Redis cache.
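For context, here is a minimal sketch of the kind of calculation DistanceFunction performs. This is illustrative only, not our actual code; the class and method names are made up, and it assumes .NET Framework, where System.Device.Location is available:

```csharp
// Illustrative sketch: straight-line distance from one origin to up to 25
// destinations, using System.Device.Location.GeoCoordinate.
// Requires a reference to the System.Device assembly (.NET Framework).
using System.Collections.Generic;
using System.Device.Location;
using System.Linq;

public static class DistanceCalculator
{
    private const int MaxDestinations = 25;

    // Returns the distance in meters from the origin to each destination.
    public static IReadOnlyList<double> DistancesFrom(
        GeoCoordinate origin, IEnumerable<GeoCoordinate> destinations)
    {
        return destinations
            .Take(MaxDestinations)                // cap at 25 destinations
            .Select(d => origin.GetDistanceTo(d)) // distance in meters
            .ToList();
    }
}
```

In the real function this sits behind an HTTP trigger that parses the lat/lon pairs from the request.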

We call both functions asynchronously, at the same time, with the exact same origin and destination parameters, from an ASP.NET Web API application, and use Task.WaitAll(tasks, 500) to time out after 500ms.

We require this timeout to prevent the user request from blocking: if the functions don't respond in time, we pre-emptively continue the request, since upon refresh the route information will most likely already be in the cache. The calling pattern is sketched below.
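A simplified sketch of that calling pattern (the URLs and names are placeholders, not our real endpoints, and error handling is omitted):

```csharp
// Simplified sketch of the caller: invoke both functions in parallel and wait
// at most 500 ms, continuing without the results on a timeout.
using System.Net.Http;
using System.Threading.Tasks;

public class FunctionCaller
{
    private static readonly HttpClient Http = new HttpClient();

    // Placeholder URLs; the real function endpoints are not shown here.
    private const string DistanceUrl = "https://example.azurewebsites.net/api/DistanceFunction";
    private const string RouteUrl    = "https://example.azurewebsites.net/api/RouteFunction";

    public static (string Distance, string Route)? TryGetBoth(string query)
    {
        Task<string> distanceTask = Http.GetStringAsync(DistanceUrl + query);
        Task<string> routeTask    = Http.GetStringAsync(RouteUrl + query);

        // Wait at most 500 ms for both calls; WaitAll returns false on timeout.
        // (Error handling for faulted tasks omitted for brevity.)
        bool completed = Task.WaitAll(new Task[] { distanceTask, routeTask }, 500);
        if (!completed)
        {
            // Pre-emptively continue without results; the background calls still
            // finish and populate the cache for the next request/refresh.
            return null;
        }

        return (distanceTask.Result, routeTask.Result);
    }
}
```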

Performance Test

  • Over the span of ~9 minutes we received 265 requests, ~130 for each function. Not a high-load scenario.
  • We always used the EXACT same origin and destinations, so the functions would always use the CACHED results.
  • The Live Metrics Stream in Application Insights indicated that our function app had 10 cloud role instances at the time.

Performance Test Results

  • Of those 265 requests, 13 peaked above 500ms (results from Application Insights analytics).
  • 12 of those requests came from the Route function, as it has the heavier workload.
  • The peaks don't seem to correlate with a "cold" cloud_roleinstance, as the instances had already been warmed up.
  • WEBSITE_HTTPSCALEV2_ENABLED=1 was set.

Possible Causes
It seems to me that the switch to another cloud_roleinstance is sometimes the cause of these "random" peaks, but I can't think of any other plausible explanation 🤔

If you have any suggestions on how to approach these tests to give you more insights as well, please let me know and I'll see what I can do.

@davidebbo
Contributor Author

/cc @suwatch who is the expert.

@rikvandenberg What you're observing is likely the flip side of the new scale behavior. It scales out faster (you're getting 10 instances), but at the same time you end up hitting more cold starts (one per instance). We may still need to tweak the system further to balance things.

BTW, WEBSITE_HTTPSCALEV2_ENABLED is now on by default, which is likely why you saw a change. You can also try setting WEBSITE_HTTPSCALEV2_ENABLED=0 to revert to the previous behavior (one way to do that is shown below).
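For example, the app setting can be changed with the Azure CLI as well as through the portal (the app and resource group names below are placeholders):

```bash
# Revert one function app to the previous scaling behavior (placeholder names).
az functionapp config appsettings set \
  --name my-function-app \
  --resource-group my-resource-group \
  --settings WEBSITE_HTTPSCALEV2_ENABLED=0
```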

Would you say that your functions are CPU bound, or more I/O bound? I would think the latter, as waiting for the Google Maps result should take very few resources. In that sense, it is odd that the system decided to scale out this much. @suwatch will dig into it further.

@suwatch

suwatch commented May 16, 2018

It seems to me that the switch to another cloud_roleinstance is sometimes the cause of these "random" peaks, but I can't think of any other plausible explanation

@rikvandenberg Thanks for reporting. This was a result of unwarranted cold starts. Our current scale implementation has a flaw when it comes to low load with occasional bursts of concurrent requests. The spikes caused us to scale out to more instances. Since the spikes were not sustained, our scale-in logic kicked in and removed the instances. This happened alternately every 1-2 minutes, and as a result a moving set of instances was assigned to the function. That explains the 10 instances in Application Insights: they were not active at the same time, but rather a changing set over a 10-minute period. Each newly assigned instance caused a cold start (a spike of long latency).

The good news is that we have improved this logic by making the scale-in less aggressive in this specific situation. The ETA is about 2 weeks. We will let you know when you can retry your scenario.

@rikvandenberg

@suwatch Thanks for the quick response! I look forward to testing this improvement.

@suwatch

suwatch commented Jun 3, 2018

@rikvandenberg The fix rollout is taking longer than expected. It will likely be another week before the fix is available everywhere. If you are eager to experiment, try creating a function app in the West Central US region, where the fix is already available. Otherwise, wait about a week.

@suwatch

suwatch commented Jun 20, 2018

@rikvandenberg The improvement has been fully rolled out. Do try it when you get a chance and provide any feedback.

@rikvandenberg

rikvandenberg commented Jun 21, 2018

@suwatch We just had our sprint planning yesterday and have some time available to do a small test. I'll try to use the same test scenario and will let you know when we have something.

fabiocav closed this as completed Dec 5, 2019