The cold start problem
Whenever I talk about Azure Functions, the subject of "cold start" invariably causes concern. A detailed overview of cold starts in Azure Functions is available here, but the simple explanation is that the Azure Functions consumption plan adds and removes instances of the functions host dynamically, which means that when your function is triggered, there might not be an instance available to handle it immediately. If that's the case, a new instance of the functions host is started on demand, resulting in a brief delay before it handles its first request - this is called a "cold start". If your function app has been idle for more than about 20 minutes, you will likely experience a cold start the next time a function is triggered.
How long does a typical cold start take? In the early days of Azure Functions it could be quite painful - I often saw waits in the 20-30 second range. Things have got a lot better since, and my friend Mikhail Shilkov has done a brilliant job benchmarking Azure Functions cold start times in various configurations. For C# functions, cold starts are typically in the range of 2-3 seconds, although they can occasionally be as high as 10 seconds.
Does it even matter?
It's worth pausing to ask whether cold starts really matter for your application. After all, for many trigger types such as new messages on a queue, scheduled tasks, or new blobs appearing in blob storage, a human isn't sat there waiting for a web-page to load, and so the occasional added latency of a cold start might not be an issue.
Even your HTTP-triggered functions are not necessarily being called by a human. For example, if you're implementing a web-hook, the cold start time might not matter too much.
Obviously, if your functions implement APIs that are called from web-pages, then a cold start potentially introduces a poor user experience, which might be an issue for you. Of course, the cold start time is only part of the bigger picture of responsiveness - if your function code is slow, or it has downstream dependencies on slow external systems, then eliminating the cold start will only go part-way to addressing your performance issues.
But if cold starts are a problem, what can be done about them? Mikhail has provided some useful suggestions for reducing cold start times on his blog - for example, he shows that your choice of deployment technique and logging configuration can both have a measurable effect on cold start duration.
But in the rest of this article, I want to highlight a few other approaches you could consider if cold starts really do pose a problem for you. Can we avoid them altogether?
Workaround #1 - Warmup request
I've heard of a few people using a timer-triggered function to keep their Function App constantly warm. This feels a bit hacky to me, as it essentially exploits a loophole in the consumption pricing plan. I've not tried it myself, but I see no reason why it wouldn't work. You'd need to trigger it at least every 20 minutes (probably every 15 to be on the safe side), which works out at 2,880 invocations per month - a negligible cost.
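As a sketch, such a keep-warm timer could be declared with a function.json binding like this - the CRON expression fires every 15 minutes, matching the schedule discussed above, and the binding name is purely illustrative:

```json
{
  "bindings": [
    {
      "name": "keepWarmTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */15 * * * *"
    }
  ]
}
```

The function body itself can do nothing at all - merely being invoked is enough to keep an instance of the functions host alive.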
A more elegant variation on this theme would be to try to warm up your Function App just in time. Maybe you know that at a certain time in the morning the Function App is likely to be cold and so you wake it up just before you expect users to come online. Or maybe when a user logs into your system, or visits a certain webpage, you know that a function is likely to be triggered soon. In that case you could send a simple "warmup" request in advance of the real one. Here's an example of this technique in action.
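To make the "just in time" idea concrete, here's a minimal sketch in Python of firing a warmup request in the background when you anticipate a real call - say, when a user hits your login page. The URL is a hypothetical HTTP-triggered "warmup" function; the point is simply that the ping happens off the critical path and any failure is ignored.

```python
import threading
import urllib.request


def send_warmup(url: str, timeout: float = 5.0) -> None:
    """Fire a single GET at the function app and discard the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except OSError:
        # A failed warmup ping is harmless - the real request will still
        # work, it just might hit a cold instance.
        pass


def warmup_in_background(url: str) -> threading.Thread:
    """Send the warmup request without blocking the caller."""
    t = threading.Thread(target=send_warmup, args=(url,), daemon=True)
    t.start()
    return t


# Hypothetical usage: ping a cheap warmup endpoint when the user logs in,
# so the function host is warm by the time the real API call arrives.
# warmup_in_background("https://myapp.azurewebsites.net/api/warmup")
```

By the time the user performs the action that triggers the real function, a previously cold instance has had a few seconds' head start.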
Workaround #2 - App Service Plan
Many people are not aware that with Azure Functions, you don't have to host using the serverless "consumption" plan. If you prefer, you can just use a regular Azure App Service Plan, which comes with a fixed monthly fee per server instance, and use that to run your Function Apps.
With this option, you lose the benefits of per-second billing, and you also lose the rapid elastic scale (although an App Service Plan can be configured to scale out based on CPU or on a schedule).
However, you no longer need to worry about cold starts - your dedicated compute is always available. You also get the benefit that the 5 minute function duration limitation no longer applies.
A variation on this theme would be to take advantage of the fact that the Azure Functions runtime can be hosted in a Docker container. So you could host it on a VM running Docker, or run several instances on a Kubernetes cluster if you wanted. At the moment, though, you'd have to implement any autoscaling logic yourself if you needed automatic scale-out.
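As a rough sketch of what that looks like, a Dockerfile for a self-hosted function app could be as simple as the following - the base image name is taken from Microsoft's official Azure Functions images on mcr.microsoft.com, but the exact tag and the path of your build output will depend on your runtime version and project layout:

```dockerfile
# Hedged sketch: base image and wwwroot path assumed from the official
# Azure Functions Docker images; adjust the tag for your runtime version.
FROM mcr.microsoft.com/azure-functions/dotnet
COPY ./bin/publish /home/site/wwwroot
```

Because the container is always running, there is no host to spin up, and so no cold start - but also no consumption-style billing or automatic scale-out.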
Workaround #3 - Premium plan
Finally, what if we could have the best of both worlds? Imagine we could have some dedicated instances that were always on, to eliminate cold starts, but could still elastically scale out beyond that in the same way that the consumption plan does.
That's essentially what the Premium plan offers: you have at least two always-on worker instances, and above that it scales out dynamically. The plan is still in preview, but it offers a nice upgrade path from the consumption plan if you do need to avoid cold starts.
"Cold starts" are an inevitable consequence of the dynamic nature of the consumption hosting plan. They are not necessarily an issue for all applications or trigger types, so its worth thinking about how important it really is to avoid them. In this article I've presented a few ways you can go about mitigating or avoiding the cold start problem. And hopefully over time we'll continue to see performance improvements to Azure Functions cold start times.
For more reading on the topic of cold starts and scaling in Azure Functions, I highly recommend checking out the following articles:
- Colby Tresness - Understanding Serverless Cold Start
- Mikhail Shilkov - Cold Starts in Azure Functions
- Mikhail Shilkov - Reducing Cold Start Duration in Azure Functions
- James Randall - Azure Functions – Significant Improvements in HTTP Trigger Scaling