In a multi-tenant system, you have multiple different "tenants" (often "customers") who share the same compute resources. This can result in significant cost savings compared with deploying a completely separate instance of your application for every single customer (as well as reducing the operational overhead of managing lots of infrastructure).
There are some disadvantages to multi-tenancy. One is that a single tenant can flood the system with so much work that performance degrades for all the other tenants. This is sometimes referred to as the "noisy neighbour" problem, using the analogy of an apartment block where a noisy neighbour negatively impacts the experience of the other tenants.
A common situation where a tenant might cause a noisy neighbour issue is when there is potential for a "bulk ingestion" operation to take place. Maybe a new tenant is being onboarded and they want to import large amounts of data into the system. Or maybe your system exposes an integration API that tenants can directly call themselves. There is potential for them to unintentionally use it to create a denial of service attack on your system that affects all tenants.
This is a problem I've been focusing on quite a bit recently, and I have come to the conclusion that there is no single technique that solves the noisy neighbour problem. Instead, there is a collection of strategies that can be applied. In this post, I'll outline a few of these, along with some thoughts on how they might be applied.
1. Scaling out
In an ideal world, a noisy neighbour tenant would not be able to cause an issue because the system would dynamically scale out to handle the additional load. Serverless platforms like Azure Functions can already do this very well, detecting when load on an API or backlog on a queue has grown too large, and automatically provisioning additional servers. Similar capabilities exist elsewhere, such as the horizontal pod autoscaler in Kubernetes, or the built-in autoscaling for Azure App Service plans.
Of course, in the real world, scaling may still not be sufficient to handle a noisy neighbour issue. You probably want to set upper limits on the number of hosts you scale up to, to avoid paying huge cloud bills, and to avoid overloading databases which may not be able to cope with vastly increased numbers of concurrent connections (which can end up cancelling out any benefit you got from scaling out).
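To make the effect of an upper limit concrete, here's a minimal sketch of a capped scale-out decision. It is not how any particular autoscaler works internally; the function name, the parameters and the "backlog divided by per-instance capacity" rule are all assumptions made for illustration.

```python
import math


def desired_instances(backlog: int, per_instance_capacity: int,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Instances needed to work through `backlog`, clamped to a fixed range.

    All names here are hypothetical; a real autoscaler uses its own metrics.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    needed = math.ceil(backlog / per_instance_capacity)
    # The cap protects the cloud bill and downstream databases, but it also
    # means a big enough burst can no longer be absorbed by scaling alone.
    return max(min_instances, min(needed, max_instances))


print(desired_instances(backlog=250, per_instance_capacity=100))     # 3
print(desired_instances(backlog=50_000, per_instance_capacity=100))  # 10 (capped)
```

Once the cap is reached, any further load from the noisy tenant has to be dealt with by one of the other strategies below.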
2. Rate-limiting APIs
Another option at our disposal for mitigating noisy-neighbour issues is to add per-tenant rate-limiting to an API. With this approach, if a single tenant is calling an API endpoint too frequently, we could choose to reject some calls for that tenant, while allowing calls from other tenants to be accepted.
For example, you could return the HTTP 429 "Too Many Requests" response code, with the "Retry-After" header that tells the client to back off for a certain period of time. This could be useful in a bulk ingestion scenario, where the tool making the data ingestion API calls could respond to this "back-pressure" and reduce the rate at which it is calling your API.
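From the client's side, responding to that back-pressure might look something like the sketch below. The `send` callable is a hypothetical stand-in for whatever actually makes the HTTP call, and I've assumed that "Retry-After" carries a number of seconds (the header can also hold an HTTP date, which this sketch simply treats as "use the default delay").

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def send_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 5,
                      default_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, waiting and retrying whenever the server answers 429."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers = response
        if status != 429:
            break
        try:
            # Assumption: Retry-After is expressed in seconds.
            delay = float(headers.get("Retry-After", default_delay))
        except ValueError:
            delay = default_delay  # e.g. an HTTP date we don't try to parse
        sleep(delay)
        response = send()
    return response
```

If every attempt is rejected, the last 429 response is returned so the caller can decide what to do next.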
Obviously implementing per-tenant rate-limiting is not necessarily trivial. You'd need to identify that one tenant was indeed monopolising compute resource. One approach would be some form of "quota" where each tenant gets a certain number of permitted operations in a time period before their calls are rate-limited.
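As a rough illustration of the quota idea, here's a minimal in-memory sketch using a fixed time window per tenant. The `TenantQuota` class and its parameters are hypothetical, and a real implementation would need shared state (so that every host sees the same counts) and probably a smoother algorithm than a fixed window.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TenantQuota:
    """Allow each tenant `limit` calls per `window_seconds` (fixed window)."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # tenant id -> (start of current window, calls accepted in it)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def check(self, tenant_id: str) -> Tuple[int, Dict[str, str]]:
        """Return an (HTTP status, headers) pair for one incoming call."""
        now = self.clock()
        start, count = self._windows.get(tenant_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window has expired
        if count >= self.limit:
            retry_after = max(1, math.ceil(start + self.window - now))
            return 429, {"Retry-After": str(retry_after)}
        self._windows[tenant_id] = (start, count + 1)
        return 200, {}
```

Because the counts are keyed by tenant, a tenant that exhausts its quota starts receiving 429 responses while calls from every other tenant are still accepted.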
.NET 7 introduced built-in rate-limiting middleware for ASP.NET Core, which looks like it could greatly simplify the task of implementing rate limiting.
The downside of this approach is that you are effectively preventing your tenants from performing operations, which is not a great customer experience. So an alternative strategy would be to accept their requests wherever possible, but use queues to perform the work asynchronously.
3. Queue prioritisation
In theory the noisy neighbour problem is not such an issue when you are working asynchronously. Messages are placed into a queue, and the message handler will work through the backlog, eventually catching up. Scaling out is generally quite easy to achieve with queues. And the other tenants might not even notice that things took slightly longer than normal.
However, what happens if one tenant causes millions of items to be added to a queue, and it takes many hours to work through them? In this situation, other tenants are more likely to notice a degradation in service. So what are our options?
One option you might think of is having a queue per tenant. You could either process all of these queues in parallel, or take a "round-robin" approach where you poll each queue in turn, ensuring that each tenant gets an equal chance for their "top" message to be processed. There is unfortunately a big downside to this approach in that if you have a large number of tenants, the overhead of servicing many queues at once can itself be costly and resource intensive.
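The round-robin variant can be sketched in a few lines. Here I'm using in-memory deques as a stand-in for real per-tenant queues, and the function name and tenant names are made up for the example.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Tuple


def round_robin(queues: Dict[str, Deque[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (tenant, message) pairs, taking one message per tenant per pass."""
    active = deque(tenant for tenant, queue in queues.items() if queue)
    while active:
        tenant = active.popleft()
        queue = queues[tenant]
        yield tenant, queue.popleft()
        if queue:
            active.append(tenant)  # still has work, so rejoin the rotation


queues = {
    "noisy": deque(["n1", "n2", "n3", "n4"]),
    "quiet": deque(["q1"]),
}
print([message for _, message in round_robin(queues)])
# ['n1', 'q1', 'n2', 'n3', 'n4']
```

The quiet tenant's single message is handled second rather than fifth. With real queues, though, every pass means polling every tenant's queue, which is where the overhead mentioned above comes from.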
Another approach would be to use a "priority queue". A tenant that has submitted too much work has their new messages automatically placed into a "low priority" queue that is only serviced once the high priority queue has been drained. There are many possible variations of this approach, and it can be quite complicated to implement well.
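Here's a minimal sketch of one such variation, with just two tiers and a simple per-tenant allowance. The `TwoTierQueue` class is hypothetical, the queues are in-memory, and I've left out the periodic reset of the per-tenant counts that a real system would need.

```python
from collections import Counter, deque
from typing import Deque, Optional, Tuple

Message = Tuple[str, str]  # (tenant id, payload)


class TwoTierQueue:
    """Route a tenant's messages to a low-priority queue once they exceed
    `high_priority_allowance` submissions."""

    def __init__(self, high_priority_allowance: int) -> None:
        self.allowance = high_priority_allowance
        self.high: Deque[Message] = deque()
        self.low: Deque[Message] = deque()
        # Submissions per tenant; assumed to be reset periodically elsewhere.
        self.submitted: Counter = Counter()

    def enqueue(self, tenant_id: str, payload: str) -> None:
        self.submitted[tenant_id] += 1
        over = self.submitted[tenant_id] > self.allowance
        (self.low if over else self.high).append((tenant_id, payload))

    def dequeue(self) -> Optional[Message]:
        # The low-priority queue is only serviced once high priority is empty.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

One thing this sketch makes obvious is the starvation risk: if high-priority work never stops arriving, the low-priority queue is never serviced, which is part of why doing this well gets complicated.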
Another option would be to allow a single tenant to flood the queue with messages, but to rate-limit them at the point of handling the messages. If you have processed too many messages for a single tenant in the last hour, you "defer" any further messages for that tenant by reposting them back to the end of the queue (or storing them in some other way). This of course assumes that you know that there is valid work for other tenants present in the queue, so this can also be tricky to implement intelligently.
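A rough sketch of the deferral idea is shown below, again with an in-memory deque standing in for a real queue. The function name and the `per_tenant_limit` parameter are assumptions, and the "stop when everything left has been deferred" check is just one simple way of dealing with the case where no other tenant has work waiting.

```python
from collections import Counter, deque
from typing import Callable, Deque, Tuple

Message = Tuple[str, str]  # (tenant id, payload)


def drain_with_deferral(queue: Deque[Message], per_tenant_limit: int,
                        handle: Callable[[Message], None]) -> None:
    """Handle messages, reposting those from tenants that are over the limit.

    Over-limit messages are left in `queue` for the next time window.
    """
    processed: Counter = Counter()
    deferred_in_a_row = 0
    # If we've deferred as many messages in a row as the queue holds, every
    # remaining message belongs to an over-limit tenant, so stop for now.
    while queue and deferred_in_a_row < len(queue):
        message = queue.popleft()
        tenant_id = message[0]
        if processed[tenant_id] >= per_tenant_limit:
            queue.append(message)  # defer: repost to the end of the queue
            deferred_in_a_row += 1
            continue
        handle(message)
        processed[tenant_id] += 1
        deferred_in_a_row = 0


queue = deque([("noisy", "n1"), ("noisy", "n2"), ("noisy", "n3"),
               ("noisy", "n4"), ("quiet", "q1")])
handled = []
drain_with_deferral(queue, per_tenant_limit=2, handle=handled.append)
print([payload for _, payload in handled])  # ['n1', 'n2', 'q1']
print([payload for _, payload in queue])    # ['n3', 'n4']
```

With a real message broker you generally can't inspect the queue length this cheaply, which is one reason this is trickier in practice than the sketch suggests.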
4. Scheduled jobs
Another consideration in a multi-tenant system is what happens with scheduled jobs. It's not uncommon to have "cleanup" jobs that run out of hours (say 1:00am). The danger here is that sometimes a single tenant will have an unusually large amount of work to do on one particular day, potentially impacting other tenants.
A simple approach here is to page or time-box the scheduled task for each tenant. For example, you might decide to process up to a maximum of 10,000 cleanup items for a single tenant before moving on to the next tenant. You can then come back round to do additional batches for the busy tenant after every other tenant has had the opportunity to have their cleanup task run.
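The batching logic might look something like this minimal sketch. The `plan_cleanup` function is hypothetical and only works out the order of the batches; in a real job, each entry would be the point where you actually run the cleanup for that tenant.

```python
from typing import Dict, List, Tuple


def plan_cleanup(pending: Dict[str, int], batch_size: int) -> List[Tuple[str, int]]:
    """Return (tenant, items in batch) pairs, one batch per tenant per pass.

    `pending` maps each tenant to its number of outstanding cleanup items.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    remaining = dict(pending)
    schedule: List[Tuple[str, int]] = []
    while any(remaining.values()):
        for tenant, count in remaining.items():
            if count == 0:
                continue
            batch = min(batch_size, count)
            schedule.append((tenant, batch))  # stand-in for the real cleanup work
            remaining[tenant] = count - batch
    return schedule


print(plan_cleanup({"busy": 25_000, "small": 300}, batch_size=10_000))
# [('busy', 10000), ('small', 300), ('busy', 10000), ('busy', 5000)]
```

The small tenant's cleanup completes after the busy tenant's first batch, rather than waiting for all 25,000 items to be processed.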
5. Migrating tenants
In extreme circumstances you may decide that a particular tenant is causing so many problems that you want to move them onto an entirely separate system. It's possible that they are consuming so much resource because they are a major customer, and if so, you may be able to justify the cost of making them single tenant.
Of course, this assumes that you have the capability to migrate a tenant from one "deployment" of your application to another. This task is made much easier if their data is stored separately (e.g. in per-tenant databases), so that it can simply be re-attached to a different set of compute resources.
If you are running a multi-tenant system, you need to consider what the impact of a single tenant monopolising the compute resources will be on the other tenants. I've described a number of the techniques and strategies that I've either made use of or have considered in the multi-tenant systems I've worked on, but I'd be very interested to hear from others about what has (and hasn't) worked in your multi-tenant systems.