Which Database should I use in my Azure Serverless App?
If you're building a new application in Azure and want to use a "serverless" approach, what should you use as a database? Obviously, one of the key goals of "serverless" is to avoid having to manage your own servers, so the classic "IaaS" approach of installing a database on a Virtual Machine isn't a good fit. But there are still plenty of great options. I talked about this in my "Building Serverless Applications in Azure" course on Pluralsight, but things have moved on a bit since then so I thought it was worth revisiting the topic.
As I see it, in Azure there are three main database options to choose between:
- Relational databases - Azure SQL Database being the most obvious choice here
- Document databases - Azure Cosmos DB is Azure's offering in this space
- The budget option (or "poor man's" database) - You can also use Azure Storage as a primitive database for minimal cost
For many (if not most) software developers, relational databases are the most familiar, and they are often our go-to option for storing data. They have the advantage of allowing very flexible queries and joins between related entities (hence the name), but do require the schema to be designed up front, and modifying that schema requires some kind of migration to be performed.
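To make that trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a managed relational database. The `customers` and `orders` tables and their contents are hypothetical, invented purely for illustration. It shows a join across related entities, followed by an explicit migration step to add a column.

```python
import sqlite3

# In-memory SQLite stands in for a managed relational database here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (id, customer_id, total) VALUES
        (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# A join between related entities: total spend per customer.
spend_by_customer = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()

# Changing the schema is an explicit migration step that must run
# before any code can rely on the new column.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
```

The query itself needed no up-front planning, which is the flexibility relational databases are known for. The `ALTER TABLE` statement is the cost: every schema change has to be rolled out deliberately.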
Azure offers a choice of relational databases. The main one is Azure SQL Database, which is essentially a fully managed SQL Server in a PaaS offering. But there is also Azure Database for MySQL, Azure Database for MariaDB, and Azure Database for PostgreSQL available if you are more comfortable with working with one of those databases.
Azure SQL Database is a great choice for a serverless application if you do decide that a relational database is the right choice for you. It's really easy to create one and there are several pricing tiers to support everything from a very small and cheap test system, all the way up to a powerful large-scale production system.
Azure SQL Database makes it really easy to enable key features for production scenarios such as encryption at rest with customer managed keys, backing up (with point-in-time restore), and replication to another region. It even comes with a superb query performance insights blade in the Portal that can tell you which of your queries are performing poorly and what indexes could improve them.
One disadvantage of going for a relational database in a serverless Azure application is that it is a little bit trickier to use from Azure Functions. There aren't built-in bindings like there are for Cosmos DB or Azure Storage, so you need to write your own Entity Framework code to access the database.
Another interesting recent development is that there is now a "serverless" pricing tier for Azure SQL Database. This essentially means that if your database is idle for a certain period (at least an hour) it can hibernate to save you money. It can also automatically scale itself up (within predefined limits) to respond to additional load. This might sound perfect for any serverless application but it does come with some caveats.
First, if your database has gone to sleep, there will be a fairly significant "cold start" penalty to wake it up (resuming takes up to a minute). And secondly, if your database never goes to sleep, then this option can work out more expensive. So beware of having scheduled jobs that run every hour with this approach, as your database will never go to sleep.
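The effect of a scheduled job on auto-pause can be sketched with a small simulation. This is a deliberately simplified model, not how Azure SQL Database actually decides when to pause. It assumes the database is active at the start of the window, that each activity is instantaneous, and that the database pauses only once more than the auto-pause delay has elapsed since the last activity. The function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta


def paused_time(activity, window_start, window_end, autopause_delay):
    """Return the total time the database would spend paused in the window.

    Simplified model: the database pauses once `autopause_delay` has passed
    since the last activity, and resumes at the next activity.
    """
    paused = timedelta(0)
    last_active = window_start
    for moment in sorted(activity) + [window_end]:
        idle = moment - last_active
        if idle > autopause_delay:
            # Only the idle time beyond the delay is spent paused.
            paused += idle - autopause_delay
        last_active = moment
    return paused


day = datetime(2024, 1, 1)
end = day + timedelta(days=1)
delay = timedelta(hours=1)

# A job that runs every hour versus a single job at 02:00.
hourly_job = [day + timedelta(hours=h) for h in range(24)]
nightly_job = [day + timedelta(hours=2)]

paused_with_hourly_job = paused_time(hourly_job, day, end, delay)
paused_with_nightly_job = paused_time(nightly_job, day, end, delay)
```

Under these assumptions the hourly job leaves no paused time at all, while the single nightly job leaves the database paused for most of the day.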
Document databases are in many ways a perfect fit for serverless architectures. Because you don't need to predefine your schema up front, they allow you to rapidly iterate and evolve your application over time with minimal fuss. Azure Functions come with some built-in bindings to simplify the code needed to read and store data in a document database.
Although Azure has only one document database offering, Cosmos DB, it is an extremely flexible and powerful database. It even supports a variety of different APIs, so you can use (for example) the MongoDB API if you're more familiar with that.
One of the most interesting features of Cosmos DB for serverless applications is its concept of a "change feed". This allows you to easily create an Azure Function that can "subscribe" to all changes to documents in a collection. This makes it really easy to generate "materialized views" that allow you to optimize performance and reduce costs of queries.
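As a rough illustration of the materialized view idea, here is an in-memory sketch of the logic such a function might contain. It does not use the Cosmos DB SDK or the Azure Functions trigger. The order documents, their field names (`id`, `customerId`, `total`) and the helper functions are all assumptions made for the example. The sketch assumes each batch contains the latest version of every changed document.

```python
from collections import defaultdict


def apply_changes(view, changed_documents):
    """Fold a batch of changed order documents into a per-customer view.

    `view` maps customer id -> {order id -> order total}. Storing totals per
    order keeps the handler idempotent: seeing the same document twice, or a
    newer version of it, overwrites the entry instead of double counting.
    """
    for doc in changed_documents:
        view[doc["customerId"]][doc["id"]] = doc["total"]
    return view


def customer_spend(view, customer_id):
    """Answer the query from the view, without scanning every order."""
    return sum(view[customer_id].values())


view = defaultdict(dict)
apply_changes(view, [
    {"id": "o1", "customerId": "c1", "total": 20.0},
    {"id": "o2", "customerId": "c2", "total": 5.0},
])
# A later batch carries an updated version of o1 and a new order o3.
apply_changes(view, [
    {"id": "o1", "customerId": "c1", "total": 30.0},
    {"id": "o3", "customerId": "c1", "total": 12.5},
])
```

In a real application the view would be written back to another collection or store, so that reads hit a small pre-computed document rather than running an expensive query.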
When Cosmos DB originally came out, the pricing model scared a lot of developers off - the cheapest possible database was three times the cost of the cheapest Azure SQL Database. But things have improved greatly.
Firstly, there is a free tier, allowing you to use a certain amount of resources for free each month, which is great for testing and experimenting.
Secondly, Microsoft recently announced a serverless pricing model where billing is based only on the storage and operations you actually consume, which could be a good choice for spiky workloads.
Thirdly, you can scale Cosmos DB up and down on the fly, and there is even an "auto-scale" feature that will intelligently scale up and down to save money during idle periods, while meeting demand during peak times.
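To get a feel for when consumption-based billing beats provisioned throughput, a back-of-the-envelope calculation helps. The sketch below ignores storage costs, and the prices in it are hypothetical placeholders rather than actual Azure rates, so check the current pricing page before relying on any numbers. The function names are my own.

```python
HOURS_PER_MONTH = 730


def provisioned_cost(provisioned_ru_per_second, price_per_100_ru_hour):
    """Monthly cost when paying for reserved throughput, used or not."""
    return provisioned_ru_per_second / 100 * price_per_100_ru_hour * HOURS_PER_MONTH


def serverless_cost(request_units_consumed, price_per_million_ru):
    """Monthly cost when paying only for the request units consumed."""
    return request_units_consumed / 1_000_000 * price_per_million_ru


def break_even_request_units(provisioned_ru_per_second, price_per_100_ru_hour,
                             price_per_million_ru):
    """Monthly consumption at which both models cost the same."""
    monthly = provisioned_cost(provisioned_ru_per_second, price_per_100_ru_hour)
    return monthly / price_per_million_ru * 1_000_000


# Hypothetical prices, for illustration only.
PRICE_PER_100_RU_HOUR = 0.008
PRICE_PER_MILLION_RU = 0.25

always_on = provisioned_cost(400, PRICE_PER_100_RU_HOUR)
spiky = serverless_cost(10_000_000, PRICE_PER_MILLION_RU)
threshold = break_even_request_units(400, PRICE_PER_100_RU_HOUR, PRICE_PER_MILLION_RU)
```

With these placeholder prices, a workload that consumes 10 million request units a month is far cheaper on the consumption model, and provisioned throughput only wins once monthly consumption passes the break-even threshold.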
Using Azure Storage as a poor man's database
Some serverless applications have very simple storage requirements. Maybe you don't often update data, or maybe you don't need rich querying capabilities, and can just look things up by their id.
Azure Storage offers very cheap ways of storing data. For example you could just store data in blobs as JSON or XML files. Or you could use Table Storage, which allows you to store simple table-based documents with a composite key of a "row key" and a "partition key". I've used both options for several small websites and microservices which simply didn't need the cost or complexity of a full database.
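The composite key model is easy to picture with a tiny in-memory stand-in. This sketch does not call Azure Storage at all. The `TinyTable` class, its methods and the sample comment data are hypothetical, and only mimic the idea of addressing each entity by a partition key plus a row key.

```python
class TinyTable:
    """In-memory stand-in for a table keyed by (partition key, row key)."""

    def __init__(self):
        self._partitions = {}

    def upsert(self, partition_key, row_key, entity):
        """Insert the entity, or replace whatever is stored under the same keys."""
        self._partitions.setdefault(partition_key, {})[row_key] = dict(entity)

    def get(self, partition_key, row_key):
        """Point lookup with both keys known. Returns None if absent."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key):
        """Return every entity in one partition, ordered by row key."""
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]


# Blog comments, partitioned by post and keyed by a sortable comment id.
table = TinyTable()
table.upsert("post-42", "0002", {"author": "Dan", "text": "Second"})
table.upsert("post-42", "0001", {"author": "Marc", "text": "First"})
table.upsert("post-7", "0001", {"author": "Mark", "text": "Other post"})

first_comment = table.get("post-42", "0001")
post_comments = [entity["text"] for entity in table.query_partition("post-42")]
```

Choosing the keys is the main design decision with this approach: lookups by both keys or by partition are cheap, while anything resembling a join or an ad hoc filter has to be done in your own code.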
This approach can be a great starting point for a proof-of-concept app, and you can graduate later to a "proper" database as your needs change.
The Hybrid Approach
Of course, there's no reason why you have to pick just one of the above options. Especially if you are using a microservices architecture, each microservice can take its own approach, using the database most appropriate for the type of data it stores.
In fact, you may find that the best approach is hybrid, adding in services like Azure Cognitive Search, Azure Redis Cache, or Blob Indexer. So don't feel that you have to pick just one database type for storing all the data in your serverless application.
Another option I recently found is to run MongoDB Atlas in Azure. There is a free tier and fairly reasonable paying tiers. The cluster won't be in the same subscription as your serverless app, but it can be in the same region, so I'd hope the impact on performance wouldn't be too bad.
Marc Roussy
Thanks for the suggestion, it's not something I've used before.
Mark Heath
Is there any equivalent to 'Layers' for packaging up shared code or code with native extensions, like in AWS-land?
Dan Engel