
Closely related to my recent post about how to list and download the contents of a GitHub repo in C#, I also wanted to be able to list the files in a publicly shared Google Drive folder and download them from C#.

There is a Google Drive API we can use for this purpose, but first we need an API key.

You can generate one for use with your application at the Google Developer Console. Obviously you should avoid exposing this key to the end users of your application.

Now we need the folder id of the Google Drive folder that's been shared with you. Usually you'll receive a sharing URL that looks something like this, with the folder id as part of the URL:

https://drive.google.com/drive/folders/4wj98k3bbQsLRuiWm-PrQlRfkrEP6lbNg?usp=sharing

Let's see some code to list the files in the folder. We can do that with a call to the /drive/v3/files endpoint, passing in a query for any files that have the folder id as one of their parents. The results are paged, so we need to keep calling it until we no longer get a next page token.

The resulting JSON object has an array of files, each of which has an id and name, as well as a kind (which is usually drive#file) and a mimeType. In my code sample I just print out the id and name of each file in the folder.

// Newtonsoft.Json (JsonConvert, JObject, JArray) is used for the JSON parsing
using System;
using System.Net.Http;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

var httpClient = new HttpClient();
var publicFolderId = "4wj98k3bbQsLRuiWm-PrQlRfkrEP6lbNg";
var googleDriveApiKey = "Your Google Drive API key";
var nextPageToken = "";
do
{
    var folderContentsUri = $"https://www.googleapis.com/drive/v3/files?q='{publicFolderId}'+in+parents&key={googleDriveApiKey}";
    if (!String.IsNullOrEmpty(nextPageToken))
    {
        folderContentsUri += $"&pageToken={nextPageToken}";
    }
    var contentsJson = await httpClient.GetStringAsync(folderContentsUri);
    var contents = (JObject)JsonConvert.DeserializeObject(contentsJson);
    nextPageToken = (string)contents["nextPageToken"];
    foreach (var file in (JArray)contents["files"])
    {
        var id = (string)file["id"];
        var name = (string)file["name"];
        Console.WriteLine($"{id}:{name}");
    }
} while (!String.IsNullOrEmpty(nextPageToken));

And what if you want to download the contents of a file? You can do this by constructing a URL with the file id as shown in the code snippet below. This works for publicly shared files - no authentication is needed.

Here's an example of downloading the contents as a string:

var id = "<file id>"; // the id of the file you want to download
var downloadUri = $"https://drive.google.com/uc?export=download&id={id}";
var contents = await httpClient.GetStringAsync(downloadUri);
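
If the file isn't text (an image or a zip archive, say), the same URL can be downloaded as raw bytes and written to disk instead. Here's a minimal sketch, assuming a local file name of your choosing:

// download the same URL as raw bytes and save to disk (the file name is just an example)
var bytes = await httpClient.GetByteArrayAsync(downloadUri);
await File.WriteAllBytesAsync("downloaded-file.bin", bytes); // File lives in System.IO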


I recently needed to write some code in C# that could list the contents of a GitHub repository, and download the contents of specific files.

There is actually a GitHub API, which means that this simple URL (https://api.github.com/repos/markheath/azure-deploy-manage-containers/contents) can be used to list all the files in one of my GitHub repos.

Calling this from C# is relatively straightforward, with the exception that you must provide a user agent header or you'll get a 403 response.

Here's some simple sample code that gets the contents of a GitHub repo and displays the URL to download each file (using Newtonsoft.Json to help with the JSON parsing). Notice that for directories, you have to call another URL to get the files in that directory.

// Newtonsoft.Json is used for the JSON parsing; ProductInfoHeaderValue lives in System.Net.Http.Headers
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.UserAgent.Add(
    new ProductInfoHeaderValue("MyApplication", "1"));
var repo = "markheath/azure-deploy-manage-containers";
var contentsUrl = $"https://api.github.com/repos/{repo}/contents";
var contentsJson = await httpClient.GetStringAsync(contentsUrl);
var contents = (JArray)JsonConvert.DeserializeObject(contentsJson);
foreach (var file in contents)
{
    var fileType = (string)file["type"];
    if (fileType == "dir")
    {
        var directoryContentsUrl = (string)file["url"];
        // use this URL to list the contents of the folder
        Console.WriteLine($"DIR: {directoryContentsUrl}");
    }
    else if (fileType == "file")
    {
        var downloadUrl = (string)file["download_url"];
        // use this URL to download the contents of the file
        Console.WriteLine($"DOWNLOAD: {downloadUrl}");
    }
}

If you need to fetch the contents of a specific branch, you can simply append ?ref=branchname as a query string parameter.
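
For example, assuming a branch called develop (just an illustrative name):

var branch = "develop"; // example branch name
var branchContentsUrl = $"https://api.github.com/repos/{repo}/contents?ref={branch}";
var branchContentsJson = await httpClient.GetStringAsync(branchContentsUrl);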



If you're building a new application in Azure and want to use a "serverless" approach, what should you use as a database? Obviously, one of the key goals of "serverless" is to avoid having to manage your own servers, so the classic "IaaS" approach of installing a database on a Virtual Machine isn't a good fit. But there are still plenty of great options. I talked about this in my "Building Serverless Applications in Azure" course on Pluralsight, but things have moved on a bit since then so I thought it was worth revisiting the topic.

As I see it, in Azure there are three main database options to choose between:

  • Relational databases - Azure SQL Database being the most obvious choice here
  • Document database - Azure Cosmos DB is Azure's offering in this space
  • The budget option (or "poor man's" database) - You can also use Azure Storage as a primitive database for minimal cost

Relational Databases

For many (if not most) software developers, relational databases are the most familiar, and they are often our go-to option for storing data. They have the advantage of allowing very flexible queries and joins between related entities (hence the name), but do require the schema to be designed up front, and modifying that schema requires some kind of migration to be performed.

Azure offers a choice of relational databases. The main one is Azure SQL Database, which is essentially a fully managed SQL Server in a PaaS offering. But there are also Azure Database for MySQL, Azure Database for MariaDB, and Azure Database for PostgreSQL available if you are more comfortable working with one of those databases.

Azure SQL Database is a great choice for a serverless application if you do decide that a relational database is the right choice for you. It's really easy to create one and there are several pricing tiers to support everything from a very small and cheap test system, all the way up to a powerful large-scale production system.

Azure SQL Database makes it really easy to enable key features for production scenarios such as encryption at rest with customer-managed keys, backups (with point-in-time restore), and replication to another region. It even comes with a superb Query Performance Insight blade in the Portal that can tell you which of your queries are performing poorly and which indexes could improve them.

One disadvantage of going for a relational database in a serverless Azure application is that it is a little bit trickier to use from Azure Functions. There aren't built-in bindings like there are for Cosmos DB or Azure Storage, so you need to write your own Entity Framework code to access the database.
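
As a rough illustration of the "write your own data access code" point, here's a minimal sketch of querying Azure SQL Database from an HTTP-triggered function. I'm using plain ADO.NET (Microsoft.Data.SqlClient) rather than Entity Framework just to keep the sketch short, and the "SqlConnectionString" app setting and Orders table are made-up examples:

// minimal sketch: an HTTP-triggered function querying Azure SQL Database directly
// assumes a "SqlConnectionString" app setting and a hypothetical Orders table
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Data.SqlClient;

public static class GetOrderCount
{
    [FunctionName("GetOrderCount")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req)
    {
        var connectionString = Environment.GetEnvironmentVariable("SqlConnectionString");
        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();
        using var command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection);
        var count = (int)await command.ExecuteScalarAsync();
        return new OkObjectResult(count);
    }
}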

Another interesting recent development is that there is now a "serverless" pricing tier for Azure SQL Database. This essentially means that if your database is idle for a certain period (at least an hour) it can hibernate to save you money. It can also automatically scale itself up (within predefined limits) to respond to additional load. This might sound perfect for any serverless application but it does come with some caveats.

First, if your database has gone to sleep, there will be a fairly significant "cold start" penalty to wake it up (resuming takes up to a minute). And secondly, if your database never goes to sleep, then this option can work out more expensive. So beware of having scheduled jobs that run every hour with this approach, as your database will never go to sleep.

Document Databases

Document databases are in many ways a perfect fit for serverless architectures. Because you don't need to predefine your schema up front, they allow you to rapidly iterate and evolve your application over time with minimal fuss. Azure Functions come with some built-in bindings to simplify the code needed to read and store data in a document database.
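
As a sketch of what one of those bindings looks like, a Cosmos DB input binding can hand a document straight to an HTTP-triggered function without any explicit data access code. The database, collection, connection setting and Customer type here are all invented for the example:

// minimal sketch of a Cosmos DB input binding (in-process Azure Functions,
// Microsoft.Azure.WebJobs.Extensions.CosmosDB package); all names are made-up examples
[FunctionName("GetCustomer")]
public static IActionResult Run(
    [HttpTrigger(AuthorizationLevel.Function, "get", Route = "customers/{id}")] HttpRequest req,
    [CosmosDB("MyDatabase", "Customers", ConnectionStringSetting = "CosmosConnection",
        Id = "{id}", PartitionKey = "{id}")] Customer customer)
{
    // the binding fetches the document for us; we just decide what to return
    return customer == null
        ? (IActionResult)new NotFoundResult()
        : new OkObjectResult(customer);
}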

Although Azure only has a single document database offering - Cosmos DB - it is an extremely flexible and powerful database. It even supports a variety of different APIs, including allowing you to use (for example) the MongoDB API if you're more familiar with that.

One of the most interesting features of Cosmos DB for serverless applications is its concept of a "change feed". This allows you to easily create an Azure Function that can "subscribe" to all changes to documents in a collection. This makes it really easy to generate "materialized views" that allow you to optimize performance and reduce costs of queries.
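
Here's a rough sketch of what subscribing to the change feed looks like with the Cosmos DB trigger in an in-process Azure Function (again, the database, collection and connection setting names are invented):

// minimal sketch of a change feed subscription via the Cosmos DB trigger
// (Microsoft.Azure.WebJobs.Extensions.CosmosDB package); names are made-up examples
[FunctionName("OrdersChangeFeed")]
public static void Run(
    [CosmosDBTrigger("MyDatabase", "Orders", ConnectionStringSetting = "CosmosConnection",
        LeaseCollectionName = "leases", CreateLeaseCollectionIfNotExists = true)]
        IReadOnlyList<Document> changedDocuments,
    ILogger log)
{
    // update a materialized view, cache or search index based on the changed documents
    log.LogInformation($"{changedDocuments.Count} documents changed");
}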

When Cosmos DB originally came out, the pricing model scared a lot of developers off - the cheapest possible database was three times the cost of the cheapest Azure SQL Database. But things have improved greatly.

Firstly, there is a free tier - allowing you to use a certain amount of resources for free each month which is great for testing and experimenting.

Secondly, Microsoft recently announced a serverless pricing model, where billing is based only on storage and the operations you actually perform, which could be a good choice for spiky workloads.

Thirdly, you can scale Cosmos DB up and down on the fly, and there is even an "auto-scale" feature that will intelligently scale up and down to save money during idle periods, while meeting demand during peak times.

Using Azure Storage as a poor man's database

Some serverless applications have very simple storage requirements. Maybe you don't often update data, or maybe you don't need rich querying capabilities, and can just look things up by their id.

Azure Storage offers very cheap ways of storing data. For example you could just store data in blobs as JSON or XML files. Or you could use Table Storage, which allows you to store simple table-based documents with a composite key of a "row key" and a "partition key". I've used both options for several small websites and microservices which simply didn't need the cost or complexity of a full database.
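
As a rough sketch of how little code this can involve (using the Azure.Data.Tables SDK; the table, keys and properties here are made-up examples):

// requires the Azure.Data.Tables NuGet package; all names here are made-up examples
var connectionString = "<storage account connection string>";
var tableClient = new TableClient(connectionString, "UserSettings");
await tableClient.CreateIfNotExistsAsync();

// store a simple entity keyed by partition key + row key
var entity = new TableEntity("user123", "preferences")
{
    { "Theme", "dark" },
    { "ItemsPerPage", 25 }
};
await tableClient.UpsertEntityAsync(entity);

// look it up again by its composite key
var retrieved = await tableClient.GetEntityAsync<TableEntity>("user123", "preferences");
Console.WriteLine(retrieved.Value["Theme"]);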

This approach can be a great starting point for a proof-of-concept app, and you can graduate later to a "proper" database as your needs change.

The Hybrid Approach

Of course, there's no reason why you have to pick just one of the above options. Especially if you are using a microservices architecture, each microservice can take its own approach, using the most appropriate database for the type of data it is storing.

In fact, you may find that the best approach is a hybrid one, adding in services like Azure Cognitive Search, Azure Redis Cache, or the Blob Indexer. So don't feel that you have to pick just one database type for storing all the data in your serverless application.