
LINQ provides a very elegant and powerful way to work with sequences of data in C#. You can combine the LINQ "operators" (such as Select or Where) to form "pipelines" that filter and transform elements of IEnumerable<T> sequences.

But one thing that can be a bit tricky is introducing asynchronous calls into your LINQ pipeline. Perhaps generating the initial sequence itself requires asynchronous calls. Or perhaps some of the filtering and mapping stages in the pipeline need to perform asynchronous operations. And of course it's quite common to want to consume the output of a LINQ pipeline by performing asynchronous operations, possibly in parallel.

All of this is possible, and a relatively recent addition to .NET helps here: a feature called "async streams" (introduced with C# 8 and .NET Core 3.0), which brings the IAsyncEnumerable<T> interface. It is similar to IEnumerable<T>, but supports working with asynchronous streams of data.

I'm hoping to write a few posts on this topic, and we'll get started by discussing the situations where you might want to consider using IAsyncEnumerable<T>. We'll also see the C# language features that allow us to generate and consume asynchronous sequences thanks to some new capabilities of the yield and foreach keywords.

Let's start by focusing on the "top" of a LINQ pipeline. What if the method that produces my sequence needs to make some asynchronous calls? In this case we have two main options: use the new IAsyncEnumerable<T>, or just return a Task<IEnumerable<T>>.

Should I return Task<IEnumerable<T>> or IAsyncEnumerable<T>?

First things first, I am not generally a fan of returning an IEnumerable<T> from a method, as it requires the caller to consider deferred execution. When we receive an IEnumerable<T> we can't assume that the work required to generate each value in the sequence has happened yet, and enumerating the sequence twice may repeat all of that work (or, for some sources, may not be supported at all).
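To make the deferred execution concern concrete, here is a small sketch. The `GetNumbers` iterator and the `GenerationCount` counter are hypothetical, purely for illustration: enumerating the returned IEnumerable<int> twice runs the iterator body twice, whereas materializing it once with ToList gives an IReadOnlyCollection<int> that can be enumerated as often as you like without repeating the work.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the iterator body starts running (illustration only)
    public static int GenerationCount;

    // An iterator method: none of the body runs until the caller starts enumerating
    public static IEnumerable<int> GetNumbers()
    {
        GenerationCount++;
        for (var i = 1; i <= 3; i++)
        {
            yield return i;
        }
    }

    public static (int LazyRuns, int MaterializedRuns) Run()
    {
        GenerationCount = 0;
        var lazy = GetNumbers();   // nothing has executed yet
        _ = lazy.Sum();            // first enumeration runs the body
        _ = lazy.Count();          // second enumeration runs it all over again
        var lazyRuns = GenerationCount;

        GenerationCount = 0;
        IReadOnlyCollection<int> materialized = GetNumbers().ToList(); // body runs exactly once here
        _ = materialized.Sum();    // enumerates the in-memory list, not the iterator
        _ = materialized.Count;
        return (lazyRuns, GenerationCount);
    }
}
```

Calling `DeferredExecutionDemo.Run()` returns `(2, 1)`: two runs of the iterator body for the lazy sequence, one for the materialized collection.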

For that reason, I tend to return IReadOnlyCollection<T> for all situations where I'm actually returning an in-memory collection of items. I've written more on this here. But the question still remains - when should we return an IAsyncEnumerable<T> instead of just using Task<IReadOnlyCollection<T>>?

For me, the key difference is whether moving to the next element of the sequence might require further asynchronous work. If all the asynchronous work is done up front, then I think the Task option is just fine.

Here's a very common example, where the method must first make a network request to fetch some JSON, which it deserializes. Once that's been done, we can return anything that implements IEnumerable<T> (here, a List<Customer>, which also implements IReadOnlyCollection<Customer>), allowing it to be easily used in a LINQ to Objects pipeline.

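// requires System.Net.Http.Json for GetFromJsonAsync;
// httpClient is assumed to be an HttpClient field with its BaseAddress set
public async Task<IReadOnlyCollection<Customer>> GetCustomers()
{
    var customers = await httpClient.GetFromJsonAsync<List<Customer>>("/api/customers");
    // all the customers are in memory by the time we return from this method
    // no real need to use IAsyncEnumerable<T> here
    // GetFromJsonAsync returns null for a JSON "null" body, so fall back to an empty list
    return customers ?? new List<Customer>();
}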
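Because the whole collection is already in memory, the caller needs just one await before carrying on with an ordinary LINQ to Objects pipeline. Here is a minimal self-contained sketch of that; the `Customer` record's properties and the in-memory data (standing in for the HTTP call) are assumptions for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical Customer shape; the post doesn't show its properties
public record Customer(string Name, string Country);

public static class CustomerQueries
{
    // Stand-in for the HTTP call: all the async work happens before the collection is returned
    public static async Task<IReadOnlyCollection<Customer>> GetCustomers()
    {
        await Task.Delay(10); // simulated network latency
        return new List<Customer>
        {
            new Customer("Alice", "UK"),
            new Customer("Bob", "US"),
            new Customer("Carol", "UK"),
        };
    }

    public static async Task<List<string>> GetUkCustomerNames()
    {
        // One await up front, then a plain synchronous LINQ pipeline
        var customers = await GetCustomers();
        return customers
            .Where(c => c.Country == "UK")
            .Select(c => c.Name)
            .ToList();
    }
}
```

With this sample data, `GetUkCustomerNames()` produces `["Alice", "Carol"]`.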

But sometimes additional asynchronous work is needed as we iterate. Imagine our Customers API returns pages of results, and we'd like to receive a Customer iterator that calls the API to fetch each page on an as-needed basis (rather than fetching all pages up front).

We might want to do something like this, but it won't compile, because you can't use the yield keyword if you're returning a Task<T>:

// WILL NOT COMPILE
public async Task<IEnumerable<Customer>> GetCustomers()
{
    var continuationToken = "";
    do
    {
        var page = await httpClient.GetFromJsonAsync<CustomerPage>
            ($"/api/customers?continuationToken={continuationToken}");
        foreach (var c in page.Customers)
        {
            yield return c;
        }
        continuationToken = page.ContinuationToken;	
    } while (!String.IsNullOrEmpty(continuationToken));
}

This is the scenario in which IAsyncEnumerable<T> is helpful. With one small change to our method signature (returning IAsyncEnumerable<Customer> instead of Task<IEnumerable<Customer>>), we can do exactly what we wanted. Essentially, it means we can use yield return in a method that also makes asynchronous calls.

public async IAsyncEnumerable<Customer> GetCustomers()
{
    var continuationToken = "";
    do
    {
        var page = await httpClient.GetFromJsonAsync<CustomerPage>
            ($"/api/customers?continuationToken={continuationToken}");
        foreach (var c in page.Customers)
        {
            yield return c;
        }
        continuationToken = page.ContinuationToken;
    } while (!String.IsNullOrEmpty(continuationToken));
}
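To see that pages really are fetched on demand, here is a self-contained sketch that swaps the HTTP call for a hypothetical in-memory `FakeCustomerApi` (three pages of two customers each, identified by made-up continuation tokens) and counts how many page requests are made. The iterator has the same structure as the method above; the consumer simply stops early.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical page shape, modeled loosely on the post's CustomerPage (customers are plain strings here)
public record CustomerPage(List<string> Customers, string ContinuationToken);

public class FakeCustomerApi
{
    // Three in-memory pages keyed by continuation token ("" = first page, "" as next token = no more pages)
    private readonly Dictionary<string, CustomerPage> pages = new Dictionary<string, CustomerPage>
    {
        [""] = new CustomerPage(new List<string> { "Alice", "Bob" }, "p2"),
        ["p2"] = new CustomerPage(new List<string> { "Carol", "Dave" }, "p3"),
        ["p3"] = new CustomerPage(new List<string> { "Erin", "Frank" }, ""),
    };

    public int PageRequests { get; private set; }

    // Stand-in for httpClient.GetFromJsonAsync<CustomerPage>(...)
    private async Task<CustomerPage> GetPageAsync(string continuationToken)
    {
        await Task.Delay(10); // simulated network latency
        PageRequests++;
        return pages[continuationToken];
    }

    // Same structure as GetCustomers above: each page is requested only when it's needed
    public async IAsyncEnumerable<string> GetCustomers()
    {
        var continuationToken = "";
        do
        {
            var page = await GetPageAsync(continuationToken);
            foreach (var c in page.Customers)
            {
                yield return c;
            }
            continuationToken = page.ContinuationToken;
        } while (!string.IsNullOrEmpty(continuationToken));
    }
}

public static class PagingDemo
{
    // Takes the first 'count' customers, then reports how many page requests that needed
    public static async Task<(List<string> Taken, int Requests)> TakeFirst(int count)
    {
        var api = new FakeCustomerApi();
        var taken = new List<string>();
        await foreach (var c in api.GetCustomers())
        {
            taken.Add(c);
            if (taken.Count == count) break; // stop early: later pages are never fetched
        }
        return (taken, api.PageRequests);
    }
}
```

In this sketch, taking the first three customers makes two page requests, while enumerating all six makes three.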

This is in fact exactly how several of the Azure SDKs now work. To list all the blobs in an Azure Storage container, you can call the GetBlobsAsync method, which returns an AsyncPageable<T>. This is a specialized implementation of IAsyncEnumerable<T> that lets you either fetch a page at a time, or iterate through every item without needing to know or care when you've moved on to the next page.
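The "page at a time or item at a time" idea is straightforward to sketch yourself. The `SimplePageable<T>` class below is hypothetical and much simpler than the real Azure.Core type (which offers its page view through an `AsPages` method); it just shows how one source can expose both an async sequence of pages and a flattened async sequence of items. The fetch delegate and the sample data are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified imitation of the "pageable" idea; NOT the Azure.Core AsyncPageable<T> implementation
public class SimplePageable<T> : IAsyncEnumerable<T>
{
    // Fetches one page for a continuation token ("" = first page) and returns the next token ("" = no more pages)
    private readonly Func<string, Task<(IReadOnlyList<T> Values, string ContinuationToken)>> fetchPage;

    public SimplePageable(Func<string, Task<(IReadOnlyList<T> Values, string ContinuationToken)>> fetchPage)
    {
        this.fetchPage = fetchPage;
    }

    // Page-at-a-time view: each element is a whole page of results
    public async IAsyncEnumerable<IReadOnlyList<T>> AsPages()
    {
        var token = "";
        do
        {
            var (values, next) = await fetchPage(token);
            yield return values;
            token = next;
        } while (!string.IsNullOrEmpty(token));
    }

    // Item-at-a-time view: flattens the pages, so callers never see page boundaries
    public async IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default)
    {
        await foreach (var page in AsPages())
        {
            foreach (var item in page)
            {
                cancellationToken.ThrowIfCancellationRequested();
                yield return item;
            }
        }
    }
}

public static class PageableDemo
{
    // Two hypothetical pages of customer names served from memory
    public static SimplePageable<string> CreateCustomers() =>
        new SimplePageable<string>(async token =>
        {
            await Task.Delay(10); // simulated network latency
            if (token == "")
            {
                return (new List<string> { "Alice", "Bob" }, "p2");
            }
            return (new List<string> { "Carol" }, "");
        });

    // Consumes the same source both ways: page sizes first, then individual items
    public static async Task<(List<int> PageSizes, List<string> Items)> Summarize()
    {
        var pageable = CreateCustomers();
        var pageSizes = new List<int>();
        await foreach (var page in pageable.AsPages())
        {
            pageSizes.Add(page.Count);
        }
        var items = new List<string>();
        await foreach (var name in pageable)
        {
            items.Add(name);
        }
        return (pageSizes, items);
    }
}
```

Here `Summarize()` sees page sizes of 2 and 1 through the page view, and the three names in order through the item view.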

Consuming IAsyncEnumerable<T>

One of the challenges with IAsyncEnumerable<T> is that you need to consume it in a slightly different way to a regular IEnumerable<T>.

C# 8 introduced a new form of foreach that allows us to loop through an IAsyncEnumerable<T> and perform actions on each element. It's as simple as adding the await keyword before foreach:

await foreach(var c in GetCustomers())
{
    Console.WriteLine(c);
}
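If you're curious what await foreach does for you, it's roughly equivalent to the hand-written loop below: get an async enumerator, await MoveNextAsync (which returns a ValueTask<bool>) until it reports false, read Current, and finally await DisposeAsync. This is a simplified sketch rather than the exact code the compiler emits, and `GetNames` is a hypothetical source used only for illustration.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitForeachExpansion
{
    // A tiny hypothetical async source used only for illustration
    public static async IAsyncEnumerable<string> GetNames()
    {
        foreach (var name in new[] { "Alice", "Bob" })
        {
            await Task.Delay(10); // pretend each item needs async work
            yield return name;
        }
    }

    // Roughly what: await foreach (var c in source) { results.Add(c); } does
    public static async Task<List<string>> ConsumeManually(IAsyncEnumerable<string> source)
    {
        var results = new List<string>();
        var enumerator = source.GetAsyncEnumerator();
        try
        {
            // Each step can await, which is what lets the producer do async work between items
            while (await enumerator.MoveNextAsync())
            {
                results.Add(enumerator.Current);
            }
        }
        finally
        {
            // Async disposal lets the iterator clean up, even if the loop exits early
            await enumerator.DisposeAsync();
        }
        return results;
    }
}
```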

If you're wondering about whether you can create LINQ pipelines by chaining methods like Select and Where with IAsyncEnumerable<T>, that is possible, but I'll save that for a future post.

Avoid returning Task<IAsyncEnumerable<T>>

The last thing I want to mention in this article is to avoid returning Task<IAsyncEnumerable<T>>. There are some situations where you might be tempted to do this, but it makes the calling code unnecessarily convoluted, and there's a simple alternative.

Imagine there's already a method that returns an IAsyncEnumerable<T> like this:

public async IAsyncEnumerable<Customer> GetCustomersFromApi(string apiKey)

And you want to create a higher-level method that fetches the API key and then just returns the IAsyncEnumerable<Customer>. You might write something like this:

// avoid this!
public async Task<IAsyncEnumerable<Customer>> GetCustomers()
{
    var apiKey = await GetApiKey();
    return GetCustomersFromApi(apiKey);	
}

However, this forces the consumer to write cumbersome code with an additional await:

await foreach(var customer in await GetCustomers()) 

There's a relatively easy workaround: make your method an async iterator itself, using await foreach and yield return. (In fact, I suspect there may be more than one way of doing this, so let me know in the comments if there's a better alternative.)

public async IAsyncEnumerable<Customer> GetCustomers()
{
    var apiKey = await GetApiKey();
    await foreach (var c in GetCustomersFromApi(apiKey))
    {
        yield return c;
    }
}
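Here is a self-contained version of that workaround you can run. `GetApiKey`, `GetCustomersFromApi` and the `ApiKeyRequests` counter are hypothetical stand-ins (customers are plain strings), used only to show that callers now need a single await foreach.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class CustomerService
{
    // Counts calls to GetApiKey, to show when the key is actually fetched
    public static int ApiKeyRequests;

    // Hypothetical: stands in for fetching an API key from configuration or a secret store
    public static async Task<string> GetApiKey()
    {
        await Task.Delay(10);
        ApiKeyRequests++;
        return "demo-key";
    }

    // Hypothetical stand-in for the existing GetCustomersFromApi method
    public static async IAsyncEnumerable<string> GetCustomersFromApi(string apiKey)
    {
        foreach (var name in new[] { "Alice", "Bob" })
        {
            await Task.Delay(10); // simulated per-item network call
            yield return $"{name} (via {apiKey})";
        }
    }

    // The wrapper: await the key, then re-yield every customer from the inner sequence
    public static async IAsyncEnumerable<string> GetCustomers()
    {
        var apiKey = await GetApiKey();
        await foreach (var c in GetCustomersFromApi(apiKey))
        {
            yield return c;
        }
    }

    // Callers need a single await foreach, with no extra await on the method call itself
    public static async Task<List<string>> ListCustomers()
    {
        var results = new List<string>();
        await foreach (var customer in GetCustomers())
        {
            results.Add(customer);
        }
        return results;
    }
}
```

One consequence worth knowing: because GetCustomers is now an async iterator, GetApiKey is not called when GetCustomers() is invoked, only once the caller starts enumerating, and each separate enumeration fetches the key again.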

Next steps

So far in this article, we've seen how you can generate and consume an IAsyncEnumerable<T>, as well as when to use it in preference to returning a Task<IEnumerable<T>>.

Next up, I want to discuss situations in which the stages in your LINQ pipeline might need to perform asynchronous operations, and we'll see how IAsyncEnumerable<T> can be useful in those situations as well.

Want to learn more about LINQ? Be sure to check out my Pluralsight course More Effective LINQ.