In this post, I want to share a few of my favourite LINQ improvements since LINQ was originally added to .NET way back in 2007. When I created my new LINQ Best Practices Pluralsight course, I was able to make use of several of the new LINQ operators which previously you needed to create yourself or use MoreLINQ for.

MaxBy and MinBy

The original set of LINQ operators only included Min and Max, which is fine if you have a list of numbers and just want to find the biggest. But when you have a list of objects, such as these books, and want to find the book with the most pages, MaxBy (added in .NET 6) is what you really need:

var books = new[] {
	new { Author = "Chris Sainty", Title = "Blazor in Action", Pages = 400 },
	new { Author = "Andrew Lock", Title = "ASP.NET Core in Action", Pages = 832 },
	new { Author = "Martin Fowler", Title = "Patterns of Enterprise Application Architecture", Pages = 533 },
	new { Author = "Bill Wagner", Title = "Effective C#", Pages = 288 }
};

var longestBook = books.MaxBy(b => b.Pages);
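MinBy works the same way in the other direction. The snippet below is a minimal sketch using a cut-down copy of the book list (the variable names are my own for illustration). It also shows one detail worth knowing: for a sequence of reference types, MaxBy and MinBy return null on an empty sequence rather than throwing.

```csharp
using System;
using System.Linq;

// Cut-down copy of the book list, for illustration only
var books = new[]
{
    new { Title = "Blazor in Action", Pages = 400 },
    new { Title = "ASP.NET Core in Action", Pages = 832 },
    new { Title = "Effective C#", Pages = 288 },
};

// Both methods return the element itself, not the key used for comparison
var shortest = books.MinBy(b => b.Pages);
var longest = books.MaxBy(b => b.Pages);

Console.WriteLine($"Shortest: {shortest?.Title}");
Console.WriteLine($"Longest: {longest?.Title}");

// Anonymous types are reference types, so an empty sequence yields null
var none = books.Take(0).MaxBy(b => b.Pages);
Console.WriteLine($"Empty sequence gives null: {none is null}");
```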

Of course, the downside of having MaxBy as part of LINQ is that it spoiled one of my favourite examples showing the pitfalls of attempting to work around missing LINQ methods by constructing them yourself. The following (bad) example implements a DIY MaxBy equivalent whose worst-case performance is O(N²), because Max re-scans the entire sequence for every element that First inspects.

// don't do this!
var longestBook = books.First(b => b.Pages == books.Max(x => x.Pages));

The very powerful LINQ Aggregate method can be used to achieve the same result in a single pass, but I prefer the readability that comes with well-named methods where possible.

var longestBook = books.Aggregate((agg, next) => 
	next.Pages > agg.Pages ? next : agg);
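To make the cost of the DIY approach concrete, here is a small sketch that counts how often the selector runs. The page counts are sorted ascending so that the largest value comes last, which is the worst case. It also shows a simple fix for situations where MaxBy isn't available: hoist the Max call out of the predicate so it only runs once. The counter and variable names are purely illustrative.

```csharp
using System;
using System.Linq;

// Page counts in ascending order, so the maximum is found last (worst case)
var pages = new[] { 288, 400, 533, 832 };
var selectorCalls = 0;

// DIY version: Max is re-evaluated for every element that First inspects
var slow = pages.First(p => p == pages.Max(x => { selectorCalls++; return x; }));
Console.WriteLine($"DIY result: {slow}, selector calls: {selectorCalls}");

// Hoisted version: one pass to find the maximum, then at most one more to find the match
var max = pages.Max();
var fast = pages.First(p => p == max);
Console.WriteLine($"Hoisted result: {fast}");
```

With four elements, the DIY version invokes the selector 16 times (4 × 4), whereas the hoisted version touches each element at most twice.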

Append and Prepend

Another very simple method I needed for a couple of my LINQ challenges was the ability to prepend a single element to the start of a sequence. This is particularly helpful when you need to insert a "starting" value to a sequence of numbers.

In the snippet below, I had a sequence of TimeSpans that represented the times at which a runner finished laps of a circuit. However, for the problem I went on to solve, I needed to insert a zero TimeSpan at the beginning of the sequence. The Prepend method wasn't part of the original set of LINQ operators, but was added when .NET Core 1.0 came out (and is also available in .NET Framework 4.7.1).

var lapFinishTimes = "00:45,01:32,02:18,03:01,03:44,04:31,05:19,06:01,06:47,07:35"
	.Split(',')
	.Select(x => TimeSpan.Parse("00:" + x)) // "00:45" becomes "00:00:45" (hh:mm:ss)
	.Prepend(TimeSpan.Zero); // insert at the beginning of the sequence

This particular pipeline went on to use the MoreLINQ Pairwise extension method. Unfortunately, LINQ still doesn't include an equivalent of Pairwise (or the more general purpose concept of a "Window" operator). F# actually has both Seq.pairwise and Seq.windowed.
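If you'd rather not take a dependency on MoreLINQ, one way to approximate Pairwise with built-in operators is to Zip a sequence with itself, offset by one. The sketch below is not the pipeline from my challenge, just an illustration of the idea with a shortened input. It materializes the sequence with ToList first, because this approach enumerates the source twice.

```csharp
using System;
using System.Linq;

// Shortened version of the lap finish times, with a zero start time prepended
var finishTimes = "00:45,01:32,02:18"
    .Split(',')
    .Select(x => TimeSpan.Parse("00:" + x))
    .Prepend(TimeSpan.Zero)
    .ToList();

// Pair each time with its successor: (0:00, 0:45), (0:45, 1:32), (1:32, 2:18)
var lapDurations = finishTimes
    .Zip(finishTimes.Skip(1), (start, end) => end - start)
    .ToList();

Console.WriteLine(string.Join(", ", lapDurations.Select(d => d.TotalSeconds)));
// 45, 47, 46
```

Without the prepended zero, the first lap's duration would be lost, which is why Prepend was needed here.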

Chunk

Another extension method from MoreLINQ that I needed for one of my LINQ challenges was Batch. This batches elements from the original sequence together, which I used to process a text file that had a repeating structure of 7 lines of text.

Again, new in .NET 6 is the Chunk method, which in the example below will return a sequence of arrays of strings of 7 elements each (or fewer in the last chunk).

foreach (var chunk in File.ReadAllLines("example.txt").Chunk(7))
{
	// ...
}
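To see the behaviour of the final chunk without needing a file on disk, here is a small sketch that uses generated strings as a stand-in for the file's lines (16 of them, chosen so that the last chunk comes up short).

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the file contents: 16 lines of text
var lines = Enumerable.Range(1, 16).Select(i => $"line {i}").ToArray();

foreach (var chunk in lines.Chunk(7))
{
    // Each chunk is a string[], so Length and indexing are available
    Console.WriteLine($"{chunk.Length} lines, starting with '{chunk[0]}'");
}

var chunkSizes = lines.Chunk(7).Select(c => c.Length).ToArray();
Console.WriteLine(string.Join(", ", chunkSizes)); // 7, 7, 2
```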

Asynchronous Streams

I've already written about the new asynchronous streams capability of C#. These help you when you need to run asynchronous methods as part of generating a sequence. This means you work with IAsyncEnumerable<T> rather than IEnumerable<T>. As I mentioned in that series, there is a really helpful System.Linq.Async NuGet package which lets you work with LINQ-like chained operators when you have an IAsyncEnumerable<T>.
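As a quick reminder of what this looks like, here is a minimal sketch using only what's built into .NET. GetNumbersAsync is a hypothetical source I've made up for illustration, with Task.Delay standing in for real asynchronous work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical async source: each element is only available after an awaited operation
async IAsyncEnumerable<int> GetNumbersAsync(int count)
{
    for (var i = 1; i <= count; i++)
    {
        await Task.Delay(10); // stands in for real async work, such as a network call
        yield return i;
    }
}

// Without any extra packages, filtering means writing the loop by hand
var evens = new List<int>();
await foreach (var n in GetNumbersAsync(6))
{
    if (n % 2 == 0) evens.Add(n);
}

Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6
```

With the System.Linq.Async package installed, the hand-written loop could instead be expressed as a chain along the lines of GetNumbersAsync(6).Where(n => n % 2 == 0).ToListAsync().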

Performance

The final thing I want to mention is how impressive the performance gains in .NET have been since I released my original More Effective LINQ course back in 2016. Every version of .NET has got faster, and the recent .NET 7 is no exception, with some impressive LINQ improvements.

In my LINQ Best Practices course, I have a module devoted to performance, and the main focus is on avoiding common mistakes that can decrease performance. As part of my preparation for that module, I benchmarked a very simple mathematical LINQ pipeline to show that LINQ does add a small overhead compared to a regular for loop.

However, the point wasn't to suggest that people abandon LINQ - I think the situations that would call for that are very rare. Instead, I showed how for this particular scenario, parallelization is a much more effective way to speed up the code.
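To give a feel for the three approaches being compared, here is a rough sketch. This is not the actual benchmark code from the course: the workload (summing the squares of the even numbers) and all the names are assumptions chosen purely for illustration. For real measurements you'd want to use a proper benchmarking tool such as BenchmarkDotNet rather than timing this by hand.

```csharp
using System;
using System.Linq;

const int n = 1_000_000;

// Hypothetical workload: sum the squares of the even numbers from 1 to n

// 1. Regular LINQ pipeline
long linqSum = Enumerable.Range(1, n)
    .Where(x => x % 2 == 0)
    .Select(x => (long)x * x)
    .Sum();

// 2. Equivalent plain for loop
long loopSum = 0;
for (var i = 1; i <= n; i++)
{
    if (i % 2 == 0) loopSum += (long)i * i;
}

// 3. PLINQ: the same pipeline, with AsParallel spreading the work across cores
long parallelSum = Enumerable.Range(1, n)
    .AsParallel()
    .Where(x => x % 2 == 0)
    .Select(x => (long)x * x)
    .Sum();

Console.WriteLine($"All three agree: {linqSum == loopSum && loopSum == parallelSum}");
```

Note how small the change from the first version to the third is. That's a big part of the appeal: you keep the readability of the LINQ pipeline and still get to use multiple cores.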

That remains the case (for my test pipeline at least) even with .NET 7, but what can be clearly seen is how much faster .NET 6 and 7 are than the old framework: the LINQ version takes about 70% of the time it does on .NET Framework 4.8, and the for loop just 54%. If you're still stuck on legacy .NET Framework 4.x then it is well worth prioritizing getting onto .NET 7 for performance reasons alone.

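| Method      | Runtime            | Mean     | Error    | StdDev   | Median   | Ratio | RatioSD |
|-------------|--------------------|----------|----------|----------|----------|-------|---------|
| SumLinq     | .NET 6.0           | 544.4 ms | 10.76 ms | 12.81 ms | 539.1 ms | 0.71  | 0.02    |
| SumLinq     | .NET 7.0           | 534.8 ms | 10.52 ms | 15.75 ms | 531.5 ms | 0.70  | 0.03    |
| SumLinq     | .NET Framework 4.8 | 776.0 ms | 12.53 ms | 10.46 ms | 775.2 ms | 1.00  | 0.00    |
| SumForLoop  | .NET 6.0           | 435.0 ms | 7.36 ms  | 7.23 ms  | 433.6 ms | 0.54  | 0.01    |
| SumForLoop  | .NET 7.0           | 434.5 ms | 8.55 ms  | 8.00 ms  | 432.3 ms | 0.54  | 0.01    |
| SumForLoop  | .NET Framework 4.8 | 803.2 ms | 15.20 ms | 14.22 ms | 801.5 ms | 1.00  | 0.00    |
| SumParallel | .NET 6.0           | 206.4 ms | 3.98 ms  | 5.58 ms  | 207.0 ms | 0.72  | 0.03    |
| SumParallel | .NET 7.0           | 216.7 ms | 2.65 ms  | 2.47 ms  | 216.6 ms | 0.74  | 0.03    |
| SumParallel | .NET Framework 4.8 | 288.2 ms | 5.68 ms  | 9.81 ms  | 283.1 ms | 1.00  | 0.00    |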

Summary

LINQ may not have changed a lot since it originally came out in 2007, but it has picked up several useful additions and improvements along the way. Hopefully there are more to come, like Pairwise and a few other convenience methods that are still missing, such as ToDelimitedString and CountBy.

Want to learn more about LINQ? Be sure to check out my Pluralsight course LINQ Best Practices.