This year I enjoyed solving the Advent of Code puzzles once again, and one of the recurring themes was the need to deal with coordinates: 2D, 3D, and even 4D (as well as hexagonal coordinates again).

Today I thought I'd share a slightly long and rambling tale of a rabbit-hole I went down solving one of the puzzles and a few of the things I discovered along the way.

### Storing coordinates

In .NET there are many options for storing coordinates, such as `Point`, although that brings in an unwanted dependency on `System.Drawing`. There are also some `Vector` classes kicking around (including a 3D one), although I didn't need or want floating-point coordinates in this case.

I could also have chosen an `int[]`, which is flexible enough to store any number of dimensions, but arrays only get reference equality, so they can't be used as the key of a `HashSet`, which I needed for several puzzles. And so `ValueTuple<int,int,int>` was the obvious choice, and is what I used initially in all the puzzles this year.
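Value tuples get structural equality and hashing for free, which is exactly what a `HashSet` key needs. A minimal sketch:

```csharp
using System;
using System.Collections.Generic;

var visited = new HashSet<(int X, int Y, int Z)>();
visited.Add((1, 2, 3));
visited.Add((1, 2, 3)); // duplicate - tuples compare by value, so this is ignored
Console.WriteLine(visited.Count);               // 1
Console.WriteLine(visited.Contains((1, 2, 3))); // True
```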

### ValueTuple limitations

For the most part, value tuples in C# are fine, but they do have a few rough edges. For example, tuple deconstruction doesn't work in LINQ expressions, meaning you have to either use the ugly `Item1` and `Item2` names, or explicitly declare the names everywhere (e.g. `(int X, int Y)`), which can get a bit repetitive.
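To illustrate, with a hypothetical sequence of 2D points:

```csharp
using System;
using System.Linq;

var points = new[] { (1, 2), (3, 4) };

// Not allowed: you can't deconstruct the tuple in the lambda:
// var sums = points.Select(((x, y)) => x + y);

// Either use the ugly Item1/Item2 names...
var sums1 = points.Select(p => p.Item1 + p.Item2).ToArray();

// ...or explicitly re-declare the element names to get X and Y back
var sums2 = points.Select(((int X, int Y) p) => p.X + p.Y).ToArray();

Console.WriteLine(string.Join(",", sums1)); // 3,7
```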

I also wanted to add my own custom methods, such as adding together two coordinates, or enumerating all the "neighbours" of a point. Of course this could be achieved with simple extension methods on an `(int,int,int)` tuple:

```csharp
public static (int X, int Y, int Z) Add(this (int X, int Y, int Z) a,
    (int X, int Y, int Z) b)
    => (a.X + b.X, a.Y + b.Y, a.Z + b.Z);
```

But for the code I was writing it would be really convenient to have a few additional characteristics for the type I used to store coordinates. I wanted it to implement `IEnumerable<int>` (which `ValueTuple<int,int,int>` doesn't) and for the coordinate types for 2D, 3D and 4D to share a common base class or interface so I could write generic algorithms that worked against coordinates in any number of dimensions.
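As a sketch of why that matters (the helper name here is my own invention, not from the original code): once coordinates are just sequences of `int`, something like Manhattan distance can be written once for any number of dimensions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Works for 2D, 3D and 4D alike, because it only enumerates the components
static int ManhattanDistance(IEnumerable<int> a, IEnumerable<int> b)
    => a.Zip(b, (p, q) => Math.Abs(p - q)).Sum();

Console.WriteLine(ManhattanDistance(new[] { 1, 2, 3 }, new[] { 4, 6, 3 })); // 7
```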

So to clean up my code a little I tried a quick experiment to create my own `Coord` class.

### Making a custom Coordinate class

My first idea was super simple. Just store the coordinate values in an `int[]`. That way I could very easily implement `IEnumerable<int>`, and support any arbitrary number of dimensions.

I don't have the original version of my `Coord` class anymore, but it was something along these lines, with a bit of LINQ thrown in to implement `Equals` and `GetHashCode` for an arbitrary number of dimensions. I knew I needed `Equals` and `GetHashCode` because I was storing instances in a `HashSet`.

```csharp
// n.b. this code has some issues - don't copy this!
public class Coord : IEnumerable<int>
{
    private readonly int[] coords;
    public int this[int index] => coords[index];
    public Coord(int x, int y) { coords = new[] { x, y }; }
    public Coord(int x, int y, int z) { coords = new[] { x, y, z }; }
    public Coord(IEnumerable<int> c) { coords = c.ToArray(); }
    public override bool Equals(object other)
    {
        if (other is Coord ca)
            return coords.Zip(ca.coords).All(x => x.First == x.Second);
        return false;
    }
    public override int GetHashCode() => coords.Aggregate((a, b) => a ^ b);
    public IEnumerator<int> GetEnumerator() =>
        ((IEnumerable<int>)coords).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => coords.GetEnumerator();
}
```

Nice and simple, and although I hadn't particularly thought about performance, I wasn't expecting it to be awful. However, it was terrible. Switching from `(int,int,int)` to `Coord` slowed down my solution by almost 100 times!

### Performance optimization round one

After a bit of experimentation, I realized that the main source of my performance woes was the implementation of `Equals` and `GetHashCode`. I also suspected that switching to a `struct` would help, and I abandoned the idea of using an `int[]` in favour of storing each dimension as a separate `int`.

This would mean creating separate types for 2D, 3D and 4D coordinates, but they could at least share a common interface (structs are not allowed to inherit from each other in .NET), and could still implement `IEnumerable<int>`.
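A minimal sketch of that shape (the type names here are my own, not from the final code):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var c = new Coord3(1, 2, 3);
Console.WriteLine(string.Join(",", c)); // 1,2,3

// Common interface so generic algorithms can accept any dimensionality
public interface ICoord : IEnumerable<int> { }

public readonly struct Coord3 : ICoord
{
    public readonly int X, Y, Z;
    public Coord3(int x, int y, int z) => (X, Y, Z) = (x, y, z);

    public IEnumerator<int> GetEnumerator()
    {
        yield return X;
        yield return Y;
        yield return Z;
    }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```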

This let me rewrite `Equals` and `GetHashCode` in code so simple it surely had to be extremely fast, right?

```csharp
public override bool Equals(object other)
{
    if (other is Coord ca)
        return x == ca.x && y == ca.y && z == ca.z;
    return false;
}

public override int GetHashCode() =>
    x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
```

Well, to my surprise, despite being much faster, it was still horribly slow compared to plain old `ValueTuple<int,int,int>`. What was I missing?

### Proper hash codes

Turns out my hash code algorithm was stupid. The hash code of an integer in .NET is just the value of that integer, and XORing integers together produces the same result regardless of the order. So the hash codes of the coordinates (1,2,3), (3,2,1), (1,3,2) and so on were all identical. This really hurts the performance of a `HashSet` if you are storing lots of values that have hash collisions.
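A quick demonstration of the collisions (XOR is both commutative and self-cancelling):

```csharp
using System;

static int XorHash(int x, int y, int z) => x ^ y ^ z;

// Every permutation of the same values lands in the same bucket...
Console.WriteLine(XorHash(1, 2, 3)); // 0
Console.WriteLine(XorHash(3, 2, 1)); // 0
Console.WriteLine(XorHash(1, 3, 2)); // 0
// ...and equal components cancel out entirely
Console.WriteLine(XorHash(7, 7, 5)); // 5
```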

This led me to explore the hash code generation used by `ValueTuple<int,int,int>`.

The first source code I found revealed this implementation at its core:

```csharp
internal static class HashHelpers
{
    public static readonly int RandomSeed =
        new Random().Next(int.MinValue, int.MaxValue);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Combine(int h1, int h2)
    {
        // RyuJIT optimizes this to use the ROL instruction
        // Related GitHub pull request: dotnet/coreclr#1830
        uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
        return ((int)rol5 + h1) ^ h2;
    }
}
```

This greatly improved overall performance, but it was still not quite as fast as just using `(int,int,int)`. I think the actual .NET Core hashing algorithms used by `ValueTuple` can be found here, but in the end I decided that this very simple implementation from Jon Skeet on Stack Overflow (who else?) would be fast and good enough for my needs:

```csharp
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        hash = hash * 23 + x;
        hash = hash * 23 + y;
        hash = hash * 23 + z;
        return hash;
    }
}
```
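For what it's worth, modern .NET (Core 2.1 onwards) also ships `System.HashCode`, which does the mixing and per-process seeding for you, so today I'd probably reach for that rather than hand-rolling anything:

```csharp
using System;

int Hash(int x, int y, int z) => HashCode.Combine(x, y, z);

// Combine is order-sensitive, so permutations no longer collide
// systematically the way plain XOR does
Console.WriteLine(Hash(1, 2, 3) == Hash(3, 2, 1)); // almost certainly False
Console.WriteLine(Hash(1, 2, 3) == Hash(1, 2, 3)); // True (stable within a process)
```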

### Performance optimization round two

By this stage, I'd accomplished my goal of making a `Coord` type that made my code more generic and readable, and performed reasonably well. But annoyingly it still wasn't quite as fast as `ValueTuple`.

I got a little bit more of a speedup by directly implementing `IEquatable<Coord>`, as suggested here.
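The key point is that the strongly typed `Equals(Coord)` overload lets `EqualityComparer<Coord>.Default` avoid boxing the struct on every lookup. A sketch of the shape (using the field names from earlier in the post):

```csharp
using System;

var a = new Coord(1, 2, 3);
var b = new Coord(1, 2, 3);
Console.WriteLine(a.Equals(b)); // True - and no boxing via this overload

public readonly struct Coord : IEquatable<Coord>
{
    public readonly int x, y, z;
    public Coord(int x, int y, int z) => (this.x, this.y, this.z) = (x, y, z);

    // HashSet<Coord> calls this via EqualityComparer<Coord>.Default
    public bool Equals(Coord other) => x == other.x && y == other.y && z == other.z;
    public override bool Equals(object obj) => obj is Coord c && Equals(c);
    public override int GetHashCode() =>
        unchecked(((17 * 23 + x) * 23 + y) * 23 + z);
}
```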

But at that point I was running out of ideas. Even pre-calculating the hash in the constructor didn't speed me up at all, and a few other off-the-wall ideas couldn't quite make my `Coord` type as fast as just using `(int,int,int)`.

However, I suspect that part of the difference was that I wasn't doing proper benchmarking. My `Coord` type was compiled in debug mode, whereas `ValueTuple` would have been a release build. So it's quite possible that my `Coord` type can actually match `ValueTuple` in a fair fight.

Obviously BenchmarkDotNet would be the ideal tool to use if I really wanted to compare the two approaches properly.

One of the goals of creating my own `Coord` type was to make useful helper methods directly available. One of those was an `Add` method. This is obviously a good candidate for operator overloading, which can be achieved in C# with the following syntax:

```csharp
public static Coord operator +(Coord a, Coord b)
{
    return new Coord(a.x + b.x, a.y + b.y, a.z + b.z);
}
```

### Tuple deconstructing

One new technique I was able to apply was "tuple deconstruction". This basically enables you to "unpack" the elements of the struct into their own named variables, just like you can with a regular `ValueTuple`. All you need to do is implement a `Deconstruct` method like this:

```csharp
public void Deconstruct(out int x, out int y, out int z)
{
    x = this.x;
    y = this.y;
    z = this.z;
}
```

With this in place you can write code like this:

```csharp
var (a, b, c) = myCoordinate;
```

I also added some implicit conversion operators, making it easy to switch between my `Coord` type and `ValueTuple<int,int,int>`:

```csharp
public static implicit operator (int, int, int)(Coord c) =>
    (c.x, c.y, c.z);
public static implicit operator Coord((int X, int Y, int Z) c) =>
    new Coord(c.X, c.Y, c.Z);
```

This allows me to write code like the following, and benefit from the more concise C# syntax of value tuples:

```csharp
Coord pos = (1, 6, 2);
```

So I eventually achieved my goal of a `Coord` type to use instead of `ValueTuple`, which made my code read a little better and opened the door to writing more generic code for different numbers of dimensions.

But it did come at a slight performance penalty, which raises the interesting question of what matters most: performance or readability?

The good news is that in many cases, it's not a tradeoff you need to worry about.

First of all, performance and readability are not necessarily at odds - much of the time, the simpler your code is, the better both its performance and readability will be. In addition, the more readable your code is, the easier it is to spot inefficiencies in its structure and ways to improve its performance.

Secondly, not all the code you write needs to be performance-tuned to a high degree. It turned out that certain methods on the type I created were being called millions of times a second in a tight loop, so even small inefficiencies resulted in big slow-downs.

This is why profiling your code is so important before attempting to improve performance. Find out which bits of code are actually taking the most time, and focus your efforts on improvement there.

### Lessons learned

Obviously this whole exercise was just for a throwaway fun puzzle, but I learned a lot in the process, which is one of the benefits of doing something like Advent of Code.

I certainly learned a few things about how to get fast performance in a `HashSet`, and this exercise also highlighted the value of having good unit tests. I could very quickly try out different implementations of my `Coord` class and be sure I hadn't broken anything, as well as being able to use the unit tests as a rudimentary form of benchmarking.

By the way, here's the source code for the `Coord` class. Sadly I never did get round to extending it with 2D and 4D versions, which was a key reason for making it in the first place, and I also wanted to create a `Grid` class that provided convenience methods to access elements in a grid by their coordinates.

And of course, I'm sure some of you will be able to let me know in the comments some ways to improve performance further, so I look forward to reading those.

Want to learn more about LINQ? Be sure to check out my Pluralsight course LINQ Best Practices.

Mark, I have a simple class I developed for NAudio to read file, write file, play file and play sound data from memory. Playing sound from memory took some digging around. Yes, there are lots of examples but often too complicated. I wanted the fewest number of lines of code. I only tested with a single audio channel. I would like to send you the class file as a contribution. I am sure that others will find it extremely useful in its simplicity, especially the code to play sound data from memory. You can reach me at [email protected]. Feel free to reach out.

Bruce Krell

Why your Coord isn't a struct (like Point is)? All its members are primitives what makes it a perfect candidate.

Alexander Batishchev

it is a struct

Mark Heath

Cool, way to go! The only time you have type definition above says it's a class though :)

Alexander Batishchev

yeah, I wanted to show the naive version first, and then explain how I made it faster. The performance cost of using class instead of struct was very small compared to the cost of having slow Equals and GetHashCode for my algorithm. But I did end up with a struct

Mark Heath

Why would you even benchmark against a debug build in your first perf test? Sounds like you missed the mark early on and should've started with a release/jit-optimized build. You invalidated all subsequent benchmarks by doing so.

Dave Black

I wouldn't call what I did benchmarking. I just noticed I'd made things horribly slow and tried to find out why. For a real-world project, I'd probably do exactly what you said. But this was just me sitting in my pyjamas for an hour before breakfast trying to get my Advent of Code star for the day!

Mark Heath

Suchoss Y

No, sounds like that would have been ideal. Presumably similar to the HashHelpers.Combine I found

Mark Heath

Yep. You can just call this function and do not have to think about it. I have discovered it recently.

Suchoss Y

My first thought was that you didn't make the struct readonly which would explain some performance hit since the compiler would have to do defensive copies.
But given that .NET Core is open source we can easily check what they're doing and lo and behold their implementation is mutable so read-only is right out. They do use fields instead of properties which can have some effect on debug builds or when the inline budget is reached, so there's one possible explanation.
The hashcode isn't perfect either. Yes linear congruence using prime numbers is reasonably good, but that method is from the 70s - we've made some improvements in that area since then (23 isn't the best choice for one). In particular if you care about high throughput it's not a great fit for modern CPUs.
All in all I'd stick with the library functions to combine hashcodes except if you knew what you were doing, had good benchmarks and a certain goal in mind.
So all in all I'll go with the option "bad benchmark" - I think the implementation should be in the same ballpark as the ValueTuple one.

Daniel Sturm

Also one fun (and in practice utterly stupid) thing to do would be to write a source generator that generates the source code for a Coord class for N dimensions ;-)
Should be reasonably easy but waaaay more effort than copying the class 2 times and adapting the few lines of code.

Daniel Sturm

great idea!

Mark Heath

Yes linear congruence using prime numbers is reasonably good, but that method is from the 70s - we've made some improvements in that area since then (23 isn't the best choice for one).

Interesting....could you recommend a better solution? Larger prime number or something else? TIA

Dave Black

Fair enough :) Good way to start the day!

Dave Black

Oh I just dabbled in the whole thing years ago, I'm very much not an expert on the topic. Also it's hard to recommend one best algorithm, since there are many things you can optimize for. What data types are you interested in? How important is the distribution of bit patterns across the whole range? How important is throughput (and what CPUs/what native instructions are available?).
A good primer might be the JEP where the Java guys are considering replacing their old hash algorithm: https://openjdk.java.net/je...
You might also want to look into Murmur or CityHash - and honestly I haven't kept much up, so who knows what's out there these days. But certainly not a bad starting point.
And last but not least, there were people doing exhaustive tests to find the best numbers for the LCG method (i.e. those numbers where each input touches the most number of possible bits, it was definitely a number much larger than 23 or 31) - you should be able to find their results with some searching I hope. Very little effort and might be good enough.

Daniel Sturm

Why not a record?

Oisín G.

sure, could have tried a record as well, but wanted to experiment with different implementations of underlying methods

Mark Heath