I recently gave a talk about some of the challenges I encountered writing audio applications in C#. One of the key issues I talked about was how to handle situations where I wanted to use algorithms in C# that existed in other languages, but no "fully managed" implementation was available.
The two main scenarios in which this occurs are:
(1) Using DSP algorithms such as a resampler or a Fast Fourier Transform (FFT). Other examples might be digital filters, low-latency convolution, echo suppression, pitch detection and varispeed playback, all of which are very useful building blocks for audio applications and non-trivial to implement yourself.
And (2) using codecs to encode or decode audio in compressed formats such as MP3, AAC, Ogg Vorbis etc. Again, these require specialist knowledge that makes it unrealistic to simply implement your own version just because the programming language you have selected doesn't provide that capability in its core framework.
Porting or Interop?
Every time I ran into issues like this with NAudio (which happened very frequently) I had two main choices:
Option 1 was to find an existing (usually C/C++) implementation of the algorithm and port it to C#.
Option 2 was to write P/Invoke wrappers for a native DLL so a .NET application can make use of the existing implementation.
In this article I want to explore the benefits and disadvantages of each approach, and ask the question of whether in the future there might be anything that makes life easier for us to consume code written in other languages.
Interop to native code
Let's start with writing P/Invoke wrappers around a native DLL. There are several advantages to this approach.
First of all, using a native library ought to be faster. Although the performance overhead of porting C code to C# is theoretically minimal, it's likely that an established library implementing a codec or DSP algorithm will have undergone a fair amount of performance tuning that a straightforward port of the code might not fully benefit from.
Secondly, it's less error-prone. Although C/C++ and C# look very similar on the surface, there are plenty of gotchas waiting for you, particularly when dealing with pointers or bitwise operations.
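To give a flavour of the kind of gotcha I mean, here is a minimal sketch of porting one line of C that reads a 16-bit little-endian PCM sample out of a byte buffer. The class and method names (PcmPorting, ReadSample16) are made up for this illustration and are not taken from NAudio or any other library.

```csharp
using System;

static class PcmPorting
{
    // C original, for reference:
    //   int16_t s = (int16_t)(buf[i] | (buf[i + 1] << 8));
    public static short ReadSample16(byte[] buf, int offset)
    {
        // As in C, the byte operands are promoted to int, so the OR
        // produces a value in the range 0..65535, not a signed sample.
        int raw = buf[offset] | (buf[offset + 1] << 8);

        // Unlike C, a C# project can be compiled with overflow checking
        // turned on, in which case a plain (short) cast of 32768 or more
        // throws OverflowException. unchecked keeps the wrap-around
        // behaviour that the C code relies on.
        return unchecked((short)raw);
    }
}
```

The line looks almost identical in both languages, which is exactly why this sort of difference is easy to miss in a large port.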
Thirdly, and perhaps most importantly, it's much easier to keep up to date. Porting an existing codebase from one language to another is something you want to do once and then forget. However, if the codebase you ported is receiving regular bugfixes or updates, it is a real pain to keep having to migrate the code deltas across.
A good example here is the Concentus open source port of the Opus reference library to C#. The project is an amazing achievement, but the author understandably now no longer has time to maintain the port, resulting in it falling behind the latest version of Opus. By contrast, a P/Invoke wrapper requires much less maintenance.
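To show how little code a wrapper can need, here is a minimal sketch that calls a single function from the native Opus library. opus_get_version_string is part of the libopus C API, but the library name "opus" and the OpusNative and TryGetVersion names are assumptions for this example; the actual file name depends on how the native binary is built and deployed.

```csharp
using System;
using System.Runtime.InteropServices;

static class OpusNative
{
    // Assumption: the native binary can be resolved under the name "opus"
    // (for example opus.dll or libopus.so). Real deployments vary.
    private const string LibraryName = "opus";

    // C declaration in libopus: const char *opus_get_version_string(void);
    // Declared as IntPtr because the string is owned by the library and
    // must not be freed by the .NET marshaller.
    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr opus_get_version_string();

    // Returns the version string, or null when no usable native binary
    // is available for the current platform and process bitness.
    public static string TryGetVersion()
    {
        try
        {
            return Marshal.PtrToStringAnsi(opus_get_version_string());
        }
        catch (DllNotFoundException) { return null; }
        catch (EntryPointNotFoundException) { return null; }
        catch (BadImageFormatException) { return null; }
    }
}
```

When the native library is updated, a declaration like this only needs to change if the function signature changes, which is why the maintenance burden is so much lower than for a port.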
It's not all good news though. First of all, a native wrapper is only usable if you have binaries for the specific platform you want to run on. Even for Windows, that often means you need 32-bit and 64-bit versions of the wrappers (and native DLLs). And if you want to run on other supported platforms (such as Linux, or ARM64), then you can't use regular Windows DLLs; you need to find platform-specific native libraries.
And that's not all. There are several scenarios in which your .NET code is running in a sandboxed environment. I initially ran into this with NAudio when I wanted to use it in Silverlight, but in other situations, such as Windows 10X applications, Blazor applications, or environments like Azure App Service, you may not be able to call native APIs at all, or may be restricted as to which APIs can be used.
In these environments, you often have no option but to find some way to create a fully managed version of the algorithm you need. So let's consider porting next.
Porting to managed code
Despite the considerable amount of additional work, there are several benefits to porting to managed code. Obviously, one huge advantage is that the code is now fully portable and able to run on any platform that .NET can run on, whether that be in a browser with Blazor or as part of a Xamarin Android or iOS app.
Another advantage is that you have the opportunity to reshape the API to be more idiomatic, making it feel more natural for .NET developers.
You also get the safety benefits that come with the managed execution environment, such as protection against overrunning the bounds of arrays or dereferencing pointers to memory that has been freed.
One issue I ran into with NAudio and porting was to do with licensing. A large percentage of open source audio code is licensed under GPL or LGPL, which are both incompatible with the more commercially friendly MIT license that I was using.
This meant that even when there were perfectly good algorithms available for porting to C#, I wasn't able to use them. This was especially annoying when I needed a good resampler, and it was quite some time before I found one that I was able to get permission to include in NAudio (the WDLResampler).
And although I listed performance as a benefit of interop, there are actually some potential performance benefits to fully porting an algorithm to managed code. That's because interop itself adds some overhead. In his superb article on performance improvements in .NET Core 3.0, Stephen Toub says "one of the key factors in enabling those performance improvements was in moving a lot of native code to managed".
Isn't there a better way?
For many parts of NAudio, due to the tradeoffs of interop versus managed, I ended up doing both. I wrapped three separate native resampler APIs (ACM, DMO and MFT) in addition to porting a managed one. I also created wrappers for two MP3 decoders, as well as a fully managed one.
But what if we didn't have to keep doing this? Why isn't there some kind of universal format that would allow us to share code between almost any language? It feels like it's about time that something like that should exist. And maybe we're finally getting close...
The way this would be possible is with some kind of "intermediate representation". You take the original code, and convert it into a common representation that allows it to run in more than one environment.
And of course the .NET framework itself has excellent language interoperability by virtue of the fact that we can compile to CIL (Common Intermediate Language). This means that code I write in C# can be consumed in F# and vice versa.
All this is very nice, but it's far from universal. We just have small families of related languages that interoperate nicely, while everything else is still outside.
For example, Java also has an intermediate representation called "bytecode". The Java Virtual Machine (JVM) can run any code that compiles to bytecode, but can't directly run things compiled to CIL. Likewise the .NET Common Language Runtime (CLR) can't run Java bytecode.
But in recent years, we've seen the emergence of a new intermediate format that shows promise to take cross-language interoperability a lot further.
Is WebAssembly the universal binary format?
What's particularly impressive about WebAssembly is that many languages traditionally thought of as low-level languages that compile to native code can also be compiled directly to WebAssembly. This means that code written in C or C++, as well as newer languages like Rust or Go, can compile directly into WebAssembly.
There have been some really impressive demos such as running the Doom engine, or AutoCAD, on WebAssembly. Both are examples of very large legacy C/C++ codebases that would previously have been unthinkable to run in a browser.
It gets even better. Although you can't compile C# directly to WebAssembly (because it relies on capabilities of the CLR that aren't offered by WebAssembly), you can compile a stripped-down version of the .NET CLR into WebAssembly. And this is the magic that powers Blazor. Essentially, this means that you can write C# code and have it run in any browser, which is pretty incredible. Blazor has tremendous momentum and popularity and even those of us who are a bit jaded after the death of Silverlight can see that it has a promising future.
The way Blazor works is something along these lines. The C# code is still compiled to CIL, but because the CLR can run on WebAssembly, it can load and run that CIL.
And .NET isn't the only runtime that can execute in the browser thanks to WebAssembly. There's Pyodide, which brings the Python 3.8 runtime to the browser via WebAssembly, along with the Python scientific stack including NumPy, Pandas, Matplotlib, SciPy, and scikit-learn. And there are several similar projects for other popular programming languages.
What's still missing
All this is pretty cool, but actually it still doesn't get me what I want. Most of the audio-related libraries I want to consume in .NET are written in C/C++, and although they can be compiled to WebAssembly, and theoretically, I could write a C# Blazor app that calls into those WebAssembly libraries, that's not actually what I want to do. The missing piece of the puzzle would be for a regular .NET application being hosted by the .NET CLR to be able to call methods in a WebAssembly library and for them to run as though they were fully managed code. Is something like that possible?
Well, theoretically it ought to be possible. After all, WebAssembly is a fairly constrained set of instructions, so they could each be mapped to CIL instructions. And it turns out that there are a couple of open source projects attempting to do exactly that.
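To illustrate the idea of an instruction-by-instruction mapping, here is a small hand-written sketch that emits the CIL equivalent of a three-instruction WebAssembly function using System.Reflection.Emit. This is only my own illustration of the principle, not how either of the projects mentioned in this article is implemented, and the WasmToCilSketch and BuildAdd names are made up for the example.

```csharp
using System;
using System.Reflection.Emit;

static class WasmToCilSketch
{
    // WebAssembly text format for the function being mirrored:
    //   (func (param i32 i32) (result i32)
    //     local.get 0
    //     local.get 1
    //     i32.add)
    public static Func<int, int, int> BuildAdd()
    {
        var method = new DynamicMethod(
            "wasm_add", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0); // local.get 0
        il.Emit(OpCodes.Ldarg_1); // local.get 1
        il.Emit(OpCodes.Add);     // i32.add (both wrap around on overflow)
        il.Emit(OpCodes.Ret);     // implicit return at the end of the Wasm function

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

Calling WasmToCilSketch.BuildAdd()(2, 3) returns 5. Both WebAssembly and CIL are stack machines, which is what makes simple cases like this line up so neatly; things like linear memory, imports and control flow need a lot more work.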
The first is a project started by Eric Sink, called wasm2cil. As an example of what can be achieved, Eric says "I can start with the C code for SQLite and its shell application, compile it to Wasm with Clang, "transpile" it to a .NET assembly, and then run it with .NET Core. The result is the SQLite shell, entirely as managed code, with no pinvokes involved."
Super cool stuff, and it's open source. However, Eric is also clear that this is an experimental work in progress. There are various bits and pieces not yet supported.
Another project along similar lines I found is dotnet-webassembly by Ryan Lamansky. It appears to be under active development, although it too is not necessarily complete. There is a GitHub issue about whether it is mature enough to run the ffmpeg libraries, which contain a huge toolkit of codec-related capabilities. It seems that it initially did not work, but some progress has been made on this front recently.
So it seems we are getting close, but not there yet. If it does become possible to take a DSP or codec library compiled into WASM, and call it as though it were a fully managed .NET library without a large performance penalty, that would be a huge boost to the .NET platform in general. C# developers like myself could use almost whatever C/C++ libraries they want without needing to do interop to native code or go through the pain of porting.
It's interesting and encouraging to see that the .NET team are taking WebAssembly seriously and have many features planned to improve the experience for running .NET applications on WebAssembly. Of course, that doesn't directly address my concern of running WebAssembly code on the .NET platform, but I'm hoping at some point it will get on the radar.
In conclusion, it can be a frustrating experience when your language of choice (in my case C#) doesn't give you easy access to perfectly good existing code that was written in other languages. It would be wonderful if there was some kind of universal "intermediate format" where code that implements things like DSP algorithms and codecs (which are inherently portable between languages and operating systems) could be easily used no matter what programming language you were using or what platform you were running on. WebAssembly shows some real potential to be that universal format, and I'll be keeping a close eye on this space over the next few years to see how things evolve.
For now though, when I want access to codecs or DSP in C#, I'm still stuck with choosing between P/Invoke or porting, and more often than I would like, I end up doing both.