Thoughts on Windows RT Audio APIs and BUILD 2014

One of the sessions I was most excited about at BUILD this year was Jason Olson and Pete Brown’s session, “Sequencers, Synthesizers, and Software, Oh My! Building Great Music Creation Apps for Windows Store”.

I had been hoping before the conference that we might see MIDI support for WinRT...

@Pete_Brown looking forward to seeing the synth one, particularly if any MIDI support is coming for Windows Store apps — Mark Heath (@mark_heath) March 31, 2014

...and I wasn't disappointed. The talk covered a brand new MIDI API for WinRT (requires Windows 8.1 Update 1), as well as some guidance on low-latency WASAPI usage.

Let's look first at the MIDI support…

Basically, it allows you to access MIDI in and out devices (MIDI over USB), receiving and sending messages. This is great news, and enables the creation of several types of application not previously possible in Windows Store apps (I’d wanted to make a software synthesizer for quite some time, so that’s first on my list now).
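To make "receiving and sending messages" concrete, here is a minimal sketch of what the most common MIDI messages look like on the wire. It shows only the byte layout of MIDI 1.0 Note On and Note Off messages, not the WinRT MIDI API itself. The helper names (`note_on`, `note_off`, `describe`) are hypothetical, and Python is used purely for brevity.

```python
# Byte-level sketch of MIDI 1.0 channel voice messages.
# Helper names are illustrative only; this is not the WinRT MIDI API.

NOTE_OFF = 0x80  # status nibble for Note Off
NOTE_ON = 0x90   # status nibble for Note On


def _check(channel, note, velocity):
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= note <= 127:
        raise ValueError("note must be 0-127")
    if not 0 <= velocity <= 127:
        raise ValueError("velocity must be 0-127")


def note_on(channel, note, velocity):
    """Build a 3-byte Note On message: status, note number, velocity."""
    _check(channel, note, velocity)
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Build a 3-byte Note Off message."""
    _check(channel, note, velocity)
    return bytes([NOTE_OFF | channel, note, velocity])


def describe(message):
    """Decode a 3-byte note message into (kind, channel, note, velocity)."""
    status, note, velocity = message
    kind = status & 0xF0
    channel = status & 0x0F
    # By convention, a Note On with velocity 0 is treated as a Note Off.
    if kind == NOTE_ON and velocity > 0:
        return ("note_on", channel, note, velocity)
    if kind in (NOTE_ON, NOTE_OFF):
        return ("note_off", channel, note, velocity)
    raise ValueError("not a note message")
```

A software synthesizer would receive messages like these from a MIDI in device and turn each Note On into a sounding voice until the matching Note Off arrives.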

It’s distributed as a NuGet package allowing access from C#. This is huge in my opinion. One of the most painful things about doing audio work in C# has been endlessly creating wrappers for COM-based APIs such as WASAPI and Media Foundation. The interop is really hard to get right in C#, and you can run into all kinds of nasty threading issues. I had hoped when WinRT originally came out that it would wrap WASAPI and Media Foundation in a way that eliminated the need for all the interop in NAudio, but sadly that was not to be the case. But this represents a great step in the right direction, and I hope this approach becomes standard.

What can't it do?

It doesn't read or write MIDI files, but that's not a particularly big deal to me. NAudio and MIDI.NET both can do that.

It doesn't provide a MIDI sequencer. To play back a MIDI file from your app, you'd need to schedule your MIDI out events yourself so that each one is sent at exactly the right time.

It doesn't offer up a software synthesizer as a MIDI out. That would be a great addition, especially if you could also load the software synthesizer up with SoundFonts.

There is no RTP MIDI support, but they suggested it might be a feature under consideration. This would be useful for wireless MIDI, and could also form the basis for sending MIDI streams between applications.
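On the missing sequencer: the scheduling you would have to do yourself can be sketched roughly as follows. This assumes a single fixed tempo (real MIDI files can contain tempo changes, which this ignores) and events already parsed into `(delta_ticks, message)` pairs. The function names and the `send` callback are hypothetical stand-ins for whatever actually writes to the MIDI out device.

```python
import time


def to_timeline(events, ticks_per_quarter, tempo_us_per_quarter=500_000):
    """Convert (delta_ticks, message) pairs into (seconds_from_start, message).

    Assumes one fixed tempo. 500,000 microseconds per quarter note (120 bpm)
    is the default tempo of a standard MIDI file.
    """
    seconds_per_tick = tempo_us_per_quarter / 1_000_000 / ticks_per_quarter
    elapsed = 0.0
    timeline = []
    for delta_ticks, message in events:
        elapsed += delta_ticks * seconds_per_tick
        timeline.append((elapsed, message))
    return timeline


def play(timeline, send, clock=time.monotonic, sleep=time.sleep):
    """Call send(message) for each event when it falls due.

    Delays are measured against the start time rather than the previous
    event, so timing errors do not accumulate. Sleeping like this is only
    as accurate as the OS timer, which is a limitation of the sketch.
    """
    start = clock()
    for due, message in timeline:
        delay = due - (clock() - start)
        if delay > 0:
            sleep(delay)
        send(message)
```

For example, with 480 ticks per quarter note at the default tempo, an event 480 ticks after the start falls due half a second in.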

And now on to WASAPI…

There was some really useful information in the session on how to set up your audio processing to work at a pro audio thread priority, and how to use a “raw mode” to avoid global software-based effects (e.g. a graphic equaliser) introducing additional latency. Good information on WASAPI is hard to come by, so it was also good to see recommendations on using event-driven mode and on when to use exclusive mode.

What I would love to see with WASAPI is exactly the same approach as has been taken with MIDI. Give us a NuGet package that presents both the Render and Capture devices in a way that can be used from C#. Let me easily open an output device, specify if I need pro audio thread priority, raw mode, exclusive mode, and then give me a callback to fill the output buffer on demand with new samples. Ideally it should incorporate resampling too, as it is a huge pain to resample yourself if you are playing a 44.1kHz file while your soundcard is operating at 48kHz.
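To give a feel for what resampling involves, here is a deliberately naive sketch that converts a mono buffer between sample rates using linear interpolation. It is an illustration only: linear interpolation is not good enough for quality audio work (a proper resampler uses band-limited filtering to avoid aliasing), and the function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono buffer by linear interpolation (illustration only)."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step            # fractional position in the source buffer
        i0 = int(pos)             # sample at or before that position
        i1 = min(i0 + 1, last)    # next sample, clamped at the buffer end
        frac = pos - i0
        out.append(samples[i0] * (1.0 - frac) + samples[i1] * frac)
    return out
```

Even this toy version leaves out everything that makes the job a pain in practice: handling buffer boundaries in a streaming callback, filtering, and multi-channel data. That is why having the API do it would be so welcome.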

Likewise with capture. Let me easily open any capture device (or loopback capture), and get a callback whenever the next audio buffer is received containing the raw sample data. This might be possible already with the MediaCapture class but I'm unclear how you get at the raw samples (somehow create a custom sink?). Again, I’d like to be able to specify my capture sample rate and have the API resample on the fly for me, as often with voice recordings, 16kHz is quite acceptable. WASAPI is much harder to use than the legacy waveIn APIs in this regard.

Now Jason Olson did provide a link to a cool sample application on GitHub, showing the use of these APIs in C++. However, there seems to be a general assumption that if you're using WASAPI you are going to be working in C++. This is probably true for pro audio music applications. Despite having invested a huge amount of my time over the last 12 years in creating a C# audio library, I still recommend to people needing the lowest possible latency that they should consider C++.

But I also know that there is a huge amount of audio-related development that does not require super low latency, particularly when dealing with voice recordings such as telephony, radio or VOIP. Recording and replaying voice is something that many businesses want to create software for, and a C# API greatly simplifies the development process. Both my experience of supporting NAudio and a decade of working in the telecoms industry tell me that there is a huge market for applications that deal with speech in various formats.

Finally, I'd also like to see Microsoft give some thought to creating an audio buss that allows WinRT apps to transmit real-time audio between themselves, allowing you to link together virtual synthesizers with effects. This could form the basis of solving the audio plugin problem for DAW-like applications in the Windows Store.

Anyway, they repeatedly asked for feedback during the talk, so that’s the direction I’d like to see WinRT audio APIs going in. Seamless support for using audio capture and render devices from C# with transparent resampling on playback and record.