I'd like to know the state of knowledge about sound quality on a Windows PC, using Kernel Streaming (KS) rather than ASIO or WASAPI.
ASIO, WASAPI Exclusive, and WDM-KS are all bit-perfect as far as a typical software stack is concerned, so there shouldn't be any difference between them.
MME, DirectSound and WASAPI Shared do not provide bit-perfect guarantees. However, if the sample rate matches the one configured in the Windows audio control panel, and assuming you don't have any APOs ("audio effects") enabled, the worst that can happen is an extra layer of dithering, which is benign. If the sample rate doesn't match, then you're at the mercy of the Windows sample rate converter. I'm not sure how good Windows SRC is; maybe some day I'll do some measurements to investigate. However, it's quite unlikely you'll hear a significant difference even with a crappy SRC.
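To illustrate why an extra layer of dithering is the worst case when the sample rates match, here is a minimal sketch. It assumes the shared-mode path converts 16-bit samples to 32-bit float and back (an assumption made for this illustration, not something measured here); the function names are made up for the example. Without dither, that round trip returns every 16-bit value unchanged, so dither is the only thing left that could alter the samples.

```python
import struct


def to_float32(sample_int16: int) -> float:
    """Scale a 16-bit sample to [-1.0, 1.0) and round it to float32 precision."""
    as_double = sample_int16 / 32768.0
    # Packing and unpacking as 'f' forces the value through float32.
    return struct.unpack("<f", struct.pack("<f", as_double))[0]


def to_int16(sample_float: float) -> int:
    """Scale back to the 16-bit range with plain rounding (no dither)."""
    return int(round(sample_float * 32768.0))


def round_trip_is_lossless() -> bool:
    """Check every possible 16-bit sample value."""
    return all(to_int16(to_float32(s)) == s for s in range(-32768, 32768))


if __name__ == "__main__":
    print(round_trip_is_lossless())
```

This only covers the format conversion; it says nothing about sample rate conversion, which does change the sample values.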
I've been using ASIO in Foobar and JRiver for several years.
Whenever I play YouTube videos (with Kernel Streaming, I think), they sound equally good to me.
Unsurprising. You're highly unlikely to hear any difference between audio output methods. That's not what matters.
Here is what we can find on a Jplay forum. I think it warrants discussion.
There are a few inaccuracies in the statement you quoted. (My credentials: I wrote two ASIO drivers, including FlexASIO which is basically a bridge between ASIO and other output methods. As part of that work I investigated the various Windows audio APIs quite extensively.)
Kernel Streaming is the lowest 'audio' layer in Windows: Why go through more layers (=WASAPI) if direct access is possible?
This is wrong. WASAPI Exclusive also (typically) gives you direct access to the hardware audio buffers. The difference between WDM-KS and WASAPI is that the former has a lower-level interface for enumerating and configuring devices. But when it comes to the audio buffers themselves, both should provide direct access to hardware memory (if applicable).
both ASIO & WASAPI require memory copy operation per design
I'm not sure what this statement is supposed to mean. ASIO, WASAPI and WDM-KS can all provide direct access to hardware memory buffers, which is typically what is meant by "zero-copy" operation. Maybe WDM-KS has a way of telling the audio driver to get the data directly from a user-space buffer, which would indeed remove one copy operation, but that would be the first I've heard of it. Also, I don't see the point of discussing copies: copying data can affect performance (although not significantly), but it can't affect audio quality, since the copy is bit-perfect.
Buffer size – it has been established from experience that various Buffer sizes can have sonic impact with smaller values sounding 'better' for most.
That's an extraordinary claim, and it would take extraordinary evidence for me to believe it.
Not being able to manipulate Buffer size is a clear restriction for ASIO.
That statement is false; ASIO absolutely allows the user to customize the buffer size very easily. It's right there in the core ASIO API that all ASIO drivers have to support. Maybe some ASIO drivers only allow one buffer size, but then that's a limitation of that specific driver, not a limitation of ASIO in general.
In addition, ASIO may require even more memory copy operations than WASAPI as it expects left & right channel to be 'separated' (WAV format has samples interleaved).
True. But again, I don't see the point of discussing copies.
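For anyone wondering what that extra copy looks like, here is a minimal sketch of splitting interleaved stereo data (the WAV layout) into one buffer per channel and joining it back. The function names and the 16-bit stereo defaults are assumptions for the example, not part of any real ASIO or WASAPI interface. The point is that the operation only moves bytes around, so the round trip gives back exactly the original data.

```python
from typing import List


def deinterleave(frames: bytes, channels: int = 2, bytes_per_sample: int = 2) -> List[bytes]:
    """Split interleaved PCM (L R L R ...) into one buffer per channel."""
    frame_size = channels * bytes_per_sample
    if len(frames) % frame_size != 0:
        raise ValueError("input does not contain a whole number of frames")
    planes = [bytearray() for _ in range(channels)]
    for offset in range(0, len(frames), frame_size):
        for ch in range(channels):
            start = offset + ch * bytes_per_sample
            planes[ch] += frames[start:start + bytes_per_sample]
    return [bytes(p) for p in planes]


def interleave(planes: List[bytes], bytes_per_sample: int = 2) -> bytes:
    """Inverse of deinterleave: merge per-channel buffers into L R L R ..."""
    sample_count = len(planes[0]) // bytes_per_sample
    out = bytearray()
    for i in range(sample_count):
        for plane in planes:
            out += plane[i * bytes_per_sample:(i + 1) * bytes_per_sample]
    return bytes(out)


if __name__ == "__main__":
    pcm = bytes([1, 2, 3, 4, 5, 6, 7, 8])  # two stereo frames of 16-bit samples
    left, right = deinterleave(pcm)
    print(left, right, interleave([left, right]) == pcm)
```

The copy costs a little CPU time, but no sample value is modified along the way.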
Kernel streaming, on the other hand, has no such restrictions: one can even use Buffer of a single sample!
No matter what output method you use, a buffer size of only one sample will never work on a general-purpose OS such as Windows, which doesn't have the necessary real-time scheduling capabilities to make that work. It's also quite pointless. What the author might be referring to is single-sample granularity (i.e. the application knows the cursor position with a precision of one sample), which I guess is pretty cool, and might be useful for achieving very low latency or precise synchronization between multiple clocks, but again I don't see how that's related to audio quality. (I'm also quite sceptical of the claim that KS can offer single-sample granularity on typical hardware.)
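To put a number on the one-sample buffer, here is a small worked calculation. The helper name and the example buffer sizes are chosen for illustration only. The time a buffer covers is simply its length in frames divided by the sample rate, and that is how often the application would have to refill it.

```python
def buffer_duration_us(frames: int, sample_rate_hz: int) -> float:
    """Time covered by a buffer of the given length, in microseconds."""
    return frames / sample_rate_hz * 1_000_000


if __name__ == "__main__":
    # Example sizes only; real drivers offer their own set of buffer sizes.
    for frames, rate in [(1, 44_100), (1, 192_000), (64, 48_000), (480, 48_000)]:
        print(f"{frames} frame(s) at {rate} Hz: {buffer_duration_us(frames, rate):.1f} us")
```

A single frame at 44.1 kHz lasts about 22.7 microseconds, so a one-sample buffer would mean waking up the application and refilling the buffer roughly 44,100 times per second without ever being late.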
My personal opinion: if I had to choose between bit-perfect output methods, I would choose WASAPI Exclusive because that's the modern API that Microsoft actively supports and that audio devices are tested with nowadays. It's also a simpler API than WDM-KS, which means applications using it are less likely to have bugs where they're using the API incorrectly. ASIO is also a good choice if there is a native driver provided by the audio device manufacturer.