
Importance of USB audio stream latency?...

robin79

Member
Joined
Apr 4, 2019
Messages
9
Likes
0
Hi all, I signed up only very recently, so I hope I'm not posting in the wrong forum/sub-forum; please redirect me to the right one if so, thanks.

I'm a happy owner of a DX7 Pro DAC; it's an awesome device and I couldn't be happier.

In my system I stream audio to the DAC from a mini-ITX computer built with components from mini-itx.com: more or less $300 for the whole machine at the time.

I run Debian Linux on it, optimized for low latency and low power at the same time, using ALSA drivers through asynchronous USB.

The OS lets me fine-tune ALSA, specifically:
  • sample frequency → always 44.1 kHz for me, I only have CDs
  • size of the output buffer (in number of samples) → my value is '16' samples
  • periods/buffer (how many interrupts occur to fill the buffer once) → my value is '3'
These parameters determine the latency of the samples: in my case the estimated latency is 0.4 ms, which is considered a very low (hence good) value for a small system like mine.
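
For reference, the estimate follows directly from buffer size over sample rate (assuming the 16 samples are the whole output buffer; the exact figure also depends on how the driver rounds the period size):
Code:
# latency ≈ buffer_size / sample_rate
echo "scale=6; 16 / 44100" | bc    # ≈ 0.00036 s, i.e. roughly 0.4 ms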

Here comes the point: with that low latency value I get great sound quality, but if I change the parameters and the latency goes up, the sound is noticeably worse.
I don't mean stuttering or "pops" or any kind of noise, I mean less beautiful: less detailed and defined, less impact, a smaller "space", etc. "Worse" in hi-fi terms.

I really cannot understand this behavior: to my knowledge, latency should not have anything to do with the sound quality, it should only be a small delay in the transmission... am I wrong?
Is the DX7 Pro somehow sensitive to latency in the USB data stream? Is there maybe a relation between audio data latency and jitter?...

I'll be very grateful if you can explain this mystery to me.
Thanks in advance.
 

sergeauckland

Major Contributor
Forum Donor
Joined
Mar 16, 2016
Messages
3,440
Likes
9,100
Location
Suffolk UK
I really cannot understand this behavior: to my knowledge, latency should not have anything to do with the sound quality, it should only be a small delay in the transmission... am I wrong?

No, you're not wrong. Latency has nothing to do with sound quality. In fact, it could be argued that a DAC with a very slow Phase-Locked-Loop gives lower jitter and therefore theoretically better sound quality (although probably not audible) but it will have longer latency which is why it's not done much that way.

It is desirable for latency to be low as DACs are often used with video, and keeping sound and vision synchronised is important. For audio-only applications, latency doesn't matter at all, although it would be somewhat inconvenient if the sound came out 30 minutes after pressing play!

S.
 

ElNino

Addicted to Fun and Learning
Joined
Sep 26, 2019
Messages
557
Likes
722
Here comes the point: with that low latency value I get great sound quality, but if I change the parameters and the latency goes up, the sound is noticeably worse.
I don't mean stuttering or "pops" or any kind of noise, I mean less beautiful: less detailed and defined, less impact, a smaller "space", etc. "Worse" in hi-fi terms.

I really cannot understand this behavior: to my knowledge, latency should not have anything to do with the sound quality, it should only be a small delay in the transmission... am I wrong?
Is the DX7 Pro somehow sensitive to latency in the USB data stream? Is there maybe a relation between audio data latency and jitter?...

Are you sure that your system is sending data to the DX7 Pro in asynchronous mode? What you're describing is consistent with isochronous mode... in that mode, the DAC has to recover the clock from the packets, and the more infrequent the packets, the less effectively the PLL can work, which means more jitter, which means more IMD.

I don't know enough about audio on Linux to know what transfer mode ALSA chooses by default.
 
OP

robin79

Member
Joined
Apr 4, 2019
Messages
9
Likes
0

No, you're not wrong. Latency has nothing to do with sound quality. In fact, it could be argued that a DAC with a very slow Phase-Locked-Loop gives lower jitter and therefore theoretically better sound quality (although probably not audible) but it will have longer latency which is why it's not done much that way.

We do agree on the concept, clearly, but the empirical result says something different :)
I really encourage anyone to run the test if they can: I experienced the same thing with my previous DAC, based on the ESS9018 (now it's the ESS9038Pro), and it's clear that the sound is better when the DAC is fed with lower latency...
I can't explain it, but it is: really, try the same test if you can, I did ABX on it.
 

sergeauckland

Major Contributor
Forum Donor
Joined
Mar 16, 2016
Messages
3,440
Likes
9,100
Location
Suffolk UK
We do agree on the concept, clearly, but the empirical result says something different :)
I really encourage anyone to run the test if they can: I experienced the same thing with my previous DAC, based on the ESS9018 (now it's the ESS9038Pro), and it's clear that the sound is better when the DAC is fed with lower latency...
I can't explain it, but it is: really, try the same test if you can, I did ABX on it.
I can't try it as I don't use external DACs, I use whatever's already in the equipment as perfectly Good Enough.

However, I'm not sure how you can do such an ABX test without knowing which it is, and without a long delay between tests. Nevertheless, latency is not anything I'd ever be concerned about other than audio for video.

S.
 
OP

robin79

Member
Joined
Apr 4, 2019
Messages
9
Likes
0
Are you sure that your system is sending data to the DX7 Pro in asynchronous mode? What you're describing is consistent with isochronous mode... in that mode, the DAC has to recover the clock from the packets, and the more infrequent the packets, the less effectively the PLL can work, which means more jitter, which means more IMD.

I don't know enough about audio on Linux to know what transfer mode ALSA chooses by default.

Thanks for the suggestion.
The DX7 Pro explicitly states "USB mode: asynch" in its configuration menu, and looking at the USB device properties from the OS's command line, the mode is reported as "asynchronous" (and that's the only mode shown; other devices can expose more than one mode).
Everything suggests that it is working in async mode, but I will check all these aspects again and follow up if I find anything strange.
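
For anyone who wants to run the same check, the sync type of the audio endpoint shows up in the USB descriptors; something along these lines works on any Linux box (output details vary by device):
Code:
# look at the audio streaming OUT endpoint of the DAC
lsusb -v 2>/dev/null | grep -E "Transfer Type|Synch Type"
# an async DAC reports: Transfer Type = Isochronous, Synch Type = Asynchronous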
 
OP

robin79

Member
Joined
Apr 4, 2019
Messages
9
Likes
0
I can't try it as I don't use external DACs, I use whatever's already in the equipment as perfectly Good Enough.

However, I'm not sure how you can do such an ABX test without knowing which it is, and without a long delay between tests. Nevertheless, latency is not anything I'd ever be concerned about other than audio for video.

S.

My girlfriend is a programmer (too). I just sit on the sofa with my headphones, she controls the system, and:
A) plays a song with low latency
B) plays the same song with high latency
X) plays it again one way or the other and asks me which one it is.

Thanks for your contribution!
 

ElNino

Addicted to Fun and Learning
Joined
Sep 26, 2019
Messages
557
Likes
722
Thanks for the suggestion.
The DX7 Pro explicitly states "USB mode: asynch" in its configuration menu, and looking at the USB device properties from the OS's command line, the mode is reported as "asynchronous" (and that's the only mode shown; other devices can expose more than one mode).
Everything suggests that it is working in async mode, but I will check all these aspects again and follow up if I find anything strange.

How are you playing back audio? I'm mostly unfamiliar with ALSA, but on other OSes, it is possible to have sample rate conversion to handle clock drift (even at identical nominal sample rates) in cases where playback software is written to assume synchronous playback. If you do a blind test regarding the buffer sizes, I would also try other playback software just to see if the results are the same.
 
OP

robin79

Member
Joined
Apr 4, 2019
Messages
9
Likes
0
How are you playing back audio? I'm mostly unfamiliar with ALSA, but on other OSes, it is possible to have sample rate conversion to handle clock drift (even at identical nominal sample rates) in cases where playback software is written to assume synchronous playback. If you do a blind test regarding the buffer sizes, I would also try other playback software just to see if the results are the same.

Nice comment.
ALSA is a kernel layer, the abstraction Linux offers just above the low-level device drivers: it's a passive software component, like a library, that "exposes" the devices to higher layers. I guess the most similar component in Windows may be the Kernel Streaming layer, but I'm not so sure.
Anyway, the point is that when a piece of software wants to stream something to a device through ALSA, it must know exactly what format that stream is expected in, since ALSA (by default) will not do any adaptation, it will just return an error (you can use plugins to create virtual devices with implicit transformations etc., but I'm talking about the basic default behavior here). On the other side, ALSA can be queried to discover the properties of the device (in order to feed it a proper stream).
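
For example, this is the kind of query I mean; the exact output depends on the card (the device name here is just how ALSA names my DAC):
Code:
# ask the raw hw device which formats/rates/periods it accepts
aplay -D hw:Pro,0 --dump-hw-params /dev/zero
# USB audio devices also report this in /proc/asound/<card>/stream0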

Since ALSA is passive, you need a program that actually streams to the device: in my case, that program is the JACK Audio Server.
The reasons for a "middleware" that just does the streaming and nothing else are mainly two:
1) you can assign the highest priority to it, so the streaming won't stop whatever else is happening around it (every other task will be secondary);
2) it transparently handles some low-level settings retrievable from ALSA (like the stream format, little-endian vs big-endian, 16-bit vs 32-bit, etc.).
The parameters I set in JACK are: 0) the selected ALSA device; 1) sampling rate; 2) number of buffered samples (buffer size); 3) number of hardware interrupts used to empty/fill the whole buffer once (periods/buffer).
When a client asks the JACK server to stream some PCM data, JACK pushes the stream into the selected ALSA device according to the given parameters.
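
For context, the jackd invocation is roughly of this shape (the values are illustrative; note that in jackd's own terms -p is frames per period and -n is periods per buffer):
Code:
jackd -R -P70 -d alsa -d hw:Pro,0 -r 44100 -p 16 -n 3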
I never thought about a "fake" sample rate conversion (44.1 to 44.1) before your comment, nice intuition!
But I have just made a test that seems to rule out any implicit conversion: I played a 44.1 kHz track while forcing a 48 kHz rate in JACK → the track played faster and at a higher pitch than normal, meaning that JACK was streaming it "blindly" at the wrong rate; if there were a conversion, it would have played normally, right?

Above the JACK server I place MPD (Music Player Daemon): it has nothing to do with the audio signal itself, it just organizes the music library and exposes a network API that lets me use a dedicated GUI from a smartphone or my laptop.

So, apparently the software stack does not apply any resampling or transformation to the audio data before it reaches the DAC, but I will certainly follow your suggestion, experimenting and ABXing with different stacks to check the consistency of the results.
Thanks again!
 

ElNino

Addicted to Fun and Learning
Joined
Sep 26, 2019
Messages
557
Likes
722
I never thought about a "fake" sample rate conversion (44.1 to 44.1) before your comment, nice intuition!
But I have just made a test that seems to rule out any implicit conversion: I played a 44.1 kHz track while forcing a 48 kHz rate in JACK → the track played faster and at a higher pitch than normal, meaning that JACK was streaming it "blindly" at the wrong rate; if there were a conversion, it would have played normally, right?

That indicates that MPD is not sending audio synchronously, which is good. It doesn't necessarily indicate that the clock JACK is using is the same as the DAC's clock. Are you using alsa_out to get data from JACK to the ALSA device? This link seems to suggest that alsa_out uses a resampler to decouple an internal JACK clock from the hardware clock: https://jackaudio.org/faq/multiple_devices.html

If you can, you might try experimenting with MPD speaking directly to the ALSA backend, rather than through JACK, and see if you're hearing the same thing. It would just remove one variable from your setup.
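
I believe MPD has a native ALSA output, so the mpd.conf entry would be roughly along these lines (option names from memory; the device name is a placeholder for whatever ALSA calls your DAC):
Code:
audio_output {
    type          "alsa"
    name          "DAC direct"
    device        "hw:0,0"        # placeholder device name
    auto_resample "no"
    auto_format   "no"
    auto_channels "no"
}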
 
OP

robin79

Member
Joined
Apr 4, 2019
Messages
9
Likes
0
Hi, I took some time to explore and test before replying: thanks to your suggestions I have gained interesting insights and also some improvements (yeah!), so thanks again.
I will start by answering your last post and then I'll expand on the topic.

That indicates that MPD is not sending audio synchronously, which is good. It doesn't necessarily indicate that the clock JACK is using is the same as the DAC's clock.

I didn't quite get this specific comment, actually: I think the clock being used and the sampling/resampling are unrelated concepts.
The sampling rate is about the number of digital samples per second present in the stream, so the "density" of the data, which doesn't have much to do with the reference clock, I guess.
When I say "no resampling seems to be happening", it's because when I sent audio to JACK saying "hey JACK, take these 48 kHz samples!" JACK did it blindly: it played 48k samples per second when they were really 44.1k, so it sounded faster and higher-pitched. If JACK or ALSA had resampled anything (interpolating), the speed and pitch would not have changed.
Does that make sense to you?

This link seems to suggest that alsa_out uses a resampler to decouple an internal JACK clock from the hardware clock: https://jackaudio.org/faq/multiple_devices.html

ALSA is capable of that, that's true, but my parametrization is very clear:
Code:
--disable-format --disable-resample --disable-channels --disable-softvol
basically I explicitly turn off any automatic behavior.

you might try experimenting with MPD speaking directly to the ALSA backend,

And there it is, you wise man! This is the best suggestion you could give me!
Long story short, I ended up using MPD's "pipe" plugin, which streams raw PCM data to an arbitrary command/script, with a full delegation pattern.
This is the command in my case:
Code:
sox -q -V0 --replay-gain off -D -t raw -c 2 -b 16 -r 44100 -e signed -L - -t wav -b 32 - | chrt -rr 95 aplay -q --disable-format --disable-resample --disable-softvol --disable-channels -D hw:Pro,0 -M --period-size=8 --buffer-size=24
which basically says:
  1. (sox →) zero-pad the 16-bit raw stream into a stereo 32-bit signed little-endian WAV stream, don't touch the gain, don't dither, and pipe it out
  2. (chrt →) set the real-time priority of the next command (aplay) to the maximum value allowed by the system
  3. (aplay →) play the incoming stream directly through the ALSA kernel module, don't touch the stream in any way, use memory-mapped I/O to the destination device, use a buffer of 24 frames (a frame being one sample per channel), and empty the buffer with 3 interrupts (24 divided by the period size of 8)
This way, any time MPD streams raw PCM data, a single aplay thread with maximum priority handles everything: I got rid of a lot of threads (JACK first of all), the system load has dropped by about 40% (!), the hardware runs cooler and the sound is even better than before, it's amazing :p
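
For completeness, the MPD side of this is just a "pipe" output whose command is the line above (shortened here), roughly:
Code:
audio_output {
    type    "pipe"
    name    "44100_16_lowLat"
    format  "44100:16:2"
    command "sox ... | chrt -rr 95 aplay ... --period-size=8 --buffer-size=24"
}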

So, finally, the original topic: latency and sound.
With this new, much cleaner configuration I started my tests over: I created two "pipe" outputs in MPD (44100_16_lowLat; 44100_16_highLat). One is the above; the other is practically identical but with no constraints on buffer and period, leaving the decision to ALSA (which defaults to the maximum buffer).

The low-latency pipe implies a ~0.2 ms period time (the time between two interrupts, i.e. between two "shots" of data to the DAC), the high-latency one a ~5 ms period time.
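
For reference, the period times come straight from period size over sample rate (the default period size here is my inference from the measured ~5 ms):
Code:
# period time = period_size / sample_rate
# low latency:  8 / 44100                       ≈ 0.18 ms (~0.2 ms)
# high latency: ~220 / 44100 (ALSA default here) ≈ 5 ms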

First impression: I (think I) hear a difference, but it is much more difficult to hear than before.
With JACK in the middle there was something more going on that made the difference easier to catch: this time I really needed total silence and full attention, and the ABX test came out at 70% hits against the previous 90%!
But I still perceive it as different.
I have spent a lot of time reasoning about a possible explanation and below is what I came up with... I'm deeply unsure about the soundness of the reasoning, so please keep an open mind while reading.

My point is about the kind of latency we are considering. In this case we are talking about the time between two chunks of frames fed into the DAC. So it is not latency in the sense of an initial delay: it is the "density" of the data flow, which is discrete, not continuous, and is paced by the OS interrupts.
Now, I think we agree that the DAC cannot process data faster than the PC sends it, otherwise any buffer would be emptied at some point: so the DAC can evaluate my data as soon as I send it, not sooner.
Each frame sent is a "piece of sound", in that it contributes to building the analog output wave: after the DAC has received and processed a chunk of frames, it cannot process the next chunk before the period time has passed, which is 0.2 ms in one case and 5 ms in the other.
This should mean that, periodically, the DAC has small waiting windows in its processing while it is outputting the analog wave: doesn't that put a limit on how steep a transient can be rendered?
On this basis I did some more testing (TERRIBLY biased, of course...), but it seems to me that the difference lies precisely in fast transients: percussion, hand clapping, very fast string picking (Paco de Lucía), synthesizers (Michael Jackson, Daft Punk, Depeche Mode) and the like.
It's like everything is sharper and better focused, tighter and more "real".

Or at least this is what I hear, or want to hear... psychoacoustics is really close to alchemy.
I hope I didn't bore you too much; you were very helpful, so I wanted to share back.
 
OP

robin79

Member
Joined
Apr 4, 2019
Messages
9
Likes
0
Each frame sent is a "piece of sound", in that it contributes to building the analog output wave: after the DAC has received and processed a chunk of frames, it cannot process the next chunk before the period time has passed, which is 0.2 ms in one case and 5 ms in the other.
This should mean that, periodically, the DAC has small waiting windows in its processing while it is outputting the analog wave: doesn't that put a limit on how steep a transient can be rendered?

UPDATE
I amend my own conclusion from the previous post: I confirm the reasoning in general, but I have realized that the lines above are nonsense... :facepalm:
When fed a low-latency data stream, the DAC converts 8 frames every 0.2 ms, while with high latency it converts many more frames every 5 ms: this can't have anything to do with the steepness of the generated transients (that would have been absurd, actually...), it just means a different way of processing the data, with frequent conversions of small amounts vs less frequent conversions of bigger amounts.

Thus my suspicion is that my DAC "prefers" to work in the former fashion (and that, at least, is not nonsense).

I have also found an old post on a forum related to Logitech's Squeezebox where it appears that some users experienced a sound improvement after reducing ALSA's period time (and hence the latency):
https://forums.slimdevices.com/show...8748e7ce7a68c1&p=704876&viewfull=1#post704876

As a further step, since the overall buffer size and the period size are not bound together (it is legitimate to have a big buffer transferred as a very large number of small fractions → small period time, regardless of latency), I will run some more tests leaving the buffer size at its default and tuning only the period size.
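
In aplay terms that would look something like this (file name and buffer value are just placeholders, only the period size is the parameter under test):
Code:
# big buffer, small period: frequent refills without giving up the safety margin
aplay -q -D hw:Pro,0 -M --period-size=8 --buffer-size=4096 test.wav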
 

Vasr

Major Contributor
Joined
Jun 27, 2020
Messages
1,409
Likes
1,923
I would be open to the possibility that the latency difference is merely correlated with the observed results rather than being their cause.

There is another angle you can approach this with working backwards from the output. Typically, the kind of difference you heard might be due to any of volume differences, crosstalk differences, tonal balance differences, phase differences, etc., in the output corresponding to the two cases.

One way to gather more data is to do a line-level loopback measurement from the DAC output to the line-in of a sound card (make sure the volume levels don't cause clipping) and use REW to measure the phase alignment of left and right, channel volumes and delays, and frequency sweeps, and see whether there are observable differences in those measurements between low-latency and higher-latency playback. Since these observations are relative to each other rather than absolute, any artifacts introduced by the sound card's line-in will be common to both. It will be more accurate than going through the amp and speakers and measuring the sound with a mic.
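
If you want a quick check before setting up REW, even a plain capture of the line-in for each mode would give you two files to compare; the device name, format and duration below are placeholders:
Code:
# one capture per playback mode, then compare the two recordings
arecord -D hw:1,0 -f S24_3LE -r 44100 -c 2 -d 30 low_latency.wav
arecord -D hw:1,0 -f S24_3LE -r 44100 -c 2 -d 30 high_latency.wav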

If there is some difference that you see in the measurements, you will have more data to come to a conclusion of what might be causing that difference.

Can't say this will be successful, but it's worth a try and doesn't require much of a setup.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
Absent any buffer under-runs, a USB DAC sees exactly the same packet stream regardless of the period and buffer size settings. Every 125 μs, the USB host controller sends a packet with the number of samples requested by the DAC. With 48 kHz audio, each USB packet contains 6 samples (per channel) most of the time, occasionally one more or less to account for clock drift. This is far less than any sensible period size. The period size sets how often the host controller informs the driver of its progress (by raising an interrupt and/or updating a status field). Simply put, the buffer size determines the latency between application and DAC while the period size is the (minimum) time before the driver will accept new data after the buffer has been filled up.
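
To put numbers on that (assuming a high-speed connection, hence 125 µs microframes):
Code:
# 1 packet per 125 µs microframe → 8000 packets per second
# 48000 samples/s ÷ 8000 packets/s = 6 samples per packet (per channel)
# 44100 samples/s ÷ 8000 packets/s = 5.5125 → packets of 5 or 6 samples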

For interactive applications (telephony, games, live effects, etc.), the smallest usable buffer size is generally desired to keep the latency down. Since PCs are not real-time systems, non-interactive uses (such as music and video) often benefit from a larger buffer to absorb occasional latency spikes elsewhere in the system. There is rarely any benefit to using a larger number of periods than the minimum needed to make up the desired buffer size (the period size is limited by hardware).

It is not possible in any way for the period/buffer size settings to affect the operation of the DAC. As long as there are no under-runs (which are generally reported by the application), the "sound" will be exactly the same.
 