Audibility of Small Distortions

amirm · Mar 3, 2016

This is an article I wrote for Widescreen Review Magazine which came out a couple of years ago. This is a revised and updated version.
----
Audibility of Small Distortions
By Amir Majidimehr

Spend five minutes in any audio forum and you immediately run into food fights over audibility of small distortions. Debates rage on forever on differences between amplifiers, cables, DACs, etc. I do not have high hopes of settling those debates. But maybe I can chart a fresh way through the maze that partially answers the question.

The challenge here is that in some of these cases we can show measurable distortion in the audio band. Therefore we cannot easily rule out audible differences. Yes, we can resort to listening tests, but who wants to perform a rigorous listening test just to buy a new AVR or DAC? It is not like there are people who routinely perform these tests for us. Shouldn’t we have a way to evaluate the performance of these systems just from their specifications? There may be.

Let’s look at a trivial example. If I filter out the audio spectrum above 10 kHz, just about everyone can detect that I have changed the sound. Even the staunchest advocates of “give me a blind listening test or else” would accept that we don’t need a blind test to conclude that such a system has audible problems. So there can be specifications which trump the need to perform listening tests.

Given the advances in our audio systems, such easy ways out don’t exist for the most part. In digital systems, for example, the distortion can depend on what the system is doing or what content is being played. Imagine a distortion that pops up only if the rightmost digits in your audio samples are alternating 0s and 1s, and for no other sequence. For such a distortion to be audible we would have to hit on this specific sequence or we would be testing the wrong thing. This is not hard if you know what I just told you. But without that knowledge, which is the typical case, you would be lost in the wind trying to find source material that would trigger these sequences.

Let’s take a short detour into acoustics and examine small distortions there and see if they lend us some insight here, namely, the case of resonances in speakers and rooms.

Audibility of Resonances

A resonance is an attribute of a system where it contributes to the amplitude of the source signal, causing the frequency response to have peaks in it. Take a look at the following three resonances:

The “Q” indicates how steep the resonance is in frequency domain. In time domain, the higher the Q, the more “ringing” the system has. Ringing means that a transient signal (think of a spike) will create ripples that go on after it disappears. An ideal system would reproduce that transient with zero ringing. The higher the Q of a resonance, the more ringing the system has.

Reading what I just wrote, if I asked which one of the above resonances is more audible, you will likely say High Q. It seems natural that it has the highest amplitude change and more time domain impact. Yet listening tests show the opposite to be true! The Low Q is more audible.

How can that be? Well, it has to do with statistics. A Low Q resonance spans a wider set of frequencies. That increases the chances that some tone in our source content hits it and therefore will be reproduced at the wrong level. The narrow resonance can do more damage but the probability of us catching it in the act is lower because fewer tones energize it.
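To put rough numbers on the trade between bandwidth and ringing, here is a minimal Python sketch. It assumes each resonance behaves as a simple second-order system, so the -3 dB bandwidth is the center frequency divided by Q and the ringing envelope decays as exp(-pi * f0 * t / Q). The 1 kHz center frequency and the three Q values are hypothetical, chosen only for illustration; they are not read off the graph.

```python
import math

def resonance_bandwidth_hz(center_hz, q):
    """-3 dB bandwidth of a second-order resonance."""
    return center_hz / q

def ring_time_s(center_hz, q, decay_db=60.0):
    """Time for the ringing envelope to fall by decay_db.

    Assumes a second-order resonance whose envelope decays as
    exp(-pi * center_hz * t / q).
    """
    return q * decay_db * math.log(10) / (20.0 * math.pi * center_hz)

# Hypothetical values for illustration only.
for q in (1.0, 5.0, 25.0):
    bw = resonance_bandwidth_hz(1000.0, q)
    ring_ms = ring_time_s(1000.0, q) * 1000.0
    print(f"Q={q:5.1f}  bandwidth={bw:7.1f} Hz  60 dB ring time={ring_ms:6.2f} ms")
```

Under these assumptions the low Q resonance covers a far wider band of tones while the high Q one rings far longer, which is the trade the statistics argument rests on.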

Listening tests conducted by Toole and Olive show that we hear variations as low as 0.5 decibels in Low Q resonances. This makes a mockery of the typical industry specification of +-3 dB being good enough for speakers. Clearly if 0.5 dB is audible, +-3 dB represents a huge amount of audible variation from our target neutral reproduction of our source material.

With that background now let’s look at a more challenging situation, namely jitter in digital systems.

Audibility of Jitter
Readers of my past articles are probably familiar with the concept of jitter distortion in digital systems. As a quick review, jitter is a variation in the timing with which we output our digital samples (the so-called DAC clock). The deviations from the ideal timing generate spikes on either side of our source frequencies. Here is an example of jitter at a frequency of 2.5 kHz acting on a 10 kHz source signal:

Jitter “amplitude” is specified in units of nanoseconds or picoseconds as appropriate. A nanosecond is a billionth of a second (10 to the power of -9). A picosecond is a trillionth of a second (10 to the power of -12). The above graph shows a jitter level of 7.6 nanoseconds with a sine wave profile, creating those distortion spikes on either side of our source frequency.
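As a rough illustration of where those spikes land and how tall they are, here is a minimal Python sketch. It assumes sinusoidal jitter small enough for the usual narrowband phase modulation approximation, and it treats the quoted jitter figure as a peak value. The article does not say whether 7.6 nanoseconds is peak, peak-to-peak or RMS, so the resulting level is only indicative.

```python
import math

def sideband_frequencies_hz(signal_hz, jitter_hz):
    """Sinusoidal jitter places one spike on each side of the tone."""
    return (signal_hz - jitter_hz, signal_hz + jitter_hz)

def sideband_level_db(signal_hz, jitter_peak_s):
    """Level of each sideband relative to the tone, in dB.

    Small-jitter approximation: the phase deviation is
    2 * pi * signal_hz * jitter_peak_s radians, and each sideband
    carries half of that as a fraction of the tone's amplitude.
    """
    phase_deviation_rad = 2.0 * math.pi * signal_hz * jitter_peak_s
    return 20.0 * math.log10(phase_deviation_rad / 2.0)

low, high = sideband_frequencies_hz(10_000.0, 2_500.0)
level = sideband_level_db(10_000.0, 7.6e-9)  # 7.6 ns assumed to be a peak value
print(f"sidebands at {low:.0f} Hz and {high:.0f} Hz, about {level:.1f} dB relative to the tone")
```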

So how do we determine if those spikes are audible? The first step in that is to review listening tests by Bell Labs researchers Fletcher and Munson who investigated our hearing sensitivity as a function of level and frequency as represented in this graph:

For the purposes of this article, we only care about the bottom curve, labeled “threshold.” This is the lowest audible level (for the average population) of a tone, plotted on a per frequency basis. It becomes very clear from this research that our hearing system is far more sensitive in the mid-range frequencies of roughly 2 to 4 kHz. If there is going to be an audible distortion, that is where it is going to be.
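For readers who want to play with a threshold curve numerically, here is a minimal Python sketch. It uses Terhardt's widely quoted closed-form approximation of the threshold in quiet, which is an assumption swapped in for convenience and not the Fletcher and Munson data itself. It reproduces the same general shape, with the most sensitive region in the low kHz range.

```python
import math

def threshold_in_quiet_db_spl(freq_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt's formula)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

# Scan 100 Hz to 20 kHz in 100 Hz steps and find the most sensitive point.
freqs_hz = [100 * i for i in range(1, 201)]
most_sensitive_hz = min(freqs_hz, key=threshold_in_quiet_db_spl)
print(f"lowest threshold near {most_sensitive_hz} Hz: "
      f"{threshold_in_quiet_db_spl(most_sensitive_hz):.1f} dB SPL")
```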

Pictorially, this is what our distortion needs to do to reach the level of audibility:

The trick with jitter is that the higher the signal frequency, the larger the distortion products created for an identical amount of jitter! This should make intuitive sense: the higher our music frequency, the larger the fraction of a cycle the same timing error represents. Assuming a worst case scenario of a full amplitude 20 kHz signal, it takes about 30 picoseconds (30E-12 seconds) of timing error for the distortion to exceed the threshold of hearing if it lands in that most sensitive hearing range.
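The frequency scaling can be checked with a few lines of Python. This is a sketch under the same kind of assumptions as any back-of-envelope jitter estimate: sinusoidal jitter, a small-jitter approximation, and the jitter figure treated as a peak value. How the resulting level compares with the threshold curve depends on the playback level, which this sketch does not model.

```python
import math

def sideband_level_db(signal_hz, jitter_peak_s):
    """Sideband level relative to the tone, small sinusoidal jitter assumed."""
    return 20.0 * math.log10(math.pi * signal_hz * jitter_peak_s)

def jitter_for_level_s(signal_hz, sideband_db):
    """Inverse of the above: peak jitter that yields a given sideband level."""
    return 10.0 ** (sideband_db / 20.0) / (math.pi * signal_hz)

# Same jitter, rising signal frequency: the sidebands grow about 6 dB per octave.
for freq_hz in (5_000.0, 10_000.0, 20_000.0):
    print(f"{freq_hz:7.0f} Hz tone, 30 ps jitter: {sideband_level_db(freq_hz, 30e-12):.1f} dB")
```

Doubling the signal frequency doubles the sideband amplitude for the same jitter, which is why the 20 kHz case is the worst case within the audio band.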

Fortunately, we don’t have full amplitude 20 KHz signals in our music or we would have lots of blown tweeters, amplifiers that may oscillate/shut down, etc. The biggest help we have here is something called Perceptual Masking in psychoacoustics science. So let’s get into that.

Effect of Masking
Masking is a rather simple concept: it says that if something loud is playing, frequencies near it may be less or not audible at all. Take a look at these two distortion products in red and blue with equal amplitude relative to our source signal:

The blue distortion is masked by the shadow of the gray source signal (called “masker” above). In that sense it is likely inaudible despite its high amplitude. The red distortion spike, on the other hand, is extremely audible since it not only is outside of the masking area of our source signal but also happens to land where our hearing system is most sensitive.

Since we have no control over our source content, i.e. music we play, it is impossible to assert that across all source material masking would help make the distortion inaudible. The equipment distortion could very well become audible in the right circumstances.

Put another way, just like our resonance situation, whether these distortions are audible is a matter of statistics. This means that we could run a number of blind tests and get negative outcomes for audibility not because the distortion is not audible in the absolute, but because we didn’t find the right source material to excite it the right way.

A proper listening test for jitter then starts with using the right material, where masking is not constantly covering the distortion. We do this in audio compression, where the clips we select reduce the impact of masking so as to make it easier to hear codec distortion. Unfortunately, in the few jitter listening tests that have been conducted, the music has not been selected this way.

Of note, picking “audiophile music” is the wrong approach here. There is no telling if audiophile music is more revealing this way. None of the codec test music for example falls in this category.

There is something we can learn here though by inverting this equation and arriving at something more absolute: if our distortion is below the threshold of hearing then we can sleep easy knowing that it is inaudible across all content. Using Hawksford and Dunn research for example, a DAC that generates less than 20 picoseconds of jitter would be transparent to its source, assuming jitter is the only distortion we are worried about.

This gets us to where we wanted to be, i.e. a specification that tells us that we have achieved the right target fidelity level. Yes, this likely sets too high a bar relative to our ability to hear such non-linear distortions, or hear it in our content. So if you like, you can target higher values. But don’t go too far. There is no way to build a credible case that thousands of picoseconds of jitter as we usually get over HDMI for example is inaudible. The analysis I just provided shoots way too many holes in that line of reasoning. My personal target is a few hundred picoseconds.

References
"Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms," Dr. Floyd Toole, 2008 [book]

"Is the AES/EBU / SPDIF Digital Audio Interface Flawed?" Chris Dunn and Malcolm Hawksford, Audio Research Group, Department of Electronic Systems Engineering, University of Essex, AES convention paper, 1992.

Amir Majidimehr is the founder of audio/video/integration/automation company, Madrona Digital (madronadigital.com). Prior to that, he spent over 30 years in the computer and broadcast/consumer video industries at leading companies from Sony to Microsoft, always pushing to advance the state-of-the-art in delivery and consumption of digital media. Technologies developed in his teams are shipped in billions of devices from leading game consoles and phones to every PC in the world and are mandatory in such standards as Blu-ray. He retired as Corporate VP at Microsoft in 2007 to pursue other interests, including advancing the way we interconnect devices in our homes for better enjoyment of audio/video content.

Opus111 · Mar 4, 2016

amirm said:
Audibility of Jitter
...
The trick with jitter is that the higher the frequency, the higher the distortion products created for identical amount of jitter!

This is indeed the case. What does seem to be missing from this (otherwise comprehensive) article is that today's S-D DACs are converting lots of ultrasonic noise from digital to analog. Seeing as they often run at MHz sample rates, this ultrasonic noise can be up to 100X the frequency of the highest audio. What's to stop the jitter modulating this noise in such a way that it impacts the audio band noise floor?

amirm · Mar 4, 2016

There isn't. That's why we need to measure them and see if such distortions exist.

Opus111 · Mar 4, 2016

Measuring noise is rather tricky, particularly with the usual tool of FFT - how would we go about making measurements of noise floor changes due to jitter do you think?

amirm · Mar 4, 2016

Measuring dynamic behavior is a todo project for me. Have not given it enough thought yet on how to do it.

Opus111 · Mar 4, 2016

I'm wondering if wavelets hold some promise in terms of being able to optimize time and frequency domains somewhat independently? Do you have any experience with them?

March Audio · Mar 4, 2016

Something I was looking at today whilst making some recordings. I was already aware that my MDAC with certain filters created a bunch of intermodulation and spuria in band. The following plots display some of the out of band noise quite well.

Slow filter

Optimal Phase Filter - just see something 26-30kHz

Optimal Transient XD Filter

amirm · Mar 4, 2016

That's interesting. I trust the noise was there in the source and didn't get filtered and is not the case of XD filter adding them at the corner of pass band?

March Audio · Mar 4, 2016

The file being played back was a 44kHz track which opus supplied for the glare test. I recorded the glare test at 44kHz, but then subsequently had a play at 96kHz.

The optimal transient filters' impulse responses minimise pre-ringing but do create a lot of in band intermodulation and, as you can see, out of band also. They also have a roll off and are 2-3 dB down by 20 kHz. So yes, I think it's added by the filter!

I'll post some plots of the in band later.

Before I measured these filters I was always under the impression they were rolled off and less clean. The measurements seem to confirm this.

Interesting point is the designer thinks this is the best sounding filter, many forum posters agree, but that could be due to the lead given by the designer.

BTW, the 3 tracks in the glare test are original file, rerecorded with Mdac optimal spectrum filter and rerecorded with Mdac optimal transient xd filter. The differences between the filters are audible directly with most music, but it will be interesting to see if it's audible after rerecording. It would be great if you get time to have a listen to those files.

Pio2001 · May 15, 2018

Hi, BE718,
We can see on your third graph (Optimal Transient XD Filter) that the high frequency noise is in fact aliased images of the low frequency content : the two main bands, from 0 to 2 kHz and from 2 to 4 kHz are copied once between 40 and 44 kHz, then there is a dark line, then they appear again between 44 and 48 kHz.
This is nothing else than the output of a DAC with a very smooth low-pass filter instead of a brickwall one, that doesn't remove completely the 44100 Hz 16 bits digital steps.
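The positions of those copies follow from simple arithmetic on the sample rate. Here is a minimal Python sketch, assuming the first pair of images around the sample rate and a hypothetical 0 to 4 kHz band standing in for the two bands described above:

```python
def image_bands_hz(band_low_hz, band_high_hz, sample_rate_hz, order=1):
    """Return the mirrored and shifted image bands around order * sample rate.

    The lower image is mirrored (sample rate minus frequency), the upper
    image is a straight copy shifted up by the sample rate.
    """
    center_hz = order * sample_rate_hz
    lower_image = (center_hz - band_high_hz, center_hz - band_low_hz)
    upper_image = (center_hz + band_low_hz, center_hz + band_high_hz)
    return lower_image, upper_image

lower, upper = image_bands_hz(0, 4_000, 44_100)
print(f"lower image: {lower[0]} to {lower[1]} Hz")
print(f"upper image: {upper[0]} to {upper[1]} Hz")
```

With a 44100 Hz sample rate this gives roughly 40 to 44 kHz and 44 to 48 kHz, with the dividing line at the sample rate itself.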

This is the usual so-called tradeoff between "perfect frequency response" and "perfect time response" proposed in some DACs. Their digital filter, between the oversampler and the digital-to-analog converter itself, is supposed to cut all frequencies above the Nyquist frequency (that is 22050 Hz for 44100 Hz audio).
But an infinitely sharp digital filter would produce infinite ringing. That's why the actual filters are not completely sharp. For example, a good pattern is to have 0 dB at 20000 Hz and minus infinity at 22050 Hz.

Sometimes, various filters are proposed with a much slower attenuation. They are supposed to reduce ringing. That's why your filter is called "optimal transient".
But this view is mistaken. If you plot the sonogram with enough time resolution, you will see that this ringing does NOT affect transients in any way! The digital ringing occurs at the roll-off frequency and nowhere else. That is above 20000 Hz. The audible band remains completely unchanged, and the duration of the transients is exactly the same under 20000 Hz as far as ringing is concerned.
On the other hand, the frequency response does get affected by these filters.

In conclusion, with these alternate filters, you get a worse frequency response, but you don't get any improvement in transients in exchange. The "optimal transient" is an illusion that comes from an incorrect interpretation of time-domain curves : the 21 kHz ringing is not part of the transients, and is not audible.

By the way, in order to make this ringing appear, you must have energy at 21 kHz in the audio data. Otherwise, it doesn't even exist.
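The claim that the ringing sits at the roll-off frequency can be illustrated with a short Python sketch. It assumes an idealised, truncated sinc low-pass filter (no window), which is a textbook stand-in and not the filter of any particular DAC. The ripple frequency of the tail is estimated from the spacing of its sign changes.

```python
import math

def brickwall_tail(cutoff_hz, sample_rate_hz, n_taps):
    """One side (n = 1..n_taps) of an ideal low-pass impulse response."""
    taps = []
    for n in range(1, n_taps + 1):
        x = 2.0 * cutoff_hz * n / sample_rate_hz
        taps.append(2.0 * cutoff_hz / sample_rate_hz
                    * math.sin(math.pi * x) / (math.pi * x))
    return taps

def ringing_frequency_hz(taps, sample_rate_hz):
    """Estimate the ripple frequency from the average spacing of sign changes."""
    crossings = [i for i in range(1, len(taps)) if taps[i - 1] * taps[i] < 0]
    span_samples = crossings[-1] - crossings[0]
    half_periods = len(crossings) - 1
    return sample_rate_hz * half_periods / (2.0 * span_samples)

for cutoff_hz in (10_000.0, 20_000.0):
    tail = brickwall_tail(cutoff_hz, 44_100.0, 2000)
    estimate_hz = ringing_frequency_hz(tail, 44_100.0)
    print(f"cutoff {cutoff_hz:.0f} Hz: ringing near {estimate_hz:.0f} Hz")
```

In this idealised case the ripples track the cutoff frequency, so moving the cutoff moves the ringing with it.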

RayDunzl · May 15, 2018

Pio2001 said:
By the way, in order to make this ringing appear, you must have energy at 21 khz in the audio data. Otherwise, it doesn't even exist.

I hadn't noticed that, I guess an experiment needs to be performed, unless you have an existing example to display.

Pio2001 · May 15, 2018

I've got no example at hand, but we could make some.
The most enlightening experiment for me was to listen to several audio samples lowpassed at 14, 13, 12, 11 and 10 kHz using a brickwall lowpass filter. The ringing was so obvious! And it was also obvious that it didn't affect the musical content in any way, except by being played over it.
It was long ago, and these samples are now lost. But now, with the Rephase software, it would be easy to make new ones. All that I need is a free audio sample.
Ideally, it would have no content above 9 kHz at the beginning, so that we can see the difference when there is some content at the rolloff frequency, and when there is none.

Cosmik · May 15, 2018

RayDunzl said:
I hadn't noticed that, I guess an experiment needs to be performed, unless you have an existing example to display.

Archimago did some experiments a few weeks ago:
http://archimago.blogspot.co.uk/2018/02/musingsmeasurements-on-blurring-and-why.html

RayDunzl · May 15, 2018

Not sure if this is applicable, but...

Impulse response at preamp output here:

Log Sweeps:
10 to 10,000, 10 to 18,000, 10 to 20,000, and 10 to 22,049Hz with some offset for visibility

(the cable is a little noisy)

Pio2001 · May 15, 2018

So I guess that this is the impulse response of a DAC running at 44100 Hz. And the top one would be from 10 to 10000 Hz, the next one to 18 kHz, then 20 and 22.049 kHz?

Normally, REW generates what would be the response to an impulse, from a measurement made with a sweep tone.
However, here, it would be the response to imperfect impulses lowpassed at 10, 18, 20 and 22 kHz, since the measurements are taken from sweeps that were stopped at these frequencies.

The problem is that ringing can come from
-The test signal
-The DAC
-The ADC
-The analysis algorithm

We can see that the oscillations of the green, blue and red curves span about two periods over 100 μs. That is one period = 50 μs. So the oscillation is around 20 kHz.
They look the same for 18, 20 and 22 kHz. But surprisingly, there is very little 10 kHz oscillation around the black curve (at least it indeed oscillates at 10 kHz, as the little bumps are twice as wide as the ones of the other curves). Which means that REW internally applies a soft lowpass filter as a result of its impulse response calculation. If the sweep abruptly ends at 10 kHz, we should expect a lot of 10 kHz ringing around the peak.

The frequency of the ringing is exactly the same for the three coloured curves. So this is not 18, 20 and 22 kHz ringing.

If we assume that this is indeed a ringing that would be present at the output of the DAC (the ultimate confirmation would be to record a digital 44100 Hz impulse with a 96 kHz ADC), then we can see that it was not triggered by the lowpassed test signal, only with the wide band signals.

At what sample rate was the ADC running to record the test signal ?

Cosmik · May 15, 2018

You're welcome.

RayDunzl · May 16, 2018

Pio2001 said:
At what sample rate was the ADC running to record the test signal ?

The signal would have been digitally generated in REW, passed out over USB and converted to Optical.

And, reconsidering at the moment, it went out at 44.1kHz, but would have been resampled to 48kHz at an idle miniDSP OpenDRC-DI, and further resampled in the DAC to 211kHz.

Finally, the signal comes back to my PC Sound Card, for which I'm unsure of the sample rate used in the above.

Sorry!

Let me rethink/rearrange all of this a little.

daftcombo · Jan 8, 2020

amirm said:
The “Q” indicates how steep the resonance is in frequency domain. In time domain, the higher the Q, the more “ringing” the system has. Ringing means that a transient signal (think of a spike) will create ripples that go on after it disappears. An ideal system would reproduce that transient with zero ringing. The higher the Q of a resonance, the more ringing the system has.

Reading what I just wrote, if I asked which one of the above resonances is more audible, you will likely say High Q. It seems natural that it has the highest amplitude change and more time domain impact. Yet listening tests show the opposite to be true! The Low Q is more audible.

How can that be? Well, it has to do with statistics. A Low Q resonance spans a wider set of frequencies. That increases the chances that some tone in our source content hits it and therefore will be reproduced at the wrong level. The narrow resonance can do more damage but the probability of us catching it in the act is lower because fewer tones energize it.

Listening tests conducted by Toole and Olive show that we hear variations as low as 0.5 decibels in Low Q resonances. This makes a mockery of the typical industry specification of +-3 dB being good enough for speakers. Clearly if 0.5 dB is audible, +-3 dB represents a huge amount of audible variation from our target neutral reproduction of our source material.

Could you quantify what is "low Q"? "high Q"?

amirm · Jan 8, 2020

daftcombo said:
Could you quantify what is "low Q"? "high Q"?

High Q is a tall, narrow peak. Low Q is a wide, less tall peak. The latter covers more frequencies so its effect is a lot more audible, even though it has lower amplitude.

daftcombo · Jan 8, 2020

amirm said:
High Q is a tall, narrow peak. Low Q is a wide, less tall peak. The latter covers more frequencies so its effect is a lot more audible, even though it has lower amplitude.

Thank you, but, for instance, in parametric equalizers like the one in RePhase, you can choose a Q of 0.1 or 1 or 10 etc.
Which values would you put on "low", "medium" and "high" Q?
