Inside High-res audio: PCM vs MQA vs CD: 2L Sampler Comparison

Herbert · May 8, 2021

Palex said:
So it turns out LP is the most honest "format"? :)

Not really, as the ultrasonics are very likely below the vinyl noise floor. Wikipedia states that content up to 50 kHz is possible, as shown by test recordings, but I assume these are signals close to full scale.

PierreV · May 8, 2021

voodooless said:
Actually, where it counts you’ll actually gain dynamic range when adding dither.

Perceived dynamic range, dynamic range adjusted for our hearing thresholds. That's right in Stuart/Craven territory btw

https://web.archive.org/web/2016040...an-audio.com/meridian-uploads/ara/coding2.pdf

Blumlein 88 · May 8, 2021

voodooless said:
Actually, where it counts you’ll actually gain dynamic range when adding dither.

Note: I don’t know if YouTube actually preserves all of the dithered audio when compressing.. I can’t hear anything in the last 1/5th anyway

Didn't watch the video. But yes, you can save a file with something at, say, -110 dB. If you save it undithered at 16 bits, that low-level tone is lost. If you save it as dithered 16-bit, especially with noise-shaped dither, you can indeed find and recover that -110 dB signal. It is really there within the dither noise of the medium.
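Blumlein's -110 dB example is easy to check numerically. The sketch below is a minimal pure-Python simulation, assuming float samples with full scale at ±1.0, a 1 kHz test tone at 44.1 kHz, and plain TPDF dither (no noise shaping); the helper names are made up for illustration. A -110 dBFS sine peaks at only about 0.1 of a 16-bit LSB, so plain rounding turns every sample into zero, while the dithered version still carries the tone, which a single-bin correlation against the known frequency pulls back out of the noise.

```python
import math
import random

FS = 44_100        # sample rate in Hz (assumed)
N = FS             # one second of audio
F0 = 1_000         # test tone in Hz: exactly 1000 cycles fit in N samples
LSB = 1 / 32768    # one 16-bit step, with full scale at +/-1.0


def tone(level_dbfs):
    """A sine at the given peak level in dBFS."""
    amp = 10 ** (level_dbfs / 20)
    return [amp * math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]


def quantize_16bit(x, dither, rng):
    """Round to the 16-bit grid, optionally adding TPDF dither of +/-1 LSB first."""
    out = []
    for s in x:
        d = (rng.random() - rng.random()) * LSB if dither else 0.0
        out.append(round((s + d) / LSB) * LSB)
    return out


def tone_amplitude(y):
    """Estimate the F0 sine amplitude by correlating with the known tone (one DFT bin)."""
    acc = sum(v * math.sin(2 * math.pi * F0 * n / FS) for n, v in enumerate(y))
    return 2 * acc / N


rng = random.Random(1)
x = tone(-110)
plain = quantize_16bit(x, dither=False, rng=rng)
dithered = quantize_16bit(x, dither=True, rng=rng)
print(any(plain))                                  # False: every sample rounds to 0
print(20 * math.log10(tone_amplitude(dithered)))   # close to -110 (dB)
```

Longer captures make the recovered level more precise, since the correlation averages the dither noise down while the tone adds up coherently.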

voodooless · May 8, 2021

PierreV said:
Perceived dynamic range, dynamic range adjusted for our hearing thresholds. That's right in Stuart/Craven territory btw

https://web.archive.org/web/2016040...an-audio.com/meridian-uploads/ara/coding2.pdf

You can, as @Blumlein 88 pointed out, measure it. So it does not only appear to be there, it actually is. Calling it “perceived” is therefore a bit misleading.

amirm said:
That would be getting something for nothing. Dither randomizes the LSB so by definition, you lose bits and dynamic range. You can noise shape that so that in the audible band the effect is not there but then you better have ultrasonic spectrum to park the noise.

Sure, the gain is not over the whole bandwidth, but it’s there where it counts and lets you encode signals about 2 bits below the LSB with little to no sonic impact. In my book that’s a net gain. You seem to have little issue with MQA parking a bunch of noise in the audible spectrum. And that doesn’t give you anything unless you have a decoder (and even then it gives you more stuff you can’t hear).

UKPI · May 8, 2021

amirm said:
That would be getting something for nothing. Dither randomizes the LSB so by definition, you lose bits and dynamic range. You can noise shape that so that in the audible band the effect is not there but then you better have ultrasonic spectrum to park the noise.

I'd say that using dither is a net gain. Without dither, distortion from quantization error severely impacts the quieter parts of the music. In my opinion, substituting noise for that distortion is the better choice, especially with noise shaping. Noise shaping does raise the chance of clipping, but that can be avoided by leaving a little headroom in the recording.

To illustrate my point, I created 9-second-long 8-bit, 44.1 kHz PCM clips from a CD-quality music source: one with no dither, one with dither, and one with noise-shaped dither. The music is from the single album "For You" by Kang Susie, if anyone is interested.
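UKPI's clips aren't reproduced here, but the difference between flat and noise-shaped dither can be sketched in a few lines. This is a minimal illustration, assuming float samples in ±1.0, TPDF dither, and the simplest possible shaping filter (first-order error feedback), which is far cruder than the shaped dithers in real mastering tools; the function names are hypothetical. The shaped version subtracts each sample's total error from the next one, so the output error becomes e[n] - e[n-1], a first-order high-pass: noise is pushed towards high frequencies at the cost of more total noise. A 32-sample block average serves as a crude low-pass for comparing the low-frequency noise left behind.

```python
import math
import random


def requantize(x, bits, rng, shaped=False):
    """Requantize floats in [-1, 1) to `bits` bits with TPDF dither.

    With shaped=True, first-order error feedback is applied: the error made on
    each sample is subtracted from the next, so the output error is
    e[n] - e[n-1] (high-pass shaped). No clipping protection: leave headroom.
    """
    step = 2.0 / 2 ** bits
    out, err = [], 0.0
    for s in x:
        v = s - err if shaped else s
        d = (rng.random() - rng.random()) * step     # TPDF dither, +/-1 step
        q = round((v + d) / step) * step
        if shaped:
            err = q - v                              # total error on this sample
        out.append(q)
    return out


def lowband_noise(y, x, block=32):
    """Power of the error after averaging over non-overlapping blocks (crude low-pass)."""
    e = [a - b for a, b in zip(y, x)]
    means = [sum(e[i:i + block]) / block for i in range(0, len(e) - block + 1, block)]
    return sum(m * m for m in means) / len(means)


rng = random.Random(7)
x = [0.1 * math.sin(2 * math.pi * 440 * n / 44_100) for n in range(44_100)]
flat = requantize(x, 8, rng)
shaped = requantize(x, 8, rng, shaped=True)
print(lowband_noise(flat, x), lowband_noise(shaped, x))  # shaped is far lower
```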

Francis Vaughan · May 8, 2021

amirm said:
That would be getting something for nothing. Dither randomizes the LSB so by definition, you lose bits and dynamic range. You can noise shape that so that in the audible band the effect is not there but then you better have ultrasonic spectrum to park the noise.

There are some subtleties here.
When sampling, it is a requirement that dither is used. This is often not appreciated. An ADC that has no dither applied is incorrectly implemented and will yield objectionable signal-correlated quantisation noise. This is true in any digital system, whether audio or any other. Dither is not an option. It is a mathematical necessity to obtain correct operation from the quantiser. Dither is minimally AIWN of one half LSB. But it can have other distributions. Correct dither doesn't reduce the available bit depth. Incorrect dither might. Dither other than AIWN can lead to some surprising benefits. (Noise this low in amplitude is often below the inherent noise floor of the recording process. So long as there is AIWN in the chain, and most microphones will oblige, certainly any small-diaphragm mic, there is often no real need to add it explicitly. In early digital recordings this was often all that was done. But you need to know and be sure.)
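The claim that an undithered quantiser produces signal-correlated error can be demonstrated with a deliberately tiny signal. The sketch below assumes a 1 kHz sine with a peak of 1.3 LSB, sampled at 44.1 kHz, with amplitudes expressed in LSB units and TPDF dither standing in for "correct" dither; the names and numbers are illustrative only. Without dither the rounding error is a deterministic function of the signal, so its power lands on harmonics of the tone (distortion, on odd harmonics here because rounding is odd-symmetric). With dither the total error is broadband noise with a signal-independent power of 1/4 LSB², and almost nothing is left at the harmonic frequencies.

```python
import math
import random

FS, N, F0 = 44_100, 44_100, 1_000   # one second of a 1 kHz tone (1000 whole cycles)


def harmonic_power(e, k):
    """Power of e at the k-th harmonic of F0, from a single-bin DFT (sine + cosine)."""
    w = 2 * math.pi * k * F0 / FS
    a = 2 / N * sum(v * math.sin(w * n) for n, v in enumerate(e))
    b = 2 / N * sum(v * math.cos(w * n) for n, v in enumerate(e))
    return (a * a + b * b) / 2


def quantization_error(x, rng=None):
    """Error of rounding to whole LSBs; adds TPDF dither first if rng is given."""
    err = []
    for s in x:
        d = (rng.random() - rng.random()) if rng is not None else 0.0
        err.append(round(s + d) - s)
    return err


x = [1.3 * math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]  # peak = 1.3 LSB
plain_err = quantization_error(x)
dithered_err = quantization_error(x, random.Random(0))
odd = range(3, 12, 2)   # 3rd..11th harmonics, all below Nyquist
print("undithered :", sum(harmonic_power(plain_err, k) for k in odd))
print("TPDF dither:", sum(harmonic_power(dithered_err, k) for k in odd))
```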

The overarching theory is of course Shannon. I highly suggest that anyone interested read his original paper. It is perhaps my favourite scientific paper. Well written, groundbreaking, and one of the most important of the last century.

There are a lot of subtleties in the paper that go unappreciated.
The core point is signalling in a noisy channel. Every communication channel is a bandwidth-limited channel with a defined signal-to-noise ratio. The maximum information transfer rate is exactly determined by this pair of metrics. However, SNR need not be constant over the channel. It is the area under the curve of (log) SNR versus frequency that determines the total information capacity. (This is the key insight that allows ADSL to work.)
It is perfectly possible to resolve signal that is well below the noise. Humans can, and radar and sonar systems exploit this to the hilt. There is signal below tape hiss. Detecting it is covered by Shannon's theorem. The key is that you need time to get enough information into the system to allow a signal to be found. And there is no free lunch: you can't detect arbitrary signals, the signal must itself have a low information content, and overall the books will balance. Signals that are self-correlated are detectable, since self-correlation limits their information content. Eventually you exchange the ability to detect signal in one part of the band for the ability in another. But it can be done any way you like by perturbing the overall SNR/frequency curve, so long as the area remains constant. The bit depth does not automatically set the signal dynamic range in the channel. It constrains the overall information content, but not the dynamic range at any one frequency.
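For reference, the standard form of Shannon's capacity result for a channel whose SNR varies with frequency is written out below, with S(f) and N(f) the signal and noise power spectral densities and B the bandwidth. Strictly, what is integrated is the logarithm of (1 + SNR), so trading SNR between bands while holding this integral fixed leaves the capacity unchanged; the second form is the familiar flat-SNR special case.

```latex
C = \int_{0}^{B} \log_2\!\left(1 + \frac{S(f)}{N(f)}\right) df \quad \text{bits/s},
\qquad
C = B \log_2\!\left(1 + \frac{S}{N}\right) \ \text{(flat SNR)}
```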

If we note that the human ear has poor SNR at high frequencies (say the top octave, and certainly above 15 kHz), there is a lot of wasted channel capacity. There is no chance any human, even with perfect hearing, can hear down to the LSB at 15 kHz. Zip. Not even the lowest couple of bits. So, in the digitisation process we can exchange SNR across the audible band. And this is exactly what all the various audio dither mechanisms do. They effectively shape the dither in such a way that the SNR in the ear's critical mid bands is better than 96 dB, whilst it is reduced at higher frequencies.
You can be sure that nearly every CD you ever bought has been so dithered. Back in the early days companies were very proud of their dither algorithms, and sometimes named them on the liner notes. Every DAW you can buy has a final dither step, and usually a choice of dither algorithms.

What is critical is this: dithering is pretty much a final step. It is applied in a DAW when the entire production and mastering is complete and the result is created for distribution. Typically the internal 24-bit samples are being reduced to 16 bits. Again, even if no specialised (shaped) dither is used, you cannot simply truncate from 24 bits to 16: you get quantisation artefacts. In exactly the same way as ADCs must dither, word-length reduction must dither.
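To make the 24-to-16-bit step concrete, here is a minimal integer sketch, assuming signed 24-bit samples held in Python ints and TPDF dither of about ±1 16-bit LSB (±255 in 24-bit units). It is not any particular DAW's algorithm, and the function names are hypothetical. A constant 24-bit value of 100 sits at about 0.39 of a 16-bit step: truncation throws it away completely, while the dithered output toggles between neighbouring codes so that its average still reflects the sub-LSB value.

```python
import random


def truncate_24_to_16(samples):
    """Plain truncation: drop the low 8 bits (arithmetic shift, rounds toward -inf)."""
    return [s >> 8 for s in samples]


def dither_24_to_16(samples, rng):
    """Add TPDF dither of about +/-1 16-bit LSB, round to nearest, clamp to int16."""
    out = []
    for s in samples:
        d = rng.randint(0, 255) - rng.randint(0, 255)   # triangular PDF, -255..255
        q = (s + d + 128) >> 8                           # +128 rounds to nearest
        out.append(max(-32768, min(32767, q)))
    return out


rng = random.Random(3)
quiet = [100] * 100_000                  # 100/256 = 0.39 of a 16-bit LSB
print(set(truncate_24_to_16(quiet)))     # {0}: the information is gone
dithered = dither_24_to_16(quiet, rng)
print(sum(dithered) / len(dithered))     # close to 0.39
```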

Whilst shaped dither is related to noise shaping in DACs, as both are described by the same theory, the implementation and effects of the two are quite different. Dither results in in-band shaped noise. But the shaping is controlled so that it remains inaudible.

Internal to DAWs there is a lot of care and management needed in the digital chain. Ideally, no truncation or resampling is ever applied to a signal. But if it is, it is critical that the algorithms do not lose the below-LSB signal. Some plugins are poorly behaved in this respect and silently introduce quantisation noise into the chain. A 16-bit audio signal commonly carries information below -96 dBFS at some frequencies, and the DAW must ensure that it is preserved.

This is also important when considering the ultimate quality of a DAC. A DAC that shows 96 dB SNR does not actually provide perfect reproduction of CD-quality music. Typically there is another 6 dB or so of real signal in the mid bands. Any DAC aspiring to be CD-transparent needs to exceed 100 dB SNR in the mid bands.

krabapple · May 8, 2021

I'm glad these videos exist and people are learning from them, but the finding that hi-rez offerings typically have either a ton of ultrasonic hash (DSD) or only low-level uncorrelated garbage beyond 24 kHz is not news. Even hi-rez cheerleader Stereophile was publishing analyses of this back in the 2000s. I've been looking at purchased hi-rez material in Audition for over a decade, and rarely is there much correlated content (which would be ultrasonic harmonics anyway) in the wasteland beyond the audible range. Likely because so much 'hi-rez' product is sourced from... old analog tapes.

Back in the reign of CD (circa 1992) there was a guy named James Boyk, a music teacher/pianist then at Caltech, who was pushing hard the idea that we need to capture (and play back) the ultrasonic harmonics that instruments generate, in an infamous article, 'There's Life Above 20 Kilohertz'. And, gee whiz, here's PS Audio huckster Paul McGowan citing that article just a few years ago. Boyk himself, of course, cited the crackpottery of Tsutomu Oohashi to back up his claim (uncorroborated except by Oohashi himself; regardless, his papers were very popular in audiophile circles back then).

Though his doctoral degree was in Agriculture, Oohashi also published on such topics as 'possession trances' and the uses of artificial life, as well as composing (most famously, the music for Akira)

Talk about a circle of confusion!

Francis Vaughan · May 8, 2021

To add, there are really two seminal papers from Shannon. They both stem from the same fundamentals.

The first is the basis of all modern communication theory:
Communication in the Presence of Noise.

The second is what is the basis of all modern information theory:
A Mathematical Theory of Communication.

If there was a Nobel Prize for engineering Claude would have been a no-brainer recipient.

krabapple · May 8, 2021

Amir, what Audition FFT settings do you use in these analyses?

Maxicut · May 8, 2021

Francis Vaughan said:
There are some subtleties here.

Finally, someone who knows what they're talking about! It has become common practice to round the range of human hearing to 20 Hz to 20 kHz, but it's actually about 30 Hz to 18.5 kHz even at birth. There's so much I could talk about here, but I'll refrain lol. Good post!

GWolfman · May 8, 2021

amirm said:
It is only 96 dB if you don't use dither. Depending on the type you use, you will lose 3 to 6 dB. Room noise is also dominated by bass frequencies, where our hearing is not very sensitive. Where we are most sensitive (2 to 5 kHz), many rooms are audibly silent. Room noise is also omnidirectional, making it less objectionable and audible than what comes out of a speaker.

And of course we also have headphone listening where noise isolation can be provided and getting high SPL is incredibly easy.

Most of what I was trying to say earlier.

Thanks.
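The figures amirm quotes above follow from textbook quantisation-noise arithmetic. This sketch assumes an ideal quantiser, white (unshaped) dither, and noise integrated over the full band; noise shaping redistributes these amounts across frequency rather than removing them. The familiar "96 dB" is 20·log10(2^16) ≈ 96.3 dB, and a full-scale sine against the step²/12 rounding noise gives about 98.1 dB. Rectangular (RPDF) dither of one LSB adds another step²/12 (about 3 dB more noise) and triangular (TPDF) dither adds step²/6 (about 4.8 dB), consistent with the 3 to 6 dB range.

```python
import math


def ideal_snr_db(bits):
    """Full-scale sine vs. undithered rounding noise (step^2 / 12)."""
    step = 2.0 / 2 ** bits          # full scale spans -1..+1
    signal_power = 0.5              # sine with a peak of 1.0
    noise_power = step ** 2 / 12
    return 10 * math.log10(signal_power / noise_power)


def dither_penalty_db(kind):
    """Noise increase from unshaped dither, relative to rounding noise alone."""
    added = {"rpdf": 1 / 12, "tpdf": 2 / 12}[kind]   # dither variance in step^2
    return 10 * math.log10((1 / 12 + added) / (1 / 12))


print(round(20 * math.log10(2 ** 16), 2))   # 96.33
print(round(ideal_snr_db(16), 2))           # 98.09
print(round(dither_penalty_db("rpdf"), 2))  # 3.01
print(round(dither_penalty_db("tpdf"), 2))  # 4.77
```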

krabapple · May 8, 2021

Frank Dernie said:
I have recorded, mainly classical, music for well over 50 years now.
I have never heard of any music with wide enough dynamic range to tax 16-bit. Back in the old reel-to-reel tape days it required quite a bit of skill, on classical music, to set the levels so that the quiet bits were not in the noise and the loud bits were not overloaded too much. With the first 16-bit recorder I bought, setting the levels so there was no clipping was easy, and the background noise is, in any case, so low you can't hear it at all at normal listening level. It was also the first time I couldn't hear a difference between the recorder and the microphone feed.

Fielder's 1982 JAES paper on "Dynamic-Range Requirement for Subjectively Noise-Free Reproduction of Music" in which he measured orchestral concert music levels, was often cited as indicating the need for more than 16 bits to capture the rare loudest moments.

Francis Vaughan · May 8, 2021

Adding another point I missed addressing earlier.
Dither does not randomise the LSB.

This is a curious misconception that seems prevalent across the audio community. There was a time when the analog enthusiast camp claimed that the point of dither was to drown out quantisation artefacts with noise. Which is just plain wrong. But their misunderstanding of dither seemed to include an idea that dither was full LSB. And that part of the misunderstanding didn't go away.

To understand the subtleties of dither consider the following challenge. I give you a magic black box with two terminals and a lamp on the front. The lamp will light if the voltage across the terminals exceeds 0.5 volts. Otherwise it is off. The box performs perfectly.

I give you a whole stack of other boxes, each with two terminals, that have an unknown voltage between 0.0 and 1.0 volts on the terminals. To what accuracy can you measure the voltages on these boxes?

Clearly all you can do is separate the boxes into over and under 0.5 volts. Your measurement system has 1 bit of resolution.

Now I also give you a box that delivers a random voltage anywhere between -0.5 and 0.5 volts. Every time you connect to the box a different perfectly random voltage is provided, where the distribution is flat. To what accuracy can you now measure the voltages on the unknown sources? What constrains that accuracy?

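Connect the noise box in series with the unknown voltages. Measure multiple times and average the result. Every time you quadruple the number of measurements so done, you add one bit of resolution (about half a bit per doubling), because the averaging error falls as one over the square root of the number of measurements.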

This is dither. Dither does not randomise the LSB, it correlates it with signal that is lower than the resolution of the LSB in a manner that encodes that information into the LSB. Again, the constraints are exactly described by Shannon. Your overall information budget remains constant. The actual constraints in either the digital or analog world are identical, and there is a wonderful mathematical duality between hearing music below the hiss and the ability of dither to encode signal below the LSB.
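The black-box thought experiment translates directly into a short simulation. This sketch assumes an ideal comparator, a dither voltage drawn fresh and uniformly from -0.5 to 0.5 V for every reading, and simple averaging; the function names are invented for the example. Because the dither makes the probability of the lamp lighting equal to the unknown voltage itself, the fraction of "on" readings converges on the true value. The RMS error falls as one over the square root of the number of readings, so each fourfold increase buys roughly one more bit.

```python
import random


def lamp(voltage):
    """The 'magic black box': 1 (lamp on) if the voltage exceeds 0.5 V, else 0."""
    return 1 if voltage > 0.5 else 0


def measure(unknown, n, rng):
    """Average n lamp readings with a fresh uniform dither voltage in series each time."""
    hits = sum(lamp(unknown + rng.uniform(-0.5, 0.5)) for _ in range(n))
    return hits / n


def rms_error(unknown, n, trials, rng):
    """RMS error of measure() over many repeated trials."""
    errs = [(measure(unknown, n, rng) - unknown) ** 2 for _ in range(trials)]
    return (sum(errs) / trials) ** 0.5


rng = random.Random(42)
v = 0.3141
print(lamp(v))   # 0: on its own the box only says "under 0.5 V"
for n in (16, 64, 256, 1024):
    print(n, round(rms_error(v, n, 200, rng), 4))  # roughly halves per 4x readings
```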

danadam · May 8, 2021

voodooless said:
Note: I don’t know if YouTube actually preserves all of the dithered audio when compressing.. I can’t hear anything in the last 1/5th anyway

It's still there, although the noise seems to get higher. Check the -102 dBFS with 60 dB gain in the attachments.

Frank Dernie · May 8, 2021

amirm said:
It is only 96 dB if you don't use dither. Depending on the type you use, you will lose 3 to 6 dB. Room noise is also dominated by bass frequencies, where our hearing is not very sensitive. Where we are most sensitive (2 to 5 kHz), many rooms are audibly silent. Room noise is also omnidirectional, making it less objectionable and audible than what comes out of a speaker.

And of course we also have headphone listening where noise isolation can be provided and getting high SPL is incredibly easy.

All true, but there still aren't any music recordings that have 96 dB of dynamic range that I know of; if there were, headphones would indeed be the only way to listen to them.
I find headphones (with gain riding) handy for hearing recording faults and setting up microphones (though nearfield monitoring and gain riding can do it too).
OTOH I don't consider headphone listening satisfactory for generally listening to music for pleasure, there is absolutely nothing high fidelity about a tiny line of micro-musicians playing in and about my head.

Hayabusa · May 8, 2021

@amirm Also, a nice way of finding weird tones is with Audacity's spectrogram view:

Frank Dernie · May 8, 2021

krabapple said:
Fielder's 1982 JAES paper on "Dynamic-Range Requirement for Subjectively Noise-Free Reproduction of Music" in which he measured orchestral concert music levels, was often cited as indicating the need for more than 16 bits to capture the rare loudest moments.

It is very rare to need that much, and 24-bit is brilliant to catch instantaneous surprises - I use a 24-bit recorder now.
What I can guarantee you, though, is that when the recording is mixed for release to the general public the big peaks will have had soft limiting applied in almost all cases. There is no point in issuing recordings that 99.99% of buyers will be unable to play back.
I do have a few classical recordings with fairly realistic dynamic range, the average sound level on these is between 10dB and 15dB lower than normal to allow for these peaks and they sound fantastic IMO but it is still the case that even these don't use all the 96dB of CD potential.
They are sadly rare too, and unless you are into big stuff like Bruckner and Mahler symphonies, or similarly huge ensemble music like Verdi's Requiem, you won't be listening to it!

It is no accident recording engineers pretty well all know that 24-bit is handy at the recording stage and 16-bit more than enough for the commercial release.

DavidMcRoy · May 8, 2021

I wonder who among us can distinguish “improved high-frequency air and quicker transient snap,” to borrow subjectivist terminology, from euphonic distortion brought about by the presence of ultrasonics in the signal that manifests in the audible range?

ousi · May 8, 2021

For a pure comparison on the MQA side: would it be worthwhile to do a test where the MQA file of the same sample is fed through an MQA-capable DAC (e.g. the Topping D90), with the analog output routed to the analyzer's ADC and recorded as a 24-bit/96 kHz WAV? Then do the same with a 24-bit/96 kHz source FLAC file, fed into the same DAC, with the output routed to the analyzer's ADC and recorded at the same sample rate and bit depth. Then diff the two recordings after compensating for the difference in start points.

The point is that I would like to see how good or bad MQA fares against a lossless file format when played through the same DAC, with the MQA file properly unfolded and decoded. I know there will be some randomness in the DA and AD process... Could be an interesting test IMO.
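The align-and-subtract step ousi describes could be prototyped roughly as follows. This is a simplified sketch, assuming both captures are already loaded as equal-rate float lists and that only an integer-sample delay separates them; a real DAC-to-ADC null test would also need gain matching, sub-sample alignment and clock-drift correction. The function names are hypothetical, and the brute-force cross-correlation is only practical for short excerpts.

```python
import math
import random


def best_lag(ref, rec, max_lag):
    """Integer lag (samples) that maximises the cross-correlation of rec against ref."""
    def score(lag):
        lo, hi = max(0, -lag), min(len(ref), len(rec) - lag)
        return sum(ref[i] * rec[i + lag] for i in range(lo, hi))
    return max(range(-max_lag, max_lag + 1), key=score)


def null_depth_db(ref, rec, lag):
    """Residual (rec - ref) power relative to ref power after alignment, in dB."""
    lo, hi = max(0, -lag), min(len(ref), len(rec) - lag)
    diff = sum((rec[i + lag] - ref[i]) ** 2 for i in range(lo, hi))
    sig = sum(ref[i] ** 2 for i in range(lo, hi))
    return 10 * math.log10(max(diff, 1e-30) / sig)


# Synthetic check: the "recording" is the reference delayed by 5 samples plus a little noise.
rng = random.Random(0)
ref = [rng.gauss(0, 0.1) for _ in range(2_000)]
rec = [0.0] * 5 + [s + rng.gauss(0, 1e-4) for s in ref]
lag = best_lag(ref, rec, max_lag=20)
print(lag, round(null_depth_db(ref, rec, lag), 1))   # lag 5, null near -60 dB
```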

ousi · May 8, 2021

Frank Dernie said:
It is very rare to need that much, and 24-bit is brilliant to catch instantaneous surprises - I use a 24-bit recorder now.
What I can guarantee you, though, is that when the recording is mixed for release to the general public the big peaks will have had soft limiting applied in almost all cases. There is no point in issuing recordings that 99.99% of buyers will be unable to play back.

Isn't that what happened during the loudness war? Compressing and filling the entire spectrum, making everything very loud while losing almost all dynamic range :'(
