MQA creator Bob Stuart answers questions.

Blumlein 88 · Jun 15, 2019

Correspondingly, the Sampling Theorem simply doesn't technically apply to such practical functions. If we want to perceptually accurately sample the audio signals containing such functions as their components (e.g. such signals as some practically observed transients), we have to first approximate those "inconvenient" functions with other functions, perceptually equivalent, yet satisfying the third technical requirement of the Sampling Theorem.

Show me some practically observed transients. The ones that the Sampling Theorem doesn't apply to.

Sergei · Jun 15, 2019

Blumlein 88 said:
I had a nice response written and erased it. Simply JUST NO. If you make a claim somewhat beyond what is thought to be the case, especially when those doubting your claim have many reasons to think your case holds no water it is up to the claimant to come up with something. And by come up with something I don't mean off the cuff half-baked hypothesis with associations with barely similar odd results saying okay up to you guys to prove otherwise.

Please don't erase your nice responses. Seriously. I value them.

What you describe is not what I meant by the "split". Surely a person defending the status quo doesn't have to do the work proving that a new paradigm describes some phenomena better. I agree with you on that: that's the job of the proponents of the new paradigm.

What I meant by "split" is that the status quo camp still has to come up with an explanation/model/theory in their frame of reference, which describes a confirmed but not yet explained by them phenomena.

Specifically, what would be the explanation of the observed statistically significant preference for high resolution music formats? Or, what would be the explanation of a destructive effect of an acoustic event that may not be reliably captured using the sampling rate that the old paradigm deems sufficient for capturing everything of importance to hearing?

Sergei · Jun 15, 2019

Blumlein 88 said:
Correspondingly, the Sampling Theorem simply doesn't technically apply to such practical functions. If we want to perceptually accurately sample the audio signals containing such functions as their components (e.g. such signals as some practically observed transients), we have to first approximate those "inconvenient" functions with other functions, perceptually equivalent, yet satisfying the third technical requirement of the Sampling Theorem.

Show me some practically observed transients. The ones that the Sampling Theorem doesn't apply to.

A nice list of transients-generating music instruments here: https://www.cco.caltech.edu/~boyk/spectra/spectra.htm. Figure 1 (a) refers to trumpet: as discussed previously on this thread, an instrument capable of generating genuine shockwaves.

Violin bowed string motion includes a sharp triangle corner. Detailed discussion here: http://www.speech.kth.se/prod/publications/files/3321.pdf.

Blumlein 88 · Jun 15, 2019

Sergei said:
A nice list of transients-generating music instruments here: https://www.cco.caltech.edu/~boyk/spectra/spectra.htm. Figure 1 (a) refers to trumpet: as discussed previously on this thread, an instrument capable of generating genuine shockwaves.

Violin bowed string motion includes a sharp triangle corner. Detailed discussion here: http://www.speech.kth.se/prod/publications/files/3321.pdf.

In the Boyk info, there is high frequency info, but nothing so steep it is a problem. I've done recordings of jangling keys and my perforated pizza pan with high energy there, but it isn't an abnormal waveform that violates the outlines of the Sampling theorem.

As for the bowed violin, I gave it a quick look, and will look more deeply, but so far I see it is showing resonance has to occur on the string even with a stick and slip sawtooth wave. Resonance takes time to build and isn't as sharp as imagined.

I suppose this is merely anecdotal, but I've been privileged to record a young violinist. She was 12 when I first recorded her. Some various things indicated she had excellent high frequency hearing. I recorded her with an Earthworks microphone, at various rates up to 192 khz. Asked her to pick the one that sounded best. Let her use my headphones with 50 khz response, and my Soundlab speakers which should work till 28 khz. She didn't have a preference vs 48 khz. I didn't tell her which was which just lined them up and let her listen. She is very picky about how she sounds recorded mostly about mistakes of course like many musicians. She is super picky about timing, and small sound irregularities. Several times asking about things I hadn't noticed, but she did. 48/24 is fine with her on her own recordings.

It is much more likely that various non-linear processing done in recordings might cause something that matters. These are akin to the 'illegal' signals. Illegal in that they couldn't occur in an AD/DA loop, but can be inserted digitally prior to the DA playback.

Sergei · Jun 15, 2019

Blumlein 88 said:
It is much more likely that various non-linear processing done in recordings might cause something that matters. These are akin to the 'illegal' signals. Illegal in that they couldn't occur in an AD/DA loop, but can be inserted digitally prior to the DA playback.

That's a good insight about the 'illegal' signals. I agree. I also agree that 48/24 appears sufficient for reproducing perceptually transparently a limited number of natural music instruments and vocals - for me personally, up to about four.

Continuing on the Sampling Theorem mathematical perfection vs reality. The Theorem only works perfectly if the sampled values are captured with perfect precision, and the reconstruction is done with perfect precision as well. Stuart and Craven (https://secure.aes.org/forum/pubs/journal/?elib=20457) illustrate imperfections caused by the process of quantization, how the imperfections are affected by various types of dithering, and - rare to see in an audio publication - their impact on the accuracy in time domain.

The graphs below demonstrate how doubling the sample rate accelerates the convergence between the true value of signal and what was imperfectly captured through the quantization process. Given enough samples, the variance becomes very small: that's what the well respected gurus of the Sampling Theorem describe to us when they talk about the perfection of digital audio in their video tutorials.

However, such perfection isn't achieved right away. Qualitatively, figuring out the true shape of a signal, when the values of samples are not captured perfectly, requires averaging over time. If the signal doesn't change its shape too much during the characteristic time of averaging, the process converges. Because of that, quantization works very well for a single sinusoid with constant amplitude - a staple in the Sampling Theorem video tutorials.

If we add a second sinusoid, we now need more samples to figure out the shapes of two of them, mixed together, with the same precision as we did for one sinusoid. Qualitatively, we need twice as many samples, yet this is not exactly true, because the two sinusoids effectively start dithering each other, resulting in a quicker conversion toward true value. Still, definitely more samples are needed to achieve the same level of accuracy. Which, at a given sample rate, means more time.

As we add more and more signal components, we need to add more and more samples, in order to capture the signal in such a way that we can reconstruct if with a required level of precision. The graphs below depict averages over simulations of signals meant to approximate what is encountered in real music. You can see that the number of samples required to converge to a desired level of precision (let's say, of about 0.3 units on these graphs) is not small - on the order of thousands for the lower sampling rate. The higher the sampling rate, the quicker such convergence is achieved.

This effect, qualitatively, hints at what might be going on when a complex music - with hundreds of sinusoids exhibiting quickly varying amplitudes and frequencies, intermixed with transients - is quantized. If the sampling rate and bit depth are not high enough, the components of music may never converge close enough to to their true values during an intense music passage. Then the reconstruction - however perfect - results in an analog signal which kinda sorta resembles the fragment of original symphony, yet sounds decidedly fake.

The right question to ask at this point is - how far do we need to go with the sampling rate and bit depth for the music to sound absolutely transparent? And the right answer is "It depends": on particular music piece, on particular sound delivery system, and on particular person, including the person's neurophysiological condition at a particular moment.

Sound delivery systems are not perfect. Human hearing systems are not perfect either: even when presented with live music, we can't sometimes even approximately perceive what a particular musician is playing at a particular moment. So, there is a practical limit to the "digital perfection", beyond which the other inherent imperfections start dominating the total subjective imperfection level.

SIY · Jun 15, 2019

Theorem. You keep using that word. I do not think it means what you think it means.

And seriously, I don't think you have a good grasp of Fourier or Shannon, and until such a time that you do, you're still going to be all over the place with irrelevancies.

solderdude · Jun 15, 2019

It doesn't say (or glanced over the WOT) but is this 24 bit depth or 16 bit depth ?

You worry about a 1LSB error ? (At least that's what the plots seem to indicate)

SIY · Jun 15, 2019

solderdude said:
It doesn't say (or glanced over the WOT) but is this 24 bit depth or 16 bit depth ?

You worry about a 1LSB error ? (At least that's what the plots seem to indicate)

The WOT perfectly illustrates the issue- a very fundamental lack of understanding of the two basic theorems involved. Here, I'll pull out an example for you:

Because of that, quantization works very well for a single sinusoid with constant amplitude - a staple in the Sampling Theorem video tutorials.

If we add a second sinusoid, we now need more samples to figure out the shapes of two of them, mixed together, with the same precision as we did for one sinusoid. Qualitatively, we need twice as many samples, yet this is not exactly true, because the two sinusoids effectively start dithering each other, resulting in a quicker conversion toward true value. Still, definitely more samples are needed to achieve the same level of accuracy. Which, at a given sample rate, means more time.

As we add more and more signal components, we need to add more and more samples, in order to capture the signal in such a way that we can reconstruct if with a required level of precision.

Sergei · Jun 15, 2019

solderdude said:
It doesn't say (or glanced over the WOT) but is this 24 bit depth or 16 bit depth ?

You worry about a 1LSB error ? (At least that's what the plots seem to indicate)

The reference [9] (http://www.aes.org/tmpFiles/elib/20190615/6716.pdf) also only uses LSB units. Given the time of its publication (1992), I believe the context it 16 bits rather than 24 bits. But once again, they only seem to care about the precision expressed in LSB.

And yes, I do care about the 1LSB error. Otherwise, why keep that LSB in the system in the first place? Elsewhere in that JAES issue, there is an anecdote of audio engineers able to hear quantization errors in the 24th bit.

Sergei · Jun 15, 2019

SIY said:
The WOT perfectly illustrates the issue- a very fundamental lack of understanding of the two basic theorems involved.

How do you understand it then? Here's a simple test for you. Mix two sinusoids of different frequencies, amplitudes, and phases; quantize them into a PCM file, give any three consecutive samples of that file to any of the forum members. Ask the member to tell you what were the frequencies, amplitudes, and phases of each of the two sinusoids, based on these three PCM values.

SIY · Jun 15, 2019

Sergei said:
How do you understand it then?

By understanding the Shannon-Nyquist theorem. If you want to run a pointless experiment, feel free, but you haven't shown any sign of wanting to either actually experiment or actually understand the very basics of PCM. Don't give me homework to labor over whatever piece of lint flies off your keyboard.

This assumes that all of this is sincere, but I'll be pretty clear that I don't believe that's the case.

Blumlein 88 · Jun 16, 2019

Sergei said:
The reference [9] (http://www.aes.org/tmpFiles/elib/20190615/6716.pdf) also only uses LSB units. Given the time of its publication (1992), I believe the context it 16 bits rather than 24 bits. But once again, they only seem to care about the precision expressed in LSB.

And yes, I do care about the 1LSB error. Otherwise, why keep that LSB in the system in the first place? Elsewhere in that JAES issue, there is an anecdote of audio engineers able to hear quantization errors in the 24th bit.

Most of your link comes from a doctoral thesis. Here it is in an easier to read form.
http://www.robertwannamaker.com/writings/rw_phd.pdf

Aprude51 · Jun 16, 2019

Sergei - I’ve been reading through this saga with a bit of interest, and although I don’t fully understand the signal processing theories that underpin the discussion, I do see some fundamental issues with your approach. I believe you’re so enamored with the possibility of advancing our understanding that you’ve set aside the scientific method, which ironically, is the only means of actually advancing our understanding.

I don’t doubt your intelligence or training but the way you present your ideas prevents others from effectively challenging them. SIY has pointed out that your assertions are not theories, and this is an important reason why. If you are really interested in progressing our understanding, why not start a new thread where you clearly lay out your ideas so that they can actually be tested?

Sergei · Jun 16, 2019

SIY said:
By understanding the Shannon-Nyquist theorem. If you want to run a pointless experiment, feel free, but you haven't shown any sign of wanting to either actually experiment or actually understand the very basics of PCM. Don't give me homework to labor over whatever piece of lint flies off your keyboard.

This assumes that all of this is sincere, but I'll be pretty clear that I don't believe that's the case.

This was sincere. And very concrete. The exercise shows that the Whittaker–Nyquist–Kotelnikov–Shannon theorem doesn't literally apply to PCM encoding, because the theorem operates on infinitely precise values, defined at discrete time points.

Infinitely precise value is a mathematical abstraction. It doesn't really exist. If it did, we could encode a description of the entire universe in just one number between 0.0 and 1.0, using digits from 0 to 9, placed in a sequence after the decimal point. PCM, unless it has infinite bit depth, can't encode a value with infinite precision.

With PCM, you have to take into account the precision of the encoding of the dimensions: such as frequency, amplitude, and phase. Assuming that the three PCM samples you encoded contain 24 x 3 = 72 bits, that's the core amount of information the decoder would have to work with. Additional information provided to the decoder is that there are two (two bits worth if information) band-limited sinusoids (sets the context for the Math operational in this case).

It is those 72 + 2 = 74 bits of information, plus the context of applicable decoding algorithms, that the decoder has to work with. Using DSP tricks, the decoder can trade the precision of determining amplitudes for the precision of determining the frequencies, trade both or one by one for the precision of determining the phases, yet it can't get something for nothing: the overall amount of information will not increase.

You can apply a straight discrete Fourier transform to the three samples, which will keep the amplitude precision, yet you'll end up with just three frequency bins. If your original sinusoids were, say of 100 Hz and 10,000 Hz, we could deduce from the transform graph, with decent precision, the energies that would go to tweeter and woofer of a three-way speaker.

Let's say we are not that interested in the energies, and want to know instead whether the signal contains musically consonant or musically dissonant sinusoids. In this case, we would apply a discrete Fourier transform after padding the samples, with, say, 4,093 zero-valued samples. That would allow us to resolve the frequencies much better, yet the precision of amplitude's determination would go down.

The human hearing system doesn't operate exactly like the Fourier transform, yet it is not excepted from the fundamental laws of Information Theory. Based on just those three samples, a human would also have trouble determining what that signal represents. Which brings us to the evolutionary value of the transients detection, which employs specialized neural structures, such as the Octopus Cells.

Even just one sample with sufficient amplitude, presented in the absence of any other audio information, could quickly tell the hearing system, and ultimately the decision-making part of the brain, that something is happening. This one bit of timely information could be just enough for an animal to stop doing whatever it was doing, and start listening, looking, and sniffing more attentively, or simply quickly leave the scene without further ado.

If you want to increase the amount of information transferred via a sequence of samples of fixed bit depth, you have to supply more samples. This was my original point. The paper by Stuart and Craven makes it more concrete, by showing how many more samples, on average over a set of quasi-realistic music signals, you'd need to provide in order to achieve a desired amplitude precision.

The required number of samples, for a target amplitude precision, depends on the particulars of the PCM encoding (such as dithering algorithm). It also depends on the sampling rate. Higher sampling rate, even if the number of samples needed to achieve the target precision was exactly the same, results in a shorter required physical time.

SIY · Jun 16, 2019

In that mass of misunderstanding and irrelevance, you left out one thing:

March Audio · Jun 16, 2019

SIY said:
The WOT perfectly illustrates the issue- a very fundamental lack of understanding of the two basic theorems involved. Here, I'll pull out an example for you:

I have turned the individual off ignore just for a moment to see where this thread is going. Absolutely nowhere. There is just waffling all over the place, no coherence and no understanding of the theorems as you say above.

back on ignore again, I might visit the thread again in a few weeks to see if any progress is made.

solderdude · Jun 16, 2019

Sergei said:
The reference [9] (http://www.aes.org/tmpFiles/elib/20190615/6716.pdf) also only uses LSB units. Given the time of its publication (1992), I believe the context it 16 bits rather than 24 bits. But once again, they only seem to care about the precision expressed in LSB.

And yes, I do care about the 1LSB error. Otherwise, why keep that LSB in the system in the first place? Elsewhere in that JAES issue, there is an anecdote of audio engineers able to hear quantization errors in the 24th bit.

They mention 96kHz ... 96/16 as a fomat ? Is it perhaps more likely that they are talking about 24 bits ?
Isn't that buried in noise in any practical situation ?

I can find plenty of anecdotes online and offline made by 'notable individuals' as well as noobs of people hearing things... well claiming to hear things.

solderdude · Jun 16, 2019

Sergei said:
How do you understand it then? Here's a simple test for you. Mix two sinusoids of different frequencies, amplitudes, and phases; quantize them into a PCM file, give any three consecutive samples of that file to any of the forum members. Ask the member to tell you what were the frequencies, amplitudes, and phases of each of the two sinusoids, based on these three PCM values.

I'll (well Amir actually) raises you a few sinusoidial frequencies and you can tell me what frequencies are present in what amplitude.

Don't be fooled by the 'Almost perfect' remark... this remark is about the noise floor not about frequencies and amplitude

Sergei · Jun 16, 2019

Aprude51 said:
Sergei - I’ve been reading through this saga with a bit of interest, and although I don’t fully understand the signal processing theories that underpin the discussion, I do see some fundamental issues with your approach. I believe you’re so enamored with the possibility of advancing our understanding that you’ve set aside the scientific method, which ironically, is the only means of actually advancing our understanding.

Actually, my primary objective is advancing my own understanding

Expressing the ideas makes them subject to peer review. On this very thread, I was proven wrong in some parts of my understanding. I had to deepen my understanding of other parts, to keep the discussion more engaging for members with wealth of knowledge and experience on the subject. And I had to try out different levels of abstraction, and different angles of view, to relay knowledge to members who are less well versed in this area.

Aprude51 said:
I don’t doubt your intelligence or training but the way you present your ideas prevents others from effectively challenging them.

Could you please elaborate. For instance, it would help me if you could rephrase one of the sentences or paragraphs I've written, in a way that would be easier for you to challenge (or agree with).

Aprude51 said:
SIY has pointed out that your assertions are not theories, and this is an important reason why.

I don't understand this statement. Could you please explain this to me in your own words.

Aprude51 said:
If you are really interested in progressing our understanding, why not start a new thread where you clearly lay out your ideas so that they can actually be tested?

Thought about that. Also, this was Amir's suggestion. Are we finished with the MQA discussion then? Or there are still many members believing that Stuart and Craven are quacks?

solderdude · Jun 16, 2019

I don't think they are quacks ... shrude business men... perhaps.

Some of the evidence points towards the launch of MQA to being money driven.
I actually quite like the novel idea of folding ultrasonics and asking money to 'correct it for you'

Since I know the LSB is so important to you... how do you feel knowing the last few LSB's have been removed by MQA designers and are replaced by 'folded lossy HF energy' in a 'noise alike' guise ?

MQA creator Bob Stuart answers questions.

Grand Contributor

Senior Member

Senior Member

Grand Contributor

Senior Member

Grand Contributor

Grand Contributor

Grand Contributor

Senior Member

Senior Member

Grand Contributor

Grand Contributor

Member

Senior Member

Grand Contributor

Master Contributor

Grand Contributor

Grand Contributor

Senior Member

Grand Contributor

Similar threads