
MQA creator Bob Stuart answers questions.

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,407
Likes
4,004
Location
Pacific Northwest
Doesn't matter what version of FLAC you use. White noise is not compressible.
That's true. The only explanation I can think of is that when you generate white noise from Room EQ Wizard, it's in mono (the same signal in each channel), so FLAC is getting a 2:1 reduction.
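A quick way to sanity-check that explanation (my own sketch, assuming the soundfile/libsndfile package is available): write the same noise to both channels versus independent noise per channel and compare the FLAC file sizes. Because FLAC's stereo decorrelation leaves a near-empty side channel when L and R are identical, the dual-mono file should come out at roughly half the size, while genuinely independent white noise stays close to the raw PCM size.

```python
import os
import numpy as np
import soundfile as sf  # assumes the soundfile package (libsndfile) is installed

fs, seconds = 48_000, 10
rng = np.random.default_rng(0)
noise = rng.uniform(-0.9, 0.9, fs * seconds).astype(np.float32)
other = rng.uniform(-0.9, 0.9, fs * seconds).astype(np.float32)

# "Dual mono" (same noise in both channels) vs. independent noise per channel.
for name, right in [("dual_mono.flac", noise), ("true_stereo.flac", other)]:
    sf.write(name, np.column_stack([noise, right]), fs, subtype="PCM_16")
    print(f"{name}: {os.path.getsize(name) / 1e6:.2f} MB")
```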
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,359
Likes
24,661
Location
Alfred, NY
It may not be inaudible though.

Given the magnitude compared to the real-world implementations that currently exist, it would be... surprising. About as surprising as silver vs copper in wires, also claimed "yet to be proven/disproven," and also based on actual numbers (conductivity of silver vs copper).
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
That paper is in the domain of theory. It's looking at the maths. That's not a valid criticism. That would be the next stage of inquiry.
It is valid criticism. It doesn't even have a results section.
You are telling me they did all that math, but never bothered to check what it actually does?
It's not like they need special equipment to perform a convolution on an input signal.
Take a look at other papers: not only do they show their own results, they also compare their method to other state-of-the-art methods, not just "traditional" ones.
For this to be taken seriously they need to show how well it performs in comparison to current state-of-the-art ADC/DAC methods, which include dithering, noise shaping, oversampling, etc., not just a truncated sinc.

Maybe their filter is great, but that paper doesn't show it.
Theories are a dime a dozen.
 
  • Like
Reactions: SIY

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
It is valid criticism. It doesn't even have a results section.
You are telling me they did all that math, but never bothered to check what it actually does?
It's not like they need special equipment to perform a convolution on an input signal.
Take a look at other papers: not only do they show their own results, they also compare their method to other state-of-the-art methods, not just "traditional" ones.
For this to be taken seriously they need to show how well it performs in comparison to current state-of-the-art ADC/DAC methods, which include dithering, noise shaping, etc., not just a truncated sinc.

Maybe their filter is great, but that paper doesn't show it.
Theories are a dime a dozen.

I'm getting a bit frustrated with the propensity in this thread for hand-waving away anything that might challenge existing orthodoxy. That paper was published in the only peer-reviewed journal devoted exclusively to audio technology. It would not have passed peer review if it were so obviously flawed that it could be dismissed this easily.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,359
Likes
24,661
Location
Alfred, NY
There is a difference between "flawed" and "not terribly relevant to the real world." Trust me, I have published several papers in peer reviewed journals that fit the latter description. Theoretically correct but... who cares?
 

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
There is a difference between "flawed" and "not terribly relevant to the real world." Trust me, I have published several papers in peer reviewed journals that fit the latter description. Theoretically correct but... who cares?

Perhaps Amir should stop reviewing DACs then? Certainly no need for the master SINAD list either: a DAC is either good enough to be transparent or it isn't. I don't see that happening.

And the fact is, we aren't in a place where we can conclusively say that this is irrelevant, when we have some evidence of the audibility of improvements that can occur for some listeners beyond standard CD quality. It is premature to dismiss things at this stage.
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
Strawman?
I was talking about a specific paper.
What are you talking about?

Edit: I am reading academic papers from the viewpoint of an engineer; I don't care much about pure theory.
Edit2:
Another reason I am skeptical about this is that they are just proposing a regular FIR filter with different coefficients.
From my experience in image processing, to get a significant improvement over the ye-olde traditional bicubic filter, you have to do non-linear things.
For example, switching filters based on content, bilateral filtering, or the newest hotness: neural networks (they do upscaling and content-aware processing quite nicely).
 
Last edited:

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
This is what we have:

a) Preliminary evidence about the audibility of improvements in audio beyond 16-bit/44.1 kHz.
b) Hypotheses regarding why filter artifacts could be responsible for the outcomes at a) above.
c) Pure theory about an alternative approach to standard brick-wall filtering that could address the hypothesised issue.
d) Theory regarding the non-LTI integration of the human ear supporting a hypothesis for why filter artifacts might be audible.
e) A paper from MQA which says the difference is audible, but which has been criticised for its choice of 16-bit/44.1 kHz filter and dither method, which means we can't say it's reliable, but neither can we dismiss the possibility entirely.

That leads me to:

It is plausible that either an increase in sampling and bit rates, or a change in the encoding/decoding scheme, or a combination thereof, is necessary for complete transparency in all circumstances.

What that all adds up to is: Further, more comprehensive research is needed.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,359
Likes
24,661
Location
Alfred, NY
What that all adds up to is: Further, more comprehensive research is needed.

That was my key phrase at the end of every paper meaning, "I need more grant money."

Also, none of the things you describe as "theories" have reached anything near that status. They are, to be generous, "hypotheses" at best.

Langmuir is smiling at this.
 

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
That was my key phrase at the end of every paper meaning, "I need more grant money."

Also, none of the things you describe as "theories" have reached anything near that status. They are, to be generous, "hypotheses" at best.

Langmuir is smiling at this.

If 16-bit/44.1 kHz is close to but not quite transparent in all circumstances, then you would expect the effect to be small and difficult to measure. That doesn't show that it's pathological. I personally have no vested interest in this whatsoever. I'm just interested in the discussion on purely intellectual grounds. That's why I'm prepared to alter my position on reflection and have criticised certain statements regardless of the ideological position of the poster.

At the end of the day though I don't think there's any disagreement that there are much bigger fish to fry when it comes to improving audio.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,407
Likes
4,004
Location
Pacific Northwest
I can, but I've already looked at them in Audacity and confirmed that the L and R channels are identical. That explains the 50% compression in FLAC.
[Attached: Audacity screenshot showing identical L and R channels]

Getting back to the question of theoretical ideal compression: every music file will have a different amount of entropy, so the theoretical max/ideal lossless compression will vary. We'd have to measure the entropy of a few sample music files of different types & instrumentation to get an idea how much it varies.
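As a rough illustration of that kind of measurement, here is a minimal sketch (my own, not a rigorous entropy estimate): it computes the zeroth-order Shannon entropy of a file's 16-bit samples and of their first differences. The file name "example.wav" is a placeholder, and a real codec with higher-order linear prediction will usually beat the crude first-difference figure.

```python
import numpy as np
import soundfile as sf  # assumes the soundfile package; "example.wav" is a stand-in file name

def entropy_bits_per_sample(x):
    """Zeroth-order Shannon entropy of an integer sequence, in bits per sample."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

data, fs = sf.read("example.wav", dtype="int16")
mono = data[:, 0] if data.ndim > 1 else data

print(f"raw samples:       {entropy_bits_per_sample(mono):5.2f} bits/sample (of 16)")
print(f"first differences: {entropy_bits_per_sample(np.diff(mono.astype(np.int64))):5.2f} bits/sample")
```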
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,451
Likes
36,880
One thing I will say about this thread is that it's helped me to think through the concepts.

Let's put aside audibility for a moment. Why would we do that? For one, there seems to be much more ready acceptance among members of this site that the pursuit of engineering excellence is worthwhile, even for its own sake. Look at the fervour for ever-better SINAD on DACs.

Also let's put aside our reservations about DRM and commercial motivation.

Would we not therefore give some credence to a system which, if proven, is measurably closer to perfect analog reconstruction?

From a pure theory perspective, there is an issue with Shannon/Nyquist sampling in that the sin(x)/x function requires infinite extent in order to perfectly reconstruct the analog waveform. In practice this means truncating the time extent of the filter's impulse response. I'm quoting here from "Modern Sampling: A Tutorial" by Jamie Angus (which was linked by a member earlier), which I've attached again along with the Stuart paper for easy reference.

1. The filter no longer has an infinite rate of cut-off and thus needs a guard band between the upper frequency of the continuous signal and the lowest frequency of the first alias.
2. More subtly, because the impulse response is now finite in extent, it is impossible to realize a stop-band response of zero (-∞ dB of attenuation) over the whole frequency range of the stop band.
3. In fact, unless the stop band achieves infinite attenuation at infinite frequency, there will always be some alias components present even if the sampling frequency is increased to infinity.
4. The truncated sin(x)/x functions no longer add up to a constant value when the sampled continuous signal is constant. This means that there is some difference between the reconstructed signal and the original signal.
5. The truncated sin(x)/x functions are no longer orthogonal for a time shift equal to multiples of a sample period. This means that the samples are no longer projected properly onto a sampleable space and therefore the samples will have leakage into each other (alias distortion), even if the continuous signal was white noise.
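A small numerical sketch of point 4 (not from the Angus paper, just an illustration): the infinite sum of shifted sinc kernels reconstructs a constant input exactly, but once the kernel is truncated the sum deviates from a constant, and the deviation shrinks only slowly as more taps are kept.

```python
import numpy as np

# Ideal reconstruction of a constant (DC) input: the sum over all n of sinc(t - n)
# equals 1 for every t.  With a truncated kernel the sum deviates from 1 between
# the sample instants, and the error falls off only slowly with the tap count.
t = np.linspace(0.05, 0.95, 19)            # points between two sample instants

for taps in (8, 32, 128, 1024):
    n = np.arange(-taps, taps + 1)          # kernel truncated to +/- `taps` samples
    recon = np.array([np.sum(np.sinc(ti - n)) for ti in t])
    print(f"+/-{taps:5d} taps: max deviation from 1 = {np.max(np.abs(recon - 1)):.1e}")
```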

One possible way to overcome this issue is to use a bi-orthogonal approach based on β-splines, which can allow for an exponential fall-off in the impulse response rather than a linear one. By bi-orthogonal the authors mean that the effects of the ADC filter are reversed at output by the DAC filter.

This seems to be part of what MQA attempt to do. Now, it seems to me, putting aside the considerations relating to DRM, commercial motive, and audibility, that there is nothing inherently wrong with seeking to pursue a greater degree of engineering excellence in sampling. Especially if we have the means to do so.
You forgot the part where they say that for B-splines to work this magic, due to the gentler filtering, you need about 4 times the sample rate. So B-splines at 192 kHz might provide some advantages over 48 kHz PCM with brickwall filtering. There was a supposition that, since music and hearing provide some filtering of their own, you might get away with only doubling the sample rate; for this discussion let's keep it to 4x, I think. So the obvious question becomes: do the B-splines offer any advantage if we compare against 192 kHz PCM? Going by some of Bob Stuart's own writings, even he would say 352 or 384 kHz PCM is about as good as the lower-sample-rate (96 kHz) B-spline method. And he may be overselling the point, as it may need 192 kHz with B-splines for that to actually be true.

If I understood some of Unser's writing about B-splines (and I don't claim to know it well), B-splines would allow some ability to up- or down-sample with better performance for various purposes. Most of his writing is about pictures or visual information. Perhaps there you really are varying perceivable resolution, since current methods don't exceed human visual perception. If 48 kHz totally and transparently encompasses human hearing, then any potential advantage may amount to nothing, unless you wanted to transmit at very low bit rates with more limited frequency ranges.
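To get a feel for the oversampling requirement, here is a minimal sketch using ordinary interpolating cubic splines as a stand-in for the B-spline reconstruction discussed in the papers (it is not MQA's actual processing, and the 10 kHz tone and the two rates are just illustrative): the residual error of the gentle spline kernel drops sharply once the signal is oversampled.

```python
import numpy as np
from scipy.interpolate import CubicSpline  # assumes SciPy is available

f = 10_000.0                                   # 10 kHz test tone
t_eval = np.linspace(0.2e-3, 0.8e-3, 5_000)    # dense "analog" grid, away from the edges
ref = np.sin(2 * np.pi * f * t_eval)

for fs in (48_000.0, 192_000.0):               # base rate vs. 4x oversampled
    t_n = np.arange(0.0, 1e-3, 1.0 / fs)       # sample instants covering the window
    spline = CubicSpline(t_n, np.sin(2 * np.pi * f * t_n))
    err = np.max(np.abs(spline(t_eval) - ref))
    print(f"fs = {fs/1e3:g} kHz: max reconstruction error = {err:.1e}")
```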
 

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
You forgot the part where they say that for B-splines to work this magic, due to the gentler filtering, you need about 4 times the sample rate. So B-splines at 192 kHz might provide some advantages over 48 kHz PCM with brickwall filtering. There was a supposition that, since music and hearing provide some filtering of their own, you might get away with only doubling the sample rate; for this discussion let's keep it to 4x, I think. So the obvious question becomes: do the B-splines offer any advantage if we compare against 192 kHz PCM? Going by some of Bob Stuart's own writings, even he would say 352 or 384 kHz PCM is about as good as the lower-sample-rate (96 kHz) B-spline method. And he may be overselling the point, as it may need 192 kHz with B-splines for that to actually be true.

If I understood some of Unser's writing about B-splines (and I don't claim to know it well), B-splines would allow some ability to up- or down-sample with better performance for various purposes. Most of his writing is about pictures or visual information. Perhaps there you really are varying perceivable resolution, since current methods don't exceed human visual perception. If 48 kHz totally and transparently encompasses human hearing, then any potential advantage may amount to nothing, unless you wanted to transmit at very low bit rates with more limited frequency ranges.

"Unser [18] presents graphs that suggest that 3 to 4 times oversampling might be needed, depending on the order of the spline and the level of residual error required."

I'm not really sure how you think that was a relevant omission? Oversampling is a standard strategy employed in most DACs already anyway. The whole point of a strategy like this would be to obtain the benefits of higher precision than conventional means provide at the same sampling rate.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Could you point me to the part of the paper which talks about a fast integrator working over a time frame of microseconds?
The lowest integration time I saw in the graphs there was ~1ms.

Figure 8 on page 7412. Illustrates neural transmission delays. Note: "As a consequence, the differences in first-spike trigger times between fibers of different thresholds in response to a given stimulus are enhanced, rather than reduced or cancelled, as the spikes travel toward the cochlear nucleus." The delays in the 1ms-5ms range encode the level of the stimulus. The actual integration is finished before the impulse starts traveling.

There is no magic though. There are no micro-mechanisms in the cochlea that could directly follow the amplitude shape of, let's say, a 1 μs pulse. The mechanism, as I mentioned before, involves secondary physical processes. Papers describing how the mechanical momentum of a short pulse is transformed and integrated:

https://asa.scitation.org/doi/pdf/10.1121/1.421377?class=pdf
Early understanding. Clicks of 50 μs. Yet already noted that the basilar membrane "rings", rather than follows the shape of the click. So, the integration happens via a regular IHC mechanism, but at multiple frequencies in parallel, excited by the mechanical momentum of the click.

https://www.nature.com/articles/srep05941
Contemporary study. Shorter and more precise laser excitation of mechanical transient in basilar membrane, corresponding to normal listening levels. Qualitatively, same deal: the membrane "rings" as it pleases, rather than follows the shape of the quick pulse, or Fourier-transforms the shape. Also noted that this ringing phenomenon only happens in live cochlea. In a dead animal, a quick transient moves the membrane initially, but then the motion is quickly absorbed, there is no consequent ringing.

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005015#pcbi.1005015.ref001
Even more contemporary. A mathematical model of cochlea explains the observed ringing (technically called "coda"). Also, predicts that the particulars of the ringing strongly depend on the irregularities in a particular individual's cochlea: "Depending on the individual cochlea, the temporal gap between the primary impulse and the following coda ranges from once to thrice the group delay of the primary impulse (the group delay of the primary impulse is on the order of a few hundred microseconds). The coda is physiologically vulnerable, disappearing when the cochlea is compromised even slightly."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5449481/
Published two years ago. Explains what the brain does with closely timed auditory nerve fiber spikes, including those produced by the IHCs responsible for ringing at multiple frequencies after a pulse's mechanical momentum enters the cochlea. Yeah, this is about the infamous Octopus Cells.
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
page 7412.
The paper you linked in the post I replied to starts on page 6151 and ends on page 6156.
As far as I can tell there are no mentions of microsecond-long integration times in that paper.
And even in the part you quoted now, what does a 1-5 ms (ms as in milli, not micro) transmission delay have to do with the microsecond integration you claimed before?

So now you are saying that paper was irrelevant and linking to yet another paper that studies something else?
Am I supposed to play an endless game of cat and mouse with you where you keep switching papers?
Am I supposed to now ignore your previous claim about microsecond integration and chase the new set of claims?

The papers you are quoting now seem to say that when short pulses are heard, there is a delay and ringing.
Which is exactly what I showed you happens, in a graph, when you capture a very short pulse at a low sampling rate.
It doesn't disappear.

Here it is again in case you forgot:
[Attached graph: a very short pulse and its 48 kHz capture, which rings rather than following the pulse's outline]

As you can see, the short pulse is captured at 48 kHz, there is ringing in the 48 kHz signal, and it doesn't follow the pulse's outline.
Just like your new set of papers says.
In reality there will also be a delay, since the response to the pulse can't start before the pulse itself has started.
So it will be captured, it will be reproduced, and according to the papers you quoted now, the 48 kHz signal looks like what the ear was going to hear anyway.

If anything, the papers you quoted seem to suggest there is no point in capturing a perfect pulse shape, since the ear is going to hear a distorted version of it, which is in the millisecond rather than microsecond range.
The lowest number you have quoted so far (edit: in regard to the ear's response, rather than the stimulus) was "a few hundred microseconds", which would be in the ~3 kHz range, which 48 kHz sampling can capture perfectly.
And I will also remind you that the time resolution of 16/44 is in the nanosecond range.
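To illustrate that last point (a sketch of my own, not from any of the quoted papers): shift a band-limited impulse by 100 ns, a tiny fraction of the ~22.7 μs sample period at 44.1 kHz, and the sample values still change by far more than one 16-bit LSB, so the shift survives a 16/44 capture.

```python
import numpy as np

fs = 44_100.0
tau = 100e-9                          # 100 ns shift, roughly 1/227 of a sample period
n = np.arange(-64, 65)                # sample indices around the pulse

pulse       = np.sinc(n)              # band-limited impulse centred on a sample instant
pulse_later = np.sinc(n - fs * tau)   # the same impulse delayed by 100 ns

print(f"largest change in any sample value: {np.max(np.abs(pulse_later - pulse)):.2e}")
print(f"one 16-bit LSB:                     {1 / 2**15:.2e}")
```

The nearest samples move by roughly two orders of magnitude more than one LSB, which is the sense in which the timing resolution of 16/44 is far finer than one sample period.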

It seems you think that quoting random scientific papers at people is going to convince them of unrelated claims.
I am done with this, it's pointless.
 
Last edited:

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,359
Likes
24,661
Location
Alfred, NY
It seems you think that quoting random scientific papers at people is going to convince them of unrelated claims.
I am done with this, its pointless.

"A fanatic is one who redoubles his efforts when he has forgotten his aim." -George Santayana.
 

LuckyLuke575

Senior Member
Forum Donor
Joined
May 19, 2019
Messages
357
Likes
315
Location
Germany

Anyone here a statistician capable of doing some probability work on whether he truly believes what he's saying, or whether it's just lip service for his product offering?
Being a Tidal Hifi subscriber, I like the concept and high res music that stems from it. But I couldn't really conclude anything from what the guy was saying.

I'd like to think that there's some set of standards and quality control that allows music to be "MQA", like the Tidal Masters that are then streamed at 96 kHz / 24-bit, or am I missing the point here?
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
This is the closest thing I found to suggesting that ultrasonics are perceived by humans:
https://en.wikipedia.org/wiki/Hypersonic_effect
The general idea is that while we cannot consciously hear ultrasonics, we might be subconsciously aware of them (not necessarily via hearing).
But it is overall inconclusive since some studies have contradicted the initial results.
 

LTig

Master Contributor
Forum Donor
Joined
Feb 27, 2019
Messages
5,759
Likes
9,433
Location
Europe
Depends on the amplitude. The 25 kHz wave may still be subsonic, whereas the 1 MHz one could require a transducer cone to move faster than the speed of sound, generating a shockwave.

A regular sound wave may go up to about 194 dB SPL: very loud and ear-damaging, but that's it. A strong enough shockwave topples buildings. Or the main character in "Back to the Future" :)
I really do hope that I'm never exposed to an audio playback chain that is able to kill my hearing, or even me, on the spot if something goes wrong, like the volume being accidentally set to 100%.
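As a back-of-envelope check of the quoted claim (round numbers, my own arithmetic, not from the thread): for a sinusoidal cone displacement x(t) = A·sin(2πft) the peak velocity is 2πfA, so the displacement amplitude at which the cone motion reaches the speed of sound is A = c/(2πf).

```python
import numpy as np

c = 343.0                              # speed of sound in air, m/s (round number)

# For x(t) = A*sin(2*pi*f*t) the peak velocity is 2*pi*f*A, so the cone motion
# reaches the speed of sound once the displacement amplitude exceeds c / (2*pi*f).
for f in (25e3, 1e6):
    a_limit = c / (2 * np.pi * f)
    print(f"{f/1e3:>6.0f} kHz: peak velocity reaches {c:.0f} m/s at A = {a_limit*1e3:.3f} mm")
```

At 25 kHz that takes millimetres of excursion, whereas at 1 MHz a few tens of microns would already do it, which is why the quoted post treats the two cases differently.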
 