
MQA creator Bob Stuart answers questions.

pozz

Слава Україні (Glory to Ukraine)
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
...then it was already accomplished for decades.
Is there really no more work to be done? No need to examine the foundations one more time?
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,514
Likes
25,357
Location
Alfred, NY
Is there really no more work to be done? No need to examine the foundations one more time?

In that respect, nope, it's a solved problem. People with commercial interests are trying hard to sow FUD, but doing audibly transparent digital channels is now routine. And further work will be on things like DRM, cost reductions, compression, and the like.

I remember when I took classical thermodynamics, one of the first things the prof said was, "This is a closed field, there's nothing further in the fundamentals to be done." I was astonished, but by the time I was teaching classical thermo, I realized he was right.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
I can't hear digits.

What kind of tweeter do we need?

I believe you may have it already.

Drawing an analogy with a car: it is not only about maximum speed (the maximum frequency a tweeter can reproduce). As long as it is capable of driving at the maximum speed allowed across the territory where you are going to use the car (plus some fun margin), you should be fine. So, a range up to 20 kHz, or moderately above, should be fine.

It is more about sufficient acceleration (does it start reproducing a reasonably accurate 20 kHz waveform from the very first half-period?). Electrostatics tend to be able to do that. A tweeter formally rated up to 40 kHz might not. If the tweeter uses a moving coil, the ones with a strong magnet, a rigid yet light non-resonating dome, and proper damping tend to fit the bill.
 
Last edited:

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
How and why did you associate this experiment with high resolution audio? If presented with this I wouldn't have made the connection!

You're saying that if a very short pulse of energy is enough to destroy some mechanical sensor hardware then that proves that the sensor and the processor it is attached to would have registered the pulse at lower levels?

A valid question indeed. To me, an important part was the aftermath of the outer hair cell (OHC) destruction: they were left on their sides, tilted inward. Knowing how the inner hair cells (IHC) work, it is clear to me that the type of force that destroyed the OHC would open the IHC ion channels, resulting in sound sensation.

The basilar membrane, OHC, and IHC are all parts of a micro-mechanical system with reasonably well-understood properties. Scale the amplitude of the incoming wave up or down and look at the equations: there is still a transfer of energy, and it still has to go to the OHC and IHC responsible for high frequencies.

Theoretically, it could be that the energy used in this experiment was too high and the equations would no longer hold. In that case, the energy could go to places other than the OHC and IHC responsible for high frequencies. But according to the experiment it doesn't go elsewhere: the OHC and IHC responsible for lower frequencies weren't damaged.

Indeed, a single pulse might not be heard, even if its amplitude is high enough to open the IHC ion channels. Enough ions need to be transferred to reach the threshold. So, for soft pulses we'd need several in a row before we hear anything.

As to the duration of the shortest single pulse that can still be heard in the painless range of human hearing, the lowest I've seen mentioned in research papers is 10 microseconds (https://asa.scitation.org/doi/abs/10.1121/1.1912374).
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Re. the destructive pulse experiment: it seems like a fair point, sure enough. But do we know that the 'processor' isn't responding to multiple sensor outputs simultaneously? i.e. individual sensors may be stimulated (in the 'skirts' of their resonances - if that's how it works).

Is the brain guaranteed to register this as a short impulse at all, and not something different? Just as a narrow spectral spike of yellow is seen by the eye as indistinguishable from simultaneous spikes at red and green, is a short high amplitude impulse 'seen' by the ear as something else comprised of lower frequencies?

Maybe we should create a new sampling 'paradigm' that doesn't attempt to reproduce these high amplitude ultrasonic signals faithfully, but instead substitutes what the ear would need in order to reproduce their effects (just as a video recording doesn't reproduce a narrow spectral spike at yellow, but substitutes red and green). Or am I straying into the 'voodoo' area of audio?

High amplitude X Rays would affect vision at some point I imagine. To simulate it, we could add the blue haze, red spots and eventual fade to black to the video recording :).

You are on the right track. MP3 and other perceptual codecs using time-frequency transforms do not reproduce the signal faithfully, yet can be perceptually transparent under certain circumstances.

Transients are no different: for instance, we can use several pulses instead of one, or one instead of several, with proper amplitude scaling of course. For a given output signal level, you may even be able to replace some of the pulses with short sinusoids or bursts of band-limited noise.
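As a toy numerical sketch of that kind of substitution (the 8-16 kHz band, the 192 kHz rate, and the envelope are my own arbitrary choices, not anything from MQA or any real codec):

```python
import numpy as np

fs = 192_000  # sample rate in Hz; an assumption for illustration
n = 1024

# A single short pulse (one sample wide at 192 kHz, i.e. ~5 us).
pulse = np.zeros(n)
pulse[n // 2] = 1.0

# A short burst of band-limited noise (8-16 kHz), scaled to the same energy.
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(n, d=1 / fs)
spec[(freqs < 8_000) | (freqs > 16_000)] = 0.0       # keep only the band
burst = np.fft.irfft(spec, n)
burst *= np.hanning(n)                               # short envelope
burst *= np.sqrt(np.sum(pulse**2) / np.sum(burst**2))  # match total energy

print(np.allclose(np.sum(pulse**2), np.sum(burst**2)))  # True
```

Whether such a substitute is perceptually acceptable is of course exactly the hard part that a real perceptual model would have to decide.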

By the way, replacing the sounds of genuine percussion instruments with short bursts of band-limited noise is a standard pop-music trick. These translate better to low-spec gear. All of us have heard that pss-pss-pss used instead of a cymbal sound.

Yet calculating such an equivalent replacement for perceptually faithful reproduction of physical, transient-rich instruments may not be algorithmically easy, just as the MP3 codec is not a simple program. Perhaps MQA is trying to do something like that?

By the way, transients do not have to be ultrasonic: they are not even sinusoids, so the notion of a period doesn't apply to them. They only need to be captured and reproduced with what some people consider "ultrasonic" precision.

Once you think about the two frequency limits as two different things, the limit for accurate time-domain reproduction doesn't seem so ultrasonic anymore: it doesn't actually require reproduction of ultrasonic sinusoids.
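A small numerical sketch of that distinction: a band-limited impulse shifted by a fraction of a sample period still produces a distinct, recoverable sample sequence, with no ultrasonic sinusoid involved (the 48 kHz rate and 1 µs shift below are arbitrary assumptions of mine):

```python
import numpy as np

fs = 48_000                      # sample rate in Hz (assumed)
t = (np.arange(256) - 128) / fs  # sample instants centred around zero

def sampled_pulse(delay):
    """Samples of an ideally band-limited impulse (sinc) delayed by `delay` seconds."""
    return np.sinc(fs * (t - delay))

a = sampled_pulse(0.0)
b = sampled_pulse(1e-6)  # shifted by 1 us, about 1/21 of the 20.8 us sample period

# The two sample sequences are clearly distinct: the 1 us timing offset
# survives sampling even though it is much smaller than the sample period.
print(np.max(np.abs(a - b)))
```

This is the same point Lipshitz makes: timing information in a band-limited signal is not quantized to the sample grid.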
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
I see a risetime of about 50ns. This belongs to a frequency of about 5MHz... ?
I can see how that could be heard, transmitted through air, be recorded faithfully, be reproduced faithfully with an SPL of 120dB and is relevant to audio reproduction.
Even if the sharp rise were smoothed and one sine-wave half encompassed 0.5 µs, this would mean a frequency of 1 MHz (that is, 1000 kHz).

See no reason to torture any animals for this kind of research... it probably has a reason/function.

Fortunately, we are not rats. Creating a hi-fi system for an animal able to hear up to 100 kHz would be quite a challenge.

You are on the right track regarding the sine wave half. We want to be able to start it with a precision of about 5 microseconds, and see it rise to the amplitude we want in another 5 microseconds.

As to the poor animals, at least this rat lived, and partially recovered her hearing a month later. Some of them are not so lucky.

The paper referred to a previous experiment, done by a different group, who would put a rat under a metal dome and explode a "rat grenade" there. They reported poor reproducibility. I wonder what they meant ...
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,798
Likes
37,713
A valid question indeed. To me, an important part was the aftermath of the outer hair cell (OHC) destruction: they were left on their sides, tilted inward. Knowing how the inner hair cells (IHC) work, it is clear to me that the type of force that destroyed the OHC would open the IHC ion channels, resulting in sound sensation.

The basilar membrane, OHC, and IHC are all parts of a micro-mechanical system with reasonably well-understood properties. Scale the amplitude of the incoming wave up or down and look at the equations: there is still a transfer of energy, and it still has to go to the OHC and IHC responsible for high frequencies.

Theoretically, it could be that the energy used in this experiment was too high and the equations would no longer hold. In that case, the energy could go to places other than the OHC and IHC responsible for high frequencies. But according to the experiment it doesn't go elsewhere: the OHC and IHC responsible for lower frequencies weren't damaged.

Indeed, a single pulse might not be heard, even if its amplitude is high enough to open the IHC ion channels. Enough ions need to be transferred to reach the threshold. So, for soft pulses we'd need several in a row before we hear anything.

As to the duration of the shortest single pulse that can still be heard in the painless range of human hearing, the lowest I've seen mentioned in research papers is 10 microseconds (https://asa.scitation.org/doi/abs/10.1121/1.1912374).
Paper isn't accessible to me. So what were the nature and frequency of the low pass filters in your citation?
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,723
Likes
241,657
Location
Seattle Area
You are on the right track. MP3 and other perceptual codecs using time-frequency transforms do not reproduce the signal faithfully, yet can be perceptually transparent under certain circumstances.
The reason they don't reproduce things faithfully is the lossy portion (quantization) of the codec, not the fact that they are transform-based. Since you are a fan of wiki, here it is: https://en.wikipedia.org/wiki/Transform_coding

"Transform coding is a type of data compression for "natural" data like audio signals or photographic images. The transformation is typically lossless (perfectly reversible) on its own but is used to enable better (more targeted) quantization, which then results in a lower quality copy of the original input (lossy compression). "​

We don't want to lose fidelity in the process of transformation before we get into the business of lossy compression.
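That split between a lossless transform and a lossy quantizer is easy to demonstrate numerically. The sketch below uses a plain FFT as a stand-in for a codec's transform (real codecs use an MDCT, and the quantizer step here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(512)  # a stand-in for one audio frame

# Forward/inverse transform alone is numerically lossless.
X = np.fft.rfft(x)
x_back = np.fft.irfft(X, len(x))
print(np.max(np.abs(x - x_back)))  # effectively zero: the transform loses nothing

# Fidelity is lost only when the coefficients are quantized.
step = 0.25                     # a coarse, made-up quantizer step
Xq = step * np.round(X / step)  # rounds real and imaginary parts
x_lossy = np.fft.irfft(Xq, len(x))
print(np.max(np.abs(x - x_lossy)))  # clearly non-zero: this is the lossy part
```

A real codec spends all its cleverness deciding *where* to put that quantization error so the ear doesn't notice it.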
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,723
Likes
241,657
Location
Seattle Area
Yet calculating such equivalent replacement for perceptually faithful reproduction of physical transient-rich instruments could be not so easy algorithmically, similarly to MP3 codec not being a simple program. Perhaps MQA is trying to do something like that?
The perceptual model of a lossy codec is dynamic and very sophisticated compared to MQA. MQA makes assumptions about audibility and then applies its algorithm. In some sense, a true perceptual codec would eliminate the need for MQA: we know that we can't hear above 20 kHz, so we truncate everything above that and we are done! No need to attempt to code ultrasonic content as MQA does!

It reminds me of a talk FhG (the research institute in Germany that brought us MP3) gave at an AES conference. They were proud of a lossy codec that went up to 96 kHz sampling. At the end of the talk a guy asked, "If you can't hear anything above 20 kHz, how on earth did you design the lossy algorithm for that portion?" He had no answer, of course.

Some time later we designed a lossy codec for high resolution just as well (WMA Pro). Our answer was simple: just encode some of what is above 20 kHz, but heavily quantize it.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Do you realize that 240 dB SPL translates into 200 atmospheres of pressure at the ear? Do you really think the delicate ear structures will not get destroyed by this pressure, regardless of how short or long the impulse might be? I'd be surprised if the mouse didn't suffer a concussion. At 120 dB, the pressure is only 0.0002 of an atmosphere, and we know that level of sound can be damaging to human hearing even with short exposure.

The article says that they calibrated the blast so that it didn't damage the middle ear structures, yet damaged the most vulnerable structures in cochlea. There wasn't just one rat involved :(

"Do you really think" is not applicable here. This is a peer-reviewed article in one of the two most important general-science publications in the world.

People work very hard for decades to publish a single article in Nature. I recall only one scientist in the last decade who was caught publishing fabricated data there. Soon thereafter, he was no longer working as, or considered to be, a scientist.
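For what it's worth, the pressure figures in the quoted question follow directly from the SPL definition (reference pressure 20 µPa); a quick check:

```python
P_REF = 20e-6    # SPL reference pressure, Pa
ATM = 101_325.0  # one standard atmosphere, Pa

def spl_to_pascal(spl_db):
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF * 10 ** (spl_db / 20)

print(spl_to_pascal(240) / ATM)  # ~197 atmospheres
print(spl_to_pascal(120) / ATM)  # ~0.0002 atmospheres
print(spl_to_pascal(194) / ATM)  # ~1 atmosphere, the usual "undistorted sound" ceiling
```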
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,798
Likes
37,713
snip...

As to the duration of shortest single pulse which can be still heard in the painless range of human hearing, the lowest I've seen mentioned in research papers is 10 microseconds (https://asa.scitation.org/doi/abs/10.1121/1.1912374).

So the cited paper shows that the frequency-spectrum differences of rectangular pulses make them audible when 10 microseconds apart. When low-pass filtered at 21,120 Hz, the differences were still audible at 10 microseconds apart. As the low-pass filter is moved lower, the gap has to become greater. No big surprise, and it doesn't seem to support this temporal idea in MQA. The conclusion of the paper:
In conclusion, the data collected in the present experiment demonstrate, that the auditory system is extremely sensitive to small spectral differences.
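That conclusion is easy to illustrate: two clicks Δt apart have a comb-shaped spectrum proportional to |cos(π·f·Δt)|, so even a ~10 µs gap leaves a measurable tilt below 20 kHz (the 192 kHz rate and ideal clicks below are my own choices, not the paper's stimuli):

```python
import numpy as np

fs = 192_000             # sampling rate in Hz (assumed)
n = 1 << 14
gap = round(10e-6 * fs)  # ~10 us expressed in samples (2 samples at 192 kHz)

single = np.zeros(n)
single[100] = 1.0
pair = np.zeros(n)
pair[100] = pair[100 + gap] = 0.5  # same total amplitude, split into two clicks

f = np.fft.rfftfreq(n, 1 / fs)
in_band = f <= 20_000    # restrict to the audible band
s1 = np.abs(np.fft.rfft(single))[in_band]
s2 = np.abs(np.fft.rfft(pair))[in_band]

# The pair's spectrum is shaded by |cos(pi * f * dt)|: near 20 kHz with
# dt ~ 10.4 us the ratio is already ~0.8, i.e. a ~2 dB in-band difference.
print(s2[-1] / s1[-1])
```

So the audibility of the 10 µs gap is fully explained by an in-band spectral difference; nothing ultrasonic needs to be heard.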
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
@Sergei is caught in a naive trap. Bob Stuart himself says so: https://www.stereophile.com/content/mqa-questions-and-answers-tutorial-temporal-errors-audio :

Let's just cover what we don't mean when we are talking about blurring in the time domain. Too often people immediately assume we have fallen into the naïve trap that imagines the time-base of a single digital channel to be quantised at the sample rate. To quote Stanley Lipshitz (from [3]):

"One often misunderstood aspect of sampled-data systems is the question of their time resolution—can they resolve details that occur between samples, such as a time impulse or step? ... time resolution is in fact infinitely fine for signals band-limited in conformity with the sampling theorem, and is completely independent of precisely where the samples happen to fall with respect to the time waveform . . ."

Stanley suggests the time-base resolution is 'infinite.' Of course that is only true if we are actually asking the question: 'What is the limit of resolution of relative phase of a continuous sinewave below Fs/2 in a uniformly-sampled channel employing TPDF dither?' Even then our ability to prove the precision, like all measurements, depends on signal/noise ratio.

If there is no dither involved and the samples are quantised, there is an approximate estimate of time-base resolution which is possibly relevant to brief impulsive signals (which by definition do not benefit from dither).

Limit = 1/(Fs × π × 2^(n−1)), where n is the number of bits.

For 44.1kHz 16-bit data this resolves to 220ps, not to 22.7µs (= 1/Fs).

Notice "a continuous sinewave" in the text above. Does the pulse that the poor rat encountered look like a continuous sinewave?

I believe this is related to what Bob Stuart refers to when saying that [sometimes] the brain needs to react now. If not sedated and taped to the lab table, do you think the rat, after the blast was delivered, would be standing there wondering what the pitch and phase of that sound were?
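Whatever one makes of the quoted estimate's applicability, its arithmetic checks out:

```python
import math

def timebase_limit(fs, bits):
    """Stuart's quoted estimate: Limit = 1 / (Fs * pi * 2**(bits - 1))."""
    return 1.0 / (fs * math.pi * 2 ** (bits - 1))

limit = timebase_limit(44_100, 16)
print(limit)       # ~2.2e-10 s, i.e. about 220 picoseconds
print(1 / 44_100)  # ~2.27e-5 s: the 22.7 us sample period, for contrast
```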
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,726
Likes
10,425
Location
North-East
The article says that they calibrated the blast so that it didn't damage the middle ear structures, yet damaged the most vulnerable structures in cochlea. There wasn't just one rat involved :(

"Do you really think" is not applicable here. This is a peer-reviewed article in one of the two most important general-science publications in the world.

People work very hard for decades to publish a single article in Nature. I recall only one scientist in the last decade who was caught publishing fabricated data there. Soon thereafter, he was no longer working as, or considered to be, a scientist.

The trouble is that you are interpreting a study designed to test a completely different hypothesis as proof of your own. It doesn't matter how much time the authors spent on the study or how well peer-reviewed it is; it does not prove your point.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
The reason they don't reproduce things faithfully is because of the lossy portion (quantization) of the codec, not because it is transform based. Since you are a fan of wiki, here it is: https://en.wikipedia.org/wiki/Transform_coding

"Transform coding is a type of data compression for "natural" data like audio signals or photographic images. The transformation is typically lossless (perfectly reversible) on its own but is used to enable better (more targeted) quantization, which then results in a lower quality copy of the original input (lossy compression). "​

We don't want to lose fidelity in the process of transformation before we get into the business of lossy compression.

I agree. That's a fact. They all can drop back to bit-perfect representation for specific frames.

Vorbis supports 192/24. So I have to concede: my statement that it "demonstrably can't capture transients" (of the order of 5 µs) was incorrect in regard to Vorbis. With the hybrid lossy/lossless frames, Vorbis is still technically a compressing codec, yet it can capture the transients.

Hmm, when I tried to find an MQA vs Vorbis comparison on a large set of music files (sound quality vs encoded size, say), nothing came up. Have you seen something like that?
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,084
Likes
36,527
Location
The Neitherlands
You are on the right track regarding the sine wave half. We want to be able to start it with a precision of about 5 microseconds, and see it rise to the amplitude we want in another 5 microseconds.

You talk about samples every 5 µs (192 kHz). That would not capture what the rat endured (0.0005us)!
What's the relevance, and why would you think it is audible? (You claimed this.) Simply because the outer hair cells were shot and thus must have produced 'a loud sound'?
A single half sine-wave of a 2.5 MHz signal?

What does it have to do with complex music signals?
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,723
Likes
241,657
Location
Seattle Area
Hmm, when I tried to find an MQA vs Vorbis comparison on large set of music files (like sound quality vs size encoded), nothing came up. Have you seen something like that?
It is impossible to find any MQA comparisons, Vorbis or otherwise.
 

edechamps

Addicted to Fun and Learning
Forum Donor
Joined
Nov 21, 2018
Messages
910
Likes
3,621
Location
London, United Kingdom
A more comprehensive meta-study (http://www.aes.org/e-lib/browse.cfm?elib=18296), combining 400 participants in over 12,500 trials, reveals that there is indeed a statistically significant difference, yet it is small, 5% at most.
I do not consider meta-analysis to be generally reliable.
I think one should always be very wary of meta-studies.

Oh, come on.

I am quite clearly in the "high res audio is pointless" camp, and I hate to play devil's advocate, but this is ridiculous. Meta-analysis is widely considered to be the best possible level of evidence in scientific research, and is routinely used to make important decisions in fields like medicine, where the consequences are literally life and death. This is not the kind of evidence that you can simply dismiss out of hand, especially on a website that is literally titled "Audio Science Review". Most people who try to defend high res audio (and other fringe audiophile topics) on this forum do so with extremely poor arguments (or even no arguments at all) and get laughed out of the room, and rightly so. This is not one of these cases. When you're presented with evidence like this, it should at least make you pause. Otherwise, you've left the field of logical, reasonable scientific enquiry and you've entered the realm of ideology. You've left the land of scepticism and you've entered the land of stubbornness.

If you want to criticize this study, then at least try to come up with specific concerns with the specific methods used in the study, instead of peddling "meta-studies meh" statements. That's exactly the kind of statement a science-denier subjectivist would make!

The study does show that people can, in fact, distinguish high res audio from standard resolution, and a number of the underlying studies report such a result even on real-world content (not just artificial test signals), which is fairly impressive. What the study also shows, however, is that people certainly can't reliably make this distinction. For untrained listeners, the lower bound of the confidence interval is less than a 50% success rate, which pretty much means untrained listeners can't tell the difference. Trained listeners fared better: a [57.5, 66.9] confidence interval with a mean of 62.2%. This still means that even trained listeners have a really hard time. It's a really subtle difference, and keep in mind the stimuli used were probably hand-picked for this task.
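As a sanity check on those numbers: the quoted interval is consistent with a plain normal-approximation binomial confidence interval, which (back-calculating from the interval width, my inference rather than a figure from the paper) implies roughly 400 trials for the trained group:

```python
import math

def normal_ci(p_hat, n, z=1.96):
    """95% normal-approximation confidence interval for a binomial proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lo, hi = normal_ci(0.622, 400)  # n = 400 is my back-calculated assumption
print(round(lo, 3), round(hi, 3))  # close to the quoted [57.5, 66.9]%
print(lo > 0.5)                    # True: the interval excludes pure guessing
```

So the "statistically significant but small" reading holds: the interval clears 50%, but not by much.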

This is actually not a bad result for the "high res audio is pointless" camp: the study shows that the effect is so small and so hard to detect that there's really not much point in spending time (or money) on it, especially considering that the differences we're talking about are dwarfed by differences between, say, loudspeakers, headphones or room acoustics. In fact, it's so subtle that I don't even feel bad when I keep saying "high res audio doesn't make a difference", because that's consistent with the results of the study in the vast majority of scenarios (just not absolutely all of them). I don't understand why people want to spend so much time and energy researching this topic to death, as opposed to, say, discussing how to improve loudspeakers. These are really messed-up priorities. It feels like the debate around high res audio is being boosted by economic interests: it's trivial to make and sell (at a premium) high res audio hardware and content, but it's much harder to make a better speaker.

(As a side note… while @Sergei did post the highly relevant study that I've just discussed, don't take that to mean I support most, or even some, of what he said in this thread. In my opinion, the evidence @Sergei presented so far is mostly irrelevant. My personal favourite is the study involving laser pulses(?) and brain response(??) in rats(???) for the purposes of assessing the effects of blast pressure(????) from the IEDs that soldiers in Iraq are subjected to(?????). Claiming that this is in any way relevant to the perception of high-res real-world audio content by humans is so far-fetched it's downright laughable.)

Technically, to be 'science' there has to be a hypothesis about why that should be, otherwise it can't be taken any further. If people really were able to tell the difference, what would be the mechanism by which they were doing it?

Again, I feel compelled to play devil's advocate here… just because we don't understand why we obtained a given result does not mean it's not "science". Quite the opposite, in fact: lots of scientific discoveries were made because someone noticed an odd result that we had no explanation for. (Probably one of the most famous is the anomalous precession of Mercury, among countless others.) In fact, I would even go so far as to say that unexplained results are what keeps science going. Confirming existing models is great, but it gets boring fast.

Science is less about "why" than about discovering facts from which we can make predictions. Besides, before you can try to determine what is causing a given outcome, you need to make sure the outcome you're investigating is real and not just a red herring. If we discarded results because we don't know "why" they happen, I'm pretty sure you could throw the entire field of quantum physics out the window, for example.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,514
Likes
25,357
Location
Alfred, NY
Meta-analysis is widely considered to be the best possible level of evidence in scientific research, and is routinely used to make important decisions in fields like medicine, where the consequences are literally life and death.

It's most often "widely considered to be the best possible level of evidence" in the sorts of research that generate headlines about lousy replicability. Sadly, this is the case not just in the social "sciences", but in medicine and epidemiology. Now, in (weak) defense of it, it has the most utility in those areas because that's where controls are the weakest or non-existent for reasons of practicality and ethics. In the areas of science where I've spent nearly all of my working career (physical and analytical chemistry), it's just not a factor because we can actually do rigorous controls, and it's only used when there's a political or legal agenda involved. This is very true in sensory science as well: rigorous controls are practical, so meta-analysis is only used when the outcomes are ambiguous, which further degrades its reliability.

I was amused (in a sick way) that in one of the research areas I was in for a few years (endocrine receptor binding from plastics extracts), meta-analysis done with three different methods on the same research population gave three different outcomes, and coincidentally, those outcomes tracked very well with the goals of the respective funding sources for the meta-research. That may have given me a more cynical view of its utility.

edit: I agree with your response to Cosmik: we don't need to understand a mechanism in order to demonstrate the reality of a phenomenon.
 