
MQA creator Bob Stuart answers questions.

Sal1950

Grand Contributor
The Chicago Crusher
Forum Donor
Joined
Mar 1, 2016
Messages
14,072
Likes
16,605
Location
Central Fl
Conspiracy theorists of the world unite, someone out there is out to get you. :D
Just because I'm paranoid doesn't mean no one is following me. ;)
 

somebodyelse

Major Contributor
Joined
Dec 5, 2018
Messages
3,682
Likes
2,961
The last issue on high resolution audio was in 2004; May's was an update.
May 2014 - including papers on 1-bit audio and MLP, among others. IIRC there were discussions going on around that time about the audibility of the watermarking that most publishers were including on high resolution formats. Bob Stuart agreed with someone's argument that watermarks had to be audible in order to meet their goal of remaining identifiable when used with a notionally perfect lossy codec, since anything inaudible would be discarded. I wonder if the original forum discussion still exists.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,523
Likes
37,056
Perfect!

So, my point was that expectations based on the theory of Linear Time Invariant (LTI) systems, which are traditionally analyzed with the help of the Fourier transform, break down for the Mammal Hearing System (MHS), which is neither linear nor time invariant.

In LTI, we care about durations, frequencies, sampling rates, and amplitudes in the time and frequency domains. In MHS, we also have to care about onset times, recuperation periods, levels of perceived loudness, inter-frequency masking, etc. "Four sounds a little louder than One" is not what LTI predicts, yet it makes perfect sense in the MHS framework.

The experiment illustrates at least two things:

(1) In MHS, perceived loudness depends not only on amplitude, but also on duration. This is a robust effect, linked to the hearing system's "slow" integrator, operating over tens of milliseconds. There also exists a less robust effect, not demonstrated by this experiment, due to a "fast" integrator, operating over tens of microseconds, which makes the perceived onset time depend on amplitude.

(2) Some of you will be able to differentiate between One and Two, some not. Or between Two and Three. Virtually everyone will be able to differentiate between One and Four. And this is for the "slow" integrator, considered rather consistent! Individual differences in functioning of the fast integrator are more difficult to elicit experimentally, yet they do exist.
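The "slow" integrator in point (1) can be sketched numerically. Below is a minimal leaky-integrator model in Python; the 100 ms time constant and 48 kHz rate are assumed, illustrative values, not figures given in the post:

```python
import math

def peak_integrated_energy(duration_s, tau=0.1, fs=48000, amp=1.0):
    """Peak output of a leaky integrator (assumed time constant tau, seconds)
    driven by the squared envelope of a constant-amplitude tone burst."""
    alpha = math.exp(-1.0 / (fs * tau))  # per-sample decay factor
    y = 0.0
    for _ in range(int(duration_s * fs)):
        y = alpha * y + (1.0 - alpha) * amp * amp
    return y  # the output grows monotonically, so the final value is the peak

short = peak_integrated_energy(0.005)  # 5 ms burst
long_ = peak_integrated_energy(0.200)  # 200 ms burst, same amplitude
# The longer burst drives the integrator much closer to steady state, so this
# model predicts it is perceived as louder despite identical amplitude.
```

Holding duration fixed freezes that MHS dimension, and the model's output then depends on amplitude alone - the sense in which LTI-style reasoning is recovered.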

Qualitatively, the number of dimensions LTI operates in is smaller than the number of MHS dimensions. If we hold constant the value(s) in one or more of the MHS dimension(s), we take those dimension(s) out of play, and MHS behavior then follows the LTI-predicted behavior more closely.

That's the general reason why "simple" music, mostly consisting of a small number of sinusoids slowly changing their amplitudes and frequencies over time, is more readily amenable to LTI analysis. The effects of the perceptual integrators fade away. Onset times matter less.

"Complex" music, with a large number of sinusoids exhibiting fast and frequent onsets and fadeouts, chirps, and transients, is not as amenable to LTI analysis. The integrators play an important role in this case. We had better preserve the information about onset times more accurately.
Lots of wheel spinning here. While what you say is true enough, it never gets anywhere. None of this indicates timing is inadequate the way things are done. The timing is something like 100 times better than needed; imagine the improvement if it were 10,000 times better. Well, I'm not hearing it.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,388
Location
Seattle Area
See @miero thread on signal generation with sox which starts with synthesis of 1kHz tone files. The sox website has downloads for Windows and MacOS so you don't need to be using linux. I think these should produce the signals asked for, although you may want to change the sample rate and depth, and the attenuation from full scale:
Thanks. But that misses the last part of my sentence. I like to see @Sergei run his listening tests and post his observation and files. Then we can get somewhere as opposed to a theoretical discussion, or dismissal of the results after the fact because the test files were not this way or that way.
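For anyone without sox installed, a tone file like the ones discussed can also be generated with Python's standard library alone. A minimal sketch; the file name, the -6 dBFS level, and the 44.1 kHz/16-bit format are arbitrary choices for illustration, not anything specified in the thread:

```python
import math
import struct
import wave

def write_tone(path, freq=1000.0, dur=1.0, fs=44100, level_db=-6.0):
    """Write a 16-bit mono sine-tone WAV file; level_db is relative to full scale."""
    amp = (2 ** 15 - 1) * 10 ** (level_db / 20.0)
    frames = b"".join(
        struct.pack("<h", int(amp * math.sin(2 * math.pi * freq * n / fs)))
        for n in range(int(dur * fs))
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(fs)
        w.writeframes(frames)

write_tone("tone_1k.wav")  # hypothetical output file name
```

Changing the sample rate, depth, or attenuation is a matter of the keyword arguments, which keeps the test files easy to document alongside any listening results.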
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Start with the irrelevant, end up with the repeated, ummm, misunderstanding. The perfect circle.

What is your explanation of the four-tone experiment?

I did read some of your writings available on the Web. Watched some of your video tutorials. About every twentieth sentence you wrote or uttered there was devoted to disparaging someone, including a professor who taught you the basics of audio science. Not enough information to construct a robust psychological profile, yet enough information for me to not take your insults personally.

However, we are treading across serious terrain now. LTI and MHS do give different predictions of the severity of impact on the human hearing system caused by certain complex sounds containing a significant number of sharp transients. A health issue. A liability issue. A "could be the asbestos of the 21st century" issue. About as appropriate a subject for snide, unsubstantiated remarks as the Holocaust, IMHO.

If you deny the contemporary MHS approach, what is the alternative you propose?
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
Not sure if it has already come up, but there was an interesting paper on "Modern Sampling" in May's AES journal. It is an open access paper so free to download. Modern Sampling: A Tutorial.
So it basically says you can use shorter reconstruction filters to get a result that is presumably at least equivalent to using sinc.
(Edit: that's actually nice, since it can make better playback HW cheaper)
But the total freq. response of pre-filter+reconstruction filter looks like a low pass to me. So it should have ringing artifacts as well.
They show that the reconstruction filter alone won't cause ringing, but what about the whole system?
What's missing for me in this article is a comparison between analog->sinc->sinc->out vs analog->pre-filter->beta-filter->out for various input signals, such as a square wave.
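The trade-off behind that question can be seen directly from the B-spline kernel's frequency response: the continuous-time transform of a degree-n B-spline is sinc^(n+1), so a short cubic kernel both droops in-band (hence the correcting pre-filter) and is nonzero above Nyquist (hence possible imaging). A small sketch, with frequency in cycles per sample:

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def bspline_response(f, degree=3):
    """Magnitude of the continuous-time frequency response of a degree-n
    B-spline kernel, with f in cycles per sample: |sinc(f)| ** (degree + 1)."""
    return abs(sinc(f)) ** (degree + 1)

# An ideal (sinc) reconstruction filter is 1 below Nyquist (f = 0.5), 0 above.
# The short cubic B-spline kernel instead:
inband = bspline_response(0.25)  # droops in-band -> needs a flattening pre-filter
image = bspline_response(0.75)   # nonzero above Nyquist -> imaging, unless the
                                 # signal has little energy near Nyquist anyway
```

This is only the bare kernel; the paper's argument, as summarized above, is that the pre-filter/reconstruction-filter pair compensates for these effects, which is exactly why the whole-system response is the interesting comparison.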
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,383
Likes
24,749
Location
Alfred, NY
What is your explanation of the four-tone experiment?

It's irrelevant to this issue.

I am unaware that I was taught audio science by a professor, much less that I disparaged this non-existent person.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,523
Likes
37,056
Let me see: regular filtering is bad because it is imperfect, which causes some level of aliasing. B-splines are also imperfect, and have very slow roll-offs above Fs, but that is okay because there is usually low energy at high frequencies in audio. Which would also mean the amount of aliasing (related to the strength of the signal) is low in the more normal conversion. Did we get anywhere with this? Oh, and there are problems implementing these in practice, so additional filters to flatten response will be needed. Oh, oh, while we are at it, we should mention sampling rates might need to be 3 or 4 times higher, or maybe, since audio is somewhat self-bandlimited, just twice as much will do. To equal the normal Shannon rates, mind you.

Now if this type of filtering has an advantage, I don't see why someone can't produce ADCs and DACs to use it. MQA was an attempt to do some of this and lock it in like Dolby has on video. It would be as if the first delta-sigma converters were patented and added to some encoding scheme so you had to have special licensing to use them. Say we call it SSD, for Super Sampling Digital. MQA is trying to do something similar and get paid for it by including all the hidden stuff, lossy compression, etc., to promote authentication.

I would like to have seen a comparison of the error values for the normal method and the modern methods. Again, if it's a better way it can be used. It wouldn't be too hard to have different filters which switch in and out depending upon whether it was traditional PCM or B-spline based reconstruction. But for this complication, what is actually gained? Considering the error levels of 96/24, I'd think there is little to gain.
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
There also exists a less robust effect, not demonstrated by this experiment, due to a "fast" integrator, operating over tens of microseconds, which makes the perceived onset time depend on amplitude.
Can you link to a paper about the fast integrator?
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
Getting back to the theoretical basis for a moment: it's mathematically proven that the Whittaker-Shannon reconstruction formula perfectly reconstructs the analog wave that was digitally sampled, so long as that original analog wave was bandwidth limited below Nyquist.

So what is the argument here? That we're not actually using the Whittaker-Shannon formula to reconstruct the analog wave? It requires too much computation and read-ahead to be practical so instead, we're using techniques like delta-sigma. Thus they fall short of perfection? If that is the argument, then the degree to which they fall short can be quantified.

PS: or is the argument on the encoding side: that applying the AA bandwidth filter during encoding, distorts the signal in some way?
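The quantification asked for above can be illustrated directly: sample a signal bandlimited well below Nyquist, then evaluate the (necessarily truncated) Whittaker-Shannon sum at a point between sample instants and compare against the true value. A sketch; the tone frequencies, window size, and evaluation point are arbitrary choices:

```python
import math

FS = 44100
T = 1.0 / FS
N = 2000  # half-width of the truncated interpolation window, in samples

def signal(t):
    # Bandlimited test signal: tones at 1 kHz and 3 kHz, far below Nyquist
    return (0.5 * math.sin(2 * math.pi * 1000 * t)
            + 0.3 * math.sin(2 * math.pi * 3000 * t))

samples = [signal(n * T) for n in range(-N, N + 1)]

def reconstruct(t):
    """Whittaker-Shannon interpolation, truncated to the stored samples."""
    total = 0.0
    for i, x in enumerate(samples):
        u = (t - (i - N) * T) / T  # offset from sample i, in sample periods
        total += x * (1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u))
    return total

t0 = 0.37 * T  # a point between sample instants
err = abs(reconstruct(t0) - signal(t0))  # small but nonzero: the sum is truncated
```

With the infinite sum the error would be exactly zero for this signal; the residual here is purely the truncation ("read-ahead") cost mentioned above, and it shrinks as the window grows.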
 
Last edited:

somebodyelse

Major Contributor
Joined
Dec 5, 2018
Messages
3,682
Likes
2,961
Thanks. But that misses the last part of my sentence. I like to see @Sergei run his listening tests and post his observation and files. Then we can get somewhere as opposed to a theoretical discussion, or dismissal of the results after the fact because the test files were not this way or that way.
Fair point. With my devil's advocate hat on, I'd argue telling you what you're supposed to hear before you hear it would affect what you hear. Is there a way to do a sealed post with the "here's what you should have heard and why" part that can only be opened some time later, or do we just have to trust people not to open spoiler tags in this sort of situation?

Having said that I'm missing how this specific test is relevant. It's an interesting demonstration of a phenomenon I didn't know about, but it's something that can be captured and reproduced by the existing recording/playback chain.
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
So what is the argument here?
That we can do better.
Here's an example: "1sec square wave at 0db"
That took 23 bytes and contains more data than any wav file ever can at any sample rate, since a square wave has unlimited BW.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
... Here's an example: "1sec square wave at 0db"
That took 23 bytes and contains more data than any wav file ever can at any sample rate, since a square wave has unlimited BW.
A perfect square wave is a mathematical construct that doesn't exist in nature, let alone music. Every square-wave-like sound that actually exists, is bandwidth limited. And our perception is also bandwidth limited.

... we can do better. ...
I agree. 44-16 isn't quite fully transparent to all humans. But evidence suggests it doesn't take much more for the digital encoding & reconstruction to be fully transparent. I am all in favor of a higher standard, say 64-24 or whatever it would take to be fully transparent with a reasonable safety margin.

However, once digital encoding & reconstruction is fully transparent, we're not nearly done. Other aspects of the recording process are even less transparent than 44-16, for example the limitations of microphones, placement, room effects, among other things.
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
A perfect square wave is a mathematical construct that doesn't exist in nature, let alone music. Every square-wave-like sound that actually exists, is bandwidth limited. And our perception is also bandwidth limited.
Electronic music can contain perfect square waves, since it's synthesized.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
Electronic music can contain perfect square waves, since it's synthesized.
Actually, it can't because electronics with infinite bandwidth don't exist. Also, an actual sound is made from changing air pressure. And the mathematical derivative of a perfect square wave is undefined at its transition point. That means a perfect square wave, to propagate as sound in the air, would require an infinite rate of change in air pressure, which is not physically possible.

Electronic music can construct square-like waves using wider bandwidth than we can hear. Call that SSBSWLS (supersonic bandwidth square-wave like sounds). But we can't hear the difference between a SSBSWLS and a square-like wave constructed from bandwidth we can hear.

Put differently: construct square wave (A) using 1 MHz bandwidth. Construct square wave (B) using 25 kHz bandwidth. All else equal: frequency, amplitude, phase. We humans can't hear the difference between A and B. At least, I've never seen evidence suggesting this.
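That A/B construction is easy to make concrete with a Fourier-series square wave, where "bandwidth" just sets how many odd harmonics are kept. A sketch; the 1 kHz fundamental and the two bandwidth figures are taken from the example above, while the sampled instant is arbitrary:

```python
import math

def bl_square(t, f0, bw):
    """Band-limited square wave at fundamental f0 (Hz): Fourier series of
    odd harmonics, keeping only those at or below bandwidth bw (Hz)."""
    total = 0.0
    k = 1
    while k * f0 <= bw:
        total += math.sin(2 * math.pi * k * f0 * t) / k
        k += 2  # square waves contain odd harmonics only
    return 4.0 / math.pi * total

f0 = 1000.0
t = 1.234e-4  # arbitrary instant
a = bl_square(t, f0, 1_000_000.0)  # wave (A): harmonics out to 1 MHz
b = bl_square(t, f0, 25_000.0)     # wave (B): audible-band harmonics only
# a and b differ only in harmonics at 27 kHz and above, i.e. entirely
# outside the hearing range.
```

The two waveforms are sample-for-sample different, yet every component in the difference lies above the audible band, which is exactly the claim being made.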
 
Last edited:

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
No, it can't, because you can't encode two different voltages at the same time point.
The reason is that a square wave moves from one amplitude to another in no time at all: an infinitely short period of time between state A and state B. Because sampling works by giving you one sample per time interval, the best you can do is approximate it. It doesn't happen in nature either, for obvious reasons. (And no, we are not dealing with quantum effects here, haha)
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
Every sound that actually propagates in air (or water or any other medium) and we can hear, is bandwidth limited. And these bandwidth limited waves can be digitally encoded and reconstructed with mathematical perfection, so long as we sample them at more than twice the highest frequency we want to capture.

Digital audio isn't perfect, but its limitations are not theoretical. They are about the bit rates and the algorithms used. We're not using quite a high enough bit rate to be fully transparent to all humans, and we're not using the mathematically perfect reconstruction algorithm. However, using higher sample rates and bit depths can account for both of these limitations.
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
I don't think there is a point in actually doing it for practical reasons.
The point was, we can do better than Shannon/Nyquist.
It could be an interesting academic paper I guess.
 