# MQA creator Bob Stuart answers questions.

#### amirm

Staff Member
CFO (Chief Fun Officer)
(1) Listen to a segment of a 1 kHz sinusoid with a duration of 60 ms.

(2) Listen to a segment of a 1 kHz sinusoid with a duration of 120 ms.

(3) Listen to a segment of a 1 kHz sinusoid with a duration of 240 ms.

(4) Listen to a segment of a 1 kHz sinusoid with a duration of 480 ms.

--- Do (1) and (2) sound like they have the same loudness? Yes or no? Why?
--- How about (2) vs (3)?
--- How about (3) vs (4)?
--- Finally, how about (1) vs (4)? Can you explain what you hear?
You have these files already? If so, please post them. Not everyone can create such files, and even if they do, they may not be what you say they should be.

#### nscrivener

##### Member
Good start! Please keep reading. You may come to understand what I am saying once you can explain the results of the following experiment, involving four sinusoids of the same amplitude but different durations. Give all the sinusoids a fade-in and fade-out of, say, 5 ms to exclude the influence of transients.

(1) Listen to a segment of a 1 kHz sinusoid with a duration of 60 ms.

(2) Listen to a segment of a 1 kHz sinusoid with a duration of 120 ms.

(3) Listen to a segment of a 1 kHz sinusoid with a duration of 240 ms.

(4) Listen to a segment of a 1 kHz sinusoid with a duration of 480 ms.

--- Do (1) and (2) sound like they have the same loudness? Yes or no? Why?
--- How about (2) vs (3)?
--- How about (3) vs (4)?
--- Finally, how about (1) vs (4)? Can you explain what you hear?
I don't need to perform that test to understand what you are asserting.

You are saying that ringing on transients is perceived by the listener as the onset of that transient beginning either at the start of the ringing, at the time of the actual pulse, or later, depending on varying conditions. This, you say, is equivalent to a frequency-dependent phase shift causing a loss of localisation information, or 'blur'.

A predictive model of that perceptual effect can then be used to create a filter that reverses that.

Is that approximately what you are saying?

#### solderdude

##### Major Contributor
Will this experiment change how I perceive musical instruments and voices ?

Listening tests and perception tests are good fun, but in the end the recorded waveform needs to be reproduced correctly.
I would think all DACs would be able to reproduce the different-length 1 kHz bursts correctly, and all perceived differences would come from transducer differences and the way our brain processes the incoming sound waves.
It does not change how we process music.

#### Blumlein 88

##### Major Contributor
Forum Donor
Good start! Please keep reading. You may come to understand what I am saying once you can explain the results of the following experiment, involving four sinusoids of the same amplitude but different durations. Give all the sinusoids a fade-in and fade-out of, say, 5 ms to exclude the influence of transients.

(1) Listen to a segment of a 1 kHz sinusoid with a duration of 60 ms.

(2) Listen to a segment of a 1 kHz sinusoid with a duration of 120 ms.

(3) Listen to a segment of a 1 kHz sinusoid with a duration of 240 ms.

(4) Listen to a segment of a 1 kHz sinusoid with a duration of 480 ms.

--- Do (1) and (2) sound like they have the same loudness? Yes or no? Why?
--- How about (2) vs (3)?
--- How about (3) vs (4)?
--- Finally, how about (1) vs (4)? Can you explain what you hear?
Only did one and four. Well, four is longer than one. Four sounds a little louder than one.

Why? Because somewhere in the 150-millisecond range is the time-integrating duration of the ear. At somewhat longer durations the perceived loudness becomes stable. I was aware of this effect, but had to look up the particulars. Here is the first place I found the info. It would be in most college-level texts on psychoacoustics, like B.C.J. Moore's text, which I have a copy of, though it wasn't handy.

It is assumed that the auditory system contains a temporal energy integrator, i.e. it performs a summation of the input signal. A simple way of estimating the relationship between thresholds and durations is to plot threshold against duration on a dB vs. logarithmic-time scale. Data will fall roughly on a straight line with a slope of -3 dB per doubling of log duration. Letting J represent the integration time of the auditory system, several scientists have estimated its magnitude; the estimated values lie in the region of 50-200 ms. Some researchers report that J is greater at low frequencies, while others have found no frequency dependency.

Another way of determining the integration ability of the auditory system is to present equal-energy tone bursts of different durations. An ideal energy integrator would imply that the detectability of these tone bursts is independent of duration. According to Green, this is only the case in the region 15-150 ms, outside of which the detectability will fall off. The fall-off at long durations indicates that the integration operation is time-delimited, while the fall-off at very short durations might be a result of the spread of energy over the frequency range that occurs for short-duration pulses. Other scientists have found similar results, but again there is some variation, some reporting frequency dependency (low frequency, long duration and vice versa) and others not.

According to [4], and essentially also [1], the integration time is frequency dependent: about 60 ms up to 1000 Hz, decreasing linearly to around 10 ms at 5 kHz. This is fairly consistent with an effective time constant (or integration time) for speech of around 20 ms. According to Niese, speech intelligibility can be predicted using an integration of so-called useful energy in the range up to 17 ms (full weight) and a linearly decreasing weight factor in the range 17-30 ms.

You also could look at this powerpoint on the matter.
http://depts.washington.edu/sphsc461/temp_res/temp_res.pdf
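The temporal energy integrator described in the quoted passage is easy to sketch numerically. Below is a minimal leaky-integrator model in Python; this is an illustration of the idea, not a validated loudness model, and the 100 ms time constant is an assumed value picked from inside the 50-200 ms range quoted above:

```python
import math

def perceived_level_db(duration_s, tau=0.1):
    """Peak output of a leaky energy integrator with time constant tau,
    driven by a constant-power tone burst, in dB relative to a long
    steady tone. tau = 100 ms is an assumption inside the 50-200 ms
    range of published estimates quoted above."""
    # integrator response to a unit-power step after duration_s seconds
    peak = 1 - math.exp(-duration_s / tau)
    return 10 * math.log10(peak)

for ms in (60, 120, 240, 480):
    print(f"{ms:3d} ms burst: {perceived_level_db(ms / 1000):+.2f} dB re steady tone")
```

With these assumptions, the 60 ms burst comes out about 3.5 dB below a steady tone while the 480 ms burst is essentially at full loudness, which matches "four sounds a little louder than one".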


#### somebodyelse

##### Addicted to Fun and Learning
You have these files already? If so, please post them. Not everyone can create such files, and even if they do, they may not be what you say they should be.
See @miero's thread on signal generation with sox, which starts with synthesis of 1 kHz tone files. The sox website has downloads for Windows and MacOS, so you don't need to be using linux. I think these should produce the signals asked for, although you may want to change the sample rate, bit depth, and the attenuation from full scale:
Code:
```
sox -V -r 48000 -n -b 16 -c 2 sin1k_060ms.wav synth 0.07 sin 1000 vol -10dB fade 0.005 0
sox -V -r 48000 -n -b 16 -c 2 sin1k_120ms.wav synth 0.13 sin 1000 vol -10dB fade 0.005 0
sox -V -r 48000 -n -b 16 -c 2 sin1k_240ms.wav synth 0.25 sin 1000 vol -10dB fade 0.005 0
sox -V -r 48000 -n -b 16 -c 2 sin1k_480ms.wav synth 0.49 sin 1000 vol -10dB fade 0.005 0
```
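For anyone without sox, here is a rough Python equivalent (a sketch, not a byte-for-byte match: unlike the sox lines above, which pad the total duration, this applies the 5 ms fades inside the nominal burst length):

```python
import wave
import numpy as np

def tone_burst(filename, burst_ms, freq=1000.0, fade_ms=5.0, fs=48000, db=-10.0):
    """Write a 16-bit stereo WAV containing a freq-Hz sine of burst_ms
    duration, with linear fade-in/out of fade_ms, at db dBFS."""
    n = int(fs * burst_ms / 1000)
    t = np.arange(n) / fs
    x = 10 ** (db / 20) * np.sin(2 * np.pi * freq * t)
    nf = int(fs * fade_ms / 1000)
    env = np.ones(n)
    env[:nf] = np.linspace(0, 1, nf)      # fade in
    env[-nf:] = np.linspace(1, 0, nf)     # fade out
    pcm = np.round(x * env * 32767).astype('<i2')
    with wave.open(filename, 'wb') as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(np.repeat(pcm, 2).tobytes())  # duplicate mono to L/R

for ms in (60, 120, 240, 480):
    tone_burst(f"sin1k_{ms:03d}ms.wav", ms)
```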

#### JohnPM

##### Technical Expert
Not sure if it has already come up, but there was an interesting paper on "Modern Sampling" in May's AES journal. It is an open-access paper, so free to download: Modern Sampling: A Tutorial.

#### mansr

##### Addicted to Fun and Learning
Not sure if it has already come up, but there was an interesting paper on "Modern Sampling" in May's AES journal. It is an open access paper so free to download. Modern Sampling: A Tutorial.
While mathematically sound, that paper is still utterly ridiculous. The reality is that traditional ("Shannon") sampling, despite the unavoidable approximations, is capable of capturing a music signal with an accuracy well beyond the limitations of the analogue parts of our playback equipment (amps etc), never mind our ears. Until we acquire, whether through engineering or evolution, far better ears than those we currently possess, endeavours to improve on the core sampling process are simply pointless.

#### JohnPM

##### Technical Expert
Seems rather harsh to call the paper ridiculous. It presents what seems to me a pretty impartial overview of the topic, with pros and cons for the approaches set out. Benefits might eventually come in lower cost and complexity silicon solutions, for example. The notion that existing methods are good enough has not stopped the search for improvements in other fields, why should it be a barrier to examining alternative methods of signal sampling and reconstruction?

#### mansr

##### Addicted to Fun and Learning
Seems rather harsh to call the paper ridiculous. It presents what seems to me a pretty impartial overview of the topic, with pros and cons for the approaches set out. Benefits might eventually come in lower cost and complexity silicon solutions, for example. The notion that existing methods are good enough has not stopped the search for improvements in other fields, why should it be a barrier to examining alternative methods of signal sampling and reconstruction?
It is ridiculous because the supposed problems with current sampling technique are blown way out of proportion. The signals we are recording do not, as the paper implies, extend to infinite frequencies. Actual musical instruments do not produce frequencies above 100 kHz (I'm being generous here), and if they did, the microphones normally used wouldn't pick them up anyway. This means the signal is inherently band-limited, and we can safely sample at 200 kHz or higher without any anti-aliasing filter whatsoever. The impossibility of constructing an actual sinc filter is thus irrelevant. If we so desire, the captured signal can be resampled digitally using filters of arbitrarily high (though not infinite) precision. The "flaws" introduced through imperfections in the sampling process itself are so small as to be undetectable in the presence of analogue noise and distortion.

Sampling methods such as the one discussed do have benefits for certain classes of signals. Audio just isn't one of them.
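The band-limiting argument above is easy to demonstrate numerically: a signal whose content lies entirely below half the sampling rate is captured exactly, with no anti-aliasing filter involved. The 200 kHz rate comes from the post above; the tone frequencies are arbitrary illustrative choices (picked to land on exact FFT bins so no window is needed):

```python
import numpy as np

fs = 200_000                      # sampling rate, > 2x the highest signal frequency
dur = 0.01                        # 10 ms -> 100 Hz bin spacing
n = np.arange(int(fs * dur)) / fs

# band-limited stand-in for a music signal: tones all well below fs/2
freqs = [1_000, 17_000, 43_000]   # Hz, multiples of 100 so they sit on exact bins
x = sum(np.sin(2 * np.pi * f * n) for f in freqs)

# normalised amplitude spectrum of the raw samples (no anti-alias filter)
spec = np.abs(np.fft.rfft(x)) / (len(x) / 2)
for f in freqs:
    print(f"{f} Hz: recovered amplitude {spec[f // 100]:.6f}")
```

Each tone is recovered at its full amplitude of 1.0, and every other bin sits at numerical noise, illustrating that "inherently band-limited" input makes the unrealisable infinite-stop-band filter a non-issue.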

#### JohnPM

##### Technical Expert
The signals we are recording do not, as the paper implies, extend to infinite frequencies.
I didn't see any such claim in the paper. Are you referring to the line about a continuous signal being equivalent to infinite sampling rate? That is simply a statement of mathematical fact. The paper has this to say about sample rates:

For audio a sampling rate that is high enough to capture the naturally band limited spectrum of approximately 90 kHz, as discussed in [24], should be sufficient.

The paper is simply a tutorial. It isn't calling on folk to pick up their pitchforks and throw devices using Shannon sampling theory on a fire.

#### solderdude

##### Major Contributor
The paper is simply a tutorial. It isn't calling on folk to pick up their pitchforks and throw devices using Shannon sampling theory on a fire.
That's what two or three of the members (not you, nor most others) are doing here, though.
It is, however, quite common for audiophiles to question anything related to digital as nothing is good enough for them, while vinyl, tape, and tubes + transformers somehow are.
These folks will probably even doubt that 192/24 is enough.

Nothing wrong with enjoying vinyl, tape and tubes though.

#### dc655321

##### Addicted to Fun and Learning
Not sure if it has already come up, but there was an interesting paper on "Modern Sampling" in May's AES journal.

Are these journals typically this thematic?

One thing I found conspicuously absent from the paper was a Section 3.3, "How are problems with Nyquist-Shannon sampling typically addressed (with real-world examples)?". For an endorsement tutorial, the absence of any attempt to quantify the nature of the "problems" seems suspect (to me)...

#### mansr

##### Addicted to Fun and Learning
I didn't see any such claim in the paper. Are you referring to the line about a continuous signal being equivalent to infinite sampling rate?
I said it was implied. For example, there is this statement: «In fact, unless the stop band achieves infinite attenuation at infinite frequency, there will always be some alias components present even if the sampling frequency is increased to infinity.» The attenuation at infinite frequency is only relevant if the input signal has any actual content there. By stating this as a problem, it is implied that such content exists. In actual fact, it does not.

The paper has this to say about sample rates:

For audio a sampling rate that is high enough to capture the naturally band limited spectrum of approximately 90 kHz, as discussed in [24], should be sufficient.
That sentence is puzzling in that it pretty much contradicts the entire premise of the remainder of the paper, i.e. that there is a problem with sampling as currently implemented.

And then there's this: «Fundamentally, sampling methods based on brick-wall low-pass filtering, as proposed by Shannon and others [1–4], can never be perfect!» While true, this is also a red herring. No digital system can ever be perfect due to the quantisation error. As such, we are forced to choose various parameters such that the resulting performance is good enough, even if it falls short of theoretical perfection. The paper fails to make a coherent, let alone compelling, argument for the notion that current digital audio systems are not good enough. The proposed alternative is thus nothing but a solution in search of a problem.
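The quantisation-error floor invoked here can be put on a number. For a full-scale sine, the textbook signal-to-noise figure is roughly 6.02 x bits + 1.76 dB, and a quick empirical check reproduces it (a sketch with assumed parameters: a mid-tread quantiser and a 997 Hz test tone at 48 kHz):

```python
import numpy as np

def quantisation_snr_db(bits, n=1_000_000):
    """Empirical SNR of a full-scale sine quantised to `bits` bits.
    The 997 Hz / 48 kHz test-tone choice is an assumption for the demo."""
    t = np.arange(n)
    x = np.sin(2 * np.pi * 997 * t / 48000)
    step = 2.0 / (2 ** bits)            # quantiser step over [-1, 1)
    q = np.round(x / step) * step       # mid-tread quantiser
    err = q - x
    return 10 * np.log10(np.mean(x**2) / np.mean(err**2))

for bits in (16, 24):
    print(f"{bits}-bit: measured {quantisation_snr_db(bits):.1f} dB, "
          f"theory {6.02 * bits + 1.76:.1f} dB")
```

The measured values land within a fraction of a dB of theory: about 98 dB at 16 bits and about 146 dB at 24 bits, both far below the noise and distortion of any analogue playback chain.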

#### SIY

##### Technical Expert
The proposed alternative is thus nothing but a solution in search of a problem.
Not singling this paper out, but I'm reminded of something I heard when I was still in academia, working on some issues related to organic semiconductors. We were being interviewed by a prominent science writer who (logically) asked, "So what actual use is polyacetylene?" The answer was, "Creating PhDs."

#### JohnPM

Technical Expert
Are these journals typically this thematic?
Per the guest editor's note, it was a "Special Issue on High-Resolution Audio".

One thing I found conspicuously absent from the paper was a Section 3.3, "How are problems with Nyquist-Shannon sampling typically addressed (with real-world examples)?". For an endorsement tutorial, the absence of any attempt to quantify the nature of the "problems" seems suspect (to me)...
Conspiracy theorists of the world unite, someone out there is out to get you.

#### somebodyelse

##### Addicted to Fun and Learning
Per the guest editor's note, it was a "Special Issue on High-Resolution Audio".
That part's not open access. Is there anything else in there that would give non-members some context for this special issue?

#### dc655321

##### Addicted to Fun and Learning
Conspiracy theorists of the world unite, someone out there is out to get you.
Not at all. Or, at least if there is some conspiring, it's pretty clumsy.

I am curious who the target audiences are for both the B. Stuart "Hierarchy" paper and the paper listed above.
It cannot seriously be other scientists/engineers, can it?

#### Sergei

##### Senior Member
Forum Donor
Only did one and four. Well, four is longer than one. Four sounds a little louder than one.

Why? Because somewhere in the 150-millisecond range is the time-integrating duration of the ear. At somewhat longer durations the perceived loudness becomes stable. I was aware of this effect, but had to look up the particulars. Here is the first place I found the info. It would be in most college-level texts on psychoacoustics, like B.C.J. Moore's text, which I have a copy of, though it wasn't handy.

It is assumed that the auditory system contains a temporal energy integrator, i.e. it performs a summation of the input signal. A simple way of estimating the relationship between thresholds and durations is to plot threshold against duration on a dB vs. logarithmic-time scale. Data will fall roughly on a straight line with a slope of -3 dB per doubling of log duration. Letting J represent the integration time of the auditory system, several scientists have estimated its magnitude; the estimated values lie in the region of 50-200 ms. Some researchers report that J is greater at low frequencies, while others have found no frequency dependency.

Another way of determining the integration ability of the auditory system is to present equal-energy tone bursts of different durations. An ideal energy integrator would imply that the detectability of these tone bursts is independent of duration. According to Green, this is only the case in the region 15-150 ms, outside of which the detectability will fall off. The fall-off at long durations indicates that the integration operation is time-delimited, while the fall-off at very short durations might be a result of the spread of energy over the frequency range that occurs for short-duration pulses. Other scientists have found similar results, but again there is some variation, some reporting frequency dependency (low frequency, long duration and vice versa) and others not.

According to [4], and essentially also [1], the integration time is frequency dependent: about 60 ms up to 1000 Hz, decreasing linearly to around 10 ms at 5 kHz. This is fairly consistent with an effective time constant (or integration time) for speech of around 20 ms. According to Niese, speech intelligibility can be predicted using an integration of so-called useful energy in the range up to 17 ms (full weight) and a linearly decreasing weight factor in the range 17-30 ms.

You also could look at this powerpoint on the matter.
http://depts.washington.edu/sphsc461/temp_res/temp_res.pdf
Perfect!

So, my point was that expectations based on the theory of Linear Time Invariant (LTI) systems, which are traditionally analyzed with the help of the Fourier transform, break down for the Mammal Hearing System (MHS), which is neither linear nor time-invariant.

In LTI, we care about durations, frequencies, sampling rates, and amplitudes in the time and frequency domains. In MHS, we also have to care about onset times, recuperation periods, levels of perceived loudness, inter-frequency masking, etc. "Four sounds a little louder than One" is not what LTI predicts, yet it makes perfect sense in the MHS framework.

The experiment illustrates at least two things:

(1) In MHS, perceived loudness depends not only on amplitude but also on duration. This is a robust effect, linked to the hearing system's "slow" integrator, operating over tens of milliseconds. There is also a less robust effect, not demonstrated by this experiment, due to a "fast" integrator, operating over tens of microseconds, which makes the perceived onset time depend on amplitude.

(2) Some of you will be able to differentiate between One and Two, some not. Or between Two and Three. Virtually everyone will be able to differentiate between One and Four. And this is for the "slow" integrator, which is considered rather consistent! Individual differences in the functioning of the fast integrator are more difficult to elicit experimentally, yet they do exist.

Qualitatively, the number of dimensions LTI operates in is smaller than the number of MHS dimensions. If we hold the value(s) in one or more MHS dimension(s) constant, we take those dimension(s) out of play, and MHS behavior then follows the LTI-predicted behavior more closely.

That's the general reason why "simple" music, mostly consisting of a small number of sinusoids slowly changing their amplitudes and frequencies over time, is more readily amenable to LTI analysis. The effects of the perceptual integrators fade away. Onset times matter less.

"Complex" music, with a large number of sinusoids exhibiting fast and frequent onsets and fadeouts, chirps, and transients, is not as amenable to LTI analysis. The integrators play an important role in this case. We had better preserve the information about onset times more accurately.
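One LTI-side fact underlying this (and the "spread of energy over the frequency range" noted in the Moore-style quote earlier) is the time/frequency trade-off: a shorter burst spreads its energy over a wider band, roughly 0.89 / duration Hz for a rectangular window. A small numpy sketch, illustrative only (fades omitted for simplicity):

```python
import numpy as np

fs = 48000

def burst_bandwidth_hz(duration_s, freq=1000.0, nfft=2**20):
    """-3 dB bandwidth of a rectangular-windowed sine burst, measured
    from a heavily zero-padded FFT so the main lobe is finely sampled."""
    t = np.arange(int(fs * duration_s)) / fs
    x = np.sin(2 * np.pi * freq * t)
    spec = np.abs(np.fft.rfft(x, nfft))
    above = spec >= spec.max() / np.sqrt(2)   # bins within -3 dB of the peak
    return np.count_nonzero(above) * fs / nfft

for ms in (60, 120, 240, 480):
    print(f"{ms:3d} ms burst: -3 dB width ~ {burst_bandwidth_hz(ms / 1000):.1f} Hz")
```

Halving the burst duration doubles the spectral width, so the 60 ms burst occupies roughly eight times the bandwidth of the 480 ms one; this spreading is a purely linear effect, separate from the perceptual integrators discussed above.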

#### SIY

##### Technical Expert
Perfect!

So, my point was that expectations based on the theory of Linear Time Invariant (LTI) systems, which are traditionally analyzed with the help of the Fourier transform, break down for the Mammal Hearing System (MHS), which is neither linear nor time-invariant...
... We had better preserve the information about onset times more accurately.
Start with the irrelevant, end up with the repeated, ummm, misunderstanding. The perfect circle.