
MQA creator Bob Stuart answers questions.

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,384
Location
Seattle Area
(1) Listen to a segment of a 1 kHz sinusoid, with a duration of 60 ms.

(2) Listen to a segment of a 1 kHz sinusoid, with a duration of 120 ms.

(3) Listen to a segment of a 1 kHz sinusoid, with a duration of 240 ms.

(4) Listen to a segment of a 1 kHz sinusoid, with a duration of 480 ms.

--- Do (1) and (2) sound like they have the same loudness? Yes or no? Why?
--- How about (2) vs (3)?
--- How about (3) vs (4)?
--- Finally, how about (1) vs (4)? Can you explain what you hear?
You have these files already? If so, please post them. Not everyone can create such files, and even if they do, they may not be what you are saying they should be.

What are your answers by the way?
 

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
Good start! Please keep reading. You may come closer to understanding what I am saying once you can explain the results of the following experiment, involving four sinusoids with the same amplitude but different durations. Give all the sinusoids a fade-in and fade-out, say of 5 ms, to exclude the influence of transients.

(1) Listen to a segment of a 1 kHz sinusoid, with a duration of 60 ms.

(2) Listen to a segment of a 1 kHz sinusoid, with a duration of 120 ms.

(3) Listen to a segment of a 1 kHz sinusoid, with a duration of 240 ms.

(4) Listen to a segment of a 1 kHz sinusoid, with a duration of 480 ms.

--- Do (1) and (2) sound like they have the same loudness? Yes or no? Why?
--- How about (2) vs (3)?
--- How about (3) vs (4)?
--- Finally, how about (1) vs (4)? Can you explain what you hear?

I don't need to perform that test to understand what you are asserting.

You are saying that ringing on transients is perceived by the listener as the onset of that transient beginning either at the start of the ringing, at the time of the actual pulse, or later, depending upon varying conditions. This, you say, is equivalent to a frequency-dependent phase shift causing a loss of localisation information, or 'blur'.

A predictive model of that perceptual effect can then be used to create a filter that reverses that.

Is that approximately what you are saying?
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
15,891
Likes
35,912
Location
The Neitherlands
Will this experiment change how I perceive musical instruments and voices?

Listening tests and perception tests are good fun, but in the end the recorded waveform needs to be reproduced correctly.
I would think all DACs would be able to reproduce the different-length 1 kHz bursts correctly, and all perceived differences would come from transducer differences and the way our brain processes the incoming sound waves.
It does not change how we process music.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,521
Likes
37,050
Good start! Please keep reading. You may come closer to understanding what I am saying once you can explain the results of the following experiment, involving four sinusoids with the same amplitude but different durations. Give all the sinusoids a fade-in and fade-out, say of 5 ms, to exclude the influence of transients.

(1) Listen to a segment of a 1 kHz sinusoid, with a duration of 60 ms.

(2) Listen to a segment of a 1 kHz sinusoid, with a duration of 120 ms.

(3) Listen to a segment of a 1 kHz sinusoid, with a duration of 240 ms.

(4) Listen to a segment of a 1 kHz sinusoid, with a duration of 480 ms.

--- Do (1) and (2) sound like they have the same loudness? Yes or no? Why?
--- How about (2) vs (3)?
--- How about (3) vs (4)?
--- Finally, how about (1) vs (4)? Can you explain what you hear?
I only did one and four. Well, four is longer than one. Four sounds a little louder than one.

Why? Because somewhere in the 150-millisecond range is the time-integrating duration for the ear. At somewhat longer durations the perceived loudness becomes stable. I was aware of this effect but had to look up the particulars. Here is the first place I found the info; it would be in most college-level texts on psychoacoustics, like B.C.J. Moore's text, which I have a copy of though it wasn't handy.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.3611&rep=rep1&type=pdf

It is assumed that the auditory system contains a temporal energy integrator, i.e. it performs a summation of the input signal. A simple way of estimating the relationship between thresholds and durations is to plot threshold against duration on a dB vs. logarithmic-time scale. Data will fall roughly on a straight line with a slope of -3 dB per doubling of duration. Letting J represent the integration time of the auditory system, several scientists have estimated its magnitude, the estimated values lying in the region of 50-200 ms. Some researchers report that J is greater at low frequencies, while others have found no frequency dependency.

Another way of determining the integration ability of the auditory system is to present equal-energy tone bursts of different durations. An ideal energy integrator would imply that the detectability of these tone bursts is independent of duration. According to Green, this is only the case in the region 15-150 ms, outside of which the detectability will fall off. The fall-off at long durations indicates that the integration operation is time-delimited, while the fall-off at very short durations might be a result of the spread of energy over the frequency range that occurs for short-duration pulses. Other scientists have found similar results, but again there is some variation in the results, some reporting frequency dependency (low frequency, long duration and vice versa) and others not.

According to [4], and essentially also [1], the integration time is frequency dependent, about 60 ms up to 1000 Hz, decreasing linearly to around 10 ms at 5 kHz. This is fairly consistent with an effective time constant (or integration time) for speech of around 20 ms. According to Niese, speech intelligibility can be predicted using an integration of so-called useful energy in the range up to 17 ms (full weight) and a linearly decreasing weight factor in the range 17-30 ms.


You could also look at this PowerPoint on the matter.
http://depts.washington.edu/sphsc461/temp_res/temp_res.pdf
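As a toy illustration of the temporal integration described in that excerpt, here is a minimal sketch (my own, not from the cited paper) that models the ear's "slow" integrator as a single leaky energy integrator. The 100 ms time constant is an assumed value picked from the 50-200 ms range quoted above, not a measured one:

```python
import numpy as np

# Toy model: treat the ear's "slow" loudness integrator as a single
# leaky (exponential) integrator of signal energy.  tau = 100 ms is an
# assumed value from the 50-200 ms range quoted above.
fs = 48000          # sample rate, Hz
tau = 0.100         # assumed integration time constant, s

def peak_integrated_energy(duration_s, freq=1000.0):
    """Peak output of a leaky energy integrator fed a tone burst."""
    t = np.arange(int(duration_s * fs)) / fs
    x = np.sin(2 * np.pi * freq * t)
    alpha = 1.0 / (tau * fs)        # per-sample leak factor
    y, peak = 0.0, 0.0
    for e in x * x:                 # integrate instantaneous energy
        y += alpha * (e - y)
        peak = max(peak, y)
    return peak

if __name__ == "__main__":
    for d in (0.060, 0.120, 0.240, 0.480):
        p = peak_integrated_energy(d)
        print(f"{int(d * 1000):3d} ms burst: {10 * np.log10(p):+.2f} dB")
```

The longer bursts drive the integrator closer to its steady-state value, so the model predicts the 480 ms burst reads "louder" than the 60 ms one at equal amplitude, mirroring the observation above.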
 

somebodyelse

Major Contributor
Joined
Dec 5, 2018
Messages
3,682
Likes
2,959
You have these files already? If so, please post them. Not everyone can create such and even if they do, it may not be what you are saying they should be.
See @miero's thread on signal generation with sox, which starts with synthesis of 1 kHz tone files. The sox website has downloads for Windows and macOS, so you don't need to be using Linux. I think these should produce the signals asked for, although you may want to change the sample rate and depth, and the attenuation from full scale:
Code:
sox -V -r 48000 -n -b 16 -c 2 sin1k_060ms.wav synth 0.07 sin 1000 vol -10dB fade t 0.005 -0 0.005
sox -V -r 48000 -n -b 16 -c 2 sin1k_120ms.wav synth 0.13 sin 1000 vol -10dB fade t 0.005 -0 0.005
sox -V -r 48000 -n -b 16 -c 2 sin1k_240ms.wav synth 0.25 sin 1000 vol -10dB fade t 0.005 -0 0.005
sox -V -r 48000 -n -b 16 -c 2 sin1k_480ms.wav synth 0.49 sin 1000 vol -10dB fade t 0.005 -0 0.005
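If sox isn't handy, here's a rough Python equivalent, a sketch rather than a bit-exact match: 48 kHz mono 16-bit, -10 dBFS, 5 ms raised-cosine fades added on top of the stated durations, with file names of my own choosing. It needs only numpy and the standard-library wave module:

```python
import wave

import numpy as np

# Rough Python equivalent of the sox commands above -- a sketch, not a
# bit-exact match.  Assumptions: 48 kHz mono 16-bit, -10 dBFS level,
# 5 ms raised-cosine fades added around the stated full-level duration.
fs = 48000

def make_burst(duration_s, freq=1000.0, fade_s=0.005, level_db=-10.0):
    """Return a faded tone burst as float samples in [-1, 1]."""
    n_fade = int(fade_s * fs)
    n = int(duration_s * fs) + 2 * n_fade   # fades extend the burst
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * freq * t) * 10 ** (level_db / 20)
    env = np.ones(n)
    ramp = 0.5 - 0.5 * np.cos(np.pi * np.arange(n_fade) / n_fade)
    env[:n_fade] = ramp                     # fade-in
    env[-n_fade:] = ramp[::-1]              # fade-out
    return x * env

if __name__ == "__main__":
    for ms in (60, 120, 240, 480):
        pcm = (make_burst(ms / 1000) * 32767).astype(np.int16)
        with wave.open(f"sin1k_{ms:03d}ms.wav", "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)
            w.setframerate(fs)
            w.writeframes(pcm.tobytes())
```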
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
Not sure if it has already come up, but there was an interesting paper on "Modern Sampling" in May's AES journal. It is an open access paper so free to download. Modern Sampling: A Tutorial.
While mathematically sound, that paper is still utterly ridiculous. The reality is that traditional ("Shannon") sampling, despite the unavoidable approximations, is capable of capturing a music signal with an accuracy well beyond the limitations of the analogue parts of our playback equipment (amps etc), never mind our ears. Until we acquire, whether through engineering or evolution, far better ears than those we currently possess, endeavours to improve on the core sampling process are simply pointless.
 

JohnPM

Senior Member
Technical Expert
Joined
Apr 9, 2018
Messages
340
Likes
901
Location
UK
Seems rather harsh to call the paper ridiculous. It presents what seems to me a pretty impartial overview of the topic, with pros and cons for the approaches set out. Benefits might eventually come in lower cost and complexity silicon solutions, for example. The notion that existing methods are good enough has not stopped the search for improvements in other fields, why should it be a barrier to examining alternative methods of signal sampling and reconstruction?
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
Seems rather harsh to call the paper ridiculous. It presents what seems to me a pretty impartial overview of the topic, with pros and cons for the approaches set out. Benefits might eventually come in lower cost and complexity silicon solutions, for example. The notion that existing methods are good enough has not stopped the search for improvements in other fields, why should it be a barrier to examining alternative methods of signal sampling and reconstruction?
It is ridiculous because the supposed problems with current sampling technique are blown way out of proportion. The signals we are recording do not, as the paper implies, extend to infinite frequencies. Actual musical instruments do not produce frequencies above 100 kHz (I'm being generous here), and if they did, the microphones normally used wouldn't pick them up anyway. This means the signal is inherently band-limited, and we can safely sample at 200 kHz or higher without any anti-aliasing filter whatsoever. The impossibility of constructing an actual sinc filter is thus irrelevant. If we so desire, the captured signal can be resampled digitally using filters of arbitrarily high (though not infinite) precision. The "flaws" introduced through imperfections in the sampling process itself are so small as to be undetectable in the presence of analogue noise and distortion.

Sampling methods such as the one discussed do have benefits for certain classes of signals. Audio just isn't one of them.
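The "arbitrarily high precision" point can be made numerically. This is my own sketch, not anything from the paper: reconstruct off-grid values of a band-limited tone from its samples with a finite windowed-sinc (Lanczos) interpolator. The tap count, tone frequency, and sample rate are all arbitrary choices:

```python
import numpy as np

# Numerical sketch: off-grid reconstruction of a band-limited tone from
# its samples, via a finite Lanczos-windowed sinc interpolator.
fs = 48000
f0 = 1000.0
N = 8192
x = np.sin(2 * np.pi * f0 * np.arange(N) / fs)   # the sampled tone

def reconstruct(t, taps=64):
    """Estimate the tone at time t (seconds) from nearby samples."""
    c = int(np.floor(t * fs))
    k = np.arange(c - taps, c + taps + 1)        # neighbouring samples
    u = t * fs - k                               # offsets in samples
    h = np.sinc(u) * np.sinc(u / taps)           # Lanczos-windowed sinc
    return float(np.dot(x[k], h))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = (N // 2 + rng.uniform(0.0, 1.0, 100)) / fs  # off-grid instants
    err = max(abs(reconstruct(t) - np.sin(2 * np.pi * f0 * t)) for t in ts)
    print(f"worst-case reconstruction error: {err:.2e}")
```

With only 64 taps per side the reconstruction error is already well below the noise and distortion of any analogue playback chain, and lengthening the filter pushes it further down, which is the point about the impracticality of an ideal sinc filter being irrelevant in practice.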
 

JohnPM

Senior Member
Technical Expert
Joined
Apr 9, 2018
Messages
340
Likes
901
Location
UK
The signals we are recording do not, as the paper implies, extend to infinite frequencies.
I didn't see any such claim in the paper. Are you referring to the line about a continuous signal being equivalent to infinite sampling rate? That is simply a statement of mathematical fact. The paper has this to say about sample rates:

For audio a sampling rate that is high enough to capture the naturally band limited spectrum of approximately 90 kHz, as discussed in [24], should be sufficient.

The paper is simply a tutorial. It isn't calling on folk to pick up their pitchforks and throw devices using Shannon sampling theory on a fire :).
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
15,891
Likes
35,912
Location
The Neitherlands
The paper is simply a tutorial. It isn't calling on folk to pick up their pitchforks and throw devices using Shannon sampling theory on a fire :).

That's what two or three of the members (not you, nor most others) are doing here, though.
It is, however, quite common for audiophiles to question anything related to digital, as nothing is good enough for them, while vinyl, tape, and tubes + transformers somehow are.
These folks will probably even doubt that 192/24 is enough. :rolleyes:

Nothing wrong with enjoying vinyl, tape and tubes though.
 

dc655321

Major Contributor
Joined
Mar 4, 2018
Messages
1,597
Likes
2,235
Not sure if it has already come up, but there was an interesting paper on "Modern Sampling" in May's AES journal.

Thanks for the link.

Are these journals typically this thematic?

One thing I found conspicuously absent from the paper was a Section 3.3, "How are problems with Nyquist-Shannon sampling typically addressed (with real-world examples)?". For an endorsement tutorial, the absence of any quantification of the nature of the "problems" seems suspect (to me)...
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
I didn't see any such claim in the paper. Are you referring to the line about a continuous signal being equivalent to infinite sampling rate?
I said it was implied. For example, there is this statement: «In fact, unless the stop band achieves infinite attenuation at infinite frequency, there will always be some alias components present even if the sampling frequency is increased to infinity.» The attenuation at infinite frequency is only relevant if the input signal has any actual content there. By stating this as a problem, it is implied that such content exists. In actual fact, it does not.

The paper has this to say about sample rates:

For audio a sampling rate that is high enough to capture the naturally band limited spectrum of approximately 90 kHz, as discussed in [24], should be sufficient.
That sentence is puzzling in that it pretty much contradicts the entire premise of the remainder of the paper, i.e. that there is a problem with sampling as currently implemented.

And then there's this: «Fundamentally, sampling methods based on brick-wall low-pass filtering, as proposed by Shannon and others [1–4], can never be perfect!» While true, this is also a red herring. No digital system can ever be perfect due to the quantisation error. As such, we are forced to choose various parameters such that the resulting performance is good enough, even if it falls short of theoretical perfection. The paper fails to make a coherent, let alone compelling, argument for the notion that current digital audio systems are not good enough. The proposed alternative is thus nothing but a solution in search of a problem.
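For scale, a back-of-envelope check of that quantisation floor (my own sketch, not from the thread): the textbook SNR ≈ 6.02·N + 1.76 dB figure for a full-scale sine, compared against a direct simulation that simply rounds a sine to N-bit steps. The 997 Hz test frequency is an arbitrary choice that avoids lining up with the sample rate:

```python
import numpy as np

# Back-of-envelope sketch: quantisation noise floor of a full-scale
# sine at common bit depths.  Theory is the textbook 6.02*N + 1.76 dB
# formula; the simulation rounds a sine to N-bit steps and measures
# the resulting error power.
def snr_theory_db(bits):
    return 6.02 * bits + 1.76

def snr_simulated_db(bits, fs=48000, f0=997.0, seconds=1.0):
    t = np.arange(int(fs * seconds)) / fs
    x = np.sin(2 * np.pi * f0 * t)
    step = 2.0 ** (1 - bits)              # quantiser step for [-1, 1)
    err = np.round(x / step) * step - x   # quantisation error signal
    return 10 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))

if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: theory {snr_theory_db(bits):6.1f} dB, "
              f"simulated {snr_simulated_db(bits):6.1f} dB")
```

At 16 bits the floor already sits near -98 dBFS, below the analogue noise of typical playback chains, which is the sense in which "good enough" arrives long before theoretical perfection.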
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,383
Likes
24,749
Location
Alfred, NY
The proposed alternative is thus nothing but a solution in search of a problem.

Not singling this paper out, but I'm reminded of something I heard when I was still in academia, working on some issues related to organic semiconductors. We were being interviewed by a prominent science writer who (logically) asked, "So what actual use is polyacetylene?" The answer was, "Creating PhDs."
 

JohnPM

Senior Member
Technical Expert
Joined
Apr 9, 2018
Messages
340
Likes
901
Location
UK
Are these journals typically this thematic?
Per the guest editor's note, it was a "Special Issue on High-Resolution Audio".

One thing I found conspicuously absent from the paper was a Section 3.3 "How are problems with Nyquist-Shannon sampling typically addressed (with real-world examples)?". For a endorsement tutorial, the absence of quantifying the nature of the "problems" seems suspect (to me)...
Conspiracy theorists of the world unite, someone out there is out to get you. :D
 

somebodyelse

Major Contributor
Joined
Dec 5, 2018
Messages
3,682
Likes
2,959
Per the guest editor's note, it was a "Special Issue on High-Resolution Audio".
That part's not open access. Is there anything else in there that would give non-members some context for this special issue?
 

dc655321

Major Contributor
Joined
Mar 4, 2018
Messages
1,597
Likes
2,235
Conspiracy theorists of the world unite, someone out there is out to get you. :D

Not at all. Or, at least if there is some conspiring, it's pretty clumsy :)

I am curious who the target audiences are for both the B. Stuart "Hierarchy" paper and the paper listed above.
It cannot seriously be other scientists/engineers, can it?
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
I only did one and four. Well, four is longer than one. Four sounds a little louder than one.

Why? Because somewhere in the 150-millisecond range is the time-integrating duration for the ear. At somewhat longer durations the perceived loudness becomes stable. I was aware of this effect but had to look up the particulars. Here is the first place I found the info; it would be in most college-level texts on psychoacoustics, like B.C.J. Moore's text, which I have a copy of though it wasn't handy.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.3611&rep=rep1&type=pdf

It is assumed that the auditory system contains a temporal energy integrator, i.e. it performs a summation of the input signal. A simple way of estimating the relationship between thresholds and durations is to plot threshold against duration on a dB vs. logarithmic-time scale. Data will fall roughly on a straight line with a slope of -3 dB per doubling of duration. Letting J represent the integration time of the auditory system, several scientists have estimated its magnitude, the estimated values lying in the region of 50-200 ms. Some researchers report that J is greater at low frequencies, while others have found no frequency dependency.

Another way of determining the integration ability of the auditory system is to present equal-energy tone bursts of different durations. An ideal energy integrator would imply that the detectability of these tone bursts is independent of duration. According to Green, this is only the case in the region 15-150 ms, outside of which the detectability will fall off. The fall-off at long durations indicates that the integration operation is time-delimited, while the fall-off at very short durations might be a result of the spread of energy over the frequency range that occurs for short-duration pulses. Other scientists have found similar results, but again there is some variation in the results, some reporting frequency dependency (low frequency, long duration and vice versa) and others not.

According to [4], and essentially also [1], the integration time is frequency dependent, about 60 ms up to 1000 Hz, decreasing linearly to around 10 ms at 5 kHz. This is fairly consistent with an effective time constant (or integration time) for speech of around 20 ms. According to Niese, speech intelligibility can be predicted using an integration of so-called useful energy in the range up to 17 ms (full weight) and a linearly decreasing weight factor in the range 17-30 ms.


You could also look at this PowerPoint on the matter.
http://depts.washington.edu/sphsc461/temp_res/temp_res.pdf

Perfect!

So, my point was that expectations based on the theory of Linear Time-Invariant (LTI) systems, which are traditionally analyzed with the help of the Fourier transform, break down for the Mammal Hearing System (MHS), which is neither linear nor time-invariant.

In LTI, we care about durations, frequencies, sampling rates, and amplitudes in the time and frequency domains. In MHS, we also have to care about onset times, recuperation periods, levels of perceived loudness, inter-frequency masking, etc. "Four sounds a little louder than one" is not what LTI predicts, yet it makes perfect sense in the MHS framework.

The experiment illustrates at least two things:

(1) In MHS, perceived loudness depends not only on amplitude but also on duration. This is a robust effect, linked to the hearing system's "slow" integrator, operating over tens of milliseconds. There is also a less robust effect, not demonstrated by this experiment, due to the "fast" integrator, operating over tens of microseconds, which makes the perceived onset time depend on amplitude.

(2) Some of you will be able to differentiate between one and two, some not. Or between two and three. Virtually everyone will be able to differentiate between one and four. And this is for the "slow" integrator, which is considered rather consistent! Individual differences in the functioning of the fast integrator are more difficult to elicit experimentally, yet they do exist.

Qualitatively, the number of dimensions LTI operates in is smaller than the number of MHS dimensions. If we hold the value(s) in one or more MHS dimensions constant, we take those dimensions out of play, and MHS behavior then follows the LTI-predicted behavior more closely.

That's the general reason why "simple" music, mostly consisting of a small number of sinusoids slowly changing their amplitudes and frequencies over time, is more readily amenable to LTI analysis. The effects of the perceptual integrators fade away. Onset times matter less.

"Complex" music, with a large number of sinusoids exhibiting fast and frequent onsets and fade-outs, chirps, and transients, is not as amenable to LTI analysis. The integrators play an important role in this case. We had better preserve the information about onset times more accurately.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,383
Likes
24,749
Location
Alfred, NY
Perfect!

So, my point was, that expectations based on the theory of Linear Time Invariant (LTI) systems, which are traditionally analyzed with the help of Fourier transform, are breaking down for Mammal Hearing System (MHS), which is neither linear nor time invariant...
... We better preserve the information about the onset times more accurately.

Start with the irrelevant, end up with the repeated, ummm, misunderstanding. The perfect circle.
 