
Is high-resolution audio audible, or inaudible and a waste of data?

But to highlight phenomena, it is always necessary to start from simple signals.
“Simple” is not the problem here. It’s just not a realistic scenario. No music will ever contain such a transient, and even if it did, no such transient would ever reach your ear. First off, only a fraction of tweeters go beyond 20 kHz, and even fewer go beyond 40 kHz.

And yet, we can generate microsecond ITD samples with only 44.1 kHz sampled audio. Amazing, isn’t it?

As pointed out by many, the time resolution of bandwidth-limited, PCM-sampled audio is not contingent on sample rate.
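Here's a quick numpy sketch of that point (my own illustration, not anything from the papers under discussion): encode a 2 µs interchannel delay in ordinary 44.1 kHz PCM, then read it straight back out of the samples.

```python
import numpy as np

fs = 44_100                                   # plain Redbook rate
n = 4096
t = np.arange(n) / fs

# Band-limited test burst: a 5 kHz tone under a Gaussian envelope,
# everything comfortably inside the 0-20 kHz audio band.
x = np.exp(-((t - t[n//2])**2) / (2*(0.5e-3)**2)) * np.sin(2*np.pi*5000*t)

# Delay one "channel" by 2 us (about a tenth of the 22.7 us sample
# period) with an exact frequency-domain phase ramp.
tau = 2e-6
f = np.fft.rfftfreq(n, 1/fs)
y = np.fft.irfft(np.fft.rfft(x) * np.exp(-2j*np.pi*f*tau), n)

# Recover the interchannel delay from the cross-spectrum phase at the
# strongest bin -- no oversampling, no tricks.
X, Y = np.fft.rfft(x), np.fft.rfft(y)
k = np.argmax(np.abs(X))
tau_est = -np.angle(Y[k] / X[k]) / (2*np.pi*f[k])
print(f"recovered delay: {tau_est*1e6:.3f} us")   # 2.000 us
```

The delay comes back as 2 µs, far below the sample period, because the timing lives in the phase of the band-limited content, not in the sample grid.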

Nor can you just infer conclusions about audibility from those vague ideas of yours. Where is the evidence that any of these so-called “phenomena” are audible, let alone influence things like soundstage or fatigue, as you claimed in your opening post?
 


What impact can the "spreading" effect have on perception? Potentially, it can make it harder for our auditory system to distinguish sounds, degrading soundstage perception and contributing to a sense of fatigue. This effect is probably more or less pronounced depending on the musical content, context, recording quality, and reproduction-system quality, but physically the effect on the signals is there… What do you think?

Such "spreading" is a direct consequence of bandwidth limiting. This does not mean you cannot detect 10 microseconds, or 5 microseconds, in the right binaural stimuli, with unimpaired hearing.

And, yes, you ***can*** send that signal via 16/44. The old 1/FS whopper remains a whopper, and the misapplication of the time/frequency tradeoff of a Gaussian likewise does not say you can't, because the QUESTION is signal detection, not JUST the time-frequency tradeoff. Give it up. If there's a problem, that's not it.

(remember the widest filter bandwidth in your ear is under 5 kHz (better considered as 2.4 kHz or so, but one can argue limits), and that restricts the "edges" even more, and yet you STILL can hear effects, under some carefully designed circumstances, down to 5-10 microseconds)

Go here, please, and watch:


Thank you. While you're at it, consider the comments given that are carefully presented as pure speculation.
 
I knew it was one of them! Never bet against MIT! ;)

MIT was more fun my freshman year at Yale:

 
Wait, let me get this straight. You're claiming your ears (not mine, for sure) can accurately assess 5 µs steps?

In interaural terms, yes, it can be discerned under perfect conditions, but that does not say anything much about the ability of PCM to provide the necessary information for that. People relentlessly forget the anti-imaging filter. There are no steps in a proper DAC output. Nope. None.
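For anyone who wants to see the no-steps claim in numbers, here's a small sketch of Whittaker-Shannon reconstruction, the idealized version of what a DAC's anti-imaging filter does (an illustration, not a model of any particular converter): the samples define a smooth band-limited waveform whose value exists at any instant, on-grid or not.

```python
import numpy as np

fs = 44_100
n = 2048
k = np.arange(n)
x = np.sin(2*np.pi*997*k/fs)              # a 997 Hz sine, sampled at 44.1 kHz

def reconstruct(samples, fs, t):
    """Whittaker-Shannon interpolation: the unique band-limited signal
    through the samples, evaluated at ANY time t (truncated sum, so
    accuracy is best away from the record edges)."""
    idx = np.arange(len(samples))
    return np.sum(samples * np.sinc(fs*t - idx))

t_between = 1000.37 / fs                  # an instant between sample points
smooth = reconstruct(x, fs, t_between)
ideal = np.sin(2*np.pi*997*t_between)
print(abs(smooth - ideal))                # tiny: no staircase anywhere
```

Between the sample instants the reconstruction tracks the original continuous sine; the "steps" only exist in zero-order-hold pictures, not at a proper DAC output.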
 
I went down this path myself once. I went hunting for some plausible justification for high-res. I found the same papers, I think.

The basic error is confusing ILD/ITD detection with transient length. Two different psychoacoustic things. It took me a bit to understand that.

Because timing is also encoded by amplitude, we get much better than 5 µs timing resolution.

Because the ear is physically unable to pass frequencies above 20 kHz on to the brain, detecting a difference between a 44.1 kHz and a 96 kHz transient is very implausible, especially considering the lack of experimental data to support it.

Don't know why this thread made it to 6 pages.
 
You're pushing at an open door with me... :)
In my studies I have always tried to run tests with "music-like" signals (you can find an example in this link). But to highlight phenomena, it is always necessary to start from simple signals.

Any such simple signals must be bandwidth limited.
 
Well, you are mixing up ITD/ILD with transients. I forget the details, but a single impulse has variable thresholds depending on how loud the transient is; from memory, the time is in the dozens-of-milliseconds range. Some very short transients may be audible if they are loud enough, though I don't know whether that extends down to 5 microseconds.

I get the idea that the spreading in time done by our ear, which is essentially a filter to such things, is why the perception of such things depends so heavily on loudness. Such transients effectively have energy across all frequencies, or very wide-band energy, so you are only perceiving the portion of that wide-band energy below 20 kHz.

I don't know of information that says transient impulses or signal edges are perceptible at microsecond-range levels, so that is exactly the information everyone here would like you to post.
I have already answered on the relationship between aspects of ITD/ILD and transient steepness: the latter changes the flow of information that reaches our brain for processing spatial information.
Your reasoning is correct if we analyze the signal in its entirety. But our ear doesn't know the future time course of the signal: it relies on what it has detected up to that moment to decode it. Now, while for the analysis of timbre (an unscientific term, I know) harmonic analysis reveals a lot of what we can perceive, for the analysis of spatial information what matters are the instants at which the variations occur, which classic Fourier analysis does not highlight. The more sophisticated models use wavelets (it is no coincidence that the filters used to simulate the cochlea's behavior have a shape that closely resembles them), but here things get quite complicated.
 
for the analysis of spatial information what matters are the instants at which the variations occur, which classic Fourier analysis does not highlight.
Of course it does. We have frequency and phase components. If it didn’t, a proper FFT, iFFT would not result in the exact same waveform.
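A two-line numpy check of that round trip, for the record: the frequency-domain representation keeps both magnitude and phase, so the inverse transform gives back the waveform, timing included, to rounding error.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(1024)        # an arbitrary "waveform"

X = np.fft.rfft(x)                   # frequency components: magnitude AND phase
x_back = np.fft.irfft(X, len(x))     # inverse transform

err = np.max(np.abs(x - x_back))
print(err)                           # on the order of 1e-15: rounding error only
```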
 
Of course it does. We have frequency and phase components. If it didn’t, a proper FFT, iFFT would not result in the exact same waveform.
I'm not sure if he's serious now.
 
the relationship between aspects related to ITD/ILD and transient steepness: it changes the flow of information that reaches our brain to process spatial information.
Hmm, can you elaborate or cite something on this point? I don't totally understand. How would a steeper transient arriving at the same time carry spatial information that the lower-bandwidth transient doesn't, if we can't actually perceive the extra frequency content?

The filtering of the ear before the signal reaches the brain would look something like the digital filtering in your first post, right?

Also: consider that if timing information really were limited to 5 microsecond resolution in a 44.1 kHz file, the pitch of sounds would be quantized and aliased terribly; you could only represent pitches whose periods were multiples of 5 microseconds. This is, as we know, not something that happens in digital audio.
 
I have already answered on the relationship between aspects of ITD/ILD and transient steepness: it changes the flow of information that reaches our brain for processing spatial information.
Your reasoning is correct if we analyze the signal in its entirety. But our ear doesn't know the future time course of the signal: it relies on what it has detected up to that moment to decode it. Now, while for the analysis of timbre (an unscientific term, I know) harmonic analysis reveals a lot of what we can perceive, for the analysis of spatial information what matters are the instants at which the variations occur, which classic Fourier analysis does not highlight. The more sophisticated models use wavelets (it is no coincidence that the filters used to simulate the cochlea's behavior have a shape that closely resembles them), but here things get quite complicated.

Wavelets are irrelevant here, unless they are wavelets designed to mimic the ERB structure of the ear, including the eardrum and middle ear. Since wavelet transforms aren't lossy, that's not going to happen. (Making a multiresolution system does not necessarily imply wavelets, please; there are as many ways as there are needs, if not more.)

Short-term signal analysis does not require wavelets; it can just as easily be done using a Fourier basis. One could also use a cosine-transform basis, a sine-transform basis, or many others; however, the complex exponential is very handy in relating analysis to the behavior of the ear. The statement about 'a signal in its entirety' simply ignores what has long since been standard practice. One can window even the infinite Fourier transform, or use a window known to match the ear's overall window; better yet, one can simply use a good impulse-response model of a given cochlear filter, or even a complex model, to capture the energy as a function of time in a given bandwidth. For that, please see "gammatone" filters, as well as the more complex material available in the periphery of the literature (which is more useful for analyzing masking thresholds, for instance).

The widest-bandwidth ERB is under 5 kHz; I'd argue for 2.4 kHz. That bandwidth (note, this is not 'maximum frequency' or any such nonsense, but bandwidth) determines the width of the impulse response, REGARDLESS of minimum phase, constant delay, or whatever kind of impulse response you care to mention. Fourier analysis exists not only over an 'infinite time' range but also as the time-limited Fourier transform and as the DFT, all of which are informative. What's more, by using the Fourier transform one can even develop the signal envelope as a function of time (in a given frequency range or not, take your choice, and with a particular filter shape or not, take your choice; the duality theorem still works fine) by simply forming the analytic signal (Hilbert signal) and calculating the signal envelope from that. David Hilbert was a very smart man.
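Since the analytic-signal trick may be unfamiliar, here is a minimal numpy version of it (a bare illustration, not a cochlear model): zero the negative frequencies, double the positive ones, and the magnitude of the result is the envelope as a function of time.

```python
import numpy as np

def analytic_signal(x):
    """Analytic (Hilbert) signal via the FFT: zero the negative
    frequencies, double the positive ones, keep DC/Nyquist as-is."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1)//2] = 2
    if n % 2 == 0:
        h[n//2] = 1
    return np.fft.ifft(X * h)

fs = 48_000
t = np.arange(4096) / fs
env = np.exp(-((t - 0.04)**2) / (2 * 0.005**2))    # Gaussian envelope
x = env * np.cos(2*np.pi*1000*t)                   # 1 kHz carrier underneath

recovered = np.abs(analytic_signal(x))
print(np.max(np.abs(recovered - env)))             # envelope recovered, tiny error
```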

No, wavelets are not very similar to cochlea filters, either. It would be convenient if that were the case, but it's not, and they don't. A proper wavelet requires a particular set of conditions, and a wavelet transform is not a lossy transform. The EAR is extravagantly lossy in very unusual ways.

As to "changes the flow of information", no, you haven't shown that, or even proposed a proper method. Furthermore, the nonlinearity involved is that of firing the inner hair cell. Yes, a bad filter can cause what's effectively a pre-echo and affect the partial loudness stream, no doubt, as has been demonstrated many times to matter down to sub-one-MILLIsecond range. The first firing of an inner hair cell has almost exactly (surprise!) the time detection accuracy one would expect from elementary mathematics, too (gets you down into the 5 microsecond range using reasonable noise considerations for the inner hair cell). Using a very generous bandwidth for the widest cochlear filter and the highest reasonable SNR one gets to about 1 microsecond. None of this is either unknown or particularly telling.
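For the curious, that "elementary mathematics" can be sketched with a Cramér-Rao-style rule of thumb for timing jitter, sigma_t ≈ 1/(2πB√SNR). The bandwidth and SNR figures below are my own illustrative assumptions chosen to land near the 5 µs and 1 µs figures quoted above, not measured cochlear data:

```python
import numpy as np

def delay_std_us(bandwidth_hz, snr_db):
    """Rule-of-thumb bound on timing jitter for a detector seeing
    bandwidth B at a given SNR: sigma_t ~ 1/(2*pi*B*sqrt(SNR)).
    A sketch, not a model of inner-hair-cell firing."""
    snr = 10**(snr_db / 10)
    return 1e6 / (2*np.pi*bandwidth_hz*np.sqrt(snr))

print(delay_std_us(2400, 22))   # ~5 us: modest cochlear-filter assumptions
print(delay_std_us(5000, 30))   # ~1 us: generous bandwidth, high SNR
```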

The ***ONLY*** issue that remains that might be a plausible mechanism is interaction of a steep filter with the cochlear filter, causing pre-firing of the inner hair cell. This is why I would prefer higher than 44.1 sampling rate BUT THERE IS NO EVIDENCE THIS IS ACTUALLY A PROBLEM, it remains, as it has been for 30 years now, a hypothetical mechanism. Remember that it's not the highest frequency a low pass filter allows that controls its impulse response length, but rather the steepness of its transition band and the required rejection in the stop band.
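A concrete way to see that last point: the standard Kaiser-window FIR length estimate depends only on the stopband attenuation and the transition width; the cutoff frequency itself does not appear in it. A quick sketch (the 96 dB and transition-width numbers are arbitrary choices for illustration):

```python
import numpy as np

def kaiser_taps(atten_db, transition_hz, fs):
    """Standard Kaiser-window FIR length estimate:
    N ~ (A - 8) / (2.285 * delta_omega).  Length grows with stopband
    attenuation and with 1/transition-width; cutoff doesn't enter."""
    dw = 2*np.pi*transition_hz/fs
    return int(np.ceil((atten_db - 8) / (2.285*dw)))

fs = 44_100
print(kaiser_taps(96, 2000, fs))   # gentle 2 kHz transition: ~136 taps
print(kaiser_taps(96, 100, fs))    # steep 100 Hz transition: ~2704 taps
```

Twenty times steeper transition, twenty times longer impulse response, at the very same cutoff.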

PCM easily reaches the ITD limits of the ear. ILD is not an issue in this discussion, and distortions, likewise. The only question is signal detection.
 
Now, any phase shift between the channels of the ITD value, let's say about 5 µs, which corresponds to 1 sample at 192 kHz sampling, can be "detected" by our brain for both signals in the figure in my first post. But while in the top one, at 192 kHz, all the information needed to recognize the transition is available within 5 µs, in the bottom one at 48 kHz the same information takes 20 µs to appear. Thus, the localization mechanism works on different information; it "strains" more to recognize the same event.

Why is sampling rate EVEN MENTIONED in this example? It's a total straw man. This appears to be spreading, yet again, the whopper that the time resolution of a PCM system is limited to its sampling period. Please, enough of that. It's just not a valid comparison.

I could just as well have written that sentence as ... let's say about 5 microseconds, which corresponds to more than 1000 times the available time resolution of standard REDBOOK CD, and which therefore shows conclusively that there is no problem with time resolution in that regard...

But I didn't. It's still true that 5 microseconds is the period of a 200 kHz sine wave. It's completely, and without any dispute whatsoever, irrelevant.
 
I love these discussions about tiny measurements! "OOH! It's a 0.001 dB difference!!" Ha ha ha ha ha! "OOOH! It's a harmonic at 200 kHz!!!" Haaaaaaa! I hate to break it to you chaps, but honestly, if you're a male over 40 you're going to struggle to perceive a 10 kHz tone emanating from a set of speakers. All of this to-ing and fro-ing over minute measured "problems" in audio reproduction is just academic, nothing more. I know that we don't usually get carried away with stuff like this on the forum, but it's something that appeals to us as males, y'know, "problem solvers". My philosophy is to enjoy the music whatever it's coming out of: a HiFi rack, a Bluetooth speaker, a pair of headphones, an actual live band....
 
I love these discussions about tiny measurements! "OOH! It's a 0.001 dB difference!!" Ha ha ha ha ha! "OOOH! It's a harmonic at 200 kHz!!!" Haaaaaaa! I hate to break it to you chaps, but honestly, if you're a male over 40 you're going to struggle to perceive a 10 kHz tone emanating from a set of speakers. All of this to-ing and fro-ing over minute measured "problems" in audio reproduction is just academic, nothing more. I know that we don't usually get carried away with stuff like this on the forum, but it's something that appeals to us as males, y'know, "problem solvers". My philosophy is to enjoy the music whatever it's coming out of: a HiFi rack, a Bluetooth speaker, a pair of headphones, an actual live band....
Well, you'd be dead wrong on that estimate of "over 40", but yes, to some extent this is indeed an argument about angels and pinheads.
 
all the information needed to recognize the transition is available within 5 µs; in the bottom one at 48 kHz the same information takes 20 µs to appear. Thus, the localization mechanism works on different information; it "strains" more to recognize the same event.

No, this is the sample-rate-equals-phase-resolution mistake again. After you filter out everything above ~20 kHz (as the ear does), the two signals will look the same.

And as a practical matter: how is your ear going to "strain" to hear 75 kHz content that isn't in the recording to begin with and couldn't be heard even if it were?
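One can simulate exactly this with a few lines of numpy (an idealized brickwall "ear" filter standing in for the real cochlear roll-off): take a wideband click at 192 kHz and a 20-kHz-band-limited copy of it, pass both through the same 18 kHz lowpass, and compare.

```python
import numpy as np

fs = 192_000
n = 8192
x = np.zeros(n)
x[n//2] = 1.0                             # one-sample wideband click at 192 kHz

def brickwall(sig, fc, fs):
    """Ideal lowpass via the FFT (fine for this illustration)."""
    S = np.fft.rfft(sig)
    f = np.fft.rfftfreq(len(sig), 1/fs)
    S[f > fc] = 0
    return np.fft.irfft(S, len(sig))

y = brickwall(x, 20_000, fs)              # the "44.1k-style" band-limited click

# Model the ear's own band limit with an 18 kHz lowpass: after it, the
# wideband click and the band-limited click are indistinguishable.
ex, ey = brickwall(x, 18_000, fs), brickwall(y, 18_000, fs)
print(np.max(np.abs(ex - ey)))            # zero, up to rounding error
```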
 
But I didn't. It's still true that 5 microseconds is the period of a 200kHz sine wave. It's completely and without any dispute whatsoever irrelevant.
It is irrelevant because we are talking about interaural time difference, not about single-channel bandwidth.
 
Honestly if you're worried about that kind of resolution, your speakers are probably the weak link by orders of magnitude.
And yet, in spite of their failings, they don't mask many issues with electronic equipment or lossy vs lossless formats.
 
It is irrelevant because we are talking about interaural time difference, not about single-channel bandwidth.
No, now that is NOT quite an accurate statement. But it doesn't matter because interchannel or intrachannel, the time resolution is there in the Redbook standard. Co-articulation is a thing. It helps us understand speech in noisy environments.
 