Is high-resolution audio audible or not audible and a waste of data?

Pinox67 (Member, Editor; joined Apr 8, 2020; 85 messages; 149 likes; Italy)
I have found several sites, articles and tests on the web claiming that the CD-quality distribution format, 16 bit/44.1 kHz, is more than sufficient to preserve practically all the characteristics of the sound compared with higher-resolution formats. 24 bits per sample and sample rates of 192 kHz or more are only needed in the sound processing stages; once the final result has been produced, it can be converted to CD quality, with appropriate precautions, without apparent loss of quality and with a considerable saving of space. In more detail, the operations performed are:
  • Downsampling. Since our auditory system is unable to detect frequencies above 20 kHz, and in any case these frequencies are rarely generated by musical instruments, the sampling frequency can be lowered to 44.1 kHz without major problems. To avoid aliasing, i.e. frequencies above 22.05 kHz folding back into the audio band, brick-wall filters are applied here.
  • Re-quantization. With 16 bits per sample, about 96 dB of dynamic range can theoretically be represented. That is a lot, but to preserve a good part of the dynamic range available with 24 bits, dithering and noise-shaping algorithms are used, which turn the re-quantization distortion into noise and move it towards frequencies where the ear is less sensitive. The dynamic range thus obtained exceeds 110 dB in the frequency bands to which we are most sensitive, which is more than sufficient to represent the dynamics of musical content.
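As a rough check on the figures in the list above, here is a minimal Python sketch. It assumes the usual definition of theoretical dynamic range as the ratio between full scale and one quantization step, 20·log10(2^bits), and that the highest representable frequency is half the sampling rate. The function names are illustrative and do not come from any library.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sample rate."""
    return sample_rate_hz / 2


for bits in (16, 24):
    print(f"{bits} bit: {dynamic_range_db(bits):.1f} dB")

for rate in (44_100, 192_000):
    print(f"{rate} Hz sampling: content up to {nyquist_hz(rate):.0f} Hz")
```

This covers only the undithered, unshaped case; the 110 dB figure quoted for noise-shaped 16 bit depends on the shaping filter used and is not reproduced here.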
These mechanisms and their justifications are valid as long as we consider only the aspect of perception concerned with the harmonic analysis that our ear performs to determine the timbre of a sound (not exactly a scientific term, but it is clear what it refers to in practice). However, downsampling does not take into consideration another mechanism of our exceptional auditory system: the localization of sounds, i.e. the ability to identify sound sources in space, judge their size and follow them over time if they are in motion. This function works in parallel with timbre perception; indeed, it acts before it, based on the analysis of the arrival time of the signals at our ears and/or their relative level (the harmonic content is detected only after approximately the first 2 cycles). Several studies have confirmed that the ability of our ear to distinguish transient events over time is very high: the reported values range between 6 μs and 10 μs, which corresponds to a spatial resolution of about 1 degree. A simple calculation shows that with a sampling frequency of 44.1 kHz the sample period is 1/44100 s ≈ 22.7 μs, so we will be able to correctly represent transients no shorter than about 22 μs; if they are shorter, they will be "spread" over time. This aspect can be seen in the following figure.
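The numbers in the paragraph above can be reproduced with a short Python sketch. It rests on assumptions that are not in the post: a speed of sound of 343 m/s, an effective ear spacing of 0.20 m, and the simplest path-difference model for a distant source (interaural delay = spacing / speed × sin(angle)), which ignores diffraction around the head. All names are illustrative.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C
EAR_SPACING_M = 0.20        # assumption: effective distance between the ears


def sample_period_us(sample_rate_hz: float) -> float:
    """Time between two consecutive samples, in microseconds."""
    return 1e6 / sample_rate_hz


def itd_us(angle_deg: float) -> float:
    """Interaural time difference for a distant source, simple sine model."""
    path_difference_m = EAR_SPACING_M * math.sin(math.radians(angle_deg))
    return 1e6 * path_difference_m / SPEED_OF_SOUND_M_S


def angle_for_itd_deg(delay_us: float) -> float:
    """Source angle (from straight ahead) that produces a given delay."""
    ratio = delay_us * 1e-6 * SPEED_OF_SOUND_M_S / EAR_SPACING_M
    return math.degrees(math.asin(ratio))


print(f"Sample period at 44.1 kHz: {sample_period_us(44_100):.1f} us")
print(f"Sample period at 192 kHz:  {sample_period_us(192_000):.1f} us")
print(f"Delay for a 1 degree offset: {itd_us(1.0):.1f} us")
for delay in (6.0, 10.0):
    print(f"{delay:.0f} us corresponds to {angle_for_itd_deg(delay):.2f} degrees")
```

Under these assumptions a 6 to 10 μs delay does map to roughly 0.6 to 1 degree. Whether the sample period limits the delays a sampled signal can carry is a separate question that this sketch does not address.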
[Figure: Screenshot 2023-01-29 at 19.54.18.png]

Above is a step signal sampled at 192 kHz; the transient here lasts about 5 μs. Below is the same signal, first downsampled to 48 kHz (with an anti-aliasing filter) and then upsampled again to 192 kHz with a sinc-type (linear-phase) filter. The reduced slope of the transient, which now lasts more than 20 μs, is evident. Of course, one could argue that the frequencies of the step signal removed by the anti-aliasing filter are well above 20 kHz, and therefore in practice the first curve would be perceived as the second. This is true if we limit ourselves to the classic "steady-state" spectral analysis, where our ear does reach this limit in the perception of spectral components; but according to the studies mentioned, the analysis of transients takes place in a different part of our auditory system, with different mechanisms and higher resolution, and could allow us to distinguish between the two curves.
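The effect described for the figure can be approximated with a short, self-contained Python sketch. This is not the processing used to produce the figure. As a simplifying assumption, instead of decimating to 48 kHz and interpolating back, it applies a single linear-phase low-pass at 24 kHz directly at the 192 kHz rate, which is what a near-ideal 48 kHz round trip amounts to. The filter length, the Hann window and all names are choices made for this sketch.

```python
import math

FS_HZ = 192_000     # sample rate of the original step, as in the figure
CUTOFF_HZ = 24_000  # Nyquist frequency of the 48 kHz intermediate rate
NUM_TAPS = 255      # assumption: filter length chosen for this sketch


def lowpass_taps(cutoff_hz, fs_hz, num_taps):
    """Hann-windowed sinc low-pass (linear phase) with unity DC gain."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def step_response(taps):
    """The response to a unit step is the running sum of the taps."""
    out, acc = [], 0.0
    for t in taps:
        acc += t
        out.append(acc)
    return out


def crossing_time_us(signal, level, fs_hz):
    """First time, linearly interpolated, at which the signal reaches level."""
    for i in range(1, len(signal)):
        if signal[i - 1] < level <= signal[i]:
            frac = (level - signal[i - 1]) / (signal[i] - signal[i - 1])
            return 1e6 * (i - 1 + frac) / fs_hz
    raise ValueError("level never reached")


def rise_time_us(signal, fs_hz):
    """10 % to 90 % rise time in microseconds."""
    return crossing_time_us(signal, 0.9, fs_hz) - crossing_time_us(signal, 0.1, fs_hz)


band_limited = step_response(lowpass_taps(CUTOFF_HZ, FS_HZ, NUM_TAPS))
print(f"One sample at 192 kHz: {1e6 / FS_HZ:.1f} us")
print(f"10-90% rise time after band-limiting: {rise_time_us(band_limited, FS_HZ):.1f} us")
```

The sketch only shows that removing content above 24 kHz lengthens the edge of a step; it does not say anything about whether that change is audible.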

What impact can the "spreading" effect have on perception? Potentially, it could make it harder for our auditory system to distinguish sounds, penalizing the perception of the soundstage and causing a sense of fatigue. How noticeable this is probably depends on the musical content, context, recording quality and reproduction system quality, but physically the effect on the signals is there… What do you think?
 
There are entire threads here about this. I think the conclusion is that you're wasting your time with anything more than 16 bits for playback. For signal manipulation in a studio, many more bits of resolution are common and accepted as superior.
 
What do you think?
Literally, oh boy, not this again.

The 5 to 10 µs is for interaural time delay and 16/44 is more than enough, with a huge spare, to handle this.
 
Honestly if you're worried about that kind of resolution, your speakers are probably the weak link by orders of magnitude.
 
Wait, let me get this straight. You're claiming your ears (not mine, for sure) can accurately assess 5 µs steps?
 
Oh brother. Not again. And especially after this week’s DAC, measurement and break-in audibility nonsense and fairytales.

Edit. And I forgot the all time favorite in yet another season of “the difference in power cables” featuring GR.
 
@j_j has tirelessly tried to refute the totally incorrect and debunked claim that the "time resolution" of Redbook CD is equal to 1/44100 s (= 1/fs), but it keeps coming back. Please see the post below and the ones following it.
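A minimal Python sketch of why the sample period is not the time resolution: a 1 kHz tone is delayed by 5 µs (far less than the 22.7 µs sample period), sampled at 44.1 kHz, rounded to 16 bits, and the delay is then recovered from the samples alone. The tone frequency, the record length and the phase-based estimator are assumptions made for this illustration and are not taken from the linked posts.

```python
import math

FS_HZ = 44_100
TONE_HZ = 1_000    # assumption: any tone below fs/2 would do
DELAY_S = 5e-6     # much shorter than the 22.7 us sample period
NUM_SAMPLES = 441  # exactly 10 cycles of 1 kHz at 44.1 kHz


def sample_tone_16bit(delay_s):
    """Sample a delayed sine at 44.1 kHz and round it to 16-bit steps."""
    samples = []
    for n in range(NUM_SAMPLES):
        value = math.sin(2 * math.pi * TONE_HZ * (n / FS_HZ - delay_s))
        samples.append(round(value * 32767) / 32767)
    return samples


def estimate_delay_s(samples):
    """Recover the delay from the phase of the tone (single-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * TONE_HZ * n / FS_HZ)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * TONE_HZ * n / FS_HZ)
             for n, s in enumerate(samples))
    # For sin(wt - phi): re is proportional to -sin(phi), im to cos(phi).
    phase = math.atan2(-re, im)
    return phase / (2 * math.pi * TONE_HZ)


estimated = estimate_delay_s(sample_tone_16bit(DELAY_S))
print(f"True delay:      {DELAY_S * 1e6:.3f} us")
print(f"Estimated delay: {estimated * 1e6:.3f} us")
```

The sketch shows only that a sub-sample delay of a band-limited signal survives 16/44.1 sampling; it makes no statement about what is audible.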
 
download.jpg
 
I'm quite sure that they could hear the difference, provided that a) the sample audience was composed of bats and dogs, or b) the sample audience was given 2 hits of high quality LSD.
You'll need some bizarre tweeter capable of reproducing 200kHz or so.
 
I don't know if anyone is capable of hearing such differences. I can't distinguish a 320 kbps MP3 file from FLAC, let alone 16/44 from 24/96.
Music in 16/44 is all I need, and only for the peace of mind of having lossless files; in reality, a 320 kbps MP3 already has all the audio quality that I can perceive.
 
Wait, let me get this straight. You're claiming your ears (not mine, for sure) can accurately assess 5 µs steps?

It's certainly not me... it's what you find in scientific texts that talk about these aspects.
 
The problem is the whole audio chain.
If you have an AVR or room correction that downsamples the audio to 44 kHz, there is no use for a high-res audio input stream.
If it is a direct DAC-to-output connection, then there are only benefits to a high-res audio input stream.
 
Literally, oh boy, not this again.

The 5 to 10 µs is for interaural time delay and 16/44 is more than enough, with a huge spare, to handle this.

Yes, I had already seen this interesting video some time ago. But the theme here is slightly different: we are talking about the time spread of transients when converting signals from a high sampling frequency to a lower one, and the possibility that our auditory system could detect this effect as an alteration of the localization of the sources in space.
 
I don't know if anyone is capable of hearing such differences. I can't distinguish a 320 kbps MP3 file from FLAC, let alone 16/44 from 24/96.
Music in 16/44 is all I need, and only for the peace of mind of having lossless files; in reality, a 320 kbps MP3 already has all the audio quality that I can perceive.

Even when I listen to music in the car or on the iPod, I can hardly tell whether it's MP3 quality (with little compression, naturally) or CD, and I'm happy about it. But I don't consider that "quality" listening. Comparisons must be made on a quality playback chain, in a controlled environment and above all with good audio material, using ABX tests and more than one listener.
 