
Is high-resolution audio audible or not audible and a waste of data?

Pinox67

Member
Editor
Joined
Apr 8, 2020
Messages
85
Likes
148
Location
Italy
I have found several sites, articles and tests on the web claiming that the CD-quality music distribution format, 16-bit/44.1 kHz, is more than sufficient to preserve practically all the characteristics of the sound compared with higher-resolution formats. On this view, 24 bits per sample and higher sample rates such as 192 kHz or more are only needed in the sound processing stages; once the final result has been produced, it can be converted to CD quality, with appropriate precautions, without apparent loss of quality and with a considerable saving of space. In more detail, the operations performed are:
  • Downsampling. Since our auditory system is unable to detect frequencies above 20 kHz, and in any case musical instruments rarely generate such frequencies, the sampling frequency can be lowered to 44.1 kHz without major problems. To avoid aliasing, i.e. frequencies above 22.05 kHz folding back into the audio band, brick-wall filters are applied here.
  • Re-quantization. With 16 bits per sample, about 96 dB of dynamic range can theoretically be represented. That is a lot, but to preserve a good part of the dynamic range that 24 bits can represent, dithering and noise-shaping algorithms are used to reshape the distortion caused by re-quantization. The dynamic range obtained this way exceeds 110 dB in the frequency bands to which we are most sensitive, which is more than sufficient to represent the dynamics of musical content.
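As a rough check of the 96 dB figure above: for ideal linear PCM, the ratio between full scale and one quantization step is 20·log10(2^N), or about 6.02 dB per bit. The sketch below is only that arithmetic. The function name is hypothetical, and it deliberately ignores dither and noise shaping, which change how the noise floor is distributed across frequency.

```python
import math

def pcm_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB, for ideal linear PCM."""
    return 20.0 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits}-bit: {pcm_dynamic_range_db(bits):.1f} dB")
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```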
We observe that these mechanisms and the related justifications are valid as long as we consider only the aspects of perception concerning the harmonic analysis that our ear performs to determine the timbre of the sound (not exactly a scientific term, but it is clear what we are referring to in practice). However, downsampling does not take into consideration another mechanism of our exceptional auditory system: the localization of sounds, i.e. the ability to identify sound sources in space, judge their size and follow them over time if they are in motion. This function "works" in parallel with timbre perception; indeed, it acts before it, based on the analysis of the arrival time of the signals at our ears and/or their relative level (the harmonic content is detected approximately after the first 2 cycles). Several studies have confirmed that the ability of our ear to distinguish transient events in time is very high: we are talking about values between 6 μs and 10 μs, which correspond to a spatial resolution of about 1 degree. A simple calculation suggests that with a sampling frequency of 44.1 kHz we can correctly represent only transients no shorter than about 22 μs (roughly one sample period, 1/44100 s); if they are shorter, they will be "spread" over time. This aspect can be seen in the following figure.
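[Figure: Screenshot 2023-01-29 at 19.54.18.png, showing a step signal sampled at 192 kHz (top) and the same signal after downsampling to 48 kHz and upsampling back to 192 kHz (bottom)]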

Above is a step signal sampled at 192 kHz; the transient here lasts 5 μs. Below is the same signal, first downsampled to 48 kHz (with an anti-aliasing filter) and then upsampled again to 192 kHz with a sinc-type (linear-phase) filter. The shallower slope of the transient, which now lasts more than 20 μs, is evident. Of course, one could argue that the frequencies of the step signal cut off by the anti-aliasing filter are well above 20 kHz, and therefore in practice the first curve would be perceived the same as the second. This is true if we limit ourselves to the classic "steady-state" spectral analysis, where our ear does indeed reach this limit in its perception of spectral components; but according to what has been reported, the analysis of transients takes place in a different part of our auditory system, with different mechanisms and higher resolution, and could allow us to distinguish between the two waveforms.
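The comparison described above can be approximated numerically. The following is a minimal sketch, not the procedure used to produce the original figure: it assumes the 192 kHz → 48 kHz → 192 kHz round trip behaves like an ideal brick-wall low-pass at 24 kHz, applies that filter with an FFT to one period of a square wave, and measures the 10%-90% rise time. All function names are hypothetical.

```python
import numpy as np

def band_limited_step(fs=192_000, cutoff=None, n=4096):
    """One period of a 0-to-1 square wave, optionally brick-wall low-passed via the FFT."""
    x = np.zeros(n)
    x[n // 4 : 3 * n // 4] = 1.0          # rising edge at n/4, falling edge at 3n/4
    if cutoff is None:
        return x
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[freqs > cutoff] = 0.0        # idealised anti-aliasing filter (assumption)
    return np.fft.irfft(spectrum, n)

def _first_crossing(y, level):
    """Fractional sample index where y first rises through `level` (linear interpolation)."""
    i = int(np.argmax(y >= level))
    return (i - 1) + (level - y[i - 1]) / (y[i] - y[i - 1])

def rise_time_10_90(y, fs):
    """10%-90% rise time, in seconds, of the first rising edge in y."""
    return (_first_crossing(y, 0.9) - _first_crossing(y, 0.1)) / fs

fs = 192_000
raw = band_limited_step(fs)
filtered = band_limited_step(fs, cutoff=24_000)
print(f"original edge:  {rise_time_10_90(raw, fs) * 1e6:.1f} us")
print(f"24 kHz-limited: {rise_time_10_90(filtered, fs) * 1e6:.1f} us")
```

Under these assumptions the filtered edge should come out several times slower than the original one, in the region of 20 μs. Note that this only shows what band-limiting does to the waveform; it says nothing about whether the difference is audible.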

What impact can the "spreading" effect have on perception? Potentially, it could make it harder for our auditory system to distinguish sounds, degrading the perception of the soundstage and causing a sense of fatigue. How real this effect is probably depends on the musical content, the context, the recording quality and the quality of the reproduction system, but physically the effect on the signals is there… What do you think?
 

fpitas

Master Contributor
Forum Donor
Joined
Jul 7, 2022
Messages
9,885
Likes
14,191
Location
Northern Virginia, USA
There are entire threads here about this. I think the conclusion is that you're wasting your time with anything more than 16 bits for playback. For use in a studio for signal manipulation, a lot more bits of resolution is common and is accepted as superior.
 

danadam

Addicted to Fun and Learning
Joined
Jan 20, 2017
Messages
956
Likes
1,496
What do you think?
Literally, oh boy, not this again.

The 5 to 10 µs is for interaural time delay and 16/44 is more than enough, with a huge spare, to handle this.
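One way to see why a 5 µs inter-channel delay survives 16/44.1 sampling is to encode it and then recover it from the samples. The sketch below is illustrative only: the Gaussian test pulse, its width, and the phase-slope estimator are all assumptions chosen for simplicity, not a model of hearing or of any particular DAC. It delays one channel by 5 µs (about a fifth of a sample period), quantizes both channels to 16 bits, and estimates the delay from the cross-spectrum phase.

```python
import numpy as np

FS = 44_100          # CD sample rate, Hz
N = 512              # samples per channel
SIGMA = 50e-6        # Gaussian pulse width in seconds (assumed; keeps energy well below 22.05 kHz)

def sampled_pulse(delay_s):
    """Gaussian pulse centred in the window, shifted by delay_s, rounded to 16-bit steps."""
    t = np.arange(N) / FS
    x = np.exp(-0.5 * ((t - N / (2 * FS) - delay_s) / SIGMA) ** 2)
    return np.round(x * 32767) / 32767

def estimate_delay(left, right):
    """Delay of `right` relative to `left`, in seconds, from the cross-spectrum phase slope."""
    cross = np.fft.rfft(right) * np.conj(np.fft.rfft(left))
    freqs = np.fft.rfftfreq(N, d=1.0 / FS)
    weights = np.abs(cross)              # favour bins where the pulse actually has energy
    phase = np.angle(cross)              # equals -2*pi*f*delay for a pure delay
    # Weighted least-squares fit of phase = -2*pi*f*delay
    return -np.sum(weights * freqs * phase) / (2 * np.pi * np.sum(weights * freqs ** 2))

true_delay = 5e-6
estimate = estimate_delay(sampled_pulse(0.0), sampled_pulse(true_delay))
print(f"sample period: {1e6 / FS:.1f} us")
print(f"true delay {true_delay * 1e6:.2f} us, estimated {estimate * 1e6:.2f} us")
```

The estimate should land very close to 5 µs even though the sample period is about 22.7 µs, because a sub-sample delay of a band-limited signal is carried by the sample values, not by the sample instants.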
 

fpitas

Master Contributor
Forum Donor
Joined
Jul 7, 2022
Messages
9,885
Likes
14,191
Location
Northern Virginia, USA
Honestly if you're worried about that kind of resolution, your speakers are probably the weak link by orders of magnitude.
 

fpitas

Master Contributor
Forum Donor
Joined
Jul 7, 2022
Messages
9,885
Likes
14,191
Location
Northern Virginia, USA
I have found several sites, articles and tests on the web in which it is claimed that the CD quality music distribution format, 16bit/44KHz, is more than sufficient to preserve practically all the characteristics of the sound compared to higher resolution formats. 24bit/sample and higher sample rates such as 192KHz or more are only needed in the sound processing stages; once the final result has been produced, it can be converted into the CD quality by adopting the appropriate precautions without apparent loss of quality and with a considerable saving of space. In more detail, the operations performed are:​
  • Downsampling. Since our auditory system is unable to detect frequencies above 20KHz, and in any case these frequencies are rarely generated by low-level musical instruments, the sampling frequency can be lowered to 44KHz without major problems. To avoid aliasing phenomena, i.e. that any frequencies higher than 22KHz enter the audio band, brick-wall type filters are applied here.​
  • Re-quantization. With 16bit/sample, 96dB of dynamics can theoretically be represented. It's a lot, but to preserve a good part of the dynamics that can be represented with 24bit, dithering and noise shaping algorithms are used which reshape the distortion due to re-quantization. The dynamics thus obtained exceeds 110dB in the audio bands to which we are most sensitive, which is a more than sufficient value to represent the dynamics of musical contents.​
We observe that these mechanisms and the related justifications are valid as long as we take into consideration the aspects of perception concerning the harmonic analysis that our ear does to determine the timbre of the sound (not exactly a scientific term, but it is clear what we are referring to in practice). Unfortunately it does not take into consideration a mechanism of our exceptional auditory system: that of the localization of sounds, i.e. the ability to identify sound sources in space, their size and to follow them over time if they are in motion. This function "works" in parallel with that of timbre perception, indeed, it acts before this, based on the analysis of the instant of arrival of the signals to our ears and/or their relative level (the harmonic contents are detected approximately after the first 2 cycles). Well, several studies have confirmed that the ability of our ear to distinguish transient events over time is very high: we are talking about values that oscillate between 6μs and 10μs, which determine a spatial resolution of about 1 degree. With a simple calculation it can be seen that using a sampling frequency of 44KHz, we will be able to correctly represent transients no shorter than 22μs; If they have a shorter duration, these will be "spread" over time. This aspect can be seen in the following figure.

Above is a step signal sampled at a frequency of 192KHz; the transient here is 5μs. Below the same signal first downsampled to 48KHz (with anti-aliasing filter) and then upsampled again to 192KHz with a sinc type filter (linear phase). The lower slope of the transient, now lasting more than 20μs, is evident. Of course, one could argue here that the frequencies of the step signal which are cut off by the anti-aliasing filter are well above 20KHz and therefore in practice the first curve would be perceived as the second. This is true if we limit ourselves to the classic "steady-state" spectral analysis, on which our ear actually reaches this limit in the perception of the spectral components; but as far as reported, the analysis of transients takes place from a different part of our auditory system, with different mechanisms and with higher resolution, and allows us to distinguish between the two trends. The "spreading" effect can cause greater difficulty for our auditory system to distinguish sounds, penalizing the perception of the soundstage and consequent fatigue effect.

So, at 44KHz we are missing something. Probably this loss is more or less perceptible depending on the musical content, recording quality and playback system quality, but there is…​
What do you think?
Wait, let me get this straight. You're claiming your ears (not mine, for sure) can accurately assess 5 µs steps?
 

HarmonicTHD

Major Contributor
Forum Donor
Joined
Mar 18, 2022
Messages
3,326
Likes
4,829
Oh brother. Not again. And especially after this week’s DAC, Measurement and Breakin audibility nonsense and fairytales.

Edit. And I forgot the all time favorite in yet another season of “the difference in power cables” featuring GR.
 

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,656
Likes
5,819
Location
US East
@j_j has tirelessly tried to refute the totally incorrect and debunked "claim" that the "time resolution" of Redbook CD is equal to 1/44100 s (= 1/fs), but it keeps coming back. Please see the post below and the ones following it.
 

fpitas

Master Contributor
Forum Donor
Joined
Jul 7, 2022
Messages
9,885
Likes
14,191
Location
Northern Virginia, USA
[Image: download.jpg]
 

fpitas

Master Contributor
Forum Donor
Joined
Jul 7, 2022
Messages
9,885
Likes
14,191
Location
Northern Virginia, USA
I'm quite sure that they could hear the difference, provided that a) the sample audience was composed of bats and dogs, or b) the sample audience was given 2 hits of high quality LSD.
You'll need some bizarre tweeter capable of reproducing 200kHz or so.
 

Talisman

Addicted to Fun and Learning
Forum Donor
Joined
Mar 27, 2022
Messages
897
Likes
2,546
Location
Milano Italy
I don't know if anyone is capable of hearing such differences. I can't distinguish a 320 kbps MP3 file from FLAC, let alone 16/44 from 24/96.
Music in 16/44 is all I need, and even that is only for the psychological tranquility of having lossless files; the reality is that a 320 kbps MP3 already has all the audio quality that I can perceive.
 

Pinox67

Member
Editor
Joined
Apr 8, 2020
Messages
85
Likes
148
Location
Italy
Wait, let me get this straight. You're claiming your ears (not mine, for sure) can accurately assess 5 µs steps?

It's certainly not me... it's what you find in scientific texts that talk about these aspects.
 

valerianf

Addicted to Fun and Learning
Joined
Dec 15, 2019
Messages
691
Likes
442
Location
Los Angeles
The problem is the whole audio chain.
If you have an AVR or a room correction system that downsamples the audio to 44 kHz, there is no use for a high-res audio input stream.
If the DAC feeds the output directly, then there are only benefits to a high-res audio input stream.
 

Pinox67

Member
Editor
Joined
Apr 8, 2020
Messages
85
Likes
148
Location
Italy
Literally, oh boy, not this again.

The 5 to 10 µs is for interaural time delay and 16/44 is more than enough, with a huge spare, to handle this.

Yes, I had already seen this interesting video some time ago. But the topic here is slightly different: we are talking about the time spread of transients when converting signals from a high sampling frequency to a lower one, and the possibility that our auditory system could detect this effect as an alteration of the localization of sources in space.
 

Pinox67

Member
Editor
Joined
Apr 8, 2020
Messages
85
Likes
148
Location
Italy
I don't know if anyone is capable of hearing such differences. I can't distinguish a 320 kbps MP3 file from FLAC, let alone 16/44 from 24/96.
Music in 16/44 is all I need, and even that is only for the psychological tranquility of having lossless files; the reality is that a 320 kbps MP3 already has all the audio quality that I can perceive.

Even when I listen to music in the car or on the iPod, I can hardly tell whether it's MP3 (with light compression, naturally) or CD quality, and I'm happy about that. But I don't consider those to be "quality" listening sessions. Comparisons must be made on a quality playback chain, in a controlled environment and above all with good audio material, using ABX tests and multiple listeners.
 