Is high-resolution audio audible or not audible and a waste of data?

Pinox67 · Jan 29, 2023

I have found several sites, articles and tests on the web claiming that the CD-quality distribution format, 16-bit/44.1 kHz, is more than sufficient to preserve practically all the characteristics of the sound compared to higher-resolution formats. 24 bits per sample and higher sample rates such as 192 kHz or more are only needed in the sound processing stages; once the final result has been produced, it can be converted to CD quality, with the appropriate precautions, without apparent loss of quality and with a considerable saving of space. In more detail, the operations performed are:

Downsampling. Since our auditory system is unable to detect frequencies above 20 kHz, and in any case musical instruments rarely generate these frequencies, the sampling frequency can be lowered to 44.1 kHz without major problems. To avoid aliasing, i.e. frequencies above 22.05 kHz folding back into the audio band, brick-wall filters are applied here.
Re-quantization. With 16 bits per sample, 96 dB of dynamic range can theoretically be represented. That is a lot, but to preserve a good part of the dynamic range that 24 bits can represent, dithering and noise-shaping algorithms are used to reshape the distortion due to re-quantization. The dynamic range thus obtained exceeds 110 dB in the frequency bands to which we are most sensitive, which is more than sufficient to represent the dynamics of musical content.
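
As a quick check of the figures in the two items above, here is a minimal sketch of the arithmetic. It assumes the usual idealization that dynamic range is the ratio of full scale to one quantization step (about 6.02 dB per bit) and uses the exact CD rate of 44.1 kHz; the function names are illustrative only. It does not model dither or noise shaping, so it says nothing about the 110 dB figure.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable without aliasing at this sample rate."""
    return sample_rate_hz / 2

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")   # 96.3 dB and 144.5 dB
print(f"Nyquist at 44.1 kHz: {nyquist_hz(44_100):.0f} Hz")  # 22050 Hz
```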

We observe that these mechanisms and the related justifications are valid as long as we consider only the aspects of perception concerning the harmonic analysis that our ear performs to determine the timbre of the sound (not exactly a scientific term, but it is clear what we are referring to in practice). However, downsampling does not take into account another mechanism of our exceptional auditory system: the localization of sounds, i.e. the ability to identify sound sources in space, judge their size, and follow them over time if they are moving. This function works in parallel with timbre perception; indeed, it acts before it, based on the analysis of the arrival time of the signals at our ears and/or their relative level (the harmonic content is detected approximately after the first 2 cycles). Several studies have confirmed that the ability of our ear to distinguish transient events in time is very high: we are talking about values between 6 μs and 10 μs, which correspond to a spatial resolution of about 1 degree. A simple calculation shows that with a sampling frequency of 44.1 kHz we can correctly represent transients no shorter than about 22 μs; if they are shorter, they will be "spread" over time. This aspect can be seen in the following figure.

Above is a step signal sampled at 192 kHz; the transient here lasts 5 μs. Below is the same signal, first downsampled to 48 kHz (with an anti-aliasing filter) and then upsampled again to 192 kHz with a sinc-type (linear-phase) filter. The shallower slope of the transient, now lasting more than 20 μs, is evident. Of course, one could argue that the frequencies of the step signal removed by the anti-aliasing filter are well above 20 kHz, and therefore in practice the first curve would be perceived as the second. This is true if we limit ourselves to classic "steady-state" spectral analysis, where our ear does reach this limit in the perception of spectral components; but according to what has been reported, the analysis of transients takes place in a different part of our auditory system, with different mechanisms and higher resolution, and could allow us to distinguish between the two curves.
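
The slope change described for the figure can be approximated without the original attachment. The following is only a sketch under stated assumptions: an ideal step, an ideal brick-wall low-pass filter at the Nyquist frequency, and the 10% to 90% definition of rise time. None of these choices comes from the post, and the function names are hypothetical. For such a filter the rise time works out to roughly 0.45 divided by the cutoff frequency. The sketch shows only that the waveform is smoothed; it makes no claim about whether the difference is audible.

```python
import math

def sine_integral(x: float, steps: int = 200) -> float:
    """Si(x) = integral of sin(t)/t from 0 to x, via composite Simpson's rule."""
    def sinc(t: float) -> float:
        return 1.0 if t == 0.0 else math.sin(t) / t
    h = x / steps  # steps must be even for Simpson's rule
    total = sinc(0.0) + sinc(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * sinc(i * h)
    return total * h / 3

def step_response(t: float, cutoff_hz: float) -> float:
    """Unit step after an ideal brick-wall low-pass filter at cutoff_hz."""
    return 0.5 + sine_integral(2 * math.pi * cutoff_hz * t) / math.pi

def crossing_time(level: float, cutoff_hz: float) -> float:
    """Time at which the main rising edge crosses `level`, found by bisection."""
    half_width = 0.5 / cutoff_hz  # the edge is monotonic within +/- 1/(2*fc)
    lo, hi = -half_width, half_width
    for _ in range(60):
        mid = (lo + hi) / 2
        if step_response(mid, cutoff_hz) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rise_time_us(cutoff_hz: float) -> float:
    """10% to 90% rise time in microseconds."""
    return (crossing_time(0.9, cutoff_hz) - crossing_time(0.1, cutoff_hz)) * 1e6

for fs in (48_000, 192_000):
    print(f"fs = {fs} Hz: 10-90% rise time = {rise_time_us(fs / 2):.1f} us")
```

With a real, non-ideal resampling filter the numbers will differ somewhat, which may account for the "more than 20 μs" read off the figure.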

What impact can the "spreading" effect have on perception? Potentially, it could make it harder for our auditory system to distinguish sounds, degrading the perception of the soundstage and causing a sense of fatigue. This effect is probably more or less real depending on the musical content, context, recording quality and reproduction system quality, but physically the effect on the signals is there… What do you think?

bkatbamna · Jan 29, 2023

I had a hard time telling apart 320kb/s mp3 files from WAV files. I'm sure distinguishing 16 bit 44kHz wav files from higher resolution files would be even harder for me.

BinkieHuckerback · Jan 29, 2023

My ears are fine with 16/44.1; my wallet agrees.

fpitas · Jan 29, 2023

There are entire threads here about this. I think the conclusion is that you're wasting your time with anything more than 16 bits for playback. For use in a studio for signal manipulation, a lot more bits of resolution is common and is accepted as superior.

danadam · Jan 29, 2023

Pinox67 said:
What do you think?

Literally, oh boy, not this again.

The 5 to 10 µs is for interaural time delay and 16/44 is more than enough, with a huge spare, to handle this.

Time resolution of Redbook (16/44) PCM

Ok, that old BS about "only one sample time resolution" came up again elsewhere. I made the following demonstration to put the screws to that. I generated a 10kHz sine wave at 44.1 kHz sampling rate. The matlab file is attached. I generated 128 discrete phases uniformly split over one cycle...

www.audiosciencereview.com
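
The point about time resolution can be illustrated with a small sketch. This is not the MATLAB demonstration from the linked post, just an assumed stand-in: it samples a 10 kHz tone at 44.1 kHz with 16-bit rounding (no dither), delays it by 5 µs, and recovers that delay from the samples alone using a single-bin DFT. The constants and function names are illustrative.

```python
import cmath
import math

FS = 44_100      # CD sample rate in Hz
F_TONE = 10_000  # test tone in Hz
N = 441          # 10 ms of samples: exactly 100 cycles of the tone

def sampled_tone(delay_s: float) -> list[int]:
    """Sample a sine delayed by delay_s and round to 16-bit integers (no dither)."""
    return [round(32767 * math.sin(2 * math.pi * F_TONE * (n / FS - delay_s)))
            for n in range(N)]

def tone_phase(samples: list[int]) -> float:
    """Phase of the F_TONE component, from a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * F_TONE * n / FS)
              for n, s in enumerate(samples))
    return cmath.phase(acc)

def estimated_delay_s(delay_s: float) -> float:
    """Recover the delay from the samples, relative to an undelayed tone."""
    dphi = tone_phase(sampled_tone(0.0)) - tone_phase(sampled_tone(delay_s))
    return dphi / (2 * math.pi * F_TONE)

true_delay = 5e-6  # 5 microseconds, under a quarter of one sample period
print(f"sample period:   {1e6 / FS:.2f} us")
print(f"true delay:      {true_delay * 1e6:.3f} us")
print(f"estimated delay: {estimated_delay_s(true_delay) * 1e6:.3f} us")
```

The delay is far smaller than the 22.68 µs sample period, yet it is encoded in the sample values and can be read back out, which is the sense in which timing of band-limited signals is not limited to one sample.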

voodooless · Jan 29, 2023

Pinox67 said:
What do you think?

I think it’s a load of nonsense.

fpitas · Jan 29, 2023

Honestly if you're worried about that kind of resolution, your speakers are probably the weak link by orders of magnitude.

lowgain · Jan 29, 2023

Pinox67 said:
What do you think?

Prepare a sample that displays this problem and let us ABX it.

fpitas · Jan 29, 2023

Pinox67 said:
I have found several sites, articles and tests on the web in which it is claimed that the CD quality music distribution format, 16bit/44KHz, is more than sufficient to preserve practically all the characteristics of the sound compared to higher resolution formats. 24bit/sample and higher sample rates such as 192KHz or more are only needed in the sound processing stages; once the final result has been produced, it can be converted into the CD quality by adopting the appropriate precautions without apparent loss of quality and with a considerable saving of space. In more detail, the operations performed are:

Downsampling. Since our auditory system is unable to detect frequencies above 20KHz, and in any case these frequencies are rarely generated by low-level musical instruments, the sampling frequency can be lowered to 44KHz without major problems. To avoid aliasing phenomena, i.e. that any frequencies higher than 22KHz enter the audio band, brick-wall type filters are applied here.

Re-quantization. With 16bit/sample, 96dB of dynamics can theoretically be represented. It's a lot, but to preserve a good part of the dynamics that can be represented with 24bit, dithering and noise shaping algorithms are used which reshape the distortion due to re-quantization. The dynamics thus obtained exceeds 110dB in the audio bands to which we are most sensitive, which is a more than sufficient value to represent the dynamics of musical contents.

We observe that these mechanisms and the related justifications are valid as long as we take into consideration the aspects of perception concerning the harmonic analysis that our ear does to determine the timbre of the sound (not exactly a scientific term, but it is clear what we are referring to in practice). Unfortunately it does not take into consideration a mechanism of our exceptional auditory system: that of the localization of sounds, i.e. the ability to identify sound sources in space, their size and to follow them over time if they are in motion. This function "works" in parallel with that of timbre perception, indeed, it acts before this, based on the analysis of the instant of arrival of the signals to our ears and/or their relative level (the harmonic contents are detected approximately after the first 2 cycles). Well, several studies have confirmed that the ability of our ear to distinguish transient events over time is very high: we are talking about values that oscillate between 6μs and 10μs, which determine a spatial resolution of about 1 degree. With a simple calculation it can be seen that using a sampling frequency of 44KHz, we will be able to correctly represent transients no shorter than 22μs; If they have a shorter duration, these will be "spread" over time. This aspect can be seen in the following figure.

View attachment 260918

Above is a step signal sampled at a frequency of 192KHz; the transient here is 5μs. Below the same signal first downsampled to 48KHz (with anti-aliasing filter) and then upsampled again to 192KHz with a sinc type filter (linear phase). The lower slope of the transient, now lasting more than 20μs, is evident. Of course, one could argue here that the frequencies of the step signal which are cut off by the anti-aliasing filter are well above 20KHz and therefore in practice the first curve would be perceived as the second. This is true if we limit ourselves to the classic "steady-state" spectral analysis, on which our ear actually reaches this limit in the perception of the spectral components; but as far as reported, the analysis of transients takes place from a different part of our auditory system, with different mechanisms and with higher resolution, and allows us to distinguish between the two trends. The "spreading" effect can cause greater difficulty for our auditory system to distinguish sounds, penalizing the perception of the soundstage and consequent fatigue effect.

So, at 44KHz we are missing something. Probably this loss is more or less perceptible depending on the musical content, recording quality and playback system quality, but there is…
What do you think?

Wait, let me get this straight. You're claiming your ears (not mine, for sure) can accurately assess 5uS steps?

HarmonicTHD · Jan 29, 2023

Oh brother. Not again. And especially after this week’s DAC, Measurement and Breakin audibility nonsense and fairytales.

Edit. And I forgot the all time favorite in yet another season of “the difference in power cables” featuring GR.

NTK · Jan 29, 2023

@j_j has tirelessly tried to refute the totally incorrect and debunked "claim" that the "time resolution" of Redbook CD is equal to 1/44100 (= 1/fs), but it keeps coming back. Please see the post below and the ones following it.

High Resolution Audio: Does It Matter?

* I am not an expert on pyschoacoustics and I dare say, no one on this planet is either. I am. I know quite a few others. This kind of "mankind can never know" is really tedious. No, nobody knows EVERYTHING, but it is possible to determine limits, and live within them. Now don't take that as...

www.audiosciencereview.com

fpitas · Jan 29, 2023

Steven Holt · Jan 29, 2023

lowgain said:
Prepare a sample that displays this problem and let us ABX it.

I'm quite sure that they could hear the difference, provided that a) the sample audience was composed of bats and dogs, or b) the sample audience was given 2 hits of high quality LSD.

fpitas · Jan 29, 2023

Steven Holt said:
I'm quite sure that they could hear the difference, provided that a) the sample audience was composed of bats and dogs, or b) the sample audience was given 2 hits of high quality LSD.

You'll need some bizarre tweeter capable of reproducing 200kHz or so.

markanini · Jan 29, 2023

Still relevant:

Talisman · Jan 29, 2023

I don't know if anyone is capable of feeling such differences. I can't distinguish a 320 mp3 file from FLAC, imagine if I can distinguish 16/44 from 24/96.
Music in 16/44 is all I need, but only for the psychological tranquility of having lossless files, but the reality is that a 320kbs mp3 already has all the audio quality that I can perceive

Pinox67 · Jan 29, 2023

fpitas said:
Wait, let me get this straight. You're claiming your ears (not mine, for sure) can accurately assess 5uS steps?

It's certainly not me... it's what you find in scientific texts that talk about these aspects.

valerianf · Jan 29, 2023

The problem is the whole audio chain.
If you have an AVR or a room correction that downsamples the audio to 44 kHz, there is no use for a high-res audio input stream.
If it is a direct DAC to output, then there are only benefits to getting a high-res audio input stream.

Pinox67 · Jan 29, 2023

danadam said:
Literally, oh boy, not this again.

The 5 to 10 µs is for interaural time delay and 16/44 is more than enough, with a huge spare, to handle this.

Time resolution of Redbook (16/44) PCM

Ok, that old BS about "only one sample time resolution" came up again elsewhere. I made the following demonstration to put the screws to that. I generated a 10kHz sine wave at 44.1 kHz sampling rate. The matlab file is attached. I generated 128 discrete phases uniformly split over one cycle...

www.audiosciencereview.com

Yes, I had already seen this interesting video some time ago. But the theme here is slightly different: we are talking about the time spread of transients when converting signals from a high sampling frequency to a lower one, and the possibility that our auditory system could detect this effect as an alteration of the localization of the sources in space.

Pinox67 · Jan 29, 2023

Talisman said:
I don't know if anyone is capable of feeling such differences. I can't distinguish a 320 mp3 file from FLAC, imagine if I can distinguish 16/44 from 24/96.
Music in 16/44 is all I need, but only for the psychological tranquility of having lossless files, but the reality is that a 320kbs mp3 already has all the audio quality that I can perceive

Even when I listen to music in the car or with the iPod, I can hardly tell if it's mp3 quality (naturally with little compression) or CD, and I'm happy about it. But these are not listening sessions that I consider "quality". Comparisons must be made on a quality playback chain, in controlled environments and above all with good audio material, using ABX tests and more people.
