I have found several sites, articles and tests on the web claiming that the CD-quality distribution format, 16-bit/44.1 kHz, is more than sufficient to preserve practically all the characteristics of the sound compared with higher-resolution formats. According to these sources, 24 bits per sample and higher sample rates such as 192 kHz or more are needed only in the sound processing stages; once the final result has been produced, it can be converted to CD quality, with the appropriate precautions, without apparent loss of quality and with a considerable saving of space. In more detail, the operations performed are:
- Downsampling. Since our auditory system is unable to detect frequencies above 20 kHz, and in any case these frequencies are rarely generated by musical instruments, the sampling frequency can be lowered to 44.1 kHz without major problems. To avoid aliasing, i.e. frequencies above the new Nyquist limit of about 22 kHz folding back into the audio band, a brick-wall low-pass filter is applied before the sample rate is reduced.
- Re-quantization. With 16 bits per sample, a dynamic range of about 96 dB can theoretically be represented. That is a lot, but to preserve a good part of the dynamic range that 24 bits can represent, dithering and noise-shaping algorithms are used to reshape the distortion introduced by re-quantization. The dynamic range obtained in this way exceeds 110 dB in the frequency bands to which we are most sensitive, which is more than enough to represent the dynamics of musical content.
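The 96 dB figure above follows from the ratio between full scale and one quantization step. The sketch below checks that arithmetic and shows a minimal re-quantization with plain TPDF dither. The function names are my own, and noise shaping is deliberately left out, so this sketch does not reproduce the 110 dB figure quoted for shaped dither.

```python
import math
import numpy as np

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

def requantize_tpdf(x, bits=16, rng=None):
    """Re-quantize float samples in [-1, 1) to `bits` bits with TPDF dither.

    Illustrative only: plain triangular dither, no noise shaping.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = 2 ** (bits - 1)                       # LSBs per unit amplitude
    # Difference of two uniform variables: triangular PDF spanning +/- 1 LSB
    dither = rng.random(x.shape) - rng.random(x.shape)
    q = np.round(x * scale + dither)              # quantize to integer steps
    q = np.clip(q, -scale, scale - 1)             # stay inside the integer range
    return q / scale                              # back to float for comparison

print(f"16 bit: {dynamic_range_db(16):.1f} dB")   # 96.3 dB
print(f"24 bit: {dynamic_range_db(24):.1f} dB")   # 144.5 dB
```

With dither of at most 1 LSB plus rounding of at most 0.5 LSB, the error of each sample stays within 1.5 LSB of the original value.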
These mechanisms and their justifications are valid as long as we consider only the aspects of perception related to the harmonic analysis our ear performs to determine the timbre of a sound (not exactly a scientific term, but it is clear what it refers to in practice). However, downsampling does not take into account another mechanism of our exceptional auditory system: the localization of sounds, i.e. the ability to identify sound sources in space, judge their size and follow them over time if they are moving. This function works in parallel with timbre perception; indeed, it acts before it, based on the analysis of the arrival time of the signals at our two ears and/or their relative level (the harmonic content is detected only after approximately the first 2 cycles). Several studies have confirmed that the ability of our ear to resolve transient events in time is very high: the reported values range between 6 μs and 10 μs, which corresponds to a spatial resolution of about 1 degree. A simple calculation shows that at a sampling frequency of 44.1 kHz the sample period is 1/44100 s, or about 22.7 μs, so transients shorter than roughly 22 μs cannot be represented correctly; if they are shorter, they will be "spread" over time. This aspect can be seen in the following figure.
The upper plot shows a step signal sampled at 192 kHz; the transient here lasts about 5 μs. The lower plot shows the same signal first downsampled to 48 kHz (with an anti-aliasing filter) and then upsampled again to 192 kHz with a sinc-type (linear-phase) filter. The shallower slope of the transient, which now lasts more than 20 μs, is evident. Of course, one could argue that the frequencies of the step signal removed by the anti-aliasing filter are well above 20 kHz and therefore, in practice, the first curve would be perceived exactly like the second. This is true if we limit ourselves to the classic "steady-state" spectral analysis, where our ear really does reach this limit in the perception of spectral components; but, as noted above, the analysis of transients is carried out by a different part of our auditory system, with different mechanisms and higher resolution, and could allow us to distinguish between the two curves.
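For anyone who wants to reproduce the comparison, here is a minimal sketch of the round trip described above, using NumPy only. The filter is my own assumption (a Blackman-windowed sinc with 511 taps and a cutoff at 0.45 times the 48 kHz rate), not necessarily what produced the figure, and the 10%-90% rise time is just one possible way to put a number on the slope of the transient.

```python
import numpy as np

FS_HI = 192_000   # original sample rate (Hz)
FACTOR = 4        # 192 kHz / 4 = 48 kHz

def lowpass_fir(cutoff_hz, fs, num_taps=511):
    """Linear-phase windowed-sinc low-pass (Blackman window), unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.blackman(num_taps)
    return h / h.sum()

def round_trip(x, factor=FACTOR, fs=FS_HI):
    """Downsample by `factor`, then upsample back to the original rate."""
    h = lowpass_fir(0.45 * fs / factor, fs)              # cutoff below the new Nyquist
    low = np.convolve(x, h, mode="same")[::factor]       # anti-alias filter, then decimate
    up = np.zeros(len(low) * factor)
    up[::factor] = low                                   # zero-stuff to the high rate
    return np.convolve(up, h * factor, mode="same")      # sinc-type interpolation

def rise_time_us(y, fs, lo=0.1, hi=0.9):
    """10%-90% rise time in microseconds, interpolating between samples."""
    def crossing(level):
        i = int(np.argmax(y >= level))                   # first sample at or above level
        frac = (level - y[i - 1]) / (y[i] - y[i - 1])
        return (i - 1 + frac) / fs
    return (crossing(hi) - crossing(lo)) * 1e6

step = np.zeros(2048)
step[1024:] = 1.0                                        # edge within one 192 kHz sample
smeared = round_trip(step)

print(f"sample period at 44.1 kHz: {1e6 / 44_100:.1f} us")
print(f"rise time, original:   {rise_time_us(step, FS_HI):.1f} us")
print(f"rise time, round trip: {rise_time_us(smeared, FS_HI):.1f} us")
```

The exact round-trip figure depends on the filter chosen, but the edge comes out several times longer than in the original. Note that this only measures what happens to the signal; it says nothing about whether the difference is audible.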
What impact can this "spreading" effect have on perception? Potentially, it could make it harder for our auditory system to separate sounds, degrading the perception of the soundstage and producing a sense of listening fatigue. How noticeable this is probably depends on the musical content, the context, the quality of the recording and the quality of the reproduction system, but physically the effect on the signals is there… What do you think?