That's a good insight about the 'illegal' signals. I agree. I also agree that 48/24 appears sufficient for perceptually transparent reproduction of a limited number of natural musical instruments and vocals - for me personally, up to about four.
Continuing on the Sampling Theorem's mathematical perfection vs. reality: the theorem only works perfectly if the sampled values are captured with perfect precision, and the reconstruction is done with perfect precision as well. Stuart and Craven (
https://secure.aes.org/forum/pubs/journal/?elib=20457) illustrate the imperfections caused by the quantization process, how these imperfections are affected by various types of dithering, and - rare to see in an audio publication - their impact on accuracy in the time domain.
The graphs below demonstrate how doubling the sample rate accelerates the convergence between the true value of the signal and what was imperfectly captured through quantization. Given enough samples, the variance becomes very small: that's what the well-respected gurus of the Sampling Theorem describe when they talk about the perfection of digital audio in their video tutorials.
However, such perfection isn't achieved right away. Qualitatively, figuring out the true shape of a signal whose sample values are not captured perfectly requires averaging over time. If the signal doesn't change its shape too much during the characteristic averaging time, the process converges. That is why quantization works very well for a single sinusoid with constant amplitude - a staple of the Sampling Theorem video tutorials.
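To make the averaging idea concrete, here's a small NumPy sketch - my own toy model, not anything from the paper; the parameters, the TPDF dither, and the `quantize_tpdf` helper are all my choices. It quantizes a constant-amplitude 1 kHz sinusoid to 16 bits with triangular dither; the per-sample error stays on the order of one LSB, but its running mean tends to shrink as more samples are averaged:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_tpdf(x, bits):
    """Quantize x (full scale [-1, 1)) to `bits` bits with +/- 1 LSB TPDF dither."""
    lsb = 2.0 / (2 ** bits)
    dither = (rng.random(x.shape) - rng.random(x.shape)) * lsb  # triangular PDF
    return np.round((x + dither) / lsb) * lsb

fs = 48_000
n = np.arange(fs)                              # one second of samples
x = 0.5 * np.sin(2 * np.pi * 1000 * n / fs)    # constant-amplitude 1 kHz tone
err = quantize_tpdf(x, 16) - x

print("worst per-sample error:", np.max(np.abs(err)))     # on the order of one LSB
for k in (100, 10_000, fs):
    print(f"|mean error| over first {k} samples:", abs(np.mean(err[:k])))
```

The point is only the trend: each individual sample is off by up to about 1.5 LSB, yet the average over many samples sits far below one LSB, which is the "convergence" the tutorials rely on.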
If we add a second sinusoid, we now need more samples to figure out the shapes of both, mixed together, with the same precision as we did for one. Qualitatively, we need twice as many samples, yet this is not exactly true, because the two sinusoids effectively start dithering each other, resulting in quicker convergence toward the true values. Still, more samples are definitely needed to achieve the same level of accuracy - which, at a given sample rate, means more time.
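The "sinusoids dithering each other" effect can be seen in a quick experiment - again my own toy setup, with a deliberately coarse 8-bit quantizer to exaggerate the effect, no added dither, and a hypothetical `spectral_peakiness` helper. Quantizing a lone tone without dither concentrates the error into harmonic spikes; adding a second, incommensurate tone spreads the error out so it behaves more like noise:

```python
import numpy as np

fs = 48_000
n = np.arange(fs)
lsb = 2.0 / 2 ** 8        # deliberately coarse 8-bit quantizer

def spectral_peakiness(err):
    """Largest spectral line divided by the average line: high = tonal distortion."""
    mag = np.abs(np.fft.rfft(err))
    return np.max(mag) / np.mean(mag)

one = 0.4 * np.sin(2 * np.pi * 1000 * n / fs)
two = one + 0.4 * np.sin(2 * np.pi * 1234.5 * n / fs)

err_one = np.round(one / lsb) * lsb - one   # undithered: error is periodic, tonal
err_two = np.round(two / lsb) * lsb - two   # the second tone acts as pseudo-dither

print(spectral_peakiness(err_one))   # large: error piled into harmonic spikes
print(spectral_peakiness(err_two))   # much smaller: error spread like noise
```

A noise-like error averages out faster than a tonal one, which is the mechanism behind the quicker convergence mentioned above.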
As we add more and more signal components, we need more and more samples in order to capture the signal in such a way that we can reconstruct it with the required level of precision. The graphs below depict averages over simulations of signals meant to approximate what is encountered in real music. You can see that the number of samples required to converge to a desired level of precision (let's say, about 0.3 units on these graphs) is not small - on the order of thousands for the lower sample rate. The higher the sample rate, the quicker such convergence is achieved.
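As a toy illustration of the rate dependence - my own simulation, not the one behind the graphs, using a single dithered 16-bit tone and a hypothetical `mean_error_after` helper - averaging over the same 10 ms window leaves a smaller residual error at higher sample rates, simply because more samples fall into the window:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_error_after(fs, bits, t, freq=1000.0, amp=0.5):
    """|running mean| of dithered quantization error after t seconds at rate fs."""
    lsb = 2.0 / 2 ** bits
    n = np.arange(int(fs * t))
    x = amp * np.sin(2 * np.pi * freq * n / fs)
    dither = (rng.random(x.shape) - rng.random(x.shape)) * lsb   # TPDF dither
    err = np.round((x + dither) / lsb) * lsb - x
    return abs(np.mean(err))

t = 0.01   # the same 10 ms averaging window at every rate
for fs in (48_000, 96_000, 192_000):
    print(fs, mean_error_after(fs, 16, t))
```

Any single run is noisy, but averaged over many runs the residual falls roughly as 1/sqrt(fs*t), which matches the qualitative picture: within a fixed slice of time, the higher rate converges further.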
This effect, qualitatively, hints at what might be going on when a complex piece of music - with hundreds of sinusoids exhibiting quickly varying amplitudes and frequencies, intermixed with transients - is quantized. If the sample rate and bit depth are not high enough, the components of the music may never converge close enough to their true values during an intense passage. Then the reconstruction - however perfect - results in an analog signal which kinda sorta resembles the fragment of the original symphony, yet sounds decidedly fake.
The right question to ask at this point is: how far do we need to go with the sample rate and bit depth for the music to sound absolutely transparent? And the right answer is "it depends": on the particular music piece, on the particular sound delivery system, and on the particular person, including that person's neurophysiological condition at a particular moment.
Sound delivery systems are not perfect. Human hearing is not perfect either: even when presented with live music, we sometimes can't perceive, even approximately, what a particular musician is playing at a particular moment. So there is a practical limit to "digital perfection", beyond which the other inherent imperfections start dominating the total subjective imperfection level.
View attachment 27753