- Such a pulse is an artificial signal which is not bandlimited to 96 kHz. A DAC would add some ringing at 96 kHz (the amount depends on the characteristics of the reconstruction filter) and probably aliasing components as well.
- This pulse contains all frequencies from 0.5 Hz up to a maximum that depends on the rise time (the extreme case is the Dirac pulse, which contains all frequencies up to infinity) with equal amplitude.
- Therefore you can hear such a pulse, even when using only a tweeter, as long as the tweeter emits frequencies below 20 kHz.
Aha, it feels like we are getting somewhere. I see it differently. The pulse, inherently, doesn't contain any periodic frequencies. It only happens once.
A function of time is just a function of time. You can elect to approximate it with a linear combination of other functions. It is more mathematically convenient to calculate the coefficients of the approximation if the other functions are orthogonal in some mathematical sense.
The sinusoids used in Fourier analysis are just one such family of functions: https://en.wikipedia.org/wiki/Orthogonal_functions. In fact, researchers have shown that the pulses we are discussing can be represented much more compactly by certain families of wavelets than by sinusoids.
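To make the "coefficients of the approximation" idea concrete, here is a minimal sketch in plain Python. It assumes an 8-sample toy signal and uses the complex sinusoids of the discrete Fourier transform as the orthogonal family; the helper names `project` and `reconstruct` are made up for this illustration. Any other orthogonal family (a wavelet basis, for instance) could be passed to the same helpers.

```python
import cmath

def project(signal, basis):
    """Coefficient of `signal` along each vector of a mutually orthogonal `basis`."""
    coeffs = []
    for g in basis:
        # Inner product <signal, g> divided by <g, g>.
        num = sum(s * b.conjugate() for s, b in zip(signal, g))
        den = sum(abs(b) ** 2 for b in g)
        coeffs.append(num / den)
    return coeffs

def reconstruct(coeffs, basis):
    """Linear combination of the basis vectors weighted by `coeffs`."""
    n = len(basis[0])
    return [sum(c * g[i] for c, g in zip(coeffs, basis)) for i in range(n)]

N = 8  # toy signal length (assumption for the example)

# One orthogonal family: the N complex sinusoids used by the DFT.
basis = [[cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]
         for k in range(N)]

# A single one-sample pulse that "only happens once".
pulse = [0.0] * N
pulse[3] = 1.0

coeffs = project(pulse, basis)
approx = reconstruct(coeffs, basis)
magnitudes = [abs(c) for c in coeffs]  # every entry comes out as 1/N
```

With this particular family, the one-sample pulse ends up with the same coefficient magnitude on every sinusoid, which is the "all frequencies with equal amplitude" statement expressed as a property of the chosen decomposition rather than of the pulse itself.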
In order for a function to be approximated by a linear combination of other orthogonal functions, there must be something - some mechanism - that calculates the approximation. The mechanism employed by the mammalian hearing system does not use a Fourier transform.
It uses a different type of transform, based on the mechanics and neurophysiology of the cochlea, which assigns extracted frequencies to bins spaced approximately logarithmically, rather than at equal intervals as in the Fourier transform. In some ways, the cochlear transform is cruder than the Fourier transform. In other ways, it is more advanced, honed by millions of years of evolution.
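A rough sketch of the difference between equally spaced and logarithmically spaced bins, in plain Python. The 20 Hz to 20 kHz range and the count of 11 bins are arbitrary assumptions for illustration, and the purely geometric spacing is only a stand-in for "approximately logarithmic", not a model of the real cochlea.

```python
def linear_bins(f_lo, f_hi, count):
    """Bin centres at equal intervals, as in a Fourier transform."""
    step = (f_hi - f_lo) / (count - 1)
    return [f_lo + i * step for i in range(count)]

def log_bins(f_lo, f_hi, count):
    """Bin centres at equal ratios (geometric spacing)."""
    ratio = (f_hi / f_lo) ** (1.0 / (count - 1))
    return [f_lo * ratio ** i for i in range(count)]

lin = linear_bins(20.0, 20000.0, 11)
log = log_bins(20.0, 20000.0, 11)

# How many bin centres fall at or below 1 kHz in each layout?
below_1k_lin = sum(1 for f in lin if f <= 1000.0)
below_1k_log = sum(1 for f in log if f <= 1000.0)
```

With these numbers the equal-interval layout places only its first bin at or below 1 kHz, while the logarithmic layout places six of its eleven bins there, so the two layouts distribute their resolution very differently across the audible range.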
A sufficiently short pulse arriving at the cochlea does transform into all the cochlear frequency bins. However, it has been shown experimentally that the distribution of extracted frequency coefficients doesn't depend in a detectable way on the rise time or other shape-related parameters of the pulse - as long as the pulse duration is much shorter than the period of the highest detectable sinusoidal frequency (approximately 50 μs for humans, i.e. one cycle at 20 kHz).
What matters is the mechanical momentum transferred and the timing of the pulse. What it means is: if an audio delivery system preserves these two characteristics of such pulses, a human will hear on such a system the transients he or she would hear in a high-end studio mastering room while listening to the master recording.
Unfortunately, as we've seen with examples of cymbal sounds captured at a 176,400 Hz sampling rate, downsampling such signals to 44.1 kHz may sometimes result in the mechanical momentum and timing of such pulses not being preserved, which introduces transient distortion.
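For anyone who wants to check this on their own material, here is a minimal sketch of how one might measure the two quantities before and after downsampling. It makes several assumptions: "momentum" is approximated by the area under the pulse (sum of samples times the sample period), "timing" by the centroid of the pulse, and the decimator is a deliberately crude 4-sample block average rather than the long low-pass filter a real sample-rate converter would use. All function names are hypothetical.

```python
FS_HI = 176_400.0  # source sampling rate, Hz
FACTOR = 4         # 176.4 kHz -> 44.1 kHz

def block_average_decimate(x, factor):
    """Crude decimator (assumption): average each block of `factor` samples."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

def area(x, dt):
    """Area under the sampled pulse, used here as a proxy for momentum."""
    return sum(x) * dt

def centroid_time(x, dt, offset=0.0):
    """Amplitude-weighted mean time of the samples, in seconds."""
    total = sum(x)
    return sum((i * dt + offset) * v for i, v in enumerate(x)) / total

dt_hi = 1.0 / FS_HI
dt_lo = FACTOR * dt_hi

# A one-sample pulse at 176.4 kHz, sitting on the first sample of a block.
x = [0.0] * 32
x[12] = 1.0

y = block_average_decimate(x, FACTOR)

area_hi = area(x, dt_hi)
area_lo = area(y, dt_lo)

t_hi = centroid_time(x, dt_hi)
# Each output sample is placed at the centre of the block it averages.
t_lo = centroid_time(y, dt_lo, offset=(FACTOR - 1) / 2 * dt_hi)

timing_error_us = (t_lo - t_hi) * 1e6
```

In this toy case the area survives the decimation unchanged, but the centroid moves by 1.5 source samples (about 8.5 μs), because the crude decimator can only place the pulse on its own 22.7 μs grid. A properly designed resampler will behave differently, so treat the numbers as an illustration of the measurement, not as a verdict on 44.1 kHz.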
It appears that using 192/24 PCM throughout the audio delivery chain ensures that the momentum and timing of transients are preserved with sufficient accuracy for the overwhelming majority of music recordings. Some people who record large symphony orchestras argue that 384/24 works even better for their use cases.