If it takes time to move, then it seems like it is analogous to shaking a bowling ball?
But the ear can hear up to 20kHz, so it obviously can wriggle around at that rate, which is a period of 0.05 msec, or 50 usec.
One scientist has described it as moving a spring surrounded by corn syrup, but what's actually moving is the thing the spring is attached to (the basilar membrane) together with waves in the corn syrup that surrounds it, and those are what push on the inner hair cells. (The outer hair cells move that way too, but also in different ways, and they do a different job.)
I thought this in particular was a good explanation of how the inner ear works --
This one is also good, though it's longer, less clear for the lay person, and takes less of a look at the active gain control and frequency selectivity system, but it shows the corn syrup analogy:
But it also takes time, even after the hairs move, for the potassium and calcium ions that rush into the hair cell upon hair deflection to depolarize the cell, release the neurotransmitters, and cause an action potential in a neuron, and then for everything to reset so it can happen again.
That's where our time/sound resolution gets a little spotty. In the inner ear we process sounds using two sources of frequency information simultaneously: location on the basilar membrane (which spot on the membrane, and which hair cell attached to it, is most activated by a specific frequency) and nerve firing pattern (the timing pattern of the nerve spikes is tracked and used by the brain). Below 5kHz our nerve spike patterns are phase locked to the signal: they repeatedly fire at the same point in the phase of the frequency being resolved. Above 5kHz the nerve firing process isn't fast enough to remain phase locked. Not coincidentally, when frequencies get above about 5kHz, we can still hear the sounds but we don't perceive them the same way -- they no longer sound like musically related pitch intervals. If you play "Mary Had a Little Lamb" with frequencies only above 5kHz, you won't really hear it as having a melody.
So, our hearing is fast, but even within the range of sounds we can hear, it's not perfectly fast -- our time resolution of frequency breaks down over 4-5kHz.
It is certainly complex, but the idea that attack and timing are important seems more germane than steady-state behaviour.
In fact the signals coming back from the brain to the ear sort of squelch the steady-state sound after a while.
We're not really talking about "steady-state" behavior here, we're talking about periodic signals. And auditory science testing is done with all kinds of different signals, from music to trains of clicks, and clicks are very much pure attack and very momentary.
We know a lot about how the ear/brain responds to fast clicks, and to fast trains and patterns of clicks. We're not as good as 50 usec, never mind 10 usec, when it comes to, say, being able to hear a gap in pulses of broadband noise, or modulation in trains of clicks. 2-3 msec is more like our threshold there. We can process differences between the sound at one ear vs. the other on shorter time frames. But our timing discrimination in other auditory processes is not equally acute.
That's why I say there is more than one element of "speed" in our hearing, and different speed and timing characteristics are involved in different processes. A lot of these inferences look at the fastest number -- 10 usec of ITD or something -- and say, well, that's the speed humans can resolve.
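To put the different "speeds" side by side, here is a minimal Python sketch that converts the figures mentioned in this thread to microseconds and compares them to the 10 usec ITD number. The helper name and the labels are just for illustration, and the values are the rough figures quoted above, not precise thresholds.

```python
def period_us(freq_hz):
    """Length of one cycle, in microseconds, for a frequency in Hz."""
    return 1e6 / freq_hz

# Rough figures quoted in this thread, all in microseconds.
timescales_us = {
    "ITD figure": 10.0,
    "one cycle at 20 kHz": period_us(20_000),
    "one cycle at 5 kHz (phase locking limit)": period_us(5_000),
    "gap detection, low end": 2_000.0,
    "gap detection, high end": 3_000.0,
}

# Print from fastest to slowest, as a multiple of the ITD figure.
for name, t in sorted(timescales_us.items(), key=lambda kv: kv[1]):
    ratio = t / timescales_us["ITD figure"]
    print(f"{name:42s} {t:7.0f} usec  ({ratio:.0f}x the ITD figure)")
```

The point of the comparison is only that the gap-detection figure sits a couple of orders of magnitude above the ITD figure, so neither one alone is "the" speed of hearing.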
Attack and sound envelope characteristics seem to be crucially important contributors to our perception of timbre, and so obviously they're crucial contributors to our experience of music. I don't think anyone would ever say otherwise, certainly I wouldn't. And not just impulse timing but system ringing after impulse, and also the timing of the harmonic spectrum envelope as well as the blended whole.
As I understand it, but I could have misunderstood, we have measurements that cover the performance of equipment in these areas. A 10kHz square wave has a rise time in the nanoseconds, and its first overtone is the third harmonic at 30kHz (an ideal square wave has only odd harmonics), already past the edge of human hearing, so it'll tell us if our system is fast enough to cover speeds we can hear beyond 10 usec. We can look at group delay to see how linear a system is likely to be with respect to harmonic spectrum envelope. We can look at impulse response to see how quickly or not a system stops after an impulse. In the video, IIRC, the guy asserts we don't have measurements capable of addressing these characteristics, and even if we do, we don't have measurements that can resolve them at the speed our ears work. But that's just wrong. We do have these measurements, they do cover these characteristics, and they are as fast as or faster than our hearing.
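For reference, an ideal 50% duty-cycle square wave contains only odd harmonics, with amplitudes falling off as 1/n. The Python sketch below lists where the components of a 10kHz square wave land. The function name and the 20kHz "edge of hearing" constant are my own assumptions for illustration, and a real test signal will only approximate the ideal case.

```python
import math

HEARING_LIMIT_HZ = 20_000  # assumption: nominal upper edge of human hearing

def square_wave_harmonics(fundamental_hz, count):
    """First `count` nonzero harmonics of an ideal square wave.

    Returns (harmonic number, frequency in Hz, amplitude relative to a
    +/-1 square wave), using the Fourier series amplitude 4 / (pi * n).
    """
    harmonics = []
    for k in range(count):
        n = 2 * k + 1  # odd harmonics only: 1, 3, 5, ...
        harmonics.append((n, n * fundamental_hz, 4 / (math.pi * n)))
    return harmonics

for n, freq, amp in square_wave_harmonics(10_000, 4):
    audible = "within" if freq <= HEARING_LIMIT_HZ else "beyond"
    print(f"harmonic {n}: {freq} Hz, amplitude {amp:.3f}, {audible} 20 kHz")
```

So a system that reproduces a 10kHz square wave cleanly is already handling components well above anything we can hear.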
The interaural time difference (ITD) effects are more problematic to measure if we're all going home and listening to music over loudspeakers in our living rooms, because room effects are going to totally swamp channel signal differences at 10 usec. But we can look at channel balances to tell us how linear things are from channel to channel up until the point when they hit the room.
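One way to get a feel for how small 10 usec is in a room: convert the delay into the distance sound travels in that time. This is a rough Python sketch, assuming a speed of sound of about 343 m/s (dry air at roughly 20 C); the function name is just for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 C

def path_difference_mm(delay_us):
    """Distance sound travels, in millimetres, during `delay_us` microseconds."""
    return SPEED_OF_SOUND_M_S * (delay_us * 1e-6) * 1e3

for delay in (10, 50, 2_000):
    print(f"{delay} usec is about {path_difference_mm(delay):.1f} mm of travel")
```

Under that assumption, 10 usec works out to a few millimetres of path length, which is less than you change by shifting your head slightly in the listening chair.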