"Fast" is meaningless. Any transducer that reproduces a given frequency is, by definition, moving at that frequency (speed = frequency). A speaker generating a 1 kHz tone is vibrating back and forth at exactly 1 kHz. There is no such thing as a slow or fast 1 kHz tone.
The point I'd like to hear explained better is not speed, but how transient response, and any resonance after the transient, affects the perception of a "fast impulse response". I can imagine the voice coil in a dynamic driver responding at a given frequency to a transient signal while the impulse propagates through the speaker cone over time. That is distortion of the signal in the time domain, either as a smearing of the transient or as some form of resonance. This distortion does show up in the frequency response, but it is still distortion, and it could contribute to the perception of one speaker (for example, one with a stiff, light beryllium cone) as "fast" or "resolving", and another (for example, one with a soft paper cone and a stiff surround) as "slow" or "soft". Both speakers are reproducing, say, the same 1 kHz transient at the same 1 kHz frequency, but they sound very different. One has a spiky impulse graph with immediate decay; the other's is smeared out over time, with a sloped attack, distortion, and/or a trailing resonance or decay.
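To make the "same frequency, different decay" idea concrete, here is a rough sketch in Python. It treats each driver as an idealized second-order resonator with the same 1 kHz natural frequency and a different Q. That model, the Q values (0.7 and 10), and the function names are my own assumptions for illustration, not measurements of any real beryllium or paper cone.

```python
import math

def impulse_response(f0_hz, q, fs_hz=48000, duration_s=0.03):
    """Impulse response of an idealized underdamped second-order resonator.

    Hypothetical stand-in for a driver; real cones have many modes.
    """
    if q <= 0.5:
        raise ValueError("this closed form assumes an underdamped system (Q > 0.5)")
    w0 = 2 * math.pi * f0_hz            # natural frequency in rad/s
    zeta = 1 / (2 * q)                  # damping ratio
    wd = w0 * math.sqrt(1 - zeta ** 2)  # damped ringing frequency in rad/s
    n = round(fs_hz * duration_s)
    return [math.exp(-zeta * w0 * k / fs_hz) * math.sin(wd * k / fs_hz)
            for k in range(n)]

def envelope_decay_time_s(f0_hz, q, drop_db=60.0):
    """Time for the exponential envelope exp(-zeta*w0*t) to fall by drop_db."""
    w0 = 2 * math.pi * f0_hz
    zeta = 1 / (2 * q)
    return (drop_db / 20) * math.log(10) / (zeta * w0)

def ringing_time_s(h, fs_hz, drop_db=60.0):
    """Time of the last sample still within drop_db of the response's peak."""
    peak = max(abs(x) for x in h)
    threshold = peak * 10 ** (-drop_db / 20)
    last = max((k for k, x in enumerate(h) if abs(x) >= threshold), default=0)
    return last / fs_hz

if __name__ == "__main__":
    fs = 48000
    # Assumed Q values: 0.7 for a well-damped driver, 10 for a ringy one.
    for label, q in (("well damped", 0.7), ("ringy", 10.0)):
        h = impulse_response(1000, q, fs)
        print(label,
              "analytic decay (ms):", 1000 * envelope_decay_time_s(1000, q),
              "measured ringing (ms):", 1000 * ringing_time_s(h, fs))
```

In this toy model the decay time scales directly with Q, so the two "drivers" share a tuning frequency yet one stops almost immediately while the other keeps ringing. Whether that maps onto what people hear as "fast" or "slow" is exactly the open question.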
Why wouldn't graphing or quantifying these aspects of transient response be useful? Why wouldn't we want to do so and correlate the results, if possible, with either subjective or objective measures of fidelity? Do we (or can we) get a useful picture of this aspect from any of our other routine measurements? Surely a THD percentage is too blunt a measurement to provide a useful analysis of this? Why aren't impulse response graphs or square waves useful for doing this?