Assuming we're listening to music and not pink noise, I find it's all about a system's ability to play those brief transient musical peaks with no distortion.
I won't argue against low distortion.
For example, for some number of milliseconds (microseconds?) the attack of a well-recorded snare drum is orders of magnitude louder than just about anything else likely to be happening in your music. That's the sort of trick that requires efficient speakers and/or absolute gobs of power (and, I guess, an impressive slew rate?) to pull off in a convincing way.
The snare may be perceptually louder, but it is unlikely to be prompting the amplifier to output more power.
The slew rate of any competent amplifier should be sufficient to handle the frequencies and signal slope.
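As a rough sanity check on the slew rate point, here is a minimal sketch using the standard full-power sine relation, slew rate = 2 * pi * f * V_peak. The 100 W into 8 ohm rating and the 20 kHz test frequency are assumptions chosen for illustration, not figures from any particular amplifier.

```python
import math

def peak_voltage(power_w, load_ohms):
    """Peak voltage of a sine wave delivering power_w (average) into a resistive load."""
    v_rms = math.sqrt(power_w * load_ohms)
    return v_rms * math.sqrt(2)

def required_slew_rate_v_per_us(freq_hz, v_peak):
    """Maximum slope of a sine wave, 2*pi*f*V_peak, converted to volts per microsecond."""
    return 2 * math.pi * freq_hz * v_peak / 1e6

# Assumed example: a 100 W / 8 ohm amplifier reproducing a full-level 20 kHz sine.
v_peak = peak_voltage(100.0, 8.0)
slew = required_slew_rate_v_per_us(20_000.0, v_peak)
print(f"peak voltage: {v_peak:.1f} V, required slew rate: {slew:.2f} V/us")
```

Under those assumptions the requirement works out to about 5 V/µs, and real music rarely carries full-level content at 20 kHz, so the practical demand is lower still.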
An amplifier's output voltage follows the waveform at its input, scaled by the amplifier's gain.
Since most modern music is mastered right up to the limits of the digital medium, I don't see a different sound triggering a burst of power magnitudes greater than what came just before it.
A bass guitar note that reaches 0 dBFS will drive the amplifier to the same peak voltage as a snare hit that reaches 0 dBFS. Into the same speaker load, that implies the same instantaneous power output.
There are perceptual differences between the bass and snare, because of the different frequencies involved, and their concentration and timing, but if I see the same voltage level produced by different "sounds" in the recording, I don't see any of them asking the amplifier to output "absolute gobs of power" beyond what it was already doing on different sections of the music.
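To put rough numbers on that, here is a small sketch comparing a sustained bass tone with a short snare-like burst, both normalized to peak at 0 dBFS. Everything in it is an assumption made for illustration: a perfectly linear amplifier, a purely resistive 8 ohm load, a hypothetical 40 V output at digital full scale, a 60 Hz sine standing in for the bass, and decaying noise standing in for the snare.

```python
import math
import random

SAMPLE_RATE = 48_000      # samples per second
FULL_SCALE_VOLTS = 40.0   # assumed amplifier output voltage at 0 dBFS
LOAD_OHMS = 8.0           # assumed purely resistive speaker load

def normalize(samples):
    """Scale the signal so its largest absolute sample sits at 1.0 (0 dBFS)."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]

def bass_note(freq_hz=60.0, seconds=0.5):
    """Sustained sine wave standing in for a bass guitar note."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def snare_hit(seconds=0.5, decay_s=0.03, seed=0):
    """Exponentially decaying noise burst standing in for a snare hit."""
    rng = random.Random(seed)
    n = int(SAMPLE_RATE * seconds)
    return [rng.uniform(-1.0, 1.0) * math.exp(-i / (SAMPLE_RATE * decay_s))
            for i in range(n)]

def peak_and_mean_power(samples):
    """Instantaneous power is V^2 / R; return its maximum and its average."""
    powers = [(s * FULL_SCALE_VOLTS) ** 2 / LOAD_OHMS for s in samples]
    return max(powers), sum(powers) / len(powers)

bass_peak, bass_mean = peak_and_mean_power(normalize(bass_note()))
snare_peak, snare_mean = peak_and_mean_power(normalize(snare_hit()))

print(f"bass : peak {bass_peak:.1f} W, mean {bass_mean:.1f} W")
print(f"snare: peak {snare_peak:.1f} W, mean {snare_mean:.1f} W")
```

With both signals peaking at the same digital level, the peak instantaneous power comes out the same for each. In this toy model the snare's average power over the half second is far lower than the bass note's, because the burst dies away so quickly.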