Cosmik,
It's a good question. The issue is a lack of standard tests that exercise amps fully. Here are some reasons two solid state amps that measure the same THD and noise at, say, 100 watts into a resistor could still sound different, without the difference being caught by your "typical and easy and safe" audio tests:
the input impedance is not the same, so the FR with a given source will not be the same
the output impedance is not the same, so the FR into a real speaker load will not be the same
the noise does not vary the same with output level
one amp has more or less feedback than the other, so one sounds thinner than the other
the power reserves of one are barely able to meet the rated power output, while the other has several dB of headroom for transients
the group delay between the two amps is different
the harmonic spray is different, which gives audible differences in sound
one amp's feedback network does not handle a fast transient without creating more harmonics
one clips differently than the other: one soft, one hard
channel separation is quite different across the band
THD and IMD are quite different across the audio band
damping factor is quite different between them
slew rate is different between them
one amp can meet full power with both channels driven, the other only when one channel is driven
the ability to handle reactive loads is different
crossover distortion at low levels is different; this is where the Benchmark power amp shines way above a lot of much more expensive amplifiers
total bandwidth is different
the amps' distortion varies differently with power and load
one amplifier is 20 or 50 times the price of the other, so it always sounds better on sighted tests ahahahhah
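
To put rough numbers on a few of the points above (output impedance, damping factor, headroom), here is a minimal Python sketch. Everything in it is an assumption for illustration: the two output impedances, the speaker impedance values, and the function names are made up, and the speaker is treated as a plain resistance at each frequency, which ignores phase.

```python
import math

def damping_factor(z_out_ohms, z_ref_ohms=8.0):
    # Damping factor is conventionally quoted as reference load / output impedance.
    return z_ref_ohms / z_out_ohms

def level_at_speaker_db(z_load_ohms, z_out_ohms):
    # Simple voltage divider between the amp's output impedance and the load.
    # Assumption: impedances are treated as real magnitudes (no phase).
    return 20.0 * math.log10(z_load_ohms / (z_load_ohms + z_out_ohms))

def response_spread_db(z_out_ohms, load_impedances):
    # How much the level varies as the speaker impedance swings with frequency.
    levels = [level_at_speaker_db(z, z_out_ohms) for z in load_impedances]
    return max(levels) - min(levels)

def headroom_db(p_peak_watts, p_rated_watts):
    # Power ratio expressed in dB.
    return 10.0 * math.log10(p_peak_watts / p_rated_watts)

# Hypothetical speaker impedance magnitudes (ohms) at a few frequencies.
speaker_ohms = [4.0, 8.0, 30.0]

# Hypothetical output impedances (ohms) for two amps with identical THD specs.
for name, z_out in [("amp A", 0.01), ("amp B", 0.5)]:
    print(name,
          "damping factor:", round(damping_factor(z_out), 1),
          "FR spread (dB):", round(response_spread_db(z_out, speaker_ohms), 3))

# Hypothetical amp that can deliver 200 W on peaks against a 100 W rating.
print("headroom (dB):", round(headroom_db(200.0, 100.0), 2))
```

With these made-up values the higher output impedance amp shows a bit under 1 dB of response variation into the same speaker, while the other stays within a couple of hundredths of a dB. None of that shows up in a THD figure taken into a resistor.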