Yes, I'm assuming doing both what I mention in post #221 above (which you quote in part) and what I suggested previously in post #217. And I'm assuming rigour. There's no point otherwise. The rest is blather (or from another perspective, politics).
But the steps in the process I suggest would be:
1. Identify an amp thought to be, say, “impulsive”.
2. Identify an alternative amp thought not to be “impulsive”.
3. Identify a control amp of unknown impulsiveness.
4. Set up a controlled comparison that exactly matches levels, and that keeps all amps in their specified operational range.
5. It may turn out that the adjectives emerge from clipping behavior, which might suggest an alternative setup. For that, I would record the sound of each amp through reference speakers in a way that is identical for all amps, and then play back the recordings in a level-matched ABX test. The amps would be overdriven to clipping some carefully controlled percentage of the time, say 1%. Arguments would ensue about the choice of reference speakers, the recording methodology, the recording medium, and the phase of the Moon, of course.
6. Determine that perceived differences in amps are statistically valid when compared blind.
7. Only then can one attempt to correlate perceived differences with what one might see on an AP analyzer.
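To put rough numbers on steps 5 and 6, here is a minimal Python sketch, not a definitive method. It assumes the usual ABX null hypothesis that a listener who hears no difference guesses right half the time, and it assumes "clipping 1% of the time" can be approximated as the fraction of recorded samples at or above a chosen clip level. The function names `abx_p_value` and `clipped_fraction`, and the sample values, are made up for illustration.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided probability of scoring at least `correct` out of `trials`
    by pure guessing (chance of a right answer assumed to be 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the upper tail of the binomial distribution with p = 0.5.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def clipped_fraction(samples, clip_level: float) -> float:
    """Fraction of samples whose magnitude reaches or exceeds `clip_level`.
    This is an assumed stand-in for 'percentage of the time spent clipping'."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples given")
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return clipped / len(samples)


if __name__ == "__main__":
    # Hypothetical session: 12 correct identifications in 16 ABX trials.
    print(f"p = {abx_p_value(12, 16):.4f}")
    # Hypothetical normalized samples with the clip level set at full scale.
    print(f"clipped = {clipped_fraction([0.1, -1.0, 0.5, 1.0], 1.0):.0%}")
```

A small p-value only says the listener was probably not guessing. It says nothing about which amp is better or why, which is the reason step 7 has to wait.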
Rick “always suspecting that clipping behavior contributes to these impressions” Denney