Well, we have used several different models as reference amplifiers, alongside my old-school and very bulky class A amplifier.
For example, we used the Hypex nCore NC500 OEM and the ICEpower 1200AS2 to compare the listening experience with our own design.
Of the two, the nCore has the better measured performance, but the ICEpower is a full-bridge design, whereas the nCore is a half-bridge design.
This leads to different behavior in the real world. A full bridge does not suffer from power supply pumping, because the current it draws from the supply never reverses, but it is noisier, since the output noise comes from two amplifiers in bridged mode.
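A minimal sketch of why supply pumping hits the half bridge but not the full bridge. It assumes ideal lossless switches, a resistive load, a sine-wave output and switching-period-averaged currents. The function names and the example values (40 V peak into 4 Ω, 50 V rails, 20 Hz, 10 mF per rail) are hypothetical and are not taken from any of the amplifiers above. It integrates the charge pushed back into the supply over one signal cycle. Dividing that charge by the rail capacitance gives a rough worst-case rail rise, assuming the rectifier supplies nothing while the rail is being pumped.

```python
import math

def supply_current(v_out, r_load, v_rail, topology):
    """Switching-period-averaged current drawn from the positive supply.

    Ideal, lossless stage with a resistive load (hypothetical model).
    half: +/- v_rail rails, duty D = (1 + v_out / v_rail) / 2, the positive
          rail sources D * i_load, which goes negative when i_load does.
    full: single v_rail across the bridge, the supply sources (2D - 1) * i_load,
          which simplifies to v_out * i_load / v_rail and is never negative.
    """
    i_load = v_out / r_load
    if topology == "half":
        duty = (1 + v_out / v_rail) / 2
        return duty * i_load
    return v_out * i_load / v_rail

def pumped_charge(v_peak, r_load, v_rail, f_sig, topology, n=10_000):
    """Charge (coulombs) pushed back into the supply over one signal cycle."""
    dt = 1 / (f_sig * n)
    charge = 0.0
    for k in range(n):
        v_out = v_peak * math.sin(2 * math.pi * k / n)
        i_supply = supply_current(v_out, r_load, v_rail, topology)
        if i_supply < 0:  # current flowing back into the rail capacitor
            charge -= i_supply * dt
    return charge

half = pumped_charge(40, 4, 50, 20, "half")
full = pumped_charge(40, 4, 50, 20, "full")
print(f"half bridge: {half * 1e3:.1f} mC -> about {half / 10e-3:.2f} V rise on 10 mF")
print(f"full bridge: {full * 1e3:.1f} mC")
```

Integrating the same half-bridge model analytically over the negative half-cycle gives Q = V_p / (2π f R) · (1 − π V_p / (4 V_rail)). That expression shows why pumping is worst at low frequencies and into low impedances.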
They also sound different, although on paper both should be transparent (see the datasheets).
The 1200AS2 combines a single stage PFC converter with a 2x 1200 W high performance, ICEedge based class D amplifier. Besides extreme audio performance and power it also features monitor outputs for amplifier temperature, voltage and current output. The DC hanger bus can be used for powering...
icepower.dk
The Hypex NC500 OEM is an extremely high-quality audio power amplifier module which operates in class D. Output power: 2Ω - 550W; 4Ω - 700W; 8Ω - 400W
www.hypex.nl
Our own design is closer to the nCore, since it is also a half-bridge design, but it has much more gain (28 dB), so it can be driven directly by, e.g., a unity-gain buffer or a strong preamp if wanted.
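To put the 28 dB figure in perspective, here is a small calculation of the RMS input voltage needed for a given output power. The 400 W into 8 Ω target and the 20 dB comparison gain are hypothetical numbers chosen for illustration, not specifications of our design or of the modules above. The calculation assumes a sine-wave signal and a purely resistive load.

```python
import math

def input_for_full_power(p_out_w, r_load_ohm, gain_db):
    """RMS input voltage needed to reach p_out_w into r_load_ohm.

    Assumes a sine-wave signal and a purely resistive load.
    """
    v_out_rms = math.sqrt(p_out_w * r_load_ohm)
    return v_out_rms / 10 ** (gain_db / 20)

# Hypothetical target: 400 W into 8 ohm.
for gain in (20.0, 28.0):
    v_in = input_for_full_power(400, 8, gain)
    print(f"{gain:.0f} dB gain: {v_in:.2f} V rms input for full power")
```

The higher the gain, the smaller the swing the source has to deliver. This is why a unity-gain buffer or a preamp can drive the amplifier directly.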
I do believe in good measured performance; I've never heard hi-fi gear with problematic measurements deliver a superior audio experience. But it is also my experience that you cannot always boil everything down to measurements alone.
Different design principles can also contribute to the final experience, even though they may have almost equal performance in the lab.
The half-bridge vs. full-bridge example above is just one of these: they may sound different simply because they are different designs, yet they do not differ much in the test lab.
Some other class D designs perform very well because of, e.g., very low dead-time, which is a very important aspect of class D design.
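Excessive dead-time leads directly to distortion. While both switches are off, the switch-node voltage is set by the direction of the output current rather than by the modulator, so the output picks up an error that flips sign with the current. Dead-time should therefore be kept low, but this is not easy to do.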
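A rough first-order sketch of that mechanism. It assumes a half bridge on ±v_rail with ideal switches, and it assumes the inductor current never reverses within a switching period. Under those assumptions, each period loses t_dead of the intended switch-node level, which averages to an error of 2·v_rail·t_dead·f_sw opposing the current. The operating point (50 V rails, 20 ns dead-time, 400 kHz switching, 40 V peak) and the function names are hypothetical. Real stages with ripple current soften the error near zero crossings, so this sketch overstates it there.

```python
import math

def dead_time_error(i_load, v_rail, t_dead, f_sw):
    """Average switch-node error (V) caused by dead-time, opposing the current."""
    if i_load == 0:
        return 0.0
    return -math.copysign(2 * v_rail * t_dead * f_sw, i_load)

def thd_with_dead_time(v_peak, r_load, v_rail, t_dead, f_sw, n=4096, max_harmonic=19):
    """THD (as a ratio) of one sine period distorted only by dead-time error."""
    samples = []
    for k in range(n):
        v_ideal = v_peak * math.sin(2 * math.pi * k / n)
        # Only the sign of the load current matters in this model.
        samples.append(v_ideal + dead_time_error(v_ideal / r_load, v_rail, t_dead, f_sw))

    def amplitude(h):
        # Single-bin DFT: peak amplitude of harmonic h.
        re = sum(s * math.cos(2 * math.pi * h * k / n) for k, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * h * k / n) for k, s in enumerate(samples))
        return 2 * math.hypot(re, im) / n

    harmonics = [amplitude(h) for h in range(2, max_harmonic + 1)]
    return math.sqrt(sum(a * a for a in harmonics)) / amplitude(1)

thd = thd_with_dead_time(v_peak=40, r_load=4, v_rail=50, t_dead=20e-9, f_sw=400e3)
print(f"THD from dead-time alone (open loop): {thd * 100:.2f} %")
```

Because the error is a fixed-size step, its relative effect grows as the signal level drops. This is one reason dead-time distortion matters at normal listening levels.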
First of all, insufficient dead-time leads to shoot-through in the switching devices, which is fatal within milliseconds, and very low dead-time raises the risk of exactly that. Too little dead-time also produces excessive heat, so if you consider reliability important, the dead-time has to be sufficient, at the cost of higher distortion.
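A simplified timing check for the shoot-through risk. Dead-time has to cover the worst case where the outgoing switch is slowest to stop conducting and the incoming switch is quickest to start. The model and all timing values (in ns) are hypothetical datasheet-style numbers, and the sketch ignores temperature, gate-driver skew and layout effects.

```python
def dead_time_margin_ns(t_dead_ns, t_off_delay_max_ns, t_fall_max_ns, t_on_delay_min_ns):
    """Worst-case gap between one switch stopping and the other starting.

    A negative margin means both switches conduct at once (shoot-through).
    """
    outgoing_stops = t_off_delay_max_ns + t_fall_max_ns
    incoming_starts = t_dead_ns + t_on_delay_min_ns
    return incoming_starts - outgoing_stops

for t_dead in (15, 25):
    margin = dead_time_margin_ns(t_dead, t_off_delay_max_ns=25, t_fall_max_ns=10, t_on_delay_min_ns=15)
    status = "shoot-through risk" if margin < 0 else "OK"
    print(f"dead-time {t_dead} ns: margin {margin} ns ({status})")
```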
The solution to this problem is feedback. Advanced higher-order feedback loops can reduce the distortion stemming from dead-time to very low values, which in turn allows a safer dead-time that significantly reduces the stress placed on the switches and adjacent components, thus improving reliability.
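A toy illustration of why higher loop order helps. Distortion generated inside a feedback loop is reduced by roughly |1 + T(f)|, where T is the loop gain. The sketch compares an idealized first-order loop T = f_u/(jf) with a second-order loop T = (f_u/(jf))² at the same hypothetical 50 kHz unity-gain frequency. This is a magnitude-only model: a real higher-order loop needs zeros near crossover to stay stable, which this ignores.

```python
import math

def distortion_suppression_db(f_hz, f_unity_hz, order):
    """Suppression |1 + T(f)| in dB for an idealized loop T = (f_u / (j f)) ** order."""
    t = (f_unity_hz / (1j * f_hz)) ** order
    return 20 * math.log10(abs(1 + t))

for f in (1_000, 10_000):
    first = distortion_suppression_db(f, 50_000, 1)
    second = distortion_suppression_db(f, 50_000, 2)
    print(f"{f} Hz: 1st order {first:.1f} dB, 2nd order {second:.1f} dB")
```

With the same crossover frequency, the second-order loop roughly doubles the suppression in dB across the audio band.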
These two ways of reducing distortion also sound different, even though the measured performance suggests that they should not.
In this example, one could say that it is "clever engineering" vs. "precision engineering".
Low levels of distortion, Zout, noise and any other unwanted signals are always preferable, but they will not always give you exactly what you expect.
This is nothing spooky; it is probably just something that doesn't show up significantly in a normal set of measurements and is yet to be discovered, I think. Or what do I know.