In my experience, having made hundreds of commercial recordings (but admittedly a long time ago) and listened to them at work and at home:
Virtually none of it is the signal. Virtually all of it is the speakers and the room - and they must be considered together. Think of a headphone for a minute - there's a driver and a cup working together, and nothing else. Your speaker is the driver, and your room is the cup, working together, and there's nothing else.
What creates the "best stereo image" or "soundstage" is tight pair matching between the speakers, and the absence of local aberrations at the speaker positions, such as port chuffing, panel resonances, and so on. Position the speakers with an open mind, not a pre-planned scheme. Treat the room so that reflections are well under control.
Such measures will provide the best platform to examine the recording. As @Blumlein 88 notes above, some are spectacular - I made a few crossed-pair Blumlein recordings which are staggering - and some are pretty bad, but as long as there was some panning and potting going on, there will be some kind of image.
To answer the question, therefore, the metrics would be a measurement of the pair matching, and of any non-musical rattling, buzzing, hooting or smearing from the cabinets. Sadly, almost no one tests for either.
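For what it's worth, here is a rough sketch of how a pair-matching figure could be pulled out of two frequency response measurements. It assumes you already have the left and right magnitude responses in dB, taken at the same frequencies and under the same conditions. The function name `pair_matching`, the returned field names, and the choice to report the average level offset separately from the remaining shape mismatch are all my own assumptions, not an established standard.

```python
import math


def pair_matching(left_db, right_db):
    """Compare two magnitude responses (in dB) sampled at the same frequencies.

    Hypothetical helper: returns the average level offset between the two
    speakers, plus the worst-case and RMS mismatch once that offset is removed.
    """
    if len(left_db) != len(right_db) or not left_db:
        raise ValueError("responses must be non-empty and the same length")

    # Point-by-point difference between left and right, in dB
    diffs = [l - r for l, r in zip(left_db, right_db)]

    # A constant offset is just a gain difference; report it separately
    offset = sum(diffs) / len(diffs)
    residual = [d - offset for d in diffs]

    worst = max(abs(d) for d in residual)
    rms = math.sqrt(sum(d * d for d in residual) / len(residual))
    return {"offset_db": offset, "worst_db": worst, "rms_db": rms}


if __name__ == "__main__":
    # Made-up example values, not real measurements
    left = [84.0, 85.5, 86.0, 85.0]
    right = [84.5, 85.0, 86.5, 84.0]
    print(pair_matching(left, right))
```

The smaller the worst-case and RMS figures, the tighter the pair. This says nothing about cabinet noises such as rattles or port chuffing, which would need a separate test.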