What my research, and that done at Harman, was focused on was sound quality. If a loudspeaker is colored or distorted who cares what the spatial presentation is like? All loudspeakers of all possible directional configurations were evaluated for sound quality, and the prime requirement - neutrality/absence of resonances - was equally obvious in measurements made on all directivity options. How this fits into differing listener expectations for "soundstage and imaging" is a second level of judgment, where the consistency of directivity - whatever it is - is likely the key factor. So to the extent that there is "standardization" it is really only an attempt to ensure timbral accuracy of the reproduced sound, so that if a listener hears something he/she does not like - don't blame the loudspeaker. The recordings are also "weak links". Unfortunately, in a stereo reproduction it is difficult to alter "spatial" factors in loudspeakers that are affordable - active array designs are elegant solutions for the price of a decent car. So, "gasp", upmixing remains a viable option for most recorded material. That requires an upmixer that does not corrupt the soundstage or timbre, and that is a whole different discussion.
How can an upmixer not corrupt the soundstage?
2-channel stereo puts sound sources, reflections and reverb on or between the speakers.
Is there an upmixer that can put only the reverb of a symphony hall behind the listener, the audience only to the sides and behind him, the side-wall and ceiling reflections only at the sides and ceiling, and only the phantom sources and frontal reflections in front of the listening spot?
The reproduction of recorded music requires a certain level of abstraction from the listener, just like a photograph. The stereo illusion makes it more realistic and attractive to the ear, but a good mono recording is perfectly capable of conveying a sense of space and of providing a powerful and emotionally charged listening experience. I get it from my Tivoli mono tabletop radio, in spite of its warm and dark tonal balance, limited dynamics and curtailed bandwidth, and I listen mostly to orchestral music.
Some people find 2-channel stereo overly lacking, probably because of the absence of visual cues and limited "envelopment" or "immersiveness", and they will chase multi-channel or even audio-video. For others 2-channel is sufficient. Different types of speaker topology will produce different effects, with different levels of "envelopment" and "spaciousness", image sharpness and tonal balance. The wider the dispersion, the more chance of impacting the tonal balance and of masking the recorded ambience cues, because not all rooms are symmetrical, not all furniture layouts are symmetrical or absorb/reflect different frequencies in a similar manner, not every system is set up symmetrically in the room, and sometimes the listening spot is not even at the apex.
Some years ago I remember quizzing audiophiles in 2 different forums about their listening setup and habits. Hardly anyone was listening to a multi-channel system (perhaps because the sample was European, where houses are smaller), but what I found most surprising was that around half of the participants weren't sitting at the apex, and some even had the system on a side wall.
Going back to the beginning, different mics and mic'ing setups produce different results in terms of tonal balance, the amount of mechanical sound captured, the direct/reflected sound ratio, etc. Heavily processed studio mixes are often manipulated not to sound real but to produce a particular effect or presentation. Bad mastering practices not only compress the dynamic range but also affect the tonal balance. And because some genres are more affected than others, it is legitimate to pursue a presentation which makes the sound of those recordings more pleasant or tolerable, particularly when tone controls are no longer standard in amplifiers and most people don't use other forms of EQ. The overall balance of the system also depends on taste, with less demanding people generally preferring more bass and more treble than neutral. Some artifacts or distortions make the sound of dry or poor recordings more exciting or interesting. And depending on the listener's priorities in terms of spatial illusion, he will choose narrow, wide, dipole or omni.
In a perfect world of symmetrical rooms and audiophile recordings perhaps standardisation might make sense. But reality is a lot more challenging.