Well, I frankly don't understand your point. I found this graph, which I believe you posted and which is the basis of your argument.
This is a study done by Floyd Toole in 1983 where he shows the sound quality and spatial ratings of three loudspeakers in mono and stereo.
The graph tells us the listeners were less discriminating in stereo than in mono, for both sound quality and spatial quality. Most of the difference in spatial ratings between tests can be attributed to the Quad, a dipole loudspeaker.
In mono, its colorations and higher directivity produced lower fidelity and spatial ratings. In stereo, the colorations were apparently less audible, and it received higher sound quality and spatial ratings. As Floyd shows in the paper, the Quad's spatial ratings varied significantly with program material (unlike the others) where independent ratings were given. For choral and pop recordings its spatial ratings were last, and approximated the ratings it was given in mono. I wonder if its sound quality ratings also fluctuated with program. Unfortunately, the paper doesn't show this data, but if timbre ratings tracked spatial ratings, that would indicate they are not independent.
First, it seems a bit selective to discount mono tests based on one test and one loudspeaker. I don't think any conclusions or generalizations can be made from one sample. I don't know if separate tests were done for sound quality and spatial ratings, so that the ratings were independent judgements. This would have been more work but would have minimized potential halo bias effects (ratings for different attributes tend to be highly correlated with preference). In other words, did the change in timbre across tests influence the spatial ratings?
Also, the results of this test did not convince Floyd to abandon mono testing and do tests in stereo. In fact, if anything, it convinced him to stick to mono and abandon stereo, because mono tests are more sensitive. The spatial differences in the stereo tests were minimal, largely isolated to one loudspeaker, and largely variable and attributed to the recordings.
His conclusions:
"Conducting the listening tests in monophonic and stereophonic modes revealed some important similarities and some interesting contrasts in the results. In general, assessments of sound quality were very similar in both modes, except for loudspeakers with significant imperfections, in which case the monophonic evaluations result in lower ratings than the stereophonic tests. In other words, in respect of judging sound quality - the transduction accuracy of the loudspeaker - monophonic tests appear to be more demanding, or stereophonic tests less sensitive."