Very interesting. This might be related to why, when Harman evaluated the Quad electrostatic in mono, its score was terrible: not enough reflections for the narrow-pattern Quad to be enjoyable. Then when they evaluated it in stereo its score improved significantly, BUT it still ranked third out of the three which were being compared. Toole draws the conclusion that the ranking is not changed by stereo listening but is more differentiated in mono listening, so mono listening is preferred for evaluation purposes. And in general I agree... BUT one can also draw the conclusion that mono listening penalizes some types of loudspeakers, as evidenced by the significant (though not decisive) rise in the Quad's score when auditioned in stereo.
In my opinion, in a home audio listening room there is, in effect, a COMPETITION between two sets of spatial cues:
1) The (highly desirable) spatial cues on the recording, whether they be real or engineered or both; and
2) The (generally undesirable) cues inherent to our relatively small, inadequate playback rooms.
At the risk of oversimplifying, the ear will select whichever set of cues seems most plausible. Unfortunately this tends to be the "small room signature" of our playback rooms. What we'd like is for the spatial cues on the recording to dominate. This is easier said than done, because we have a relative poverty of recording venue cues compared with our playback room's inherent small-room-signature cues. [I can hear Kal Rubinson saying, "There's a fix for that, ya know..."]
Simply absorbing or otherwise nullifying all reflections in the playback room doesn't work. In order to most effectively convey a sense of immersion in the acoustic space on the recording, we need to have high quality (spectrally correct) reflections arriving from many directions. The multi-channel guys got THAT right!! So if we're doing two-channel, then we NEED a lot of spectrally-correct in-room reflections in order to effectively present the recording venue cues from many directions.
But it's not quite that simple. We need to MINIMIZE the "small room signature" conveyed by the playback room, without simultaneously minimizing all those desirable spatial cues on the recording. Remember, we want the recording venue cues to dominate so we can enjoy that delicious sense of envelopment, assuming a suitable recording.
My understanding is that the EARLIEST reflections play a disproportionately large role in conveying the size of a room. The reverberation tail also plays a role. We want a fairly long (but not too long) reverberation tail because it conveys ambience cues from the recording, so we need to focus on minimizing the early reflections while encouraging the later ones. Done correctly, this results in the spatial cues which are on the recording dominating over the otherwise default "small room signature" cues. This is something good multichannel does very well... better than we can do with two-channel, but imo we can still make worthwhile improvements.
(One source I came across, and unfortunately did not record, stated that the ear looks at the time gap in between the first-arrival sound and the "center of gravity" of the reflections to judge the size of the room. To the extent that we can push that "center of gravity" back in time, we can reduce "small room signature", and unmask the spatial cues in the recording. I find this paradigm to be particularly useful, just wish I could remember where it came from.)
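To make the "center of gravity" idea concrete, here is a minimal Python sketch. It assumes (my assumption, not something the forgotten source necessarily said) that the center of gravity is an energy-weighted average of the reflection delays, measured from the first-arrival sound. This is in the spirit of the "centre time" measure used in room acoustics, though whether that is what the source meant is a guess. The function name and the reflection lists are hypothetical, made up purely for illustration.

```python
def reflection_centroid_ms(reflections):
    """Energy-weighted mean delay of a set of reflections.

    reflections: list of (delay_ms, level_db) pairs, where delay_ms is the
    arrival time behind the first-arrival sound and level_db is the level
    relative to the first-arrival sound.
    ASSUMPTION: energy weighting (10 ** (dB / 10)) is my interpretation of
    "center of gravity"; the original source is not known.
    """
    if not reflections:
        raise ValueError("need at least one reflection")
    # Convert each relative level in dB to a relative energy.
    energies = [10 ** (level_db / 10.0) for _, level_db in reflections]
    total = sum(energies)
    weighted = sum(delay * e for (delay, _), e in zip(reflections, energies))
    return weighted / total


# Hypothetical room: two strong early reflections and two weaker later ones.
untreated = [(3.0, -6.0), (5.0, -8.0), (20.0, -12.0), (35.0, -15.0)]
# Same room with the two early reflections knocked down by a further 10 dB.
early_reduced = [(3.0, -16.0), (5.0, -18.0), (20.0, -12.0), (35.0, -15.0)]

print(reflection_centroid_ms(untreated))
print(reflection_centroid_ms(early_reduced))
```

With these made-up numbers, weakening only the early reflections moves the centroid later in time, which is the effect described above: the later energy is untouched, but the "center of gravity" shifts away from the first arrival.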
One way we can do this in a two-channel system is to set up along a room diagonal. This geometrically precludes early lateral reflections in the listening area, without reliance on absorption. Floyd Toole used this technique in one of his listening rooms in a house he used to live in. There are other techniques of course; the main point is to be aware of the general principle of avoiding early reflections, while encouraging later ones.
In this context, I consider "early reflections" to be those arriving less than about 10 milliseconds behind the first-arrival sound, and Geddes is my source for this number. Linkwitz suggested 6 milliseconds. There is an EBU standard which mentions 15 milliseconds, which was taken into consideration when Harman designed their speaker-shuffling Multichannel Listening Lab. So there is arguably some uncertainty about exactly where the target lies, but I think the general concept is valid.
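For a feel of what those time windows mean in a real room, the sketch below converts a delay into the extra distance the reflection travels compared with the direct sound, and estimates the delay of a single sidewall bounce using the standard image-source (mirror) construction in plan view. The 343 m/s speed of sound is an assumed round figure for room temperature, and the function names and the speaker and listener positions are hypothetical examples, not anyone's actual setup.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # ASSUMPTION: air at roughly 20 degrees C


def extra_path_m(delay_ms):
    """Extra path length (metres) a reflection travels for a given delay."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0


def sidewall_reflection_delay_ms(speaker, listener, wall_x):
    """Delay of a first-order reflection off a wall lying along x = wall_x.

    speaker and listener are (x, y) positions in metres, plan view.
    Uses the image-source method: mirror the speaker across the wall and
    measure the straight-line distance from that image to the listener.
    """
    sx, sy = speaker
    lx, ly = listener
    direct = math.hypot(lx - sx, ly - sy)
    image_x = 2.0 * wall_x - sx  # speaker mirrored across the wall
    reflected = math.hypot(lx - image_x, ly - sy)
    return (reflected - direct) / SPEED_OF_SOUND_M_S * 1000.0


# Path-length difference implied by each of the thresholds mentioned above.
for threshold_ms in (6.0, 10.0, 15.0):
    print(threshold_ms, "ms ->", round(extra_path_m(threshold_ms), 2), "m")

# Hypothetical layout: speaker 1 m from the side wall, listener 3 m in front.
print(sidewall_reflection_delay_ms((1.0, 0.0), (1.0, 3.0), wall_x=0.0))
```

The takeaway is that a 10 millisecond window corresponds to a reflection path roughly 3.4 m longer than the direct path, so in a typical small room a nearby sidewall bounce lands well inside it unless geometry or treatment intervenes.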
Note that the playback room does not need to have a 100+ millisecond decay time in order to convey the sense of envelopment that we might get in a concert hall, as long as that 100+ milliseconds of decay is on the recording. The playback room is the DELIVERY SYSTEM for that 100+ milliseconds of decay; we just need to get the playback room's own signature out of the way as much as we reasonably can.