I agree: a large performance space is surely impossible to recreate perfectly with two speakers in a small room. A stereo recording made in such a space will at least include reverb tails that give an impression of it.
My understanding is that, in the listening room, there is in effect a "competition" between the "small room signature" of the playback room and the venue acoustics on the recording.
At the risk of oversimplifying, a room's package of spatial signature cues largely consists of three things: the time between the direct sound and the first reflections; the temporal "center of gravity" of the reflections (how much time elapses between the direct sound and the "middle" of the reflections); and the reverberation tails. This applies to the playback room's spatial signature as well as to the recording venue's spatial signature, whether the latter is real, engineered, or both. If the goal is a "you are there" presentation, then we want to suppress and/or disrupt the "small room signature" package of cues while simultaneously preserving/enabling the "venue acoustic signature" on the recording.
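To put rough numbers on the first two cues, here is a minimal Python sketch. It rests on my own assumptions: I'm treating the "center of gravity" as the energy-weighted mean arrival time of the reflections (one reasonable reading, not the only one), and the reflection lists are made-up values for illustration, not measurements of any real room.

```python
def first_reflection_gap_ms(reflections):
    """Time from the direct sound (t = 0) to the earliest reflection, in ms."""
    return min(delay for delay, _ in reflections)


def reflection_center_of_gravity_ms(reflections):
    """Energy-weighted mean arrival time of the reflections, in ms.

    Assumption: energy is taken as amplitude squared, and the direct
    sound itself is left out of the average.
    """
    total_energy = sum(amp ** 2 for _, amp in reflections)
    weighted_sum = sum(delay * amp ** 2 for delay, amp in reflections)
    return weighted_sum / total_energy


# Hypothetical (delay in ms after the direct sound, linear amplitude) pairs.
small_room = [(2.0, 0.5), (4.0, 0.5), (10.0, 0.5)]
# Same room with the two earliest reflections pushed back in time.
delayed_room = [(6.0, 0.5), (8.0, 0.5), (10.0, 0.5)]

for name, room in (("small_room", small_room), ("delayed_room", delayed_room)):
    gap = first_reflection_gap_ms(room)
    center = reflection_center_of_gravity_ms(room)
    print(f"{name}: first gap {gap:.1f} ms, center of gravity {center:.2f} ms")
```

Delaying only the earliest reflections moves both numbers later even though the late arrivals are untouched, which is the kind of shift the approach below is after.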
Here is one possible approach:
Use speakers with a fairly narrow and well-behaved radiation pattern and toe them in severely so that they "miss" the same-side wall. The first significant lateral reflection of each speaker is then the long across-the-room bounce off the opposite-side wall. Then, if the room is fairly shallow, ideally we'd have something like a large wedge-shaped reflector (or perhaps a diffusor) behind the listening area to redirect the first reflections off the back wall away from the listening area, or at least weaken them. We want to do all of this WITHOUT reliance on absorption or excessive diffusion in the first-reflection zones, to preserve the high frequency content in the reflections.
The arrival time of the direct sound is unaffected, but the arrivals of the lateral (and ideally rear) reflections are pushed back in time significantly. The temporal "center of gravity" of the reflections is also pushed back in time. So the net result is a somewhat fuzzy package of playback-room-size cues that does not correspond with reality, resulting (theoretically at least) in a "weakening" of the "small room signature" of the playback room. Imo we do NOT want to absorb the reverberation tails in the playback room because they function as the "carriers" for the reverberation tails on the recording.
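For a feel of how much later the across-the-room bounce arrives, here is a rough Python sketch using the textbook mirror-image (image-source) construction for a single side-wall reflection. Everything specific in it is an assumption for illustration: the 4 m room width, the speaker and listener positions, and a speed of sound of 343 m/s. It only compares path lengths; it does not model speaker directivity, so it says nothing about how much the toe-in weakens the same-side bounce.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature


def side_wall_delays_ms(room_width_m, speaker_xy, listener_xy):
    """Delay (ms, relative to the direct sound) of the first-order
    reflection off each side wall, found by mirroring the speaker.

    Walls are assumed to sit at x = 0 (left) and x = room_width_m (right).
    """
    sx, sy = speaker_xy
    lx, ly = listener_xy
    direct = math.hypot(lx - sx, ly - sy)

    # Mirror the speaker across each wall; the reflected path length equals
    # the straight-line distance from the mirror image to the listener.
    via_left = math.hypot(lx - (-sx), ly - sy)
    via_right = math.hypot(lx - (2.0 * room_width_m - sx), ly - sy)

    def to_ms(path_m):
        return (path_m - direct) / SPEED_OF_SOUND_M_S * 1000.0

    return {"left_wall": to_ms(via_left), "right_wall": to_ms(via_right)}


# Hypothetical layout: 4 m wide room, left speaker 1 m from the left wall,
# listener centered and 3 m back from the speaker plane.
delays = side_wall_delays_ms(4.0, (1.0, 0.0), (2.0, 3.0))
print(f"same-side bounce:     {delays['left_wall']:.2f} ms after direct")
print(f"opposite-side bounce: {delays['right_wall']:.2f} ms after direct")
```

With these made-up dimensions the same-side bounce trails the direct sound by a bit over 3 ms and the opposite-side bounce by a bit under 8 ms, so steering the speaker's energy toward the far wall more than doubles the delay of the first strong lateral reflection.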
Turning now to the "venue" package of spatial cues on the recording: unfortunately the venue's first reflections are not ideally presented by the speakers' direct sound, because the arrival direction is wrong. So we start out with that handicap, though at least the arrival timing is preserved. The venue reflections' "center of gravity" will be effectively presented by the in-room direct sound and subsequent reflections, assuming their spectral content is correct (or close enough). And last but not least, the venue's reverberation tails can do a good job of conveying a sense of venue size (though not venue shape), assuming their spectral content has been (largely) preserved AND they arrive from many different directions. Imo delivering the recording's reverberation tails spectrally intact and from many directions is the function of the in-room reverberation tails. Note that a good multichannel system uses the surround channels to deliver the venue's reverberation tails from many directions, spectrally intact, and sufficiently LOUD to convey an immersive sense of the venue space. Reliance on ONLY the two stereo channels puts a premium on NOT attenuating the in-room reflections too quickly, so that they are hopefully still loud enough, though this aspect IS somewhat recording-dependent.
So if all goes as hoped, the "small room signature" package of cues will have been muddled and weakened because we pushed back the arrival times of many of the first reflections, while the "venue acoustic signature" on the recording, and in particular the "venue size" aspect, will be effectively presented largely due to our preservation of the reverberation tails on the recording. Again if all goes as hoped, the ear/brain system will find the (admittedly imperfect) "venue acoustic signature" package of spatial cues to be more plausible than the somewhat disrupted "small room signature" package of cues, such that the perceived acoustic space will be an approximation of that which is on the recording, assuming the recording contains sufficiently loud venue cues (reflections/reverberation). This would be a "you are there" presentation.
Easier said than done! And obviously still "imperfect" and recording-dependent. Perhaps worthwhile, and perhaps not.
Just to be clear, I am NOT saying this is the ONLY way to arrive at a "you are there" presentation. I'm sure there are smarter people than me who can get there with a wider-pattern speaker than what I have in mind.