It seems like some people are a bit "stuck" in thinking that the physical dimensions of the listening room, and the loudspeakers placed in it, are all there is to creating the soundstage. Maybe they have never had the opportunity to listen to a system set up so that the recorded information is perceptually dominant over the "small room signature" that their listening environment adds. That signature otherwise overshadows the recorded room information and a sense of depth that can extend well beyond the wall boundaries of the listening room.
As most of this has to do with the room information on the actual recording, there is a lot to get right if that information is to reach our ears more clearly. The loudspeakers must be positioned so that the listening environment interferes less destructively. A well-chosen listening position and a shorter distance to the loudspeakers will help increase the ratio of direct to reflected sound, and the reflections from the listening room can be tamed to increase the proportion of direct sound further.
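To put rough numbers on the distance part, here is a minimal sketch using the textbook diffuse-field estimate of critical distance, d_c ≈ 0.057 · sqrt(Q · V / RT60), where direct and reverberant levels are equal. This is my own illustration, not a measurement method: the function names are made up for the example, and the directivity factor, room volume and RT60 values are assumed. Diffuse-field theory also fits small rooms only loosely, so treat the output as a trend rather than a prediction.

```python
import math

def critical_distance(q, volume_m3, rt60_s):
    """Distance (m) at which direct and reverberant levels are equal.

    Textbook diffuse-field approximation; q is the loudspeaker's
    directivity factor, volume in cubic metres, RT60 in seconds.
    """
    return 0.057 * math.sqrt(q * volume_m3 / rt60_s)

def direct_to_reverberant_db(distance_m, q, volume_m3, rt60_s):
    """Direct-to-reverberant ratio in dB at a given listening distance."""
    d_c = critical_distance(q, volume_m3, rt60_s)
    return 20.0 * math.log10(d_c / distance_m)

# Assumed, illustrative values (not taken from any real room or speaker).
Q = 5.0          # directivity factor
VOLUME = 50.0    # room volume in m^3
RT60 = 0.4       # reverberation time in s

for distance in (3.0, 1.5):
    drr = direct_to_reverberant_db(distance, Q, VOLUME, RT60)
    print(f"{distance:.1f} m: direct-to-reverberant ratio {drr:+.1f} dB")
```

Under this model, halving the listening distance raises the direct-to-reverberant ratio by about 6 dB, and lowering RT60 (taming reflections) or using a more directional speaker (higher Q) pushes it up as well.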
But beyond that, I also think that the choice of loudspeakers can affect the perception of the recorded soundstage. Not that they should create something on their own (at least not from a pure reproduction point of view), but they should, as far as possible, reveal the actual information on the recording without adding too much destructive interference of their own. As I believe the room sound is part of the "smaller and more sensitive" information on the recordings, I think good matching in certain frequency areas between the two loudspeakers in a stereo pair can be important. So can other things, such as low distortion, not having the crossover points in the wrong places, and having the drivers play in a time-coherent way (which is probably why point-source speakers are considered great when it comes to the soundstage).
There is of course a completely different view on the matter. Some people are more into creating the sensation of "the musicians are here in my listening room" instead of the "I'm there where the recording took place" kind of sensation, and there are speakers designed to take full advantage of the reflections in the listener's environment, like back-firing dipole speakers and different kinds of ortho-acoustic speakers. Nothing wrong with that view as long as it floats someone's boat.