Here are my thoughts.
Most of the depth we hear from a recording on two-channel playback is only an illusion. When I'm recording chamber musicians, I often position them 4 meters from a big stone wall. This brings a nice reverberant sound to the recording, but the depth is often an illusion in the listening room. The microphones are often placed near the instruments. When I use only 2 microphones, I have to adjust their placement for the best illusion. This means that while the best listening position in the concert hall is at row 5, 8 meters from the musicians, the best recording position for the two microphones is much closer, often less than 2 meters if there are 3 musicians playing.
When I record a grand piano in a big stone church, I set up the two omni microphones about 1 meter from the center of the open-lid piano, spaced 53 cm apart.
This often gives a nice reverberant sound on the recording, sounding like you are sitting 5 meters from the grand piano.
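For anyone who wants to put numbers on a spaced pair like this, here is a rough sketch of the arrival-time difference between two omni microphones for a point source. It only uses simple geometry. The speed of sound (343 m/s) and the function name `arrival_time_difference` are assumptions for illustration, and it ignores reflections entirely, which in a stone church are a big part of what you hear.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def arrival_time_difference(source_x, source_y, spacing=0.53, c=SPEED_OF_SOUND):
    """Return t_left - t_right in seconds for a point source.

    The two microphones sit on the x axis at -spacing/2 and +spacing/2.
    source_x is the sideways offset of the source from the center line,
    source_y is its distance in front of the pair (both in meters).
    """
    half = spacing / 2.0
    d_left = math.hypot(source_x + half, source_y)   # path to left mic
    d_right = math.hypot(source_x - half, source_y)  # path to right mic
    return (d_left - d_right) / c


# Upper bound: a source in line with the two mics, far off to one side.
max_delay_ms = 0.53 / SPEED_OF_SOUND * 1000.0

# A source 1 m in front of the pair and 0.5 m right of center.
delay_ms = arrival_time_difference(0.5, 1.0) * 1000.0
print(f"max possible: {max_delay_ms:.2f} ms, example source: {delay_ms:.2f} ms")
```

With 53 cm spacing the difference can never exceed about 1.5 ms, and a source on the center line reaches both microphones at the same time.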
So, in my opinion, there is no such thing as natural depth in a 2-channel recording, even if the recording is done with only two microphones in a purist way. The stereo system is a very flawed system.
In my opinion, the illusion of depth at home comes from how your speakers are set up, how the walls behind your speakers interact with the direct sound from the speakers, the size of the room, the walls, and the listening position.
Paul McGowan explains this in an entertaining way. And in this particular video, I believe he is right.
This is an interesting take on it.
I've been recording weekend warrior bar bands and free concerts in the park as a hobby, using a variety of techniques.
For a while I tried using a 10-channel mid-side microphone array built from 6 multi-pattern microphones (Behringer B2 Pro, got them for $500 total from Musician's Friend refurbed) plus a Zoom R24 recorder with 6 channels of phantom power.
I configured one mike as omnidirectional mid and put the other 5 mikes as figure 8 sides in close proximity, with a variety of hardware all screwed together to hold them steady.
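For readers who haven't worked with mid-side, here is a minimal sketch of the usual sum/difference decode, applied to one shared mid and several sides. This is the textbook M/S matrix and not necessarily the exact mix used for this array. The names `ms_decode`, `decode_array`, and `side_gain` are made up for illustration, and the signals are assumed to be equal-length sample arrays.

```python
import numpy as np


def ms_decode(mid, side, side_gain=1.0):
    """Sum/difference decode of one mid/side pair.

    Returns (plus, minus): 'plus' favours the positive lobe of the
    figure-8 side microphone, 'minus' favours its negative lobe.
    """
    mid = np.asarray(mid, dtype=float)
    side = np.asarray(side, dtype=float)
    plus = mid + side_gain * side
    minus = mid - side_gain * side
    return plus, minus


def decode_array(mid, sides, side_gain=1.0):
    """Decode one shared mid against several side mics.

    Each side mic yields two channels, so 5 sides give 10 channels.
    """
    channels = []
    for side in sides:
        channels.extend(ms_decode(mid, side, side_gain))
    return np.stack(channels)


# Tiny example with placeholder data: 1 mid, 5 sides, 4 samples each.
mid = np.ones(4)
sides = [np.full(4, 0.1 * k) for k in range(5)]
print(decode_array(mid, sides).shape)
```

With an omni mid and a figure-8 side at matched levels, each decoded channel behaves like a cardioid pointing along one lobe of that side microphone, which is where the two channels per side mike come from.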
The advantage of this arrangement is that I could mix the recording down to 7 channels and write a surround sound file that played back in 9-channel DTS expansion on my home theater. This more or less faithfully reproduced what I heard at the original performance, except of course contaminated by my playback room acoustics. That was helpful because, with my hearing loss and especially the distortion from my damaged ears, I was having difficulty hearing what I was mixing. I needed a quieter reference playback to evaluate my own mix objectively, so I could learn how to compensate for my hearing issues at high SPL when interpreting my live mix in real time.
The disadvantage of this arrangement is that such a monstrous microphone array doesn't fit in front of the band. It looks like some godforsaken aerial antenna from the 1970s hanging off a mike stand that threatens to tip over with a crash if some dancer bumps it, and it obstructs the view. I had to put it by the mixer out in the audience and that is so far away that the direct sound is greatly contaminated by the room acoustics.
Also, it's difficult to fit 6 large-diaphragm condensers in close proximity to each other. The ability of that mid mike to discriminate and capture two channels from each side mike was limited by the physical constraints. Either the side mikes were shadowing each other with their ginormous bodies, or they were physically interfering with each other, especially in their bulky shock mounts, or they were so far from the mid mike that they weren't doing a very good job of multiplexing the sound, and the result was similar to a tweaked Blumlein rather than an XY or mid-side with coincident placement. Front-rear separation was minimal, and nearly nonexistent in the treble, where even small misalignments in positioning create huge variations in phase.
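To show why the treble suffers first, here is a quick back-of-the-envelope sketch of the phase shift caused by a fixed path-length offset between two capsules. The 5 cm offset is a made-up example and not a measurement from this array, and 343 m/s is an assumed speed of sound.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def phase_shift_degrees(offset_m, frequency_hz, c=SPEED_OF_SOUND):
    """Phase difference in degrees caused by a path-length offset.

    One full wavelength of extra path equals 360 degrees, and the
    wavelength is c / frequency.
    """
    return 360.0 * frequency_hz * offset_m / c


# Hypothetical 5 cm misalignment between mid and side capsules.
for f in (100, 1000, 10000):
    print(f"{f:>6} Hz: {phase_shift_degrees(0.05, f):7.1f} degrees")
```

The same 5 cm that costs only about 5 degrees at 100 Hz amounts to more than a full cycle at 10 kHz, so the sum and difference signals stop behaving like a coincident pair well before the top octave.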
It was a nice experiment though. Had a lot of fun with it.
There are some surround sound microphones with four capsules arranged in a single tetrahedron, plus DSP to extract whatever surround sound channel configuration the user wants. I think this would have produced better results, but it is an expensive way to go. Field recording engineers who market environmental sounds like trains, traffic, birdsong, babbling brooks etc. use such microphones.
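As a rough idea of what that DSP starts from, here is a sketch of the basic sum/difference matrix that turns four tetrahedral capsule signals (A-format) into first-order B-format (W, X, Y, Z). The capsule order (front-left-up, front-right-down, back-left-down, back-right-up) and the 0.5 scaling are assumptions for this example. Scaling conventions vary, and commercial microphones also apply correction filters for capsule spacing that are left out here.

```python
import numpy as np


def a_to_b_format(flu, frd, bld, bru):
    """Convert tetrahedral A-format capsule signals to B-format.

    flu: front-left-up     frd: front-right-down
    bld: back-left-down    bru: back-right-up
    Returns (w, x, y, z) with an assumed overall scaling of 0.5.
    """
    flu, frd, bld, bru = (np.asarray(s, dtype=float) for s in (flu, frd, bld, bru))
    w = 0.5 * (flu + frd + bld + bru)  # omnidirectional pressure
    x = 0.5 * (flu + frd - bld - bru)  # front minus back
    y = 0.5 * (flu - frd + bld - bru)  # left minus right
    z = 0.5 * (flu - frd - bld + bru)  # up minus down
    return w, x, y, z


# Sound that reaches only the two front capsules shows up on W and X only.
w, x, y, z = a_to_b_format([1.0], [1.0], [0.0], [0.0])
print(w, x, y, z)
```

From W, X, Y and Z a decoder can then synthesize virtual microphones pointing in any direction, which is how one physical microphone can feed different speaker layouts.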
The issue with recording from the perspective of the audience is that the room acoustics of the venue contaminate the recording with a lot of ambient reflections, and that is highly evident when playing it back. This is the case even for outdoor performances, where ground reflections, nearby buildings, and treble attenuation from the air are all factors. Plus, if the speaker systems at the venue aren't up to snuff or are tuned improperly, the resulting capture reproduces all those flaws, and that's not so great from the perspective of fidelity. Played back in a home theater, the result doesn't sound natural, but on headphones it can sound okay, sort of.
Putting the microphones very close to the sound source minimizes that contamination and allows the stereo or surround speaker system in the playback venue to better emulate the sound coming from the stage in the recording venue, but then the perspective being reproduced is from the performers rather than from the audience.
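One way to put a rough number on "very close" is the critical distance, the point where direct and reverberant sound are equally loud. Here is a small sketch using the common approximation d_c ≈ 0.057 * sqrt(Q * V / RT60), with V in cubic meters and RT60 in seconds. The room volume and reverb time below are invented placeholder values and not measurements of any venue mentioned here, and Q = 1 assumes an omnidirectional source.

```python
import math


def critical_distance(volume_m3, rt60_s, directivity_q=1.0):
    """Approximate critical distance in meters.

    Inside this distance the direct sound dominates; beyond it the
    reverberant field dominates. Uses the common Sabine-based rule
    of thumb, so treat the result as a ballpark figure only.
    """
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)


# Hypothetical venue: 5000 cubic meters with a 2.5 second reverb time.
d_c = critical_distance(5000.0, 2.5)
print(f"critical distance: about {d_c:.1f} m")
```

For those placeholder values the result is around 2.5 m, so a microphone out by the mixer would be well into the reverberant field while one near the stage would not.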
Adding in some mikes in the audience to the mikes on stage only works during the applause, and then it's still not great, so it's only turned up between songs and turned down during songs.
To best reproduce a 'you are there' recording from the perspective of the audience, it would be best to play back on headphones, or to have stereo or surround speakers right at the main listening position (MLP) during playback. That's not a home theater however. It's also not what this guy suggested when he said to turn the walls into speakers.
To pull this off successfully, it's more like putting the speakers into a set of open-back surround-sound headphones that encircle the ears with the captured wavefront of the original recording from the subjective position of a listener in the audience. Then the contamination of room reflections and room ambiance is minimized by maximizing the direct sound reaching the listener, and this should present the best approximation of 'being there' from my 10-channel mid-side mix.
Maybe some day I'll try it with a set of bookshelf speakers in a tight array around my head...could work in just about any listening room so that solves a lot of problems!
But it won't be a good playback system for traditional recordings, because it will be lacking the critical ambient contribution of my listening room acoustics that we usually depend upon to approximately model the acoustics of the original venue. Also, with the speakers that close, there won't be much tactile sensation from the deep bass, and the room/sofa won't shake. It will lose a lot in translation from not having that physical listening room contributing to the illusion of 'being there'.