It seems generally accepted that a pair of microphones can suffice to capture a stereo recording intended for reproduction by a pair of loudspeakers
(or perhaps more credibly, with suitable provisions and processing, stereo headphones).
That sound pressure captured at two points in space should be acceptable for recreating an entire sound space
or the sound image from musicians in that space seems absurd to contemplate, but is generally accepted.
Mathematically, one would need to record the sound pressure over the entire surface enclosing the space to reproduce the sound field within it.
Two capture points are all that is needed to establish a distance between them on the horizontal plane, and the distance from each of them to the sound sources conveys what we hear as depth.
Together, those two distances are enough to capture both width and depth, and with them the three-dimensional space of the recorded venue. Adding more capture points only diffuses the perceived distance information, which is why multi-mono recordings lose the natural width and depth that otherwise let us pinpoint each sound source in an ensemble at its natural place in the three-dimensional space.
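To put rough numbers on the geometry described above, here is a minimal sketch, not a model of any particular microphone technique. It assumes two omnidirectional microphones on a horizontal line, a point source in a free field (no reflections), and a speed of sound of 343 m/s. The function name `arrival_difference` and the coordinate layout are made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def arrival_difference(source_xy, spacing):
    """Time and level difference between two spaced microphones.

    Assumed layout: microphones at (-spacing/2, 0) and (+spacing/2, 0),
    source at (x, y) in metres, free field, omnidirectional capsules.
    Returns (dt_seconds, level_db); positive values mean the sound
    reaches the right microphone earlier and louder.
    """
    x, y = source_xy
    d_left = math.hypot(x + spacing / 2, y)   # source to left microphone
    d_right = math.hypot(x - spacing / 2, y)  # source to right microphone
    dt = (d_left - d_right) / SPEED_OF_SOUND
    # Inverse-distance law: level falls 6 dB per doubling of distance.
    level_db = 20 * math.log10(d_left / d_right)
    return dt, level_db


if __name__ == "__main__":
    # A centred source and one off to the right, both 3 m in front.
    for pos in [(0.0, 3.0), (2.0, 3.0)]:
        dt, level = arrival_difference(pos, spacing=0.5)
        print(pos, f"dt = {dt * 1e3:.3f} ms, level = {level:.2f} dB")
```

A centred source gives zero difference on both counts, and moving it sideways makes both differences grow. This covers only the direct sound. The balance between direct and reverberant sound, which contributes a great deal to heard depth, is left out.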
The downside of a recording made with only two capture points is unfortunately obvious: different types of instruments will not all be fully captured at the same distance, and it's pretty clear that microphones don't work like our sense of hearing.