However, going back to my OP, my point is not about what is desirable. My point was actually far more abstract, seeking to spark a discussion about the goals of audio recording and reproduction, which are often stated to be something along the lines of: "recreation of the original performance/acoustic event."
...
I was trying to demonstrate why, on a philosophical level, this goal is not achievable when, in recordings, we have sounds radiating from multiple sources within a reverberant space captured at a single point in space (the mic), and then reproduced from a single source (or several sources) in a listening space.
If you need applied evidence of this, think of the way an orchestra is typically mic'd for concert hall recordings. Normally, we don't simply place a coincident Blumlein pair in the best seat in the house, even though it is the acoustics at precisely that point that we are trying to create in the recording. Most engineers agree that this is not the best way to "capture" the desired spatial cues.
Instead, we place mics in all sorts of very different locations (depending on the engineer). So we are not maintaining the hall's spatial and acoustic cues, and there is little acoustically natural about the placement of our mics. Quite the opposite: we are using creative microphone placement in an attempt to artificially generate these cues in a way that merely seems to reproduce the effect of being in the best seat in the house when played back on typical loudspeakers.
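As an aside on the geometry: a Blumlein pair encodes direction purely as a level difference between the two channels, because each figure-8 capsule's gain falls off as the cosine of the angle off its axis. A minimal sketch (function name and sign conventions mine; idealized free-field model):

```python
import math

def blumlein_gains(source_angle_deg: float) -> tuple[float, float]:
    """Channel gains of a coincident Blumlein pair (crossed figure-8s
    aimed at +/-45 degrees) for a distant source at the given angle,
    where 0 degrees is straight ahead and positive is to the left.
    A figure-8 mic's gain is cos(angle off its axis)."""
    theta = math.radians(source_angle_deg)
    left = math.cos(theta - math.pi / 4)   # left capsule aimed 45 deg left
    right = math.cos(theta + math.pi / 4)  # right capsule aimed 45 deg right
    return left, right

# A centered source hits both channels equally; a source 45 degrees
# to the left lands entirely in the left channel.
print(blumlein_gains(0.0))   # roughly (0.707, 0.707)
print(blumlein_gains(45.0))  # roughly (1.0, 0.0)
```

Note that this idealization carries no time-of-arrival differences at all, which is one reason engineers often add spaced mics when they want more of the hall.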
I totally agree with this approach, by the way. It's a way of compensating for the inadequacies inherent in actually using the true spatial cues a listener in the hall would experience at the live performance (i.e., the spatial cues present in the best seat in the house).
So I think that at best, a fairly convincing and enjoyable, but very much artificial, effect is achieved.
I have several issues. First, in attempting to "recreate the original performance/event," I think we agree that mics don't just hear the performers or the reflected energy in isolation. Rather, they hear a sample of the sound field those sources create in the space, just as we do in the audience. But, yes, mics have directional pickup patterns that may not accurately mimic those of our ears. Omni mics, for example, are more omnidirectional than either of our ears is, while directional mics may reject more off-axis sound than our ears do. Binaural recording doesn't really solve this either, since it imposes a fixed dummy-head HRTF on the sound.
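To put rough numbers on those pattern differences: first-order mic patterns are commonly modeled as a + (1 - a)·cos(θ), with omni, cardioid, and figure-8 being different values of a. A small sketch (function name mine):

```python
import math

def polar_gain(a: float, angle_deg: float) -> float:
    """First-order microphone polar pattern: gain = a + (1 - a) * cos(theta).
    a = 1.0 -> omni, a = 0.5 -> cardioid, a = 0.0 -> figure-8."""
    return a + (1.0 - a) * math.cos(math.radians(angle_deg))

# An omni hears everything at full level; a cardioid nulls the rear;
# a figure-8 nulls the sides.
print(polar_gain(1.0, 180.0))  # 1.0
print(polar_gain(0.5, 180.0))  # roughly 0.0
print(polar_gain(0.0, 90.0))   # roughly 0.0
```

Our ears, by contrast, have frequency-dependent directivity shaped by the head and pinnae, which none of these fixed patterns matches.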
Are these big problems? We know recording engineers are well aware of these and other mic characteristics, like non-flat frequency response. So, they artfully try to select the right mics for the purpose based on experience, testing, and seat-of-the-pants judgment. Some may succeed to a greater extent than others, but we cannot really know for sure. We are lost in Toole's circle of confusion, able to make only a subjective judgment about the recording based on whatever vague criteria are locked in our heads. Did it leave some sonic cues out? Did it include sonic stimuli that would not have been heard live? Only in rare egregious cases that somehow were not edited out can we tell for sure.
So, we are not off to a good start in terms of certainty. But, let's assume we can tell and be satisfied when we think the engineer has done a good job.
The other issue, which I tried to address earlier, is that I believe you are overfocused on single points - the mic, the speaker - and may be ignoring the value of the arrays of mics and speakers used in recording and playback. We rarely listen any more to a single speaker conveying what a mono mic picked up. We listen to the sound field an array of mics sampled at multiple points simultaneously (subject to the time delays their placement introduces), played back through a corresponding array of two or more speakers.
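Those placement time delays are easy to quantify for a spaced pair: an off-axis source simply arrives at one mic before the other. A minimal sketch (names and the far-field assumption mine):

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate, at room temperature

def spaced_pair_delay_ms(spacing_m: float, source_angle_deg: float) -> float:
    """Arrival-time difference between two spaced omni mics for a distant
    source at the given off-axis angle: dt = d * sin(theta) / c."""
    return (1000.0 * spacing_m * math.sin(math.radians(source_angle_deg))
            / SPEED_OF_SOUND_M_S)

# A 60 cm A-B pair with a source 30 degrees off axis:
print(spaced_pair_delay_ms(0.6, 30.0))  # roughly 0.87 ms
```

Delays of this size survive into playback as interchannel timing cues, and they are a large part of the spaciousness spaced-pair recordings are known for.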
Phantom imaging between the L and R speakers seems to me to do a pretty decent job of conveying whether a single performer was hard left, hard right, or at any point in between. If it's a string quartet, we get a pretty good idea of where the 1st and 2nd violins, viola, and cello sit in the soundstage while playing simultaneously. Things may get more congested and complicated with a large orchestra, especially massed strings, but solo passages always seem to put the performers in the right places. And all this with two stereo speakers reproducing the output of two, but usually more, mics. And, if it is a good recording, the phantom imaging simultaneously conveys a sense of depth behind the plane of the front speakers.
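That phantom-image behavior can be approximated with the classic stereophonic "sine law," which maps the interchannel level difference to a perceived direction between the speakers. A rough sketch (function name mine; this is an idealization for a centered listener and low frequencies):

```python
import math

def phantom_angle_deg(g_left: float, g_right: float,
                      speaker_half_angle_deg: float = 30.0) -> float:
    """Stereophonic sine law: sin(phi) / sin(phi0) = (gL - gR) / (gL + gR),
    where phi0 is the speaker half-angle (30 degrees in a standard
    equilateral setup) and positive phi is toward the left speaker."""
    ratio = (g_left - g_right) / (g_left + g_right)
    phi0 = math.radians(speaker_half_angle_deg)
    return math.degrees(math.asin(ratio * math.sin(phi0)))

# Equal levels image dead center; one channel alone images hard at
# that speaker.
print(phantom_angle_deg(1.0, 1.0))  # 0.0
print(phantom_angle_deg(1.0, 0.0))  # roughly 30.0
```

The takeaway is that a continuum of apparent positions is encoded in nothing more than two channel levels, which is why two speakers can place many performers at once.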
So, the whole is more than the sum of the two speakers, because that array can convey much more than just L or R. It can also convey the points in between for many instruments playing at the same time, plus depth cues from delays, reflections, etc. arising from the interactions between the stereo channels. And this all works because the system records and plays back a fairly complete sample of the sound field created by the event, rather than the individual performers. It is not reproducing the event. It is reproducing a sample of the sound field created by the event, relying extensively on phantom imaging.
Does it sound artificial? Well, yes, stereo now sounds somewhat artificial to me, because I have heard a lot of discrete multichannel (Mch) reproduction that includes much more of the reflected hall acoustic that is an inseparable part of what I hear naturally live. I no longer wonder why recordings don't sound more like the real thing live; I now know why. And I don't think my multichannel music sounds artificial at all. But I think stereo is a more benign kind of artificiality, in that something is omitted rather than something noticeably unnatural being imposed.
Yes, I know this stuff seems almost trivially basic, and it's nothing really new to you or anyone else here. But I am going down to this elementary level because I don't think I truly understand what you are seeking, or your logic.