Well, ideally we would want to reproduce with 100% fidelity what someone heard in another location, so yes, the fully accurate signal sent to each ear. Binaural promises this, except that HRTFs and outer ear shapes vary, and mics don't capture directionality the way the ear does unless you put them inside an ear shape, and everyone's ears are shaped differently enough that one ear shape won't transfer well to everyone. Maybe one day omni mics on an earless dummy head will let us apply DSP derived from our own measured ear shapes to fix this. The next step would be to compensate for head movement, as some devices already manage.
Another approach would be to fully recreate a soundfield that occurred elsewhere. Then one would be immersed in the same soundfield that existed during the recording. This appears to be impossible with only two speakers and has not been fully realized yet with more.
Stereo is still a neat trick that works better than it should, considering how simple it is. Speaker makers have from time to time tried crosstalk cancelling methods, while headphone makers regularly try to introduce crosstalk so that headphones sound like speakers in a room. More reason to think that maybe two channels just aren't sufficient, or at the very least that something important is missing or being overlooked. Maybe one day all this will be bypassed when we can scan a listener's brain during the auditory experience and feed that directly into our own brains.
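To make the headphone side of that concrete, here is a rough sketch of the general idea behind crossfeed: each ear gets a delayed, low-passed, attenuated copy of the opposite channel, loosely imitating what reaches the far ear from a speaker. This is not any particular manufacturer's method. The function names and the gain, cutoff and delay values are all assumptions picked for illustration.

```python
import math

def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """One-pole low-pass, a crude stand-in for head shadowing."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in signal:
        state += a * (x - state)
        out.append(state)
    return out

def delay(signal, n):
    """Delay by n samples, zero-padding the start and keeping the length."""
    if n <= 0:
        return list(signal)
    return ([0.0] * n + list(signal))[:len(signal)]

def crossfeed(left, right, sample_rate=48000,
              gain=0.3, cutoff_hz=700.0, delay_s=0.0003):
    """Mix a delayed, filtered copy of each channel into the other.

    gain, cutoff_hz and delay_s are illustrative guesses, not measured values.
    """
    n = round(delay_s * sample_rate)
    bleed_from_right = one_pole_lowpass(delay(right, n), cutoff_hz, sample_rate)
    bleed_from_left = one_pole_lowpass(delay(left, n), cutoff_hz, sample_rate)
    out_left = [l + gain * b for l, b in zip(left, bleed_from_right)]
    out_right = [r + gain * b for r, b in zip(right, bleed_from_left)]
    return out_left, out_right

# A click panned hard left: the right ear now hears a later, duller, quieter copy.
left = [1.0] + [0.0] * 63
right = [0.0] * 64
out_left, out_right = crossfeed(left, right)
```

A fixed filter like this ignores the individual HRTF and any head movement, which is exactly the limitation with these schemes.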
I agree that crosstalk cancelling schemes are problematic and, to be really effective, would need to be custom derived, with head tracking etc.
But ultimately the question is whether we really want to reproduce the 'dry' recording at the listener's ears. If that is really what it is all about, we can wear headphones. We can also apply custom filters to that, with head tracking if we like - use an IR sensor like the Oculus Rift uses, with its decidedly 'low end' price. Another approach might be to always listen to small speakers in the near field. Suddenly we should all be experiencing the highest of 'high end' audio for a few quid. But is that really the best sound?
I can't help but think that audiophiles are confused. They want their system to be part of a 'living room' that they can walk about in, drink beer, look at album covers. At the same time, due to the seductiveness of over-simplified frequency domain measurements, they think that the ultimate sound must be achieved if you can reproduce the dry recording's frequency response at the listener's ears, and so they end up with their 'living room' dominated by huge contraptions (speaker line arrays, bass traps etc.) or they obsess over DSP in order to 'correct' the room. They are ending up with the worst of all worlds.
I think that our 1970s and 80s forebears were right. Stereo sounds best when played from neutral speakers in an ordinary living room. (Yes, there is some uncertainty over exactly what 'neutral' means: the interaction between a speaker's physical presence and the room means that it might need some adjustment depending on location, best done by calculation rather than measurement.) DSP's role should be to give us the most neutral speaker possible.
But what about the bass? Surely that's different, and it's a nightmare? Not in my experience. Maybe if audiophiles' listening rooms were more like real living rooms, and they used the appropriate sealed speakers (with their less smeared output and naturally room-complementing roll-off), they could just relax a bit about all this stuff. The room is 'natural'; its time domain characteristics are absolutely consistent with its frequency domain characteristics, and the slightly different issue of the bass just isn't the problem people think it is - their view being 'informed' by how it looks in measurements and not just how it sounds.
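As a back-of-envelope illustration of the 'room-complementing roll-off' point, here is a minimal sketch using the textbook second-order high-pass model of a sealed box, which falls at about 12 dB/octave below its corner frequency. The room gain term is a deliberately idealized assumption (a rise of 12 dB/octave below a hypothetical transition frequency, as for a perfectly sealed, rigid room). Real rooms leak and flex and will give less. The 50 Hz corner, the Q of 0.707 and the function names are made-up example values, not measurements.

```python
import math

def sealed_box_db(freq_hz, fc_hz=50.0, q=0.707):
    """Magnitude in dB of a second-order high-pass (sealed box model)."""
    x = freq_hz / fc_hz
    mag = x * x / math.sqrt((1.0 - x * x) ** 2 + (x / q) ** 2)
    return 20.0 * math.log10(mag)

def idealized_room_gain_db(freq_hz, f_room_hz=50.0):
    """ASSUMPTION: +12 dB/octave below f_room_hz, nothing above it."""
    if freq_hz >= f_room_hz:
        return 0.0
    return 40.0 * math.log10(f_room_hz / freq_hz)

# Compare the box alone with box plus idealized room at a few frequencies.
for f in (12.5, 25.0, 50.0, 100.0):
    box = sealed_box_db(f)
    total = box + idealized_room_gain_db(f)
    print(f"{f:6.1f} Hz  box {box:7.2f} dB  box+room {total:6.2f} dB")
```

In this toy model the two slopes cancel well below the corner frequency only because the box corner and the room transition are set to the same value. It says nothing about room modes higher up, which are a separate matter.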