If humans were sensitive to this phenomenon, this would be a problem for stereo/multichannel sound in general, because the ITD of two separate spherical wavefronts arriving from two different directions would be different than for a single spherical wavefront arriving from a point between the two (or more) sources. Yet the phantom image still appears. If this is so, surely so must the phantom reflection?
I am thinking of these as two separate phenomena.
One of them is: how do we read our natural acoustic environment? I think it is akin to 'focusing', just as we do visually. There is a 'ray tracing' element to modelling it. We learn to interpret reflections and 'focus' on the source.
When we look at reflections in a mirror we are not focussing on the mirror, but are able to 'focus back' to the source. If we see a mirror it can be confusing, but with two eyes and physical movement we quickly establish what is the mirror and what is the source. Hearing is the same, I think: a reflection from a large surface still leads us back to the source (through the 'radius' phenomenon). Think of the listener's hearing as being a lens gathering 'rays' and looking to where they point back to. Like an array of correlated radio telescopes, the distance between our ears effectively sets the diameter of the notional 'lens', and physical head movement makes it bigger, increasing its resolving power.
Certainly, you could imagine the two-mic-with-movement system being used with a neural network to learn to navigate by sound alone - I believe our brains and ears have evolved/learned to do this in order to locate sound sources in reflective environments. Underneath, it is creating possible models of the static environment based on phase, timing and volume differences, and the more information it has, the less ambiguous the model. It may be expressed mathematically, but with a neural network you don't need to know any maths explicitly; you just 'train' the system.
A secondary point would be that the radius of the wavefront (assuming it's spherical, which is not necessarily the case anyway) could be altered by moving the secondary source. There's no rule stating that the primary and secondary sources have to be equidistant from the listener in the anechoic chamber.
Yes, I wasn't assuming any particular relationship between them nor that it would be a perfect sphere (I did mention earlier that the speaker will have directivity at different frequencies and that it is not a point, but is relatively 'point-like'). It's just that each configuration will differ between what it is trying to simulate and the reality that our brains have evolved to interpret.
The visual analogy for the second phenomenon, stereo, is, I think, the putting on of 3D goggles. It hijacks the human hearing system by doing something that cannot happen in nature: producing identical sounds from two separate sources. And Blumlein stereo really does this: it creates inter-speaker volume differences which, thanks to the crossfeed from each speaker to both ears, create inter-aural time differences. To our hearing, I think it is more about time domain correlation than explicit phase relationships - I don't think the brain is using those as its primary cues with stereo.
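A minimal sketch of how a volume difference between the speakers can turn into a time difference between the ears, under some loud assumptions: the head is just two points 18 cm apart in free field (no head shadow), the speakers sit at +/-30 degrees and are far enough away to send plane waves, and we look at a single low frequency. The constants and the function names (`half_head_delay`, `phantom_itd`, `real_source_itd`) are made up for illustration; this is a toy model, not a claim about how Blumlein derived it.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s
EAR_SPACING = 0.18      # m; assumed, head modelled as two points in free field


def half_head_delay(angle_deg):
    """Time by which a plane wave from angle_deg (positive = right) reaches
    the nearer ear before it reaches the centre of the head."""
    return (EAR_SPACING / 2) * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND


def phantom_itd(gain_left, gain_right, freq_hz, speaker_angle_deg=30.0):
    """Inter-aural time difference (s) for one tone played through both
    speakers with the given gains. Positive means the right ear leads."""
    tau = half_head_delay(speaker_angle_deg)
    w = 2 * math.pi * freq_hz
    early = cmath.exp(1j * w * tau)   # same-side speaker: arrives tau early
    late = cmath.exp(-1j * w * tau)   # opposite speaker (crossfeed): tau late
    left_ear = gain_left * early + gain_right * late
    right_ear = gain_right * early + gain_left * late
    # Phase difference between the two ear signals, converted to time.
    return cmath.phase(right_ear / left_ear) / w


def real_source_itd(angle_deg):
    """ITD (s) of a single real source at angle_deg in the same head model."""
    return 2 * half_head_delay(angle_deg)


if __name__ == "__main__":
    for g_left in (1.0, 0.75, 0.5, 0.25, 0.0):
        itd_us = phantom_itd(g_left, 1.0, 200.0) * 1e6
        print(f"left gain {g_left:.2f}, right gain 1.00 -> ITD {itd_us:.1f} us")
    print(f"real source at 30 degrees -> ITD {real_source_itd(30.0) * 1e6:.1f} us")
```

With equal gains the ITD is zero (centre image), and with only the right speaker driven it equals the ITD of a real source at 30 degrees. In between, it varies smoothly with the gain ratio. In this idealised model both ears receive the same level, so the volume difference between the speakers shows up at the ears purely as a timing difference.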
The truly amazing thing is that if you model speaker-based stereo using simple correlation between what the ears would be picking up as the model of the human's hearing system, it produces a perfect 'auditory scene' that even stays stable with head movement. But this scene is static, like a 3D television image with special glasses: if you move around you are not going to gain extra information from 'parallax'.
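A minimal sketch of what such a correlation model might look like, assuming a forward-facing head reduced to two points 18 cm apart with no head shadow, speakers at +/-30 degrees, and a made-up low-frequency test burst. Everything here (`burst`, `ear_signals`, `itd_by_correlation`, the constants) is hypothetical and only for illustration. It does not model head rotation or the room.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
EAR_SPACING = 0.18      # m; assumed
SAMPLE_RATE = 100_000   # Hz; high so that delays resolve to 10 microseconds


def burst(t):
    """Hypothetical test signal: 300 Hz and 450 Hz tones under a Gaussian
    window centred at 10 ms."""
    window = math.exp(-((t - 0.010) / 0.003) ** 2)
    return window * (math.sin(2 * math.pi * 300 * t)
                     + 0.5 * math.sin(2 * math.pi * 450 * t))


def ear_signals(gain_left, gain_right, speaker_angle_deg=30.0, duration=0.020):
    """Sampled left and right ear signals when both speakers play burst()."""
    tau = (EAR_SPACING / 2) * math.sin(math.radians(speaker_angle_deg)) / SPEED_OF_SOUND
    left, right = [], []
    for i in range(round(duration * SAMPLE_RATE)):
        t = i / SAMPLE_RATE
        # Same-side speaker arrives tau early, opposite speaker tau late.
        left.append(gain_left * burst(t + tau) + gain_right * burst(t - tau))
        right.append(gain_right * burst(t + tau) + gain_left * burst(t - tau))
    return left, right


def itd_by_correlation(left, right, max_lag_s=0.0008):
    """Lag (s) at which the cross-correlation of the ear signals peaks.
    Positive means the right ear leads."""
    n = len(left)
    max_lag = round(max_lag_s * SAMPLE_RATE)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i + lag] * right[i]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / SAMPLE_RATE


if __name__ == "__main__":
    for g_left in (1.0, 0.5, 0.0):
        left, right = ear_signals(g_left, 1.0)
        itd_us = itd_by_correlation(left, right) * 1e6
        print(f"left gain {g_left:.2f}, right gain 1.00 -> correlation peak at {itd_us:.0f} us")
```

The peak sits at zero lag for equal gains and slides towards the ITD of the right speaker alone as the left gain is reduced, so in this toy model a correlator sees a single source at an intermediate position rather than two separate speakers.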
The great gift for audiophiles is that we are able to fuse the static artificial scene with the natural acoustics of our environment.
Both artificial scene and room remain stable as we turn our heads and - to a limited extent - move around. It really is an amazing system; the fantastically compelling, detailed imaging from a good stereo system is not just a wishful illusion. It may be an illusion in one sense, but it is 'real' in that it can be conjured up deliberately and repeatedly.