Our hearing process is quite involved and it takes a lot more than most realize to take the two inputs (those holes in the sides of our head) and convert them into a completely "real" 3D image in our consciousness. In the scientific domain it takes a minimum of four observation points to define a 3D space.
Most audio folk know that horizontal localization comes from the differences in arrival time and amplitude between the ears, BUT there is a lot more. How about localizing a sound's height, where what arrives at each ear is identical?
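As a quick illustration of the time-difference clue (my own sketch, not part of Tom's post): Woodworth's classic spherical-head approximation estimates the interaural time difference (ITD) as r(θ + sin θ)/c for a source at angle θ off dead ahead, with r an assumed head radius of about 8.75 cm.

```python
import math

def itd_woodworth(angle_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth spherical-head approximation of the interaural time
    difference (seconds) for a source angle_deg off straight ahead."""
    theta = math.radians(angle_deg)
    return head_radius_m * (theta + math.sin(theta)) / c

for a in (0, 30, 60, 90):
    print(f"{a:>2} deg -> {itd_woodworth(a) * 1e6:5.0f} us")
# roughly 0, 261, 488, and 656 microseconds
```

Those few hundred microseconds of delay are all the horizontal clue our hearing gets from timing, which is part of why the height question above is so interesting: there, even that clue is absent.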
We hear height because the shape of our outer ear distorts the frequency response of what goes into the "holes", and does so according to the angle the sound is arriving from.
AND this "Pinna response" is also involved in the horizontal domain. This was discovered when tiny microphones were placed IN the ear canal very close to the ear drum.
My friend Doug Jones was the one who programmed a TEF machine to produce a "waterfall" plot of the mic response as a function of source angle.
It was clear there was a pair of notches which moved in frequency according to the angle the sound was coming from.
It was this work that led to the LEDR recordings, which employ those notches to trick the ear into hearing a moving source way above the loudspeakers.
The cool part is, each of us has learned through a lifetime of experience how not to hear any of those wild changes in our ear's response vs. angle, and to interpret them only as direction.
So where does this apply here?
Keep in mind this is one man's opinion based on what i have seen working in this area, so take it with a grain of salt.
Well, having anything that produces a comb in the time domain may not show up in a single-mic measurement (or may show up only as fine combing), but we hear with two ears, with fantastic spatial processing that is partly based on comb filtering.
A comb filter (one or more notches in the frequency response) is the result of the same signal arriving at two (slightly) different times, and it is often thought to be inaudible.
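To make that concrete (again my own sketch, not from the post): summing a signal with a copy of itself delayed by τ gives a magnitude response |1 + e^(-j2πfτ)|, with full-cancellation notches at odd multiples of 1/(2τ).

```python
import numpy as np

def comb_magnitude(freqs_hz, delay_s):
    """Magnitude response of a signal summed with a copy of itself
    delayed by delay_s seconds: |1 + exp(-j 2 pi f tau)|."""
    return np.abs(1.0 + np.exp(-2j * np.pi * freqs_hz * delay_s))

tau = 0.5e-3  # 0.5 ms delay, i.e. ~17 cm of extra path length in air
freqs = np.array([500.0, 1000.0, 2000.0, 3000.0, 5000.0])
print(np.round(comb_magnitude(freqs, tau), 3))
# notches (zeros) land at 1, 3, and 5 kHz; doubling (x2) at 2 kHz
```

Note the notches here are every 2 kHz apart: that regular, repeating pattern is what makes it look like a "comb", and why a single-mic measurement can smear it into what Tom calls fine combing.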
Facing one speaker: if what arrives at your ears from that loudspeaker is identical, like from a simple small source on a flat baffle, then it will obviously sound like it's where you're looking. If you play a recording mixed or captured "far away", then it will sound far away... but somewhat less so for a speaker with a complex radiation pattern. Until stereo, this property wouldn't have mattered.
The difference is because our hearing also involves our eyes and what we know. When you "blind test" the single speaker that is complex, and i mean with your eyes closed, it will be easy to point to where it is and also to estimate how far away it is, because THOSE clues tell you where the speaker is.
The simple-radiation speaker has fewer, or far fewer, of those spatial clues and sounds more far away or more up close, whichever the recording was. In stereo (what i am after), the speaker's complex radiation IS also part of the presentation, part of the stereo image, and just like how you can hear how far away the speakers are with your eyes closed, you are always aware of them as the origin, depending on the recording.
The simple-radiation speaker (all else equal: SPL, responses, etc.) can drop way back in the phantom image, or you are even unaware of the loudspeaker as the source because the phantom image is so strong.
Don't get me wrong, I am not saying this is the only thing; rather, this is just one factor that pertains to stereo imaging and multiple tweeters in general.
Hope that helps a little, and like Ivan at work says, "it depends".
Tom Danley
Danley Sound Labs