It does: "the brain" would not average, as you put it; rather, it is about different signal processing chains. As soon as a phantom source is identified, the coloration due to the direction-dependent ear signals is spontaneously ignored. If the phantom source collapses, the coloration is perceived. So much for the direct observation. The modelling of the effect in the second part of Theile's piece is speculative, sure, but not without reason.
I would go even further and say that identifying a phantom source hinders the detection of coloration, whether it originates in the HRTF or in the speakers.
Anyway, as you say, the phantom source is colorful; does it turn grey once downmixed?
First of all, we should acknowledge that evaluating a speaker is a very particular mode of listening. I rarely do that myself; don't know about you.