Findings of researcher David Griesinger seem to support an argument for time coherence north of 500 Hz or so, and for freedom from early reflections. He studies the acoustics and psychoacoustics of concert venues and lecture halls, rather than home audio as such, but the principles he describes are arguably applicable to home audio too.
If I understand correctly, when the overtones all arrive at the same instant (rather than smeared over a short time interval), their energy combines into an instantaneous pressure peak, which increases the peak-to-average ratio (crest factor) and therefore the effective signal-to-noise ratio. In the context of concert venues and lecture halls, sufficiently early reflections can degrade the phase response and therefore the time-domain relationship of these overtones. So whether the phase response is degraded by the loudspeakers, or by early reflections, or both, this is arguably detrimental to the effective signal-to-noise ratio and therefore detrimental to clarity and/or dynamic contrast.
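To make the peak-to-average idea concrete, here is a minimal sketch (mine, not Griesinger's) that sums ten equal-amplitude harmonics twice: once with all phases aligned, once with the phases scattered at random. The 200 Hz fundamental, the harmonic count, and the use of random phases as a stand-in for "smeared" arrival are all assumptions for illustration; a real loudspeaker or reflection would disturb the phases far less drastically.

```python
import math
import random

def crest_factor_db(phases, f0=200.0, fs=48000, n_periods=4):
    """Peak-to-RMS ratio in dB of a sum of equal-amplitude harmonics of f0.

    phases[k] is the starting phase (radians) of harmonic k + 1.
    """
    n_samples = int(fs * n_periods / f0)  # whole number of periods
    samples = []
    for i in range(n_samples):
        t = i / fs
        samples.append(sum(
            math.cos(2 * math.pi * f0 * (k + 1) * t + ph)
            for k, ph in enumerate(phases)
        ))
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n_samples)
    return 20 * math.log10(peak / rms)

n_harmonics = 10  # hypothetical harmonic count

# Time-coherent case: every harmonic peaks at the same instant.
aligned = [0.0] * n_harmonics

# Degraded case (assumption): phases scattered uniformly at random.
rng = random.Random(0)
scattered = [rng.uniform(0, 2 * math.pi) for _ in range(n_harmonics)]

cf_aligned = crest_factor_db(aligned)
cf_scattered = crest_factor_db(scattered)

print(f"aligned phases:   {cf_aligned:.1f} dB")
print(f"scattered phases: {cf_scattered:.1f} dB")
```

Both signals contain exactly the same harmonics at the same levels, so their average power is identical; only the peak changes. With the phases aligned the crest factor comes out at about 13 dB, and scattering the phases lowers it.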
Quoting David Griesinger: "The information content of speech is (almost) entirely in frequencies above 500Hz. For all people, even children, this means that information – at least the identity of vowels – is encoded in amplitude modulations of harmonics of a lower frequency tone. This method of encoding is universally used by almost any creature that wishes to communicate, from insects to elephants. Why? Because harmonics have a unique property – they combine to make sharp peaks in the acoustic pressure at the frequency of the fundamental frequency that created them. It is the presence – or the absence – of these peaks that enables the perception of ‘near’ and ‘far’.
The sharp peaks also facilitate separating the signals from noise, and with the appropriate neural network the peaks from one sound source can be separated from another.
"The peaks only exist when the incoming signal consists of a tone with a definite pitch and lots of upper harmonics. Furthermore, the peaks only exist when there are two or more harmonics at the same time within one critical band." [emphasis mine; also, note that critical bands are approximately 1/3 octave wide.]
So the implications for home audio seem to be that a) the direct sound should be time-coherent north of about 500 Hz in order for the overtones to arrive simultaneously, resulting in these peaks; and b) early reflections which would be conflated with the direct sound north of 500 Hz are undesirable.
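The "two or more harmonics within one critical band" condition quoted above can be put into rough numbers. This is only a sketch under two simplifying assumptions of mine: that a critical band is exactly 1/3 octave wide, and that the band can sit anywhere on the frequency axis. Real critical bands vary with frequency, and the function name and example fundamentals are hypothetical.

```python
BAND_RATIO = 2 ** (1 / 3)  # upper-edge / lower-edge ratio of a 1/3-octave band

def first_shared_band_harmonic(band_ratio=BAND_RATIO):
    """Lowest harmonic number n for which harmonics n and n + 1 fit in one band.

    Adjacent harmonics have the frequency ratio (n + 1) / n, so they share a
    band once that ratio is no larger than the band's edge ratio.
    """
    n = 1
    while (n + 1) / n > band_ratio:
        n += 1
    return n

n_min = first_shared_band_harmonic()

# Example fundamentals (assumed values, roughly in the range of speech pitch).
for f0 in (100.0, 125.0, 200.0):
    print(f"f0 = {f0:.0f} Hz: harmonics share a band from about {n_min * f0:.0f} Hz up")
```

Under these assumptions adjacent harmonics first share a band at the 4th harmonic, which for a 125 Hz fundamental is 500 Hz. That at least looks compatible with the frequency region Griesinger singles out, though I would not read more into it than that.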
So, we might well ask, how early is "too early" for the onset of reflections?
David Griesinger again: "Transients are not corrupted by reflections if the room is large enough - and 10ms of reflections free time is enough."
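For a sense of scale, the 10 ms figure can be turned into a path-length difference. A minimal sketch, assuming a speed of sound of 343 m/s (roughly room temperature); the listening distances in the example are hypothetical, not taken from Griesinger.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C

def min_extra_path_m(gap_ms):
    """Extra distance a reflection must travel to arrive gap_ms after the direct sound."""
    return SPEED_OF_SOUND_M_S * gap_ms / 1000.0

def reflection_delay_ms(direct_path_m, reflected_path_m):
    """Arrival delay of a reflection relative to the direct sound, in milliseconds."""
    return (reflected_path_m - direct_path_m) / SPEED_OF_SOUND_M_S * 1000.0

extra_path = min_extra_path_m(10.0)

# Hypothetical home setup: 3.0 m direct path, 4.5 m path via a side wall.
delay = reflection_delay_ms(3.0, 4.5)

print(f"10 ms gap needs about {extra_path:.2f} m of extra path")
print(f"example side-wall reflection arrives {delay:.1f} ms after the direct sound")
```

So a reflection has to travel roughly 3.4 m farther than the direct sound to land outside the 10 ms window, which is more than many first reflections in a domestic room manage.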
Now at first glance it looks like Griesinger's advocacy of a 10-millisecond reflection-free time interval after the arrival of the direct sound conflicts with @Floyd Toole's observation that the in-room reflections actually improve intelligibility by providing the listener with multiple "looks" at complex sounds. And such may be the case! (I no longer have my copy of The Book, and am awaiting the 4th Edition.) But perhaps this increased intelligibility comes from those reflections arriving AFTER the 10-millisecond interval Griesinger mentions. Hopefully someone who knows the answer will post and shed light on this.
(Elsewhere Griesinger mentions 700 Hz and 1000 Hz as being the lower end of the frequency region that matters most to the ears, giving me the impression that the octave between 500 Hz and 1 kHz is a somewhat fuzzy transition region.)