I wonder if this is quite a bit more complex than a simple tradeoff between "sharp imaging" (my words), or "definition," on one side and a wide soundstage (increased ASW) and listener envelopment (LEV) on the other. Perhaps there are aspects of:
Auditory envelopment (AE) related to ITD at low frequencies (
@Thomas Lund ):
https://www.audiosciencereview.com/...bass-and-subwoofers.51589/page-4#post-1871225
Envelopment from 50-700 Hz (
@Thomas Lund ):
https://www.genelec.com/-/blog/how-to-analyse-frequency-and-temporal-responses
"Frequencies below 500Hz are primarily responsible for perceptions of Resonance, Envelopment, Warmth”:
http://www.davidgriesinger.com/ICA_2004 imbedded.pptx
"The special role of 500 Hz in binaural neural processing may not be entirely fortuitous. Because of the size of the human head, this frequency range corresponds to a minimum in coherence when the soundfield is isotropic":
https://web.pa.msu.edu/acoustics/koller.pdf
700 Hz middle ear high pass filter (
@j_j ):
https://www.aes-media.org/sections/pnw/ppt/jj/aes_apr2019_hearing099.pptx, possible implications at
https://www.audiosciencereview.com/...matter-in-audio-no.24026/page-30#post-1871687, more at
https://www.audiosciencereview.com/forum/index.php?threads/our-perception-of-audio.501/page-17
Localization from ITD/phase lock from bass frequencies up to about 1 kHz (
@j_j ):
https://www.audiosciencereview.com/...matter-in-audio-no.24026/page-30#post-1874667
With two ears, ITD is comparison of time relationships between waveform < 500 Hz, mix of waveform and envelope 500 Hz to 2 kHz, envelope leading edge >2 kHz. Leading edges in ERB emphasized relative to steady state due to onset of compression (part of Haas effect). ILD communicated to brain (
@j_j ):
https://www.aes-media.org/sections/pnw/ppt/jj/auditory_mechanisms_01_28_21.pptx
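To make the three ITD regimes in that summary concrete, here is a minimal Python sketch. The function name `itd_cue` and the hard cutoffs at exactly 500 Hz and 2 kHz are my own simplifications for illustration; the actual transition is presumably gradual rather than a step.

```python
def itd_cue(frequency_hz: float) -> str:
    """Return the ITD cue said to dominate at a given frequency.

    The 500 Hz and 2 kHz boundaries are taken from the summary above and
    treated as hard steps, which is an assumption made for illustration.
    """
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    if frequency_hz < 500.0:
        return "waveform"
    if frequency_hz <= 2000.0:
        return "waveform and envelope"
    return "envelope leading edge"


for f in (100.0, 1000.0, 5000.0):
    print(f, "Hz ->", itd_cue(f))
```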
“Frequencies above 1kHz are primarily responsible for perceptions of Timbre, Clarity, Intelligibility, Distance”:
http://www.davidgriesinger.com/ICA_2004 imbedded.pptx
Transition from phase sensitivity to envelope onset around 1 kHz to 4 kHz (
@j_j ):
https://www.audiosciencereview.com/...matter-in-audio-no.24026/page-30#post-1874667
"As the frequency of a sound increases beyond 1000 Hz, there is a substantial degradation in the ability of the binaural system to make use of ITD in the waveform. Timing in the fine structure of a tone or noise ceases to be of value. Instead, listeners are able to make use of ITD in the envelope of sounds. If there is no structure in the envelope, as for a continuous sine tone, then listeners cannot localize. For noise, like the third-octave noises used in our experiments, the ITD in the temporal fluctuations can be used. Given the significance of envelope ITDs at mid and high frequencies, it would seem that the waveform cross correlation and waveform coherence, as measured in the experiments reported here, are less interesting than the cross-correlation and coherence of the envelope. However, the coherence of the waveform and of the envelope are statistically related":
https://web.pa.msu.edu/acoustics/koller.pdf
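As a rough illustration of what "waveform cross correlation" means for ITD, here is a small pure-Python sketch that finds the peak of the normalized cross-correlation between two ear signals. The function name, the 48 kHz sample rate, and the synthetic 500 Hz tone burst are all my assumptions, not anything taken from the paper; treating the peak value as a coherence-like measure is likewise my simplification.

```python
import math


def normalized_xcorr_peak(left, right, max_lag):
    """Return (lag, value) at the peak of the normalized cross-correlation.

    A positive lag means `right` is a delayed copy of `left`.
    """
    energy = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    best_lag, best_val = 0, -1.0
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += left[i] * right[j]
        val = total / energy
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


# Hypothetical test signal: a 500 Hz tone burst with a Gaussian envelope.
FS = 48000   # sample rate in Hz (assumed)
DELAY = 24   # interaural delay in samples, i.e. 0.5 ms at 48 kHz (assumed)
N = 960      # 20 ms of signal


def burst(i):
    """Tone burst sample at index i; silent before the signal starts."""
    if i < 0:
        return 0.0
    envelope = math.exp(-((i - N / 2) / 150.0) ** 2)
    return envelope * math.sin(2 * math.pi * 500.0 * i / FS)


left = [burst(i) for i in range(N)]
right = [burst(i - DELAY) for i in range(N)]

# Search only within +/-40 samples, less than half the 96-sample period,
# so the periodic tone does not produce an ambiguous second peak.
lag, peak = normalized_xcorr_peak(left, right, max_lag=40)
itd_seconds = lag / FS
print(f"lag = {lag} samples, ITD = {itd_seconds * 1e3:.3f} ms, peak = {peak:.4f}")
```

The search window is kept under half a period on purpose: for a steady tone the cross-correlation repeats every period, which is one way to see why fine-structure ITD gets ambiguous as frequency goes up.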
ATF/HRTF boosts ~1 kHz for waves from behind and near 3 kHz for waves from the front. Above 4 kHz, the outer ears and pinnae scatter significantly; this is quite individual-specific above 6 kHz, mostly with a valley-and-peak structure that shifts with frequency, peaking near 7 kHz with the source directly overhead. ILD is large and reliable above 3 kHz:
https://www.cogsci.msu.edu/DSS/2019-2020/Hartmann/Hartmann_1999.pdf
“The head masks sounds: this shadow effect reduces intensity, especially at higher frequencies [10]. For wavelengths shorter than head diameter, the head partially decreases acoustic energy by reflection and absorption. Thus, the lowest frequency at which the shadow effect occurs is approximately: fmin = v/λmax = 343/0.175 ≈ 1960 Hz…ILD is thus virtually zero below 1500 Hz, and becomes relevant for wavelengths shorter than head diameter (> 1500 Hz)":
https://www.sciencedirect.com/science/article/pii/S187972961830067X
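The head-shadow estimate in that quote is just the speed of sound divided by the head diameter (343 m/s and 0.175 m, both from the quote). A quick Python check, where the function and constant names are my own:

```python
SPEED_OF_SOUND_M_S = 343.0  # value used in the quoted article
HEAD_DIAMETER_M = 0.175     # value used in the quoted article


def head_shadow_onset_hz(head_diameter_m=HEAD_DIAMETER_M,
                         speed_m_s=SPEED_OF_SOUND_M_S):
    """Frequency whose wavelength equals the head diameter."""
    if head_diameter_m <= 0:
        raise ValueError("head diameter must be positive")
    return speed_m_s / head_diameter_m


f_min = head_shadow_onset_hz()
print(f"Head shadow starts around {f_min:.0f} Hz")
```

A larger or smaller head shifts this figure proportionally, so it is a rough marker and not a sharp threshold.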
“Sound stimulus frequency greatly affects the accuracy of localization [12,22,23], which is best for low frequencies (< 1000 Hz), poorest between 1000 and 3000 Hz, and intermediate for high frequencies (> 3000 Hz)…The accuracy of sound source localization thus depends on: • azimuthal position: better in front than to the side; • type of stimulus: ◦ band width: the wider the band, the better the accuracy, ◦ frequency: poorer between 1000 and 3000 Hz, ◦ and speech or tonal type of sound":
https://www.sciencedirect.com/science/article/pii/S187972961830067X
Tones have flat envelopes, so no “first arrival” information. Many reflections off small surfaces at high frequencies [>2 kHz], which interfere with each other and ITD, so hard to localize (
@j_j ):
https://www.aes-media.org/sections/pnw/ppt/jj/auditory_mechanisms_01_28_21.pptx
“Comb-like filters tuned to the fundamental period of an amplitude waveform can separate the formants of a particular speaker or instrument from other signals and from noise…The alignment of phases in the upper harmonics of tones is vital to source separation and localization…Recent papers from the field of speech comprehension have come to the same conclusions about the importance of the amplitude waveform of sounds with distinct pitch. They call the process “Source separation by periodicity"" (
http://www.davidgriesinger.com/Learning to Listen 14.pptx)
To me, as an uneducated layman, this suggests a number of possible implications:
What we perceive as a "large" (does that mean wide, deep, or both?) soundstage may actually be related to a number of factors. These include Auditory Envelopment due to low bass reproduction, which enhances the perception of larger spaces (though this may be prone to distortion by the listening room); perception of envelopment due to frequencies below around 500-700 Hz; lateral reflections (possibly with a critical frequency range), as discussed by others with respect to ASW and so not worth reiterating; possibly second-image thresholds (resulting in the soundstage extending beyond the loudspeakers' separation); and perhaps additional effects related to the listening environment (the direct-to-reverberant, or DR, ratio changing the perception of distance) and upper frequencies (higher frequencies muffled by air, harmonic phase alignment or envelope onset randomized by multiple reflections). The first factors listed above may relate to various anecdotes about the soundstage "opening up" with subwoofers (sorry, I can't be bothered to provide examples). The latter ones may relate to the perceived effects of various diffusion products (see Ron Sauro's comments at
http://nwaalabs.ipower.com/Files/NWAA Labs/Diffusion, When phase and energy becomes more important than directivity in the perception of space 2017 NOLA.pdf and
https://www.stereophile.com/content/nwaa-labs-measurement-beyond-atomic-level, which I interpret as possibly resulting in the perception of more reflections having occurred, as in a larger room, plus the high frequency absorption that tends to occur with mathematical diffusion products).
What we perceive as a "precise" soundstage may also be related to a number of factors, including localization as outlined above, DR ratio again, and suppressing or delaying early reflections that may randomize phase or decorrelate envelope onset. It may also involve our perception of what Griesinger calls "proximity," since he relates this to source separation and stream segregation, that is, whether we're able to pick out and follow a specific musical "line" or "source" (whether a singer, instrument, or orchestral section). To give a different example, perhaps we're in a relatively crowded/noisy room but able to isolate a specific speaker a la the cocktail party effect. We may cognitively anchor this with visual input, i.e. seeing the speaker. With a musical recording, do we similarly use auditory localization cues to anchor our ability to isolate the individual sources to which we're paying attention at a particular time? If so, perhaps different aspects of localization are frequency and phase dependent, so maybe there are three ranges, in order of relative importance: below 500-1000 Hz, above 2-3 (or 4?) kHz, and 0.5-1 to 2-3 kHz. Add in room/modal distortion and the intrinsic high-pass filter around 700 Hz, and I think there may be an interesting correlation with typical loudspeaker crossover frequency range choices.
Young-Ho