Concert hall acoustics links and excerpts

youngho · Jan 18, 2024

Concert hall acoustics

This is a follow-up post to https://www.audiosciencereview.com/...-of-lokki-bech-toole-et-al.27540/#post-950580, focussing on concert hall acoustics and expanding significantly on the resources I shared previously for further learning. I’ve included links, relevant quotes, and some of my notes. I believe there are some areas of possible relevance to home stereo reproduction, especially of classical music.

Griesinger 2004
http://www.davidgriesinger.com/ICA_2004 imbedded.pptx
“Human hearing adapts to an acoustic environment over a period of 5 to 10 minutes”
“Frequencies above 1kHz are primarily responsible for perceptions of Timbre, Clarity, Intelligibility, Distance”
“Frequencies below 500Hz are primarily responsible for perceptions of Resonance, Envelopment, Warmth”

Dammerud 2009
https://www.akutek.info/Papers/JJD_Stage_acoustics_PhDthesis_96dpi.pdf
Stage Acoustics for Symphony Orchestras in Concert Halls
“Based on calculated C80 and G, early and late Strength, Ge [G0-80] and Gl [G80-∞], were also derived according to Equations 7.2 and 7.3. G was found by comparing measured levels to a reference microphone at 1 m distance from the source…The values of the listed acoustic measures were found as average (arithmetical) value within the three octave bands 500–2000 Hz.”
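As a rough illustration of how early and late strength can be derived from C80 and G, here is a minimal Python sketch. It assumes the textbook definitions (C80 as the energy ratio before and after 80 ms, G as total energy relative to the reference), which may not match Dammerud's Equations 7.2 and 7.3 term for term. The function names and the per-band values are hypothetical, and averaging the bands before deriving Ge and Gl is also an assumption.

```python
import math

def early_late_strength(g_db, c80_db):
    """Split total strength G into early (0-80 ms) and late (80 ms onward) parts.

    Assumes C80 = 10*log10(E_early / E_late) and
    G = 10*log10((E_early + E_late) / E_ref). This is the textbook
    relationship, not necessarily the thesis's exact Equations 7.2/7.3.
    """
    g_early = g_db - 10 * math.log10(1 + 10 ** (-c80_db / 10))
    g_late = g_db - 10 * math.log10(1 + 10 ** (c80_db / 10))
    return g_early, g_late

def band_average(values_db):
    """Arithmetic mean of per-band dB values (e.g. 500, 1000, 2000 Hz)."""
    return sum(values_db) / len(values_db)

# Hypothetical per-band values for the 500, 1000 and 2000 Hz octave bands.
g_bands = [4.0, 3.5, 3.0]
c80_bands = [-1.0, 0.0, 1.0]

g_mid = band_average(g_bands)
c80_mid = band_average(c80_bands)
ge, gl = early_late_strength(g_mid, c80_mid)
print(f"G = {g_mid:.1f} dB, C80 = {c80_mid:.1f} dB -> Ge = {ge:.2f} dB, Gl = {gl:.2f} dB")
```

A quick sanity check on the sketch: Ge minus Gl always equals C80, and the early and late energies add back up to the total G.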

Lokki 2012
https://users.aalto.fi/~ktlokki/Publs/JASA_lokki2012.pdf
“Potential assessors were openly invited with an article published in a national magazine of classical music. In addition, invitations were sent to student orchestra mailing lists, as well as to students of musicology and music. Finally, 23 candidates (13 males), each of them with a musical background and between the ages of 19 and 75 years (average age of 35), participated in the listening tests.”
“The results show that the main discriminative attributes between halls are loudness, envelopment, and reverberance. The second large cluster of attributes consists of bassiness and proximity attributes. The third main perceptual dimension has definition and clarity attributes. The preference judgments were divided into two groups of assessors, the first preferring concert halls with loud, enveloping and reverberant sound. The second group preferred concert halls that render intimate and close sound with high definition and clear sound”
“The best correlation with average preference ratings of all assessors was found to be with subjective proximity…none of the standardized objective room acoustical parameters could explain the proximity and preference data.”
Concert hall attribute clusters

Loudness/envelopment/reverberance: spaciousness, width of sound image
Bassiness/proximity: warmth, intimacy, naturally close
Definition/clarity: separating sound, focus and localization

Listener concert hall preferences

Majority (10/17): “Close sound with a lot of bass, loudness, envelopment, and reverberance. The definition is very low, but subjective clarity is very diverse within these three halls.”
Minority (7/17): “They render the most intimate sound that contains enough bass and loudness. They have mild reverberance with well-defined sound.”

High correlation between overall preference and subjective proximity

Lee 2013
https://pure.hud.ac.uk/en/publicati...h-and-listener-envelopment-in-relation-to-sou
The distance dependent ASW was best predicted using objective measures called GE (early sound strength) while the LEV results were highly correlated with GL (late sound strength) and B/F ratio (Back/Front energy ratio of late sound). Such conventional measures as [1-IACCE], [1-IACCL] and LF did not agree with the perceived results.

Griesinger 2013
http://www.davidgriesinger.com/ICA2013/What is Clarity5.pptx
“Sound is detected in the inner ear with a continuous 1/3 octave filter. Speech information is encoded in the relative strength of critical bands in the frequency range of 800 to 4000Hz, with some consonants at higher frequencies.”
“Standard hearing models predict a pitch acuity limited by the 1/3 octave bandwidth of the basilar filters. But musicians and listeners hear pitch to an accuracy of one part in one thousand! Why? Licklider proposed that our acuity of hearing could be explained by an autocorrelator located as close as possible to the hair cells, explaining our sense of pitch, and the rules of harmony.”
“Envelopment requires separation of a sound into two distinct streams: a foreground stream and a background stream. When a foreground stream is not perceived, there is only one stream, perceived as reverberant but not surrounding. Vienna’s Musikverein and Boston’s Symphony Hall set the world standard for envelopment, but in both halls reverberation comes from the front in distant seats.”
“The perception of Clarity fails three ways: 1. Solution: limit early reflections. 2. Control the reverberation time and level. 3. Don’t design for maximum RT at low frequencies.”
“Harmonics of pitched tones increase the signal to noise ratio by 12dB or more, and allow source separation. But these advantages depend on the phase alignment of the harmonics, and these phases are altered by acoustics. When phases are preserved at the onsets of sounds we get CLARITY – otherwise we get MUD.”

Lokki 2013
http://dx.doi.org/10.1121/1.4800481
Concert hall perceptual factors

Loudness (Strength, Level, Intensity): The louder the better.
Immersion (Presence, Intimacy, Envelopment, Spaciousness): Engaging and enveloping sound is interesting and desired.
Spatial extent (Distance, Depth, Source width, Balance): Proximate and spatially balanced sound (no image shift) is good.
Definition (Clarity, Articulation, Blend, Discrimination, Sharpness): Different instruments should have sharp articulation with a nice blend.
Timbre (Openness, Brilliance, Balance, Warmth, Bassiness): Balanced frequency response with small emphasis on bass and enough high frequencies give open and brilliant sound.

Lokki 2013
https://users.aalto.fi/~ktlokki/Publs/patynen_2013_JAS000842.pdf
Figure 3 shows a time-frequency representation, i.e. how the frequency response (FR) develops over successive time windows. What I note is that the overall FR is highest around 150 Hz or so, lower but reasonably flat from a few hundred Hz to about 1 kHz, then declines gradually above ~1 kHz until about 6-7 kHz, above which it drops off much more steeply. The direct sound actually has relatively little response below ~150 Hz, but the FR fills in below ~450 Hz over time, especially below 200 Hz. Concert halls like https://www.nagata-i.com/portfolio/suntory-hall/ and https://www.nagata-i.com/portfolio/sapporo-concert-hall-kitara/ (thanks, REG) show decreased reverberation times at these higher frequencies.

Lokki 2014
https://physicstoday.scitation.org/doi/10.1063/PT.3.2242
“Sensory evaluation methods, borrowed from the food and wine industry, are useful for studying concert-hall acoustics because they can extract information often hidden behind preference judgments. With such methods, in particular those based on individually elicited attributes, one can develop sensory profiles of concert halls or of seats inside one concert hall. Preference judgments might give an overall average picture, but the variance in the data is typically large due to the assessors’ personal tastes and previous experiences. Sensory evaluation methods provide a link between those subjective preferences and perceptual characteristics.”

Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/JASMAN_vol_140_iss_1_551_1.pdf
“Twenty eight assessors (14 males, 14 females; ages between 22 and 64 yr with average age of 39.6) were recruited for this study. We gathered the participants with a web-based questionnaire and we particularly looked for people who often go to live concerts. The selected assessors were 10 professional musicians (no specific genre), 10 active amateur musicians, and eight active concert goers with varying musical background. Thus, they could all be considered as experts in listening to music, even though not expert assessors nor listening test participants.”
“The main results show that listeners can be categorized into two different preference classes.”
Majority (16/28 or 57%): “Some listeners prefer clarity over reverberance…clarity and definition classes have positive correlations with mid frequency C80 and significant negative correlations with mid frequency EDT and LJ”
Minority (12/28 or 43%): “others love strong, reverberant and wide sound..The first attribute classes (reverberance/width/loudness) are well explained with LJ [late lateral energy] at all bands and G [strength] at mid frequencies.”
“The halls were all measured unoccupied resulting in more reverberant conditions than in situ at occupied conditions, which is often the reality in this renowned halls. Although the assessors described the sound samples being very natural and realistic, it is clear that unoccupied conditions might introduce a bias to the presented results, as the difference between occupied and unoccupied conditions varies between halls…it might be that the mental reference for assessors is classical music recordings, which always have high clarity and less reverberation than in reality in situ, or in our auralizations.”
2 broad preference classes: clarity vs reverberance/width/loudness
3 latent attribute classes: RWL (reverberance/width/loudness), timbre, clarity/definition
Clarity/definition correlates negatively with EDT (early decay time)
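For readers unfamiliar with C80, the clarity index that appears in the quotes above, here is a minimal Python sketch of its usual definition: the ratio, in dB, of the energy arriving in the first 80 ms of an impulse response to the energy arriving afterwards. The function name and the toy impulse response are hypothetical, and a real measurement would first be filtered into octave bands.

```python
import math

def c80(impulse_response, sample_rate, split_ms=80.0):
    """Clarity index: 10*log10(early energy / late energy), split at 80 ms.

    Assumes the response starts at the direct-sound arrival (index 0)
    and contains energy on both sides of the split.
    """
    split = int(round(sample_rate * split_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    return 10 * math.log10(early / late)

# Toy response at 1 kHz sampling: direct sound plus one late reflection at 100 ms.
fs = 1000
ir = [0.0] * 200
ir[0] = 1.0      # direct sound
ir[100] = 0.5    # late reflection at half the pressure amplitude
print(f"C80 = {c80(ir, fs):.1f} dB")  # early energy is 4x late energy, about 6 dB
```

More late energy (a longer, stronger reverberant tail) pushes C80 down, which is in line with the negative correlation between clarity/definition and EDT noted above.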

Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/ICA2016-0465.pdf
“The form of a human head strengthens these high frequencies for the sound coming from the side, thus showing the benefit of lateral reflections instead of reflections from the ceiling.”
“When an orchestra plays in fortissimo the low frequencies below 200 Hz and high frequencies above 3kHz are substantially pronounced in comparison with piano passages”

Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/p43.pdf
“A few people, e.g., Kahle (2013) has suggested that the auditory perception of a symphony orchestra playing in a concert hall can be understood with respect to two main percepts: the source presence and the room presence. The source presence is the continuous perception of the sound sources in the hall while the room presence is the perception of the space the music is listened to. These two are separate entities in the perceptual domain. If a hall can create these two "auditory streams", i.e., they are distinct and separate, then it is proposed this may permit both good clarity and plentiful, enveloping reverberation at the same time. The formation of the auditory streams is possible through stream segregation (Griesinger, 1997) and is subject to the perceptual grouping laws therein (Moore, 2012). The early reflections are perceptually grouped with the source streams through the precedence effect (Litovsky, Colburn, Yost, & Guzman, 1999), and affect the width, loudness, and timbre of the auditory events (Blauert, 1997). In this way, the direct sound of the orchestra and the early reflections of the hall combine to make up the source presence. The late reflections, i.e. reverberation, form the context and space for the music, and lend the music support, embellishment, and a sense of depth, providing the listener with a sense of envelopment; that is, room presence. At the moment, there is no clear consensus how these two streams are formed, or do we even need them. Naturally, more research is needed, including the spatial aspects of early and late reflections (Lock et al. 2015).”

Beranek 2016
https://pubs.aip.org/asa/jasa/article/139/4/1548/662531/Concert-hall-acoustics-Recent-findingsa
“It is interesting that listeners in Boston Symphony Hall can also be divided into two groups, (and possibly three counting those in between) namely, those who like the sound best in the front two-thirds of the main floor and those who prefer the sound in the upper rear second balcony. The difference is readily apparent to anyone by listening to the first half of an orchestral concert on the main floor and the second half in the rear second balcony. On the floor the sound is clear and loud with full bass, with many early reflections, none of which mask the direct sound, and with reverberation that is beautiful. In the rear second balcony the reverberant sound almost immediately follows the arrival of the direct sound and it is loud and completely enveloping. Those who have subscription seats in the upper balcony praise the sound. The author of this paper, preferring clarity to the sound, identifies with the main-floor group.”
“Source Presence is that sound which reaches the listener before the reverberation becomes appreciable. It usually includes the direct sound and early reflections up to about 100 ms after arrival of the direct sound. Room Presence deals with the reverberant sound field that follows…[Haapaniemi and Lokki] found that each of the 8 well-known halls could be identified better by source presence than by room presence..“
“Pätynen et al. [10] demonstrate that when an orchestra plays fortissimo the frequency spectrum changes..It was found that between 400 and 2000 Hz the spectrum for the (ff) sound is about 7 dB more intense than that for the (pp) sound and that between 2000 and 8000Hz the increase is more than 15dB (see upper curve in Fig. 3).”
“Because of the size of the head, when the sound arrives from a lateral direction, sideways, the intensity at the closer ear of a listener, in the 2000 to 8000 Hz region, is 1 to 5 dB greater than that when the sound arrives only from the front. With reflections from both sides of a shoebox hall this difference will occur at both ears. Hence, in a shoebox hall, the difference in the intensities between (pp) and (ff) that arrives at the ears between 2000 and 8000 Hz is 8 to 12 dB greater [7 dB + (1 to 5 dB)]”

Lokki 2017
https://users.aalto.fi/~ktlokki/Publs/kuusinen_aaua_2017.pdf
Wheel of concert hall acoustics

Loudness, volume, level, strength, body, dynamic range
Intimacy, proximity, source presence
Spatial impression, width, envelopment
Reverberance, liveness, fullness
Clarity, definition, articulation, sharpness, localization
Timbre, warmth, spectral balance

Griesinger 2018
https://www.akutek.info/Mitt Bibliotek/IOA Auditorium Acoustics Hamburg 2018/Additional/papers/p35.pdf
“We have developed a method intended to predict whether a source will be localizable or not from a binaural impulse response. We called it LOC, which is short for Localization. LOC was developed from a data set obtained using male speech with pauses between each word. We modeled a room with two loudspeakers placed at +-20 degrees in front of the listener in a space with two different reverberation times and four different pre-delays.”
“Frequencies from 1000Hz to 4000Hz are principally responsible for the data we obtained. We developed a formula for LOC based on the idea that when the integrated loudness of the direct sound in that frequency range is stronger than the integrated loudness of the reflections in the first 100ms of the onset of a speech syllable, then the whole syllable can be localized and perceived as close to the listener. A very important aspect of the formula is that the “loudness” of the reflections is proportional to the integrated logarithm of the sound and not the integrated sound energy or pressure.”
“LOC has been tested in several spaces, mostly large enough to hold 300 to 1500 seats. We typically used a loudspeaker array similar to Tapio Lokki’s on stage playing his anechoic recordings of an electronic string quartet with a soprano voice playing a Mozart aria. We recorded impulse responses with my personal dummy head and a three dimensional microphone from each speaker array. We also listened to music playing through the array to find the LLD in various parts of the venues. Values of LOC above 3dB typically predicted good localization of the ensemble. Values below 2dB predicted poor localization. In most of these tests the reflections were more or less equal on both sides.“
“We also tested LOC with a data set from Boston Symphony Hall. All but one of the seats had LOC values of +3dB or more, and sounded good both on headphones and in person with a live orchestra. One seat had a strong reflection from the right side wall. LOC at that seat was below 2dB in the right ear and above 5dB in the left ear. The instruments were not localizable and the sound in that seat was muddy. Deleting the strong sidewall reflection from the impulse responses raised LOC in the right ear to +5dB. The sound improved greatly. We learned that to determine the ability to localize instruments in a particular seat we need to look at the minimum value of LOC from the two ears, not the maximum or average value. “
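The decision rule described in these two quotes (use the worse ear, with roughly 3 dB and 2 dB as the good and poor thresholds) can be sketched in a few lines of Python. This does not compute LOC itself; it only classifies per-ear LOC values that are already known. The function name, the "uncertain" label for values between the thresholds, and the example numbers are assumptions for illustration (the quote only says "below 2dB" and "above 5dB").

```python
def localization_verdict(loc_left_db, loc_right_db, good_db=3.0, poor_db=2.0):
    """Classify a seat from per-ear LOC values using the worse (minimum) ear.

    Thresholds follow the quoted rule of thumb; the band between them
    is not characterized in the source, so it is labeled "uncertain" here.
    """
    worst = min(loc_left_db, loc_right_db)
    if worst >= good_db:
        return "good"
    if worst < poor_db:
        return "poor"
    return "uncertain"

# Hypothetical values standing in for the Boston Symphony Hall seat:
# strong right sidewall reflection, then the same seat with it removed.
print(localization_verdict(5.5, 1.5))  # poor
print(localization_verdict(5.5, 5.0))  # good
```

Taking the minimum rather than the mean matters here: the mean of the first pair is 3.5 dB, which would wrongly suggest a good seat.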
“In Boston symphony hall the critical distance, where the D/R is one, is only about 17 feet from an omnidirectional source. On the floor the best seats can be forty to fifty feet from the stage, and the great seats in the front of the first balcony are more than 110 feet from the stage. “
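To get a feel for what those distances imply, here is a minimal Python sketch of the direct-to-reverberant ratio versus distance. It assumes an idealized diffuse field, where the direct sound falls 6 dB per doubling of distance and the reverberant level is the same everywhere; real halls, and Boston in particular, will deviate from this. The function name is hypothetical, and 45 ft is simply a midpoint of the quoted "forty to fifty feet".

```python
import math

def direct_to_reverberant_db(distance, critical_distance):
    """D/R in dB for an omnidirectional source in an idealized diffuse field.

    By definition D/R is 0 dB at the critical distance; the direct sound
    then drops with the inverse square law while the reverberant level
    is assumed constant throughout the room.
    """
    return 20 * math.log10(critical_distance / distance)

CRITICAL_DISTANCE_FT = 17.0  # figure quoted for Boston Symphony Hall
for seat_ft in (17, 45, 110):
    dr = direct_to_reverberant_db(seat_ft, CRITICAL_DISTANCE_FT)
    print(f"{seat_ft:>3} ft: D/R = {dr:+.1f} dB")
```

Under these assumptions the floor seats sit at roughly -8 dB and the front of the first balcony at roughly -16 dB, so the seats described as good are well beyond the critical distance.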
“Violins start gradually. LOC was devised assuming the sounds we are trying to localize abruptly rise to full level. Violins can start notes abruptly, but usually they do not. Sound onsets can take 50 milliseconds or more to reach full level. By the time the note reaches full level reflections in a small room have substantial energy. “
“The direct to reverberant ratio needed to localize the violins is uniformly greater than for male speech. The data suggests that the major reason is the slower onset of each note. Reverberation with the one second RT builds up rapidly, and quickly masks the slow onset of the direct sound from the violin. A higher D/R is needed to overcome the masking.”
“With the two second reverberation and short values of pre-delay the D/R needed for localization for the violins is nearly the same as it is for the one second reverberation, but as pre-delay increases violins become easier to localize. The slope of the improvement in localizability is steeper than for male speech. In the theory behind the development of LOC the slope of the improvement with predelay is governed by the length of the comb filter or auto correlator that separates the direct sound from later reverberation. The data for violins suggests that the length of this filter is shorter for harmonics of violins than it is for male speech. This result was expected. High frequency fundamentals do not need as long a filter to obtain the same accuracy as lower frequency fundamentals.”
"It is well known that a prompt early reflection can augment speech and musical instruments. Churches have put a wall and a ceiling behind and over pulpits for centuries. Often in a room where localization and proximity is poor a seat in the very last row, up against the back wall, will sound much better. But it has to be the very last row. The next to last row is no better than the others. In practice I had found that your ears had to be within two and a half feet of the wall for the trick to work. We did some experiments using our virtual room and male speech. The results showed that a reflection at 5ms which was 6dB less strong than the direct sound did augment the loudness and the localizability of a source without detrimental effects on timbre. The current version of LOC appeared to work for this case. A reflection 6ms after the direct sound did not affect the localizability of the sound, and was beginning to alter the timbre. A reflection at 7ms reduced the ability to localize, and added an unpleasant timbre. We tried the same experiment with female speech and got the same result. We do not know why this effect occurs, but it is the same for both male and female speech."
“As a result of these experiments we added a 2ms cross fade centered at 6ms to the calculation for LOC.”
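The "two and a half feet" observation and the 5/6/7 ms results line up if you work out the path length, which the following Python sketch tries to show. It assumes a wall directly behind the listener with sound arriving at normal incidence (so the extra path is twice the ear-to-wall distance) and a speed of sound of 343 m/s. The linear shape of the cross fade is also an assumption; the quote only gives its 2 ms width and 6 ms center. All names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed, air at about 20 degrees C
FEET_TO_M = 0.3048

def back_wall_delay_ms(ear_to_wall_ft):
    """Extra delay of a reflection from a wall directly behind the listener.

    Assumes normal incidence: the sound passes the ears, travels to the
    wall and returns, so the extra path is twice the ear-to-wall distance.
    """
    extra_path_m = 2 * ear_to_wall_ft * FEET_TO_M
    return 1000 * extra_path_m / SPEED_OF_SOUND_M_S

def direct_weight(delay_ms, center_ms=6.0, width_ms=2.0):
    """Fraction of a reflection counted with the direct sound.

    1.0 means the reflection fully augments the direct sound, 0.0 means
    it counts entirely as reflected energy. The linear ramp is an
    assumption; only the width and center come from the quote.
    """
    start = center_ms - width_ms / 2
    end = center_ms + width_ms / 2
    if delay_ms <= start:
        return 1.0
    if delay_ms >= end:
        return 0.0
    return (end - delay_ms) / width_ms

delay = back_wall_delay_ms(2.5)
print(f"2.5 ft from the wall: {delay:.1f} ms, direct weight {direct_weight(delay):.2f}")
```

Under these assumptions 2.5 ft corresponds to a delay of about 4.4 ms, just inside the 5 ms region where the reflection was found to help, while a seat only a few feet further forward would already land beyond 7 ms.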

Griesinger 2018
http://www.davidgriesinger.com/Localization, Loudness and Proximity 8.pptx
“The ability to localize a sound depends on detecting the direct sound, which alone carries the ITD and ILD…enables localization and proximity. If a sound is perceived as close our attention is involuntarily drawn to it.”
“Aarabi et al…finds that there is a distinct amount of phase randomization that causes separation of sounds to abruptly fail.”
“Successfully separated sources form multiple “foreground” streams…sounds that cannot be localized or separated form a single stream with special properties; the “background stream. The background stream is perceived as distinctly separate from the foreground streams, and is often perceived as fully surrounding. When there are no localized foreground streams all the sound we hear is perceived as a single stream.”
“If we want to know if the ear distinctly hears the direct sound we need to plot the integrated LOUDNESS of direct sound versus the integrated LOUDNESS of the reflections. I say loudness because the ear responds to the logarithm of sound pressure, not the sound power. Our measure for localizing the direct sound, LOC, separately plots the loudness of frequencies above 1000Hz from the direct sound and the build up of the loudness of the reflections inside a 100ms window. The loudness – the integrated neural activity – of the direct sound is given by the area under the blue line inside the black window. Similarly, the loudness of the reflections is given by the area under the red curve inside the black window. LOC is the ratio of the direct sound area to the reflection area in decibels.”
“The experiments showed that early reflections 5ms or less from the direct sound add to the direct sound loudness and increase the ability to localize. Reflections at 6ms do not increase localization and begin to alter timbre. Reflections at 7ms and later add to the reverberant loudness, and are detrimental both to timbre and localization.”
“We have found that the ability to sharply localize sources in halls and rooms depends on many factors besides the timing and strength of the early reflections. The speed of the attack of each new sound matters a lot…the reverberation time matters, as does the score and tempo of the music. Short notes excite reverberation very little, and tend not to be masked by previous notes. But strong held notes build up the reverberation, and can sometimes mask everything.”

Griesinger 2019
http://www.davidgriesinger.com/Learning to Listen 14.pptx
“I believe acoustic research has overlooked three critical concepts:
1. Pitch: Why are humans able to distinguish pitch to an accuracy of six parts in 10,000?
(Because pitch (periodicity) enables us to separate pitched signals from noise.)
2. Phase: It is commonly thought that phase is inaudible above 1500Hz.
(False! The phases of the upper harmonics of tones are critical to proximity and source separation.)
3. Attention: Acoustic quality is measured with intelligibility. Attention is more important.
(Sounds that are proximate involuntarily attract attention.)”
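To put the pitch-acuity figures in musical terms, here is a small Python sketch that converts a frequency ratio to cents. It compares the 1/3-octave filter bandwidth with the two acuity figures quoted from Griesinger (one part in one thousand in the 2013 slides, six parts in 10,000 here). The function and variable names are hypothetical.

```python
import math

def ratio_to_cents(frequency_ratio):
    """Musical interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(frequency_ratio)

third_octave = ratio_to_cents(2 ** (1 / 3))          # width of a 1/3-octave band
one_in_thousand = ratio_to_cents(1 + 1 / 1000)       # Griesinger 2013 figure
six_in_ten_thousand = ratio_to_cents(1 + 6 / 10000)  # Griesinger 2019 figure

print(f"1/3 octave band:      {third_octave:.0f} cents")
print(f"1 part in 1,000:      {one_in_thousand:.2f} cents")
print(f"6 parts in 10,000:    {six_in_ten_thousand:.2f} cents")
```

A 1/3-octave band spans 400 cents (four semitones), whereas both acuity figures work out to between one and two cents, which is the gap the quoted argument is pointing at.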
“To have a spacious bass you need have low correlation between the left and right channels at low frequencies. I had a tiny oscilloscope with a one inch CRT displaying the left/right phase. It was easy to see if the sound was too monaural. But loudspeakers must be carefully placed if spatial bass is to be heard. The arrangement along the long wall shown in the previous slide can often work in small spaces.”
“The alignment of phases in the upper harmonics of tones is vital to source separation and localization. Harmonic tones are created by pulses of air, the release of a rosined string, or strikes of a hammer. The amplitude spikes created by these pulses cut through noise and reverberation…early reflections randomize the phase of these pulses. The consequences are dramatic. ”

Hochgraf 2019
https://acousticstoday.org/the-art-...ons-in-research-and-design-kelsey-a-hochgraf/
The Art of Concert Hall Acoustics: Current Trends and Questions in Research and Design
“Although there is not yet a consensus around specific attributes most correlated with audience listener preference, there is agreement that different people prioritize different elements of the acoustical experience. Several studies have shown that listeners can be categorized into at least two preference groups: one that prefers louder, more reverberant and enveloping acoustics and another that prefers a more intimate and clearer sound “
Auditory stream segregation: source and room response
“There is growing consensus among acousticians that although many of these [numerical parameters standardized by ISO 3382-1:2009] are useful, they do not provide a complete representation of concert hall acoustics...the limitations are largely attributable to differences between an omnidirectional sound source and an orchestra and between omnidirectional microphones and the human hearing system.”
“The importance of lateral reflections for spatial impression is well-understood…but more recent research has shown that these reflections are also critical to the perception of dynamic responsiveness…increased perception of dynamic range has also been shown to correlate with increased emotional response...the perception threshold for lateral reflections decreases with increasing sound level, meaning that more lateral reflections will be perceived by the listener as the music crescendos, further heightening the sense of dynamic responsiveness.”
“[T]he tide seems to be shifting away from high surface diffusivity and there is more evidence to substantiate the need for strong lateral reflections,“

Lokki 2019
https://doi.org/10.3390/acoustics1020025
Concert halls and architectural features
“When median plane reflections are delayed in time due to the height of the room, the clarity of sound is improved, as our brains have more time to process the direct sound and the first lateral early reflections. In fact, sometimes we almost perceive two “sound streams”, the early sound and reverberation separately, and Kahle [35] calls these streams as the “source presence” and the “room presence”, respectively.”
“It should be emphasized that the important frequency range in the concert hall is from 20 Hz up to 12–15 kHz…the construction of a stage is important especially for double basses, which play notes having fundamental frequencies as low as 33 Hz.”
“Our current understanding, supported by Kahle, is that the back wall should be absorptive. Such a wall treatment eases the balancing of instrument groups and increases the clarity for the audience.”

Vigeant 2019
https://sites.psu.edu/spral/files/2...Hall-Preference-ISRA-Amsterdam-2019-FINAL.pdf
“Sixteen subjects participated in the experiment, all meeting minimum hearing thresholds of 15 dB-HL in the octave bands from 250 –8000 Hz. Subjects were required to have at least five years of formal musical training and were required to be actively studying their instrument or involved in a musical ensemble. The subject pool included 11 males and 5 females with an average age of 24 years. The average musical experience across all subjects was 14 years.”
"Much of the clarity-related impressions, including temporal clarity, spatial clarity, intimacy, and proximity were found to show high correlation.” (0.66-0.74)
“Strength showed strong correlations with the spatial perceptions of envelopment (=0.77) and source width (=0.87).”
“Another interesting analysis, found in the second column of Table 2, is the correlation of average preference with each of the ten subjective attributes. The highest correlation was found with proximity (=0.81)”
“A set of four orthogonal factors were shown to explain 72% of the total variance in perception, which were interpreted as clarity, strength / envelopment, strength / source width, and brilliance.“

Griesinger 2019
http://www.davidgriesinger.com/Learning to Listen 14.pptx
“Pitch (periodicity) enables us to separate pitched signals from noise…the phases of the upper harmonics of tones are critical to proximity and source separation…sounds that are proximate involuntarily attract attention.”
“To have a spacious bass you need have low correlation between the left and right channels at low frequencies…the arrangement along the long wall shown in the previous slide can often work in small spaces.”
“Comb-like filters tuned to the fundamental period of an amplitude waveform can separate the formants of a particular speaker or instrument from other signals and from noise…The alignment of phases in the upper harmonics of tones is vital to source separation and localization…Recent papers from the field of speech comprehension have come to the same conclusions about the importance of the amplitude waveform of sounds with distinct pitch. They call the process “Source separation by periodicity.””
“For example: Manfred Schroeder began to study concert halls with binaural technology in 1974. He used a dummy head microphone for recording, and crosstalk cancellation for playback. With some modifications these techniques can work well. But Schroeder made a serious error, and it was contagious. He reproduced a whole orchestra with just two speakers. Mixing an orchestra into just two channels eliminates the phase information that gives rise to the perception of proximity. Edison, Dick Campbell, Kimio Hamasaki, Tapio Lokki and the author all find that to make a believable orchestral sound you must use a separate loudspeaker for each instrument! When you hear an orchestral recording from just two speakers in a hall it sounds artificial. But if you listen beyond the LLD the sound blends together, and is a bit more believable. This is the sound of a poor seat, not a good one.”
“Barron and Marshall’s classic paper on spatial impression used a single speaker to reproduce a whole orchestra. There was no phase coherence on any instrument. They found adding lateral reflections widened the image. They used the term “Apparent Source Width” (ASW) to describe what they thought was an improvement to the sound. ASW has since been quoted as beneficial to hall acoustics in nearly every article and text book. But ASW reduces localization, eliminates proximity, and decreases envelopment. It is the sound in a poor seat, not in a good one.”
For some reason, the last three sentences remind me of @Karl-Heinz Fink's post: https://www.audiosciencereview.com/...n-without-subwoofer.48559/page-9#post-1743584

Griesinger 2020 (and 2018)
http://www.davidgriesinger.com/The_Physics_of_auditory_proximity.pptx
https://www.aes.org/tmpFiles/elib/20240113/18463.pdf
“Aarabi showed that phase randomization can dramatically increase errors in word recognition in noisy environments…randomization of phase decreases what we now call “Proximity”, and lack of proximity decreases attention and recall.”
“We have been calling the proximity perception “Engagement”, “Clarity”, or “Presence” since 2004. We have a measure for it we call “LOC”.”
“Proximity – the perception that a source is acoustically close - is an important determinant of attention and recall.”
“The ear detects proximity through the phase coherence of upper harmonics in the direct sound, which are randomized by early reflections.”
Proximity may be measured by LOC, which compares (in dB) the summed direct sound with the summed reflections within an 80 ms window; a value of +3 dB or more predicts good proximity
“In most halls proximity disappears over a distance of one meter at a particular distance from the sound sources. We call this the “Limit of Localization Distance” or LLD“
“STI , the speech transmission index, is a standard way of measuring intelligibility in halls and classrooms. We calculated both STI and our measure for proximity, LOC, in a 27’x25’x10’ room with surface absorption of 0.15 and RT 0.34s…a small room with a 0.34 second RT has excellent STI – but the teacher is heard with proximity only in the front seat…this model classroom is greatly improved by increasing the average absorption to 0.3 …The intelligibility measured by STI also increases. A small room needs a lot of absorption if proximity is to be high in all the seats.”
“Barron and Marshall’s classic paper on spatial impression used a single speaker to reproduce a whole orchestra. The orchestra recording had no proximity. Many later experiments repeated this arrangement. They all found that adding lateral reflections widened the image. They used the term “Apparent Source Width” or ASW to describe what they thought was an improvement to the sound. ASW has since been quoted as beneficial to hall acoustics in nearly every article and text book. ASW by definition eliminates sharp localization and decreases proximity.”
“In many small halls there is a strong reflection from the back of the stage which reduces the number of seats with proximity. Adding absorption to the back and side walls can help a lot without much reduction of the reverb time.”

Lokki 2020
https://research.aalto.fi/files/75640242/SCI_Lokki_Auditory_spatial_impression_in_concert_halls.pdf
“Direct sounds and adjacent scattering, i.e. the initial 5 ms of the acoustic response arrives from each source on the stage in frontal directions. In a shoebox hall the stage floor is typically on the ear level of the audience at main parterre, thus the listener does not receive the stage floor reflection…The frequency responses illustrate that in the shoe-box hall the direct sounds lack the low frequencies, but have considerably strong high frequencies. In contrast, in the vineyard hall with a raked audience area, the frequency response of the first 5 ms is quite different due to stage floor reflections.”
“Early reflections until 30 ms...increase the overall loudness, color the sound and might change the perceived width of the source. As said, the temporal envelope preserving lateral reflections integrate to the direct sound best, increasing its quality and preserving the ability to localize. If the reflections are scrambling the phases of upper harmonics, i.e. reflections from heavily diffusing surfaces, the precedence effect might partially break down and such early reflections are not fully integrated to the direct sound (Lokki et al., 2011). Such reflections might increase the perceived width of the source to the detriment of less defined location of the source. As a result the instruments better blend together, but some listeners associate that to reduced clarity.…the shoe-box hall provides prominent lateral reflections already inside this 30 ms time window…the early reflections (between 5-30 ms) in the shoe-box hall strengthen the low frequencies below 200 Hz substantially, yet the middle frequencies up to 1 kHz remain at a relatively low level.”
“Later reflections between 30 and 200 ms increase the overall sound energy…In the shoe-box hall, the increase is particularly strong above 200 Hz, equalizing the frequency response to be more or less flat at 200 ms after the direct sound. Moreover, the energy in this time window reaches the measurement position almost evenly from all directions”
“Reverberation beyond 200 ms increases the cumulative energy to its final state…contributes to loudness, envelopment, spaciousness, and timbre…Notable differences between these halls can be observed in the smoothness of the overall frequency responses, level of low frequencies, and spatial distribution of sound energy.”
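The four time windows in the Lokki 2020 excerpts (first 5 ms, 5-30 ms, 30-200 ms, beyond 200 ms) can be pictured as cumulative energy in an impulse response. The following is only a rough sketch under stated assumptions: the function names, the 8 kHz sample rate and the smooth synthetic decay are hypothetical stand-ins, not Lokki's method or data, and no per-band or directional analysis is attempted.

```python
import math

def cumulative_energy_db(ir, fs, edges_ms=(5, 30, 200)):
    """Cumulative energy up to each time edge, in dB relative to the total.

    ir: impulse response samples with the direct sound at index 0 (assumed).
    fs: sample rate in Hz.
    """
    energy = [s * s for s in ir]
    total = sum(energy)
    levels = {}
    for edge in edges_ms:
        n = int(round(edge * fs / 1000))  # number of samples inside the window
        levels[edge] = 10 * math.log10(sum(energy[:n]) / total)
    return levels

def synthetic_decay(rt60, fs, seconds):
    """Hypothetical stand-in for a measurement: amplitude falls 60 dB in rt60."""
    return [math.exp(-6.91 * n / (fs * rt60)) for n in range(int(fs * seconds))]

levels = cumulative_energy_db(synthetic_decay(rt60=2.0, fs=8000, seconds=3.0), 8000)
for edge, level in levels.items():
    print(f"0-{edge} ms: {level:.1f} dB re total")
```

With a real measured response, the same bookkeeping would show how much of the final energy has arrived by the end of each window.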

Private communication from Lokki in response to my question about seat dip effect versus floor bounce: “Floor bounce is a single reflection that creates quite narrow (high Q) comb filter to the frequency response. The seat-dip effect is a combination of several phenomena, thus the dip has usually lower Q and it is only one dip, not a comb filter at higher frequencies.”
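To make the "single reflection creates a comb filter" point concrete, here is a minimal sketch that estimates where the notches of a floor bounce would fall. The geometry (source and ear both 1 m above the floor, 3 m apart), the perfectly reflecting floor, the 343 m/s speed of sound and the function name are all assumptions for illustration; none of them come from Lokki's message.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def floor_bounce_notches(source_h, ear_h, distance, count=3):
    """First `count` comb-filter notch frequencies (Hz) for one floor reflection.

    Heights and horizontal distance are in metres. The reflected path is
    found with the mirror-image source below the floor.
    """
    direct = math.hypot(distance, source_h - ear_h)
    reflected = math.hypot(distance, source_h + ear_h)
    delay = (reflected - direct) / SPEED_OF_SOUND  # extra travel time in seconds
    # Cancellation occurs where the delay equals an odd number of half periods.
    return [(2 * k + 1) / (2 * delay) for k in range(count)]

notches = floor_bounce_notches(source_h=1.0, ear_h=1.0, distance=3.0)
print([round(f) for f in notches])
```

For this assumed geometry the first notch lands at roughly 283 Hz, with further notches at odd multiples of it, which is the repeating comb pattern that distinguishes a floor bounce from the single, broader seat-dip.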

Lokki 2024
https://acris.aalto.fi/ws/portalfiles/portal/118035577/388_1_10.0020066.pdf
“Categorization of listeners by preference confirmed the emergence of two groups, where listeners in the larger group [15/20] prefer proximate, enveloping, and warm sound and listeners in the other group [5/20] prefer clarity. This aligns with earlier studies of large symphony halls.10 Thus, there do not seem to be fundamental differences between the perception of chamber music halls and symphony halls in this regard.”

My notes in italics

At least two broad relative preference groups
Lokki 2012: “Here, it was found that assessors can be grouped to two preference groups. Similar grouping has been found also earlier by Schroeder et al.,9 who found similar preference groups related to loud sound and clear sound. Barron4 divided assessors into groups by intimacy and reverberance. There, results also correlate with the results presented here; one group preferred clear and intimate sound and another group preferred loud, enveloping, and reverberant sound.”
Beranek 2016: “It is interesting that listeners in Boston Symphony Hall can also be divided into two groups, (and possibly three counting those in between)”
Hochgraf 2019: “Several studies have shown that listeners can be categorized into at least two preference groups: one that prefers louder, more reverberant and enveloping acoustics and another that prefers a more intimate and clearer sound”
Vigeant 2019: “Traditionally, preference has been divided into two groups: one group preferring strength and reverberance, and the other preferring clarity and intimacy….although subjects can be placed in two discrete groups, it removes the subtlety and individual variability in the data…large individual differences were observed, not captured fully in the traditional two-group preference model.”
Similar preference groups in stereo listening?

Group 1: REW (reverberance, envelopment, width)
Lokki 2012: Majority (10/17) prefer “bass, loudness, envelopment, and reverberance”
Lokki 2016: Minority (12/28 or 43%): “others love strong, reverberant and wide sound…The first attribute classes (reverberance/width/loudness) are well explained with LJ [late lateral energy] at all bands and G [strength] at mid frequencies.”
Beranek 2016: “those who prefer the sound in the upper rear second balcony…in the rear second balcony the reverberant sound almost immediately follows the arrival of the direct sound and it is loud and completely enveloping. Those who have subscription seats in the upper balcony praise the sound.”
Vigeant 2019: “Strength showed strong correlations with the spatial perceptions of envelopment (=0.77) and source width (=0.87)” and reverberance (=0.72)
Lokki 2024: “Listeners in the larger group [15/20] prefer proximate, enveloping, and warm sound…Envelopment correlates mainly with late lateral sound level LJ and strength G. Proximity correlates positively with mid-frequency G and C80 and negatively with low-frequency EDT and T20, i.e., strong and clear sound lead to more proximate sound. Finally, the attribute warmth has high correlation with wide-band G and LJ, but not with low frequency G.”
May correlate with listeners like Toole who prefer ASW?

Group 2: CD (clarity, definition)
Lokki 2012: Minority (7/17) prefer “intimate and close sound with high definition and clear sound…mild reverberance with well-defined sound.”
Lokki 2016: Majority (16/28 or 57%): “Some listeners prefer clarity over reverberance…clarity and definition classes have positive correlations with mid frequency C80 and significant negative correlations with mid frequency EDT and LJ”
C80 is energy ratio before and after 80 ms, measured in dB. C80 frequencies with highest correlations were ~500 Hz to 4 kHz
Griesinger 2013: “The perception of Clarity fails three ways: 1. Solution: limit early reflections. Control the reverberation time and level. 3. Don’t design for maximum RT at low frequencies.”
Beranek 2016: “those who like the sound best in the front two-thirds of the main floor… On the floor the sound is clear and loud with full bass, with many early reflections, none of which mask the direct sound, and with reverberation that is beautiful…The author of this paper, preferring clarity to the sound, identifies with the main-floor group.”
Vigeant 2019: “Much of the clarity-related impressions, including temporal clarity, spatial clarity, intimacy, and proximity were found to show high correlation” (0.66-0.74)
Lokki 2024: “listeners in the other group [5/20] prefer clarity…Preference scores in the smaller preference group 2 correlate positively with C80, and negatively with LJ, i.e., almost the same correlation as for the attribute clarity. ”
May be similar to audio professionals’ sensitivity to lateral reflections in small room audio?
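Since C80 appears throughout the Group 2 notes, here is a minimal sketch of the definition given above (energy before 80 ms divided by energy after 80 ms, in dB). It is deliberately simplified: it works on the broadband response with no octave-band filtering, and the synthetic exponential decays, the 8 kHz sample rate and the function names are assumptions for illustration rather than measured hall data.

```python
import math

def clarity_db(ir, fs, split_ms=80.0):
    """Early-to-late energy ratio in dB (C80 by default, C50 with split_ms=50).

    ir: impulse response samples with the direct sound at index 0 (assumed).
    fs: sample rate in Hz.
    """
    n = int(round(split_ms * fs / 1000))
    early = sum(s * s for s in ir[:n])
    late = sum(s * s for s in ir[n:])
    return 10 * math.log10(early / late)

def synthetic_decay(rt60, fs, seconds=3.0):
    """Hypothetical stand-in for a measurement: amplitude falls 60 dB in rt60."""
    return [math.exp(-6.91 * n / (fs * rt60)) for n in range(int(fs * seconds))]

fs = 8000
c80_long = clarity_db(synthetic_decay(2.0, fs), fs)            # 2.0 s decay
c80_short = clarity_db(synthetic_decay(1.0, fs), fs)           # 1.0 s decay
c50_long = clarity_db(synthetic_decay(2.0, fs), fs, split_ms=50.0)
print(f"C80 (RT 2.0 s): {c80_long:.1f} dB")
print(f"C80 (RT 1.0 s): {c80_short:.1f} dB")
print(f"C50 (RT 2.0 s): {c50_long:.1f} dB")
```

In this idealised case the shorter decay gives the higher C80, which fits the negative correlation between clarity and EDT quoted from Lokki 2016.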

Localization
May be measured through LOC, which focuses primarily on the upper midrange (1-4 kHz), basically comparing the integrated loudness of the direct sound with the integrated loudness (logarithmic) of the reflections in the first 100 ms. This was originally developed for speech but had to be modified for violin, due to its usually slower sound onset. Similarly, C50 is used for speech and C80 for music. The frequency range of C80 with the highest correlation to perception of clarity or definition was 500 Hz to 4 kHz.
LLD, or limit of localization distance, where localization and proximity fail, can significantly exceed the critical distance (where the direct:reverberant sound ratio is 1), as in Boston Symphony Hall.
Griesinger argues that phase preservation of harmonic coherence (seemingly above 1 kHz) in the direct sound (up to 5 ms: “early reflections 5ms or less from the direct sound add to the direct sound loudness and increase the ability to localize,” a similar time frame to Lokki’s direct response and adjacent scattering) is important for pitch separation and localization. Lokki also notes potential negative effects of scrambling the phases of upper harmonics on definition/localization/clarity.
Griesinger 2018: “It is well known that a prompt early reflection can augment speech and musical instruments…Often in a room where localization and proximity is poor a seat in the very last row, up against the back wall, will sound much better…In practice I had found that your ears had to be within two and a half feet of the wall for the trick to work…The results showed that a reflection at 5ms which was 6dB less strong than the direct sound did augment the loudness and the localizability of a source without detrimental effects on timbre…”
Joachim Gerhard (previously of Audio Physic) had proposed listener placement within 1 m of the wall behind him/her, arguing that “From experience that this reflection is not so objectionable for phantom image perception” (see future listening room acoustics link)
Griesinger’s description of sound onset or attack is like @j_j's “leading edge” of envelope in psychoacoustics, as phase randomization (which damages localization and the perception of proximity) is like uncorrelated envelopes and edges (JJ notes that decorrelation of leading envelope edges contributes to distance perception—see future psychoacoustics link).

Proximity
Lokki 2012: The main overall preference driver in this study was an attribute cluster interpreted as Proximity (related to distance) [also depth and intimacy], which correlates highly with the average of all preference ratings.
Vigeant 2019: “Another interesting analysis, found in the second column of Table 2, is the correlation of average preference with each of the ten subjective attributes. The highest correlation was found with proximity (=0.81)” [close vs far]

Frequency range and dynamics
Lokki 2013: Figure 3 shows overall FR is highest around 150 Hz or so, lower but reasonably flat from a few hundred Hz to about 1 kHz, then a gradual decline in FR above ~1 kHz until about 6-7 kHz, above which it drops off much more steeply. The direct sound actually has relatively little response <~150 Hz, but the FR fills in <~450 Hz over time, especially <200 Hz
Lokki 2016: “When an orchestra plays in fortissimo the low frequencies below 200 Hz and high frequencies above 3kHz are substantially pronounced in comparison with piano passages”
Beranek 2016: “Pätynen et al.10 demonstrate that when an orchestra plays fortissimo the frequency spectrum changes..It was found that between 400 and 2000 Hz the spectrum for the (ff) sound is about 7 dB more intense than that for the (pp) sound and that between 2000 and 8000Hz the increase is more than 15dB (see upper curve in Fig. 3).”
Lokki 2019: “It should be emphasized that the important frequency range in the concert hall is from 20 Hz up to 12–15 kHz…the construction of a stage is important especially for double basses, which play notes having fundamental frequencies as low as 33 Hz.”
Lokki 2020: “The early reflections (between 5-30 ms) in the shoe-box hall strengthen the low frequencies below 200 Hz substantially, yet the middle frequencies up to 1 kHz remain at a relatively low level”
Given change in perceived frequency response over time, along with slower onset of sound in music, phase at lower frequencies may be less important to maintain relative to the rest of the frequency spectrum.

Floor and ceiling reflections
Lokki 2016: “The form of a human head strengthens these high frequencies for the sound coming from the side, thus showing the benefit of lateral reflections instead of reflections from the ceiling.”
Lokki 2019: “When median plane reflections are delayed in time due to the height of the room, the clarity of sound is improved, as our brains have more time to process the direct sound and the first lateral early reflections.”
Lokki 2020: “If such a reflection is coming from the median plane, i.e. from ceiling or reflectors above an orchestra, the sound quality might be reduced due to coloration, which is the same in both ears. Moreover, such ceiling reflection might increase the interaural correlation, which could increase the perceived distance of the source.”
No floor bounce per se in shoebox halls, but ceiling reflections may lower perception of proximity

Rear wall (behind the orchestra) reflections
Lokki 2019: “Our current understanding, supported by Kahle, is that the back wall should be absorptive. Such a wall treatment eases the balancing of instrument groups and increases the clarity for the audience.”
Griesinger 2020: “In many small halls there is a strong reflection from the back of the stage which reduces the number of seats with proximity. Adding absorption to the back and side walls can help a lot without much reduction of the reverb time.”
Rear wall reflections may reduce perception of proximity

Trdat · Jan 18, 2024

Griesinger's research is hard to interpret for an amateur, but there definitely is relevance for it in small room acoustics.

MaxwellsEq · Jan 18, 2024

Wow! That's a long read, but absolutely fascinating! Thank you @youngho

youngho · Jan 18, 2024

I added the quote from Griesinger 2018 regarding sitting close (<=2.5 feet, resulting in <=~5 ms delay in reflection) to the rear wall (behind the listener) for rooms with poor proximity and localization, since there's an interesting confluence with Joachim Gerhard's setup recommendation of positioning the listener close (0.1-0.4 m) to the wall behind (https://www.researchgate.net/public...ENT_FOR_OPTIMISED_PHANTOM_SOURCE_REPRODUCTION)
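The distance-to-delay figures in this post can be checked with a quick back-of-the-envelope calculation. This sketch assumes sound arriving from straight ahead, a flat wall directly behind the head, and a speed of sound of 343 m/s; the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
FEET_TO_M = 0.3048

def rear_wall_delay_ms(ear_to_wall_m):
    """Delay of the rear-wall reflection relative to the direct sound, in ms.

    The reflected sound travels past the ear to the wall and back again,
    so the extra path is twice the ear-to-wall distance (normal incidence).
    """
    return 2 * ear_to_wall_m / SPEED_OF_SOUND * 1000

delay_griesinger = rear_wall_delay_ms(2.5 * FEET_TO_M)  # 2.5 feet from the wall
delay_gerhard_far = rear_wall_delay_ms(0.4)             # 0.4 m from the wall
delay_gerhard_near = rear_wall_delay_ms(0.1)            # 0.1 m from the wall
print(f"{delay_griesinger:.1f} ms, {delay_gerhard_far:.1f} ms, {delay_gerhard_near:.1f} ms")
```

Under these assumptions 2.5 feet works out to about 4.4 ms, so both recommendations keep the rear-wall reflection inside the roughly 5 ms window discussed above.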

tmuikku · Jan 18, 2024

Oh yeah, thanks for posting it all!

While all these studies are for bigger rooms, the proximity stuff is in the auditory system which we carry in small rooms as well. We always carry it with us, to live venues, classrooms, anywhere, and it is relatively common to all of us regardless of rooms or loudspeakers. The important bit is that this perceptual effect that comes with neural stream separation seems to be a great part of sound perception in home stereo setups as well. At least there seems to be a similar, or the same, perceptual effect that has been audible in all the small rooms and speakers I've experimented with, and I've been trying to preach it on the forums for some time now. I'd speculate anyone can perceive it with their setups.

Everyone should be aware of this neural stream separation, or "audible critical distance", Griesinger's LLD, where perception changes quite dramatically at some particular distance from the speakers as the brain locks its attention onto the direct sound, clarity happens and the room sound is kind of suppressed from perception. Without being aware of this "transition", it would be impossible to come to a consensus in many, many threads, for example. There are threads where people are talking about same thing but not understanding each other because they do not know if this difference exists, and which side they are and which side others are, making the discussion just noise. Knowing it the discussion would turn into understanding most could relate to, utilizing the transition as common denominator.

Some examples of confusion: room acoustics advice says "treat the first reflections", but if the transition doesn't shift further into the room and behind the listening spot, there would be hardly any difference in sound, at least not as stark as getting a bit closer. What if I told you that you could have got better sound just by shrinking the listening triangle, without touching the acoustics? Or, hearing horn speakers and thinking "Oh they sound lovely!", not knowing that one is now listening at proximity, since the increased directivity could have moved the transition behind your listening spot, far enough into the room, and that your speakers in your room could provide similar sound if you get close enough to them. What if you swap playback electronics every year, spending thousands, never realizing that the sound you are looking for, an intimate, involving sound, is right in front of your nose? Your brain just needs to latch onto it, and your room reflections prevent that because you sit too far away considering the circumstances.

Well, those are a few simplified examples of how big a deal the transition is in my view. It's crazy that this stuff is not mentioned in most discussions or speaker reviews. I went to a local hi-fi expo and there were literally one or two rooms where any of the seats were closer than the transition, as if no one knew about this stuff. At least half should have had seats closer up, right?

Anyway, as in the quotes in the opening post, some people seem to like the "far sound" and some the "close sound" with proper localization and clarity, and that's fine. Looking at pictures of home hi-fi setups, it looks to me like most people have the far sound, and that's fine. The key would be to be aware that there are these two distinct sounds available (in most if not all situations, I'd speculate), and if people knew how to get to them reliably, a lot of confusion in discussion would be gone. Also, anyone could then rearrange their setup if they notice they like the close sound better. After all, it seems to be quite easy to hear the transition, at least in the few systems I've got access to and with the friends I've used as test subjects. I encourage everyone to try to find the distance from your speakers at which your brain locks its attention and where it does not, the point where clarity of the phantom center happens. Hear the difference?

Put your listening spot thereabouts and you can enjoy both sounds: just lean back or forward.

ps. the confusion is due to all the details, like which early reflection should be at which delay and at which angle, or which speaker directivity should work better in which situation. It is all nicely lumped into the one perceptual phenomenon, whether the brain locks in or not, and there is no necessity to know any of the details. It's easy just to listen for, and it takes a few seconds to move closer to and further from the speakers to find it. I mean, in my opinion all the details don't make any sense, and it is hard to understand what they actually sound like, until you know the transition, which kind of encompasses most of it and really is quite a dramatic change in perception. Once you are aware of the transition, you can use it to A/B listen to various details of your setup, even listen to single reflections in a way, or listen to how toe-in changes things: many, many tests using your brain as a switch to get two perspectives on the sound. All of this dramatically increases listening skill and helps to connect written concepts to your perception of your system in your room.

youngho · Jan 18, 2024

Trdat said:
Griesinger's research is hard to interpret for an amateur, but there definitely is relevance for it in small room acoustics.

tmuikku said:
While all these studies are for bigger rooms, the proximity stuff is in the auditory system which we carry in small rooms as well...

There are threads where people are talking about same thing but not understanding each other because they do not know if this difference exists, and which side they are and which side others are, making the discussion just noise. Knowing it the discussion would turn into understanding most could relate to, utilizing the transition as common denominator.

I found some interesting confluences/similarities/analogies (obviously, there are huge divergences as well, especially in terms of scale when it comes to time, level, etc.) between the research above and listening room acoustics. Indeed, in the second edition of Sound Reproduction, Toole had written "The concepts of ASW—image broadening, early spatial impression, spaciousness, and envelopment—evolved within the concert hall context. The challenge is to transfer, or translate, these into the context of small rooms fitted with multichannel audio systems." However, as you could see above, some would argue that the previous concepts of ASW were fundamentally flawed, so how could we potentially translate the newer concert hall research on perception and preference to small rooms?

Looking at Lokki's wheel of concert acoustics 2018, which I simplified a little, I can see how individual preference could be described along more than the two parameters of REW vs CD, then translated to stereo reproduction in small rooms with some interesting implications (I outlined my initial thoughts along these lines in https://www.audiosciencereview.com/...-of-lokki-bech-toole-et-al.27540/#post-950580). I'll also include more from Griesinger later on bass and subwoofers.

tmuikku · Jan 18, 2024

Oh yeah, there is a lot to it. See the old Toole paper called "LOUDSPEAKERS AND ROOMS FOR STEREOPHONIC SOUND REPRODUCTION", which is unfortunately behind the AES paywall (https://www.aes.org/e-lib/browse.cfm?elib=5430). The paper kind of describes the whole thing, and then leaves it as a mystery. I'm not too familiar with all the research, and perhaps Toole has added more to this later, but with my limited exposure to papers in general, the issue seems to have been left floating in the air, as almost no one is talking about this. From my perspective as a hobbyist mostly consuming forums, the industry never paid that much attention to it until the stuff you have quoted. We have near field speakers and hi-fi speakers, and a lot of confusion in discussion, in the forums at least. I could be wrong of course, but I'm bringing this all up here for everyone to react to, in order to find other people's expertise and opinions, as I haven't found them by searching.

It is well known that there is a "direct field" sound to speakers, and this is what is utilized in studios and referred to as "nearfield monitoring". Then there is "far field" listening, which typically is home hi-fi, which Toole's studies seem to concentrate mostly on, and for which studios have another set of speakers. But why make the distinction at all? Since some recordings sound better on either, people should know about this stuff in order to take advantage of it, right? In studios it makes sense to have two sets of speakers because the operator needs to stay put at the console, but at home the listener can and will move.

So, I think what is missing from the discussion is awareness of "nearfield" and "farfield" sound and why and how they differ, and what are the implications, and that it is quite easy to switch between at will. I speculate anyone can detect transition between the two at their homes and start reasoning about stuff and get better common understanding on written concepts such as envelopment, or DI, what sound they like, how they relate and how to position their own setup in their own room in order to have a chance to hear envelopment. The newer studies help with this, they work as a map so to speak, they write what you should perceive and what you should not on each side of transition, and that the transition happens due to our auditory system.

The auditory effect seems to be either/or perceptually, as Griesinger says: either there is stream separation or not. But it doesn't have to be an either/or situation for the person setting up a stereo system or listening to it, as Toole refers to in the other paper. Toole refers to a "moving target": some people like either sound, some recordings are made optimizing for either, starting from microphone technique and, I bet, monitoring as well, and there is likely no single solution that could give both sounds to all people at once, no universal speaker or room in this sense that could satisfy both groups simultaneously and for all recordings. And he is missing the point, in my opinion: there doesn't have to be one solution for all, because everyone can choose at will just by changing listening distance a bit: lean back/forward, move the chair, or take a step. Both can sound fabulous at any given time, and both are very likely available with most speakers and rooms. At least I speculate so.

I see implication to room acoustics and speaker system design that all one has to do is to optimize for both, and then just choose per record / mood / preference which mode to switch the auditory system for by moving a little. If one never likes the other side of transition, then by all means optimize for what one likes.

ps. I've attached few snippets from the paper, hopefully it's fine.

youngho · Jan 18, 2024

tmuikku said:
So, I think what is missing from the discussion is awareness of "nearfield" and "farfield" sound and why and how they differ, and what are the implications,

This was a good discussion: https://audiosciencereview.com/foru...-and-far-field-definitions.23841/#post-801349

Toole also discusses this--from the second edition of Sound Reproduction: "The sound level correspondingly falls rapidly, at a rate of −6 dB/dd (dd = double-distance). This happens only in the far field of the source. Beranek (1986) suggests that the far field begins at a distance of 3 to 10 times the largest dimension of the sound source. At this distance, the source is small compared to the distance, and a second criterion is normally satisfied: distance² = wavelength²/36...in a room, closely adjacent reflecting surfaces must be considered to be part of the source. This means that the far field for the combination (loudspeaker plus a very early reflection) can be very far away." He shows a figure of what appears to be a relatively small three-way loudspeaker, noting "Total system: enclosure edge diffraction, panel radiation, port “talk,” etc. Far field begins at 2.3 to 7.6 m"
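The far-field rules of thumb in the Toole quote reduce to simple arithmetic, sketched below. The function names and the 343 m/s speed of sound are assumptions, and the 0.76 m "largest dimension" is only an inference: it is the size that would reproduce the quoted 2.3 to 7.6 m range under the 3x to 10x rule, not a figure stated in the book excerpt.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def far_field_start_range(largest_dimension_m):
    """Beranek's rule as quoted: far field begins at 3 to 10 times the source size."""
    return 3 * largest_dimension_m, 10 * largest_dimension_m

def wavelength_criterion_distance(frequency_hz):
    """Distance satisfying distance^2 = wavelength^2 / 36, i.e. wavelength / 6."""
    return SPEED_OF_SOUND / frequency_hz / 6

def far_field_level_change_db(d1_m, d2_m):
    """Level change moving from d1 to d2 under the inverse-square law."""
    return -20 * math.log10(d2_m / d1_m)

near, far = far_field_start_range(0.76)  # 0.76 m is inferred, see note above
print(f"far field begins between {near:.1f} m and {far:.1f} m")
print(f"wavelength/6 at 100 Hz: {wavelength_criterion_distance(100):.2f} m")
print(f"doubling the distance: {far_field_level_change_db(1.0, 2.0):.1f} dB")
```

The last line recovers the quoted rate of about −6 dB per doubling of distance, which only holds once the listener is in the far field.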

There's also critical distance, where "the direct sound equals the level of the reverberation" and which can be increased by increasing source directivity or absorption, in addition to Griesinger's LLD, which is greater than critical distance in Boston Symphony Hall.
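The statement that critical distance grows with source directivity or absorption can be illustrated with the common textbook diffuse-field approximation Dc = sqrt(Q·A / (16π)), where Q is the directivity factor and A is the total absorption area. This formula is brought in here as an assumption (it is not taken from the posts above, and small rooms only roughly satisfy the diffuse-field idea). The room surface area and absorption values are hypothetical.

```python
import math

def critical_distance_m(directivity_q, surface_area_m2, mean_absorption):
    """Diffuse-field estimate of the distance where direct and reverberant levels match.

    Uses A = S * alpha (Sabine absorption area) and Dc = sqrt(Q * A / (16 * pi)).
    """
    absorption_area = surface_area_m2 * mean_absorption
    return math.sqrt(directivity_q * absorption_area / (16 * math.pi))

# Hypothetical 5 m x 4 m x 2.5 m room: total surface area is 85 m^2.
surface = 2 * (5 * 4 + 5 * 2.5 + 4 * 2.5)
baseline = critical_distance_m(1.0, surface, 0.15)
more_absorption = critical_distance_m(1.0, surface, 0.30)
more_directive = critical_distance_m(4.0, surface, 0.15)
print(f"{baseline:.2f} m, {more_absorption:.2f} m, {more_directive:.2f} m")
```

In this model, doubling the absorption lengthens the critical distance by a factor of about 1.4 and quadrupling Q doubles it. It says nothing about where Griesinger's LLD falls, which is a perceptual measure.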

tmuikku said:
it is quite easy to switch between at will. I speculate anyone can detect transition between the two at their homes and start reasoning about stuff and get better common understanding on written concepts such as envelopment, or DI, what sound they like, how they relate and how to position their own setup in their own room in order to have a chance to hear envelopment. The newer studies help with this, they work as a map so to speak, they write what you should perceive and what you should not on each side of transition, and that the transition happens due to our auditory system.

Yes, I'm trying to develop my understanding, as well.

tmuikku · Jan 18, 2024

youngho said:
This was a good discussion: https://audiosciencereview.com/foru...-and-far-field-definitions.23841/#post-801349

Toole also discusses this--from the second edition of Sound Reproduction: "The sound level correspondingly falls rapidly, at a rate of −6 dB/dd (dd = double-distance). This happens only in the far field of the source. Beranek (1986) suggests that the far field begins at a distance of 3 to 10 times the largest dimension of the sound source. At this distance, the source is small compared to the distance, and a second criterion is normally satisfied: distance² = wavelength²/36...in a room, closely adjacent reflecting surfaces must be considered to be part of the source. This means that the far field for the combination (loudspeaker plus a very early reflection) can be very far away." He shows a figure of what appears to be a relatively small three-way loudspeaker, noting "Total system: enclosure edge diffraction, panel radiation, port “talk,” etc. Far field begins at 2.3 to 7.6 m"

There's also critical distance, where "the direct sound equals the level of the reverberation" and which can be increased by increasing source directivity or absorption, in addition to Griesinger's LLD, which is greater than critical distance in Boston Symphony Hall.

Thanks for the link, I'll read the thread.

Yeah, there is a danger of mixing up terms between the technical near/far field of a loudspeaker and, in this context, perception. I think "nearfield studio monitoring" is a wrong term that is widely used, so I used the same familiar wording, in quotation marks. Critical distance is also a well-defined term, but I'm not sure it correlates with the LLD at home either, as it doesn't in the hall, as you mention.

What I meant by awareness of "nearfield" and "farfield" is specifically in respect to Griesinger's LLD: the stream separation, the shift in perception. The transition. I'm not sure what the right terms would be, do you know? I've just been using various terms quite haphazardly and should choose wording more carefully. Perceptually it's two states: one from the speaker to a transition distance, and the other beyond it. Near/far, direct/reverberant, closer/beyond all seem nice terms but come with a danger of misunderstanding.

youngho said:
Yes, I'm trying to develop my understanding, as well.

Have you tried to hear it with your system? I think there are many ways to find it, and I found it by experimenting with mono noise. Adjust your speakers so that left and right match very well, and feed the same mono noise signal to both channels for a strong phantom center image. Stand at a far distance, like the other side of the room, and start walking slowly towards the loudspeakers, staying equidistant from both. You could keep your eyes closed and concentrate on listening to the phantom center. Far away it's a big hazy blob of sound somewhere between the speakers, and at some distance from the speakers it seems to collapse into quite a small bunch: localization suddenly happens, as well as clarity. Not hearing it? Bring the speakers closer to each other, toe them in a bit and try again. For reference, I've got quite directional speakers in a domestic living room with no acoustic treatment besides normal furnishing, and the transition happens about 2.2 m from speaker to ear. I would speculate this could be closer with a lively room and wide-radiating speakers, and a bit further out with a higher-DI system and comfortable acoustics.

This can be a very sudden shift in perception, maybe one step like I have it here, and thus quite noticeable. I do not know what contributes to it, but whatever it is must be contained; it is most likely a combination of things, and likely happens because the auditory system is now capable of stream separation, as per the papers in your opening post. I've been able to extend the transition a bit further out into the room than it was initially with some acoustic panels and toe-in, so the transition could be a bit more gradual as well.

If you find it, mark it down and put some music on, move back and forth at will and experiment away

Duke · Jan 19, 2024

youngho said:
[quoting Griesinger] “Proximity – the perception that a source is acoustically close - is an important determinant of attention and recall.”
“The ear detects proximity through the phase coherence of upper harmonics in the direct sound, which are randomized by early reflections.”

[@youngho's notes on Lokki's writings] The main overall preference driver in this study was an attribute cluster interpreted as Proximity (related to distance) [also depth and intimacy], which correlates highly with the average of all preference ratings.

youngho said:
[quoting Griesinger] “Envelopment requires separation of a sound into two distinct streams: a foreground stream and a background stream. When a foreground stream is not perceived, there is only one stream, perceived as reverberant but not surrounding. Vienna’s Musikverein and Boston’s Symphony Hall set the world standard for envelopment, but in both halls reverberation comes from the front in distant seats.”

youngho said:
[quoting Griesinger] “Barron and Marshall’s classic paper on spatial impression used a single speaker to reproduce a whole orchestra. The orchestra recording had no proximity. Many later experiments repeated this arrangement. They all found that adding lateral reflections widened the image. They used the term “Apparent Source Width” or ASW to describe what they thought was an improvement to the sound. ASW has since been quoted as beneficial to hall acoustics in nearly every article and text book. ASW by definition eliminates sharp localization and decreases proximity.” [emphasis youngho's]

Arguably related two-channel speculation:

It has been my amateur sighted-listening observation as well that there is a tradeoff relationship between ASW on the one hand, and sharp image localization/proximity/envelopment on the other.

The reflection path lengths in home audio are typically too short to enable the perceptions of proximity and/or envelopment, but our typically short reflection path lengths ARE capable of creating the perception of enhanced "Apparent Source Width" from strong, early lateral reflections. And ASW is an enjoyable spatial quality, arguably a bonus beyond normal two-channel spatial quality as it can place sound sources laterally beyond the loudspeaker locations.

I speculate that the general preference for wide-pattern loudspeakers for stereo playback is, in part, because they can readily offer an enhanced soundstage width in normal playback rooms, whereas effectively creating the competing perceptual package of sharp image localization/proximity/envelopment is elusive at best.

To put it another way, imo the enjoyable, soundstage-expanding ASW increase enabled by strong early lateral reflections is relatively "low hanging fruit" in the context of two-channel home audio.

Duke · Jan 19, 2024

Trdat said:
Griesinger's research is hard to interpret for an amateur, but there definitely is relevance for it in small room acoustics.

The information about concert hall acoustics and psychoacoustics from Griesinger, Lokki, and others, along with @youngho's insightful summary notes, is imo a potential treasure-trove of guidance for home audio playback. But this information is pretty raw, not yet having been clearly interpreted into "what this is telling us to do in home audio".

Imo having a good foundation in "what to think" is arguably insufficient for extracting full utility from information arriving from outside of one's "box" (in this case, "home audio" is our "box"); imo a good foundation in "how to think" is called for.

tmuikku · Jan 19, 2024

Yeah, totally: concentrate on listening and think with it, use the transition, notice with the conscious mind what the unconscious part of the brain lets into existence, into your reality. This unlocks listening skill; your mind is on what you hear, and you'll hear the auditory system working.

And now you can establish a connection between logic and perception. I think hearing the transition provides a basis for understanding what you hear.

youngho · Jan 20, 2024

Shoot, I can't figure out what happened with the italicization partway through, and I can't edit the original post.

In any case, here's a follow-up post focussing on bass: https://www.audiosciencereview.com/forum/index.php?threads/bass-and-subwoofers.51589/#post-1857133

orresearch · Apr 30, 2024

Great thread and informative. Thanks.

Smaestro · May 7, 2024

This is an absolute treasure of a write up, I can't thank you enough! There's just so much in here.

Your notes also hit the nail on the head as far as potential commonalities with other research goes.

youngho said:
Griesinger 2019
http://www.davidgriesinger.com/Learning to Listen 14.pptx
“I believe acoustic research has overlooked three critical concepts:
1. Pitch: Why are humans able to distinguish pitch to an accuracy of six parts in 10,000?
(Because pitch (periodicity) enables us to separate pitched signals from noise.)
2. Phase: It is commonly thought that phase is inaudible above 1500Hz.
(False! The phases of the upper harmonics of tones are critical to proximity and source separation.)
3. Attention: Acoustic quality is measured with intelligibility. Attention is more important.
(Sounds that are proximate involuntarily attract attention.)”
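For scale, the "six parts in 10,000" figure in the quote can be converted into more familiar units. This is just arithmetic on the quoted number; the 440 Hz reference tone is my own arbitrary choice, not something taken from the slides.

```python
import math


def ratio_to_cents(ratio):
    """Convert a frequency ratio to cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


# "Six parts in 10,000" read as a relative frequency difference of 0.0006.
pitch_ratio = 1 + 6 / 10_000
cents = ratio_to_cents(pitch_ratio)          # roughly one cent
delta_hz_at_440 = 440.0 * (pitch_ratio - 1)  # roughly a quarter of a hertz at A4 = 440 Hz (assumed reference)

print(f"{cents:.2f} cents, {delta_hz_at_440:.3f} Hz at 440 Hz")
```

So the quoted accuracy corresponds to about one cent, or about a hundredth of a semitone.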

I have some questions. Maybe it's mentioned in the presentation, but the link to David Griesinger's Learning to Listen is down. Perhaps you or anybody else knows:

How far can the phase relation be stretched before it reaches the point of losing proximity and source separation?
Or more extreme, when does an instrument turn into noise, perceptually?
What other limits were perhaps found with regards to this "urban myth"?

It is "common knowledge" that smooth phase differences aren't a big deal, but I've seen no hard limit on how much is too much. Yet when you watch engineers work, they'll do anything to avoid EQing too much later, citing phase problems, so the possible problems are within reach of EQing.
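To make the phase question a bit more concrete, here is a small sketch showing that two signals with exactly the same harmonics and amplitudes (so the same magnitude spectrum and the same RMS level) can have very different waveforms depending only on the harmonic phases. It says nothing about where the audibility limit is. The 200 Hz fundamental, the choice of harmonics 10 to 20 and the function names are my own assumptions for illustration.

```python
import math
import random


def harmonic_complex(phases, f0=200.0, first_harmonic=10, sample_rate=48000, periods=4):
    """Sum of equal-amplitude cosines at consecutive harmonics of f0, one phase per harmonic."""
    n_samples = int(sample_rate * periods / f0)
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        signal.append(sum(
            math.cos(2 * math.pi * f0 * (first_harmonic + k) * t + phase)
            for k, phase in enumerate(phases)
        ))
    return signal


def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def crest_factor(signal):
    """Peak-to-RMS ratio: high when the harmonics line up into sharp pulses."""
    return max(abs(x) for x in signal) / rms(signal)


n_harmonics = 11  # harmonics 10..20 of 200 Hz, i.e. 2 kHz to 4 kHz
rng = random.Random(0)
aligned = harmonic_complex([0.0] * n_harmonics)
scrambled = harmonic_complex([rng.uniform(0, 2 * math.pi) for _ in range(n_harmonics)])

crest_aligned = crest_factor(aligned)      # peaky, pulse-like envelope once per period
crest_scrambled = crest_factor(scrambled)  # same RMS, flatter and more noise-like envelope
```

The aligned version has a strongly modulated envelope at the fundamental period, while the scrambled one is smeared out, even though a spectrum analyzer would show the same harmonic levels for both.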

Some notes of my own, going through this from mixing pov:

- Playing fortissimo adds lows and highs, but less of the mids. (I assume they used a balanced piece for this.) Not what I'd expect, but apparently the lower- and higher-pitched instruments have a larger dynamic range.

- A good rule of thumb to know: a large dynamic range for the higher registers is a range of 15 dB in a real concert room.

- The FR in a concert hall slopes gently down up to 6 kHz, and a steeper downward slope with a cutoff around 15 kHz is enough for natural-sounding music. Interestingly, some EDM tracks I've looked at follow this curve and cutoff exactly.

And some thinking out loud:
The large impact of the reflections on the separation and intelligibility is very different from the tonality focused in-room stereo research.
The large impact of reflection timing and loudness also makes a strong case for binaural recordings vs stereo listening.

A close second would be close-mic'ed orchestras with hall reverb added later, listened to on headphones. This could be a more accurate approximation of listening to an orchestra in real life than stereo speakers.

There is mention that if two sets of reflections clash (1. on the recording and 2. from the room), the stronger one wins in our perception. Well, most consumers' rooms are very reflective boxes, so the room wins in that case.

Perhaps the best consumer in-house experience for orchestral music is therefore listening to binaural recordings on headphones, while also having the music on speakers for the bodily sensations.

youngho · May 7, 2024

Smaestro said:
How far can the phase relation be stretched before it reaches the point of losing proximity and source separation?
Or more extreme, when does an instrument turn into noise, perceptually?
What other limits were perhaps found with regards to this "urban myth"?

I can't answer your questions directly, as I'm not qualified; I also suspect that the answers may depend to some degree on the signal type, frequency range, SNR, and perhaps listener characteristics. You may find this discussion, especially the contributions by @j_j, to be helpful: https://www.audiosciencereview.com/...se-distortion-shift-matter-in-audio-no.24026/. I also put together some background reading that you might enjoy: https://www.audiosciencereview.com/...acoustics-self-education-links-sharing.45583/

Smaestro said:
- Playing fortissimo adds lows and highs, but less of the mids. (I assume they used a balanced piece for this.) Not what I'd expect, but apparently the lower- and higher-pitched instruments have a larger dynamic range.

Alternatively, some instruments themselves may have different frequency response characteristics when played quietly vs loudly.

Smaestro said:
- The FR in a concert hall slopes gently down up to 6 kHz, and a steeper downward slope with a cutoff around 15 kHz is enough for natural-sounding music. Interestingly, some EDM tracks I've looked at follow this curve and cutoff exactly.

I had a somewhat different takeaway, which was relatively flat between a few hundred Hz and 1 kHz or so, then sloping gently until around 6-7 kHz, then more steeply beyond that. Perhaps interestingly, I noticed something similar about some Harman speakers' directivity curve: https://www.audiosciencereview.com/...s-genelec-monitors.44549/page-65#post-1637150

Smaestro said:
And some thinking out loud:
The large impact of the reflections on the separation and intelligibility is very different from the tonality focused in-room stereo research.
The large impact of reflection timing and loudness also makes a strong case for binaural recordings vs stereo listening.

As I understand it, Griesinger argues for individual headphone equalization at the eardrum, along with crosstalk cancellation.

Smaestro said:
A close second would be close-mic'ed orchestras with hall reverb added later, listened to on headphones. This could be a more accurate approximation of listening to an orchestra in real life than stereo speakers.

A significant problem here would likely have to do with preservation of phase with respect to the early/direct sound, as well as the reverberation being presented from the wrong direction(s) at the reproduction step. You might also enjoy reading https://www.audiosciencereview.com/...-all-just-get-along.35258/page-6#post-1230066

Young-Ho

youngho · May 7, 2024

youngho said:
I can't answer your questions directly, as I'm not qualified; I also suspect that the answers may depend to some degree on the signal type, frequency range, SNR, and perhaps listener characteristics. You may find this discussion, especially the contributions by @j_j, to be helpful: https://www.audiosciencereview.com/...se-distortion-shift-matter-in-audio-no.24026/. I also put together some background reading that you might enjoy: https://www.audiosciencereview.com/...acoustics-self-education-links-sharing.45583/

To clarify just a bit, here is one example of where listener characteristics may be relevant: https://www.audiosciencereview.com/...-immersion-networks.50589/page-5#post-1912938

youngho said:
Alternatively, some instruments themselves may have different frequency response characteristics when played quietly vs loudly.

E.g. the Lombard effect for the human voice, changes in timbre for instruments (see https://newt.phys.unsw.edu.au/jw/clarinetacoustics.html#pff and https://pubs.aip.org/asa/jasa/artic...Influence-of-pitch-loudness-and-timbre-on-the)

Smaestro said:
- Playing fortissimo adds lows and highs, but less of the mids. (I assume they used a balanced piece for this.) Not what I'd expect, but apparently the lower- and higher-pitched instruments have a larger dynamic range.

I forgot to mention that I was reminded of the so-called smiley face EQ, which perhaps enhances the subjective perception of LOUDNESS, similar to certain types of distortion (the opposite of the anecdotes about some listeners raising the volume of very-low-distortion systems to much higher SPLs than expected).

Smaestro · May 7, 2024

Thanks for all the links and info! I'm going to go through all of it.

I was indeed also reminded of the smiley face EQ / equal-loudness curves. It is interesting because it is the opposite of what I expected (expected: more mids to keep equal tonality). For my mixes, I'm going to experiment with making louder parts disproportionately louder in the low and high end, and see if that's more impactful than a broad volume increase.
