Concert hall acoustics
This is a follow-up post to https://www.audiosciencereview.com/...-of-lokki-bech-toole-et-al.27540/#post-950580, focussing on concert hall acoustics and expanding significantly on previously shared resources for further learning. I’ve included links, relevant quotes, and some of my notes. I believe there are some interesting areas of possible relevance to home stereo reproduction, especially of classical music.
Griesinger 2004
http://www.davidgriesinger.com/ICA_2004 imbedded.pptx
“Human hearing adapts to an acoustic environment over a period of 5 to 10 minutes”
“Frequencies above 1kHz are primarily responsible for perceptions of Timbre, Clarity, Intelligibility, Distance”
“Frequencies below 500Hz are primarily responsible for perceptions of Resonance, Envelopment, Warmth”
Dammerud 2009
https://www.akutek.info/Papers/JJD_Stage_acoustics_PhDthesis_96dpi.pdf
Stage Acoustics for Symphony Orchestras in Concert Halls
“Based on calculated C80 and G, early and late Strength, Ge [G0-80] and Gl [G80-∞], were also derived according to Equations 7.2 and 7.3. G was found by comparing measured levels to a reference microphone at 1 m distance from the source…The values of the listed acoustic measures were found as average (arithmetical) value within the three octave bands 500–2000 Hz.”
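A note on how early and late strength can follow from C80 and G. If C80 = 10·log10(E/L) and G = 10·log10(E + L), with E and L the early (0-80 ms) and late (80 ms onward) energies against the same reference, then Ge and Gl follow by simple algebra. The Python sketch below assumes exactly those definitions; I have not checked it term by term against Equations 7.2 and 7.3 in the thesis, the function names are my own, and the band values are made-up numbers for illustration only. Whether the thesis averages before or after the split is also an assumption here.

```python
import math

def early_late_strength(g_db, c80_db):
    """Split total strength G (dB) into early (0-80 ms) and late (80 ms onward) parts.

    Uses only the definitions C80 = 10*log10(E/L) and G = 10*log10(E + L),
    where E and L are early and late energy relative to the same reference.
    """
    ratio = 10 ** (c80_db / 10)                      # E / L
    g_early = g_db - 10 * math.log10(1 + 1 / ratio)  # 10*log10(E)
    g_late = g_db - 10 * math.log10(1 + ratio)       # 10*log10(L)
    return g_early, g_late

def band_average(values_db):
    """Arithmetic average of per-band dB values (e.g. the 500, 1000, 2000 Hz octaves)."""
    return sum(values_db) / len(values_db)

# Hypothetical per-octave-band values (500, 1000, 2000 Hz); not measured data.
g_bands = [3.0, 2.5, 2.0]
c80_bands = [0.0, 1.0, -1.0]

pairs = [early_late_strength(g, c) for g, c in zip(g_bands, c80_bands)]
ge_avg = band_average([p[0] for p in pairs])
gl_avg = band_average([p[1] for p in pairs])
print(f"Ge = {ge_avg:.2f} dB, Gl = {gl_avg:.2f} dB")
```

A quick sanity check is that adding the early and late parts back together on an energy basis returns the original G.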
Lokki 2012
https://users.aalto.fi/~ktlokki/Publs/JASA_lokki2012.pdf
“Potential assessors were openly invited with an article published in a national magazine of classical music. In addition, invitations were sent to student orchestra mailing lists, as well as to students of musicology and music. Finally, 23 candidates (13 males), each of them with a musical background and between the ages of 19 and 75 years (average age of 35), participated in the listening tests.”
“The results show that the main discriminative attributes between halls are loudness, envelopment, and reverberance. The second large cluster of attributes consists of bassiness and proximity attributes. The third main perceptual dimension has definition and clarity attributes. The preference judgments were divided into two groups of assessors, the first preferring concert halls with loud, enveloping and reverberant sound. The second group preferred concert halls that render intimate and close sound with high definition and clear sound”
“The best correlation with average preference ratings of all assessors was found to be with subjective proximity…none of the standardized objective room acoustical parameters could explain the proximity and preference data.”
Concert hall attribute clusters
Lee 2013
https://pure.hud.ac.uk/en/publicati...h-and-listener-envelopment-in-relation-to-sou
The distance dependent ASW was best predicted using objective measures called GE (early sound strength) while the LEV results were highly correlated with GL (late sound strength) and B/F ratio (Back/Front energy ratio of late sound). Such conventional measures as [1-IACCE], [1-IACCL] and LF did not agree with the perceived results.
Griesinger 2013
http://www.davidgriesinger.com/ICA2013/What is Clarity5.pptx
“Sound is detected in the inner ear with a continuous 1/3 octave filter. Speech information is encoded in the relative strength of critical bands in the frequency range of 800 to 4000Hz, with some consonants at higher frequencies.”
“Standard hearing models predict a pitch acuity limited by the 1/3 octave bandwidth of the basilar filters. But musicians and listeners hear pitch to an accuracy of one part in one thousand! Why? Licklider proposed that our acuity of hearing could be explained by an autocorrelator located as close as possible to the hair cells, explaining our sense of pitch, and the rules of harmony.”
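To make the autocorrelation idea concrete, here is a minimal Python sketch that estimates the pitch of a harmonic tone from the strongest autocorrelation peak. It only illustrates the signal-processing principle; it is not a model of the ear or of Licklider's proposal, and the function name, search range and test tone are my own assumptions.

```python
import numpy as np

def autocorr_pitch(signal, fs, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency from the strongest autocorrelation peak."""
    x = signal - np.mean(signal)
    # Full autocorrelation; keep non-negative lags only.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / fmax)   # shortest period searched
    lag_max = int(fs / fmin)   # longest period searched
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / best_lag

# Hypothetical test tone: 200 Hz fundamental with four weaker harmonics.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
f0 = 200.0
tone = sum(np.cos(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

estimate = autocorr_pitch(tone, fs)
print(f"estimated pitch: {estimate:.1f} Hz")
```

The resolution of this simple version is limited to whole-sample lags, so it is far coarser than the pitch acuity described in the quote.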
“Envelopment requires separation of a sound into two distinct streams: a foreground stream and a background stream. When a foreground stream is not perceived, there is only one stream, perceived as reverberant but not surrounding. Vienna’s Musikverein and Boston’s Symphony Hall set the world standard for envelopment, but in both halls reverberation comes from the front in distant seats.”
“The perception of Clarity fails three ways: 1. Solution: limit early reflections. 2. Control the reverberation time and level. 3. Don’t design for maximum RT at low frequencies.”
“Harmonics of pitched tones increase the signal to noise ratio by 12dB or more, and allow source separation. But these advantages depend on the phase alignment of the harmonics, and these phases are altered by acoustics. When phases are preserved at the onsets of sounds we get CLARITY – otherwise we get MUD.”
Lokki 2013
http://dx.doi.org/10.1121/1.4800481
Concert hall perceptual factors
Lokki 2013
https://users.aalto.fi/~ktlokki/Publs/patynen_2013_JAS000842.pdf
Figure 3 shows a time-frequency representation, i.e. how the frequency response develops over successive time windows. What I note is that the overall FR is highest around 150 Hz or so, lower but reasonably flat from a few hundred Hz to about 1 kHz, then declines gradually above ~1 kHz until about 6-7 kHz, above which it drops off much more steeply. The direct sound actually has relatively little response below ~150 Hz, but the FR fills in below ~450 Hz over time, especially below 200 Hz. Concert halls like https://www.nagata-i.com/portfolio/suntory-hall/ and https://www.nagata-i.com/portfolio/sapporo-concert-hall-kitara/ (thanks, REG) show decreased reverberation times at these higher frequencies.
Lokki 2014
https://physicstoday.scitation.org/doi/10.1063/PT.3.2242
“Sensory evaluation methods, borrowed from the food and wine industry, are useful for studying concert-hall acoustics because they can extract information often hidden behind preference judgments. With such methods, in particular those based on individually elicited attributes, one can develop sensory profiles of concert halls or of seats inside one concert hall. Preference judgments might give an overall average picture, but the variance in the data is typically large due to the assessors’ personal tastes and previous experiences. Sensory evaluation methods provide a link between those subjective preferences and perceptual characteristics.”
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/JASMAN_vol_140_iss_1_551_1.pdf
“Twenty eight assessors (14 males, 14 females; ages between 22 and 64 yr with average age of 39.6) were recruited for this study. We gathered the participants with a web-based questionnaire and we particularly looked for people who often go to live concerts. The selected assessors were 10 professional musicians (no specific genre), 10 active amateur musicians, and eight active concert goers with varying musical background. Thus, they could all be considered as experts in listening to music, even though not expert assessors nor listening test participants.”
“The main results show that listeners can be categorized into two different preference classes.”
Majority (16/28 or 57%): “Some listeners prefer clarity over reverberance…clarity and definition classes have positive correlations with mid frequency C80 and significant negative correlations with mid frequency EDT and LJ”
Minority (12/28 or 43%): “others love strong, reverberant and wide sound…The first attribute classes (reverberance/width/loudness) are well explained with LJ [late lateral energy] at all bands and G [strength] at mid frequencies.”
“The halls were all measured unoccupied resulting in more reverberant conditions than in situ at occupied conditions, which is often the reality in these renowned halls. Although the assessors described the sound samples being very natural and realistic, it is clear that unoccupied conditions might introduce a bias to the presented results, as the difference between occupied and unoccupied conditions varies between halls…it might be that the mental reference for assessors is classical music recordings, which always have high clarity and less reverberation than in reality in situ, or in our auralizations.”
2 broad preference classes: clarity vs reverberance/width/loudness
3 latent attribute classes: RWL, timbre, clarity/definition
Clarity/definition correlates negatively with EDT (early decay time)
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/ICA2016-0465.pdf
“The form of a human head strengthens these high frequencies for the sound coming from the side, thus showing the benefit of lateral reflections instead of reflections from the ceiling.”
“When an orchestra plays in fortissimo the low frequencies below 200 Hz and high frequencies above 3kHz are substantially pronounced in comparison with piano passages”
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/p43.pdf
“A few people, e.g., Kahle (2013) has suggested that the auditory perception of a symphony orchestra playing in a concert hall can be understood with respect to two main percepts: the source presence and the room presence. The source presence is the continuous perception of the sound sources in the hall while the room presence is the perception of the space the music is listened to. These two are separate entities in the perceptual domain. If a hall can create these two "auditory streams", i.e., they are distinct and separate, then it is proposed this may permit both good clarity and plentiful, enveloping reverberation at the same time. The formation of the auditory streams is possible through stream segregation (Griesinger, 1997) and is subject to the perceptual grouping laws therein (Moore, 2012). The early reflections are perceptually grouped with the source streams through the precedence effect (Litovsky, Colburn, Yost, & Guzman, 1999), and affect the width, loudness, and timbre of the auditory events (Blauert, 1997). In this way, the direct sound of the orchestra and the early reflections of the hall combine to make up the source presence. The late reflections, i.e. reverberation, form the context and space for the music, and lend the music support, embellishment, and a sense of depth, providing the listener with a sense of envelopment; that is, room presence. At the moment, there is no clear consensus how these two streams are formed, or do we even need them. Naturally, more research is needed, including the spatial aspects of early and late reflections (Lock et al. 2015).”
Beranek 2016
https://pubs.aip.org/asa/jasa/article/139/4/1548/662531/Concert-hall-acoustics-Recent-findingsa
“It is interesting that listeners in Boston Symphony Hall can also be divided into two groups, (and possibly three counting those in between) namely, those who like the sound best in the front two-thirds of the main floor and those who prefer the sound in the upper rear second balcony. The difference is readily apparent to anyone by listening to the first half of an orchestral concert on the main floor and the second half in the rear second balcony. On the floor the sound is clear and loud with full bass, with many early reflections, none of which mask the direct sound, and with reverberation that is beautiful. In the rear second balcony the reverberant sound almost immediately follows the arrival of the direct sound and it is loud and completely enveloping. Those who have subscription seats in the upper balcony praise the sound. The author of this paper, preferring clarity to the sound, identifies with the main-floor group.”
“Source Presence is that sound which reaches the listener before the reverberation becomes appreciable. It usually includes the direct sound and early reflections up to about 100 ms after arrival of the direct sound. Room Presence deals with the reverberant sound field that follows…[Haapaniemi and Lokki] found that each of the 8 well-known halls could be identified better by source presence than by room presence…”
“Pätynen et al.10 demonstrate that when an orchestra plays fortissimo the frequency spectrum changes…It was found that between 400 and 2000 Hz the spectrum for the (ff) sound is about 7 dB more intense than that for the (pp) sound and that between 2000 and 8000Hz the increase is more than 15dB (see upper curve in Fig. 3).”
“Because of the size of the head, when the sound arrives from a lateral direction, sideways, the intensity at the closer ear of a listener, in the 2000 to 8000 Hz region, is 1 to 5 dB greater than that when the sound arrives only from the front. With reflections from both sides of a shoebox hall this difference will occur at both ears. Hence, in a shoebox hall, the difference in the intensities between (pp) and (ff) that arrives at the ears between 2000 and 8000 Hz is 8 to 12 dB greater [7 dB + (1 to 5 dB)]”
Lokki 2017
https://users.aalto.fi/~ktlokki/Publs/kuusinen_aaua_2017.pdf
Wheel of concert hall acoustics
Griesinger 2018
https://www.akutek.info/Mitt Bibliotek/IOA Auditorium Acoustics Hamburg 2018/Additional/papers/p35.pdf
“We have developed a method intended to predict whether a source will be localizable or not from a binaural impulse response. We called it LOC, which is short for Localization. LOC was developed from a data set obtained using male speech with pauses between each word. We modeled a room with two loudspeakers placed at +-20 degrees in front of the listener in a space with two different reverberation times and four different pre-delays.”
“Frequencies from 1000Hz to 4000Hz are principally responsible for the data we obtained. We developed a formula for LOC based on the idea that when the integrated loudness of the direct sound in that frequency range is stronger than the integrated loudness of the reflections in the first 100ms of the onset of a speech syllable, then the whole syllable can be localized and perceived as close to the listener. A very important aspect of the formula is that the “loudness” of the reflections is proportional to the integrated logarithm of the sound and not the integrated sound energy or pressure.”
“LOC has been tested in several spaces, mostly large enough to hold 300 to 1500 seats. We typically used a loudspeaker array similar to Tapio Lokki’s on stage playing his anechoic recordings of an electronic string quartet with a soprano voice playing a Mozart aria. We recorded impulse responses with my personal dummy head and a three dimensional microphone from each speaker array. We also listened to music playing through the array to find the LLD in various parts of the venues. Values of LOC above 3dB typically predicted good localization of the ensemble. Values below 2dB predicted poor localization. In most of these tests the reflections were more or less equal on both sides.”
“We also tested LOC with a data set from Boston Symphony Hall. All but one of the seats had LOC values of +3dB or more, and sounded good both on headphones and in person with a live orchestra. One seat had a strong reflection from the right side wall. LOC at that seat was below 2dB in the right ear and above 5dB in the left ear. The instruments were not localizable and the sound in that seat was muddy. Deleting the strong sidewall reflection from the impulse responses raised LOC in the right ear to +5dB. The sound improved greatly. We learned that to determine the ability to localize instruments in a particular seat we need to look at the minimum value of LOC from the two ears, not the maximum or average value.”
“In Boston Symphony Hall the critical distance, where the D/R is one, is only about 17 feet from an omnidirectional source. On the floor the best seats can be forty to fifty feet from the stage, and the great seats in the front of the first balcony are more than 110 feet from the stage.”
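For a rough feel of where a critical distance of that order comes from, the usual diffuse-field approximation is d_c ≈ 0.057·sqrt(Q·V/RT60), in metres with V in cubic metres. The Python sketch below plugs in a hall volume and reverberation time that are my own assumed, approximate values for a large shoebox hall; they are not taken from the quoted paper, so treat the result as an order-of-magnitude check only.

```python
import math

def critical_distance_m(volume_m3, rt60_s, directivity_q=1.0):
    """Approximate critical distance (direct level equals reverberant level), in metres.

    Standard diffuse-field approximation: d_c ~ 0.057 * sqrt(Q * V / RT60).
    """
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)

# Assumed illustrative values, not taken from the quoted paper:
volume_m3 = 18750.0   # approximate hall volume
rt60_s = 1.9          # approximate occupied mid-frequency reverberation time

d_c_ft = critical_distance_m(volume_m3, rt60_s) / 0.3048   # omnidirectional source, Q = 1
print(f"critical distance ~ {d_c_ft:.1f} ft")
```

With these assumed inputs the estimate lands in the high teens of feet, the same ballpark as the figure in the quote, and far short of the seat distances mentioned.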
“Violins start gradually. LOC was devised assuming the sounds we are trying to localize abruptly rise to full level. Violins can start notes abruptly, but usually they do not. Sound onsets can take 50 milliseconds or more to reach full level. By the time the note reaches full level reflections in a small room have substantial energy.”
“The direct to reverberant ratio needed to localize the violins is uniformly greater than for male speech. The data suggests that the major reason is the slower onset of each note. Reverberation with the one second RT builds up rapidly, and quickly masks the slow onset of the direct sound from the violin. A higher D/R is needed to overcome the masking.”
“With the two second reverberation and short values of pre-delay the D/R needed for localization for the violins is nearly the same as it is for the one second reverberation, but as pre-delay increases violins become easier to localize. The slope of the improvement in localizability is steeper than for male speech. In the theory behind the development of LOC the slope of the improvement with predelay is governed by the length of the comb filter or auto correlator that separates the direct sound from later reverberation. The data for violins suggests that the length of this filter is shorter for harmonics of violins than it is for male speech. This result was expected. High frequency fundamentals do not need as long a filter to obtain the same accuracy as lower frequency fundamentals.”
“It is well known that a prompt early reflection can augment speech and musical instruments. Churches have put a wall and a ceiling behind and over pulpits for centuries. Often in a room where localization and proximity is poor a seat in the very last row, up against the back wall, will sound much better. But it has to be the very last row. The next to last row is no better than the others. In practice I had found that your ears had to be within two and a half feet of the wall for the trick to work. We did some experiments using our virtual room and male speech. The results showed that a reflection at 5ms which was 6dB less strong than the direct sound did augment the loudness and the localizability of a source without detrimental effects on timbre. The current version of LOC appeared to work for this case. A reflection 6ms after the direct sound did not affect the localizability of the sound, and was beginning to alter the timbre. A reflection at 7ms reduced the ability to localize, and added an unpleasant timbre. We tried the same experiment with female speech and got the same result. We do not know why this effect occurs, but it is the same for both male and female speech.”
“As a result of these experiments we added a 2ms cross fade centered at 6ms to the calculation for LOC.”
Griesinger 2018
http://www.davidgriesinger.com/Localization, Loudness and Proximity 8.pptx
“The ability to localize a sound depends on detecting the direct sound, which alone carries the ITD and ILD…enables localization and proximity. If a sound is perceived as close our attention is involuntarily drawn to it.”
“Aarabi et al…finds that there is a distinct amount of phase randomization that causes separation of sounds to abruptly fail.”
“Successfully separated sources form multiple “foreground” streams…sounds that cannot be localized or separated form a single stream with special properties; the “background” stream. The background stream is perceived as distinctly separate from the foreground streams, and is often perceived as fully surrounding. When there are no localized foreground streams all the sound we hear is perceived as a single stream.”
“If we want to know if the ear distinctly hears the direct sound we need to plot the integrated LOUDNESS of direct sound versus the integrated LOUDNESS of the reflections. I say loudness because the ear responds to the logarithm of sound pressure, not the sound power. Our measure for localizing the direct sound, LOC, separately plots the loudness of frequencies above 1000Hz from the direct sound and the build up of the loudness of the reflections inside a 100ms window. The loudness – the integrated neural activity – of the direct sound is given by the area under the blue line inside the black window. Similarly, the loudness of the reflections is given by the area under the red curve inside the black window. LOC is the ratio of the direct sound area to the reflection area in decibels.”
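To help me picture the "area under the curves" description, here is a simplified Python sketch of a LOC-like number. It is loosely modelled on the quoted description only and is not Griesinger's published formula or code: the 5 ms direct-sound window, the floor placed 20 dB below the total reflected energy, the missing band-limiting step and calibration constant, and all function names are my own assumptions. The synthetic impulse response is likewise invented for illustration.

```python
import numpy as np

def loc_like_db(ir, fs, direct_ms=5.0, window_ms=100.0, floor_db=20.0):
    """Simplified, LOC-inspired direct-vs-reflections measure in dB.

    `ir` is assumed to start at the direct sound and to be already
    band-limited to the frequency range of interest.
    """
    energy = np.asarray(ir, dtype=float) ** 2
    n_direct = int(round(direct_ms * 1e-3 * fs))
    n_window = int(round(window_ms * 1e-3 * fs))

    direct_energy = energy[:n_direct].sum()
    reflections = energy[n_direct:]
    total_reflected = reflections.sum()

    # Assumed normalization: the floor sits `floor_db` below the total reflected energy.
    offset = floor_db - 10 * np.log10(total_reflected)
    direct_level = 10 * np.log10(direct_energy) + offset

    # Build-up of reflected energy inside the window, in dB above the floor.
    build_up = np.cumsum(reflections[: n_window - n_direct])
    build_up_db = offset + 10 * np.log10(np.maximum(build_up, 1e-30))
    reflection_level = np.maximum(build_up_db, 0.0).mean()

    return direct_level - reflection_level

def synthetic_ir(fs, reflection_gain, rt60_s=1.0, seed=0):
    """Unit direct impulse followed by exponentially decaying noise from 10 ms on."""
    rng = np.random.default_rng(seed)
    n = int(1.5 * fs)
    t = np.arange(n) / fs
    tail = rng.standard_normal(n) * 10 ** (-3 * t / rt60_s)  # 60 dB down at rt60
    tail[: int(0.010 * fs)] = 0.0
    ir = reflection_gain * tail
    ir[0] = 1.0
    return ir

fs = 16000
loc_weak = loc_like_db(synthetic_ir(fs, 0.02), fs)    # weak reflections
loc_strong = loc_like_db(synthetic_ir(fs, 0.2), fs)   # reflections 20 dB stronger
print(f"weak: {loc_weak:.1f} dB, strong: {loc_strong:.1f} dB")
```

In this simplified version, raising the reflection level by 20 dB lowers the result by 20 dB, because the shape of the build-up curve relative to its own total does not change; only the timing of the reflections changes that shape.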
“The experiments showed that early reflections 5ms or less from the direct sound add to the direct sound loudness and increase the ability to localize. Reflections at 6ms do not increase localization and begin to alter timbre. Reflections at 7ms and later add to the reverberant loudness, and are detrimental both to timbre and localization.”
“We have found that the ability to sharply localize sources in halls and rooms depends on many factors besides the timing and strength of the early reflections. The speed of the attack of each new sound matters a lot…the reverberation time matters, as does the score and tempo of the music. Short notes excite reverberation very little, and tend not to be masked by previous notes. But strong held notes build up the reverberation, and can sometimes mask everything.”
Griesinger 2019
http://www.davidgriesinger.com/Learning to Listen 14.pptx
“I believe acoustic research has overlooked three critical concepts:
1. Pitch: Why are humans able to distinguish pitch to an accuracy of six parts in 10,000?
(Because pitch (periodicity) enables us to separate pitched signals from noise.)
2. Phase: It is commonly thought that phase is inaudible above 1500Hz.
(False! The phases of the upper harmonics of tones are critical to proximity and source separation.)
3. Attention: Acoustic quality is measured with intelligibility. Attention is more important.
(Sounds that are proximate involuntarily attract attention.)”
“To have a spacious bass you need to have low correlation between the left and right channels at low frequencies. I had a tiny oscilloscope with a one inch CRT displaying the left/right phase. It was easy to see if the sound was too monaural. But loudspeakers must be carefully placed if spatial bass is to be heard. The arrangement along the long wall shown in the previous slide can often work in small spaces.”
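A software stand-in for that oscilloscope check could be to low-pass both channels and look at their correlation coefficient. The Python sketch below is a minimal version under my own assumptions: the 200 Hz cutoff, the brick-wall FFT low-pass and the function name are illustrative choices, and the "recordings" are just synthetic noise.

```python
import numpy as np

def low_frequency_correlation(left, right, fs, cutoff_hz=200.0):
    """Correlation coefficient between channels after a brick-wall FFT low-pass."""
    def lowpass(x):
        spectrum = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        spectrum[freqs > cutoff_hz] = 0.0          # discard everything above the cutoff
        return np.fft.irfft(spectrum, n=len(x))
    l_low, r_low = lowpass(left), lowpass(right)
    return float(np.corrcoef(l_low, r_low)[0, 1])

# Synthetic stand-ins for a stereo recording (4 seconds of noise per channel).
fs = 8000
rng = np.random.default_rng(1)
mono = rng.standard_normal(fs * 4)
other = rng.standard_normal(fs * 4)

corr_mono = low_frequency_correlation(mono, mono, fs)            # identical channels
corr_decorrelated = low_frequency_correlation(mono, other, fs)   # independent channels
print(f"mono bass: {corr_mono:.2f}, decorrelated bass: {corr_decorrelated:.2f}")
```

A value near 1 corresponds to the "too monaural" case in the quote, while values near 0 indicate decorrelated bass.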
“The alignment of phases in the upper harmonics of tones is vital to source separation and localization. Harmonic tones are created by pulses of air, the release of a rosined string, or strikes of a hammer. The amplitude spikes created by these pulses cut through noise and reverberation…early reflections randomize the phase of these pulses. The consequences are dramatic. ”
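The "amplitude spikes" point can be illustrated numerically: two tones with identical harmonic amplitudes but different phases have the same magnitude spectrum, yet very different peak-to-RMS ratios. The Python sketch below compares the crest factor of a phase-aligned harmonic tone with a phase-scrambled one. The 200 Hz fundamental, the 20 equal-amplitude harmonics and the random phases are my own illustrative assumptions, and scrambling phases at random is only a crude stand-in for what reflections do.

```python
import numpy as np

def crest_factor(x):
    """Peak-to-RMS ratio of a signal."""
    return float(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

fs = 48000
f0 = 200.0
t = np.arange(int(0.05 * fs)) / fs          # 50 ms, a whole number of periods
harmonics = np.arange(1, 21)                # 20 harmonics, up to 4 kHz

rng = np.random.default_rng(0)
random_phases = rng.uniform(0, 2 * np.pi, size=harmonics.size)

# Same harmonic amplitudes in both signals; only the phases differ.
aligned = sum(np.cos(2 * np.pi * k * f0 * t) for k in harmonics)
scrambled = sum(np.cos(2 * np.pi * k * f0 * t + p)
                for k, p in zip(harmonics, random_phases))

crest_aligned = crest_factor(aligned)
crest_scrambled = crest_factor(scrambled)
print(f"aligned: {crest_aligned:.2f}, scrambled: {crest_scrambled:.2f}")
```

The aligned tone produces one sharp pulse per period, while the scrambled tone spreads the same energy more evenly over the period.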
Hochgraf 2019
https://acousticstoday.org/the-art-...ons-in-research-and-design-kelsey-a-hochgraf/
The Art of Concert Hall Acoustics: Current Trends and Questions in Research and Design
“Although there is not yet a consensus around specific attributes most correlated with audience listener preference, there is agreement that different people prioritize different elements of the acoustical experience. Several studies have shown that listeners can be categorized into at least two preference groups: one that prefers louder, more reverberant and enveloping acoustics and another that prefers a more intimate and clearer sound”
Auditory stream segregation: source and room response
“There is growing consensus among acousticians that although many of these [numerical parameters standardized by ISO 3382-1:2009] are useful, they do not provide a complete representation of concert hall acoustics...the limitations are largely attributable to differences between an omnidirectional sound source and an orchestra and between omnidirectional microphones and the human hearing system.”
“The importance of lateral reflections for spatial impression is well-understood…but more recent research has shown that these reflections are also critical to the perception of dynamic responsiveness…increased perception of dynamic range has also been shown to correlate with increased emotional response...the perception threshold for lateral reflections decreases with increasing sound level, meaning that more lateral reflections will be perceived by the listener as the music crescendos, further heightening the sense of dynamic responsiveness.”
“[T]he tide seems to be shifting away from high surface diffusivity and there is more evidence to substantiate the need for strong lateral reflections,”
Lokki 2019
https://doi.org/10.3390/acoustics1020025
Concert halls and architectural features
“When median plane reflections are delayed in time due to the height of the room, the clarity of sound is improved, as our brains have more time to process the direct sound and the first lateral early reflections. In fact, sometimes we almost perceive two “sound streams”, the early sound and reverberation separately, and Kahle [35] calls these streams as the “source presence” and the “room presence”, respectively.”
“It should be emphasized that the important frequency range in the concert hall is from 20 Hz up to 12–15 kHz…the construction of a stage is important especially for double basses, which play notes having fundamental frequencies as low as 33 Hz.”
“Our current understanding, supported by Kahle, is that the back wall should be absorptive. Such a wall treatment eases the balancing of instrument groups and increases the clarity for the audience.”
Vigeant 2019
https://sites.psu.edu/spral/files/2...Hall-Preference-ISRA-Amsterdam-2019-FINAL.pdf
“Sixteen subjects participated in the experiment, all meeting minimum hearing thresholds of 15 dB-HL in the octave bands from 250–8000 Hz. Subjects were required to have at least five years of formal musical training and were required to be actively studying their instrument or involved in a musical ensemble. The subject pool included 11 males and 5 females with an average age of 24 years. The average musical experience across all subjects was 14 years.”
“Much of the clarity-related impressions, including temporal clarity, spatial clarity, intimacy, and proximity were found to show high correlation.” (0.66-0.74)
“Strength showed strong correlations with the spatial perceptions of envelopment (=0.77) and source width (=0.87).”
“Another interesting analysis, found in the second column of Table 2, is the correlation of average preference with each of the ten subjective attributes. The highest correlation was found with proximity (=0.81)”
“A set of four orthogonal factors were shown to explain 72% of the total variance in perception, which were interpreted as clarity, strength / envelopment, strength / source width, and brilliance.”
Griesinger 2019
http://www.davidgriesinger.com/Learning to Listen 14.pptx
“Pitch (periodicity) enables us to separate pitched signals from noise…the phases of the upper harmonics of tones are critical to proximity and source separation…sounds that are proximate involuntarily attract attention.”
“To have a spacious bass you need to have low correlation between the left and right channels at low frequencies…the arrangement along the long wall shown in the previous slide can often work in small spaces.”
“Comb-like filters tuned to the fundamental period of an amplitude waveform can separate the formants of a particular speaker or instrument from other signals and from noise…The alignment of phases in the upper harmonics of tones is vital to source separation and localization…Recent papers from the field of speech comprehension have come to the same conclusions about the importance of the amplitude waveform of sounds with distinct pitch. They call the process “Source separation by periodicity.””
“For example: Manfred Schroeder began to study concert halls with binaural technology in 1974. He used a dummy head microphone for recording, and crosstalk cancellation for playback. With some modifications these techniques can work well. But Schroeder made a serious error, and it was contagious. He reproduced a whole orchestra with just two speakers. Mixing an orchestra into just two channels eliminates the phase information that gives rise to the perception of proximity. Edison, Dick Campbell, Kimio Hamasaki, Tapio Lokki and the author all find that to make a believable orchestral sound you must use a separate loudspeaker for each instrument! When you hear an orchestral recording from just two speakers in a hall it sounds artificial. But if you listen beyond the LLD the sound blends together, and is a bit more believable. This is the sound of a poor seat, not a good one.”
“Barron and Marshall’s classic paper on spatial impression used a single speaker to reproduce a whole orchestra. There was no phase coherence on any instrument. They found adding lateral reflections widened the image. They used the term “Apparent Source Width” (ASW) to describe what they thought was an improvement to the sound. ASW has since been quoted as beneficial to hall acoustics in nearly every article and text book. But ASW reduces localization, eliminates proximity, and decreases envelopment. It is the sound in a poor seat, not in a good one.”
For some reason, the last three sentences remind me of @Karl-Heinz Fink's post: https://www.audiosciencereview.com/...n-without-subwoofer.48559/page-9#post-1743584
Griesinger 2020 (and 2018)
http://www.davidgriesinger.com/The_Physics_of_auditory_proximity.pptx
https://www.aes.org/tmpFiles/elib/20240113/18463.pdf
“Aarabi showed that phase randomization can dramatically increase errors in word recognition in noisy environments…randomization of phase decreases what we now call “Proximity”, and lack of proximity decreases attention and recall.”
“We have been calling the proximity perception “Engagement”, “Clarity”, or “Presence” since 2004. We have a measure for it we call “LOC”.”
“Proximity – the perception that a source is acoustically close - is an important determinant of attention and recall.”
“The ear detects proximity through the phase coherence of upper harmonics in the direct sound, which are randomized by early reflections.”
Proximity may be measured by LOC, which compares, in dB, the summed direct sound with the summed reflections within an 80 ms window; +3 dB or more predicts good proximity
“In most halls proximity disappears over a distance of one meter at a particular distance from the sound sources. We call this the “Limit of Localization Distance” or LLD”
“STI, the speech transmission index, is a standard way of measuring intelligibility in halls and classrooms. We calculated both STI and our measure for proximity, LOC, in a 27’x25’x10’ room with surface absorption of 0.15 and RT 0.34s…a small room with a 0.34 second RT has excellent STI – but the teacher is heard with proximity only in the front seat…this model classroom is greatly improved by increasing the average absorption to 0.3…The intelligibility measured by STI also increases. A small room needs a lot of absorption if proximity is to be high in all the seats.”
“Barron and Marshall’s classic paper on spatial impression used a single speaker to reproduce a whole orchestra. The orchestra recording had no proximity. Many later experiments repeated this arrangement. They all found that adding lateral reflections widened the image. They used the term “Apparent Source Width” or ASW to describe what they thought was an improvement to the sound. ASW has since been quoted as beneficial to hall acoustics in nearly every article and text book. ASW by definition eliminates sharp localization and decreases proximity.”
“In many small halls there is a strong reflection from the back of the stage which reduces the number of seats with proximity. Adding absorption to the back and side walls can help a lot without much reduction of the reverb time.”
Lokki 2020
https://research.aalto.fi/files/75640242/SCI_Lokki_Auditory_spatial_impression_in_concert_halls.pdf
“Direct sounds and adjacent scattering, i.e. the initial 5 ms of the acoustic response arrives from each source on the stage in frontal directions. In a shoebox hall the stage floor is typically on the ear level of the audience at main parterre, thus the listener does not receive the stage floor reflection…The frequency responses illustrate that in the shoe-box hall the direct sounds lack the low frequencies, but have considerably strong high frequencies. In contrast, in the vineyard hall with a raked audience area, the frequency response of the first 5 ms is quite different due to stage floor reflections.”
“Early reflections until 30 ms...increase the overall loudness, color the sound and might change the perceived width of the source. As said, the temporal envelope preserving lateral reflections integrate to the direct sound best, increasing its quality and preserving the ability to localize. If the reflections are scrambling the phases of upper harmonics, i.e. reflections from heavily diffusing surfaces, the precedence effect might partially break down and such early reflections are not fully integrated to the direct sound (Lokki et al., 2011). Such reflections might increase the perceived width of the source to the detriment of less defined location of the source. As a result the instruments better blend together, but some listeners associate that to reduced clarity.…the shoe-box hall provides prominent lateral reflections already inside this 30 ms time window…the early reflections (between 5-30 ms) in the shoe-box hall strengthen the low frequencies below 200 Hz substantially, yet the middle frequencies up to 1 kHz remain at a relatively low level.”
“Later reflections between 30 and 200 ms increase the overall sound energy…In the shoe-box hall, the increase is particularly strong above 200 Hz, equalizing the frequency response to be more or less flat at 200 ms after the direct sound. Moreover, the energy in this time window reaches the measurement position almost evenly from all directions”
“Reverberation beyond 200 ms increases the cumulative energy to its final state…contributes to loudness, envelopment, spaciousness, and timbre…Notable differences between these halls can be observed in the smoothness of the overall frequency responses, level of low frequencies, and spatial distribution of sound energy.”
Private communication from Lokki in response to my question about seat dip effect versus floor bounce: “Floor bounce is a single reflection that creates quite narrow (high Q) comb filter to the frequency response. The seat-dip effect is a combination of several phenomena, thus the dip has usually lower Q and it is only one dip, not a comb filter at higher frequencies.”
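As a rough illustration of Lokki's point about floor bounce: a single reflection delayed by Δt relative to the direct sound cancels it at odd multiples of 1/(2Δt), which gives the regularly spaced notches of a comb filter. The sketch below is only that, a sketch. The geometry (source 1.0 m and ears 1.2 m above the floor, 3 m apart), the 343 m/s speed of sound, and the perfectly reflecting floor are all my assumptions, not values from any of the papers.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def floor_bounce_delay(source_h, ear_h, distance):
    """Extra travel time (s) of the floor reflection relative to the direct path.

    Uses a simple image-source model: the reflected path equals the straight
    line from the source's mirror image below the floor to the ear.
    """
    direct = math.hypot(distance, source_h - ear_h)
    reflected = math.hypot(distance, source_h + ear_h)
    return (reflected - direct) / SPEED_OF_SOUND

def comb_notches(delay_s, f_max=2000.0):
    """Notch frequencies (Hz) up to f_max for a same-polarity reflection."""
    notches = []
    n = 0
    while True:
        f = (2 * n + 1) / (2.0 * delay_s)  # odd multiples of 1/(2*delay)
        if f > f_max:
            return notches
        notches.append(f)
        n += 1

# Hypothetical geometry, chosen only for illustration
delay = floor_bounce_delay(source_h=1.0, ear_h=1.2, distance=3.0)
notches = comb_notches(delay)
```

With these assumed numbers the first notch lands at roughly 240 Hz, with further notches spaced 1/Δt apart above it. The seat-dip effect, being a combination of several phenomena, would not be captured by a single-delay model like this.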
Lokki 2024
https://acris.aalto.fi/ws/portalfiles/portal/118035577/388_1_10.0020066.pdf
“Categorization of listeners by preference confirmed the emergence of two groups, where listeners in the larger group [15/20] prefer proximate, enveloping, and warm sound and listeners in the other group [5/20] prefer clarity. This aligns with earlier studies of large symphony halls.10 Thus, there do not seem to be fundamental differences between the perception of chamber music halls and symphony halls in this regard.”
My notes in italics
>= Two broad relative preference groups
Lokki 2012: “Here, it was found that assessors can be grouped to two preference groups. Similar grouping has been found also earlier by Schroeder et al.,9 who found similar preference groups related to loud sound and clear sound. Barron4 divided assessors into groups by intimacy and reverberance. There, results also correlate with the results presented here; one group preferred clear and intimate sound and another group preferred loud, enveloping, and reverberant sound.”
Beranek 2016: “It is interesting that listeners in Boston Symphony Hall can also be divided into two groups, (and possibly three counting those in between)”
Hochgraf 2019: “Several studies have shown that listeners can be categorized into at least two preference groups: one that prefers louder, more reverberant and enveloping acoustics and another that prefers a more intimate and clearer sound”
Vigeant 2019: “Traditionally, preference has been divided into two groups: one group preferring strength and reverberance, and the other preferring clarity and intimacy….although subjects can be placed in two discrete groups, it removes the subtlety and individual variability in the data…large individual differences were observed, not captured fully in the traditional two-group preference model.”
Similar preference groups in stereo listening?
Group 1: REW (reverberance, envelopment, width)
Lokki 2012: Majority (10/17) prefer “bass, loudness, envelopment, and reverberance”
Lokki 2016: Minority (12/28 or 43%): “others love strong, reverberant and wide sound..The first attribute classes (reverberance/width/loudness) are well explained with LJ [late lateral energy] at all bands and G [strength] at mid frequencies.”
Beranek 2016: “those who prefer the sound in the upper rear second balcony…in the rear second balcony the reverberant sound almost immediately follows the arrival of the direct sound and it is loud and completely enveloping. Those who have subscription seats in the upper balcony praise the sound.”
Vigeant 2019: “Strength showed strong correlations with the spatial perceptions of envelopment (=0.77) and source width (=0.87)” and reverberance (=0.72)
Lokki 2024: “Listeners in the larger group [15/20] prefer proximate, enveloping, and warm sound…Envelopment correlates mainly with late lateral sound level LJ and strength G. Proximity correlates positively with mid-frequency G and C80 and negatively with low-frequency EDT and T20, i.e., strong and clear sound lead to more proximate sound. Finally, the attribute warmth has high correlation with wide-band G and LJ, but not with low frequency G.”
May correlate with listeners like Toole who prefer ASW?
Group 2: CD (clarity, definition)
Lokki 2012: Minority (7/17) prefer “intimate and close sound with high definition and clear sound…mild reverberance with well-defined sound.”
Lokki 2016: Majority (16/28 or 57%): “Some listeners prefer clarity over reverberance…clarity and definition classes have positive correlations with mid frequency C80 and significant negative correlations with mid frequency EDT and LJ”
C80 is the ratio of sound energy arriving in the first 80 ms to the energy arriving after 80 ms, expressed in dB. The C80 frequencies with the highest correlations were ~500 Hz to 4 kHz.
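For anyone who wants to see the definition in action, here is a minimal sketch of C80 computed from an impulse response as 10·log10(energy before 80 ms / energy after 80 ms). The impulse response is a made-up, noise-free exponential decay rather than a measured hall, and the function names and the 8 kHz sample rate are my own choices for illustration. It also assumes the direct sound arrives at the first sample.

```python
import math

def c80_db(impulse_response, sample_rate):
    """C80 in dB: early (0-80 ms) over late (80 ms onward) energy."""
    split = int(round(0.080 * sample_rate))
    early = sum(p * p for p in impulse_response[:split])
    late = sum(p * p for p in impulse_response[split:])
    return 10.0 * math.log10(early / late)

def synthetic_decay(rt60, sample_rate=8000):
    """Idealized pressure envelope that decays by 60 dB in rt60 seconds."""
    k = 3.0 * math.log(10.0) / rt60       # amplitude decay constant (1/s)
    n = int(3 * rt60 * sample_rate)       # long enough to make truncation negligible
    return [math.exp(-k * i / sample_rate) for i in range(n)]

c80_dry = c80_db(synthetic_decay(1.0), 8000)   # shorter reverberation time
c80_wet = c80_db(synthetic_decay(2.0), 8000)   # longer reverberation time
```

For this idealized decay, a 1 s reverberation time gives a C80 of about +3 dB and a 2 s reverberation time gives about -1.3 dB, which shows why C80 tends to move opposite to reverberance. Real halls have discrete early reflections, so measured values will not follow this simple curve.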
Griesinger 2013: “The perception of Clarity fails three ways: 1. Solution: limit early reflections. 2. Control the reverberation time and level. 3. Don’t design for maximum RT at low frequencies.”
Beranek 2016: “those who like the sound best in the front two-thirds of the main floor… On the floor the sound is clear and loud with full bass, with many early reflections, none of which mask the direct sound, and with reverberation that is beautiful…The author of this paper, preferring clarity to the sound, identifies with the main-floor group.”
Vigeant 2019: “Much of the clarity-related impressions, including temporal clarity, spatial clarity, intimacy, and proximity were found to show high correlation” (0.66-0.74)
Lokki 2024: “listeners in the other group [5/20] prefer clarity…Preference scores in the smaller preference group 2 correlate positively with C80, and negatively with LJ, i.e., almost the same correlation as for the attribute clarity.”
May be similar to audio professionals’ sensitivity to lateral reflections in small room audio?
Localization
May be measured through LOC, which focuses primarily on the upper midrange (1-4 kHz) and compares the integrated loudness of the direct sound with the integrated (logarithmic) loudness of the reflections in the first 100 ms. It was originally developed for speech but had to be modified for violin because of the usually slower sound onset. Similarly, C50 is used for speech and C80 for music. The frequency range of C80 with the highest correlation to perception of clarity or definition was 500 Hz to 4 kHz.
LLD, or limit of localization distance, where localization and proximity fail, can significantly exceed the critical distance (where the direct:reverberant sound ratio is 1), as in Boston Symphony Hall.
Griesinger argues that preserving the phase coherence of harmonics (seemingly above 1 kHz) in the direct sound is important for pitch separation and localization. His direct-sound window extends to about 5 ms (“early reflections 5ms or less from the direct sound add to the direct sound loudness and increase the ability to localize”), a similar time frame to Lokki’s direct response and adjacent scattering. Lokki also notes the potential negative effects of scrambling the phases of upper harmonics on definition, localization, and clarity.
Griesinger 2018: “It is well known that a prompt early reflection can augment speech and musical instruments…Often in a room where localization and proximity is poor a seat in the very last row, up against the back wall, will sound much better…In practice I had found that your ears had to be within two and a half feet of the wall for the trick to work…The results showed that a reflection at 5ms which was 6dB less strong than the direct sound did augment the loudness and the localizability of a source without detrimental effects on timbre…”
Joachim Gerhard (previously of Audio Physic) had proposed listener placement <1m from the wall behind him/her, arguing that “From experience that this reflection is not so objectionable for phantom image perception” (see future listening room acoustics link)
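The wall distances mentioned above translate directly into reflection delays, which makes them easy to compare against Griesinger's 5 ms figure. This is a back-of-envelope sketch under my own simplifying assumptions: the source is straight ahead, the wall is directly behind the listener, the extra path is twice the ear-to-wall distance, and sound travels at 343 m/s. The function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed
FOOT = 0.3048           # metres per foot

def rear_wall_delay_ms(ear_to_wall_m):
    """Delay (ms) of the rear-wall reflection relative to the direct sound."""
    return 2.0 * ear_to_wall_m / SPEED_OF_SOUND * 1000.0

def max_wall_distance_m(max_delay_ms):
    """Largest ear-to-wall distance (m) that keeps the reflection within max_delay_ms."""
    return max_delay_ms / 1000.0 * SPEED_OF_SOUND / 2.0

delay_two_and_half_feet = rear_wall_delay_ms(2.5 * FOOT)  # Griesinger's "two and a half feet"
delay_one_metre = rear_wall_delay_ms(1.0)                 # upper end of Gerhard's "<1m"
limit_for_5_ms = max_wall_distance_m(5.0)
```

Under these assumptions 2.5 feet gives roughly 4.4 ms, inside the 5 ms window, while a full 1 m gives roughly 5.8 ms, so the listener would need to sit within about 0.86 m of the wall to stay under 5 ms.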
Griesinger’s description of sound onset or attack is like @j_j's “leading edge” of envelope in psychoacoustics, as phase randomization (which damages localization and the perception of proximity) is like uncorrelated envelopes and edges (JJ notes that decorrelation of leading envelope edges contributes to distance perception—see future psychoacoustics link).
Proximity
Lokki 2012: The main overall preference driver in this study was an attribute cluster interpreted as Proximity (related to distance) [also depth and intimacy], which correlates highly with the average of all preference ratings.
Vigeant 2019: “Another interesting analysis, found in the second column of Table 2, is the correlation of average preference with each of the ten subjective attributes. The highest correlation was found with proximity (=0.81)” [close vs far]
Frequency range and dynamics
Lokki 2013: Figure 3 shows overall FR is highest around 150 Hz or so, lower but reasonably flat from a few hundred Hz to about 1 kHz, then a gradual decline in FR above ~1 kHz until about 6-7 kHz, above which it drops off much more steeply. The direct sound actually has relatively little response <~150 Hz, but the FR fills in <~450 Hz over time, especially <200 Hz
Lokki 2016: “When an orchestra plays in fortissimo the low frequencies below 200 Hz and high frequencies above 3kHz are substantially pronounced in comparison with piano passages”
Beranek 2016: “Pätynen et al.10 demonstrate that when an orchestra plays fortissimo the frequency spectrum changes..It was found that between 400 and 2000 Hz the spectrum for the (ff) sound is about 7 dB more intense than that for the (pp) sound and that between 2000 and 8000Hz the increase is more than 15dB (see upper curve in Fig. 3).”
Lokki 2019: “It should be emphasized that the important frequency range in the concert hall is from 20 Hz up to 12–15 kHz…the construction of a stage is important especially for double basses, which play notes having fundamental frequencies as low as 33 Hz.”
Lokki 2020: “The early reflections (between 5-30 ms) in the shoe-box hall strengthen the low frequencies below 200 Hz substantially, yet the middle frequencies up to 1 kHz remain at a relatively low level”
Given change in perceived frequency response over time, along with slower onset of sound in music, phase at lower frequencies may be less important to maintain relative to the rest of the frequency spectrum.
Floor and ceiling reflections
Lokki 2016: “The form of a human head strengthens these high frequencies for the sound coming from the side, thus showing the benefit of lateral reflections instead of reflections from the ceiling.”
Lokki 2019: “When median plane reflections are delayed in time due to the height of the room, the clarity of sound is improved, as our brains have more time to process the direct sound and the first lateral early reflections.”
Lokki 2020: “If such a reflection is coming from the median plane, i.e. from ceiling or reflectors above an orchestra, the sound quality might be reduced due to coloration, which is the same in both ears. Moreover, such ceiling reflection might increase the interaural correlation, which could increase the perceived distance of the source.”
No floor bounce per se in shoebox halls, but ceiling reflections may lower perception of proximity
Rear wall (behind the orchestra) reflections
Lokki 2019: “Our current understanding, supported by Kahle, is that the back wall should be absorptive. Such a wall treatment eases the balancing of instrument groups and increases the clarity for the audience.”
Griesinger 2020: “In many small halls there is a strong reflection from the back of the stage which reduces the number of seats with proximity. Adding absorption to the back and side walls can help a lot without much reduction of the reverb time.”
Rear wall reflections may reduce perception of proximity
Lokki 2012
https://users.aalto.fi/~ktlokki/Publs/JASA_lokki2012.pdf
“Potential assessors were openly invited with an article published in a national magazine of classical music. In addition, invitations were sent to student orchestra mailing lists, as well as to students of musicology and music. Finally, 23 candidates (13 males), each of them with a musical background and between the ages of 19 and 75 years (average age of 35), participated in the listening tests.”
“The results show that the main discriminative attributes between halls are loudness, envelopment, and reverberance. The second large cluster of attributes consists of bassiness and proximity attributes. The third main perceptual dimension has definition and clarity attributes. The preference judgments were divided into two groups of assessors, the first preferring concert halls with loud, enveloping and reverberant sound. The second group preferred concert halls that render intimate and close sound with high definition and clear sound”
“The best correlation with average preference ratings of all assessors was found to be with subjective proximity…none of the standardized objective room acoustical parameters could explain the proximity and preference data.”
Concert hall attribute clusters
- Loudness/envelopment/reverberance: spaciousness, width of sound image
- Bassiness/proximity: warmth, intimacy, naturally close
- Definition/clarity: separating sound, focus and localization
- Majority (10/17): “Close sound with a lot of bass, loudness, envelopment, and reverberance. The definition is very low, but subjective clarity is very diverse within these three halls.”
- Minority (7/17): “They render the most intimate sound that contains enough bass and loudness. They have mild reverberance with well-defined sound.”
Lee 2013
https://pure.hud.ac.uk/en/publicati...h-and-listener-envelopment-in-relation-to-sou
The distance dependent ASW was best predicted using objective measures called GE (early sound strength) while the LEV results were highly correlated with GL (late sound strength) and B/F ratio (Back/Front energy ratio of late sound). Such conventional measures as [1-IACCE], [1-IACCL] and LF did not agree with the perceived results.
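Early and late strength can be obtained from the more commonly published G and C80, which may help when a paper only reports the latter two. The sketch below follows from the definitions alone, assuming the 80 ms early/late boundary used in the Dammerud 2009 entry (Ge = G0-80, Gl = G80-∞). I have not checked it against the exact equations in either paper, and the function name is my own.

```python
import math

def early_late_strength(g_db, c80_db):
    """Split total strength G (dB) into early and late parts using C80 (dB).

    With x = early/late energy ratio = 10**(C80/10):
      early fraction of total energy = x / (1 + x)
      late fraction of total energy  = 1 / (1 + x)
    """
    x = 10.0 ** (c80_db / 10.0)
    g_early = g_db + 10.0 * math.log10(x / (1.0 + x))
    g_late = g_db - 10.0 * math.log10(1.0 + x)
    return g_early, g_late

# Hypothetical seat: G = 4 dB, C80 = 0 dB (equal early and late energy)
g_early, g_late = early_late_strength(4.0, 0.0)
```

In this made-up example the early and late energy are equal, so each part sits about 3 dB below the total G. By construction, the difference between the two results always equals C80.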
Griesinger 2013
http://www.davidgriesinger.com/ICA2013/What is Clarity5.pptx
“Sound is detected in the inner ear with a continuous 1/3 octave filter. Speech information is encoded in the relative strength of critical bands in the frequency range of 800 to 4000Hz, with some consonants at higher frequencies.”
“Standard hearing models predict a pitch acuity limited by the 1/3 octave bandwidth of the basilar filters. But musicians and listeners hear pitch to an accuracy of one part in one thousand! Why? Licklider proposed that our acuity of hearing could be explained by an autocorrelator located as close as possible to the hair cells, explaining our sense of pitch, and the rules of harmony.”
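To put the two figures in that quote on the same scale, both can be converted to cents (1200 cents per octave). This is just arithmetic on the numbers Griesinger gives, and the helper name is my own.

```python
import math

def ratio_to_cents(ratio):
    """Convert a frequency ratio to musical cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(ratio)

third_octave_cents = ratio_to_cents(2.0 ** (1.0 / 3.0))  # width of a 1/3 octave band
one_in_thousand_cents = ratio_to_cents(1.001)            # "one part in one thousand"
```

A 1/3 octave band is 400 cents wide, while one part in a thousand is under 2 cents, so the claimed pitch acuity is a couple of hundred times finer than the filter bandwidth. That gap is what the autocorrelator idea is meant to explain.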
“Envelopment requires separation of a sound into two distinct streams: a foreground stream and a background stream. When a foreground stream is not perceived, there is only one stream, perceived as reverberant but not surrounding. Vienna’s Musikverein and Boston’s Symphony Hall set the world standard for envelopment, but in both halls reverberation comes from the front in distant seats.”
“The perception of Clarity fails three ways: 1. Solution: limit early reflections. 2. Control the reverberation time and level. 3. Don’t design for maximum RT at low frequencies.”
“Harmonics of pitched tones increase the signal to noise ratio by 12dB or more, and allow source separation. But these advantages depend on the phase alignment of the harmonics, and these phases are altered by acoustics. When phases are preserved at the onsets of sounds we get CLARITY – otherwise we get MUD.”
Lokki 2013
http://dx.doi.org/10.1121/1.4800481
Concert hall perceptual factors
- Loudness (Strength, Level, Intensity): The louder the better.
- Immersion (Presence, Intimacy, Envelopment, Spaciousness): Engaging and enveloping sound is interesting and desired.
- Spatial extent (Distance, Depth, Source width, Balance): Proximate and spatially balanced sound (no image shift) is good.
- Definition (Clarity, Articulation, Blend, Discrimination, Sharpness): Different instruments should have sharp articulation with a nice blend.
- Timbre (Openness, Brilliance, Balance, Warmth, Bassiness): Balanced frequency response with small emphasis on bass and enough high frequencies give open and brilliant sound.
Lokki 2013
https://users.aalto.fi/~ktlokki/Publs/patynen_2013_JAS000842.pdf
Figure 3 illustrates time-frequency representation, which is development of frequency response at differing time windows. What I note is that the overall FR is highest around 150 Hz or so, lower but reasonably flat from a few hundred Hz to about 1 kHz, then a gradual decline in FR above ~1 kHz until about 6-7 kHz, above which it drops off much more steeply. The direct sound actually has relatively little response <~150 Hz, but the FR fills in <~450 Hz over time, especially <200 Hz. Concert halls like https://www.nagata-i.com/portfolio/suntory-hall/ and https://www.nagata-i.com/portfolio/sapporo-concert-hall-kitara/ (thanks, REG) show decreased reverberation times at these higher frequencies.
Lokki 2014
https://physicstoday.scitation.org/doi/10.1063/PT.3.2242
“Sensory evaluation methods, borrowed from the food and wine industry, are useful for studying concert-hall acoustics because they can extract information often hidden behind preference judgments. With such methods, in particular those based on individually elicited attributes, one can develop sensory profiles of concert halls or of seats inside one concert hall. Preference judgments might give an overall average picture, but the variance in the data is typically large due to the assessors’ personal tastes and previous experiences. Sensory evaluation methods provide a link between those subjective preferences and perceptual characteristics.”
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/JASMAN_vol_140_iss_1_551_1.pdf
“Twenty eight assessors (14 males, 14 females; ages between 22 and 64 yr with average age of 39.6) were recruited for this study. We gathered the participants with a web-based questionnaire and we particularly looked for people who often go to live concerts. The selected assessors were 10 professional musicians (no specific genre), 10 active amateur musicians, and eight active concert goers with varying musical background. Thus, they could all be considered as experts in listening to music, even though not expert assessors nor listening test participants.”
“The main results show that listeners can be categorized into two different preference classes.”
Majority (16/28 or 57%): “Some listeners prefer clarity over reverberance…clarity and definition classes have positive correlations with mid frequency C80 and significant negative correlations with mid frequency EDT and LJ”
Minority (12/28 or 43%): “others love strong, reverberant and wide sound..The first attribute classes (reverberance/width/loudness) are well explained with LJ [late lateral energy] at all bands and G [strength] at mid frequencies.”
“The halls were all measured unoccupied resulting in more reverberant conditions than in situ at occupied conditions, which is often the reality in this renowned halls. Although the assessors described the sound samples being very natural and realistic, it is clear that unoccupied conditions might introduce a bias to the presented results, as the difference between occupied and unoccupied conditions varies between halls…it might be that the mental reference for assessors is classical music recordings, which always have high clarity and less reverberation than in reality in situ, or in our auralizations.”
2 broad preference classes: clarity vs reverberance/width/loudness
3 latent attribute classes: RWL, timbre, clarity/definition
Clarity/definition correlates negatively with EDT (early decay time)
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/ICA2016-0465.pdf
“The form of a human head strengthens these high frequencies for the sound coming from the side, thus showing the benefit of lateral reflections instead of reflections from the ceiling.”
“When an orchestra plays in fortissimo the low frequencies below 200 Hz and high frequencies above 3kHz are substantially pronounced in comparison with piano passages”
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/p43.pdf
“A few people, e.g., Kahle (2013) has suggested that the auditory perception of a symphony orchestra playing in a concert hall can be understood with respect to two main percepts: the source presence and the room presence. The source presence is the continuous perception of the sound sources in the hall while the room presence is the perception of the space the music is listened to. These two are separate entities in the perceptual domain. If a hall can create these two "auditory streams", i.e., they are distinct and separate, then it is proposed this may permit both good clarity and plentiful, enveloping reverberation at the same time. The formation of the auditory streams is possible through stream segregation (Griesinger, 1997) and is subject to the perceptual grouping laws therein (Moore, 2012). The early reflections are perceptually grouped with the source streams through the precedence effect (Litovsky, Colburn, Yost, & Guzman, 1999), and affect the width, loudness, and timbre of the auditory events (Blauert, 1997). In this way, the direct sound of the orchestra and the early reflections of the hall combine to make up the source presence. The late reflections, i.e. reverberation, form the context and space for the music, and lend the music support, embellishment, and a sense of depth, providing the listener with a sense of envelopment; that is, room presence. At the moment, there is no clear consensus how these two streams are formed, or do we even need them. Naturally, more research is needed, including the spatial aspects of early and late reflections (Lock et al. 2015).”
Beranek 2016
https://pubs.aip.org/asa/jasa/article/139/4/1548/662531/Concert-hall-acoustics-Recent-findingsa
“It is interesting that listeners in Boston Symphony Hall can also be divided into two groups, (and possibly three counting those in between) namely, those who like the sound best in the front two-thirds of the main floor and those who prefer the sound in the upper rear second balcony. The difference is readily apparent to anyone by listening to the first half of an orchestral concert on the main floor and the second half in the rear second balcony. On the floor the sound is clear and loud with full bass, with many early reflections, none of which mask the direct sound, and with reverberation that is beautiful. In the rear second balcony the reverberant sound almost immediately follows the arrival of the direct sound and it is loud and completely enveloping. Those who have subscription seats in the upper balcony praise the sound. The author of this paper, preferring clarity to the sound, identifies with the main-floor group.”
“Source Presence is that sound which reaches the listener before the reverberation becomes appreciable. It usually includes the direct sound and early reflections up to about 100 ms after arrival of the direct sound. Room Presence deals with the reverberant sound field that follows…[Haapaniemi and Lokki] found that each of the 8 well-known halls could be identified better by source presence than by room presence.”
“Pätynen et al.10 demonstrate that when an orchestra plays fortissimo the frequency spectrum changes..It was found that between 400 and 2000 Hz the spectrum for the (ff) sound is about 7 dB more intense than that for the (pp) sound and that between 2000 and 8000Hz the increase is more than 15dB (see upper curve in Fig. 3).”
“Because of the size of the head, when the sound arrives from a lateral direction, sideways, the intensity at the closer ear of a listener, in the 2000 to 8000 Hz region, is 1 to 5 dB greater than that when the sound arrives only from the front. With reflections from both sides of a shoebox hall this difference will occur at both ears. Hence, in a shoebox hall, the difference in the intensities between (pp) and (ff) that arrives at the ears between 2000 and 8000 Hz is 8 to 12 dB greater [7 dB + (1 to 5 dB)]”
Lokki 2017
https://users.aalto.fi/~ktlokki/Publs/kuusinen_aaua_2017.pdf
Wheel of concert hall acoustics
- Loudness, volume, level, strength, body, dynamic range
- Intimacy, proximity, source presence
- Spatial impression, width, envelopment
- Reverberance, liveness, fullness
- Clarity, definition, articulation, sharpness, localization
- Timbre, warmth, spectral balance
Griesinger 2018
https://www.akutek.info/Mitt Bibliotek/IOA Auditorium Acoustics Hamburg 2018/Additional/papers/p35.pdf
“We have developed a method intended to predict whether a source will be localizable or not from a binaural impulse response. We called it LOC, which is short for Localization. LOC was developed from a data set obtained using male speech with pauses between each word. We modeled a room with two loudspeakers placed at +-20 degrees in front of the listener in a space with two different reverberation times and four different pre-delays.”
“Frequencies from 1000Hz to 4000Hz are principally responsible for the data we obtained. We developed a formula for LOC based on the idea that when the integrated loudness of the direct sound in that frequency range is stronger than the integrated loudness of the reflections in the first 100ms of the onset of a speech syllable, then the whole syllable can be localized and perceived as close to the listener. A very important aspect of the formula is that the “loudness” of the reflections is proportional to the integrated logarithm of the sound and not the integrated sound energy or pressure.”
“LOC has been tested in several spaces, mostly large enough to hold 300 to 1500 seats. We typically used a loudspeaker array similar to Tapio Lokki’s on stage playing his anechoic recordings of an electronic string quartet with a soprano voice playing a Mozart aria. We recorded impulse responses with my personal dummy head and a three dimensional microphone from each speaker array. We also listened to music playing through the array to find the LLD in various parts of the venues. Values of LOC above 3dB typically predicted good localization of the ensemble. Values below 2dB predicted poor localization. In most of these tests the reflections were more or less equal on both sides.”
“We also tested LOC with a data set from Boston Symphony Hall. All but one of the seats had LOC values of +3dB or more, and sounded good both on headphones and in person with a live orchestra. One seat had a strong reflection from the right side wall. LOC at that seat was below 2dB in the right ear and above 5dB in the left ear. The instruments were not localizable and the sound in that seat was muddy. Deleting the strong sidewall reflection from the impulse responses raised LOC in the right ear to +5dB. The sound improved greatly. We learned that to determine the ability to localize instruments in a particular seat we need to look at the minimum value of LOC from the two ears, not the maximum or average value.”
“In Boston symphony hall the critical distance, where the D/R is one, is only about 17 feet from an omnidirectional source. On the floor the best seats can be forty to fifty feet from the stage, and the great seats in the front of the first balcony are more than 110 feet from the stage.”
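To get a feel for how far below 0 dB the direct-to-reverberant ratio is at those seats, here is a rough estimate using the textbook model in which the direct sound falls by 6 dB per doubling of distance while the reverberant level stays constant throughout the room. That model is my assumption, not Griesinger's calculation, and real halls deviate from it. The distances are the ones in the quote, with 45 feet taken as the middle of "forty to fifty".

```python
import math

def direct_to_reverberant_db(distance, critical_distance):
    """D/R in dB at a given distance, assuming inverse-square direct sound
    and a uniform reverberant field (0 dB at the critical distance)."""
    return 20.0 * math.log10(critical_distance / distance)

CRITICAL_DISTANCE_FT = 17.0  # from the quote

dr_main_floor = direct_to_reverberant_db(45.0, CRITICAL_DISTANCE_FT)
dr_first_balcony = direct_to_reverberant_db(110.0, CRITICAL_DISTANCE_FT)
```

Under this simple model the main-floor seats sit at roughly -8 dB and the front of the first balcony at roughly -16 dB, yet both are described as seats where localization still works. That is consistent with the point that LLD can greatly exceed the critical distance.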
“Violins start gradually. LOC was devised assuming the sounds we are trying to localize abruptly rise to full level. Violins can start notes abruptly, but usually they do not. Sound onsets can take 50 milliseconds or more to reach full level. By the time the note reaches full level reflections in a small room have substantial energy.”
“The direct to reverberant ratio needed to localize the violins is uniformly greater than for male speech. The data suggests that the major reason is the slower onset of each note. Reverberation with the one second RT builds up rapidly, and quickly masks the slow onset of the direct sound from the violin. A higher D/R is needed to overcome the masking.”
“With the two second reverberation and short values of pre-delay the D/R needed for localization for the violins is nearly the same as it is for the one second reverberation, but as pre-delay increases violins become easier to localize. The slope of the improvement in localizability is steeper than for male speech. In the theory behind the development of LOC the slope of the improvement with predelay is governed by the length of the comb filter or auto correlator that separates the direct sound from later reverberation. The data for violins suggests that the length of this filter is shorter for harmonics of violins than it is for male speech. This result was expected. High frequency fundamentals do not need as long a filter to obtain the same accuracy as lower frequency fundamentals.”
“It is well known that a prompt early reflection can augment speech and musical instruments. Churches have put a wall and a ceiling behind and over pulpits for centuries. Often in a room where localization and proximity is poor a seat in the very last row, up against the back wall, will sound much better. But it has to be the very last row. The next to last row is no better than the others. In practice I had found that your ears had to be within two and a half feet of the wall for the trick to work. We did some experiments using our virtual room and male speech. The results showed that a reflection at 5ms which was 6dB less strong than the direct sound did augment the loudness and the localizability of a source without detrimental effects on timbre. The current version of LOC appeared to work for this case. A reflection 6ms after the direct sound did not affect the localizability of the sound, and was beginning to alter the timbre. A reflection at 7ms reduced the ability to localize, and added an unpleasant timbre. We tried the same experiment with female speech and got the same result. We do not know why this effect occurs, but it is the same for both male and female speech.”
“As a result of these experiments we added a 2ms cross fade centered at 6ms to the calculation for LOC.”
Griesinger 2018
http://www.davidgriesinger.com/Localization, Loudness and Proximity 8.pptx
“The ability to localize a sound depends on detecting the direct sound, which alone carries the ITD and ILD…enables localization and proximity. If a sound is perceived as close our attention is involuntarily drawn to it.”
“Aarabi et al…finds that there is a distinct amount of phase randomization that causes separation of sounds to abruptly fail.”
“Successfully separated sources form multiple “foreground” streams…sounds that cannot be localized or separated form a single stream with special properties; the “background” stream. The background stream is perceived as distinctly separate from the foreground streams, and is often perceived as fully surrounding. When there are no localized foreground streams all the sound we hear is perceived as a single stream.”
“If we want to know if the ear distinctly hears the direct sound we need to plot the integrated LOUDNESS of direct sound versus the integrated LOUDNESS of the reflections. I say loudness because the ear responds to the logarithm of sound pressure, not the sound power. Our measure for localizing the direct sound, LOC, separately plots the loudness of frequencies above 1000Hz from the direct sound and the build up of the loudness of the reflections inside a 100ms window. The loudness – the integrated neural activity – of the direct sound is given by the area under the blue line inside the black window. Similarly, the loudness of the reflections is given by the area under the red curve inside the black window. LOC is the ratio of the direct sound area to the reflection area in decibels.”
“The experiments showed that early reflections 5ms or less from the direct sound add to the direct sound loudness and increase the ability to localize. Reflections at 6ms do not increase localization and begin to alter timbre. Reflections at 7ms and later add to the reverberant loudness, and are detrimental both to timbre and localization.”
“We have found that the ability to sharply localize sources in halls and rooms depends on many factors besides the timing and strength of the early reflections. The speed of the attack of each new sound matters a lot…the reverberation time matters, as does the score and tempo of the music. Short notes excite reverberation very little, and tend not to be masked by previous notes. But strong held notes build up the reverberation, and can sometimes mask everything.”
Griesinger 2019
http://www.davidgriesinger.com/Learning to Listen 14.pptx
“I believe acoustic research has overlooked three critical concepts:
1. Pitch: Why are humans able to distinguish pitch to an accuracy of six parts in 10,000?
(Because pitch (periodicity) enables us to separate pitched signals from noise.)
2. Phase: It is commonly thought that phase is inaudible above 1500Hz.
(False! The phases of the upper harmonics of tones are critical to proximity and source separation.)
3. Attention: Acoustic quality is measured with intelligibility. Attention is more important.
(Sounds that are proximate involuntarily attract attention.)”
“To have a spacious bass you need have low correlation between the left and right channels at low frequencies. I had a tiny oscilloscope with a one inch CRT displaying the left/right phase. It was easy to see if the sound was too monaural. But loudspeakers must be carefully placed if spatial bass is to be heard. The arrangement along the long wall shown in the previous slide can often work in small spaces.”
“The alignment of phases in the upper harmonics of tones is vital to source separation and localization. Harmonic tones are created by pulses of air, the release of a rosined string, or strikes of a hammer. The amplitude spikes created by these pulses cut through noise and reverberation…early reflections randomize the phase of these pulses. The consequences are dramatic. ”
Hochgraf 2019
https://acousticstoday.org/the-art-...ons-in-research-and-design-kelsey-a-hochgraf/
The Art of Concert Hall Acoustics: Current Trends and Questions in Research and Design
“Although there is not yet a consensus around specific attributes most correlated with audience listener preference, there is agreement that different people prioritize different elements of the acoustical experience. Several studies have shown that listeners can be categorized into at least two preference groups: one that prefers louder, more reverberant and enveloping acoustics and another that prefers a more intimate and clearer sound”
Auditory stream segregation: source and room response
“There is growing consensus among acousticians that although many of these [numerical parameters standardized by ISO 3382-1:2009] are useful, they do not provide a complete representation of concert hall acoustics...the limitations are largely attributable to differences between an omnidirectional sound source and an orchestra and between omnidirectional microphones and the human hearing system.”
“The importance of lateral reflections for spatial impression is well-understood…but more recent research has shown that these reflections are also critical to the perception of dynamic responsiveness…increased perception of dynamic range has also been shown to correlate with increased emotional response...the perception threshold for lateral reflections decreases with increasing sound level, meaning that more lateral reflections will be perceived by the listener as the music crescendos, further heightening the sense of dynamic responsiveness.”
“[T]he tide seems to be shifting away from high surface diffusivity and there is more evidence to substantiate the need for strong lateral reflections,”
Lokki 2019
https://doi.org/10.3390/acoustics1020025
Concert halls and architectural features
“When median plane reflections are delayed in time due to the height of the room, the clarity of sound is improved, as our brains have more time to process the direct sound and the first lateral early reflections. In fact, sometimes we almost perceive two “sound streams”, the early sound and reverberation separately, and Kahle [35] calls these streams as the “source presence” and the “room presence”, respectively.”
“It should be emphasized that the important frequency range in the concert hall is from 20 Hz up to 12–15 kHz…the construction of a stage is important especially for double basses, which play notes having fundamental frequencies as low as 33 Hz.”
“Our current understanding, supported by Kahle, is that the back wall should be absorptive. Such a wall treatment eases the balancing of instrument groups and increases the clarity for the audience.”
Vigeant 2019
https://sites.psu.edu/spral/files/2...Hall-Preference-ISRA-Amsterdam-2019-FINAL.pdf
“Sixteen subjects participated in the experiment, all meeting minimum hearing thresholds of 15 dB-HL in the octave bands from 250 –8000 Hz. Subjects were required to have at least five years of formal musical training and were required to be actively studying their instrument or involved in a musical ensemble. The subject pool included 11 males and 5 females with an average age of 24 years. The average musical experience across all subjects was 14 years.”
“Much of the clarity-related impressions, including temporal clarity, spatial clarity, intimacy, and proximity were found to show high correlation.” (0.66-0.74)
“Strength showed strong correlations with the spatial perceptions of envelopment (=0.77) and source width (=0.87).”
“Another interesting analysis, found in the second column of Table 2, is the correlation of average preference with each of the ten subjective attributes. The highest correlation was found with proximity (=0.81)”
“A set of four orthogonal factors were shown to explain 72% of the total variance in perception, which were interpreted as clarity, strength / envelopment, strength / source width, and brilliance.”
Griesinger 2019
http://www.davidgriesinger.com/Learning to Listen 14.pptx
“Pitch (periodicity) enables us to separate pitched signals from noise…the phases of the upper harmonics of tones are critical to proximity and source separation…sounds that are proximate involuntarily attract attention.”
“To have a spacious bass you need have low correlation between the left and right channels at low frequencies…the arrangement along the long wall shown in the previous slide can often work in small spaces.”
“Comb-like filters tuned to the fundamental period of an amplitude waveform can separate the formants of a particular speaker or instrument from other signals and from noise…The alignment of phases in the upper harmonics of tones is vital to source separation and localization…Recent papers from the field of speech comprehension have come to the same conclusions about the importance of the amplitude waveform of sounds with distinct pitch. They call the process “Source separation by periodicity.””
“For example: Manfred Schroeder began to study concert halls with binaural technology in 1974. He used a dummy head microphone for recording, and crosstalk cancellation for playback. With some modifications these techniques can work well. But Schroeder made a serious error, and it was contagious. He reproduced a whole orchestra with just two speakers. Mixing an orchestra into just two channels eliminates the phase information that gives rise to the perception of proximity. Edison, Dick Campbell, Kimio Hamasaki, Tapio Lokki and the author all find that to make a believable orchestral sound you must use a separate loudspeaker for each instrument! When you hear an orchestral recording from just two speakers in a hall it sounds artificial. But if you listen beyond the LLD the sound blends together, and is a bit more believable. This is the sound of a poor seat, not a good one.”
“Barron and Marshall’s classic paper on spatial impression used a single speaker to reproduce a whole orchestra. There was no phase coherence on any instrument. They found adding lateral reflections widened the image. They used the term “Apparent Source Width” (ASW) to describe what they thought was an improvement to the sound. ASW has since been quoted as beneficial to hall acoustics in nearly every article and text book. But ASW reduces localization, eliminates proximity, and decreases envelopment. It is the sound in a poor seat, not in a good one.”
For some reason, the last three sentences remind me of @Karl-Heinz Fink's post: https://www.audiosciencereview.com/...n-without-subwoofer.48559/page-9#post-1743584
Griesinger 2020 (and 2018)
http://www.davidgriesinger.com/The_Physics_of_auditory_proximity.pptx
https://www.aes.org/tmpFiles/elib/20240113/18463.pdf
“Aarabi showed that phase randomization can dramatically increase errors in word recognition in noisy environments…randomization of phase decreases what we now call “Proximity”, and lack of proximity decreases attention and recall.”
“We have been calling the proximity perception “Engagement”, “Clarity”, or “Presence” since 2004. We have a measure for it we call “LOC”.”
“Proximity – the perception that a source is acoustically close - is an important determinant of attention and recall.”
“The ear detects proximity through the phase coherence of upper harmonics in the direct sound, which are randomized by early reflections.”
Proximity may be measured by LOC, which compares (in dB) the summed direct sound with the summed reflections within an 80 ms window; a value of +3 dB predicts good proximity.
“In most halls proximity disappears over a distance of one meter at a particular distance from the sound sources. We call this the “Limit of Localization Distance” or LLD”
“STI , the speech transmission index, is a standard way of measuring intelligibility in halls and classrooms. We calculated both STI and our measure for proximity, LOC, in a 27’x25’x10’ room with surface absorption of 0.15 and RT 0.34s…a small room with a 0.34 second RT has excellent STI – but the teacher is heard with proximity only in the front seat…this model classroom is greatly improved by increasing the average absorption to 0.3 …The intelligibility measured by STI also increases. A small room needs a lot of absorption if proximity is to be high in all the seats.”
“Barron and Marshall’s classic paper on spatial impression used a single speaker to reproduce a whole orchestra. The orchestra recording had no proximity. Many later experiments repeated this arrangement. They all found that adding lateral reflections widened the image. They used the term “Apparent Source Width” or ASW to describe what they thought was an improvement to the sound. ASW has since been quoted as beneficial to hall acoustics in nearly every article and text book. ASW by definition eliminates sharp localization and decreases proximity.”
“In many small halls there is a strong reflection from the back of the stage which reduces the number of seats with proximity. Adding absorption to the back and side walls can help a lot without much reduction of the reverb time.”
Lokki 2020
https://research.aalto.fi/files/75640242/SCI_Lokki_Auditory_spatial_impression_in_concert_halls.pdf
“Direct sounds and adjacent scattering, i.e. the initial 5 ms of the acoustic response arrives from each source on the stage in frontal directions. In a shoebox hall the stage floor is typically on the ear level of the audience at main parterre, thus the listener does not receive the stage floor reflection…The frequency responses illustrate that in the shoe-box hall the direct sounds lack the low frequencies, but have considerably strong high frequencies. In contrast, in the vineyard hall with a raked audience area, the frequency response of the first 5 ms is quite different due to stage floor reflections.”
“Early reflections until 30 ms...increase the overall loudness, color the sound and might change the perceived width of the source. As said, the temporal envelope preserving lateral reflections integrate to the direct sound best, increasing its quality and preserving the ability to localize. If the reflections are scrambling the phases of upper harmonics, i.e. reflections from heavily diffusing surfaces, the precedence effect might partially break down and such early reflections are not fully integrated to the direct sound (Lokki et al., 2011). Such reflections might increase the perceived width of the source to the detriment of less defined location of the source. As a result the instruments better blend together, but some listeners associate that to reduced clarity.…the shoe-box hall provides prominent lateral reflections already inside this 30 ms time window…the early reflections (between 5-30 ms) in the shoe-box hall strengthen the low frequencies below 200 Hz substantially, yet the middle frequencies up to 1 kHz remain at a relatively low level.”
“Later reflections between 30 and 200 ms increase the overall sound energy…In the shoe-box hall, the increase is particularly strong above 200 Hz, equalizing the frequency response to be more or less flat at 200 ms after the direct sound. Moreover, the energy in this time window reaches the measurement position almost evenly from all directions”
“Reverberation beyond 200 ms increases the cumulative energy to its final state…contributes to loudness, envelopment, spaciousness, and timbre…Notable differences between these halls can be observed in the smoothness of the overall frequency responses, level of low frequencies, and spatial distribution of sound energy.”
Private communication from Lokki in response to my question about seat dip effect versus floor bounce: “Floor bounce is a single reflection that creates quite narrow (high Q) comb filter to the frequency response. The seat-dip effect is a combination of several phenomena, thus the dip has usually lower Q and it is only one dip, not a comb filter at higher frequencies.”
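To make the "single reflection creates a comb filter" point concrete, here is a minimal sketch. It assumes an idealized case (one reflection, a copy of the direct sound arriving after a fixed delay), which is a simplification of any real floor bounce. The function name and the 1 ms example delay are my own illustrative choices, not values from Lokki. Cancellation occurs wherever the delayed copy is half a cycle out of phase, so the notches sit at odd multiples of 1/(2 × delay).

```python
def comb_notch_frequencies(delay_s, max_hz=20000.0):
    """Notch frequencies (Hz) for direct sound plus one delayed copy.

    Cancellation occurs where the delayed copy is half a cycle out of phase:
    f = (2k + 1) / (2 * delay), for k = 0, 1, 2, ...
    """
    if delay_s <= 0:
        raise ValueError("delay_s must be positive")
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


# Hypothetical example: a reflection arriving 1 ms after the direct sound.
notches = comb_notch_frequencies(0.001, max_hz=3000.0)
print([round(f) for f in notches])  # [500, 1500, 2500]
```

The notches repeat at a regular spacing of 1/delay all the way up the spectrum, which is the comb pattern Lokki contrasts with the single, broader seat-dip.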
Lokki 2024
https://acris.aalto.fi/ws/portalfiles/portal/118035577/388_1_10.0020066.pdf
“Categorization of listeners by preference confirmed the emergence of two groups, where listeners in the larger group [15/20] prefer proximate, enveloping, and warm sound and listeners in the other group [5/20] prefer clarity. This aligns with earlier studies of large symphony halls.10 Thus, there do not seem to be fundamental differences between the perception of chamber music halls and symphony halls in this regard.”
My notes in italics
>= Two broad relative preference groups
Lokki 2012: “Here, it was found that assessors can be grouped to two preference groups. Similar grouping has been found also earlier by Schroeder et al.,9 who found similar preference groups related to loud sound and clear sound. Barron4 divided assessors into groups by intimacy and reverberance. There, results also correlate with the results presented here; one group preferred clear and intimate sound and another group preferred loud, enveloping, and reverberant sound.”
Beranek 2016: “It is interesting that listeners in Boston Symphony Hall can also be divided into two groups, (and possibly three counting those in between)”
Hochgraf 2019: “Several studies have shown that listeners can be categorized into at least two preference groups: one that prefers louder, more reverberant and enveloping acoustics and another that prefers a more intimate and clearer sound”
Vigeant 2019: “Traditionally, preference has been divided into two groups: one group preferring strength and reverberance, and the other preferring clarity and intimacy….although subjects can be placed in two discrete groups, it removes the subtlety and individual variability in the data…large individual differences were observed, not captured fully in the traditional two-group preference model.”
Similar preference groups in stereo listening?
Group 1: REW (reverberance, envelopment, width)
Lokki 2012: Majority (10/17) prefer “bass, loudness, envelopment, and reverberance”
Lokki 2016: Minority (12/28 or 43%): “others love strong, reverberant and wide sound..The first attribute classes (reverberance/width/loudness) are well explained with LJ [late lateral energy] at all bands and G [strength] at mid frequencies.”
Beranek 2016: “those who prefer the sound in the upper rear second balcony…in the rear second balcony the reverberant sound almost immediately follows the arrival of the direct sound and it is loud and completely enveloping. Those who have subscription seats in the upper balcony praise the sound.”
Vigeant 2019: “Strength showed strong correlations with the spatial perceptions of envelopment (=0.77) and source width (=0.87)” and reverberance (=0.72)
Lokki 2024: “Listeners in the larger group [15/20] prefer proximate, enveloping, and warm sound…Envelopment correlates mainly with late lateral sound level LJ and strength G. Proximity correlates positively with mid-frequency G and C80 and negatively with low-frequency EDT and T20, i.e., strong and clear sound lead to more proximate sound. Finally, the attribute warmth has high correlation with wide-band G and LJ, but not with low frequency G.”
May correlate with listeners like Toole who prefer ASW?
Group 2: CD (clarity, definition)
Lokki 2012: Minority (7/17) prefer “intimate and close sound with high definition and clear sound…mild reverberance with well-defined sound.”
Lokki 2016: Majority (16/28 or 57%): “Some listeners prefer clarity over reverberance…clarity and definition classes have positive correlations with mid frequency C80 and significant negative correlations with mid frequency EDT and LJ”
C80 is energy ratio before and after 80 ms, measured in dB. C80 frequencies with highest correlations were ~500 Hz to 4 kHz
Griesinger 2013: “The perception of Clarity fails three ways: 1. Solution: limit early reflections. Control the reverberation time and level. 3. Don’t design for maximum RT at low frequencies.”
Beranek 2016: “those who like the sound best in the front two-thirds of the main floor… On the floor the sound is clear and loud with full bass, with many early reflections, none of which mask the direct sound, and with reverberation that is beautiful…The author of this paper, preferring clarity to the sound, identifies with the main-floor group.”
Vigeant 2019: “Much of the clarity-related impressions, including temporal clarity, spatial clarity, intimacy, and proximity were found to show high correlation.” (0.66-0.74)
Lokki 2024: “listeners in the other group [5/20] prefer clarity…Preference scores in the smaller preference group 2 correlate positively with C80, and negatively with LJ, i.e., almost the same correlation as for the attribute clarity. ”
May be similar to audio professionals’ sensitivity to lateral reflections in small room audio?
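Since C80 comes up repeatedly in the Group 2 notes, here is a rough sketch of the "energy before 80 ms versus energy after 80 ms, in dB" idea. It is a broadband simplification: real measurements are made per octave band, and the impulse response is assumed here to start exactly at the direct sound arrival. The function name, the synthetic exponential decay, and the 2 s reverberation time are my own illustrative assumptions, not data from any of the halls above.

```python
import numpy as np


def clarity_db(ir, fs, split_ms=80.0):
    """Early-to-late energy ratio in dB (C80 for split_ms=80, C50 for 50).

    ir: impulse response samples, assumed to start at the direct sound.
    fs: sample rate in Hz.
    """
    ir = np.asarray(ir, dtype=float)
    split = int(round(fs * split_ms / 1000.0))
    early = np.sum(ir[:split] ** 2)   # energy before the split time
    late = np.sum(ir[split:] ** 2)    # energy after the split time
    if early <= 0 or late <= 0:
        raise ValueError("impulse response must have energy on both sides of the split")
    return 10.0 * np.log10(early / late)


# Hypothetical impulse response: a smooth envelope that decays by 60 dB in 2 s.
fs = 48000
rt60 = 2.0
t = np.arange(int(fs * 10)) / fs
ir = np.exp(-3.0 * np.log(10.0) * t / rt60)

print(round(float(clarity_db(ir, fs)), 1))  # about -1.3 dB for this idealized decay
```

For this idealized decay, shortening the reverberation time raises C80, while moving the split from 80 ms to 50 ms (C50) lowers the value because less energy counts as early.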
Localization
May be measured through LOC, which focuses primarily on the upper midrange (1-4 kHz) and essentially compares the integrated (logarithmic) loudness of the direct sound with that of the reflections in the first 100 ms. LOC was originally developed for speech but had to be modified for violin because of its usually slower sound onset. Similarly, C50 is used for speech and C80 for music. The frequency range of C80 with the highest correlation to the perception of clarity or definition was 500 Hz to 4 kHz.
LLD, or limit of localization distance, where localization and proximity fail, can significantly exceed the critical distance (where the direct:reverberant sound ratio is 1), as in Boston Symphony Hall.
Griesinger argues that preserving the phase coherence of harmonics (seemingly above 1 kHz) in the direct sound is important for pitch separation and localization. His direct sound window extends up to 5 ms (“early reflections 5ms or less from the direct sound add to the direct sound loudness and increase the ability to localize”), a similar time frame to Lokki’s direct sound and adjacent scattering. Lokki also notes potential negative effects of scrambling the phases of upper harmonics on definition, localization, and clarity.
Griesinger 2018: “It is well known that a prompt early reflection can augment speech and musical instruments…Often in a room where localization and proximity is poor a seat in the very last row, up against the back wall, will sound much better…In practice I had found that your ears had to be within two and a half feet of the wall for the trick to work…The results showed that a reflection at 5ms which was 6dB less strong than the direct sound did augment the loudness and the localizability of a source without detrimental effects on timbre…”
Joachim Gerhard (previously of Audio Physic) had proposed placing the listener within 1 m of the wall behind him/her, arguing that “From experience that this reflection is not so objectionable for phantom image perception” (see future listening room acoustics link)
Griesinger’s description of sound onset or attack is like @j_j's “leading edge” of envelope in psychoacoustics, as phase randomization (which damages localization and the perception of proximity) is like uncorrelated envelopes and edges (JJ notes that decorrelation of leading envelope edges contributes to distance perception—see future psychoacoustics link).
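As a quick sanity check on how the wall distances above relate to Griesinger's 5 ms figure, here is a small back-of-envelope sketch. It assumes sound arriving perpendicular to the wall behind the listener, so the reflection travels an extra twice the ear-to-wall distance; oblique arrivals would give a shorter extra path, so this is an upper bound. It also assumes a speed of sound of 343 m/s. The function name and constants are mine, for illustration only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value, roughly dry air at 20 degrees C
FEET_TO_M = 0.3048


def rear_wall_delay_ms(distance_m, c=SPEED_OF_SOUND_M_S):
    """Extra delay (ms) of a reflection from a wall directly behind the listener.

    Assumes perpendicular incidence: the reflected sound passes the ear,
    reaches the wall, and returns, travelling an extra 2 * distance_m.
    """
    if distance_m < 0:
        raise ValueError("distance_m must be non-negative")
    return 2.0 * distance_m / c * 1000.0


# Griesinger's "two and a half feet" and Gerhard's 1 m.
print(round(rear_wall_delay_ms(2.5 * FEET_TO_M), 1))  # 4.4
print(round(rear_wall_delay_ms(1.0), 1))              # 5.8
```

Under these assumptions, two and a half feet corresponds to roughly 4.4 ms, which falls inside the 5 ms window, while a full 1 m gives roughly 5.8 ms, close to the 6 ms point where Griesinger reports that timbre begins to change.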
Proximity
Lokki 2012: The main overall preference driver in this study was an attribute cluster interpreted as Proximity (related to distance) [also depth and intimacy], which correlates highly with the average of all preference ratings.
Vigeant 2019: “Another interesting analysis, found in the second column of Table 2, is the correlation of average preference with each of the ten subjective attributes. The highest correlation was found with proximity (=0.81)” [close vs far]
Frequency range and dynamics
Lokki 2013: Figure 3 shows overall FR is highest around 150 Hz or so, lower but reasonably flat from a few hundred Hz to about 1 kHz, then a gradual decline in FR above ~1 kHz until about 6-7 kHz, above which it drops off much more steeply. The direct sound actually has relatively little response <~150 Hz, but the FR fills in <~450 Hz over time, especially <200 Hz
Lokki 2016: “When an orchestra plays in fortissimo the low frequencies below 200 Hz and high frequencies above 3kHz are substantially pronounced in comparison with piano passages”
Beranek 2016: “Pätynen et al.10 demonstrate that when an orchestra plays fortissimo the frequency spectrum changes..It was found that between 400 and 2000 Hz the spectrum for the (ff) sound is about 7 dB more intense than that for the (pp) sound and that between 2000 and 8000Hz the increase is more than 15dB (see upper curve in Fig. 3).”
Lokki 2019: “It should be emphasized that the important frequency range in the concert hall is from 20 Hz up to 12–15 kHz…the construction of a stage is important especially for double basses, which play notes having fundamental frequencies as low as 33 Hz.”
Lokki 2020: “The early reflections (between 5-30 ms) in the shoe-box hall strengthen the low frequencies below 200 Hz substantially, yet the middle frequencies up to 1 kHz remain at a relatively low level”
Given change in perceived frequency response over time, along with slower onset of sound in music, phase at lower frequencies may be less important to maintain relative to the rest of the frequency spectrum.
Floor and ceiling reflections
Lokki 2016: “The form of a human head strengthens these high frequencies for the sound coming from the side, thus showing the benefit of lateral reflections instead of reflections from the ceiling.”
Lokki 2019: “When median plane reflections are delayed in time due to the height of the room, the clarity of sound is improved, as our brains have more time to process the direct sound and the first lateral early reflections.”
Lokki 2020: “If such a reflection is coming from the median plane, i.e. from ceiling or reflectors above an orchestra, the sound quality might be reduced due to coloration, which is the same in both ears. Moreover, such ceiling reflection might increase the interaural correlation, which could increase the perceived distance of the source.”
No floor bounce per se in shoebox halls, but ceiling reflections may lower perception of proximity
Rear wall (behind the orchestra) reflections
Lokki 2019: “Our current understanding, supported by Kahle, is that the back wall should be absorptive. Such a wall treatment eases the balancing of instrument groups and increases the clarity for the audience.”
Griesinger 2020: “In many small halls there is a strong reflection from the back of the stage which reduces the number of seats with proximity. Adding absorption to the back and side walls can help a lot without much reduction of the reverb time.”
Rear wall reflections may reduce perception of proximity