4. Bass: accounts for 25-30.5% of the preference estimate in the Olive models, as low-frequency extension in one and primarily as quality, defined by absolute average deviation below 300 Hz, in the other.
5. Timbre: Despite some ambiguity between different usages of the term, like “spatial balance” from Lokki and what Griesinger refers to as instrument timbre defined from 1-4 kHz, I suspect that at least five aspects may contribute here.
A. Floor reflections result in comb filter typically starting in the 200-300 Hz range resulting from path length differences, so not well-reflected in anechoic or near field measurements, but loudspeakers that take this into account may be preferred. This is typically avoided in studio control rooms due to presence of consoles.
B. Wider baffle speakers transition from omnidirectional to forward-radiating at a lower frequency. Despite baffle step compensation, there is still a difference in directivity index, which may suggest relevance for the room curve, see next point C.
C. “Smoothly changing” (controlled?) versus “relatively constant” directivity speakers seem to demonstrate a more nearly diagonal straight line versus a relatively stair step DI with the middle stair ranging from several hundred to several thousand Hz. I speculate that preference for the former may correlate with preference for “clarity” (in #1 above in the listener preference model), the latter with #2. If the inverted DI curve predicts the shape of the room curve, compare the target curve for trained vs all listeners in Figure 14 of Toole 2015 with what I wrote about “smoothly changing” versus “relatively constant” directivity and possible correlations for relative preferences for clarity vs width/envelopment, as well as sensitivity to lateral reflections.
D. Diffraction, effects are controversial (see https://www.linkwitzlab.com/diffraction.htm vs https://www.avsforum.com/threads/ho...-science-shows.3038828/page-104#post-57684656), but might include local room effects in addition to the intrinsic loudspeaker ones.
E. Distortion, linear or otherwise
Young-Ho
I have been trying to educate myself a little more about concert hall acoustics recently, so I’ve been skimming papers and presentations from David Griesinger (http://www.davidgriesinger.com/) and Tapio Lokki (https://users.aalto.fi/~ktlokki/). I’ve also had a long-standing interest in home audio and have read Floyd Toole’s Sound Reproduction (primarily the first/second edition) many times. I noticed a number of interesting overlaps between preferences in concert hall acoustics and loudspeakers in rooms. I had also been curious about some of the apparent limitations in findings from Harman studies. Obvious examples include individual preference regarding lateral reflections and differences between trained and untrained listeners with respect to target room curves. Less obvious ones include distortion like IMD or HOMs, effects of diffraction, floor bounce, and other relevant effects that may not be apparent in CEA2034 or similar sets of measurements. I’d like to summarize some of the non-Toole/Harman research below in mostly chronological order, along with some additional information from Toole that either appears in or supplements Sound Reproduction. I apologize for inaccuracies in my summaries, simplified attribution of multiple authors mostly to the ones above, and unformatted links.
Bech 1995
https://asa.scitation.org/doi/10.1121/1.413047
“electroacoustic simulation of the right-hand loudspeaker of a stereophonic setup, positioned in a small room.”
“The results show that only the first-order ceiling and floor reflections are likely to contribute individually to the timbre of a speech signal. For a noise signal additional reflections, from the wall to the left of the listener, will individually contribute to the timbre. The threshold of detection for all reflections depends on the level of the reverberant field. If the reverberant field is removed, thresholds will decrease by 2-5 dB.”
“care should be taken when generalizing from the results presented in this paper.”
Bech 1996
https://asa.scitation.org/doi/10.1121/1.414952
“The results have confirmed the findings of the first report that the floor reflection will contribute on an individual basis to the timbre of a noise signal.”
Private communication from Lokki in response to my question about seat dip effect versus floor bounce: “Floor bounce is a single reflection that creates quite narrow (high Q) comb filter to the frequency response. The seat-dip effect is a combination of several phenomena, thus the dip has usually lower Q and it is only one dip, not a comb filter at higher frequencies.”
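To put rough numbers on the floor-bounce comb filter Lokki describes, here is a minimal Python sketch. It is only an illustration under stated assumptions: a point source and point receiver above a flat, fully reflective floor, a speed of sound of 343 m/s, and the hypothetical function name floor_bounce_notches. None of this comes from the papers above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def floor_bounce_notches(source_height, ear_height, distance, count=3):
    """Return the first `count` comb-filter notch frequencies in Hz.

    Assumed geometry: point source and point receiver above a flat,
    fully reflective floor, `distance` metres apart horizontally.
    """
    direct = math.hypot(distance, source_height - ear_height)
    # Image-source method: mirror the source below the floor.
    reflected = math.hypot(distance, source_height + ear_height)
    delta = reflected - direct  # extra path length of the reflection
    # Cancellation occurs where the extra path equals an odd number
    # of half wavelengths.
    return [(2 * k + 1) * SPEED_OF_SOUND / (2 * delta) for k in range(count)]


if __name__ == "__main__":
    for f in floor_bounce_notches(1.0, 1.0, 3.0):
        print(f"{f:.0f} Hz")
```

With the driver and the ears both 1 m above the floor and 3 m apart, the first notch falls a little above 280 Hz, and further notches repeat at odd multiples of that frequency. A real floor is not a perfect reflector, so the actual dips will be shallower than this idealized picture suggests.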
Lokki 2012
https://users.aalto.fi/~ktlokki/Publs/JASA_lokki2012.pdf
“The results show that the main discriminative attributes between halls are loudness, envelopment, and reverberance. The second large cluster of attributes consists of bassiness and proximity attributes. The third main perceptual dimension has definition and clarity attributes. The preference judgments were divided into two groups of assessors, the first preferring concert halls with loud, enveloping and reverberant sound. The second group preferred concert halls that render intimate and close sound with high definition and clear sound…high correlation between overall preference and subjective Proximity”
Concert hall attribute clusters
- Loudness/envelopment/reverberance: spaciousness, width of sound image
- Bassiness/proximity: warmth, intimacy, naturally close
- Definition/clarity: separating sound, focus and localization
- Majority (10/17): “Close sound with a lot of bass, loudness, envelopment, and reverberance. The definition is very low, but subjective clarity is very diverse within these three halls.”
- Minority (7/17): “They render the most intimate sound that contains enough bass and loudness. They have mild reverberance with well-defined sound.”
Lokki 2013
http://dx.doi.org/10.1121/1.4800481
Concert hall perceptual factors
- Loudness (Strength, Level, Intensity): The louder the better.
- Immersion (Presence, Intimacy, Envelopment, Spaciousness): Engaging and enveloping sound is interesting and desired.
- Spatial extent (Distance, Depth, Source width, Balance): Proximate and spatially balanced sound (no image shift) is good.
- Definition (Clarity, Articulation, Blend, Discrimination, Sharpness): Different instruments should have sharp articulation with a nice blend.
- Timbre (Openness, Brilliance, Balance, Warmth, Bassiness): Balanced frequency response with small emphasis on bass and enough high frequencies give open and brilliant sound.
https://physicstoday.scitation.org/doi/10.1063/PT.3.2242
Tasting music like wine
Laukkanen 2014
https://users.aalto.fi/~ktlokki/Publs/mst_laukkanen.pdf
“The results of this work show that mixing engineers prefer rooms with T60 between 0.17 and 0.26 s, which is slightly less than presented in Chapter 3.5. For mastering engineers, preferred T60 seems to be between 0.3 and 0.4 s, which is significantly more than with mixing engineers.”
“The results of the listening tests clearly showed that mixing engineers prefer acoustically dry rooms. Accurate stereo image and the amount of room reverberation were the most important factors for them. In contrast, mastering engineers seemed to prefer more lively rooms and the frequency balance was the most important factor for them. The preference rating seemed also to vary between different music samples.”
Toole 2008
Sound Reproduction
“Why do recording and mixing engineers prefer to listen with reduced lateral reflections (higher IACC)? Perhaps they need to hear things that recreational listeners don’t. This is a popular explanation, and it sounds reasonable, but experiments reported in Section 6.2 indicate that we humans have a remarkable ability to hear what is in a recording in spite of room reflections—lots of them. But there is an alternative explanation, based on the observation that some listeners can become sensitized to these sounds and hear them in an exaggerated form. Ando et al. (2000) found that musicians judge reflections to be about seven times greater than ordinary listeners, meaning that they derive a satisfying amount of spaciousness from reflections at a much lower sound level than ordinary folk: “Musicians prefer weaker amplitudes than listeners do.” It is logical to think that this might apply to recording professionals as well, perhaps even more so, because they create artificial reflections electronically and manipulate them at will while listening to the effects. There can be no better opportunity for training and/or adaptation. In fact, it is entirely reasonable to think that acousticians who spend much of their lives moving around in rooms while listening to revealing test signals can become sensitized to aspects of sound fields that ordinary listeners blithely ignore. This is a caution to all of us who work in the field of audio and acoustics. Our preferences may reflect accumulated biases and therefore may not be the same as those of our customers.”
Toole 2015
https://www.aes.org/e-lib/browse.cfm?elib=17839
“In Fig. 14 the author has modified the original data to separately show the result of evaluations by trained and untrained listeners…More data would be enlightening, but this amount is sufficient to indicate that a single target curve is not likely to satisfy all listeners. Add to this the program variations created by the “circle of confusion” and there is a strong argument for incorporating easily accessible bass and treble tone controls in playback equipment. The first task for such controls would be to allow users to optimize the spectral balance of their loudspeakers in their rooms, and, on an ongoing basis, to compensate for spectral imbalances as they appear in movies and music.”
“The attenuated high frequencies preferred by the trained listeners stands in contrast to the preferences exhibited by those same listeners in numerous double-blind multiple-comparison loudspeaker evaluations…Is this a consequence of the different experimental methods: the different listener tasks? In one, listeners adjusted the bass and/or treble balance in a single loudspeaker model; in the other they rated spectral balances and other attributes in randomized comparisons of different products. It is a subtle but important difference awaiting an explanation.”
Toole 2008
Sound Reproduction
“The shape of the room curve is clearly signaled in the shapes of both the “early-reflections” curve and the inverted DI.”
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/JASMAN_vol_140_iss_1_551_1.pdf
“The main results show that listeners can be categorized into two different preference classes.”
Minority (43%): “Some listeners prefer clarity over reverberance”
Majority (57%): “others love strong, reverberant and wide sound”
2 broad preference classes: clarity vs reverberance/width/loudness
3 latent attribute classes: RWL (reverberance/width/loudness), timbre, clarity/definition
Clarity/definition correlates negatively with EDT (early decay time)
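One way to see why clarity tends to fall as decay time rises is to compute an early-to-late energy ratio for an idealized decay. The Python sketch below is a simplification and not the method used in the paper. It assumes a single exponential energy decay that drops 60 dB over the given decay time (so EDT and RT coincide), with no separately modelled direct sound or discrete reflections. The function name clarity_index_db and the 80 ms window are illustrative choices.

```python
import math


def clarity_index_db(decay_time_s, early_window_s=0.05):
    """Early-to-late energy ratio in dB for an ideal exponential decay.

    Assumes energy decays as exp(-k * t / T), falling 60 dB over
    `decay_time_s`, with no separately modelled direct sound.
    """
    k = math.log(1e6)  # about 13.8: natural-log equivalent of a 60 dB drop
    late = math.exp(-k * early_window_s / decay_time_s)  # energy after the window
    early = 1.0 - late  # energy inside the window (total normalized to 1)
    return 10.0 * math.log10(early / late)


if __name__ == "__main__":
    for t in (1.5, 2.0, 2.5):
        c80 = clarity_index_db(t, early_window_s=0.08)
        print(f"decay time {t:.1f} s -> C80 = {c80:+.1f} dB")
```

In this toy model the clarity index drops monotonically as the decay time grows, which matches the direction of the correlation noted above. Real halls add a direct sound and discrete early reflections, so measured values will differ.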
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/ICA2016-0465.pdf
“the form of a human head strengthens these high frequencies for the sound coming from the side, thus showing the benefit of lateral reflections instead of reflections from the ceiling.”
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/p43.pdf
“A few people, e.g., Kahle (2013) has suggested that the auditory perception of a symphony orchestra playing in a concert hall can be understood with respect to two main percepts: the source presence and the room presence. The source presence is the continuous perception of the sound sources in the hall while the room presence is the perception of the space the music is listened to. These two are separate entities in the perceptual domain. If a hall can create these two "auditory streams", i.e., they are distinct and separate, then it is proposed this may permit both good clarity and plentiful, enveloping reverberation at the same time. The formation of the auditory streams is possible through stream segregation (Griesinger, 1997) and is subject to the perceptual grouping laws therein (Moore, 2012). The early reflections are perceptually grouped with the source streams through the precedence effect (Litovsky, Colburn, Yost, & Guzman, 1999), and affect the width, loudness, and timbre of the auditory events (Blauert, 1997). In this way, the direct sound of the orchestra and the early reflections of the hall combine to make up the source presence. The late reflections, i.e. reverberation, form the context and space for the music, and lend the music support, embellishment, and a sense of depth, providing the listener with a sense of envelopment; that is, room presence. At the moment, there is no clear consensus how these two streams are formed, or do we even need them. Naturally, more research is needed, including the spatial aspects of early and late reflections (Lokki et al. 2015).”
Lokki 2017
https://users.aalto.fi/~ktlokki/Publs/kuusinen_aaua_2017.pdf
Wheel of concert hall acoustics
- Loudness, volume, level, strength, body, dynamic range
- Intimacy, proximity, source presence
- Spatial impression, width, envelopment
- Reverberance, liveness, fullness
- Clarity, definition, articulation, sharpness, localization
- Timbre, warmth, spectral balance
https://doi.org/10.3390/acoustics1020025
Concert halls and architectural features
“When median plane reflections are delayed in time due to the height of the room, the clarity of sound is improved, as our brains have more time to process the direct sound and the first lateral early reflections. In fact, sometimes we almost perceive two “sound streams”, the early sound and reverberation separately, and Kahle [35] calls these streams as the “source presence” and the “room presence”, respectively.”
Bech and Lokki 2019
https://users.aalto.fi/~ktlokki/Publs/JASMAN_vol_146_iss_5_3562_1.pdf
Sound field reproduction using a spherical loudspeaker array in an anechoic chamber; listeners were told: “Imagine that you are in a typical residential room, listening to a 2-ch stereophonic reproduction over loudspeakers.”
Four perceptual constructs comprising attribute clusters
- Reverberance: relates to the later energy [of the sound field], “excellent relation” to RT30 and early decay time
- Width and envelopment: relate to the earlier energy of the sound field
- Bass
- Proximity: correlates negatively with width and envelopment, “strong correlation” with clarity index 50 (C50) and direct-to-reverberant ratio (DRR)
“One could attempt to alter the DRR within a field by means of directivity control in the loudspeakers, aiming to evoke certain perceptual aspects that would otherwise be dominated by the room’s natural acoustical field.”
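As a rough illustration of how loudspeaker directivity could shift the DRR, the Python sketch below uses the classical statistical-acoustics estimate: direct energy proportional to Q/(4πr²) and reverberant energy proportional to 4/A, with Sabine absorption A = 0.161·V/T60. These are textbook diffuse-field assumptions and not the method of the paper. Small rooms are far from diffuse, so the absolute numbers are only indicative. The function name drr_db and the example room (60 m³, T60 of 0.4 s, 3 m listening distance) are made up for the example.

```python
import math


def drr_db(di_db, distance_m, volume_m3, t60_s):
    """Estimate the direct-to-reverberant ratio in dB.

    Assumptions: diffuse reverberant field, Sabine absorption
    A = 0.161 * V / T60, and the room constant approximated by A.
    """
    q = 10.0 ** (di_db / 10.0)  # directivity factor from directivity index
    absorption = 0.161 * volume_m3 / t60_s  # equivalent absorption area, m^2
    direct = q / (4.0 * math.pi * distance_m ** 2)
    reverberant = 4.0 / absorption
    return 10.0 * math.log10(direct / reverberant)


if __name__ == "__main__":
    for di in (6.0, 10.0):
        print(f"DI {di:.0f} dB -> DRR {drr_db(di, 3.0, 60.0, 0.4):+.1f} dB")
```

Under these assumptions every decibel of added directivity index raises the DRR by one decibel at a fixed seat, and shortening T60 (more absorption) raises it as well.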
Toole 2008
Sound Reproduction
“In-head localization seems like the logical opposite of an enveloping, external, and spacious auditory illusion. Perceptions of sounds originating inside the head, which routinely occur in headphone listening, can also occur in loudspeaker listening when the direct sound is not supported by the right amount and kind of reflected sound. The author and his colleagues have experienced the phenomenon many times when listening to stereo recordings in an anechoic chamber, usually with acoustically “dry” sounds hard panned to center or, less often, to the sides. It prompted an investigation (Toole, 1970), the conclusion of which was that there is a continuum of localization experience from external at a distance through to totally within the head. It is often noted with higher frequencies, and it can happen in a normal room with loudspeakers that have high directivity or in any situation where a strong direct sound is heard without appropriate reflections. Moulton (1995) noted that “speakers with narrow high-frequency dispersion . . . tend to project the phantom at or in front of the lateral speaker plane.” In an anechoic chamber, it can occur when listening to a single loudspeaker, especially on the frontal axis, in which case front-back reversals are also frequent occurrences. This phenomenon is so strong that it need not be a “blind” situation. Interestingly, a demonstration of four-loudspeaker Ambisonic recordings played in an anechoic chamber yielded an auditory impression that was almost totally within the head. This was a great disappointment to the gathered enthusiasts, all of whom anticipated an approximation of perfection. It suggested that, psychoacoustically, something fundamentally important was not being captured or communicated to the ears. An identical setup in a normally reflective room sounded far more realistic, even though the room reflections were a substantial corruption of the encoded sounds arriving at the ears.”
Toole 2016
https://www.audioholics.com/room-acoustics/room-reflections-human-adaptation
Toole 2020
https://gearspace.com/board/showpost.php?p=15187387&postcount=61
Here follow some interpretations, generalizations, and speculations on my part. I realize that these are subject to significant confirmation bias, but I hope that they may contribute to further discussion.
Listener preferences (general, but could possibly vary depending on the listener task at hand)
Primary distinction from Lokki 2016 (clarity vs reverberance/width/loudness)
1. Clarity in the concert hall, as the opposite of reverberance and width, may correlate with the preference for loudspeaker proximity in Bech and Lokki 2019. Given the underlying attributes, it likely corresponds to audiophile “imaging.” If so, listeners in this class may prefer loudspeakers with high directivity and/or more “dead” rooms with lower RT and higher DRR achieved through absorption, particularly at first reflection points (see the Toole quote directly above, as well as Laukannen 2014), along with speakers toed in to aim directly at the listener. This preference class may represent a relative minority of listeners; see Lokki 2012 and 2016 (first).
2. Width and envelopment (earlier energy of the sound field) likely correspond to the audiophile “wide soundstage.” They may benefit from wider-radiating loudspeakers with a more even off-axis response and from rooms that maintain or promote lateral reflections, particularly when the speakers are pointed straight ahead instead of toed in. This preference class may represent a relative majority of listeners, and possibly the target customers for much of the Harman preference testing research (even though this is not necessarily reflected in the Olive models) and for Toole’s suggestions for possible setups, depending on individual preference (https://www.audioholics.com/room-acoustics/room-reflections-human-adaptation).
Other perceptual factors from Lokki
3. Reverberance (later energy of the sound field) is likely to benefit from speakers able to provide relatively later reflections (dipoles, possibly bipoles, cross-firing narrow constant directivity, or other specialized designs) and from rooms that use more diffusion and/or angled reflection (like RFZ or CID) instead of absorption, to avoid excessively low RT30 and EDT (or “deadness”); see Laukannen 2014 and Lokki 2016 (first). This can be balanced with #1 Clarity above through the use of diffusion outside of the median plane and the lateral first reflection points.
4. Bass: accounts for 25-30.5% of the preference estimate in the Olive models, as extension in one and primarily as quality (defined by absolute average deviation below 300 Hz) in the other.
5. Timbre: Despite some ambiguity between different usages of the term, like “spatial balance” from Lokki and what Griesinger refers to as instrument timbre defined from 1-4 kHz, I suspect that at least five aspects may contribute here.
A. Floor reflections produce a comb filter, typically starting in the 200-300 Hz range, that results from the path length difference between the direct and reflected sound. This is not captured in anechoic or near-field measurements, but loudspeakers that take it into account may be preferred. It is typically avoided in studio control rooms due to the presence of consoles.
B. Wider baffle speakers transition from omnidirectional to forward-radiating at a lower frequency. Despite baffle step compensation, there is still a difference in directivity index, which may suggest relevance for the room curve, see next point C.
C. “Smoothly changing” (controlled?) directivity speakers seem to show a DI that rises as a nearly straight diagonal line, whereas “relatively constant” directivity speakers show more of a stair-step DI, with the middle stair spanning several hundred to several thousand Hz. I speculate that preference for the former may correlate with preference for “clarity” (#1 in the listener preference model above) and preference for the latter with #2. If the inverted DI curve predicts the shape of the room curve, it is worth comparing the target curves for trained versus all listeners in Figure 14 of Toole 2015 against this distinction, and against the possible correlations with relative preference for clarity versus width/envelopment and with sensitivity to lateral reflections.
D. Diffraction: the effects are controversial (see https://www.linkwitzlab.com/diffraction.htm vs https://www.avsforum.com/threads/ho...-science-shows.3038828/page-104#post-57684656), but might include local room effects in addition to the intrinsic loudspeaker ones.
E. Distortion, linear or otherwise
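The path-length arithmetic behind point A can be sketched in a few lines of Python. This is only a rough illustration: it assumes a single specular reflection from a rigid floor, a point source, and 343 m/s for the speed of sound. The function name and the example heights and distance are hypothetical values I picked, not measurements from any of the studies above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (roughly room temperature)


def floor_bounce_notches(source_height, ear_height, distance, count=3):
    """Estimate the first few comb-filter notch frequencies (Hz).

    Assumes one mirror-image reflection off a rigid floor. Heights and
    horizontal distance are in meters. Illustrative sketch only.
    """
    if source_height <= 0 or ear_height <= 0 or distance <= 0:
        raise ValueError("heights and distance must be positive")

    # Direct path from source to ear.
    direct = math.hypot(distance, source_height - ear_height)
    # Reflected path, using the image source mirrored below the floor.
    reflected = math.hypot(distance, source_height + ear_height)
    delta = reflected - direct  # extra distance traveled by the reflection

    # Cancellation occurs where the extra path is an odd number of
    # half wavelengths: f = (2k + 1) * c / (2 * delta).
    return [(2 * k + 1) * SPEED_OF_SOUND / (2 * delta) for k in range(count)]


if __name__ == "__main__":
    # Hypothetical setup: driver at 0.9 m, ears at 1.0 m, 2.5 m apart.
    for f in floor_bounce_notches(0.9, 1.0, 2.5):
        print(f"{f:.0f} Hz")
```

With these example values the first notch lands inside the 200-300 Hz range mentioned in point A, and moving the listener farther away pushes it higher. Real floors, carpets, and driver directivity will change the depth of the notch, which this sketch ignores.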
Young-Ho