
Refining a listener and loudspeaker model based on readings of Lokki, Bech, Toole, et al

youngho

Model

I have been trying to educate myself a little more about concert hall acoustics recently, so I’ve been skimming papers and presentations from David Griesinger (http://www.davidgriesinger.com/) and Tapio Lokki (https://users.aalto.fi/~ktlokki/). I’ve also had a long-standing interest in home audio and have read Floyd Toole’s Sound Reproduction (primarily the first/second edition) many times. I noticed a number of interesting overlaps between preferences in concert hall acoustics and loudspeakers in rooms. I had also been curious about some of the apparent limitations in findings from Harman studies. Obvious examples include individual preference regarding lateral reflections and differences between trained and untrained listeners with respect to target room curves; less obvious ones include distortion like IMD or HOMs, effects of diffraction, floor bounce, and other relevant effects that may not be apparent in CEA2034 or similar sets of measurements. I’d like to summarize some of the non-Toole/Harman research below in mostly chronological order, along with some additional information from Toole that either appears in or supplements Sound Reproduction. I apologize for inaccuracies in my summaries, simplified attribution of multiple authors mostly to the ones above, and unformatted links.

Bech 1995
https://asa.scitation.org/doi/10.1121/1.413047
“electroacoustic simulation of the right-hand loudspeaker of a stereophonic setup, positioned in a small room.”
“The results show that only the first-order ceiling and floor reflections are likely to contribute individually to the timbre of a speech signal. For a noise signal additional reflections, from the wall to the left of the listener, will individually contribute to the timbre. The threshold of detection for all reflections depends on the level of the reverberant field. If the reverberant field is removed, thresholds will decrease by 2-5 dB.”
“care should be taken when generalizating from the results presented in this paper.”

Bech 1996
https://asa.scitation.org/doi/10.1121/1.414952
“The results have confirmed the findings of the first report that the floor reflection will contribute on an individual basis to the timbre of a noise signal.”

Private communication from Lokki in response to my question about seat dip effect versus floor bounce: “Floor bounce is a single reflection that creates quite narrow (high Q) comb filter to the frequency response. The seat-dip effect is a combination of several phenomena, thus the dip has usually lower Q and it is only one dip, not a comb filter at higher frequencies.”
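
To make the "single reflection = comb filter" point concrete, here is a minimal sketch of my own (not from Lokki or any of the papers). It assumes the reflection is just a delayed, attenuated copy of the direct sound, so the response is 1 + g·e^(-j2πfτ). The function names, the 2 ms delay, and the 0.7 gain are all hypothetical illustration values.

```python
import cmath
import math


def comb_magnitude_db(freq_hz, delay_s, gain):
    """Magnitude (dB) of direct sound plus one delayed copy scaled by `gain`."""
    h = 1.0 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(h))


def notch_frequencies(delay_s, f_max_hz):
    """Notches fall at odd multiples of 1 / (2 * delay), up to f_max_hz."""
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > f_max_hz:
            break
        notches.append(f)
        k += 1
    return notches


if __name__ == "__main__":
    delay = 0.002  # hypothetical 2 ms extra travel time for the reflection
    gain = 0.7     # hypothetical relative level of the reflection
    for f in notch_frequencies(delay, 2000.0):
        print(f"notch near {f:.0f} Hz: {comb_magnitude_db(f, delay, gain):.1f} dB")
```

The notches repeat at regular intervals all the way up in frequency, which is the "comb"; a seat dip, as described in the quote, would instead be one broader dip.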

Lokki 2012
https://users.aalto.fi/~ktlokki/Publs/JASA_lokki2012.pdf
“The results show that the main discriminative attributes between halls are loudness, envelopment, and reverberance. The second large cluster of attributes consists of bassiness and proximity attributes. The third main perceptual dimension has definition and clarity attributes. The preference judgments were divided into two groups of assessors, the first preferring concert halls with loud, enveloping and reverberant sound. The second group preferred concert halls that render intimate and close sound with high definition and clear sound…high correlation between overall preference and subjective Proximity”
Concert hall attribute clusters
  1. Loudness/envelopment/reverberance: spaciousness, width of sound image
  2. Bassiness/proximity: warmth, intimacy, naturally close
  3. Definition/clarity: separating sound, focus and localization
Listener concert hall preferences
  1. Majority (10/17): “Close sound with a lot of bass, loudness, envelopment, and reverberance. The definition is very low, but subjective clarity is very diverse within these three halls.”
  2. Minority (7/17): “They render the most intimate sound that contains enough bass and loudness. They have mild reverberance with well-defined sound.”
High correlation between overall preference and subjective Proximity

Lokki 2013
http://dx.doi.org/10.1121/1.4800481
Concert hall perceptual factors
  1. Loudness (Strength, Level, Intensity): The louder the better.
  2. Immersion (Presence, Intimacy, Envelopment, Spaciousness): Engaging and enveloping sound is interesting and desired.
  3. Spatial extent (Distance, Depth, Source width, Balance): Proximate and spatially balanced sound (no image shift) is good.
  4. Definition (Clarity, Articulation, Blend, Discrimination, Sharpness): Different instruments should have sharp articulation with a nice blend.
  5. Timbre (Openness, Brilliance, Balance, Warmth, Bassiness): Balanced frequency response with small emphasis on bass and enough high frequencies give open and brilliant sound.

Lokki 2014
https://physicstoday.scitation.org/doi/10.1063/PT.3.2242
Tasting music like wine

Laukkanen 2014
https://users.aalto.fi/~ktlokki/Publs/mst_laukkanen.pdf
“The results of this work show that mixing engineers prefer rooms with T60 between 0.17 and 0.26 s, which is slightly less than presented in Chapter 3.5. For mastering engineers, preferred T60 seems to be between 0.3 and 0.4 s, which is significantly more than with mixing engineers.”
“The results of the listening tests clearly showed that mixing engineers prefer acoustically dry rooms. Accurate stereo image and the amount of room reverberation were the most important factors for them. In contrast, mastering engineers seemed to prefer more lively rooms and the frequency balance was the most important factor for them. The preference rating seemed also to vary between different music samples.”

Toole 2008
Sound Reproduction
“Why do recording and mixing engineers prefer to listen with reduced lateral reflections (higher IACC)? Perhaps they need to hear things that recreational listeners don’t. This is a popular explanation, and it sounds reasonable, but experiments reported in Section 6.2 indicate that we humans have a remarkable ability to hear what is in a recording in spite of room reflections—lots of them. But there is an alternative explanation, based on the observation that some listeners can become sensitized to these sounds and hear them in an exaggerated form. Ando et al. (2000) found that musicians judge reflections to be about seven times greater than ordinary listeners, meaning that they derive a satisfying amount of spaciousness from reflections at a much lower sound level than ordinary folk: “Musicians prefer weaker amplitudes than listeners do.” It is logical to think that this might apply to recording professionals as well, perhaps even more so, because they create artificial reflections electronically and manipulate them at will while listening to the effects. There can be no better opportunity for training and/or adaptation. In fact, it is entirely reasonable to think that acousticians who spend much of their lives moving around in rooms while listening to revealing test signals can become sensitized to aspects of sound fields that ordinary listeners blithely ignore. This is a caution to all of us who work in the field of audio and acoustics. Our preferences may reflect accumulated biases and therefore may not be the same as those of our customers.”

Toole 2015
https://www.aes.org/e-lib/browse.cfm?elib=17839
“In Fig. 14 the author has modified the original data to separately show the result of evaluations by trained and untrained listeners…More data would be enlightening, but this amount is sufficient to indicate that a single target curve is not likely to satisfy all listeners. Add to this the program variations created by the “circle of confusion” and there is a strong argument for incorporating easily accessible bass and treble tone controls in playback equipment. The first task for such controls would be to allow users to optimize the spectral balance of their loudspeakers in their rooms, and, on an ongoing basis, to compensate for spectral imbalances as they appear in movies and music.”
“The attenuated high frequencies preferred by the trained listeners stands in contrast to the preferences exhibited by those same listeners in numerous double-blind multiple-comparison loudspeaker evaluations…Is this a consequence of the different experimental methods: the different listener tasks? In one, listeners adjusted the bass and/or treble balance in a single loudspeaker model; in the other they rated spectral balances and other attributes in randomized comparisons of different products. It is a subtle but important difference awaiting an explanation.”

Toole 2008
Sound Reproduction
“The shape of the room curve is clearly signaled in the shapes of both the “early-reflections” curve and the inverted DI.”

Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/JASMAN_vol_140_iss_1_551_1.pdf
“The main results show that listeners can be categorized into two different preference classes.”
Minority (43%): “Some listeners prefer clarity over reverberance”
Majority (57%): “others love strong, reverberant and wide sound”
2 broad preference classes: clarity vs reverberance/width/loudness
3 latent attribute classes: RWL, timbre, clarity/definition
Clarity/definition correlates negatively with EDT (early decay time)

Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/ICA2016-0465.pdf
“the form of a human head strengthens these high frequencies for the sound coming from the side, thus showing the benefit of lateral reflections instead of reflections from the ceiling.”

Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/p43.pdf
“A few people, e.g., Kahle (2013) has suggested that the auditory perception of a symphony orchestra playing in a concert hall can be understood with respect to two main percepts: the source presence and the room presence. The source presence is the continuous perception of the sound sources in the hall while the room presence is the perception of the space the music is listened to. These two are separate entities in the perceptual domain. If a hall can create these two "auditory streams", i.e., they are distinct and separate, then it is proposed this may permit both good clarity and plentiful, enveloping reverberation at the same time. The formation of the auditory streams is possible through stream segregation (Griesinger, 1997) and is subject to the perceptual grouping laws therein (Moore, 2012). The early reflections are perceptually grouped with the source streams through the precedence effect (Litovsky, Colburn, Yost, & Guzman, 1999), and affect the width, loudness, and timbre of the auditory events (Blauert, 1997). In this way, the direct sound of the orchestra and the early reflections of the hall combine to make up the source presence. The late reflections, i.e. reverberation, form the context and space for the music, and lend the music support, embellishment, and a sense of depth, providing the listener with a sense of envelopment; that is, room presence. At the moment, there is no clear consensus how these two streams are formed, or do we even need them. Naturally, more research is needed, including the spatial aspects of early and late reflections (Lokki et al. 2015).”

Lokki 2017
https://users.aalto.fi/~ktlokki/Publs/kuusinen_aaua_2017.pdf
Wheel of concert hall acoustics
  1. Loudness, volume, level, strength, body, dynamic range
  2. Intimacy, proximity, source presence
  3. Spatial impression, width, envelopment
  4. Reverberance, liveness, fullness
  5. Clarity, definition, articulation, sharpness, localization
  6. Timbre, warmth, spectral balance

Lokki 2019
https://doi.org/10.3390/acoustics1020025
Concert halls and architectural features
“When median plane reflections are delayed in time due to the height of the room, the clarity of sound is improved, as our brains have more time to process the direct sound and the first lateral early reflections. In fact, sometimes we almost perceive two “sound streams”, the early sound and reverberation separately, and Kahle [35] calls these streams as the “source presence” and the “room presence”, respectively.”

Bech and Lokki 2019
https://users.aalto.fi/~ktlokki/Publs/JASMAN_vol_146_iss_5_3562_1.pdf
Sound field reproduction using a spherical loudspeaker array in an anechoic chamber; listeners were told: “Imagine that you are in a typical residential room, listening to a 2-ch stereophonic reproduction over loudspeakers.”
Four perceptual constructs comprising attribute clusters
  1. Reverberance: relates to the later energy [of the sound field], “excellent relation” to RT30 and early decay time
  2. Width and envelopment: relate to the earlier energy of the sound field
  3. Bass
  4. Proximity, negatively correlates to width and envelopment, “strong correlation” with clarity index 50 (C50) and direct to reverb ratio (DRR)
“Assessors systematically preferred the sound fields with lower RT. In our study, the most preferred acoustical conditions presented fields that evoked the sense of being less reverberant and less wide and enveloping. The sources were perceived as closer to the listener, exhibiting high levels of proximity. It is also important to note that the current results suggested that a negative preference is apparent for acoustical conditions with RT higher than 0.4 s”
“One could attempt to alter the DRR within a field by means of directivity control in the loudspeakers, aiming to evoke certain perceptual aspects that would otherwise be dominated by the room’s natural acoustical field.”
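
Since C50 comes up here as the metric that tracks proximity, a minimal sketch of how it is computed may help: it is the ratio, in dB, of the energy arriving in the first 50 ms of the impulse response to the energy arriving afterwards. The impulse response below is an idealized, purely exponential decay that I made up for illustration (time zero is assumed to be the direct sound arrival); it is not data from the paper, and `synthetic_ir` is a hypothetical helper.

```python
import math


def c50_db(ir, fs):
    """Clarity index C50: early (0-50 ms) vs late (>50 ms) energy, in dB.

    Assumes ir[0] is the arrival of the direct sound.
    """
    split = int(round(0.050 * fs))
    early = sum(x * x for x in ir[:split])
    late = sum(x * x for x in ir[split:])
    return 10.0 * math.log10(early / late)


def synthetic_ir(rt60_s, fs, length_s=1.5):
    """Hypothetical smooth decay whose level drops 60 dB over rt60_s."""
    n = int(length_s * fs)
    return [10.0 ** (-3.0 * (i / fs) / rt60_s) for i in range(n)]


if __name__ == "__main__":
    fs = 8000
    for rt in (0.2, 0.4, 0.8):
        print(f"RT60 = {rt:.1f} s -> C50 = {c50_db(synthetic_ir(rt, fs), fs):.1f} dB")
```

In this toy model a shorter decay time gives a higher C50, which at least points in the same direction as the reported preference for lower RT and higher proximity. Real rooms have discrete early reflections, so measured values will not follow the idealized curve exactly.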

Toole 2008
Sound Reproduction
“In-head localization seems like the logical opposite of an enveloping, external, and spacious auditory illusion. Perceptions of sounds originating inside the head, which routinely occur in headphone listening, can also occur in loudspeaker listening when the direct sound is not supported by the right amount and kind of reflected sound. The author and his colleagues have experienced the phenomenon many times when listening to stereo recordings in an anechoic chamber, usually with acoustically “dry” sounds hard panned to center or, less often, to the sides. It prompted an investigation (Toole, 1970), the conclusion of which was that there is a continuum of localization experience from external at a distance through to totally within the head. It is often noted with higher frequencies, and it can happen in a normal room with loudspeakers that have high directivity or in any situation where a strong direct sound is heard without appropriate reflections. Moulton (1995) noted that “speakers with narrow high-frequency dispersion . . . tend to project the phantom at or in front of the lateral speaker plane.” In an anechoic chamber, it can occur when listening to a single loudspeaker, especially on the frontal axis, in which case front-back reversals are also frequent occurrences. This phenomenon is so strong that it need not be a “blind” situation. Interestingly, a demonstration of four-loudspeaker Ambisonic recordings played in an anechoic chamber yielded an auditory impression that was almost totally within the head. This was a great disappointment to the gathered enthusiasts, all of whom anticipated an approximation of perfection. It suggested that, psychoacoustically, something fundamentally important was not being captured or communicated to the ears. An identical setup in a normally reflective room sounded far more realistic, even though the room reflections were a substantial corruption of the encoded sounds arriving at the ears.”

Toole 2016
https://www.audioholics.com/room-acoustics/room-reflections-human-adaptation

Toole 2020
https://gearspace.com/board/showpost.php?p=15187387&postcount=61

Here follows some interpretations, generalizations, and speculations on my part. I realize that this is subject to significant confirmation bias on my part, but I hope that it may contribute to further discussion.

Listener preferences (general, but could possibly vary depending on the listener task at hand)
Primary distinction from Lokki 2016 (clarity vs reverberance/width/loudness)
1. Clarity in the concert hall, since it is opposed to reverberance and width, may correlate with preference for loudspeaker proximity in Bech and Lokki 2019. Given the underlying attributes, it likely correlates with audiophile “imaging.” If so, loudspeakers with high directivity and/or more “dead” rooms with lower RT and higher DRR through the use of absorption, particularly at first reflection points, may be preferred (see Toole quote directly above, as well as Laukkanen 2014), as may toe-in with speakers aimed directly at the listener. This preference class may represent a relative minority of listeners, see Lokki 2012 and 2016 (first).
2. Width and envelopment (earlier energy of sound field), likely correlates with audiophile “wide soundstage,” may benefit from wider-radiating loudspeakers with more even off-axis response and rooms that maintain or promote lateral reflections, particularly when speakers are pointed straight ahead, instead of toed-in. This preference class may represent a relative majority of listeners and possibly the target customers for much of Harman preference testing research (even though this is not necessarily reflected in the Olive models) and Toole’s suggestions for possible setups, depending on individual preference (https://www.audioholics.com/room-acoustics/room-reflections-human-adaptation)
Other perceptual factors from Lokki
3. Reverberance (later energy of sound field) is likely to benefit from speakers able to provide relatively later reflections (dipoles, ?bipoles, cross-firing narrow constant directivity or other specialized designs) and from rooms with more diffusion and/or use of angled reflection instead of absorption (like RFZ or CID for the latter) to avoid excessively low RT30 and EDT (or “deadness”), see Laukkanen 2014 and Lokki 2016 (first). This can be balanced with #1 Clarity above through use of diffusion outside of the median plane and lateral first reflection points.
4. Bass: 25-30.5% of preference estimate in Olive models, extension in one and primarily quality defined by absolute average deviation below 300 Hz in the other
5. Timbre: Despite some ambiguity between different usages of the term, like “spatial balance” from Lokki and what Griesinger refers to as instrument timbre defined from 1-4 kHz, I suspect that there may be at least five aspects here that may contribute.
A. Floor reflections result in a comb filter typically starting in the 200-300 Hz range because of path length differences. This is not well reflected in anechoic or near-field measurements, but loudspeakers that take it into account may be preferred. The floor reflection is typically avoided in studio control rooms due to the presence of consoles.
B. Wider baffle speakers transition from omnidirectional to forward-radiating at a lower frequency. Despite baffle step compensation, there is still a difference in directivity index, which may suggest relevance for the room curve, see next point C.
C. “Smoothly changing” (controlled?) versus “relatively constant” directivity speakers seem to demonstrate a more nearly diagonal straight line versus a relatively stair step DI with the middle stair ranging from several hundred to several thousand Hz. I speculate that preference for the former may correlate with preference for “clarity” (in #1 above in the listener preference model), the latter with #2. If the inverted DI curve predicts the shape of the room curve, compare the target curve for trained vs all listeners in Figure 14 of Toole 2015 with what I wrote about “smoothly changing” versus “relatively constant” directivity and possible correlations for relative preferences for clarity vs width/envelopment, as well as sensitivity to lateral reflections.
D. Diffraction, effects are controversial (see https://www.linkwitzlab.com/diffraction.htm vs https://www.avsforum.com/threads/ho...-science-shows.3038828/page-104#post-57684656), but might include local room effects in addition to the intrinsic loudspeaker ones.
E. Distortion, linear or otherwise
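
As a rough check on the 200-300 Hz figure in point A, here is a small sketch of the floor-bounce geometry. It treats the floor as a perfect mirror (image-source assumption) and takes the speed of sound as 343 m/s. The driver height, ear height, and listening distance are hypothetical numbers I picked, not values from any of the cited papers.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def floor_bounce_first_notch(source_height_m, ear_height_m, distance_m):
    """First cancellation frequency caused by the floor reflection.

    The reflected path is modeled with a mirror-image source below the floor.
    The first notch occurs where the extra path equals half a wavelength.
    """
    direct = math.hypot(distance_m, source_height_m - ear_height_m)
    reflected = math.hypot(distance_m, source_height_m + ear_height_m)
    extra_path = reflected - direct
    return SPEED_OF_SOUND / (2.0 * extra_path)


if __name__ == "__main__":
    # Hypothetical setup: driver at 0.9 m, ears at 1.0 m, 2.5 m apart
    f = floor_bounce_first_notch(0.9, 1.0, 2.5)
    print(f"first floor-bounce notch near {f:.0f} Hz")
```

With these made-up dimensions the first notch lands in the upper 200s of Hz. Sitting closer or raising the driver lowers it, and moving further away raises it, so the exact frequency depends heavily on the setup.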

Young-Ho
 
@Duke Your designs fall under what I refer to as "specialized designs" in the Reverberance section towards the end. @Floyd Toole I invite any comments you might have.
 
Lokki 2016
https://users.aalto.fi/~ktlokki/Publs/JASMAN_vol_140_iss_1_551_1.pdf
“The main results show that listeners can be categorized into two different preference classes.”
Minority (43%): “Some listeners prefer clarity over reverberance”
Majority (57%): “others love strong, reverberant and wide sound”
2 broad preference classes: clarity vs reverberance/width/loudness
3 latent attribute classes: RWL, timbre, clarity/definition
Clarity/definition correlates negatively with EDT (early decay time)

This is interesting; I had expected a smaller minority of listeners preferring clarity over reverberance.
 
Actually, I got the groups backwards! The majority (16/28) in the quoted 2016 Lokki paper preferred clarity, the minority reverberant sound, though the preferences did vary with musical choice (Bruckner vs Beethoven) and position within the hall. These results are almost exactly the inverse of the preferences expressed in the 2012 Lokki paper. I wonder if these may reflect relative differences in the assessors used for each study, but for the 2016 one, "The selected assessors were 10 professional musicians (no specific genre), 10 active amateur musicians, and eight active concert goers with varying musical background." Note Toole's comments on musicians' 7x sensitivity to reflections.

I don't seem to be able to edit my original post. Apologies for the error.
 
@youngho Michelle Vigeant has published good research on envelopment and reverb in concert halls. Not open access as far as I know, though.
Thanks! I did find an open access one: https://sites.psu.edu/spral/files/2...Hall-Preference-ISRA-Amsterdam-2019-FINAL.pdf, which also references the 2012 and 2016 Lokki papers above.

"Subjects were required to have at least five years of formal musical training and were required to be actively studying their instrument or involved in a musical ensemble."

"Much of the clarity-related impressions, including temporal clarity, spatial clarity, intimacy, and proximity were found to show high correlation. Strength showed strong correlations with the spatial perceptions of envelopment (=0.77) and source width (=0.87)." This supports my supposition #1 above about clarity correlating with proximity, also supports what others have written about the envelopment/width cluster.
 
Subsidy--here in Europe so-called "classical music" is carried by government grants. The program is always the same: perpetual repetition of pieces by composers of an art that thrived together with bourgeois culture. Today we know how badly founded the eponymous musical theory was. Contemporary composers wrangle with keeping the plain lies as a heritage, but habitually fall into despair in front of affirmative phrase-mongers.

I don't think stereo should copy the eternal quest for a celestial ideal. Not least, modern concert hall architecture proverbially fails, despite the efforts documented here. The strong intimate proximity of envelopment is clearly achieved by just knowing musicians and participating in their arts--as to paraphrase the quoted author, ambitiously widening the source of supposed enjoyment.

When it comes to the special skills of such personnel, true. Eventually I started learning an instrument. It's not 5 magic years of training yet, but I can already tell that perspectives change dramatically. I won't explain that further, except that I save a lot on stereo expenses, which is a nice payback at least.
 
On the subject of clarity it would be interesting to poll "classical" music audiophile listeners about their preference regarding simple/distant- vs multi/close-mic'ed recordings.
 
I found this paper that supports the idea that controlled directivity may enhance the perception of proximity (or distance) with loudspeaker reproduction, but I wonder if the implications of the D/R ratio also apply to room treatment: https://asa.scitation.org/doi/10.1121/1.4921678
 
Finally got the time to read through this thread, @youngho. Really excellent stuff. Much appreciated that you took the time to present it all! These various data points can probably be interpreted in different ways. It's very valuable to get it collected in one thread for future reference.

I came across this PhD thesis recently, which seems to me to be one of the most thorough investigations of loudspeaker directivity and acoustics in small rooms from recent years: https://www.proquest.com/docview/2283450919?pq-origsite=gscholar&fromopenview=true

Haven't had time to really read it yet, though, only skimmed through a couple of pages. But seems relevant for the topic(s) of the thread.
 
Happy to share! Putting it together also helped me understand it a little better. I agree about alternate interpretations, which is why I put in disclaimers, indications of ambiguity, and personal pronouns for the last section. Unfortunately, I can't edit the original post so may need to re-post an updated version later.

Thanks! However, I couldn't find a complete copy of the thesis, only the abstract, table of contents, and list of figures. Please let me know if you have one to share.
 
Hm, that's strange. Does this link work for you?
If not let me know and I can send you a copy on pm.
 
A link, if you wish to download the full paper:

 
@oivavoi thanks for sharing this fantastic paper. I read through its summary of previous research, and it's the gold standard on the topic of the audibility of loudspeaker directivity.

I've read a large number of the papers cited, and the author brilliantly stitches them together to explain what we hear in such a fluid and clear way that I picked up new and unexpected insights from papers that I thought I already knew well. I mean it as a compliment when I say it's the next step beyond Dr. Toole's book.

It's a lot to absorb and I'm looking forward to reading the next chapters summarizing his test outcomes.

What a treat! Thanks again.
 
Here follows some interpretations, generalizations, and speculations on my part. I realize that this is subject to significant confirmation bias on my part, but I hope that it may contribute to further discussion.

Listener preferences (general, but could possibly vary depending on the listener task at hand)
Primary distinction from Lokki 2016 (clarity vs reverberance/width/loudness)
1. Clarity in the concert hall, since opposed to reverberance and width, may correlate with preference for loudspeaker proximity in Bech and Lokki 2019. Given underlying attributes, likely correlates with audiophile “imaging.” If so, loudspeakers with high directivity and/or more “dead” rooms with lower RT and higher DRR through the use of absorption may be preferred (see Toole quote directly above, as well as Laukannen 2014), particularly at first reflection points, also toe-in for speakers directly aimed at listener. This preference class may represent a relative minority of listeners, see Lokki 2012 and 2016 (first).
2. Width and envelopment (earlier energy of sound field), likely correlates with audiophile “wide soundstage,” may benefit from wider-radiating loudspeakers with more even off-axis response and rooms that maintain or promote lateral reflections, particularly when speakers are pointed straight ahead, instead of toed-in. This preference class may represent a relative majority of listeners and possibly the target customers for much of Harman preference testing research (even though this is not necessarily reflected in the Olive models) and Toole’s suggestions for possible setups, depending on individual preference (https://www.audioholics.com/room-acoustics/room-reflections-human-adaptation)
Other perceptual factors from Lokki
3. Reverberance (later energy of sound field) likely to benefit from speakers able to provide relatively later reflections (dipoles, ?bipoles, cross-firing narrow constant directivity or other specialized designs), rooms with more diffusion and/or use of angled reflection instead of absorption (like RFZ or CID for latter) to avoid excessively low RT30 and EDT (or “deadness”), see Laukannen 2014 and Lokki 2016 (first). This can be balanced with #1 Clarity above through use of diffusion outside of the median plane and lateral first reflection points.
4. Bass: 25-30.5% of preference estimate in Olive models, extension in one and primarily quality defined by absolute average deviation below 300 Hz in the other
5. Timbre: Despite some ambiguity between different usages of the term, like “spatial balance” from Lokki and what Griesinger refers to as instrument timbre defined from 1-4 kHz, I suspect that there may be at least five aspects here that may contribute.
A. Floor reflections result in comb filter typically starting in the 200-300 Hz range resulting from path length differences, so not well-reflected in anechoic or near field measurements, but loudspeakers that take this into account may be preferred. This is typically avoided in studio control rooms due to presence of consoles.
B. Wider baffle speakers transition from omnidirectional to forward-radiating at a lower frequency. Despite baffle step compensation, there is still a difference in directivity index, which may suggest relevance for the room curve, see next point C.
C. “Smoothly changing” (controlled?) versus “relatively constant” directivity speakers seem to demonstrate a more nearly diagonal straight line versus a relatively stair step DI with the middle stair ranging from several hundred to several thousand Hz. I speculate that preference for the former may correlate with preference for “clarity” (in #1 above in the listener preference model), the latter with #2. If the inverted DI curve predicts the shape of the room curve, compare the target curve for trained vs all listeners in Figure 14 of Toole 2015 with what I wrote about “smoothly changing” versus “relatively constant” directivity and possible correlations for relative preferences for clarity vs width/envelopment, as well as sensitivity to lateral reflections.
D. Diffraction, effects are controversial (see https://www.linkwitzlab.com/diffraction.htm vs https://www.avsforum.com/threads/ho...-science-shows.3038828/page-104#post-57684656), but might include local room effects in addition to the intrinsic loudspeaker ones.
E. Distortion, linear or otherwise
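
As a rough illustration of point 5A, the floor-bounce comb filter can be estimated from geometry alone. This is only a sketch under simplifying assumptions that are mine, not from any of the cited papers: a single, perfectly reflecting floor, a point-source driver, no phase shift at the reflection, and a speed of sound of 343 m/s. The function name and the example heights and distance are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def floor_bounce_nulls(driver_height, ear_height, distance, count=3):
    """Return the first `count` comb-filter null frequencies (Hz)
    for a single floor reflection (all lengths in meters)."""
    # Direct path from driver to ear.
    direct = math.hypot(distance, driver_height - ear_height)
    # Image-source method: mirror the driver below the floor.
    reflected = math.hypot(distance, driver_height + ear_height)
    delta = reflected - direct  # extra path length of the reflection
    # Nulls occur where the extra path is an odd number of half wavelengths.
    return [(2 * k + 1) * SPEED_OF_SOUND / (2 * delta) for k in range(count)]


# Hypothetical setup: driver 0.9 m high, ears 1.0 m high, 2.5 m apart.
nulls = floor_bounce_nulls(driver_height=0.9, ear_height=1.0, distance=2.5)
print([round(f) for f in nulls])
```

With these example numbers the first null falls in the 200-300 Hz range mentioned above. Moving the listener farther away shrinks the path length difference and pushes the first null higher.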

Young-Ho
Here are my reflections on the excellent collection of important studies on how we hear in relation to stimuli - physics.

Good concert hall sound is often perceived to have better spatial sound than in ordinary listening rooms.
In a well-built concert hall, like the Wiener Musikverein, there are no early reflections before 20 ms and few after 40 ms in the best seat. Lateral reflections are dominant, with attenuation around 8 dB. Lokki was surprised that the lateral walls reflected even relatively high frequencies.

The vertical perception of ceiling and floor reflections mainly affects the location of the main sound stream depending on distance. Ceiling and floor reflections provide a sound fusion similar to when listening to two stereo speakers. If the ceiling and floor reflections travel the same distance to the listening position, the sound center ends up in the middle of the speaker. The vertical precedence effect occurs significantly later than the horizontal one due to the position of the ears and the configuration of the outer ears.

Recording technicians choose to listen at a relatively low listening level, < 85 dB, without increasing the bass and treble. Consensus research shows unequivocally worse perceived sound above 85 dB.
a - Masking of nearby frequencies in the inner ear.
b - The muscles of the middle ear try to protect the inner ear from excessive sound amplitudes by reducing the mechanical movements of the body's smallest bones, the auditory ossicles. The influence of the muscles on the auditory ossicles is not linear, so significant perceived distortion is created.
At really high sound levels, masking can remove some measured sound distortion, which explains why some musicians don't want to play at a low volume.

Sound stream segregation is an important neurophysiological phenomenon that we all perceive but may not think about. The resolution of sounds is a maximum of 2-3 ms for the most transient sounds.
We thus have a segregated hearing where prominent sound reflections behind a loudspeaker can be experienced as a separate new sound with significant masking without the precedence effect entering.
The hearing's resolution/segregation of the sound stream makes it very unlikely that comb filter effects are distorting the sound. Small dips in the sound's frequency curve, as with measurable comb filter effect, are not experienced according to consensus research. This has been a fact for more than 60 years in neuropsychology research.

Sound stream segregation enables hearing in difficult sound environments - Cocktail party effect.
With impaired hearing due to age or other causes, segregation deteriorates. This is related not only to a non-linear frequency curve but also to the distortion of the sound itself and the difficulty of identifying the sound before the next sound comes in the sound stream. Fast sound flow is difficult to perceive. Raising the voice can often make it harder to perceive the sounds, due to masking, when the main problem is that the sound stream is too fast.

JM
 
Thank you, @Neuro, for your post, also @Keith_W for your kind comment. Pardon my contrariness:
Good concert hall sound is often perceived to have better spatial sound than in ordinary listening rooms.
The phrase "spatial sound" could be interpreted in multiple ways. There can be, for example, immediate impressions of being in much larger volume spaces, owing at least in part to truly diffuse ambient sound. There can be spatial impressions deriving from bass modes, a la https://www.audiosciencereview.com/forum/index.php?threads/bass-and-subwoofers.51589/#post-1857133. There's also localization and the perception of proximity, https://www.audiosciencereview.com/...ustics-links-and-excerpts.51487/#post-1853376
In a well-built concert hall, like the Wiener Musikverein, there are no early reflections before 20 ms and few after 40 ms in the best seat. Lateral reflections are dominant, with attenuation around 8 dB. Lokki was surprised that the lateral walls reflected even relatively high frequencies.
Two of the most highly regarded concert halls do have the first early reflection before 20 ms: https://www.audiosciencereview.com/...irectional-speakers.1283/page-18#post-1883529
The vertical perception of ceiling and floor reflections mainly affects the location of the main sound stream depending on distance. Ceiling and floor reflections provide a sound fusion similar to when listening to two stereo speakers. If the ceiling and floor reflections travel the same distance to the listening position, the sound center ends up in the middle of the speaker. The vertical precedence effect occurs significantly later than the horizontal one due to the position of the ears and the configuration of the outer ears.
I asked Floyd Toole about whether the ceiling (and, I'm assuming, floor) reflection can cause image shift in the way that you seem to be describing: https://www.audiosciencereview.com/...irectional-speakers.1283/page-18#post-1883529. Related might be: https://urn.fi/URN:NBN:fi:aalto-201712187901?

I'm not sure what you mean about the effects and timing, but the floor is likely to be the first reflection in non-lifestyle listening setups (as opposed to lifestyle-oriented room pictures where the speakers are tucked into corners or in bookshelves).

Sound stream segregation is an important neurophysiological phenomenon that we all perceive but may not think about. The resolution of sounds is a maximum of 2-3 ms for the most transient sounds.
We thus have a segregated hearing where prominent sound reflections behind a loudspeaker can be experienced as a separate new sound with significant masking without the precedence effect entering.
The hearing's resolution/segregation of the sound stream makes it very unlikely that comb filter effects are distorting the sound. Small dips in the sound's frequency curve, as with measurable comb filter effect, are not experienced according to consensus research. This has been a fact for more than 60 years in neuropsychology research.

Sound stream segregation enables hearing in difficult sound environments - Cocktail party effect.
With impaired hearing due to age or other causes, segregation deteriorates. This is related not only to a non-linear frequency curve but also to the distortion of the sound itself and the difficulty of identifying the sound before the next sound comes in the sound stream. Fast sound flow is difficult to perceive. Raising the voice can often make it harder to perceive the sounds, due to masking, when the main problem is that the sound stream is too fast.
Re: raising voice, there are other changes (aka the Lombard effect)

Re: sound stream segregation, I am only very occasionally working on an update to https://www.audiosciencereview.com/...acoustics-self-education-links-sharing.45583/, here is one part:

Shamma 2013
Temporal coherence and the streaming of complex sounds
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310575/
“Streaming is an active listening process that engages attention and induces adaptive neural mechanisms that reshape the perceptual scene, presumably by enhancing responses to the target while suppressing responses to the background.”
“To segregate a sequence of tokens (be they phonemes or tones), it is necessary to satisfy a key condition – that the tokens be perceptually distinct from those associated with competing sequences, e.g., the pitches of two talkers or of two alternating tone sequences must be sufficiently different. This well-known principle of streaming has often been referred to as the ‘channeling hypothesis’”
“Forming a stream also requires binding of the parallel perceptual attributes of its tokens, to the exclusion of those belonging to competing streams.”
“The proposed computational scheme emphasizes two distinct stages in stream formation (Fig. 59.1): (1) extracting auditory features and representing them in a multidimensional space mimicking early cortical processing and (2) organizing the features into streams according to their temporal coherence. Many feature axes are potentially relevant including the tonotopic frequency axis, pitch, spectral scales (or bandwidths), location, and loudness. All these features are usually computed very rapidly (<50 ms). Tokens that evoke sufficiently distinct (nonoverlapping) features in a model of cortical responses are deemed perceptually distinguishable and hence potentially form distinct streams if they are temporally anti-correlated or uncorrelated over relatively long time periods (>100 ms), consistent with known dynamics of the cortex and stream buildup.”

Krishnan 2014
Segregating complex sound sources through temporal coherence
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4270434/
“Our approach…is based on the notion that perceived sources (sound streams or objects) emit features, that are modulated in strength in a largely temporally coherent manner and that they evoke highly correlated response patterns in the brain. By clustering (or grouping) these responses one can reconstruct their underlying source, and also segregate it from other simultaneously interfering signals that are uncorrelated with it.”
“(1) coincidence here refers to that among modulated feature channels due to slow stimulus power (envelope) fluctuations, and not to any intrinsic brain oscillations; (2) coincidences are strictly done at cortical time-scales of a few hertz, and not at the fast pitch or acoustic frequency rates often considered; (3) coincidences are measured among modulated cortical features and perceptual attributes that usually occupy well-separated channels, unlike the crowded frequency channels of the auditory spectrogram; (4) coincidence must be measured over multiple time-scales and not just over a single time-window that is bound to be too long or too short for a subset of modulations”
Includes a number of references related to auditory stream segregation
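
To make the temporal coherence idea in the Shamma and Krishnan excerpts a little more concrete, here is a toy sketch of my own. It is not the authors' model: it skips the cortical feature extraction and the multiple time scales entirely, and simply groups channels whose slow envelopes are strongly correlated. The envelopes, modulation rates, threshold, and function names are all made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def group_by_coherence(envelopes, threshold=0.8):
    """Greedily assign each channel to the first group whose first
    member's envelope correlates with it at or above `threshold`."""
    groups = []
    for idx, env in enumerate(envelopes):
        for group in groups:
            if pearson(envelopes[group[0]], env) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Toy envelopes sampled at 100 Hz for 2 s: channels 0 and 2 share a 4 Hz
# modulation, channels 1 and 3 share a 7 Hz modulation (a few hertz, in
# the spirit of the "cortical time-scales" in the Krishnan quote).
rate, duration = 100, 2.0
t = [i / rate for i in range(int(rate * duration))]
slow = [1 + math.sin(2 * math.pi * 4 * x) for x in t]
fast = [1 + math.sin(2 * math.pi * 7 * x) for x in t]
envelopes = [slow, fast, [0.5 * v for v in slow], [2.0 * v for v in fast]]

streams = group_by_coherence(envelopes)
print(streams)
```

Channels with the same modulation end up in the same group regardless of their level, while the two differently modulated pairs stay separate because their envelopes are uncorrelated over the window.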

Brown 2014
The precedence effect in sound localization
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310855/
 