
Beyond Linearity: Why Speaker Dispersion Matters Far More Than People Expect



With conventional wide-dispersion speakers, and without suitable room treatment, yes. Symmetry in the early reflections matters.

There are ways to mitigate the influence of the first reflections, both from a spatial and from a sound-quality standpoint, in situations where the room is not physically symmetrical. These include what might be called compensating room treatment, and avoiding illuminating the typical first-reflection zones in the first place (the latter being the approach I use)….

Duke "experiencing acute middle-name envy" LeJeune

My own first-reflection strategy is accidental. The room has so much stuff in it that the reflection surfaces are more dispersing than reflecting. The floor bounce is well damped by a rug over carpet, and the ceiling slopes up from the wall where the speakers are located, so the first reflection isn’t aimed at the listener. But the room is 20 feet long, with a grand piano at one end. The speakers are roughly centered on the part of the room that doesn’t include the piano, so wall reflections, if there are any, are not symmetrical.

Staging seems to me pinpoint perfect nevertheless, and also the room can be filled with, well, lots of sound, even when not at the sweet spot for staging.

[Image: listening room plan]


[Image: listening room section]


These diagrams grossly underrepresent the general clutter. The two chairs have been replaced by a small sofa, and the tubas keep multiplying for some unexplained reason. I don’t think there is a single flat surface that, if it were a mirror, would show me the speakers in the reflection. I expect most of the sound I hear comes directly from the speakers, with the remainder arriving as general (but fast) reverberation.

The windows behind the speakers and the bit of walls below and above the staircase openings are potential early reflections. Nothing I can do about those, and that much is unaffected by directionality, unless the speaker is a dipole.

Rick “internyms take practice” Denney
 
Wide dispersion inherently leads to more reflections, and in most typical rooms, that tends to blur imaging precision compared to speakers with controlled, narrow directivity.

Wide vs Narrow, same room

Unsmoothed frequency response. Both Left and Right playing.

[Image: unsmoothed frequency response, wide vs narrow]


Dips would seem to be reflective cancellations.
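If the dips are reflection cancellations, their frequencies follow directly from the reflection delay. A minimal sketch, assuming a single reflection with the same polarity and a frequency-independent level relative to the direct sound (real reflections are neither): notches fall at odd multiples of 1/(2τ), and their depth depends on the reflection's relative amplitude g. The function names and the 2.4 ms delay in the example are illustrative assumptions, not values read from the plot above.

```python
import cmath
import math


def comb_notches(delay_ms, f_max_hz=20000.0):
    """Notch frequencies (Hz) produced by one reflection delayed by delay_ms.

    Direct sound plus reflection cancels where the delay equals an odd
    number of half-periods: f_k = (2k + 1) / (2 * tau).
    """
    tau = delay_ms / 1000.0
    notches = []
    k = 0
    while (2 * k + 1) / (2 * tau) <= f_max_hz:
        notches.append((2 * k + 1) / (2 * tau))
        k += 1
    return notches


def comb_gain_db(freq_hz, delay_ms, g):
    """Level (dB) of direct sound plus one reflection of relative amplitude g.

    Transfer function H(f) = 1 + g * exp(-j * 2 * pi * f * tau).
    """
    tau = delay_ms / 1000.0
    h = 1 + g * cmath.exp(-2j * math.pi * freq_hz * tau)
    return 20 * math.log10(abs(h))


if __name__ == "__main__":
    notches = comb_notches(2.4, f_max_hz=1000.0)
    print([round(f, 1) for f in notches])  # notches below 1 kHz
    # A reflection 6 dB down (g = 0.5) gives only a partial dip, not a null.
    print(round(comb_gain_db(notches[0], 2.4, 0.5), 2))
```

For a 2.4 ms delay the first notch lands near 208 Hz and the rest repeat every 1/τ, about 417 Hz. With both speakers playing and many reflections overlapping, measured dips will be far less regular than this single-reflection model.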


Impulse

[Image: impulse response, wide vs narrow]


Peaks are reflections
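Reading reflections off an impulse plot like this can be roughly automated. A minimal sketch, assuming a clean, synthetic impulse response: it squares the samples as a crude energy-time curve, takes the largest value as the direct sound, and lists later local maxima above a threshold. A real measurement (room noise, driver ringing, band-limited oscillating pulses) needs a proper ETC built from the analytic envelope plus smoothing, or each reflection will show up as several peaks. `reflection_peaks` and the -20 dB threshold are hypothetical choices for illustration.

```python
import math


def reflection_peaks(ir, fs, threshold_db=-20.0):
    """List (delay_ms, level_db) for reflections in an impulse response.

    Simplified energy-time curve: squared samples, no Hilbert envelope and no
    smoothing. Delays and levels are relative to the direct-sound peak.
    """
    energy = [x * x for x in ir]
    direct = max(range(len(energy)), key=energy.__getitem__)
    e_direct = energy[direct]
    peaks = []
    for i in range(direct + 1, len(energy) - 1):
        e = energy[i]
        # Local maximum in the energy curve, strictly rising on its left.
        if e > 0 and e > energy[i - 1] and e >= energy[i + 1]:
            level_db = 10 * math.log10(e / e_direct)
            if level_db >= threshold_db:
                peaks.append(((i - direct) * 1000.0 / fs, level_db))
    return peaks


if __name__ == "__main__":
    fs = 48000
    ir = [0.0] * 1000
    ir[10] = 1.0    # direct sound
    ir[106] = 0.5   # reflection 2 ms later, about -6 dB
    ir[210] = 0.05  # weak reflection, about -26 dB (below threshold)
    ir[490] = -0.3  # polarity-inverted reflection 10 ms later, about -10.5 dB
    for delay, level in reflection_peaks(ir, fs):
        print(f"{delay:5.2f} ms  {level:6.1f} dB")
```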
 
I have deliberately tried to push the arrival times of strong early reflections as late as I reasonably can. My room would be about 17 by 13 feet if it were a clean rectangle, but it is not. My target is a "reflection-free interval" of about 10 milliseconds following the direct sound; 15 milliseconds would be better, and 20 milliseconds better still.

Based on physical distances, my floor bounce arrives at 2.4 milliseconds after the direct sound, and my ceiling bounce at 4.4 ms. My understanding is that the floor and ceiling bounces are perceptually rather benign, and both reflection points are outside my speaker's -6 dB coverage pattern limits (though the floor reflection is not far outside of it).
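Delays "based on physical distances" like these can be reproduced with the image-source method. A minimal sketch, assuming a flat, horizontal floor and ceiling, a point source, and 343 m/s for the speed of sound; the geometry in the example is hypothetical, not this room's, and a sloped ceiling would need a tilted mirror plane.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly air at 20 °C


def bounce_delay_ms(distance_m, source_h_m, ear_h_m, plane_h_m):
    """Delay (ms) of a floor or ceiling bounce relative to the direct sound.

    Image-source method: mirror the source across the horizontal plane at
    height plane_h_m (0 for the floor, ceiling height for the ceiling), then
    compare the image-to-ear path with the direct path.
    """
    image_h = 2 * plane_h_m - source_h_m
    direct = math.hypot(distance_m, source_h_m - ear_h_m)
    reflected = math.hypot(distance_m, image_h - ear_h_m)
    return (reflected - direct) / SPEED_OF_SOUND * 1000.0


if __name__ == "__main__":
    # Hypothetical geometry: 2.5 m listening distance, tweeter and ears at
    # 1.0 m, flat 2.5 m ceiling.
    print(round(bounce_delay_ms(2.5, 1.0, 1.0, 0.0), 2))  # floor bounce
    print(round(bounce_delay_ms(2.5, 1.0, 1.0, 2.5), 2))  # ceiling bounce
```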

My speakers are fairly directional and are toed-in aggressively such that the first ipsilateral reflections are quite weak, being well outside the speaker's coverage pattern. They arrive at about 2.8 milliseconds.
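Why aggressive toe-in weakens the ipsilateral reflection can be checked with simple plan-view geometry. A minimal sketch, assuming a single flat side wall, point sources, and 343 m/s; `side_wall_reflection` and the room dimensions are hypothetical. It finds the reflection point by the image-source method and reports how far off the speaker's axis that point lies. Whether that angle falls outside a given "-6 dB coverage pattern" depends on the particular speaker's measured horizontal directivity.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s


def side_wall_reflection(speaker, listener, aim):
    """Off-axis angle (deg) and delay (ms) of the first side-wall reflection.

    Coordinates are (x, y) in metres with the side wall along x = 0, so the
    speaker and listener must have x > 0. `aim` is the point the speaker
    axis passes through, which sets the toe-in.
    """
    sx, sy = speaker
    lx, ly = listener
    # Image source mirrored across the wall; the reflection point is where
    # the straight line from the image to the listener crosses x = 0.
    t = sx / (sx + lx)
    ry = sy + t * (ly - sy)
    to_reflection = (0.0 - sx, ry - sy)
    axis = (aim[0] - sx, aim[1] - sy)
    cos_angle = (axis[0] * to_reflection[0] + axis[1] * to_reflection[1]) / (
        math.hypot(*axis) * math.hypot(*to_reflection)
    )
    angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    direct = math.hypot(lx - sx, ly - sy)
    reflected = math.hypot(lx + sx, ly - sy)  # image source at (-sx, sy)
    delay_ms = (reflected - direct) / SPEED_OF_SOUND * 1000.0
    return angle_deg, delay_ms


if __name__ == "__main__":
    # Hypothetical 4 m wide room: left speaker 1 m from the side wall,
    # listener centred 3 m away.
    spk, seat = (1.0, 0.0), (2.0, 3.0)
    print(side_wall_reflection(spk, seat, aim=seat))        # aimed at the listener
    print(side_wall_reflection(spk, seat, aim=(3.0, 3.0)))  # axis crosses in front
```

In this example, toeing the axis in so it crosses in front of the listener moves the side-wall reflection point further off-axis (from about 63° to about 79°) while leaving its arrival time unchanged; the delay only changes if the speaker or the seat moves.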

So the first strong lateral reflections are actually the contralateral ones, off the opposite side walls. These arrive at about 9.8 milliseconds.

The reflection off the wall behind my head arrives at about 10.6 milliseconds.

Each speaker has an additional up-and-rear firing driver whose path length includes a ceiling bounce, and its output arrives at about 11.5 milliseconds.

So while there are reflections which arrive well before my 10 millisecond target, they are either benign or weak or both. The strong onset of reflections is delayed until ballpark 10 milliseconds.
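The reasoning above (reflections may arrive early, but the interval only ends at the first strong one) amounts to a thresholded search. A minimal sketch; the -10 dB threshold and the example levels are assumptions for illustration, not values from this room, and in practice "benign" also depends on direction (floor and ceiling vs. lateral), which this sketch ignores.

```python
def reflection_free_interval_ms(reflections, threshold_db=-10.0):
    """Arrival time (ms after the direct sound) of the first "strong" reflection.

    `reflections` is an iterable of (delay_ms, level_db) pairs, with level
    relative to the direct sound. Reflections below threshold_db are treated
    as weak or benign and do not end the interval. Returns None if no
    reflection reaches the threshold.
    """
    for delay_ms, level_db in sorted(reflections):
        if level_db >= threshold_db:
            return delay_ms
    return None


if __name__ == "__main__":
    # Hypothetical arrivals: early but weak reflections, then stronger later ones.
    example = [(2.0, -12.0), (3.0, -15.0), (4.5, -11.0), (10.0, -5.0), (11.0, -7.0)]
    print(reflection_free_interval_ms(example))
```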
Yes. That kind of information would be helpful to everyone.
Of course, each person uses their setup within their own limited space, but things like reflection strength and arrival time can still provide valuable clues about the listening conditions.
For example, I’ve experimented with adjusting the ITDG from 5 ms to 10 ms, 15 ms… all the way up to 30 ms (in a BRIR environment), and I’ve also manipulated the overall shape of the ETC — from the typical exponential decay pattern to forms that consider note-to-note masking as discussed in David Griesinger’s papers.
And depending on each combination, the results all sounded quite different.
I think it would be even more helpful if others could also share what they perceive from certain speaker characteristics based on that kind of information.
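For anyone wanting to try similar ITDG sweeps, the simplest starting point is an idealised energy-time curve: direct sound, a silent gap, then an exponential decay (a straight line in dB). A minimal sketch under those assumptions; `synthetic_etc_db` and its parameters are hypothetical, it describes only the energy envelope rather than an actual BRIR, and it does not model the note-to-note masking shapes discussed in Griesinger's papers.

```python
def synthetic_etc_db(itdg_ms, rt60_ms, first_reflection_db, length_ms, step_ms=1.0):
    """Idealised energy-time curve (dB re direct sound) sampled every step_ms.

    Direct sound at t = 0 (0 dB); nothing during the initial time delay gap;
    from itdg_ms on, reflected energy decays exponentially, i.e. linearly in
    dB, falling 60 dB per rt60_ms. Silent samples are returned as None.
    """
    curve = []
    n = int(length_ms / step_ms) + 1
    for i in range(n):
        t = i * step_ms
        if t == 0:
            curve.append(0.0)
        elif t < itdg_ms:
            curve.append(None)
        else:
            curve.append(first_reflection_db - 60.0 * (t - itdg_ms) / rt60_ms)
    return curve


if __name__ == "__main__":
    # Same decay, different gaps: 5, 10, 15, 20, 25, 30 ms.
    for gap in (5, 10, 15, 20, 25, 30):
        etc = synthetic_etc_db(gap, rt60_ms=400, first_reflection_db=-6, length_ms=50)
        print(gap, etc[gap], etc[-1])
```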
 
If people are interested in experimenting with what omni speakers sound like, get a pair of Sony XB100s and set them up in stereo. You can place them closer to the walls and farther from you, or vice versa, and hear what those reflections mean to your perception. Place them on candle stands, barstools, or a wall shelf. If you buy them on sale or used in mint condition, you can get them for $65 or so, and afterwards you have a killer set of handheld Bluetooth speakers. They can't play very loud and they are very limited in the bass, but within their limitations they are quite hi-fi. Oh, and try your best to refrain from falling in love with omni speakers: there are not many good ones available, and DIYing them is very challenging.
 
Agreed.

We've done in-house experiments introducing variations in timing, intensity, and spectral content to the late-onset reflections added by the rear-firing driver(s).

Perhaps the most interesting finding is that there seems to be a "sweet spot" for the relative loudness (intensity) of the additional reflection energy. The specific "sweet spot" setting varies with the particulars of the setup, but if the rear-firing driver is 1 dB too loud, clarity starts to degrade. Listeners adjusted the loudness of the rear-firing driver by hand using a remote volume control with 1 dB increments, and in blind testing they independently arrived at exactly the same setting.
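To put a 1 dB step in perspective: by the standard decibel definitions, 1 dB is roughly a 12% change in sound pressure and a 26% change in intensity. A minimal sketch of the conversion; the function names are hypothetical.

```python
def db_to_amplitude_ratio(db):
    """Sound-pressure (amplitude) ratio for a level change in dB."""
    return 10 ** (db / 20.0)


def db_to_power_ratio(db):
    """Power/intensity ratio for a level change in dB."""
    return 10 ** (db / 10.0)


if __name__ == "__main__":
    step = 1.0  # one click of a remote with 1 dB increments
    print(f"amplitude x{db_to_amplitude_ratio(step):.3f}, power x{db_to_power_ratio(step):.3f}")
```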

Would you be willing to share your perceptions from your experiments with varying the initial time delay gap (ITDG)?
 
[Image]

[Image]

[Image]


Thank you for sharing your insights.

In my case, the tests I conducted were often driven by spontaneous ideas or after reading specific papers that made me want to try something myself, so my recollection is a bit vague. I hope you’ll understand. (The image above is a capture from David's presentation.)
I found that impressions of foreground and background streams varied depending on the ITDG, even when using the same reflections. Reflections within the 5–10 ms range didn’t evoke much of a spatial impression.
Around 40–50 ms seemed to be a kind of turning point. Early reflections before 50 ms mostly contributed to the perception of the front speakers and the foreground image, whereas reflections after 50 ms began to feel more like background elements rather than part of the frontal stage.
Personally, I felt the sweet spot for ITDG was around 20–25 ms. However, the shape of the early reflections and the character of the following late reflections each contributed differently to the overall impression.
That’s why I suggested that it might be beneficial for us to share information about each other’s listening spaces. I thought it would be more meaningful if we could exchange impressions like: “In this particular room condition, with this kind of speaker dispersion pattern, and at this listening distance (considering the direct-to-reflected sound ratio, density, and spatial characteristics), this is how I perceived the sound.”
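The 50 ms turning point described here matches the boundary of a standard room-acoustics metric, C50: the ratio of energy arriving in the first 50 ms after the direct sound to the energy arriving later (ISO 3382-1 defines it alongside C80, the 80 ms variant usually quoted for music). A minimal sketch for computing it from an impulse response, assuming the largest sample marks the direct sound (a simplification of the standard's onset definition); `clarity_db` is a hypothetical name. A single broadband number like this says nothing about direction or the perceptual foreground/background split, so it is only one of the pieces of room information one might share.

```python
import math


def clarity_db(ir, fs, boundary_ms=50.0):
    """Early-to-late energy ratio (dB) of an impulse response.

    With boundary_ms=50 this is the C50 clarity index (C80 uses 80 ms).
    Time zero is taken as the largest-magnitude sample (the direct sound).
    """
    onset = max(range(len(ir)), key=lambda i: abs(ir[i]))
    split = onset + int(round(boundary_ms * fs / 1000.0))
    early = sum(x * x for x in ir[onset:split])
    late = sum(x * x for x in ir[split:])
    if late == 0:
        return math.inf
    return 10 * math.log10(early / late)


if __name__ == "__main__":
    fs = 1000  # 1 sample per ms keeps the toy example readable
    ir = [0.0] * 200
    ir[0], ir[20], ir[100] = 1.0, 0.5, 0.5  # direct, 20 ms and 100 ms reflections
    print(round(clarity_db(ir, fs), 2))
```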
 
One thing I will say about Bregman's auditory scene analysis model and stream formation: from just listening to some of the gap and pattern examples from Bregman's lab, which are supposed to create perceptions where listeners no longer discriminate separate streams with one particular stimulus, or where something that sounds like one pattern becomes two continuous streams as the gap timing changes, my personal experience doesn't always match the expectation. Think of certain kinds of stream formation, especially music, where we're intentionally attending to multiple related but different sound streams (multiple instruments) and intentionally hearing them individually, and separately from non-musical sounds in the room; and think of "soundstage", the auditory illusion we're intentionally looking to create and perceive in stereo recording and playback. That is complex, conscious and partially unconscious stream segregation and formation, different from trying to pick out one conversation in a noisy room. This seems to be another of those areas where I suspect a pretty wide degree of variation among listeners, perhaps also related to learned experience and training.
 
May I ask, which speakers you used for "wide" and "narrow"? Interesting measurement!

JBL LSR 308 (follows Harman dispersion design ideas, I suppose)
MartinLogan reQuest (dipole, electrostatic hybrid, considered "beamy", 15" x 48" panel, 12" sealed woofer, crossover at 180 Hz)

The JBLs are along the outer edge of the MLs with the woofer about 50" off the floor, matching the vertical center of the panels, so the placement is almost the same.

The JBLs are my "daily drivers" for TV, radio, or whatever, as they pull about 10 W total, vs. a minimum idle of 200 W (max claimed 2800 W) for the Krells powering the MLs, which is worth it when I feel the need. At low levels I'd rate them (with my deaf ears) indistinguishable, except in the sweet spot, where the ML/Krell image is much more precise (in this room).
 
Given that the JBL is the wide one here, the difference compared to a non-waveguide speaker would be even greater.
 
I still have a screenshot from a conversation with another user.
(Disclaimer: These are not my own measurements. The speaker placement and listening distance are quite similar, but I can't guarantee the measurements were taken from exactly the same X, Y, and Z axis position. So please just take it as a fun reference.)

[Image]


Green: Arendal 1723 Tower
Purple: Revel F328be

(For reference, this user switched to the Revel speakers and ended up more satisfied.)

At the time, I couldn't find a plot for the Arendal 1723 Tower, so I made a rough comparison with the 1961 series instead (normalized).

[Image]

[Image]
 
I’ve also tested the same reflection pattern by adjusting only the ITDG, and as you mentioned, if you personally manipulated the ITDG while listening, the perception could indeed be different. (Auditory and spatial perception cues don’t stem solely from ITDG—it’s just one of many contributing factors, and I mentioned it simply as an example.)
That said, the reason I brought this up is because I believe that if a bit more information were provided about how a speaker's dispersion characteristics interact with the space it’s being listened to in, it would allow for deeper discussions and make it easier to relate to someone’s impressions.
 
My preference is hearing into the recording. IMO strong early reflections are detrimental to image focus, perceived depth, and clarity. My speakers are set up for time-intensity trading, which is not to everyone's taste and not possible with every speaker. As a result of very careful alignment, the speaker-to-speaker interaction, as well as the speaker-to-room interaction, provides a coherent sound field where perceived tonality and clarity don't change much even outside the listening position.


[Image]


L and R (purple/pink), and both playing (cyan). When both speakers are playing, early reflections are further attenuated and provide no clue where each of the speakers is located as a sound source. This can be observed better in standard view of the impulse responses:

[Image: impulse responses, standard view]


As far as perceived width (and depth), IME all the cues are in the recording. Some are (subjectively) narrow, some are wide, some even immersive, deep and "holographic". :)
 
It gets complicated with stereo playback, because we're using stereo recording and playback to summon an illusion of a space entirely different from the listening space. We're trying to trick the ear into NOT locating the speakers as the sound source, so that the speakers seem to disappear, and to create a believable sonic illusion that, say, we're in a giant live room with massed violins to the left of the stage and a French horn 40 feet across the stage to the right, ricocheting off a big wall 20 feet behind it, even though we know we're in our 10x15 listening rooms.

And in terms of auditory stream segregation, we're not just trying to segregate the stream of the music from the sound of the HVAC or the transformer hum, but also to segregate the musical streams from one another (one instrumental part from another) while at the same time perceiving the whole. It's a special effect and an illusion, and program material differs substantially in whether it is even trying to summon that kind of illusion at all (an all-in-the-box dance-pop production like a Weeknd record vs. a three-spaced-omni Cozart and Fine orchestral recording). And, well, I suspect the local cues that break the magic of the illusion differ from one listener to another.

In my experience, the fewer local sound cues that suggest the sound is coming from sources inside the room, and certainly that sound is coming from the two particular speakers, the more "realistic" the illusion of a recorded soundstage can be, given a recording designed to capture and reproduce that sort of soundstage.

To me that involves reducing the audibility of everything from reflected sound from local boundaries and their arrival time, to local time decay overhangs whether they're bass decays or pinging flutter echo, to wall shudder and floor vibrations, etc. Regardless of the dispersion of the speakers, the more inert the room, the less the contributions from the room, the more realistic the illusion of the recorded soundstage can be.

I also think, to your point about making it easier to relate to someone's impressions, that program material makes an enormous difference too. I can take a close-miked, iso-recorded, pan-potted multi-mono "soundstage" recording, play it back in my room, and manipulate the local presentation of width all day by changing the relationship between direct and reflected sound in the room. That's a pretty different experience from listening to an orchestral recording made in a hall, whether it's a classic old Fine Mercury or the great recording of Messiaen's Turangalîla-Symphonie recorded live at Boston Symphony Hall and recently released by the BSO, engineered by Shawn Murphy.

Sometimes I think we get too reductive with this stuff, looking for "rules" that apply broadly, when in reality there are a lot more variables and varieties of experience than we sometimes feel comfortable with. It's one thing to evolve a way to locate the snap of a twig under a predator stalking you in the woods over tens of thousands of years of human development; it's another to figure out "the" way to manipulate that over just 70 years of stereo recording and playback, and only 100 years of recorded music altogether.
 
Elements such as localization, proximity, envelopment, clarity, and even coherence are all perceived and recognized through the complex interplay of various auditory functions across different frequency ranges and time domains. In fact, the ITDG I mentioned earlier is just a small part of these elements, and even that can still provide meaningful information when someone shares details about their playback environment.
While this thread—including the OP and other users—is focused on topics like radiation patterns, reflections, and preferences or perceptions related to those, it seems that you’re trying to delve into something more fundamental.
I understand that you’re sharing your thoughts on stereo playback, but I also feel that this might not be the most appropriate thread for that discussion.


[Image]

[Image]

[Image]


For example, in my case, I start with the response of my anechoic chamber—free from reflections—and normalize my personalized DF accordingly, correcting for pinna coloration that arises from various speaker angles. (Sometimes I also incorporate crosstalk cancellation or even multi-array setups.)
This approach takes into account the characteristics of the speakers, the playback environment, and even the coloration caused by the pinna response from specific azimuth angles. The more distinct features each part has, the more likely they are to act as perceptual cues, and personally, I consider all of those factors.
But how is that relevant in the context of this thread?
I'm not saying I disagree with your points, but in this particular discussion, I'm more interested in hearing how others perceive the radiation characteristics of their own speakers and what kind of impressions they get from them.
 
Really interesting thread.

Even though a little bit out of the scope of this thread, you may be interested in my rather subjective experiments on wide-3D reflective dispersion of super-tweeter sound using hard-heavy random surface material.

- A new series of audio experiments on reflective wide-3D dispersion of super-tweeter sound using random-surface hard-heavy material:
Part-1_ Background, experimental settings, initial preliminary listening tests: #912
Part-2_ Comparison of catalogue specifications of metal horn super-tweeter (ST) FOSTEX T925A and YAMAHA Beryllium dome tweeter (TW) JA-0513; start of intensive listening sessions with wide-3D reflective dispersion of ST sound: #921
Part-3_ Listening evaluation of sound stage (sound image) using excellent-recording-quality lute duet tracks: #926
Part-3.1_ Listening evaluation of sound stage (sound image) using excellent-recording-quality jazz trio album: #927
Part-4_ Provisional conclusion to use Case-2 reverse reflective dispersion setting in default daily music listening: #929


I then implemented this wide-3D dispersion of super-tweeter sound in my latest audio setup:
- The latest system setup of my DSP-based multichannel multi-SP-driver multi-amplifier fully active audio rig, including updated startup/ignition sequences and shutdown sequences: as of June 26, 2024: #931
 
I find wide directivity to be like salt: great in moderation, but it can overwhelm more subtle flavors. I've been comparing Philharmonic Audio HTs (AMT ribbon) and Genelec 8361As, which have medium directivity and a lot more clarity. In an instant A/B test I almost always prefer the wide-directivity speaker. For longer listening, I prefer the Genelecs with their smaller soundstage, because there's more detail to experience.
 