The Physics of 3-D Soundstage

Ricardus · Nov 1, 2022

tallbeardedone said:
So what term (if not soundstage) do you use for where you place instruments horizontally in a mix?

It's called PANNING. Most stereo people know the word, so why do they not use it, and instead opt for the ridiculous SOUNDSTAGE nonsense?

There is only L-R panning, and whatever depth you can fool people into believing exists through reverb.

But reverb is (or was more in the past) used so liberally that all of the content sounded buried in reverb.

The rest are lies told to you by the home stereo community.

tallbeardedone · Nov 1, 2022

Ricardus said:
It's called PANNING. Most stereo people know the word, so why do they not use it, and instead opt for the ridiculous SOUNDSTAGE nonsense?

There is only L-R panning, and whatever depth you can fool people into believing exists through reverb.

But reverb is (or was more in the past) used so liberally that all of the content sounded buried in reverb.

The rest are lies told to you by the home stereo community.

Fair enough, but you have to admit it doesn’t have the same ring as soundstage, or the same visual image. “You’ll hear the violin on the far right of the PANNING.” It just doesn’t sound right to me.

tallbeardedone · Nov 1, 2022

Cars-N-Cans said:
In terms of psychoacoustics, to me it makes more sense to think in terms of delay rather than phase, albeit the two can be used interchangeably. In a nutshell, the auditory center uses the relative level differences, timing differences, and spectral coloration of the sound that result from interaction with the pinnae, head, and torso to localize things. An interesting experiment would be to hang a blanket or some other suitably absorptive medium behind the listening position and see if the apparent acoustical image of the dog barking changes. I suspect the early reflections arriving from behind are what give that effect, but it could be perceptual as well, since that is often how we judge distance.

I found this interesting, from the QSound wiki entry:
“QSound is essentially a filtering algorithm. It manipulates timing, amplitude, and frequency response to produce a binaural image. Systems like QSound rely on the fact that a sound arriving from one side of the listener will reach one ear before the other and that when it reaches the furthest ear, it is lower in amplitude and spectrally altered due to obstruction by the head. However, the ideal algorithm was arrived at empirically, with parameters adjusted according to the outcomes of many listening tests.”
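The timing cue in that description can be put into numbers with the classic spherical-head (Woodworth) approximation, ITD ≈ (a/c)(θ + sin θ). This is a minimal sketch that assumes a head radius a of 8.75 cm and c = 343 m/s; neither value comes from the thread, and real heads (plus the pinna, torso, and level cues QSound also manipulates) deviate from this idealized model.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degC)
HEAD_RADIUS = 0.0875    # m, assumed "average" head radius


def woodworth_itd(azimuth_deg: float,
                  radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference (seconds) for a distant source.

    Spherical-head approximation ITD = (a / c) * (theta + sin(theta)),
    intended for azimuths from -90 to +90 degrees (0 = straight ahead,
    positive = toward the right ear).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 10, 20, 30, 45, 60, 90):
        print(f"{az:3d} deg -> ITD = {woodworth_itd(az) * 1000:.3f} ms")
```

Under these assumptions the ITD tops out at roughly 0.66 ms for a source directly to one side, which gives a sense of scale for the sub-millisecond delays discussed in this thread.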

levimax · Nov 1, 2022

Ricardus said:
It's called PANNING. Most stereo people know the word, so why do they not use it, and instead opt for the ridiculous SOUNDSTAGE nonsense?

There is only L-R panning, and whatever depth you can fool people into believing exists through reverb.

But reverb is (or was more in the past) used so liberally that all of the content sounded buried in reverb.

The rest are lies told to you by the home stereo community.

Maybe depth is hard to fake on an artificially mixed recording, but plenty of ambient recordings have depth. One example is the Stereophile test CD track of someone walking around in a room.

hex168 · Nov 1, 2022

tallbeardedone said:
Here's a physics fact: the intended 3-dimensional stereo soundstage is ONLY possible along the center line between precisely placed stereo speakers.

Here’s why: stereo reproduction uses time-of-arrival delay, intensity difference (volume difference), or both between channels to place a sound at a specific angle within the soundstage (see photo 1).

For example, to place the image in the center there must be zero time delay between the sound waves from the left and right channels at your ears, or the sound from each speaker must be exactly the same volume, or a combination of the two. To make the sound source appear 30 degrees off-center, a time delay of approximately 1.1 ms, a level difference of approximately 15 dB, or a combination of the two (e.g., 0.5 ms with a 6 dB level difference) must be used (see photo 2).

Long story short, the ONLY way to get the correct time delay or volume difference at your ears is to be EXACTLY mid-line between well-placed speakers. If you're off to the right or left, by the laws of physics, you are changing the time-delay and sound pressure difference at your ears, and the image/soundstage will shift accordingly.

source: https://www.dpamicrophones.com/mic-...v2UiPaHNb0uc7mi30HmoW2nimVctCU-c8wZBC2E5p_5JA

Can't you expand the width of that centerline through time-intensity trading, i.e., toe in the speakers (with appropriately controlled directivity) so that as one moves off-center to the right, the sound from the right speaker arrives earlier but with less intensity, because you are further off its axis, so you still hear the "object" appropriately centered? (I hope I got that right, Duke, help?)
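To put numbers on the off-center argument, here is a minimal sketch that computes the arrival-time and level differences between two speakers at a listener who slides sideways from the apex of an equilateral triangle. It assumes ideal omnidirectional point sources 2 m apart, free-field propagation (no room, no speaker directivity, no head), and c = 343 m/s; all of these are simplifications chosen for illustration.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)


def arrival_differences(listener_x: float, listener_y: float,
                        half_spacing: float = 1.0, c: float = C):
    """Inter-speaker differences seen at the listener position.

    Speakers are idealised as omnidirectional point sources in free field at
    (-half_spacing, 0) and (+half_spacing, 0). Returns a tuple
    (t_left - t_right in ms, level_right - level_left in dB), so positive
    values mean the right speaker arrives first and is louder.
    """
    r_left = math.hypot(listener_x + half_spacing, listener_y)
    r_right = math.hypot(listener_x - half_spacing, listener_y)
    delta_t_ms = (r_left - r_right) / c * 1000.0
    # Inverse-square law: level falls by 20*log10 of the distance ratio.
    delta_level_db = 20.0 * math.log10(r_left / r_right)
    return delta_t_ms, delta_level_db


if __name__ == "__main__":
    apex_y = math.sqrt(3.0)  # apex of an equilateral triangle with a 2 m base
    for shift in (0.0, 0.05, 0.10, 0.20, 0.30):
        dt, dl = arrival_differences(shift, apex_y)
        print(f"{shift * 100:4.0f} cm right of centre: "
              f"right speaker leads by {dt:.3f} ms and is {dl:.2f} dB louder")
```

Under these assumptions a 10 cm shift already produces roughly 0.29 ms of inter-speaker delay (about a quarter of the ~1.1 ms quoted for a full 30-degree shift) but well under 1 dB of level difference, and both cues pull toward the nearer speaker. Without directivity or toe-in, nothing counteracts that shift.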

Cars-N-Cans · Nov 1, 2022

Ricardus said:
It's called PANNING. Most stereo people know the word, so why do they not use it, and instead opt for the ridiculous SOUNDSTAGE nonsense?

There is only L-R panning, and whatever depth you can fool people into believing exists through reverb.

But reverb is (or was more in the past) used so liberally that all of the content sounded buried in reverb.

The rest are lies told to you by the home stereo community.

Recordings made using various forms of stereo microphone arrays, and even YT videos made with good-quality stereo microphones, can have good imaging with a good sense of position and depth on stereo speakers. While I agree that "soundstage" can be unnecessarily broad and vague, panning conversely is quite restrictive and covers only one aspect of stereo imaging, namely that imposed by the mixing engineer.

Duke · Nov 1, 2022

hex168 said:
Can't you expand the width of that centerline through time-intensity trading, i.e., toe in the speakers (with appropriately controlled directivity) so that as one moves off-center to the right, the sound from the right speaker arrives earlier but with less intensity, because you are further off its axis, so you still hear the "object" appropriately centered? (I hope I got that right, Duke, help?)

@tallbeardedone is correct in that the soundstage is BEST along the centerline. Time/intensity trading results in a BETTER (arguably "enjoyable" or "acceptable") soundstage for off-centerline listeners than is normally the case, but it's not as good as along the centerline.

tmuikku · Nov 1, 2022

Edit: I didn't read all the updates before posting this, so hex168 and Duke had already touched on the subject of time-intensity trading. I'll leave the post here, though, as there are some more points.

Cars-N-Cans said:
That would imply that the soundstage is not very stable for one reason or another. My experience is that if there isn't any interference, there is quite a bit of freedom to move about the desk (or console in this instance) without having the imaging change appreciably. Obviously if one moves far off to the side the sound collapses into each respective speaker, but ideally the sweet spot should be large enough to be useful. I don’t know all the factors involved, but I would surmise the listening window of the speaker plays an important role. A wider radiation pattern means that the SPL heard by each ear does not change much with movement within the listening window, since the pattern is flat and wide (edit: but worth noting there will be more side reflections). The ITD will still be altered, of course, but this removes ILDs associated with small head movements. But with a narrow radiation pattern and the need to directly face the speakers, there are potentially large SPL gradients once one moves away from the mid-position, and this induces ILDs as well as ITDs, which is likely to cause additional image shift. Just some food for thought, perhaps.

With constant-directivity, narrower-coverage speakers, one can use toe-in to beat this, somewhat. I think it's called "time-intensity trading". When moving sideways from the center line between the speakers, we get closer in time to the speaker on that side, but also further off its axis, so we get less intensity: the sound level decreases. The opposite happens for the far-side speaker: we are now more on-axis, so there is more level to compensate for the fact that its sound arrives later. This should keep the image stable. Less toe-in is the complete opposite: moving to the side, one gets closer in time to the near speaker and also more on-axis, so its level increases, while the far speaker gets later in time and lower in level. This practically guarantees that the phantom center quickly collapses to the closest speaker, an unstable image.

I have such speakers, but I reckon the pattern is too wide, as it doesn't work as well as I hoped: although the image is quite fine over a big area, the sweet spot where everything stays in place is still small. Here is what I mean by that. I can move quite a bit before the phantom center collapses to the closest speaker, which is good. But, to my disappointment, the phantom center moves in front of me, with me, which is interesting, but of course the far sides of the image are still bound to the speakers, which kind of skews the whole image. I think the trade between time and intensity is not enough to keep the center image in place; intensity should change more than it does now, which means narrower directivity.

There are also benefits regarding 3D imaging, but they require a more symmetric setup than I have. Speakers toed in ~45 deg radiate very little toward the closest boundaries and quite strongly toward the opposite side of the room. The trick is to leave those surfaces untreated, so that these late reflections provide the envelopment/spaciousness that otherwise kind of suffers with narrow coverage and toe-in. For people chasing 3D imaging this shouldn't be a problem, though, I think. In my situation, a big room with an asymmetric setup, these don't help so much, so envelopment isn't too good, though the image is. I could toe in less, but then the image gets big and blurry.

As a disclaimer, I haven't listened to wide-coverage speakers for a long time, and never in this room.

P.S. Just add a third speaker as a real center and the problems should go away. Too little envelopment? Just add rear speakers to create some.
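The toe-in argument can be checked with a toy model. This is a minimal sketch under loudly stated assumptions: speakers 2 m apart, a listener near the apex of an equilateral triangle, c = 343 m/s, and a hypothetical smooth beam that is 0 dB on-axis and falls quadratically to -6 dB at `minus6_angle` (30 degrees by default). It is not a model of any real speaker; it only shows how the time and level cues change sign with toe-in.

```python
import math

C = 343.0                  # speed of sound, m/s (assumed)
HALF_SPACING = 1.0         # speakers at x = -1 m and x = +1 m (assumed)
LISTEN_Y = math.sqrt(3.0)  # equilateral-triangle apex for a 2 m base


def off_axis_deg(spk_x: float, facing_deg: float, lx: float, ly: float) -> float:
    """Angle between a speaker's axis and the line from it to the listener.

    Angles are measured from straight ahead (+y); positive values point
    toward +x. The speaker is at (spk_x, 0), the listener at (lx, ly).
    """
    to_listener = math.degrees(math.atan2(lx - spk_x, ly))
    return abs(to_listener - facing_deg)


def directivity_db(angle_deg: float, minus6_angle: float = 30.0) -> float:
    """Hypothetical smooth beam: 0 dB on axis, -6 dB at minus6_angle.

    A simple quadratic roll-off, only meant for angles up to roughly 45 deg.
    """
    return -6.0 * (angle_deg / minus6_angle) ** 2


def cues(lx: float, toe_in_deg: float, ly: float = LISTEN_Y,
         minus6_angle: float = 30.0):
    """Return (t_left - t_right in ms, level_left - level_right in dB)."""
    arrivals = {}
    # Toe-in rotates each speaker toward the centre line.
    for name, spk_x, facing in (("L", -HALF_SPACING, +toe_in_deg),
                                ("R", +HALF_SPACING, -toe_in_deg)):
        r = math.hypot(lx - spk_x, ly)
        level = -20.0 * math.log10(r) + directivity_db(
            off_axis_deg(spk_x, facing, lx, ly), minus6_angle)
        arrivals[name] = (r / C * 1000.0, level)
    dt = arrivals["L"][0] - arrivals["R"][0]
    dl = arrivals["L"][1] - arrivals["R"][1]
    return dt, dl


if __name__ == "__main__":
    for toe in (0.0, 45.0):
        for shift in (0.0, 0.1, 0.3):
            dt, dl = cues(shift, toe)
            print(f"toe-in {toe:2.0f} deg, {shift:.1f} m right: "
                  f"right speaker leads by {dt:.2f} ms, "
                  f"left level relative to right {dl:+.2f} dB")
```

With these made-up numbers, at 0.3 m off-center the straight-ahead pair gives the near (right) speaker both the earlier arrival and a level advantage of about 7 dB, whereas a 45-degree toe-in keeps the near speaker earlier but makes the far speaker just under 2 dB louder, so the two cues now oppose each other. Lowering `minus6_angle` (narrower directivity) enlarges that level correction, in line with the conclusion that a narrower pattern is needed.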

Cars-N-Cans · Nov 1, 2022

Speaking of time intensity trading, here is an interesting one that has intensity and time delay as a matrix with a periodic buzzing tone: Time intensity trading.

The central 0 dB/0 ms cell has a stable image. But moving away along the diagonal, where successively more trading takes place to keep the sound centered, results in a progressively less stable image. An example would be moving to -2 dB/0.22 ms or 2 dB/-0.22 ms, reaching an extreme with the values -6 dB/0.66 ms or 6 dB/-0.66 ms. Even on headphones it's not quite perfect due to variations. On speakers it's ridiculously hard, as any small deviation in position, reflections, or comb filtering will throw it off. Of course, in this situation that's to be expected, as the values are deliberately chosen to maximize the instability of the sound image, not reinforce it. For me, anyway, it's an interesting little demo. Tying it into our original point, I would say this also touches on how sensitive a setup can be to small deviations from the ideal listening position. Ideally one would want it to be as insensitive as possible, which is the opposite extreme from the proposition that we have to be exactly at the right position and distance from the loudspeakers to get correct imaging. Like so many things in life, in practice things usually fall somewhere in between, with lots of different compromises that can be made.
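The diagonal cells quoted above follow a constant ratio: 2 dB against 0.22 ms and 6 dB against 0.66 ms both work out to about 9.1 dB per millisecond. For anyone who wants to probe this at home, here is a minimal sketch that lists those diagonal conditions and can write a stereo test tone for any of them. It is a stand-in, not the demo's actual signal: the 100 Hz band-limited buzz, the 48 kHz sample rate, and the even split of the level difference between channels are all assumptions.

```python
import math
import struct
import wave

RATE = 48000                     # samples per second (assumed)
TRADING_DB_PER_MS = 2.0 / 0.22   # ratio implied by the demo's diagonal (~9.1 dB/ms)


def diagonal_conditions(level_steps_db=(0, 2, 4, 6)):
    """(level_db, delay_ms) pairs along the trading diagonal.

    The channel made louder by level_db is delayed by the matching amount,
    so the level cue and the time cue oppose each other.
    """
    return [(db, db / TRADING_DB_PER_MS) for db in level_steps_db]


def buzz(t: float, f0: float = 100.0, harmonics: int = 20) -> float:
    """Band-limited buzz: equal-amplitude cosine harmonics, peak value 1 at t = 0."""
    return sum(math.cos(2 * math.pi * k * f0 * t)
               for k in range(1, harmonics + 1)) / harmonics


def render(path: str, level_db: float, delay_ms: float, seconds: float = 1.0):
    """Write a 16-bit stereo WAV whose left channel is louder by level_db and
    delayed by delay_ms (evaluated analytically, so sub-sample delays work)."""
    gain_l = 10 ** (level_db / 40.0)    # split the level difference evenly
    gain_r = 10 ** (-level_db / 40.0)
    scale = 0.5 / max(gain_l, gain_r)   # leave headroom, avoid clipping
    delay_s = delay_ms / 1000.0
    frames = bytearray()
    for n in range(int(seconds * RATE)):
        t = n / RATE
        left = scale * gain_l * buzz(t - delay_s)
        right = scale * gain_r * buzz(t)
        frames += struct.pack("<hh", int(left * 32767), int(right * 32767))
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(RATE)
        w.writeframes(bytes(frames))


if __name__ == "__main__":
    for db, ms in diagonal_conditions():
        print(f"left +{db} dB, left delayed {ms:.2f} ms")
```

For example, `render("trade_6dB.wav", 6, 0.66)` writes the extreme diagonal cell; swapping the channels gives the mirror-image cell.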

Cars-N-Cans · Nov 1, 2022

tmuikku said:
With constant-directivity, narrower-coverage speakers, one can use toe-in to beat this, somewhat. I think it's called "time-intensity trading". When moving sideways from the center line between the speakers, we get closer in time to the speaker on that side, but also further off its axis, and get less sound. The opposite happens for the far side: we are now more on-axis and further in time, which should keep the image stable. Less toe-in is the complete opposite: moving to the side, one gets closer to the near speaker in time and its level also increases, which quickly collapses the image to the closest speaker, an unstable image.

Definitely an interesting point. I wonder how much of that factors into the design of professional monitors? Obviously there is a need to minimize reflections and keep the imaging focused, and they seem to fall between some cheapie bookshelf speakers that have very narrow listening windows of maybe 15-20 degrees, and some larger home theater and hi-fi speakers that play just about out to the edges of the front baffle. I'm sure there are all sorts of considerations and design trade-offs to be made, but that is one area that seems to be the most variable amongst different loudspeaker designs.

Cars-N-Cans · Nov 1, 2022

tmuikku said:
Here is what I mean by that. I can move quite a bit before the phantom center collapses to the closest speaker, which is good. But, to my disappointment, the phantom center moves in front of me, which is interesting, but of course the far sides stay still, which kind of skews the whole image.

I think with conventional stereo imaging with a phantom channel that is pretty much unavoidable, as the speakers themselves are what anchor the overall soundstage (a somewhat controversial term in this thread thus far), with the L and R extremes being each speaker and the phantom image falling between them and being variable, unless it's an LCR setup like you say.

But, I think I have the perfect solution! (Proceeds to poke Harman with a stick) "C'mon guys, where is our imaging target? You have one for speaker and headphone tonality!" Ah, if only things were as simple as getting shiny things in a box that do all the magic for us. Well, for some people it may be, if they have enough money to wave around...

kongwee · Nov 1, 2022

tmuikku said:
There are also benefits regarding 3D imaging, but they require a more symmetric setup than I have. Speakers toed in ~45 deg radiate very little toward the closest boundaries and quite strongly toward the opposite side of the room. The trick is to leave those surfaces untreated, so that these late reflections provide the envelopment/spaciousness that otherwise kind of suffers with narrow coverage and toe-in. For people chasing 3D imaging this shouldn't be a problem, though, I think. In my situation, a big room with an asymmetric setup, these don't help so much, so envelopment isn't too good, though the image is. I could toe in less, but then the image gets big and blurry.

I don't even need to toe in with my desktop monitor setup; in fact, toe-in causes a tonality issue that I don't like. Of course, this will differ from one monitor or speaker to another. Keep a good triangle relationship and keep the speakers as far off the wall as possible. You won't lose the tight center, or the width and depth beyond the speakers.

onion · Nov 1, 2022

Cars-N-Cans said:
I would think it's less sensitive while you are within the region where the XTC is effective. In principle, there the sources in the recording are what you perceive as the origin of the sounds you hear, which is more robust than when the speakers themselves are perceived as the source, as they normally would be.

Out of curiosity, have you had a chance to try that system? Curious how it works.

Yes - I have that system. It works really well in my well-treated room with XTC cancellation magnitude approaching 10dB. I much prefer listening via Bacch, probably because discrete sound sources in the music sound like they are located in 3d space rather than a 2d plane bound by the speakers. It sounds better and is less fatiguing.

For phantom image, there is a slider in Bacch that enables this to be shifted to the left or right. This may be useful if the speaker-room setup causes the phantom image to not be anchored to the centre for the listener.

tmuikku · Nov 1, 2022

^^ Yes, of course the direct sound should be on target, so if speakers are to be toed in, their design axis should be the intended listening axis. Or conversely, use toe-in to adjust the frequency balance to your preference. In general, high frequencies drop in level the further off-axis one listens. There could be anomalies in the response both on-axis and off-axis, which might make it hard to find a good balance. Hopefully the speakers were selected so that their intended listening axis, and thus toe-in, works in your listening environment. Or, even better, the directivity is smooth or constant so that the angle doesn't matter too much, ideally. Whatever the directivity, adjusting the direct sound with toe-in also changes other things, like the sound radiated toward nearby boundaries, which also affects the perceived sound at least somewhat. An ideal omni speaker would be immune to all this, as it would sound the same in all directions.

The old way of designing loudspeakers was to make the on-axis sound "flat", so they ought to be listened to on-axis; this was the design axis. Nowadays we have rather easy-to-use methods and free software to actually see the directivity and the response in pretty much any direction while designing speakers, and thus the ability to optimize for any or all directions, if we want, by shaping the structure to have suitable acoustic properties. If a speaker has ~constant directivity, then toe-in doesn't matter much tonality-wise, as every axis sounds roughly right (hopefully), since the sound is ideally problem-free in any (forward) direction. Toe-in would of course still have some effect, as there is still a relationship between the amount of high frequencies and low frequencies, some kind of tilt.

If it's a DIY speaker with DSP and a physical structure that makes it ~constant directivity, any toe-in and listening axis can quite easily be tuned to any target response. The EQ also affects the power response, so with varying toe-in there are basically fewer or more high frequencies in the room, while the on-axis (listening-axis) response is held at whatever the target is.

If the speakers and listening position are far from boundaries, so that early reflections arrive late enough, it probably doesn't matter much what the directivity is. At least the direct sound is then more important relative to the off-axis sound than it would be with boundaries nearby and reflections more significant. Perhaps any directivity would work fine as long as the direct sound is fine. The image should not suffer much.
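The DSP point is easy to see numerically: for a smooth-directivity speaker, the EQ needed to hit a target on a chosen listening axis is just the target minus the response on that axis, and the same EQ then shifts everything the speaker radiates. This is a minimal sketch using made-up directivity data (a linear high-frequency roll-off with angle) and a crude energy average over a few horizontal angles as a stand-in for the power response; neither is a real speaker nor a standard metric.

```python
import math

# Hypothetical constant-directivity speaker: response (dB) per band and angle.
BANDS_HZ = (100, 1000, 4000, 10000)
HF_DROP_AT_60_DEG = (0.0, 1.0, 4.0, 8.0)   # assumed roll-off at 60 deg off axis
ANGLES_DEG = (0, 15, 30, 45, 60)


def response_db(angle_deg):
    """Assumed smooth directivity: each band falls linearly with angle."""
    return [-(angle_deg / 60.0) * drop for drop in HF_DROP_AT_60_DEG]


def eq_for_axis(listen_angle_deg, target_db=(0.0, 0.0, 0.0, 0.0)):
    """DSP correction that puts the chosen listening axis exactly on target."""
    return [t - r for t, r in zip(target_db, response_db(listen_angle_deg))]


def power_response_db(eq_db):
    """Crude in-room proxy: energy average of the EQ'd response over ANGLES_DEG.

    A real power response weights the whole sphere; this flat average over a
    few horizontal angles only shows the trend.
    """
    out = []
    for band, gain in enumerate(eq_db):
        energies = [10 ** ((response_db(a)[band] + gain) / 10.0) for a in ANGLES_DEG]
        out.append(10.0 * math.log10(sum(energies) / len(energies)))
    return out


if __name__ == "__main__":
    for axis in (0, 15, 30):
        eq = eq_for_axis(axis)
        power = power_response_db(eq)
        print(f"listening {axis:2d} deg off axis: "
              f"EQ {['%+.1f' % g for g in eq]}, power {['%+.1f' % p for p in power]}")
```

Choosing a listening axis further from the speaker's axis needs more treble boost to keep the direct sound on target, and that boost raises the high-frequency content of the averaged response: the "less or more high frequencies in room" effect described above.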

tmuikku · Nov 1, 2022

Cars-N-Cans said:
I think with conventional stereo imaging with a phantom channel that is pretty much unavoidable, as the speakers themselves are what anchor the overall soundstage (a somewhat controversial term in this thread thus far), with the L and R extremes being each speaker and the phantom image falling between them and being variable, unless it's an LCR setup like you say.

But, I think I have the perfect solution! (Proceeds to poke Harman with a stick) "C'mon guys, where is our imaging target? You have one for speaker and headphone tonality!" Ah, if only things were as simple as getting shiny things in a box that do all the magic for us. Well, for some people it may be, if they have enough money to wave around...

Yeah, I just had this expectation of a rock-solid phantom image that stays put even if I move a bit, and it was a bit of a bummer when it wasn't. But yeah, add a real source in the middle and there it is, so it's not that big of a deal, just reality.

As for an imaging target, I think it should be that the brain is fooled enough to transfer the sensation/awareness into the recording, away from the local surroundings and reality. Basically, the speakers should disappear, the room should not identify itself, the local noise floor should be low, eyes closed, open a beer to get the scent of the pub where the performance took place, and so on: zone in.

tmuikku · Nov 1, 2022

Cars-N-Cans said:
Definitely an interesting point. I wonder how much of that factors into the design of professional monitors? Obviously there is a need to minimize reflections and keep the imaging focused, and they seem to fall between some cheapie bookshelf speakers that have very narrow listening windows of maybe 15-20 degrees, and some larger home theater and hi-fi speakers that play just about out to the edges of the front baffle. I'm sure there are all sorts of considerations and design trade-offs to be made, but that is one area that seems to be the most variable amongst different loudspeaker designs.

There are probably good ones, and then there are cheap ones that just look like they would be good but are made so cheaply that they can't really fulfil such requirements, even if the designer wanted them to. By the way, there are some standards for sound work that address reflections and whatnot, which is probably how most mix rooms are built and what engineers were possibly listening to when making the records: https://tech.ebu.ch/docs/tech/tech3276.pdf. I assume there must be monitor speakers made for that market segment and environment, just as there are cheap monitors targeted at the other market segment, politely called "consumer".

Perhaps constant/smooth directivity is not too important when the room and positioning are good, as in studios. If you are shopping for speakers, try to find (spinorama) measurements to see how the speaker's acoustic radiation behaves, and try to relate it to your room and application. But it's not the whole story for good sound, as there is more than just directivity, like what else the speaker emits. If it has noisy electronics, nasty resonances, audible distortion, or problems like that, the acoustic radiation doesn't matter much, since these sore thumbs stick out and ruin it anyway. Build or buy a problem-free speaker system to get good sound: good drivers, a good signal chain, good acoustic radiation, good placement, enough SPL capability with wide enough bandwidth, and so on.

ferrellms · Nov 1, 2022

tallbeardedone said:
I’ve been able to get an immersive 3-D soundstage in my room by getting all first reflections >6 ms after the direct sound (and attenuated as much as possible with absorption) and sitting about 30 cm inside the tip of the equilateral triangle (quite near field). This is of course hugely room- and speaker-dependent, but definitely possible. I can place each instrument in an orchestra in 3-D space in, for example, “Jack Sparrow” by the Royal Philharmonic. Very cool effect.

Sounds like my setup and listening results. Very cool effect.
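The >6 ms figure translates directly into geometry: sound travels about 2.06 m in 6 ms, so the reflected path must be at least that much longer than the direct path. This is a minimal image-source sketch for a single side wall, assuming a hypothetical layout (speakers 2 m apart, listener 30 cm inside the triangle's apex, c = 343 m/s) and ignoring the floor, ceiling, and front/rear walls.

```python
import math

C = 343.0  # speed of sound, m/s (assumed)


def side_wall_reflection_delay_ms(speaker, listener, wall_x):
    """Extra arrival time (ms) of the first reflection off a wall at x = wall_x.

    Image-source method: mirror the speaker across the wall plane; the
    reflected path length equals the straight-line distance from that image
    to the listener. Positions are (x, y) in metres, horizontal plane only.
    """
    sx, sy = speaker
    image = (2.0 * wall_x - sx, sy)
    direct = math.dist(speaker, listener)
    reflected = math.dist(image, listener)
    return (reflected - direct) / C * 1000.0


def min_wall_distance(speaker, listener, target_ms=6.0, step=0.05):
    """Smallest speaker-to-side-wall gap (m, in `step` increments) that
    delays the side-wall reflection by at least target_ms."""
    gap = step
    while side_wall_reflection_delay_ms(speaker, listener, speaker[0] + gap) < target_ms:
        gap += step
    return gap


if __name__ == "__main__":
    # Hypothetical layout: speakers 2 m apart, listener 30 cm inside the apex.
    right_speaker = (1.0, 0.0)
    listener = (0.0, math.sqrt(3.0) - 0.3)
    for gap in (0.5, 1.0, 1.5, 2.0):
        delay = side_wall_reflection_delay_ms(right_speaker, listener, 1.0 + gap)
        print(f"wall {gap:.1f} m beyond speaker -> reflection {delay:.2f} ms late")
    print(f"needed for 6 ms: ~{min_wall_distance(right_speaker, listener):.2f} m")
```

Under those assumptions the side wall needs to be roughly 1.3 m beyond the speaker to clear 6 ms, before any absorption is considered. The same mirror trick works for the floor or ceiling by reflecting the height coordinate instead.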

tallbeardedone · Nov 1, 2022

Cars-N-Cans said:
Speaking of time intensity trading, here is an interesting one that has intensity and time delay as a matrix with a periodic buzzing tone: Time intensity trading.

The central 0 dB/0 ms cell has a stable image. But moving away along the diagonal, where successively more trading takes place to keep the sound centered, results in a progressively less stable image. An example would be moving to -2 dB/0.22 ms or 2 dB/-0.22 ms, reaching an extreme with the values -6 dB/0.66 ms or 6 dB/-0.66 ms. Even on headphones it's not quite perfect due to variations. On speakers it's ridiculously hard, as any small deviation in position, reflections, or comb filtering will throw it off. Of course, in this situation that's to be expected, as the values are deliberately chosen to maximize the instability of the sound image, not reinforce it. For me, anyway, it's an interesting little demo. Tying it into our original point, I would say this also touches on how sensitive a setup can be to small deviations from the ideal listening position. Ideally one would want it to be as insensitive as possible, which is the opposite extreme from the proposition that we have to be exactly at the right position and distance from the loudspeakers to get correct imaging. Like so many things in life, in practice things usually fall somewhere in between, with lots of different compromises that can be made.

This is a super useful tool! And exactly what I've been looking for to help map placement in my soundstage (sticking with the term!). Thanks for the link. Would this indicate that, due to the same effect, any sound placed within the soundstage using a combination of time delay and intensity is less stable/focused than one placed using ONLY time delay OR intensity difference? Just thinking out loud.

Also, I notice that in my room the tone increases slightly in dB when a combination is used instead of time delay alone. This makes sense, as the speakers are then using a level difference, but it's interesting to note that if an audio engineer uses a combination of these two processes, instruments placed using both time delay and dB difference will sound louder and thus appear closer in the mix. Any audio engineers care to comment? Is this common practice?

Thanks again for the really helpful resource.

Cars-N-Cans · Nov 1, 2022

onion said:
Yes - I have that system. It works really well in my well-treated room with XTC cancellation magnitude approaching 10dB. I much prefer listening via Bacch, probably because discrete sound sources in the music sound like they are located in 3d space rather than a 2d plane bound by the speakers. It sounds better and is less fatiguing.

For phantom image, there is a slider in Bacch that enables this to be shifted to the left or right. This may be useful if the speaker-room setup causes the phantom image to not be anchored to the centre for the listener.

Thanks for the info. Currently I'm getting my XTC passively, simply using the direct sound, an RFZ, and the head-shadowing effect. It works very well for me, but with the speakers only about 75-80 degrees apart (as opposed to the usual 60 degrees), the cancellation is a more modest 5-6 dB. More separation than that gives a greater null, but at lower volumes a hole starts appearing in the center of the soundstage, probably due to the reduction in perceived energy there, not to mention that it's getting outside the listening window of the speakers, since they are tower speakers facing outward. The remainder of my controls are old-school pots to adjust the relative levels of each channel. Seems to work well enough.

Also, with the BACCH setup, does it have any sense of height? On my system the images are fully 3D, but the height is entirely dictated by the tweeter elevation, since there are no cues beyond the tweeter's physical location. I think they have (or had) this in their LiveAudio for the JAMBOX. From the few demos I have found, it really seems to work and allows actual height as well, which is nice, as at that point the entire soundstage is fully independent of the speakers. It was never clear to me whether that was a spin-off from their BACCH processors or just for the JAMBOX alone. It's a nice effect, and something they could implement given that they make measurements using in-ear microphones, so some form of HRTF fitting/estimation could be done.
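For comparison with the passive approach (RFZ plus head shadowing) and the measurement-based filters in BACCH, the oldest active idea is a recursive cancel-and-delay loop, in the spirit of Ambiophonics' RACE. This is a hypothetical illustration, not BACCH's processing: it uses a single broadband gain and an integer-sample delay, whereas practical versions band-limit the loop and use head-related filters.

```python
def recursive_xtc(left, right, delay_samples, gain):
    """Recursive crosstalk canceller sketch (Ambiophonics/RACE-style idea).

    Each output channel subtracts an attenuated, delayed copy of the other
    output channel. The subtracted copy is meant to cancel the crosstalk that
    reaches the far ear; because it is taken from the *output*, the crosstalk
    of the cancellation signal is itself cancelled, and so on (the recursion).
    gain (0..1) and delay_samples stand in for the head-shadow attenuation and
    the extra path delay to the far ear; both are assumptions to be tuned.
    """
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1 for a causal loop")
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    n = len(left)
    out_l = [0.0] * n
    out_r = [0.0] * n
    for i in range(n):
        j = i - delay_samples
        fb_from_r = out_r[j] if j >= 0 else 0.0
        fb_from_l = out_l[j] if j >= 0 else 0.0
        out_l[i] = left[i] - gain * fb_from_r
        out_r[i] = right[i] - gain * fb_from_l
    return out_l, out_r


if __name__ == "__main__":
    # Impulse on the left channel only, 3-sample delay, -6 dB (gain 0.5).
    impulse = [1.0] + [0.0] * 11
    silence = [0.0] * 12
    l_out, r_out = recursive_xtc(impulse, silence, delay_samples=3, gain=0.5)
    print("L:", l_out)
    print("R:", r_out)
```

A plausible starting delay is the extra path to the far ear, e.g. around 0.26 ms (12 to 13 samples at 48 kHz) for speakers at ±30 degrees under the spherical-head estimate. The gain sets how much crosstalk is cancelled, and values near 1 make the loop ring for a long time.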

Cars-N-Cans · Nov 1, 2022

tallbeardedone said:
This is a super useful tool! And exactly what I've been looking for to help map placement in my soundstage (sticking with the term!). Thanks for the link. Would this indicate that, due to the same effect, any sound placed within the soundstage using a combination of time delay and intensity is less stable/focused than one placed using ONLY time delay OR intensity difference? Just thinking out loud.

Also, I notice that in my room the tone increases slightly in dB when a combination is used instead of time delay alone. This makes sense, as the speakers are then using a level difference, but it's interesting to note that if an audio engineer uses a combination of these two processes, instruments placed using both time delay and dB difference will sound louder and thus appear closer in the mix. Any audio engineers care to comment? Is this common practice?

Thanks again for the really helpful resource.

Glad you found it useful.
