
Soundstage and imaging

pablolie

Major Contributor
Forum Donor
Joined
Jul 8, 2021
Messages
2,104
Likes
3,578
Location
bay area, ca
I tried that video a while back, the guy is waaay too irritating for me. Also, classical music: who can make sense of that? But I love a good vectorscope. :)

More seriously, I understand the comment about different behaviour at different frequencies and agree (normal nonlinearities in the recording/reproduction/hearing processes, surely?) but his key point was that inter-channel time difference should trump amplitude difference, but for people here so far, it seems it doesn't.
There is nothing cast in stone. I *do* listen to classical music a lot. The instruments up front are typically way more defined in their place, and the stuff in the back becomes a tad less defined (I will not claim I hear "depth layering"; I just think classical orchestras are arranged on purpose to feature the typical soloists up front). It is also the way it sounds if you score the best seats in a classical concert hall.

And I agree that every room reacts differently to different frequencies, and that is one of the biggest challenges (and biggest improvements) we address to extract maximum performance.
 

Axo1989

Major Contributor
Joined
Jan 9, 2022
Messages
2,908
Likes
2,958
Location
Sydney
That makes sense.

I added some thoughts on the classical music presentation per that video while you were posting, but generally yes, I'd expect orchestra layout to be non-accidental, and various tastes in how this is presented in recordings obviously exist. For me as an occasional listener, classical music recordings can sound like one blended instrument, and I'm not necessarily a fan of that presentation. It may well be realistic, depending on the concert hall and where you sit in it. I've heard performances in, say, the Sydney Opera House concert hall, but avant-garde/amplified, not acoustic/traditional. Probably the last time I did the latter was when parents/grandparents took us to The Nutcracker, or Peter and the Wolf (the latter was educational, for sure; I remember being pretty interested in that pedagogical approach).

As for the room, yes absolutely. Of course we adjust and can hear things more-or-less happily in all sorts of rooms, but I think removing anomalies is beneficial and increases clarity as well as emotional impact and enjoyment (certainly for me).
 

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,720
Likes
6,015
Location
US East
I wouldn't say it's an error, it's just an example to back up what he explains to be the consequence of the conflicting information. If you listen carefully, the image is not all the way to the left (as with a 15 dB amplitude difference but 0 timing difference), and his voice in fact loses some clarity and is somewhat smeared. Listen to his voice from 9:24 and then quickly jump to 12:31 to compare.
I constructed a series of WAV files (using the "arctic_a0010.wav" file from the CMU Arctic database). The file "centered.wav" has the left and right channels at equal amplitude (center panned).

I delayed the left channel by 15 samples (fs=48000, delay=0.3125 ms), which pulls the sound image to the right -- file named "delayed_left_15_samples.wav".

Then I constructed a series of amplitude-panned files, with left/right amplitude differences of 1, 2, 3 ... up to 12 dB, to pull the image back towards the left. You can listen to them to hear how the image shifts from right to left with the different amplitude panning. (File size limit prevented me from putting all WAV files in one ZIP archive.)

It seems to me that a 6 dB amplitude difference sort of pulled the image back to center -- Audacity screenshot below, showing a higher-amplitude but delayed left channel.

delayed_left_15_samples_amp_diff_6_dB.png
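
For anyone who wants to build similar test files, here is a minimal sketch of the delay-plus-level-difference construction described above. It is not the script used for the attached files. The function names are made up for illustration, the test tone stands in for the speech recording, and applying the level difference by attenuating the right channel (rather than boosting the left) is an assumption.

```python
import math
import struct
import wave

FS = 48_000  # sample rate in Hz, as stated in the post


def db_to_gain(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def delay_and_pan(mono, delay_samples=15, left_minus_right_db=0.0):
    """Return (left, right) sample lists built from a mono signal.

    The left channel is delayed by `delay_samples` (zero-padded at the start).
    The level difference is applied by attenuating the right channel; this is
    an assumption chosen here to avoid clipping.
    """
    gain = db_to_gain(left_minus_right_db)
    left = [0.0] * delay_samples + list(mono)
    right = [s / gain for s in mono] + [0.0] * delay_samples
    return left, right


def write_stereo_wav(path, left, right, fs=FS):
    """Write two float channels (range -1..1) as a 16-bit stereo WAV file."""
    frames = bytearray()
    for l, r in zip(left, right):
        l16 = int(max(-1.0, min(1.0, l)) * 32767)
        r16 = int(max(-1.0, min(1.0, r)) * 32767)
        frames += struct.pack("<hh", l16, r16)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(fs)
        wf.writeframes(bytes(frames))


if __name__ == "__main__":
    # Stand-in signal: 10 ms of a 1 kHz tone instead of the speech file.
    mono = [0.5 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(480)]
    print(f"15 samples at {FS} Hz = {15 / FS * 1000:.4f} ms")
    for db in range(0, 13):
        left, right = delay_and_pan(mono, 15, db)
        peak = max(abs(s) for s in right)
        print(f"{db:2d} dB difference -> right-channel peak {peak:.3f}")
        # To save a file: write_stereo_wav(f"amp_diff_{db}_dB.wav", left, right)
```

Attenuating one channel also lowers the overall loudness slightly as the difference grows, so level-matching between files may be worth considering when comparing them.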
 

Attachments

  • Time Intensity Trading 1.zip
    3.6 MB · Views: 30
  • Time Intensity Trading 2.zip
    3.8 MB · Views: 40

Chrispy

Master Contributor
Forum Donor
Joined
Feb 7, 2020
Messages
7,942
Likes
6,101
Location
PNW
Mostly when I see comments about soundstage and imaging they're more about other issues than what's in the recording where these things largely originate (as well as your own speaker/room setup). I think for the most part it's over-used terminology compared to reality.
 

IAtaman

Major Contributor
Forum Donor
Joined
Mar 29, 2021
Messages
2,410
Likes
4,172
That is very interesting, thanks a lot for putting these together. Without YT compression, effects were much easier to identify.

I had 3 thoughts:
  • I don't think you can fully compensate for ITD with ILD or vice versa. None of the samples sounds anything like the centered sound. And in some samples I could hear his voice coming from one channel and "s" sounds from the other, supporting the argument of "smearing" in the original video and implying frequency dependency. Very interesting.

  • While searching for the most centered image, the answer I found depended on the direction from which I "approached" the middle, not unlike a divide-by-zero problem in math. What I mean is, a 4 dB difference is definitely R and 8 dB is definitely L, but the ones in between were confusing and felt sometimes like L, sometimes like R, depending on whether I started going "up" or "down" the list. If I listen to 5 first and then 6, 6 sounds more R. If I play 7 first and then 6, 6 sounds more like L. Again, very interesting.

  • For almost all samples, it felt like the sound was coming from behind. My theory is that, as my brain cannot reconcile the ITD and ILD information into a reasonable solution, it gets confused and assumes the sound must be coming from behind.
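
One possible way to take the "up versus down the list" effect out of the comparison is to play the files in a shuffled order and tally the left/right answers per level. The sketch below is only an illustration of that idea. The function names are hypothetical, and the levels are simply the 1 to 12 dB steps mentioned in the thread.

```python
import random

LEVELS_DB = list(range(1, 13))  # the 1 ... 12 dB files described in the thread


def randomized_order(levels=LEVELS_DB, seed=None):
    """Return the levels in a shuffled order so neighbours do not bias each other."""
    rng = random.Random(seed)
    order = list(levels)
    rng.shuffle(order)
    return order


def tally(responses):
    """Count 'L' and 'R' answers per level.

    `responses` is an iterable of (level_db, side) pairs, side being 'L' or 'R'.
    """
    counts = {}
    for level, side in responses:
        counts.setdefault(level, {"L": 0, "R": 0})[side] += 1
    return counts


if __name__ == "__main__":
    for run in range(3):
        print("run", run + 1, "playback order (dB):", randomized_order(seed=run))
```

Repeating the shuffled run a few times and looking for the level where the answers split roughly evenly would give a less order-dependent estimate of the "centered" file.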
 

audiofooled

Addicted to Fun and Learning
Joined
Apr 1, 2021
Messages
533
Likes
594
I get the same impressions. It's conflicting information and the image loses focus. The voice has sort of a fuzzy boundary and is also slightly shifted to the back. It sounds unnatural, as if it has a bit of reverb or a halo around it, but one that cannot be associated with any normal room the voice could have been recorded in, or with any artificial effect that would be convincing for that matter.

To me this raises the question of how capable our ear-brain system is at hearing human voices in various types of environments. Also, when needed, how well we do in omitting information that we deem unnecessary. Or how someone would feel when there's too much redundant information, or information so conflicting that we are unable to do so. To me, examples of this kind sound as if someone were holding an empty beverage can in front of his mouth while talking to me. It would be irritating.

Thank you @NTK for your effort, it is indeed much clearer than YT, also people may hear it in different volume settings (YT video I posted is a bit too quiet).
 

Sokel

Master Contributor
Joined
Sep 8, 2021
Messages
6,161
Likes
6,260
I will just describe what I heard from the whole sequence.
At first (after the centered one) it started with the image somewhat to the right but a little vague.
From then on the image shifts gradually to the left, ending at the outer edge of the left speaker.
Also, the sound seemed to come from behind the speakers, at mid height.

(There were some annoying crackles; I hope it's not my rig.)

Edit: That's with @NTK's files, not YT of course; I didn't try YT at all.
 

Sokel

Master Contributor
Joined
Sep 8, 2021
Messages
6,161
Likes
6,260
It's funny how expectancy works, though.
I'm used to a certain test (it's like a ktick sound) that traces an arc from right to left, rising a little (or gaining depth, depending on whether I sit closer or farther) at the center of the image. Now that I did this test, my mind probably expected something similar and made me feel discomfort about the straight line this one has.
Weird stuff.
 

audiofooled

Addicted to Fun and Learning
Joined
Apr 1, 2021
Messages
533
Likes
594
Apparently there is quite a lot you can do with ITDs and ILDs, but there's no free lunch when it comes to fidelity, clarity and image focus. Why not push this to the limits and see if there's any interest in further discussion:


On my headphones, the effects are quite convincing. Externalization is about the best I've ever heard. On the other hand, there is no clarity and no fidelity. Even though localization tracks smoothly, there's simply no position where this sounds natural.

On my 2.1 system, which is DIY and has controlled directivity, this also tracks well around the soundstage, with great depth and also height, but there's a gap in presenting the rearmost information. In my mind there's a conflict, and it simply disregards that information as if it were much attenuated. It sounds weird, and not surprisingly so. This is something only a multichannel system could do.
 

goat76

Major Contributor
Joined
Jul 21, 2021
Messages
1,343
Likes
1,490
I listened to your sound files. Here is what I heard using my speakers, which are set up at 30 degrees in an equilateral triangle.

  1. The original file is of course centered.
  2. A delay of just 0.3125 ms in the left channel can only pull the sound image to about 8 degrees to the right.
  3. With a 1 dB amplitude difference, the image is pulled closer to the center but is still located on the right side.
  4. With a 2 dB difference, the image is close to the center but still on the right side.
  5. 3 dB difference, now the image is centered.
  6. From 4 dB, the image starts to drift to the left and further and further to the left as the amplitude difference is raised.
  7. At 11 dB, the image is very close to the left speaker, and at 12 dB it pretty much comes directly from the elements of the left speaker.
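
To put the numbers in this list in perspective, here is a small sketch that converts the level differences to linear amplitude ratios and the 0.3125 ms delay to an equivalent extra path length. The speed of sound (343 m/s, roughly room temperature) and the function names are assumptions for illustration only.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)
FS = 48_000             # Hz, sample rate of the test files


def db_to_ratio(db):
    """Linear amplitude ratio for a level difference in dB."""
    return 10.0 ** (db / 20.0)


def delay_to_path_cm(delay_s, c=SPEED_OF_SOUND):
    """Extra acoustic path length, in cm, that would cause the given delay."""
    return delay_s * c * 100.0


if __name__ == "__main__":
    delay_s = 15 / FS  # 15 samples at 48 kHz
    print(f"delay: {delay_s * 1000:.4f} ms, "
          f"equivalent path: {delay_to_path_cm(delay_s):.1f} cm")
    for db in (1, 2, 3, 6, 11, 12):
        print(f"{db:2d} dB -> amplitude ratio {db_to_ratio(db):.2f}")
```

In other words, the 15-sample delay is comparable to moving one speaker about 10.7 cm further away, and a 6 dB difference is roughly a factor of two in amplitude.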

The Haas effect will work with up to 40-50 milliseconds delay depending on the sound object. If it's just a short "click" sound, the delay will cause separation and split the sound into two separated sounds at much shorter delay times than that.

I made another sound demo using the same voice recording as you did, but I used longer delay times from 0.5 ms to 3 ms.

It's a single 2 MB FLAC file that contains 9 changes from sentence to sentence:
  1. The first one is the original centered voice recording.
  2. The second one has a delay of 0.5 ms in the left channel moving the image a bit to the right.
  3. This one is delayed by 1 ms in the left channel moving the image further to the right.
  4. Delayed 2 ms in the left channel, moving the image even further to the right.
  5. Delayed 3 ms in the left channel and now I hear it all the way to the right speaker.
  6. Still delayed at 3 ms in the left channel but with an 11 dB stronger signal in the left channel, pulling the image all the way to the left speaker.
  7. Still delayed by 3 ms in the left channel but the amplitude difference is now reduced to 9 dB, and I hear the image just slightly on the inside of the left speaker.
  8. Still delayed by 3 ms in the left speaker and the amplitude difference is now down at 7 dB, which pulls the image closer to the center but I hear it still a little bit to the left.
  9. Still delayed by 3 ms in the left speaker but now with an amplitude difference of 5 dB. I hear this as a diffused sound that is not distinctly coming from either the left or the right speaker, but at the same time, coming from both speakers somewhat equally.
Here is the file: https://www.dropbox.com/s/me9zr7wiossvuf0/Voice Delay Test.flac?dl=0
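
For reference, the nine steps listed above can be written down as a small table of (delay, level difference) pairs and rendered one sentence at a time. This is only a sketch of the idea, not how the FLAC file was actually produced. The sample rate, the function names, and the choice to attenuate the right channel instead of boosting the left are all assumptions.

```python
FS = 48_000  # assumed sample rate; the post does not state the file's rate

# (left-channel delay in ms, left-minus-right level difference in dB),
# one entry per sentence, following the list above.
SEGMENTS = [
    (0.0, 0), (0.5, 0), (1.0, 0), (2.0, 0), (3.0, 0),
    (3.0, 11), (3.0, 9), (3.0, 7), (3.0, 5),
]


def ms_to_samples(ms, fs=FS):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(ms * fs / 1000)


def render_segment(mono, delay_ms, level_db, fs=FS):
    """Delay the left channel and attenuate the right one (assumed approach)."""
    d = ms_to_samples(delay_ms, fs)
    gain = 10.0 ** (-level_db / 20.0)
    left = [0.0] * d + list(mono)
    right = [s * gain for s in mono] + [0.0] * d
    return left, right


def render_sequence(sentences, fs=FS):
    """Concatenate one rendered segment per sentence into two channels."""
    left_all, right_all = [], []
    for mono, (delay_ms, level_db) in zip(sentences, SEGMENTS):
        left, right = render_segment(mono, delay_ms, level_db, fs)
        left_all += left
        right_all += right
    return left_all, right_all


if __name__ == "__main__":
    for i, (delay_ms, level_db) in enumerate(SEGMENTS, start=1):
        print(f"{i}: delay {delay_ms} ms = {ms_to_samples(delay_ms)} samples, "
              f"left louder by {level_db} dB")
```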

How do you hear it on your speaker systems? :)
 

audiofooled

Addicted to Fun and Learning
Joined
Apr 1, 2021
Messages
533
Likes
594
Hey, this is fun, thanks :)

I listened through headphones only, but it would be interesting to compare it to the speakers. So far, my impressions are that 1-5 work as described but 6-9 are all diffused, with more difference in focus than in localization.

6. Would this be a typo? (I doubt both delay and amplitude are in the left channel?) Anyway, this is the most awkward sounding, and the voice is stretched in space the most.

7-9 have a bit less diffusion than 6, with gradual image shifts to the right.
 

IAtaman

Major Contributor
Forum Donor
Joined
Mar 29, 2021
Messages
2,410
Likes
4,172
That is very interesting. I listened to it with 2 different open-back headphones and near-field speakers. As expected, the speakers sound very different from the headphones. The effect is still audible but a lot more subtle compared to the headphones. I will do some more listening tests later on and post my impressions as well. Thanks a lot for this.
 

goat76

Major Contributor
Joined
Jul 21, 2021
Messages
1,343
Likes
1,490
I think all 6 to 9 have a more or less diffuse sound, but with loudspeakers, it will most likely sound more directional to one or the other side of the stereo field.

“6.” is not a typo, the delay must be in the left speaker for the sound to arrive earlier in the right speaker to pull the sound to the right, and the stronger amplitude must also be in the left speaker to pull the sound to the left. In my system, +11 dB in the left speaker wins over 3 ms earlier arrived sound in the right speaker so that the direction is clearly coming from the left speaker.

With the full channel separation you get with headphones, it’s possible you hear it more as a wider stereo effect in “6.”, and less so with the smaller amplitude differences from 7 to 9.
 

Sokel

Master Contributor
Joined
Sep 8, 2021
Messages
6,161
Likes
6,260
I had to write them down, they go fast since it's only one file!
So:
1 centered
2 center-right
3 more to the right
4 hmm, right
5 dead right
6 left
7 left
8 diffused left
9 diffused, hmm, more leftish to me.

(Hope I didn't mix them up, it was fast!)
 

goat76

Major Contributor
Joined
Jul 21, 2021
Messages
1,343
Likes
1,490
I think it helps to point with the arm in the direction I hear the sound coming from. It makes it easier to pinpoint the direction of the more diffused sounds in 6 to 9, even when there is only a short time between them.
 

tmtomh

Major Contributor
Forum Donor
Joined
Aug 14, 2018
Messages
2,782
Likes
8,183
I haven’t tried these yet, but in general I agree about pointing to where you perceive the sound coming from. However, in my experience the pointing is more consistently accurate (to where I’m actually hearing/perceiving the sound) if I close my eyes, then point, then open my eyes to see where I’ve pointed. I’ve often been surprised!
 

Ricardus

Addicted to Fun and Learning
Joined
Mar 15, 2022
Messages
843
Likes
1,153
Location
Northern GA
Not sure what you mean? Whether real or imaginary, the "width" of the stereo image is a very real thing. You can drop stuff on the very far right or left, or in the dead center, or somewhere in-between - and that's what the perception of "stage width" in a stereo image is.
Yes. I'm aware. What I meant was I've worked in dozens of studios with lots of middling engineers and a few who have shiny things, and I just don't hear the word "soundstage" used in professional studios, unless they're talking about the big buildings TV shows and movies are shot in.

We're all well aware of what happens when we twist the pan pot.
 

audiofooled

Addicted to Fun and Learning
Joined
Apr 1, 2021
Messages
533
Likes
594
What about distance and depth? Phase? How much is in the recording, and what should the loudspeakers, setup, and in-room reflections do?

Some interesting links on Geoff Martin's page:


An example of the recording and map he talked about in his BeoLab90 presentation:


AES paper on audibility of phase response differences:


(Sadly I'm not a member but if someone would be kind enough to share some key points of the paper, maybe what's so different in audibility thresholds in headphones and loudspeakers, you're most welcome)

"For example, one of the experiments that we did here at B&O some years ago showed that a difference as small as 3 degrees in the phase response matching of a pair of loudspeakers could cause a centrally-located phantom image to lose precision and start to become fuzzy."
 

dlaloum

Major Contributor
Joined
Oct 4, 2021
Messages
3,165
Likes
2,428
I think people underestimate the influence of phase coherence in reproducing tangible spatial illusions.

This is exacerbated by complicated DAW recording chains that use various non-phase-coherent filters/EQ, with the ultimate result being an impressionistic rendition rather than a realistic one.

This may be one reason why some of the most "real" high-fidelity recordings were done very simply (as in, one of the reasons they sound so real is the simplicity of the recording chain, which as a consequence keeps the phase coherence intact). Those types of recordings typically used simple crossed mics or similar strategies, which capture the original signal in a simple manner and minimise mixing that may dilute the phase clarity of the direct sound.
 

mixsit

Member
Forum Donor
Joined
Jul 20, 2019
Messages
75
Likes
23
“6.” is not a typo, the delay must be in the left speaker for the sound to arrive earlier in the right speaker to pull the sound to the right, and the stronger amplitude must also be in the left speaker to pull the sound to the left. In my system, +11 dB in the left speaker wins over 3 ms earlier arrived sound in the right speaker so that the direction is clearly coming from the left speaker.
It takes quite a bit of audio level to 'undo' the image shift from very small differences in arrival time.
Left over from my home studio/audio learnin' days :>)
 