
How do we perceive “soundstage” and “imaging”?

STC

Active Member
Joined
Mar 7, 2017
Messages
278
Likes
114
Location
Klang Valley
I recall seeing that and the Lexicon thing years ago.

This is interesting - similar to what I experienced - from:
https://forums.prosoundweb.com/index.php/topic,156030.0.html?PHPSESSID=3iunvg8v9qomorr3ls2vo2uca3

"I have been involved in installation of several of these type systems and used to have the only portable VRAS (which is what the Constellation was called before Meyer bought it and put a new name on it) that we used to take around for demos.

For any of these type systems (there are several manufacturers who have this type of system-and each has advantages and disadvantages), the room has to be DEAD-NOT LIVE.

You CANNOT take away existing reflections (reverb).

You can only ADD IT.

Yes these type of systems can help congregational singing when the room is dead.

But putting them in a live room is a TOTAL waste of money.

All they can do is make a bad situation worse.

They REALLY need to be aware of this before making a large financial mistake. "

Further on - interesting concept:

"Constellation is used for a different purpose in the restaurants - there, it is essentially a fancy sound masking system. The idea is that if you make the restaurant totally dead acoustically it will be too quiet and a table will overhear conversation from the next table over. No acoustic treatment and the restaurant will get too loud very fast. Constellation allows you to start with a totally dead room and play around getting the exact amount of reverberation you want to make sound from the next table over unintelligible while setting a limit on how loud the space gets. "

One of the big things in recording studio control room design was LEDE - Live end, dead end. The front with the far field speaker doghouses was deadened with something like Sonex, the rear of the room was treated using RPG Diffusors.

RPG Diffusors:

Quadratic: http://www.rpgeurope.com/products/product/modffusor.html




They also make various others like the Hemiffusor - you'll see this on one of the late night talk shows
https://www.bhphotovideo.com/c/prod...r_Systems_HEMIP_2_Hemiffusor_W1_Diffusor.html


After spending a lot of time doing sound with the Pittsburgh Symphony Orchestra outdoors for their Point State Park gigs, there's nothing like hearing them in a place like Heinz Hall or Carnegie - even with artificial reverb systems (we tried - had telephone poles in the park for them), it sucked real bad. Even the recordings done using the various mic trees and such... ehhh.

The acoustical power output of 102 people playing a fortississimo on something like Copland's Rodeo is not within reach of any system I've ever heard. And what's real interesting is that on stage it sucks too. All you hear is brass and percussion, depending on what section you're standing in. Nothing like what you hear out in the hall...

Without going off topic, I emphasized the "domestic concert hall" to show that it is possible to create the spatial information of a venue to give realism. The main point is still crosstalk cancellation of the front stereo pair to produce the correct ILD and ITD at the ears.
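For anyone curious what that crosstalk-cancellation step can look like in practice, here is a minimal RACE-style recursive sketch in Python/NumPy. It only illustrates the principle (each output channel gets a delayed, attenuated, inverted copy of the other channel to cancel the acoustic crosstalk path from the opposite speaker); it is not the actual Reaper/SIR2 chain discussed below, and the delay and attenuation values are assumptions to be tuned for a given speaker span and listening distance.

Python:
import numpy as np

def race_xtc(left, right, fs=44100, delay_us=90.0, atten_db=2.5):
    """Minimal RACE-style recursive crosstalk cancellation sketch.

    Each output channel gets an inverted, attenuated, delayed copy of the
    other output channel, which (ideally) cancels the acoustic crosstalk
    from the opposite speaker at the listener's ears. delay_us and atten_db
    depend on speaker span and listening distance - tune by ear.
    """
    d = max(1, int(round(delay_us * 1e-6 * fs)))   # crosstalk path delay in samples
    g = 10.0 ** (-atten_db / 20.0)                 # crosstalk attenuation as linear gain
    out_l = np.zeros(len(left))
    out_r = np.zeros(len(right))
    for n in range(len(left)):
        fb_l = out_r[n - d] if n >= d else 0.0     # delayed opposite-channel output
        fb_r = out_l[n - d] if n >= d else 0.0
        out_l[n] = left[n] - g * fb_l              # subtract to cancel crosstalk
        out_r[n] = right[n] - g * fb_r
    peak = max(np.max(np.abs(out_l)), np.max(np.abs(out_r)), 1e-9)
    return out_l / peak, out_r / peak              # normalise to avoid clipping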
Do you have an overview of your system and software? 30 channels is a serious investment, and if you're maxing an i9 that's pretty mind-boggling. Or are you basically using the system described on the ambiophonics page - the LaScala impulses (only 24 channels shown) with Voxengo? What bandwidth is needed in the ambience channels? My limited experiments with ambio were mind-blowing on the right material, but my sense is that on things like Cowboy Junkies Trinity Sessions low-frequency ambience is actually important, meaning you can't just use little satellite speakers.

As I indicated above, my mad-scientist idea is to try to do this over headphones by recording appropriate HRIR responses. This would require physical speakers for measurement but not as a permanent set-up. It might ultimately not work without head-tracking, but I figure it's an interesting avenue of investigation.

I missed replying to this post earlier.
Surprisingly, the cost is much lower than my audiophile setup. I used to have Mytek, Theta, Classé, Supratek and Marantz SA11S2 gear. After getting the Crown XLS and doing blind tests, I got rid of the Classé and then all of them for a MOTU interface, which can give you up to 128 channels, plus a few $20 digital amps for the surrounds. For the multichannel format the rear and side channels are driven by Marantz and Sony amps, as you need better speakers and amplifiers for those discrete channels.

All of this is fed through Reaper. The crosstalk cancellation itself doesn't take even 1% of the CPU, but the SIR2 convolution engine is the one loading my CPU.

I was using an i7-7700 and it was at 82%. I thought the i9 would reduce the load and allow me to add more channels, but it turned out that the load actually increased by 2% to 84%. But the ambience sounded much nicer. I never understood this part.
P.S. I wrongly quoted 82% in my earlier post. It should have been 84%.
 

John Galt

Member
Forum Donor
Joined
Feb 11, 2020
Messages
96
Likes
102
Great discussion. Enjoyable reading while listening to The Dark Side of the Moon on my recently upgraded system, thanks in part to all the great information in this forum.

Thank you, and cheers!
 

dshreter

Addicted to Fun and Learning
Joined
Dec 31, 2019
Messages
808
Likes
1,258
As to headphones, they fail to take into consideration head-related transfer/impulse functions and the auris externa - see the lectures from Dr. Land at Cornell below...

Also see this - https://www.jneurosci.org/content/24/17/4163

And this: https://pdfs.semanticscholar.org/0e76/923ed6c85fcdd8d9a2f269d5c7493b3c3abd.pdf
"Clearly, localization is not isolated to simply the sounds heard. Many more effects contribute to
localization than that proposed by the duplex theory. Although Wightman & Kistler have shown that a
virtual auditory space can be generated through headphone delivered stimulus, they are still lacking some
key features. The ability to accurately reproduce elevation localization may be a problem for aircraft
simulations. Other cues such as head movements and learning may also help in sound localization. For
commercial applications where localization does not need such accuracy, an average HRTF can be
created to externalize sounds."


Also see this https://pages.stolaf.edu/wp-content/uploads/sites/406/2014/07/Westerbert-eg-al-2015-Localization.pdf
There's been many a patent that discusses trying to get headphones to accurately mimic human hearing and its interaction with the environment.

In addition - see this https://core.ac.uk/download/pdf/33427652.pdf

I wrote this a while ago on the Hoffman forum:
Typically, bass is pretty much omnidirectional below about 80-100 Hz - the entire structure begins moving.

During studio construction one of the things we do with infinite baffle/soffit mounting designs is to isolate the cabinets from the structure to minimize early energy transfer - this keeps the structure from transmitting bass faster than air to the mix location. Sound travels faster thru solids - recall the ol' indian ear-on-the-rail thing?

Why does sound travel faster in solids than in liquids, and faster in liquids than in gases (air)?

One thing you want to avoid is the bass from the speaker coupling to the building structure and arriving at your ear sooner than the airborne sound from the speakers. This can cause comb filtering, where you lose certain frequencies due to cancellation.
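A rough back-of-envelope sketch of why that hurts (all values below are assumed for illustration): if the structure-borne copy arrives several milliseconds before the airborne sound, the two copies interfere, and the first comb-filter nulls land right in the bass region.

Python:
# Back-of-envelope sketch (values assumed): a speaker couples into the slab and
# the structure-borne bass reaches the mix position earlier than the airborne
# sound, so the two copies comb-filter. Nulls fall where the copies are out of phase.
path_m = 3.0                           # speaker-to-ear distance in metres (assumed)
c_air = 343.0                          # speed of sound in air, m/s
c_slab = 3400.0                        # rough speed of sound in concrete, m/s (assumed)
dt = path_m / c_air - path_m / c_slab  # arrival-time difference, roughly 7.9 ms
nulls = [(2 * k + 1) / (2 * dt) for k in range(3)]
print([round(f) for f in nulls])       # first nulls at roughly [64, 191, 318] Hz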

Google recording studio monitor isolation and note the tons of isolation devices sold for this reason...

Here's a doghouse design for UREI 813s I did a while ago:


As to mixing - I rarely use pan pots for directional info in my mixes. I use various time-based methods to try and simulate the precedence effect as well as directional cues and impulse responses / head-related transfer functions (HRTF).
Head-related transfer function - Wikipedia
"A pair of HRTFs for two ears can be used to synthesize a binaural sound that seems to come from a particular point in space."

One thing it does is to really open up the mono field, since instruments are now localized and can be sized depending on the early reflections I set up in something like a convolution reverb.
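The HRTF quote above boils down to a pair of convolutions. Here is a minimal binaural-synthesis sketch, assuming you already have left/right HRIRs measured for the desired direction (for example from the CIPIC database mentioned below); the function and variable names are just illustrative.

Python:
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, hrir_left, hrir_right):
    """Minimal binaural-synthesis sketch: convolve a mono source with the
    left/right HRIRs measured for the desired direction. Normalisation is
    only there to keep the result out of clipping."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    peak = max(np.max(np.abs(left)), np.max(np.abs(right)), 1e-9)
    return left / peak, right / peak

# Usage sketch: hrir_l / hrir_r would come from a measured HRIR set for one
# azimuth/elevation, at the same sample rate as the mono track 'voice':
# out_l, out_r = binauralize(voice, hrir_l, hrir_r)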

A great write up here on the Convolvotron:
HRTF-Based Systems – The CIPIC Interface Laboratory Home Page

Part of a great resource for modern sound localization efforts for HMI audio:
The CIPIC Interface Laboratory Home Page – Electrical and Computer Engineering

As to low frequency information:
From Sound localization - Wikipedia :
Evaluation for low frequencies
For frequencies below 800 Hz, the dimensions of the head (ear distance 21.5 cm, corresponding to an interaural time delay of 625 µs) are smaller than the half wavelength of the sound waves. So the auditory system can determine phase delays between both ears without confusion. Interaural level differences are very low in this frequency range, especially below about 200 Hz, so a precise evaluation of the input direction is nearly impossible on the basis of level differences alone. As the frequency drops below 80 Hz it becomes difficult or impossible to use either time difference or level difference to determine a sound's lateral source, because the phase difference between the ears becomes too small for a directional evaluation.[11]
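A quick sanity check of the numbers in that quote, taking the speed of sound as 343 m/s (my own arithmetic, not from the article):

Python:
c = 343.0              # speed of sound, m/s
d = 0.215              # effective ear-to-ear path from the quote, metres
itd_max = d / c        # maximum interaural time delay
f_limit = c / (2 * d)  # frequency where half a wavelength equals the head dimension
print(round(itd_max * 1e6), round(f_limit))   # -> 627 (µs) and 798 (Hz)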

Interesting info here from Dr. Bruce Land on sound localization - end of #25 and into #26


Note the comment concerning using a bus on a DAW to mimic HRTF. Also note his reference to the CIPIC database.

This prevents the "ear pull" associated with unbalanced RMS levels across the ears. As Dr. Land mentions, your ear localizes based on time as well as amplitude. The interaural time difference (ITD, Interaural time difference - Wikipedia) is as critical as the interaural level difference (ILD). As he states, humans learn early on to derive directional cues from impulse responses at the two ears.

One thing that has to be said is that there are significant differences in head-related transfer functions between various people - but note the chart where he mentions the one person with a -48 dB notch at 6 kHz - the curves up to around 5 kHz are fairly close and in the A-weighted range...

Another great lecture on sound localization from MIT:
20. Sound localization 1: Psychophysics and neural circuits

I used various time-based techniques on this - a remix of Whole Lotta Love from the original multitracks:
Remix of WLL

Watch/listen to the Comparison video...

Another technique is to use double tracking and artificial double tracking (ADT) which will spread the instrument/spectra across the panorama - Automatic double tracking - Wikipedia
- tho this can lead to mono compatibility issues... some effects that do this use a bunch of bandpass filters where you can set the delays for each band. Note what George Martin and Geoff Emerick mention about using older analog-style, tape-based ADT during the Anthology sessions. Again, these techniques reduce the unbalanced feeling across the head but still open up the stereo field to allow all the instruments to sit in the stereo image.
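For a feel of what a basic ADT effect does, here is a rough sketch of a single modulated-delay voice (this is not the tape-based Abbey Road implementation, and all parameter values are assumptions to tweak by ear):

Python:
import numpy as np

def adt(dry, fs=44100, base_ms=25.0, depth_ms=2.0, rate_hz=0.4):
    """Rough ADT-style sketch: one slowly wobbling ~25 ms delayed copy of a
    track, intended to be panned opposite the dry signal."""
    dry = np.asarray(dry, dtype=float)
    n = np.arange(len(dry))
    # time-varying delay in samples: a slow LFO wobble around the base delay
    delay = (base_ms + depth_ms * np.sin(2 * np.pi * rate_hz * n / fs)) * fs / 1000.0
    src = n - delay                                      # fractional read position
    lo = np.floor(src).astype(int)
    frac = src - lo
    lo_c = np.clip(lo, 0, len(dry) - 1)
    hi_c = np.clip(lo + 1, 0, len(dry) - 1)
    wet = (1.0 - frac) * dry[lo_c] + frac * dry[hi_c]    # linear interpolation
    wet[src < 0] = 0.0                                   # silence until the delay fills
    return wet

# Usage sketch: pan 'dry' to one side and adt(dry) to the other - and check mono
# compatibility, as noted above.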

Double tracking - both natural and ADT - is prevalent in a lot of the metal mixes - for instance:
Remix of the Curse of the Twisted Tower
Note that the opening of the first clip was locked into what it is since it didn't exist on the multitracks and was flown in on the original release. But on the other samples compare the mixes and notice they don't sound as disjointed across the panorama as do the older, pan-only mixes. One of the band members commented on how he was able to hear his solos better.
This is super cool stuff. I'm very interested in how these different approaches to localization are preserved better or worse based upon speaker dispersion characteristics and room acoustics.

Beyond accuracy, speaker dispersion also varies significantly (wide vs narrow). There are also different philosophies on acoustic treatment. Some approaches favor side-wall reflection and diffusion; my suspicion is this accentuates what is on the recording - for example, additional reverb could make a recording sound more spacious. Others favor maximizing direct sound and killing all first reflections, which might be technically more faithful to the recording but perhaps performs worse for subjective preference.

Understanding how localization is created in a recording should be very telling for what is necessary to preserve it within the listening environment.
 

onion

Senior Member
Joined
Mar 5, 2019
Messages
343
Likes
383
I have a setup in a room for music and movies.
For music: music sources → Bacch4Mac on a Mac mini → USB output → Lyngdorf 3400 → speakers (including in-wall speakers and two subs)
For movies: AV sources → Anthem MRX amp → Lyngdorf 3400 (HT bypass for subs and front left/right) → speakers and subs

Music definitely works better with more sound absorption of first reflections. Placing rear wall sound absorption panels improves stereo imaging and reverb. I believe this is due to how Bacch4Mac works.
Movies actually suffer a little bit with too much sound absorption. The same rear wall panels that benefit music sound quality actually cause the rear speakers to become more easily localisable and the soundfield less enveloping.
 

onion

Senior Member
Joined
Mar 5, 2019
Messages
343
Likes
383
Yes - though only for music. It's a 2+2 setup (2 speakers, 2 subs) that is room-corrected and driven by the Lyngdorf 3400 amp. In practice, BACCH doesn't really affect the subs as they operate at frequencies too low to be relevant for stereo imaging.

For movies, neither the Mac mini nor BACCH are part of the setup.
 

onion

Senior Member
Joined
Mar 5, 2019
Messages
343
Likes
383
How can it do that? For Atmos/ DTS sources, are you suggesting using the AV pre-outs for the three front speakers and routing them somehow into 2 stereo channels on the Mac mini before going into BACCH?
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,336
Likes
6,705
On the subject of 3D soundstage, @Tom Danley posted this demo in another thread. I have to be honest and say I wasn't totally sure if I'd ever "clearly" experienced the height "dimension" in stereo (at least not without turning on Auro3D), but the UP Left and UP Right files really do travel cleanly upward in the vertical "dimension". Really quite cool how they do it, too:

"By embedding the filtering characteristics of the pinna into the audio signal, sound can be moved around the listener's head from a single pair of loudspeakers."

It may be an illusion, but my brain can't really tell the difference :D.
 

MakeMineVinyl

Major Contributor
Joined
Jun 5, 2020
Messages
3,558
Likes
5,874
Location
Santa Fe, NM
I have always had some extremely directional, phase-coherent speakers and always had extremely pin-pointable depth in sound reproduction (only at one particular listening position in a conditioned room), which I could never achieve with non-directional speakers.
I have also heard a horn speaker that was quite directional in a large space that could recreate a large and stable soundstage, seemed room dependent with that one.
Just my experience. It may well differ from those of others.

This has been my experience as well, as my horn speakers are very directional and the result is that they project spatial cues in recordings very accurately. Most speakers cannot reproduce binaural recordings to full effect (or at all) because of contamination from room reflections - these can.

True, the effect deteriorates out of the sweet spot, but I always sit there for serious listening. ;)
 

wiggum

Member
Joined
Nov 27, 2018
Messages
97
Likes
64
I didn't read all 9 pages, but the only time I have perceived soundstage is with the sofalizer filter in ffmpeg. Sofalizer uses HRTF files published in the SOFA format.

For example, I listen to music like this

Bash:
mpv --af-add=sofalizer=sofa=ClubFritz6.sofa:radius=2:gain=0:elevation=0:interpolate=1 myfile.mp3

You can download the sofa file from ClubFritz files. More sofa files are available at https://www.sofaconventions.org/mediawiki/index.php/Files

I have found through trial & error that ClubFritz6 is the most realistic.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
I know but just wanted to state the fact that a very convincing soundstage experience with headphones is certainly possible. IME it's more convincing than 2 channel stereo via speakers.

I have to admit though that this works very well with radio dramas but not with music - at least I haven't yet heard a good dummy-head recording of music.

I find that simply wearing headphones/earphones is enough to destroy any spatial realism. Perhaps one can get used to having something on one's head and ears...
 

Joeshmoe

Member
Joined
Nov 2, 2021
Messages
13
Likes
10
Hi, all - noob here - I found this site after experiencing a serious soundstage / imaging perception upgrade with new equipment. I certainly appreciate all the information on this thread. I'm not a sound engineer, but I do have a PhD in neuroscience, largely in electrophysiology, which gives me some perspective on processing.

OK - I have a decent system (Prima Luna 400 series preamp and monoblock power amps, Golden Ear Triton Reference speakers) which I thought sounded fantastic with a BluOS DAC / streamer. I listen mostly to Tidal. The stereo store has now lent me a Moon 260D DAC, and comparing them A/B is, uh, ear-opening. With the BluOS, yes, I can tell positioning horizontally, but not great, and all happening between the speakers. With the Moon, the soundstage immediately unfolds to the walls, 5-6 ft to the side of each speaker. And spatial localization of instruments and vocals becomes very noticeably more precise. It really varies by recording - superb ones so far include Bill Frisell Baha, Peter Green A Fool No More, Yo-Yo Ma et al, Beethoven Op.56... pretty much across all genres (I, uh, wouldn't know about EDM or rap). More modern or remastered recordings are generally better. Pink Floyd Dark Side of the Moon was disappointing (I think it was better when I was 13 on my crap stereo), whereas the remastered Stairway to Heaven was exquisite - guitar left, with every string definable; flute clearly about 5 ft to the right of my right speaker. The vocals even seemed to come from an elevation about 3 ft above the guitar, and centre.

So, looking at the neurophysiology of it (Great MIT lecture someone posted here earlier), your brain localizes sound by both temporal and dynamic differences between the two ears. The temporal sensitivity for a just noticeable difference is 10 μs (and dynamic sensitivity 1dB). 10 μs is ridiculously small - I spent a good chunk of my life measuring neuronal action potentials, and those are about 1 ms, so 10 μs = 1/100th of that. So, for totally accurate spatial resolution, it would seem that the system error has to be less than 10 μs, or at an accurate sampling frequency of better than 100kHz. I'm speculating that reflected sound, i.e., that which would produce a soundstage to the outside of the speakers, would have compounding error, and so require even more precision.

So, maybe the perception of soundstage and imaging is highly dependent on the quality of the timing of the DAC? And, I guess, for its accuracy of dynamic differences in sound between the two ears (1 dB accuracy).

I'm going to try the Moon 680D next. Stereo salesman says the difference between that and the 280D is as big as that between the 280D and the BluOS. I can't conceive of it being better, but very willing to try. Quite a surprise to me, but the opening of the soundstage and precision of imaging hugely increased the subjective pleasure I get from recordings.

Any thoughts from people here appreciated.
 

Inner Space

Major Contributor
Forum Donor
Joined
May 18, 2020
Messages
1,285
Likes
2,938
Welcome aboard, @Joeshmoe, and please stick around. There's a lot to learn about neural stuff, and any help will be appreciated.

In turn, we might end up impressing on you the bizarre and counterintuitive fact that we really don't know what we hear, unless we try it with strict controls. Experience has shown that a sales guy saying tempting things can produce huge confirmation bias later. It really would be technically astonishing if the BluOS and the Moon actually sounded so different - absent obvious defects or mismatches, of course.

Let's try to figure it out together - could be fun.
 

Joeshmoe

Member
Joined
Nov 2, 2021
Messages
13
Likes
10
Hi, and thank you for the welcome! I'm looking forward to learning, seems like a great bunch here.
I have some familiarity with the bizarreness of perceptions with humans! And I certainly agree as to commonality of confirmation bias - but in this case, the salesman only told me it would be "better" before I listened at home - the perceptions were my own (although agreed to by the salesman when I next spoke to him a few days later). And I can certainly tell the difference blind, with someone else pushing the A/B buttons randomly.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
So, looking at the neurophysiology of it (Great MIT lecture someone posted here earlier), your brain localizes sound by both temporal and dynamic differences between the two ears. The temporal sensitivity for a just noticeable difference is 10 μs (and dynamic sensitivity 1dB). 10 μs is ridiculously small - I spent a good chunk of my life measuring neuronal action potentials, and those are about 1 ms, so 10 μs = 1/100th of that. So, for totally accurate spatial resolution, it would seem that the system error has to be less than 10 μs, or at an accurate sampling frequency of better than 100kHz. I'm speculating that reflected sound, i.e., that which would produce a soundstage to the outside of the speakers, would have compounding error, and so require even more precision.
The time resolution of audio is explained here: https://troll-audio.com/articles/time-resolution-of-digital-audio/ Your calculation is erroneous.
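A quick numerical illustration of the point in that article (my own sketch, not taken from it): a 10 µs inter-channel shift is a sub-sample delay at 44.1 kHz, and band-limited sampled audio still represents it exactly; no 100 kHz sample rate is needed.

Python:
import numpy as np

fs = 44100
n = 4410                                   # 0.1 s window, chosen so 1 kHz sits exactly on a bin
t = np.arange(n) / fs
delay = 10e-6                              # the 10 µs interaural JND discussed above
f0 = 1000.0
a = np.sin(2 * np.pi * f0 * t)             # "left" channel
b = np.sin(2 * np.pi * f0 * (t - delay))   # "right" channel, 10 µs later

A, B = np.fft.rfft(a), np.fft.rfft(b)
k = int(round(f0 * n / fs))                # FFT bin of the test tone
shift = -np.angle(B[k] / A[k]) / (2 * np.pi * f0)
print(shift * 1e6)                         # ~10.0 µs recovered, despite ~22.7 µs samples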

That 10 μs figure is valid when comparing differences between the signals at each ear over headphones, and only using specific test signals, not when presented to both ears through speakers.

Dynamic sensitivity depends on the bandwidth of the sound and its amplitude. 0.1 dB is audible across broadband differences. This is why, usually, you hear differences between electronics in the chain - because you did not match levels between gear. A less likely but not to be dismissed reason is that your speakers are fairly directional, especially vertically, which makes small differences in your listening position produce large changes in the direct sound reaching your ears. https://www.stereophile.com/content/goldenear-technology-triton-reference-loudspeaker-measurements Small differences in spectral content and overall output will not sound like you're playing something louder or quieter; the bigger signal will sound larger, fuller, brighter, more present, etc.

The stereo salesman is messing with you when he says this or that piece of gear is so much better. I.e., he's just doing his job, making you want the next piece.
 

Joeshmoe

Member
Joined
Nov 2, 2021
Messages
13
Likes
10
I'm impressed with the speed of people responding here!

The time resolution of audio is explained here: https://troll-audio.com/articles/time-resolution-of-digital-audio/ Your calculation is erroneous.

The time resolution of digital audio link you gave is interesting, and seems to make sense - I'll have to think about it. Thanks.
That 10 μs figure is valid when comparing differences between the signals at each ear over headphones, and only using specific test signals, not when presented to both ears through speakers.
I don't think I can agree with that. The 10μs interaural time difference sensitivity would seem to be a source-agnostic neural feature.
See this link someone posted earlier: 20. Sound localization 1: Psychophysics and neural circuits

Dynamic sensitivity depends on the bandwidth of the sound and its amplitude. 0.1 dB is audible across broadband differences. This is why, usually, you hear differences between electronics in the chain - because you did not match levels between gear. A less likely but not to be dismissed reason is that your speakers are fairly directional, especially vertically, which makes small differences in your listening position produce large changes in the direct sound reaching your ears. https://www.stereophile.com/content/goldenear-technology-triton-reference-loudspeaker-measurements Small differences in spectral content and overall output will not sound like you're playing something louder or quieter; the bigger signal will sound larger, fuller, brighter, more present, etc.

Sorry, I should have been more clear. By "dynamic" I meant to refer to interaural level difference sensitivity in sound localization, again, a brain rather than equipment feature.
The stereo salesman is messing with you when he says this or that piece of gear is so much better. I.e., he's just doing his job, making you want the next piece.
Exactly. Which is why I like to test things at home, with my own system, where all else can be equal, and I can do blind A/B judgements...
 

dshreter

Addicted to Fun and Learning
Joined
Dec 31, 2019
Messages
808
Likes
1,258
Exactly. Which is why I like to test things at home, with my own system, where all else can be equal, and I can do blind A/B judgements...
One very important concept when doing blind A/B comparisons is that it's truly blind (someone else does the switching and you don't know what you're listening to), and that it is level matched (all other things being equal, a louder source will sound better).

For there really to be that dramatic of a difference, something would need to be functionally wrong with your BluOS streamer. Those are all decent devices, and they certainly don't have any kind of limitation that would interfere with stereo imaging. If you're convinced the two units really sound that different even when level matched, I would investigate what is wrong with your streamer.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
I don't think I can agree with that. The 10μs interaural time difference sensitivity would seem to be a source-agnostic neural feature.
See this link someone posted earlier: 20. Sound localization 1: Psychophysics and neural circuits
It's not a neural feature. Let me clarify what I mean: this number was ascertained through psychoacoustic experiments (based on subjects' responses) and not through modelling of neural responses.

Here's a recent study explaining all the relevant history and details: https://asa.scitation.org/doi/10.1121/1.5087566
Sorry, I should have been more clear. By "dynamic" I meant to refer to interaural level difference sensitivity in sound localization, again, a brain rather than equipment feature
Here the relevant metric is channel balance and linearity (the DAC's ability to track and accurately reproduce level differences). You'll see that in almost all cases DACs have no problem with this.
Exactly. Which is why I like to test things at home, with my own system, where all else can be equal, and I can do blind A/B judgements,,,
The test should be ABX rather than AB, and unless you measured the electrical output of both DACs using a voltmeter or some other tool that's accurate in the audio band, a level mismatch is the likely culprit.

Otherwise it will be something broken, which is unlikely. All that soundstage stuff you're hearing is the speaker's interaction with the room, playing back what's in the music. DACs hardly come into it, in the sense that most do a competent job.
 

audio2design

Major Contributor
Joined
Nov 29, 2020
Messages
1,769
Likes
1,830
The time resolution of digital audio link you gave is interesting, and seems to make sense - I'll have to think about it. Thanks.

In the context of localization the only thing that matters is channel-to-channel accuracy, and since most ADCs/DACs are near perfectly synchronized, the timing accuracy is as well, within the limits of bandwidth and SNR of the system. For all practical purposes it is a non-issue and much better in digital than any other format.

I don't think I can agree with that. The 10μs interaural time difference sensitivity would seem to be a source-agnostic neural feature.
See this link someone posted earlier: 20. Sound localization 1: Psychophysics and neural circuits

You shouldn't agree with it. It is wrong, with a caveat. You do get interaural time difference cues with speakers, BUT, and this is a big but, that can never extend the image outside the speakers (specifically the angle). It starts to get fuzzy in the area of the angle of the speakers. Headphones don't have that limitation because they do not have crosstalk.


Sorry, I should have been more clear. By "dynamic" I meant to refer to interaural level difference sensitivity in sound localization, again, a brain rather than equipment feature.

You are missing the other mechanism by which we localize: spectral content. Our body impacts the spectral content that reaches the interior of our ears and gives us further clues as to where things are, like height. It is very difficult to encode in two-channel audio with speakers.
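A toy illustration of that mechanism (my own sketch; the notch frequency and Q are arbitrary): pinna reflections imprint direction-dependent spectral notches, typically somewhere in the 6-10 kHz region, and HRTF processing embeds exactly this kind of filtering into the signal.

Python:
import numpy as np
from scipy.signal import iirnotch, lfilter

def fake_elevation_cue(mono, fs=44100, notch_hz=7000.0, q=5.0):
    """Toy sketch only: carve a single fixed pinna-style notch into a mono
    signal. Real HRTF filtering is far more detailed and individual."""
    b, a = iirnotch(notch_hz, q, fs=fs)   # narrow dip around notch_hz
    return lfilter(b, a, np.asarray(mono, dtype=float))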
 