How do we perceive “soundstage” and “imaging”?

Thomas_A · Jan 7, 2020

Wombat said:
You said it opened-up dimensionally. Width and depth specifically.

Because I focus on what the ears pick up in comparison to the other songs. Nothing else.

Thomas_A · Jan 7, 2020

Wombat said:
I did make that recognition. Unfortunately the postings so far mostly are individual anecdotes.

I posted some links to research regarding depth perception - but there has been no comments to those.

mocenigo · Jan 7, 2020

Thomas_A said:
I posted some links to research regarding depth perception - but there has been no comments to those.

I missed those. Please remind me, I will also try and find them. I am very curious.

Thomas_A · Jan 7, 2020

mocenigo said:
I missed those. Please remind me, I will also try and find them. I am very curious.

https://www.ece.ucdavis.edu/cipic/spatial-sound/tutorial/psychoacoustics-of-spatial-hearing/#range
https://www.academia.edu/7275281/EN..._WITH_DYNAMIC_CUES_RELATED_TO_SOURCE_POSITION

ajawamnet · Jan 7, 2020

mocenigo said:
I missed those. Please remind me, I will also try and find them. I am very curious.

I posted some stuff on HRTF/HRIF - see these posts:

https://www.audiosciencereview.com/...-“soundstage”-and-“imaging”.10623/post-294650

https://www.audiosciencereview.com/...-“soundstage”-and-“imaging”.10623/post-294742

There's a great set of vids from Prof. Land at Cornell talking about localization... it's a college-level course for FPGA development but has some great info on how humans are thought to perceive aural location cues...

ajawamnet · Jan 7, 2020

ajawamnet said:
I posted some stuff on HRTF/HRIF - see these posts:

https://www.audiosciencereview.com/forum/index.php?threads/how-do-we-perceive-“soundstage”-and-“imaging”.10623/post-294650

https://www.audiosciencereview.com/forum/index.php?threads/how-do-we-perceive-“soundstage”-and-“imaging”.10623/post-294742

There's a great set of vids from Prof. Land at Cornell talking about localization... it's a college-level course for FPGA development but has some great info on how humans are thought to perceive aural location cues...

Also in that first link above is a link to the UC Davic CIPIC Interface Lab homepage - lots of great info there...

xr100 · Jan 19, 2020

ajawamnet said:
As to modern pop type music, it's so artificial to begin with - you're simultaneously inside the piano, in the middle of the drum kit, in the middle of the guitar amp, horn, etc... while also being right in front of the mouth of the singer. The artform of pop recording that pioneers like Sir George Martin, Quincy Jones, Roger Nichols and others created is that it IS an artificial aural landscape.

I think it is important here to place the development of music in the 20th Century era into context.

As a "thought experiment" of sorts, consider pounding, "four-to-the-floor" (four kicks per bar) 140+bpm "EDM." The last thing you'd want would be a large amount of reverb, if any, on the kick. (OK, so the kick may well be entirely synthetic... but...) On the other hand, you could well get away, in such idioms, with drenching a vocalist in reverb...

That represents fairly extreme cases--however, with less extreme cases, it still doesn't necessarily make any sense to attempt to "copy" a "real" acoustic environment--idioms have developed around (and in some cases are only possible by) NOT being constrained in this way.

It otherwise makes little sense to be constrained to one either--why would one want to have a "band"-type spatial layout with the drummer at the back somewhere, which as "seen" from the "audience" position would end up more or less collapsing to mono? Additionally, it would limit spatial separation--hi-hats would be in the same position as a centre-panned vocalist, for instance--psychoacoustically suboptimal.

As for imaging in an "audiophile" sense on the vast majority of "pop/rock" music--well... for the most part... you might find pre-delayed reverb etc. on vocals for "projection" but... I think a lot of imagination is going on with such descriptors as "there was a palpable sense of the band being in the room"... panning and reverb is often just there for basic spatial placement, separation, sense of space and to make instruments "sound good" rather than to create a "3D image."

Somewhat on the flip side, over on the Sound on Sound website, there is a good article on the recording techniques used by engineer Bruce Swedien in the creation of Michael Jackson's thriller. To quote:

"'Rock With You' is also an excellent showcase for another of Swedien's creative live‑room production techniques. Each of the backing-vocal lines was first double‑tracked with a close mic, then Jackson moved a couple of steps back from the mic for another pass, while Swedien increased the preamp gain to match his level with the previous takes. Finally, an even more distant pass was captured using a Blumlein stereo pair, again matched for level. The result: an increased density of early reflections, which creates a natural depth and width to the soundfield."

ajawamnet said:
I think the only way that we will ever get to a method of TRUELY realizing any acoustic environment is through a holographic method of controlling the listening environs' air molecules. I doubt that any conventional multidriver transducer type system will ever work.

Wave field synthesis? (EDIT: Well, that has just been somewhat trashed by the video linked to in post #149...)

xr100 · Jan 19, 2020

Kal Rubinson said:
I have not yet read the replies to your initial post but I question whether any headphones or in-ear phones actually can create or, rather, re-create a sense of space.

IME of "generic" synthetic HRTF-based processing is that it is very difficult to position sounds frontally (as in e.g. "centre channel" location,) and generally difficult to create much sense of distance. For example, sounds can be heard from behind--but it's as if someone's "breathing down your neck" (complete with "tingling" sensations!)

IIRC, this is exactly what has been found in (some of) the literature.

Another aspect is that the localisation sounds spatially "warped," although with subsequent experiences (after listening to synthetic "binaural" material on headphones) of the "real world," it also becomes (more) obvious that our aural perception is spatially warped and error-prone, anyway.

Creative Labs' "Super X-Fi" system works by generating custom HRTFs based the on analysis of photos (of your ears.) I have not tried it...

xr100 · Jan 19, 2020

Proximity/early reflections... (David Griesinger)

ajawamnet · Jan 19, 2020

xr100 said:
Proximity/early reflections... (David Griesinger)

Wow that audio sucks... wanders in and out of phase - see the vid link of the audio

http://www.ajawamnet.com/2020-01-18_20-33-12.mp4

Great talk tho - too bad it's a bit tough to listen to...

STC · Feb 18, 2020

ajawamnet said:
Interesting concept, tho I still find it lacking in representation of a concert hall. ...

The concept of Ambiophonics + domestic concert hall is about creating concert hall sound with existing recordings. It can be 2.0, 5.1, 7.1 or ATMOS.

https://www.ambiophonics.org/the-home-concert-hall

xr100 · Feb 18, 2020

STC said:
The concept of Ambiophonics + domestic concert hall is about creating concert hall sound with existing recordings. It can be 2.0, 5.1, 7.1 or ATMOS.

https://www.ambiophonics.org/the-home-concert-hall

It would be interesting to test the impulse responses on that page; however, I suggest not using convolution of sampled IR's but rather "algorithmic" reverb. Apart from anything else, the sample is "static"--even e.g. small air currents in the "real world" mean that reverb is constantly slightly changing.

STC · Feb 18, 2020

xr100 said:
It would be interesting to test the impulse responses on that page; however, I suggest not using convolution of sampled IR's but rather "algorithmic" reverb. Apart from anything else, the sample is "static"--even e.g. small air currents in the "real world" mean that reverb is constantly slightly changing.

You can get other IRs too. I have tried different halls IRs and while there can be difference but EQ the convolution engine practical can make them indistinguishable with level controlling. I am using 100 over IRs and it gives you realistic sense of spaciousness and envelopment. The concept is somewhat similar to...

ajawamnet · Feb 18, 2020

STC said:
The concept of Ambiophonics + domestic concert hall is about creating concert hall sound with existing recordings. It can be 2.0, 5.1, 7.1 or ATMOS.

https://www.ambiophonics.org/the-home-concert-hall

But it would only be in the sweet spot of the speakers.

In a real hall, as you walk around, the sound from stage changes. Recall that the measure of sound absorption is the Sabine, based on cushions:

http://waywiser.fas.harvard.edu/objects/11606/sabines

Me thinks that the way the sound interacts inside of a place like that hall cannot be replicated to a moving listener even with a zillion channel system of drivers in a relatively small room.

Same with a concert using a PA system, tho that's be a bit easier since the original acoustic environs would also be thru an x-channel system of drivers/cabs. But even then, the instantaneous amount of cubic volume of air that something like a large flown, in-line array with multiple 18" subs or servodrives can move would dwarf any home system.

For example, go get a DVD and play it in the best "pinky-pinky" rich person's home theater you can find, then go see/listen to the same flick in a real theater.

For me the difference in my mid-fields with a 12" three way (I use them as far fields) is striking as compared to near fields or some 5:1 home thing that has tiny drivers. The effect larger far fields have on the perceived transients of things like a tympani or a kick drum and low toms dwarfs anything I've heard come out of small driver systems.

STC · Feb 18, 2020

ajawamnet said:
But it would only be in the sweet spot of the speakers.

In a real hall, as you walk around, the sound from stage changes. Recall that the measure of sound absorption is the Sabine, based on cushions:

http://waywiser.fas.harvard.edu/objects/11606/sabines

Me thinks that the way the sound interacts inside of a place like that hall cannot be replicated to a moving listener even with a zillion channel system of drivers in a relatively small room.

Same with a concert using a PA system, tho that's be a bit easier since the original acoustic environs would also be thru an x-channel system of drivers/cabs. But even then, the instantaneous amount of cubic volume of air that something like a large flown, in-line array with multiple 18" subs or servodrives can move would dwarf any home system.

For example, go get a DVD and play it in the best "pinky-pinky" rich person's home theater you can find, then go see/listen to the same flick in a real theater.

For me the difference in my mid-fields with a 12" three way (I use them as far fields) is striking as compared to near fields or some 5:1 home thing that has tiny drivers. The effect larger far fields have on the perceived transients of things like a tympani or a kick drum and low toms dwarfs anything I've heard come out of small driver systems.

Stereo is anti social. The best sound is always at the sweet spot meaning it is confined to one person. In a concert hall, the best sound usually confined to first few rows but some may find other spots to their liking but that’s subjective.

Ambiophonics is intended to recreate realism with the existing recordings. That means any format where stereo is part of them.

In a good concert hall the RT would be around 2s. To achieve the even and sustained decay the design is complex and the final results may not be what one intended.

With artificial ambiance, the user decides what RT60 is the best for the recording. We need not be concerned how the long RT would affect the inteIligibity as we are in control of the level of the reverberation. That is something not possible in a real hall because the RT is determined by the size and material used . I have a choice from choosing 0.3s to 3s and even more like the Meyer system. Even now I cannot utilize the full ambiance as the i9-9900K CPU already touching 82 %.

But mere words not going to convince audiophiles or even audio engineers who seemed to overlook the importance of psychoacoustics for the sound to be perceived as real. This is the cheapest and better solution for existing audiophile system.

Of course you won’t find a review of it in the Stereophile because there is no product to sell and it is hard to convince the readers that Amplifier and cables would not make 3D sound as good as IACTX . The institute is open to public.

It gives you a better sound than multi channel system. For an example a violin can be made to sound like this with my 30 ambiance speakers.

Just because I could do them doesn’t mean I am without a choice. From the numerous visitors I had, I can safely say that the right RT is very much to individual taste and there seemed to be no exact number. Some like the classical with 2.2s RT and some with 1.7s but not one listener preferred the natural RT of the room compared to the generated ambiance.

dwkdnvr · Feb 18, 2020

xr100 said:
IME of "generic" synthetic HRTF-based processing is that it is very difficult to position sounds frontally (as in e.g. "centre channel" location,) and generally difficult to create much sense of distance. For example, sounds can be heard from behind--but it's as if someone's "breathing down your neck" (complete with "tingling" sensations!)

IIRC, this is exactly what has been found in (some of) the literature.

Another aspect is that the localisation sounds spatially "warped," although with subsequent experiences (after listening to synthetic "binaural" material on headphones) of the "real world," it also becomes (more) obvious that our aural perception is spatially warped and error-prone, anyway.

Creative Labs' "Super X-Fi" system works by generating custom HRTFs based the on analysis of photos (of your ears.) I have not tried it...

There is also Impulcifier https://github.com/jaakkopasanen/Impulcifer which is a diy/open source way to do the same thing. Based on feedback on the Head-Fi thread linked from the github site, using the Sound Professionals binaural mics seems to generate good results.

I'm intrigued by this, but haven't yet tried it. IMHO the real 'killer app' for this isn't to measure an existing 5.1/7.1 system and just play back standard multichannel soundtracks, but instead realize the large-scale ambiophonic type of system that STC describes over headphones rather than needing a huge and complicated speaker array. I think a 'simple' ambio dipole can be created by just measuring the IR of a single center channel in both ears, and then using those IRs directly as left/right filters in normal stereo playback. This is going to be my first attempt. If that works, then looking into adding additional ambience channels would be phase 2.

The main barrier to this simple approach is the lack of head tracking; I believe the Smyth Realizer and BACCH4Mac which are similar personal-hrtf systems do include head-tracking which suggests that it's probably rather important, but this ideally requires support from the convolution engine - you can probably do it in a mixer stage that accepts a large number of individual convolution outputs, but that would probably be computationally more expensive.

dwkdnvr · Feb 18, 2020

STC said:
my 30 ambiance speakers.

Do you have an overview of your system and software? 30 channels is a serious investment, and if you're maxing an i9 that's pretty mind-boggling. Or are you basically using the system described on the ambiophonics page - the LaScala impluses (only 24 channels shown) with Voxengo? What bandwidth is needed in the ambience channels? My limited experiments with ambio were mind-blowing on the right material, but my sense is that on things like Cowboy Junkies Trinity Sessions low-frequency ambience is actually important meaning you can't just use little satellite speakers.

As I indicated above, my mad-scientist idea is to try to do this over headphones by recording appropriate HRIR responses. this would require physical speakers for measurement but not as a permanent set-up. It might ultimately not work without head-tracking, but I figure it's an interesting avenue of investigation.

ajawamnet · Feb 18, 2020

STC said:
Stereo is anti social. The best sound is always at the sweet spot meaning it is confined to one person. In a concert hall, the best sound usually confined to first few rows but some may find other spots to their liking but that’s subjective.

Ambiophonics is intended to recreate realism with the existing recordings. That means any format where stereo is part of them.

In a good concert hall the RT would be around 2s. To achieve the even and sustained decay the design is complex and the final results may not be what one intended.

With artificial ambiance, the user decides what RT60 is the best for the recording. We need not be concerned how the long RT would affect the inteIligibity as we are in control of the level of the reverberation. That is something not possible in a real hall because the RT is determined by the size and material used . I have a choice from choosing 0.3s to 3s and even more like the Meyer system. Even now I cannot utilize the full ambiance as the i9-9900K CPU already touching 82 %.

But mere words not going to convince audiophiles or even audio engineers who seemed to overlook the importance of psychoacoustics for the sound to be perceived as real. This is the cheapest and better solution for existing audiophile system.

Of course you won’t find a review of it in the Stereophile because there is no product to sell and it is hard to convince the readers that Amplifier and cables would not make 3D sound as good as IACTX . The institute is open to public.

It gives you a better sound than multi channel system. For an example a violin can be made to sound like this with my 30 ambiance speakers.

Just because I could do them doesn’t mean I am without a choice. From the numerous visitors I had, I can safely say that the right RT is very much to individual taste and there seemed to be no exact number. Some like the classical with 2.2s RT and some with 1.7s but not one listener preferred the natural RT of the room compared to the generated ambiance.

I recall seeing that and the Lexicon thing years ago.

This is interesting - similar to what I experienced - from:
https://forums.prosoundweb.com/index.php/topic,156030.0.html?PHPSESSID=3iunvg8v9qomorr3ls2vo2uca3

"I have been involved in installation of several of these type systems and used to have the only portable VRAS (which is what the Constellation was called before Meyer bought it and put a new name on it) that we used to take around for demos.

For any of these type systems (there are several manufacturers who have this type of system-and each has advantages and disadvantages), the room has to be DEAD-NOT LIVE.

You CANNOT take away existing reflections (reverb).

You can only ADD IT.

Yes these type of systems can help congregational singing when the room is dead.

But putting them in a live room is a TOTAL waste of money.

All they can do is make a bad situation worse.

They REALLY need to be aware of this before making a large financial mistake. "

Further on - interesting concept:

"Constellation is used for a different purpose in the restaurants - there, it is essentially a fancy sound masking system. The idea is that if you make the restaurant totally dead acoustically it will be too quiet and a table will overhear conversation from the next table over. No acoustic treatment and the restaurant will get too loud very fast. Constellation allows you to start with a totally dead room and play around getting the exact amount of reverberation you want to make sound from the next table over unintelligible while setting a limit on how loud the space gets. "

One of the big things in recording studio control room design was LEDE - Live end, dead end. The front with the far field speaker doghouses was deadened with something like Sonex, the rear of the room was treated using RPG Diffusors.

RPG Diffusors:

Quadratic: http://www.rpgeurope.com/products/product/modffusor.html

The also make various others like the Hemiffussor - you'll see this on one of the late night talk shows
https://www.bhphotovideo.com/c/prod...r_Systems_HEMIP_2_Hemiffusor_W1_Diffusor.html

RPG_Diffusor_Systems_HEMIP_2_Hemiffusor_W1_Diffusor_484929.jpg

After spending a lot of time doing sound with the Pittsburgh Symphony Orchestra outdoors for their Point State Park gigs, there's nothing like hearing them in a place like Heinz Hall or Carnegie - even with artificial reverb systems (we tried - had telephone poles in the park for them) Sucked real bad. Even the recording done using the various mic trees and such... ehhh.

The acoustical power output of 102 people playing a fortississimo on something like Copeland's Rodeo is not within reach of any system I've ever heard. And what's real interesting is that on stage it sucks too. All you hear is brass and percussion, depending on what section you're standing in. Nothing like what you hear out in the hall...

xr100 · Feb 18, 2020

ajawamnet said:
In a real hall, as you walk around, the sound from stage changes. […]

Me thinks that the way the sound interacts inside of a place like that hall cannot be replicated to a moving listener even with a zillion channel system of drivers in a relatively small room.

Early reflections can certainly be changed based on relative position in an algorithmic reverb. Why would it need to be exactly replicated to be identical to a "real" space?

ajawamnet said:
Same with a concert using a PA system, tho that's be a bit easier since the original acoustic environs would also be thru an x-channel system of drivers/cabs. But even then, the instantaneous amount of cubic volume of air that something like a large flown, in-line array with multiple 18" subs or servodrives can move would dwarf any home system.

What counts is the SPL at the listening position...

ajawamnet said:
For example, go get a DVD and play it in the best "pinky-pinky" rich person's home theater you can find, then go see/listen to the same flick in a real theater.

I've been to flagship cinemas, e.g. full-blown Atmos system with 5x tri-amplified JBL ScreenArrays, 16x JBL 4645B (2242H 18" drivers) for the screen speakers/front LFE; IIRC the rears/overheads are JBL 9320's and JBL SCS 12's. The speakers are driven by almost 90kW of Crown amplification fed by dbx digital loudspeaker management units. Yes, the impact is amazing. However, there's nothing that cannot be replicated at home (well, other than being restricted to the consumer versions of Atmos)--EXCEPT for the scale (unless you're fortunate enough to be able to install a ~70ft. wide screen at home) and the acoustics of what is no longer a "small room." (Schroder frequency etc.)

(I'm not even sure the above system can actually hit peak reference levels, particularly down to the lowest frequencies. Room volume is probably far above 100,000cu.ft. Limiters would be set in the dbx units to stop overdriving, but it's certainly LOUD.)

And, most auditoria are not fortunate enough to have systems installed that meet the above specs, let alone properly calibrated/tuned like that one is (that cinema is used for premières etc.)

But yes, I definitely agree, an underspecified home system will of course not provide the full "impact" possible.

STC · Feb 18, 2020

ajawamnet said:
But it would only be in the sweet spot of the speakers.
and BACCH4Mac which are similar personal-hrtf systems do include head-tracking which suggests that it's probably rather important, but this ideally requires support from the convolution engine - you can probably do it in a mixer stage that accepts a large number of individual convolution outputs, but that would probably be computationally more expensive.

BACCH is XTC of loudspeakers playback. BACCH developer is one of founder of Ambiophonics Institute. The BACCH is technically better than Ambiophonics because the crosstalk attenuation is said to be around 20dB. However, for concert hall music you don't need that high cancellation as you do not have musicians whispering at your ears. But even with BACCH, the domestic concert hall surround is still used to create the reverbs which will be missing. BACCH is more suitable for extreme XTC so that you can hear the bees right at your ears. Ambiophonics also can produce that effect but this a free option vs $55000 full fledge BACCH just for the crosstalk cancellation. The cheaper version is around $2000. I think WEISS too got a product with BACCH.

How do we perceive “soundstage” and “imaging”?

Major Contributor

Major Contributor

Major Contributor

Major Contributor

Active Member

Active Member

Addicted to Fun and Learning

Addicted to Fun and Learning

Addicted to Fun and Learning

Active Member

Active Member

Addicted to Fun and Learning

Active Member

Active Member

Active Member

Senior Member

Senior Member

Active Member

Addicted to Fun and Learning

Active Member

Similar threads