
A theoretical model for stereo imaging

I'm sitting here listening to my dipoles... Setup is symmetrical along the long wall. Distance between speakers is less than to listener. "Wall" behind the speakers is windows partly diffused by curtains and plants. Stereo image is always between the speakers (except Pink Floyd :cool: ) and virtual mono is coherent, but spaciousness is very pleasant.

I have heard many dipoles in more damped rooms, but the magic doesn't happen. I suspect Linkwitz's recommended setup is suboptimal. The first-arrival wavefront is dominant for frequency balance and imaging (including first-reflection nulls); late arrivals create spaciousness of some kind. Omnipoles or Eickmeier IMP-like speakers send more energy backwards and sideways, so they create wider, more diffuse imaging and are perhaps more sensitive to placement and wall material (I have omnis at my summer cabin but have not tested them at home). The "new" theoretical model of Eickmeier from 1989 is outdated; we know much more about sound perception nowadays (read Griesinger, Lokki et al.).

View attachment 483718 View attachment 483714

View attachment 483715 View attachment 483719
So what would your angles be for direct and reflected sound in the room?
 
I have never tried to measure angles and path lengths. Speakers are rotated about 30 degrees toward the middle. Speaker separation is about 2.5 m, distance about 3 m. Room width is 6-7 m.

If I wanted wider stereo image, I would increase speaker separation, but that would not be practical or nice looking. At home I have two other setups which have normal monopole speakers, but living room is best. Spectral balance is good everywhere, also in the kitchen...

Ps. I hate headphones, but have/had many good ones
 
I do not see the contradiction. "Plain ole horizontal reflections" are exactly what I would call "reflection theory".

I have no idea what a virtual reflection is supposed to be in contrast to a "non-virtual reflection".
There is just a correspondence of "reflections" off a flat wall and "virtual sources" at the symmetry location.
Sorry for my unclear imprecise terminology. As I think though how to model reflections, I see your point about "reflection theory".
 
There is an awful lot of confusion here. And this goes on for another 4 pages? Haven't read those yet, but let me s'plain. I am basically an industrial designer. I didn't work much in that field because they wanted me to participate in the Viet Nam war. So I had a career as an Air Force navigator and instructor in navigation. I got interested in audio early on as a teen but later on in my career I got really sucked in because of the Bose 901. I found that the manual was wrong on speaker positioning, which caused all of the confusion and erroneous opinions from CU and the dealers. So I wrote to Bose and we talked for over an hour on the phone from the factory to my home in England at the time. Since then I visited the factory, had a chat with Dr. Bose, and Joe Veranth, the chief engineer, told me about a technique for drawing the reflected sound from architectural acoustics called the image model, where you draw it as additional sources on the other side of the walls rather than vectors bouncing around. That was so much more visual a method and illustrated what was going on with the reflected sound it started a whole new chapter of my life. How stereo imaging really works and how it can help all audio people if we just look at the basic theory of what it is we are hearing rather than the 2 ears/ 2 speakers confusion ever since Blumlein.
OK enough yipyap. If you are truly interested in this, here is me on video from a BAS presentation. It is kind of long and has interruptions but the little animations are helpful in getting the concepts involved. This is material that you will never find anywhere else - very basic and very conceptual, but important in understanding the difference between field type systems and (binaural) head related systems. Those who make it through please comment or question me about it:


Now realize that I am not an audio engineer and do not have access to Harman International or Bose or Bang & Olufsen to prove all of this systematically and with armies of college students in listening groups, so I wrote the paper to get those who are interested to do the work. I built a few speakers but they didn't all hit anyone over the head so I had Dan Neubecker build a pair for me. They are pretty damn good but I have had more ideas since. I am going to attempt to suggest ways that big companies can construct more elaborate, repeatable proofs of concept but who knows what will happen. I am 82 now and getting a little tired of this. Proving stuff like this requires both listening and construction in larger spaces - the effects may be getting more subtle now with the availability of the open baffle and line source dipoles, tho folks still don't understand why those designs sound so good or how to position them.

Gary
 
I read the original 1989 AES paper and this 2025 article. Before reading, I was also familiar with image modelling and ray tracing techniques from architectural acoustics.

@geickmei I think these are interesting and pull together the right sources, but I don't think the theory works for three reasons.
  1. Image modelling only applies to the statistical region of small rooms, above the transition and modal regions. In the transition region, what we hear is the result of the nonminimum-phase response of wavelengths that approach the dimensions of the room, while the modal region comprises a spatial map of minima and maxima due to the wavelength being much larger than the room dimensions. From an image perspective, it is as if the images in the transition and modal regions are partial, broken and forced both inside and outside the room.
  2. True radiation patterns of physical instruments are incredibly complex, while synthesized sounds are a-spatial... in the sense that they are typically not modelled with a radiation pattern in mind. Digital and hardware reverberation in a studio context make no assumptions about radiation patterns and are used ubiquitously along with other processing tools, not to mention how panning has traditionally been used. Normal microphones record only the pressure (or velocity, or a combination of both) at a specific point on a live stage, but they do not act as spatial samples. In fact, spatial sampling of a physical soundfield requires at least two samples per wavelength at the highest frequency of interest, just as digital audio requires a sample rate of at least 2*f to capture a highest frequency f. I'm sure this number can be rigorously reduced for space using a psychoacoustically correct mathematical model, as we have seen succeed in audio compression algorithms.
  3. The requirement to have significant space behind and around the speakers (outside of creating trouble with SBIR) is highly impractical, whether at home, in a studio or in most performance venues.
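To put a rough number on the spatial sampling point in item 2, here is a minimal sketch. It assumes the usual half-wavelength criterion for a sampled sound field and a speed of sound of 343 m/s; the function names are made up for illustration and do not come from the paper or the article.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C


def max_sample_spacing_m(f_max_hz, c=SPEED_OF_SOUND_M_S):
    """Largest spacing that still gives two samples per shortest wavelength."""
    shortest_wavelength_m = c / f_max_hz
    return shortest_wavelength_m / 2.0


def min_sample_rate_hz(f_max_hz):
    """Temporal counterpart: the sample rate must be at least twice f_max."""
    return 2.0 * f_max_hz


if __name__ == "__main__":
    for f_max in (200.0, 2000.0, 20000.0):
        spacing_mm = 1000.0 * max_sample_spacing_m(f_max)
        print(f"f_max = {f_max:7.0f} Hz: spacing <= {spacing_mm:6.1f} mm, "
              f"sample rate >= {min_sample_rate_hz(f_max):7.0f} Hz")
```

At 20 kHz the spacing comes out under a centimetre, which is one way to see why a handful of stage microphones cannot be treated as spatial samples of the field.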
As a sort of related side note, there have been past attempts to use speakers in place of instruments in an orchestra. Two come to mind, the most famous being the Acousmonium by François Bayle of the Groupe de Recherches Musicales in France (I believe the stage was called a performance or projection area for sound) and the other by Tapio Lokki at Aalto University in Finland. Both were for research, the latter hard science, while the former concerned composition and humanistic study.

The key undeveloped aspect of the texts that inform stereo theory, for me, is why stereophonic sound is so perceptually satisfying when circumstances are right despite presenting so many flaws under investigation. Stereo's requirements are also fairly low and resilient compared to the more complex multichannel, binaural or immersive audio. One of the consequences of having no perceptual model is that we don't have clear guides on assessing or optimizing rooms or speakers. There is lots of really good advice, but it is filled with divergences and contradictions (stereo vs. mono bass, narrow vs. wide vs. multipolar speaker radiation, absorptive vs. reflective room design).
Curvature - You have succeeded in making this whole trip so confusing we will never come out of the other side. Yet within your ride the loudspeaker orchestra is the most illustrative way of explaining it that I have come up with. Check out my Mars paper:


Gary
 
Best compliment I've gotten in a long time.

I'll read that paper.
 
This is a really good point. IMT can be adjusted to have slightly different explanatory principles below the transition region. That's one objection down.

On my other two objections, my final comments about the lack of a perceptual model concern something I've been thinking about for a long time:
  • Recordings can sound really good, even though they are compromised in a spatial sense.
  • Speakers can sound really good, even when suboptimally positioned or designed.
IMT doesn't have enough explanatory power to address either point, I think.

Somewhat separately, I'd like to refer to architectural acoustics, where this modelling method is used.

View attachment 483195
There are applications (for reverberation modelling) where a 3D model is used, and what's seen is a multiplication of images that accounts for later and later reflections. The initial model ends up exploding in complexity when you start drawing mirror images of virtual sources as well.

View attachment 483193

View attachment 483198

I think IMT acknowledges this complexity in the articles implicitly, but certainly doesn't dive into it. There is probably something here worth thinking about. Not sure exactly what at the moment.
 
I don't think speakers designed for use in an Image Model Theory set-up are currently on the market, so at this point there would be some DIY involved.

And you bring up a good point. I would expect some EQ of the deliberately reflective energy to be beneficial, as the walls may not produce "perfectly flat frequency" reflections. And in some rooms the walls may simply be too absorptive, or non-existent on one side, or whatever.

As for how much leeway there is in the L R symmetry, I don't know.
Duke I think you understand that IMT requires a whole new look at speaker design. It should be clear that we will be equalizing for the power response in the room, taking into account the whole room response rather than the axial response alone. We will also design controls for the D/R ratio to match the imaging to the room. I wrote all of that, but it is all getting so unnecessarily confusing that everyone can't hold all of the principles in mind at one time.

Another important point, on the question about mirror images of the mirror images going into infinity, I did an analysis of how many times specular reflections can go on and found that by the time they get to the 3rd reflection they are so distant and diffuse they are nonexistent. We are concerned with only the earliest reflections in imaging.
 
View attachment 483195
There are applications (for reverberation modelling) where a 3D model is used, and what's seen is a multiplication of images that accounts for later and later reflections. The initial model ends up exploding in complexity when you start drawing mirror images of virtual sources as well.

View attachment 483193

View attachment 483198

I think IMT acknowledges this complexity in the articles implicitly, but certainly doesn't dive into it. There is probably something here worth thinking about. Not sure exactly what at the moment.

This is pure silliness. Sound does not reflect that way. Maybe in a reverberation chamber but not in reality. This is like those silly vector drawings with arrows going everywhere.

By the time you get to the 3rd reflection the sound would have gone far enough and been diffused and absorbed so much that it would be gone. These fellows who make these drawings don't seem to know about reverb time, in which the sound is gone by about 300 or 400 ms. Probably halfway through that time (200 ms) it would be diminished enough to have no influence on audibility - especially of imaging. Think about it.

Gary
 
With such questions about the reproduction I always look to the live sound fields for the answer. What are all of the acoustic qualities of the live sound that we can refer to for the answer? CERTAINLY live sound does not come from two points in space spouting out a narrowly dispersed stream of direct sound toward your ears. That would be absurd, but here we are in 2025 thinking just that.

There are obviously different approaches to reproducing music. The approach you seem to be fully committed to is the one where your listening room acoustics play a big part in the overall sound reaching the listener's ears, as if the music event takes place in your listening room. This is the "they are here" approach, as in the musicians are playing in your room, as opposed to the "you are there" approach, in which you as the listener are "transported" to the acoustic space of the venue where the recording took place. (Some people seem to mix up the "they are here" and the "you are there" approaches.)

In the "you are there" approach, it's not the job of the loudspeakers to “mimic” the sound dispersion characteristics of the live sound field (or the dispersion pattern of different instruments, for that matter). It is up to the recording engineers to do their best to capture that live sound field as completely as possible, using a suitable placement of several microphones with suitable polar pattern characteristics.
Once the recording has captured the live sound field, most of that sound field, including all of its directional information, will come through with the direct sound, provided that the direct sound is the dominating sound reaching the listener's ears.

What you seem to be missing with the latter described method of "you are there" (with a dominating direct sound) is that the listener will hear "into" the stereo phantom image, which contains all kinds of directional cues of the live sound field. In other words, the listener will not just hear "live sound coming from only two points in space" as you describe it, as the phantom image will contain all sorts of directional cues appearing to be coming from different angles within that stereo phantom field, and many times even appear to be coming from points way outside the position of the loudspeakers.
 
Goat, if you would please read my article you could see how I cover this duality of methods or systems. Jens Blauert states it very precisely as field type vs head related, reconstructing a "synthetic sound field" modeled after the real thing vs binaural. Stereo played without cancelling the crosstalk can be very good but the sound is limited to between the speakers. I think what you are thinking of is loudspeaker binaural, which is not stereo. The larger discussion is in my article.

Gary
 
By the time you get to the 3rd reflection the sound would have gone far enough and been diffused and absorbed so much that it would be gone. These fellows who make these drawings don't seem to know about reverb time, in which the sound is gone by about 300 or 400 ms. Probably halfway through that time (200 ms) it would be diminished enough to have no influence on audibility - especially of imaging. Think about it.

Seems to me there will be quite a few audible reflections within that 200 milliseconds you mention.

My understanding is that the mean free reflection path length can be calculated by this formula:

Mean free reflection path length = 4x(room volume)/(room interior surface area)

For example if the room is 20 feet by 15 feet by 8 feet, we get 4x2400/1160 = about 8.3 feet for the mean free reflection path length. This figure would be reduced if we took the surface area of large objects in the room into account.

Sound travels about 225 feet in 200 milliseconds, and 225/8.3 = 27 reflections within the first 200 milliseconds.
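For anyone who wants to rerun the arithmetic above with their own room, here is a minimal sketch. It assumes an empty rectangular room and a speed of sound of 1125 ft/s (the round figure behind "about 225 feet in 200 milliseconds"); the function names are just for illustration.

```python
SPEED_OF_SOUND_FT_S = 1125.0  # assumption: round figure for room-temperature air


def mean_free_path_ft(length_ft, width_ft, height_ft):
    """Mean free path 4V/S for an empty rectangular room."""
    volume = length_ft * width_ft * height_ft
    surface = 2.0 * (length_ft * width_ft
                     + length_ft * height_ft
                     + width_ft * height_ft)
    return 4.0 * volume / surface


def reflections_within(time_s, mfp_ft, c=SPEED_OF_SOUND_FT_S):
    """Average number of wall hits along one sound path in time_s seconds."""
    return c * time_s / mfp_ft


if __name__ == "__main__":
    mfp = mean_free_path_ft(20.0, 15.0, 8.0)
    hits = reflections_within(0.200, mfp)
    print(f"mean free path: {mfp:.1f} ft")
    print(f"reflections in 200 ms: {hits:.0f}")
```

This is a statistical average over all paths, so it says how often a typical path meets a surface, not how loud any given reflection still is.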

I agree that only the first few will have a significant effect on image location, but apparently the remaining twenty-something will still be audible, and so presumably will be contributing something, perhaps influencing spaciousness and/or timbre.

Am I missing something?
 
Hi Duke -

Not a bad analysis but what do you think it shows us? I have been drawing pictures of such an analysis - just the horizontal plane ones, the axial and tangential:

1763420052939.png


This is the image model of the 3rd reflections in a reverberation chamber, showing the first, second, third, and fourth axial and tangential ones and how far away they get to be, let alone how diffused, absorbed, and fallen off they would be in that time in a normal room. The image model technique very clearly shows how many bounces, what directions, and how far travelled. For example, measure the distance from any blue dot to the listener with a dividers and you get travel time, count the number of walls the blue line goes through and that is the number of bounces. Lay the distance travelled against a lapse rate table for sound travel through air and see approx how much it loses even in a reverb chamber.

We find that the 1st reflections in this particular size room (20 x 35 ft) travel 20 to 30 ft, 2nd reflections 37 to 60 ft, and 3rd reflections 50 to 90 ft.
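The dividers-and-drawing procedure described above can also be sketched in code. This is a minimal horizontal-plane image model for a rectangular room with perfectly specular walls and no absorption or diffusion. The speaker and listener positions, the choice of which dimension is the width, and the 1125 ft/s speed of sound are all assumptions made for illustration, not values taken from the drawing.

```python
import math
from itertools import product

SPEED_OF_SOUND_FT_S = 1125.0  # assumption: round figure for room-temperature air


def image_sources(src, width, depth, max_order):
    """Return (order, (x, y)) for every image of src up to max_order bounces.

    The room spans 0..width in x and 0..depth in y. Cell (m, n) of the
    mirrored-room lattice holds one image, and |m| + |n| is the number of
    walls the straight line to the listener crosses, i.e. the bounce count.
    """
    sx, sy = src
    images = []
    for m, n in product(range(-max_order, max_order + 1), repeat=2):
        order = abs(m) + abs(n)
        if order == 0 or order > max_order:
            continue  # skip the real source and anything beyond max_order
        # Odd cells are mirror-flipped copies of the room, even cells are not.
        x = m * width + (sx if m % 2 == 0 else width - sx)
        y = n * depth + (sy if n % 2 == 0 else depth - sy)
        images.append((order, (x, y)))
    return images


def path_summary(src, listener, width, depth, max_order):
    """Map each reflection order to (shortest, longest) path length in feet."""
    summary = {}
    for order, pos in image_sources(src, width, depth, max_order):
        d = math.dist(pos, listener)
        lo, hi = summary.get(order, (d, d))
        summary[order] = (min(lo, d), max(hi, d))
    return summary


# Hypothetical layout: 20 ft wide, 35 ft deep, speaker 5 ft from the front wall.
WIDTH_FT, DEPTH_FT = 20.0, 35.0
SPEAKER = (6.0, 5.0)
LISTENER = (10.0, 15.0)

if __name__ == "__main__":
    summary = path_summary(SPEAKER, LISTENER, WIDTH_FT, DEPTH_FT, 3)
    for order, (lo, hi) in sorted(summary.items()):
        print(f"order {order}: {lo:.1f} to {hi:.1f} ft, "
              f"{1000 * lo / SPEED_OF_SOUND_FT_S:.0f} to "
              f"{1000 * hi / SPEED_OF_SOUND_FT_S:.0f} ms")
```

Because the positions are guesses, the printed ranges should not be expected to match the distances quoted above; the point is only that each image's distance to the listener gives the travel path and the lattice index gives the number of bounces.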

For the gain lapse rate if we assume a start loudness of 60 dB after bouncing 10 ft off a 5 ft away front wall, then after the first reflection they are between 43 and 40 dB, the 2nd reflection is 38 to 34 dB, and our 3rd reflection is between 36 and 31 dB.

By the 4th reflection it has bounced so many times there is nothing left of it to measure - especially in a normal room!

Now consider that for EACH face of each loudspeaker there are 4 first reflections, 8 second reflections, 10 third reflections, 14 4th reflections... and that is just axial and tangential, not considering the oblique because we are concerned mainly with the horizontal.

What all of this means to me is that in a normal room the main reflections that we can hear distinctly that affect imaging are the first and the corner secondaries from the front of the room. That is confirmed by listening. The soundstage expands in depth and width with an ample supply of first and second reflections from the rear of the radiation pattern. You have experienced this with dipoles and open baffle, even if you were never a 901 owner.

I learned the hard way that we also don't want a strong sidewall reflection with two channel, because it pulls the whole soundstage to one side as you go off center.

This has been IMT 101, the beginning of how we should be filling a room with direct and reflected sound to build a soundstage with the direct and reflected sounds that were recorded. The image modeling technique shows us visually and listening confirms the spatial nature of sound in rooms. We MUST LOSE the binaural confusion of sending two direct sound channels to the ears, thinking that it will form the image psychoacoustically in the "earbrain."

Gary
 
This is pure silliness. Sound does not reflect that way. Maybe in a reverberation chamber but not in reality. This is like those silly vector drawings with arrows going everywhere.
What do you mean? These models are used in architectural acoustics to calculate the early reflections and reverberant tail. They get far more complicated and, what's more, they work.

It's the complications that matter, and get closest to "reality", and I don't think you want to acknowledge them in a serious way.
 
We MUST LOSE the binaural confusion of sending two direct sound channels to the ears, thinking that it will form the image psychoacoustically in the "earbrain."
Why? I can see that IMT could be beneficial for considering how speaker placement, directionality and treatment can be optimized to create spaciousness, but you are still using stereo to create localization. IMT doesn’t replicate the sound field of the original space, in the relatively rare cases it existed at all.
 
Not a bad analysis but what do you think it shows us?

I think it shows us that the in-room reflection energy remains audible (and, therefore, arguably, of perceptual significance) for a lot longer than those first few reflections.

We find that the 1st reflections in this particular size room (20 x 35 ft) travel 20 to 30 ft, 2nd reflections 37 to 60 ft, and 3rd reflections 50 to 90 ft.

That's a pretty big room so I think it would be fair to assume an RT60 of 0.4 seconds, or 400 milliseconds. During that time, the sound waves will have travelled about 450 feet.

Your description of the first three reflections indicates a total path length of 50 to 90 feet, and I'm not going to do the analysis but that fourth reflection's total path length might be somewhere around 80 to 120 feet. 120 feet would take about 107 milliseconds. If RT60 is a good metric, the implication is that we still have another roughly 300 milliseconds to go until the reflections fade into insignificance.

That being said, I do concede that RT60 is a questionable metric for small rooms because (if I understand correctly) the reflection field is not truly diffuse, but I think it tells us something useful about how long it takes the net in-room reflection energy to decay into inaudibility.
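As a rough cross-check of the numbers in this post, here is a minimal sketch that converts path length to travel time and estimates how far the overall reverberant level has dropped by then. It assumes an idealized decay that is linear in dB (60 dB over one RT60), which is exactly the diffuse-field assumption conceded above to be questionable in small rooms, plus a speed of sound of 1125 ft/s. The function names are illustrative only.

```python
SPEED_OF_SOUND_FT_S = 1125.0  # assumption: round figure for room-temperature air


def travel_time_ms(path_ft, c=SPEED_OF_SOUND_FT_S):
    """Time in milliseconds for sound to cover path_ft feet."""
    return 1000.0 * path_ft / c


def decay_db(t_ms, rt60_ms):
    """Drop in reverberant level after t_ms, assuming a decay linear in dB."""
    return 60.0 * t_ms / rt60_ms


if __name__ == "__main__":
    RT60_MS = 400.0  # the figure assumed in the post
    for path_ft in (50.0, 90.0, 120.0, 225.0, 450.0):
        t = travel_time_ms(path_ft)
        print(f"{path_ft:5.0f} ft -> {t:5.0f} ms, "
              f"reverberant level down about {decay_db(t, RT60_MS):4.1f} dB")
```

Note that this tracks the net decay of the whole reflection field, not the level of any single reflection path, so it is not directly comparable to per-path distance losses.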

For the gain lapse rate if we assume a start loudness of 60 dB after bouncing 10 ft off a 5 ft away front wall, then after the first reflection they are between 43 and 40 dB, the 2nd reflection is 38 to 34 dB, and our 3rd reflection is between 36 and 31 dB.

I haven't checked but will assume that's true for discrete reflection paths. But as time goes on the number of discrete reflection paths is also increasing, so the net in-room decay is not as rapid as it would appear from focusing on decay-with-distance for a single reflection path.

By the 4th reflection it has bounced so many times there is nothing left of it to measure - especially in a normal room!

Perhaps for that individual reflection path, but the energy in that first reflection continued to spread and was distributed between multiple reflection paths at the second bounce, and so on with each successive bounce. So if we want to see what happened to the energy in the first bounce, imo we need to include the energy in the succeeding generations of bounces.

If I have misunderstood you, please correct.

JUST TO BE CLEAR: My skepticism about how quickly the in-room reflection energy ceases to be of audible significance does not affect my admiration for the innovative exploitation of phantom sources in your approach to loudspeaker/room interaction.
 
Duke, I am sure you realize that the above model was in a reverberation chamber, or else it would be nonsense. Normal listening rooms have enough absorption to suck the reverb time down to your figure of less than 400 ms. My drawing was also of the horizontal plane only, but in reality there are many more reflections contributing to the reverb time - the oblique. These reflections need to be managed to make patterns within the small listening room that are similar to those in the original. This is IMT - comparing the repro patterns and ratios to those of a typical original performance space. THAT is what we will be studying when we get rid of the binaural confusion theory with its strong direct field from two points in front of you in a "stereo triangle" that thinks it is a pair of giant headphones.

Gary
 
That being said, I do concede that RT60 is a questionable metric for small rooms because (if I understand correctly) the reflection field is not truly diffuse, but I think it tells us something useful about how long it takes the net in-room reflection energy to decay into inaudibility.

I wondered about that, so I decided to test it. Answer: IMO reverberant fields do exist, but only above a certain frequency.
 