Stereo Crosstalk Elimination (reduction) Par Excellence!

Oh, so you just modified your Mac Mini and are playing through 3 speakers -R; L+R; -L

Nice! Do you have a 4th speaker and amp channel? If so, you could copy it completely. -R L; very wide space in between; then R -L

Where the inverted channels are on the outside. Also, you would take the inverted channels and frequency limit them (my guess is about 400Hz to 5000Hz).

I think a 4 speaker set would provide much more flexibility than the Polks because the speakers could be time aligned, so they would not need to be so close together and would allow some toe-in.
I'm using 5 speakers and 7 amp channels. The three center speakers are limited to 900 Hz and up. The corner horns are spaced widely and play 900 Hz and lower in standard stereo. They have midbass horns playing from 200 to 900 Hz, and bass horns playing below 200 Hz. Testing the imaging using panning, I've found I've had to put a considerable delay on the center, equivalent to about 1.5" of extra distance, to get the panning to move hard left and right without stray sounds coming from the wrong direction. The problem is, I like the tone a little better when the imaging isn't as clear. I'll have to keep playing around with it. Maybe I can EQ it back. There's something definitely more atmospheric about this sound. My normal setup with L-R and R-L in the side channels makes the center very clear and hard, which is what I wanted. This makes it a little more dreamy and spacey, while still being very clearly located. It is sensitive to head position, like before. If you're a little off axis the center image can move around. If you're far enough off axis it snaps back to the center.
 
Nice! What do you use 3 center channels for? Or are some of the center channels those bass or midbass horns?

I think you posted a pic a few pages back. Is that still the setup?
 
The 3 center channels are BL409 horns with JBL2426H drivers. The tweeters are in the center. Everything else is in the room corners. I got the tone figured out. 5 sample delay @ 48kHz sample rate seems to work really well.
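For anyone wanting to relate the sample delay to physical distance, here is a rough sketch of the conversion. The speed of sound (343 m/s) and the function names are assumptions for illustration, not something taken from the setup described above.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C
INCH_M = 0.0254


def delay_samples_to_inches(samples, sample_rate_hz=48_000.0):
    """Distance sound travels during `samples` sample periods, in inches."""
    seconds = samples / sample_rate_hz
    return seconds * SPEED_OF_SOUND_M_S / INCH_M


def inches_to_delay_samples(inches, sample_rate_hz=48_000.0):
    """Number of sample periods sound needs to travel `inches`."""
    return inches * INCH_M / SPEED_OF_SOUND_M_S * sample_rate_hz


print(round(delay_samples_to_inches(5), 2))    # 5 samples at 48 kHz, in inches
print(round(inches_to_delay_samples(1.5), 2))  # 1.5 inches, in samples at 48 kHz
```

Under that assumption, 5 samples at 48 kHz works out to about 1.4 inches, which is close to the roughly 1.5" figure mentioned earlier.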
 
I just added a volume slider to control the amount of in-phase information on each side channel. -R, L+R, -L is too weak in the middle for most music. It's real spacey but a little too spacey. Center panned vocals sound a little lost in space. My preference varies depending on what I'm listening to, but I think something like 0.7L-R, L+R, 0.7R-L is really nice on just about everything. It may be because of my squished listening environment making the tweeters too close to me that I need to soften that center just slightly. It also takes a stridency out of massed strings that was bothering me. If I move my head to the left or right, the center image follows me a little but it mostly stays close to the center and sounds solid without sounding too forward. I'm going to call this an improvement! However, this only produces a significant effect if the delay is put on the center channel. Without the delay both mixes sound very close to the same.

A little more listening and I'm discovering that the effect can be dramatically different on different recordings. For instance, listening to The Smiths' "Half a Person", the apparent imaging width is dramatically increased by turning down the in-phase info on the side channels with the time delay on. With Air's "Cherry Blossom Girl" I get a very wide and immersive effect with the in-phase info on the side channels turned up all the way and the center time delay turned off. The tone is solid and clean too, but if I adjust it like I did for The Smiths it actually degrades the tone and weakens the spatial effects.
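As a minimal sketch of the mix described above, the slider can be thought of as a single gain on the in-phase part of each side channel. The function and parameter names here are made up for illustration, and the center delay is left out.

```python
import math


def three_speaker_mix(left, right, in_phase_gain=0.7):
    """Return (left_side, center, right_side) feeds for one stereo sample pair.

    in_phase_gain = 0.0 gives -R, L+R, -L
    in_phase_gain = 1.0 gives L-R, L+R, R-L
    """
    left_side = in_phase_gain * left - right
    center = left + right
    right_side = in_phase_gain * right - left
    return left_side, center, right_side


def gain_to_db(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


print(three_speaker_mix(1.0, 0.0, in_phase_gain=0.7))  # hard-left signal
print(three_speaker_mix(1.0, 1.0, in_phase_gain=0.7))  # center-panned signal
print(round(gain_to_db(0.7), 2))                       # slider setting in dB
```

A slider setting of 0.7 is roughly a 3 dB cut of the in-phase signal on each side channel.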
 
That’s how Polk has them. Approx 3dB per speaker lower for the inverse outer speakers.
 
Good to know. I understand why. It gets kind of crazy at full volume. However, it can be compelling on some recordings. My last impression was that it takes the center 1/3 or so of the soundstage and stretches it across the room. So for recordings that don't have a lot of width it works like a stereo width expander.
 
I made an embarrassing discovery today. In the -R, L+R, -L mode I had the left and right channels swapped. I then added 5 samples of delay to the center to get the cancellation straightened out and thus put the imaging back on the correct sides. That excess delay was causing all the coloration and distance for the phantom center. Now that I've got that straightened out, both modes sound just about the same, and no center delay is required, although the imaging sounds a little better at 1 sample delay.

I think the -R, L+R, -L might be slightly superior overall to my original proposition. It takes some forwardness off the phantom center, and the side imaging seems more focused. It's also a wider soundstage overall. Its main oddness is that if I move slightly off center the phantom center follows me for a couple of feet before moving back into the middle, somewhat stretched. The phantom center is basically more sensitive to listening position. I'm running all signals at 100%. No need now for cutting back 50%.
 
Nice! It can be done totally analog then with a single signal combiner. The “sample” delay is simply moving the speaker back by an inch or two. Then a 3 channel amp. If gain control is needed, that could be done with an amp with a signal attenuator or an inline plug.
 
Indeed! Just move the center speaker back. I'd do that except it's physically challenging at the moment. Yes, totally doable full analog.
 
Sorry if previously discussed (I skipped several pages)...

In the 3 speaker setup, shouldn't the centreline of each outer speaker be E from the centreline of the centre speaker? I.e. the centrelines of the two outer speakers should be 2E apart, where E is the total ear separation.

In the first page it was said that the outer speakers were one ear spacing apart, which doesn't seem right for equal path lengths.

1000029108.jpg


Apologies if that is what was meant all along.
 
Thanks for the drawing. Yes! If I said the outer speakers were one ear spacing apart, that's wrong. They should be as you have drawn here. What's surprising is how well it continues to work when they are spaced further apart if you get far enough away from the array. If you get too far it collapses to mono. So with 1 foot center to center spacing between the drivers (2 feet from center left to center right) you typically get a pretty good result sitting about 8 feet back. It also works down to lower frequencies with the wider spacing. That's with the L-R, L+R, R-L array. With the -R, L+R, -L array I don't know yet how it differs. So far with my current spacing, which is wider than E in your drawing, I've found that a slight delay on the center seems to help this setup. So, it may be that this would work best if my horns weren't so big so I could get them closer together, as shown in your drawing.
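The equal path length point can be checked with a little geometry. This is only a sketch under simplifying assumptions: point ears on a flat 2D plan with no head in between, a 6 inch ear spacing, and a speed of sound of 1125 ft/s. All names and values are illustrative, not measured.

```python
import math

SPEED_OF_SOUND_FT_S = 1125.0  # assumed, roughly 343 m/s


def path_difference_ft(outer_offset_ft, listening_distance_ft, ear_spacing_ft):
    """Extra path from an outer speaker to its near-side ear,
    relative to the path from the center speaker to that same ear."""
    half_ear = ear_spacing_ft / 2.0
    center_path = math.hypot(listening_distance_ft, half_ear)
    outer_path = math.hypot(listening_distance_ft, outer_offset_ft - half_ear)
    return outer_path - center_path


def feet_to_samples(feet, sample_rate_hz=48_000.0):
    """Travel time over `feet`, expressed in sample periods."""
    return feet / SPEED_OF_SOUND_FT_S * sample_rate_hz


EAR_SPACING_FT = 0.5  # assumed ear separation E (6 inches)

# Outer speakers one ear spacing E from the center speaker: paths match.
print(path_difference_ft(EAR_SPACING_FT, 8.0, EAR_SPACING_FT))

# 1 foot center-to-center spacing, listening from 8 feet back.
extra = path_difference_ft(1.0, 8.0, EAR_SPACING_FT)
print(round(extra * 12.0, 2), "inches,", round(feet_to_samples(extra), 2), "samples")
```

In this simplified model the outer path is longer once the spacing exceeds E, which would be consistent with a small delay on the center helping at wider spacings.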
 
I remembered your setting while doing a simple simulation that STC asked me to do by e-mail.
This is just a simulation and will definitely depend on the filters or settings that apply.

1726714982875.png


Dummy head LL, LR (30 degrees, 229.2 µs ITD)

1726715019138.png


And highlighted response is 0 degree, mono center.
I don't remember your settings exactly, and I have no knowledge of XTC in this array way because I've never done it myself. I just thought of it, so please understand.

According to my intuition, the center and the front are aligned, and I think we should add a little delay then, because it's to cancel LR, not LL.
So I'm going to add a delay + reverse phase to the mono.

In reality, if you shoot a pure signal and an inverse phase signal and record it, you can simply check it out, but it gets a little complicated when both ears are separated like this.
So I'm going to play tricks temporarily.

1726715216867.png

First of all, we have added a delay to the center.


1726715266786.png


And I'm going to put it together. (simultaneous playback)


1726715316928.png


And by subtracting the original LR and LL from the created sum response, each modified LL,LR can be estimated when the monocenter is added in the inverse phase by considering the time.


1726715373220.png


And here's the result.

1726715487353.png


Change in ILD (purple is the original 30-degree stereo, blue is with the mono inverse phase added).
The big peak that regularly appears at 4k may be due to me mis-estimating the ITD (should I not have delayed it? I don't know).
There seems to be a slight decrease in ILD.
And maybe it's because it's a simulation of putting the original signal as it is without any special filtering.
I didn't have any specific purpose and I remembered your setting, so I did it. Of course, I expect it to be more accurate if you measure it yourself with a microphone in your ear.
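For readers who want to play with the idea numerically, here is a toy sketch of how ILD could be computed before and after adding a delayed, inverted mono center to both ears. It is not the simulation used for the plots above. The far-ear attenuation, the center level, and the assumption that the center reaches both ears with the same delay are all made up for illustration; only the 229.2 µs ITD comes from the post.

```python
import cmath
import math

ITD_S = 229.2e-6        # interaural time difference quoted in the post
HEAD_SHADOW_GAIN = 0.5  # assumed far-ear attenuation (frequency independent)


def response(taps, freq_hz):
    """Complex frequency response of a list of (delay_seconds, gain) taps."""
    return sum(gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
               for delay_s, gain in taps)


def ild_db(near_taps, far_taps, freq_hz):
    """Interaural level difference in dB: near-ear level over far-ear level."""
    near = abs(response(near_taps, freq_hz))
    far = abs(response(far_taps, freq_hz))
    return 20.0 * math.log10(near / far)


LL = [(0.0, 1.0)]                 # left speaker to left ear
LR = [(ITD_S, HEAD_SHADOW_GAIN)]  # left speaker to right ear


def ild_with_inverted_center(freq_hz, center_delay_s, center_gain=0.3):
    """ILD after adding an inverted mono center (assumed equal at both ears)."""
    center = (center_delay_s, -center_gain)
    return ild_db(LL + [center], LR + [center], freq_hz)


for f in (500.0, 1000.0, 2000.0, 4000.0, 8000.0):
    print(f, round(ild_db(LL, LR, f), 2),
          round(ild_with_inverted_center(f, ITD_S), 2))
```

Because every value apart from the ITD is assumed, the numbers it prints should not be expected to match the measured-response plots. It only shows the mechanics of summing the responses and taking the level ratio.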
 
Thank you! I'll have to look this over more closely to make sure I understand what you've done, but those peaks on the blue line look familiar. The effect is a sort of shimmery brightness on things panned to the left or right. It's a complex mess! The center sounds good. I enjoyed it but after comparing again with a physical barrier I concluded it's still a good way off from that standard, and has some problems that regular 2 speaker stereo doesn't. My last effort involved just turning off the mix up there where those peaks start happening.
 
My last effort involved just turning off the mix up there where those peaks start happening.
=( o_O

but those peaks on the blue line look familiar.
Yes. I don't know exactly what that is triggered from because I don't have experience with arrays like I said.

However, I remember that papers often cover multi-array XTC, because systems such as arrays, although expensive, are somewhat more stable than DSP processing.
So I think it's probably a matter of my simulated setup.
 
Wondering if tweaking this can optimize cancellation?

 
I'm pretty sure that phase manipulation could improve the performance. Head tracking too, just like it makes other XTC methods work better.
 
I suppose because you're in the digital domain and not worried as much about latency (syncing to video), you could do that with a free plugin instead of a $400 box.
 
I have the tools I need to experiment with doing phase adjustments. For now I can't because I don't have a 3 speaker array running. I've set up my system in a very wide standard stereo arrangement. It's got some benefits, and to my surprise the phantom center sounds better to me with very wide spacing than with a more conventional listening triangle, which to my ears seems to be the worst case situation. I think the 3 speaker system might be really good for some situations. My main room is awkward for using it, pushing me up against the back wall, and requiring an acoustic panel behind my head and a thick wool rug on the floor to really make it work well. However, I think it might work well in my living room so I should try it out there. Problem is, I need another computer, and that has to be added to the signal chain. So it's inconvenient. For now I'm just using the TV audio out there, and for how that system is used it seems fully adequate.
 
Hello @Tim Link Happy new year.
I’m not sure if this is entirely relevant to this thread, but since it’s ultimately about crosstalk, I’ll keep it brief.

1736326536401.png


Setting up a physical barrier so that the left channel is heard only by the left ear and the right channel only by the right ear results in the opposite ear missing what it’s supposed to hear. It feels as though all opposite ear information, including head shadowing, is entirely muted.

On the other hand, using arrays or back-and-forth signal cross-repetition (both universal and personalized), the negative aspects are minimized while retaining what the right ear must hear from the left channel (preserving the speaker-like identity).

And I suddenly had the thought today—what if crosstalk cancellation were applied only to the opposite ear?
Yes, of course, this is impossible in a real-world setting (arrays, XTC-type software, etc.).
But as you know, I’m a BRIR user, and since I can manipulate each impulse individually, I wanted to test this out. So, I opened the files of another user I’ve been helping recently.
It might sound confusing, but I’ll try to keep it as brief as possible.

1736326006978.png


This is the response of a certain user. Naturally, the impulse includes ITD,
and you can see the left/right channel responses for the left and right ears. ILD is also present.

1736326054035.png


And this is the crosstalk, including head shadowing, derived from the right speaker's right ear and the left speaker's right ear /// the left speaker's left ear and the right speaker's left ear.

1736326348266.png


The yellow graph represents the original left speaker's right ear response, and the dark blue graph shows the response after cancellation.
I also checked how the combined left and right responses would look, as it would be heard in a real-world scenario.

1736326374621.png


Since the user's file is not from a 30-degree stereo setup, the shape of the dip is slightly different, but you can still see the typical dip caused by crosstalk, as Dr. Toole also mentioned.

1736326408819.png


The added red graph represents the response when the cancellation signal is applied accurately only to the opposite ear and then combined. The dip has disappeared.

This is something that’s impossible in reality, but what I did was precisely remove only the negative crosstalk from each opposite ear response.

I’d like to hear various perspectives on the results of this test.
Since this isn’t my file, I couldn’t accurately judge the sense of localization when listening, but with pink noise, I could clearly hear a difference in tonal balance. However, the overall image didn’t change significantly. When listening binaurally, it seemed to come closer to the ears, but this is something I’ll need to test later with my own file.

In my opinion, this demonstrates that accurately personalized crosstalk cancellation was performed with just a single cancellation step, without any unnecessary signals, and this is also reflected in the combined response.
However, the auditory impression of this feels different from typical signal crossovers or array-based crosstalk cancellation.
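The dip and its disappearance can be illustrated with a very simple model. This is a sketch only: it treats the opposite-speaker arrival as a single delayed, attenuated copy of the direct sound, uses the 229.2 µs ITD quoted earlier in the thread for a 30 degree source (the file discussed here is from a different angle), and assumes a frequency-independent crosstalk gain of 0.5.

```python
import cmath
import math

ITD_S = 229.2e-6      # 30 degree value quoted earlier in the thread
CROSSTALK_GAIN = 0.5  # assumed frequency-independent head shadow


def ear_level_db(freq_hz, crosstalk_gain, itd_s):
    """Level at one ear for a center-panned mono signal from two speakers:
    direct sound plus the delayed, attenuated sound from the opposite speaker."""
    h = 1.0 + crosstalk_gain * cmath.exp(-2j * math.pi * freq_hz * itd_s)
    return 20.0 * math.log10(abs(h))


# First frequency where the two arrivals are half a cycle apart.
dip_hz = 1.0 / (2.0 * ITD_S)

print(round(dip_hz, 1))
print(round(ear_level_db(dip_hz, CROSSTALK_GAIN, ITD_S), 2))  # with crosstalk
print(round(ear_level_db(dip_hz, 0.0, ITD_S), 2))             # crosstalk removed
```

With these assumed values the dip lands a little above 2 kHz, and setting the crosstalk term to zero leaves a flat response, which mirrors what the red graph shows.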
 
Setting up a physical barrier so that the left channel is heard only by the left ear and the right channel only by the right ear results in the opposite ear missing what it’s supposed to hear. It feels as though all opposite ear information, including head shadowing, is entirely muted.
Yes, and this can make the sound stage abnormally wide on some recordings, especially if they are purely level panned. Hence my thinking that an idealized crosstalk cancellation scheme would heavily weight the crosstalk reduction to center panned sounds, and reduce its power for sounds panned harder left or right.
On the other hand, using arrays or back-and-forth signal cross-repetition (both universal and personalized), the negative aspects are minimized while retaining what the right ear must hear from the left channel (preserving the speaker-like identity).

And I suddenly had the thought today—what if crosstalk cancellation were applied only to the opposite ear?
Sorry if I'm missing the point, but I thought it was only applied to the opposite ear! It just has to be done over and over again because both ears pick up the correcting signal, unless a local wearable device was used right at each ear to correct the crosstalk. But in this case, if it worked perfectly, then still the opposite ear would hear nothing from that channel, so it'd just be simpler because the correction signal would not have to be recursive. I still think we have to find a way to reduce the effect as sounds are panned further left or right. I don't know how to do that other than to analyze the signal and break it down by panned direction, similar to what Dolby Pro-Logic up-mixing does. The best solution seems to be a very pure up-mixer of some kind that can turn 2 channels into a larger number of channels across the front of the room. 7 channels across the front I think would be great. Not surround sound, but instead a front sound stage only with many more actual sound emanation points. Ideally it'd be enough channels, all close enough together, that up through the critical midrange at least we'd be able to recreate wavefront shapes from the recording venue in the listening venue. That'd be the closest thing to having an actual opening between your room and the original recording venue. But that would take a LOT of channels and speakers.

With BRIR type technology, it's possible to do it all with in ear monitors. But there'd need to be an actual map or model of the location of each instrument and other sound source, including reflecting walls, included with the recording. So there'd be probably hundreds of channels in the recording, along with the map. Software could then mix that down to two channels using your HRTF and combined with head tracking could reproduce the listening environment and even allow you to move around inside of it.
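To illustrate the "over and over again" point about loudspeaker crosstalk cancellation, here is a minimal sketch of a recursive canceller in a toy model where the opposite speaker reaches each ear a fixed number of samples late at a fixed, frequency-independent level. The function names, the 11 sample delay, and the 0.5 gain are assumptions for illustration.

```python
def recursive_xtc(left, right, delay, gain):
    """Speaker feeds for a toy symmetric model (delay must be >= 1 sample).

    Each feed subtracts the crosstalk the opposite speaker is about to
    cause, and that subtraction itself needs cancelling later, and so on.
    """
    n = len(left)
    spk_l = [0.0] * n
    spk_r = [0.0] * n
    for i in range(n):
        past_r = spk_r[i - delay] if i >= delay else 0.0
        past_l = spk_l[i - delay] if i >= delay else 0.0
        spk_l[i] = left[i] - gain * past_r
        spk_r[i] = right[i] - gain * past_l
    return spk_l, spk_r


def at_ears(spk_l, spk_r, delay, gain):
    """What each ear receives: its own speaker plus delayed, attenuated crosstalk."""
    n = len(spk_l)
    ear_l = [spk_l[i] + (gain * spk_r[i - delay] if i >= delay else 0.0)
             for i in range(n)]
    ear_r = [spk_r[i] + (gain * spk_l[i - delay] if i >= delay else 0.0)
             for i in range(n)]
    return ear_l, ear_r


# A single click in the left channel only.
left = [1.0] + [0.0] * 63
right = [0.0] * 64
spk_l, spk_r = recursive_xtc(left, right, delay=11, gain=0.5)
ear_l, ear_r = at_ears(spk_l, spk_r, delay=11, gain=0.5)

print([(i, v) for i, v in enumerate(spk_l) if v != 0.0])  # left speaker feed
print([(i, v) for i, v in enumerate(spk_r) if v != 0.0])  # right speaker feed
print(max(abs(v) for v in ear_r))                         # residual at right ear
```

The speaker feeds show a decaying train of alternating corrections bouncing between the two sides, while in this idealized model the right ear receives nothing from the left channel.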

 