
Dr. Edgar Choueiri explains BACCH

These days I finally managed to get my system back online.
As I said initially, as an audio enthusiast I am quite curious to hear the effect of an XTC on my system.
So I was about to download HAF X-Talk, which I think is still reasonably priced. However, I realized that the uBACCH demo takes no more effort to set up, and I felt it could actually be a better test for experimenting with XTC, since the algorithm is theoretically free of tonal/dynamic penalties.
So I put it into the system, which fortunately in my case is already set up to insert VST plugins, and therefore I don't have to purposely introduce further complication to the system (which is relevant for many).
According to the manufacturer, my DALI speakers have wide dispersion. My listening environment is very reflective (RT60 reaches 700 ms at 3 kHz) and, furthermore, asymmetrical. I necessarily use a DRC (Dirac or Audiolense) to correct problems below Schroeder, while above it I have to limit the correction because it becomes less effective (over an area) and narrows the soundstage, which in fact normally benefits from early reflections.
With uBACCH inserted in this configuration and set to the actual angle of the speakers, at first I noticed a slightly different spatial effect that varied from song to song. I couldn't say exactly in what terms, but it wasn't particularly better.
At some point I reasoned that room reflections are taking place in the XTC band, so I decided to toe-in the speakers and try using Audiolense to correct up to 6kHz with a rather wide FDW, in order to improve direct/indirect sound ratio in the listening position.
Then I adjusted uBACCH with a limited pink noise signal on the band affected by XTC, in order to obtain the most lateral sound possible by ear (as suggested on their site).
The effect obtained in this way was decidedly different. I experienced a much more defined and "3D" soundstage. Really nicer than the system without XTC, at least at that point.
This confirmed to me the dependence on the impulse response for XTC to be effective.
Unfortunately I don't have a binaural microphone so I can't see what's happening...
There is a further consideration: The fact that the soundstage obtained with only DRC up to Schroeder is narrower than that with XTC and DRC beyond Schroeder, and at the same time wider than that with only DRC beyond Schroeder, could mean that the reflections normally present compensate for the negative effect of crosstalk in recreating the soundstage.
By virtue of this, to compare the results with a more significant criterion, I created two projects in Reaper (which allows you to switch them on the fly). One with DRC up to Schroeder and the other with DRC up to 6kHz plus XTC.
From this comparison I can confirm that XTC and extended DRC gave me the best impression of spatiality.
Now, there is no point in thinking again about what is more correct because we agreed there is no single answer to the question. It's more a matter of personal paradigm.
The fact remains that I had not previously considered the possibility of using XTC in a complementary way to a DRC beyond Schroeder. From this perspective, I recognize that the variability of the result is lower than that theoretically expected with XTC alone, and that in any case the result is subjectively better.
Almost to the point of reconsidering my opinion on the cost, in view of the fact that for this amount it is difficult to obtain a better subjective improvement.
It would be nice to have more concrete objective evidence to support this, especially to understand how XTC and reflections coexist with wide-dispersion speakers in a highly reflective room.
Being passionate about audio at heart, I think I will continue to investigate.

UPDATE:
After days of subjective testing, my impression is that uBACCH can be, in my specific case, an appropriate complement to speaker correction in terms of spatiality. With DRC I have never been able to improve in this respect, probably because the reflections help; in fact I limit DRC to Schroeder and had planned to treat the room. uBACCH improves the soundstage clearly and broadly in practically all cases (I listen to mainstream music, presumably with artificial spatial cues), although it does not provide the 360-degree effect of headphones with binaural encoding, and in my case it only works within about 20 cm per side (for me, the biggest disadvantage for the price).
I also checked with measurements outside the sweet spot and there is no FR alteration with it, so no additional penalties.
As a correct spatial reference/target cannot be conceived in most cases, for that relatively limited point where the IR is more optimized by the DRC and uBACCH is effective, it seems to me the latter could represent an advantageous investment for the mid band compared to the treatment of the room (which is relatively uncertain and unsightly).
I would be curious to understand if the flat directivity index to have a spectrum of reflections consistent with direct sound is still an important thing or not, and if the presence of reflections mitigates the extreme panning where present in the audio track.
Unfortunately, with the other XTC tools I just tried I couldn't get the same good soundstage... I suppose this is due to the particular technology, which avoids tonal penalties and works at more traditional speaker angles, plus the option to adjust the center channel gain.
With these criteria in mind I talked a bit with my wife, and after some practical reflection we agreed that this could be a better and cheaper compromise than the room treatment I was thinking about (to her immense joy). Especially because we are renting and I am the only one in my house who cares about audio at the moment, so saving money for a single sweet spot makes sense.
So, fresh from this epiphany, not denying substantial subjective improvements and seeing the current discount I decided, a little impulsively, to pull the trigger, contravening the most objectivist part of me.
Objectively I was skeptical about the price for the various reasons given (IR dependence, artificial audio tracks, single-person limitation), but the convenience and the subjectively good result unexpectedly prevail, and this makes me feel comfortable enough.
I believe this experience shows that the value of the technology cannot be judged entirely on a theoretical and general level. It can be practical and user-specific, and it requires proper setup. It is also not immediately obvious, especially for people used to the most canonical audio science.
Therefore, debating at a theoretical/general level, however correct and well argued, is not particularly useful, as already mentioned. But getting past the objective barrier is not easy when something carries a relatively high additional cost. I'm a little sorry, but I suppose the usefulness of this discussion lies in this regard.
However, this makes me curious to try the Audiophile version, to understand how far, for the price, it can improve the dynamic sweet spot while remaining OK outside it and/or for others.
ASR should be useful in this investigation, but given the results so far, I took someone's suggestion and tried asking the company a few questions about it.
As an audio enthusiast, and basically fascinated by this refined technology, I would be curious to try it on my system, especially given this experience with uBACCH.
I will try to understand if I can risk losing those $200 for the trial.
In the meantime, I want to try experimenting with narrower directivity speakers (Klipsch) to see what subjectively you can get from something cheap, supported by BACCH technology.
 
At some point I reasoned that room reflections are taking place in the XTC band, so I decided to try using Audiolense to correct up to 6kHz with a rather wide FDW, in order to make the impulse response cleaner, albeit in a limited area based on frequency.
Then I adjusted uBACCH with a limited pink noise signal on the band affected by XTC, in order to obtain the most lateral sound possible by ear (as suggested on their site).
The effect obtained in this way was decidedly different. I experienced a much more defined and "3D" soundstage. Really nicer than the system without XTC, at least at that point.
This confirmed to me the importance of the impulse response for XTC to be effective.
You are right, the frequency range is roughly 200-6000 Hz, which means that the gated response is roughly 5 ms long.
A sub-5 ms response should be as clean as possible (without any negative comb filtering).
Some impulse compensation (with FIR filters or Dirac, DRC, etc.) can match the impulse somewhat, but obviously it won't restore the altered response.
So the more uncontrolled (and therefore more variable) this early response is, the more unpredictable XTC can be, since impulses from both sides are mixed.
(e.g., it should be similar to the original tonal balance but isn't, or the left and right stages feel different when they should match, etc.)
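The 5 ms figure above follows from a common rule of thumb: a time window of length T can only resolve frequencies down to roughly 1/T, so the lowest frequency in the XTC band sets the window. A minimal sketch of that arithmetic, assuming one full period per window (the function names are illustrative and not taken from any tool mentioned in this thread):

```python
def window_ms_for_lowest_freq(f_low_hz, cycles=1.0):
    """Window length in ms that holds `cycles` periods of f_low_hz."""
    return 1000.0 * cycles / f_low_hz


def lowest_freq_for_window(window_ms, cycles=1.0):
    """Lowest frequency in Hz that fits `cycles` periods in window_ms."""
    return 1000.0 * cycles / window_ms


# Lower edge of the roughly 200-6000 Hz band discussed above.
print(window_ms_for_lowest_freq(200.0))
# Upper edge, for comparison: one period is far shorter.
print(window_ms_for_lowest_freq(6000.0))
```

Frequency-dependent windowing in DRC tools typically uses several cycles per frequency rather than one, so the `cycles` parameter is there only to show how the window scales.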
 
Hagen Quartett, Mozart K. 546, Fugue
https://open.qobuz.com/track/4254603

Here, the strings are extremely well focused and staged in their customary places both in width and depth. This is from a multi-CD set of all Mozart string quartets and contains recordings made at different times and different venues. The ones done at the Schloss Mondsee Festsaal, including this one, are the best ones in terms of pinpoint imaging. In this fugue the instruments enter one by one, which makes it easy to gauge their locations.
G R E A T !! The (aural) world doesn't have to be flat. And ORC here, at least in my setup, behaves particularly well (not always the case).
That's almost exactly how the real thing sounds in the front seats of a good, medium sized concert hall.
 
It only works if the recording is binaural.

[Edit]
To add some clarification to my response above a bit more, when I said "it only works" I mean it works as an accurate spatial reproduction of the sound. With non-binaural recordings, XTC will most often give a pleasant "enhancement" to the spatial feel of the reproduced sound over standard 2 channel stereo. But it is not going to be an accurate reproduction.

I agree about the more pleasant effect with normal (stereo) recordings, but I am not sure I agree that reproduction with Bacch is (not so) accurate. I've been listening to music in a few setups with mainly direct sound (in heavily acoustically treated rooms), both without and with Bacch. I believe that reproduction in those rooms is quite accurate, at least more accurate than in untreated rooms.

My room is acoustically treated (ceiling, walls, floor), though not as extensively as those rooms, and I have been using Bacch for some time now. To my ears, the sound in my room now comes closer (in terms of the 3D sound bubble) to the sound in those heavily treated rooms. And yet, there is a difference. In those extremely treated rooms with direct sound, the sound was closer to the headphone effect. With Bacch it comes closer, but without the oppressive effect of headphones. It's more open. I love the 3D bubble Bacch creates with normal recordings. It also enhances Q-sound effects in my setup.

I am not sure which situation is more accurate, the headphone effect of mainly direct sound in highly damped rooms or mine (stereo with Bacch in a reasonably treated room), but there are more similarities than differences. So I guess the Bacch effect in 2-channel stereo could give a more accurate reproduction than stereo without it.
 
In my experience, XTC processing is the biggest upgrade you can do for your box speaker stereo sound system.

For basic XTC software, this browser plugin is a good start. https://magic.audio/web-extension-addon
For basic XTC hardware, MV Silicon's BP1064a2 DSP chip is an option. (As found in most Arylic products.)

XTC not only works for box speakers, it can improve soundbars also.

Interestingly, with open baffle speakers both delayed and out of phase signals radiate from the back side of the drivers by design.
 
I suspect open baffle speakers work well for XTC due to their dispersion pattern.
 
What have you found to be the best camera for headtracking? Is there a good one with night vision?
 

Bought an ELP infrared USB camera on Amazon.
First one died after a year, Amazon replaced it.
The 3.5mm lens is good if you place the camera close to you.

I found the model with an 8mm lens, which is good for a seating distance of 2.5m.

Nice thing is it switches between normal image sensing and illuminated infrared light in the dark.

Very happy.


 
I’m joining this thread again after a long time. Wishing everyone a happy and prosperous New Year in 2025!
This thread is related to Bacch, but since it’s the thread on ASR with the largest number of people interested in XTC, I’m leaving a post here. (To be precise, I also made a brief post in Tim Link’s personal XTC thread, but most XTC users are here.)

I’d like to hear the opinions of those who have experience with XTC regarding the points below.
This isn’t anything too profound—it’s just my experiment.
There are a few graphs and some text, so you might have to scroll down quite a bit. I apologize in advance for that.





In most reviews of XTC, the phrase "the speakers disappear" is quite common.
I also agree with this, but I was wondering if this was the intended outcome or if it was due to additional effects.

1736565594944.png


Starting briefly with the physical barrier, its purpose is to prevent the opposite ear (in this case, the right ear) from hearing the sound when the left speaker is playing.
It functions similarly to IEMs or headphones, delivering sound to only one ear.
However, in my opinion, this barrier blocks most of the ILD/ITD that naturally occurs when listening to speakers in real-life situations.
(If we assume that stereo speaker listening has its own identity, then that identity is hindered and lost due to the barrier.)

Therefore, array-based methods and basic recursive cross-iteration DSP XTC processing approaches ensure that both ears can hear the sound. (This applies regardless of whether it is universal or personalized.)
The cancellation signal leaves traces in the opposite ear, and it repeatedly reduces them to eliminate those traces.
But that’s unavoidable. In reality, we always hear with both ears, and it’s impossible to play a desired filter from one speaker and have it reach only one ear accurately.
However, since I can freely manipulate or synthesize each channel using BRIR impulses, I decided to give this a try.
(Originally, I also performed XTC using the traditional Bacch or Race methods.)
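The recursive idea described above (each cancellation signal leaks to the other ear and has to be cancelled in turn) can be sketched with a toy model. This is only an illustration under loud assumptions: a symmetric setup, a frequency-independent leak gain `g` below 1, and a pure inter-ear delay in whole samples. It is not the algorithm of BACCH, RACE, or any other product, and all names are hypothetical.

```python
import numpy as np


def recursive_feeds(g, delay, order, length):
    """Speaker feeds for a single left-channel impulse.

    Toy assumptions (not from any product): the same-side path has gain 1,
    the opposite-side path has gain g (< 1) and `delay` extra samples.
    """
    left = np.zeros(length)
    right = np.zeros(length)
    left[0] = 1.0  # the wanted signal
    for k in range(1, order + 1):
        # Odd-order corrections go to the right speaker, even-order to the left.
        feed = right if k % 2 else left
        feed[k * delay] += (-g) ** k
    return left, right


def ear_signals(left, right, g, delay):
    """What each ear receives: same-side feed plus leaked opposite-side feed."""
    def leak(x):
        y = np.zeros_like(x)
        y[delay:] = g * x[:-delay]
        return y
    return left + leak(right), right + leak(left)


def residual(g, delay, order):
    """Largest unwanted sample at either ear after `order` correction terms."""
    length = (order + 2) * delay + 1
    left, right = recursive_feeds(g, delay, order, length)
    left_ear, right_ear = ear_signals(left, right, g, delay)
    return max(np.abs(left_ear[1:]).max(), np.abs(right_ear).max())


for n in range(5):
    print(n, residual(g=0.5, delay=10, order=n))
```

In this model every added correction term shrinks the leftover "trace" by another factor of `g`, which is why the traces fade quickly when the leak gain is well below 1 and slowly when it is close to 1.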

So, I recently tested this using the files of another individual I was assisting with calibration.

1736566058983.png

1736566068327.png

I extracted only the crosstalk corresponding to each path from his binaural room response.

1736566121742.png


The yellow line represents the right ear response from the left speaker (L_RightEar), while the blue line shows L_RightEar after cancellation.
And now, we need to check the combined response.
(The left ear and right ear responses from the left speaker.)


1736566394546.png

1736566407049.png

The Red graph represents the original combined response, while the Green graph shows the combined response after cancellation.
Although it’s not a perfect 30-degree stereo setup, the dip patterns resemble the typical stereo dip mentioned in Dr. Toole's book.
However, in the Green graph, you can see that these dips have completely disappeared.
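One simplified way to see where such dips come from is to add a direct sound to a delayed, attenuated copy of itself: the two cancel most strongly wherever the delay equals an odd number of half periods. The sketch below assumes a single frequency-independent crosstalk gain and a pure delay; the 250 microsecond delay and 0.7 gain are illustrative values, not measurements from the files discussed here.

```python
import numpy as np


def comb_dip_frequencies(delay_s, f_max_hz):
    """Frequencies (Hz) where a copy delayed by delay_s is out of phase."""
    dips = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > f_max_hz:
            break
        dips.append(f)
        k += 1
    return dips


def combined_magnitude_db(freq_hz, delay_s, gain):
    """Level of direct sound plus one delayed copy, relative to direct alone."""
    f = np.asarray(freq_hz, dtype=float)
    h = 1.0 + gain * np.exp(-2j * np.pi * f * delay_s)
    return 20.0 * np.log10(np.abs(h))


# Assumed values for illustration only.
delay_s = 250e-6
gain = 0.7
for f in comb_dip_frequencies(delay_s, 12000.0):
    print(round(f), float(combined_magnitude_db(f, delay_s, gain)))
```

Setting `gain` to 0 in this model, which corresponds to removing the crosstalk path entirely, makes the response flat, consistent with the dips disappearing in the Green graph.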

Since this file is not mine, I couldn’t make an exact judgment, but even when listening to pink noise, the tonal balance differences were clearly noticeable.
I sent this to the owner of the file and waited a few days for their response.

1736566585337.png


I told him that when I sent the file, I approached it with the thought, “Could this feel like having a transparent head?” and decided to give it a try.

He then mentioned that when listening to his own BRIR, he could sense that his head was no longer blocking the sound, and it felt as though his head had become transparent. While the stereo image was maintained perfectly in front of him, at the same time, there was a sensation of sound passing through his brain—not his ears, but his brain.

I also recommended some music tracks and binaural tracks to him (such as a binaural car track where a car approaches from the front, passes by, and fades behind, or typical ASMR binaural tracks), and he was mostly satisfied and repeatedly emphasized his experience. He described it as feeling like the best aspects of speakers and headphones were combined, with contradictory sounds occurring simultaneously. His brain perceived the differences in distance and direction between these sounds, and he repeatedly stressed that this seemed paradoxical and almost incomprehensible.

And I asked him again.
I inquired whether there were any differences in the panning elements or perceived staging.
However, he mentioned that there wasn’t much change in the imaging.
The reason I asked this is that the graph I attached represents something impossible in reality—only the personalized crosstalk from the opposite ear path was precisely removed, and I didn’t add anything else.
I had anticipated that he might have impressions of specific panning elements, similar to the way people describe their experiences in reviews of general DSP XTC processing. However, that was not the case at all.
This led me to wonder if changes in certain panning elements or staging might be additional effects resulting from the repeated mixing of information between the two channels.
(Just to clarify, I appreciate and respect all XTC software, hardware, physical barriers, and theoretical approaches. I enjoy learning from them and have no negative thoughts about them whatsoever.)

You might say, “Isn’t it meaningless to discuss this since it’s something impossible in reality?” However, I’m curious to hear your opinions on these results.
I hope I’m not hijacking the OP’s thread. My apologies, and thank you for reading my post.
 
I tried to edit my previous post, but I can't see the Edit button....
It turned out that there were some issues with the input/output of the binaural IR the reviewer was using. After resolving them, I received an updated review.
Here are their words:
  • "I started listening to this, and I’m completely hooked. The way some people close doors or walk across the screen is truly spine-chilling."
  • "I was in a trance. It felt so three-dimensional that it was almost upsetting—like something was happening around me."
  • "It’s wrapping around me now, in a way it never did before."
  • "I briefly tapped out of the video, but it kept playing. The camera switched to another part of the road, and I could hear a car driving past me, winding away in the distance. Before it even came back, I could already tell the shape of the road. Unbelievable."
  • "But now, I’m hearing everything so precisely. It’s kind of eerie."
It’s reported that they’re experiencing such a 3D sensation with the crosstalk perfectly removed (not just channel crossover but precisely eliminating crosstalk).
I initially thought it would require considering mixing information from both channels, as done in XTC-BACCH, but that doesn’t seem to be the case. (However, I think I’ll need to listen and judge this myself to get a clearer understanding. Since I can test the intentional and unintentional effects when both channels are mixed for XTC, I’ll provide additional reports on this in a separate thread.)
Since my hearing condition isn’t great at the moment, I can’t test this myself, but I’ll need to try it with my own setup as soon as possible.
Essentially, this thread is related to BACCH, and I posted some content about DIY XTC here. However, in hindsight, I think I probably should have created a new thread.
That said, since the review has been updated, I’m writing this to make a correction.
OP, while this is related to XTC, if I’ve hijacked your thread, I apologize once again.
 
  1. How does BACCH enhance the spatial realism in the reproduction of acoustical recordings made in real acoustical environments?
    Crosstalk cancellation (XTC) techniques, such as BACCH, suppress the sound recorded on the left (right) channel of a stereo recording at the right (left) ear of the listener during stereo playback from a pair of loudspeakers. This cancellation raises the limit on the interaural level difference (ILD) and interaural time difference (ITD) above the levels that the speakers can deliver without XTC, allowing more of the correct spatial cues of the recorded sources to be reproduced at the ears of the listeners. (The ILD is the sound pressure level (in dB) caused by a given source at one ear minus the level at the other ear. The ITD is the difference between the sound arrival times at the two ears. Both are generally frequency-dependent functions.)
    This is best illustrated by first considering the case of ILD (the case of ITD will be discussed subsequently) and acoustical stereo mic recordings in a real space.
    Most, if not all, of the statements below can be verified by the experimentally-minded reader using the BACCH-BM microphone and the recording and extensive measurement capabilities of the BACCH-dSP application at the heart of Theoretica’s BACCH4Mac packages.
    It is insightful to first consider the general case of a binaural dummy-head recording; it then becomes easier to understand the more particular case of regular stereo miking techniques. The latter are of two general types: Type A stereo miking techniques, which rely on ILD to code the stereo image (e.g. ORTF, XY, and other coincident mic techniques, etc.), and Type B stereo miking techniques, which rely on ITD to code the stereo image (e.g. spaced omni or A-B mics, Decca tree, Jecklin disk, etc.).
    Binaural Recordings:
    The general case of binaural dummy head recording is the most natural (i.e. most akin to how humans hear) as it captures both the ILD and ITD cues, as well as the so-called spectral cues (which are associated with the non-flat frequency response imposed by the diffraction of the sound waves around the torso, head, and pinnae of the dummy, or human, head wearing the in-ear microphones). This individualized frequency response, which helps the brain-ear system locate sound sources according to the tonal coloration the listener’s particular brain-ear system expects, becomes flatter as the frequency lowers due to the wavelength becoming larger than the objects (the torso, head, and pinnae) the sound is diffracting about. These “spectral cues” [13] are used by the human ear-brain system, in addition to the ILD and ITD cues, to locate sound sources.
    Let us consider the case of recording a performer on a stage in a real hall. Using a dummy-head binaural microphone (or a human wearing in-ear mics) we would capture all three types of cues (the ILD, ITD and spectral cues) on the two channels of the stereo recording. Say the performer is located at an azimuthal angle of 50 degrees to the left of the dummy head. If one measures the ILD caused, at the dummy’s ears, by a sound source located at that location (such a calculation consists of subtracting the SPL measured at the right ear from that at the left ear) one would find, on average, about 8 dB (strictly speaking this depends on the distance and frequency, which, for the sake of illustration, we take to be about 10 feet and 1 kHz, respectively). If the performer, while performing (e.g. clapping), moves to the center position facing the dummy head, the ILD would drop to 0. If she moves further to the left, the ILD increases and can easily exceed 8 dB. If she approaches the recording head from the left, the ILD would build up further (due to the enhanced effect of the head shadowing the right ear) and can reach as high as 20 dB if the performer gets very close to the left ear (since most of the sound will be blocked from reaching the right ear). As a thought experiment, let us record, using the dummy head mic, the performer as she moves (while performing) from the center position to the left position (50 degrees), and then walks to the recording head and whispers in its left ear.
    For a stereo playback system to be able to reproduce this entire spatial image accurately from the above-described recording, it must reproduce this entire range of ILD, from 0 to 20 dB at the ears of the listener. We shall now explain why a regular stereo system cannot do so without XTC.
    The problem with a “regular” stereo playback system (as opposed to one with XTC) is that the maximum ILD it can deliver is that produced by the left (or right) speaker, which for a regular stereo (equilateral) triangle is only about 3-5 dB at 1 kHz (depending on the radiation pattern of the speaker and the distance of the listener from the speakers). This number can be easily verified by putting a test signal (1 kHz sine wave, or pink noise) in the left channel (and only the left channel), measuring the SPL at the left ear and subtracting from it the SPL measured at the right ear (which can be easily done using a sine sweep in BACCH-dSP to produce a plot of the ILD spectrum over the entire audio band). The plot below shows such a typical measured ILD spectrum made with BACCH-dSP through a typical stereo system in the “regular stereo triangle” (+/- 30 degrees) configuration.


    FAQ14Plot1.jpg


    The black (red) curve represents the measured ILD spectrum of the left (right) speaker at the ears of the listener. Note that at 1 kHz, the ILD is about 5 dB. At higher frequencies, head shadowing (which acts as a “natural XTC”) causes the ILD to rise a bit (as clearly seen in the plot), but the most important content, perceptually (especially human voices), is below 1 kHz. Therefore, a listener listening to the recording we made above would hear the performer move from the center towards the left speaker, then get “stuck” at the left speaker as the ILD in the recording builds up above 5 dB, since the reproduced ILD at the listener’s ears cannot exceed 5 dB. This should illustrate clearly the fundamental flaw in speaker-based spatial audio reproduction without XTC. [See Footnote 14 for an additional, more subtle, flaw.]
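The "stuck at the speaker" argument above can be put into a few lines of arithmetic. The sketch below is a deliberate simplification, assuming the playback system acts as a hard ceiling on the ILD it can deliver; the 0, 8 and 20 dB values and the 5 dB ceiling come from the text, while the function names and the hard-clip model itself are illustrative assumptions.

```python
def ild_db(spl_left_db, spl_right_db):
    """ILD as defined in the text: left-ear SPL minus right-ear SPL (dB)."""
    return spl_left_db - spl_right_db


def reproduced_ild_db(recorded_ild_db, max_system_ild_db):
    """Assumed model: the system cannot exceed its own maximum ILD, either side."""
    limit = abs(max_system_ild_db)
    return max(-limit, min(limit, recorded_ild_db))


# Recorded ILD along the performer's walk: center, 50 degrees left, whispering.
recorded = [0.0, 8.0, 20.0]
ceiling_without_xtc = 5.0  # about 5 dB at 1 kHz, per the measurement described

for r in recorded:
    print(r, reproduced_ild_db(r, ceiling_without_xtc))
```

In this model the last two positions collapse to the same reproduced value, which corresponds to the image no longer moving once the recorded ILD exceeds what the speakers can deliver.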
    XTC can remove this limitation. In particular, BACCH can deliver the maximum possible level of XTC (with zero added tonal coloration) for a given pair of speakers in a given room based on a measurement of the two-point HRTF [15] of the listener with the calibrated BACCH-BM microphone. The resulting ILD spectrum (which, by definition, is the same as the XTC spectrum) is shown in the figure below for the same audio system:

    FAQ14Plot2.jpg


    It should be clear from this plot that BACCH can deliver, for the same audio system, 15 dB ILD at 1 kHz, with ILD levels well exceeding 20 dB, at the ears of the listener sitting in the sweet spot (the location where the HRTF measurement was made.) Therefore, the performer would now be perceived to walk all the way from the center, way past the left speaker, to an azimuthal angle of 50 degrees, then walk towards the listener and whisper in his left ear, much like in the real life event. This is the case irrespective of the location of the speakers, as long as the BACCH filter used during playback was designed for that particular speakers-listener configuration. (Incidentally, BACCH-dSP has a simple easy-to-use binaural recorder that allows you to verify the above by quickly making such a recording of a performer walking around you with the BACCH-BM in your ears, then immediately listen to it through a BACCH filter.)
    Now that, we hope, this is all clear for a dummy-head recording, it is easy to explain how a similar enhancement to the accuracy of spatial reproduction can be attained for a recording done with a regular stereo mic pickup.
    Type A Recordings:
    Stereo recordings done with a "Type A microphone" (ORTF, XY, coincident mic techniques) rely on mic capsules with directional pickup patterns (cardioid, hypercardioid, etc.) oriented so as to proportionally attenuate the sound of a source located on the right (left) side of the microphone as it reaches the left (right) capsule. Therefore, such a microphone is mostly capturing the “ILD” (and in the case of a coincident stereo microphone, only the ILD). Although this “ILD” may be a bit different from the actual ILD a dummy head would capture (since the attenuation imposed by the highly directive capsules may not accurately represent the attenuation due to head shadowing), it is fully capable of capturing a good part, if not all, of the wide ILD range (0-20 dB) of our proverbial walking performer. Again, a stereo system without XTC will only be able to reproduce a small part of that range (up to about 5 dB), and again the performer will be stuck at the left speaker as soon as she reaches about 30 degrees azimuth to the left, and will remain there throughout the rest of the recording, while in real life she was walking well past that angle (to 50 degrees) and then towards the left side of the microphones. Again, the same stereo system with the BACCH filter whose measured XTC performance is shown in the plot above can reproduce virtually the entire range of ILD, and thus can give the listener a far more accurate spatial reproduction of the full spatial image.
    The difference between a binaural recording done with a dummy head and a stereo recording done with a Type A stereo microphone, when rendered through the same BACCH filter whose XTC performance is shown in the plot above, is that the one-to-one spatial correspondence between the real image and perceived image is more accurate for the former (since the ILD is coded with the attenuation due to a human head shadowing) than the latter (since the ILD is coded with the particular attenuation due to the directivity pattern of the capsules in the Type A stereo mic). However, they both give a spatial image (through the same BACCH filter) that is far more accurate and realistic than that of playback without XTC.
    Type B Recordings:
    Since "Type B" stereo recording techniques (e.g. spaced omnis) use omni-directional microphones, they rely on spacing the two microphone capsules some distance apart to pick up ITD cues (the captured ILD cues being negligible [Footnote 16]). At first look one might (wrongly) suspect that stereo recordings done with such a stereo microphone might not benefit from XTC during playback as much as Type A or binaural recordings, since XTC only affects the level of the sound pressure at the ears. But in fact, the delay between the arrival times of a source’s sound at the left and right capsules of the microphone will not be reproduced correctly at the ears of the listener if crosstalk is present, as explained in the next (long) paragraph.
    To understand why this is the case, consider again the performer moving from the center position, where ITD is 0, to 50 degrees azimuth left while clapping her hands. A typical ITD for a source there would be something like 400 microseconds. Now if that recording of the performer clapping at 50 degrees azimuth is played back through a pair of stereo speakers, the level of the clap sound is the same on both channels (because there is little if any ILD captured by the Type B stereo microphone) but the clap on the right channel is delayed by 400 microseconds with respect to the left channel of the recording. Therefore, the sound of the clap will arrive at the left ear from the left channel first, then, after a delay time of t1 microseconds, that same sound wave will reach the right ear (t1 is the ITD that would be caused at the ears of the listener by a source located where the left speaker is located, i.e., at 30 degrees azimuth. It should be clear that t1 would be significantly less than the ITD of a sound source at 50 degrees (400 microseconds)). If, hypothetically, there is no sound from the right speaker, the listener would hear the clap coming from the location of the left speaker (which, at 30 degrees azimuth, is not the correct 50 degree azimuthal location of the real life clap). However, the right speaker will emit the clap recorded on the right channel 400 microseconds after it was first emitted by the left speaker. This second sound will reach the right ear first and then the left ear t1 microseconds later (again, if, hypothetically, there were no sound emitted from the left speaker, the listener would hear the clap coming from the location of the right speaker), causing an ITD of t1, which is wrong in value, and also on the wrong side of the listener! However, due to the Haas precedence effect, the two sounds (emitted from the left and right speakers) are perceived as fused into one, and the ITD caused by the first one (from the left speaker) dominates perceptually, as it arrived first, causing the listener to perceive the sound of the performer clapping to be essentially located at the left speaker, which is at 30 degrees, and not at the correct 50 degrees we seek [see Footnote 17 for a more accurate description of the net effect of the “fusing” of these two sounds].
    In contrast, if the crosstalk is cancelled, the left ear (and only that ear) would hear the clap emitted from the left speaker, then the right ear (and only that ear) would hear the clap from the right speaker delayed by 400 microseconds resulting in the correct ITD at the ears, and thus allowing the listener to perceive the correct real-life location of the performer, irrespective of the location of the speakers (again, assuming that the BACCH filter corresponding to that speakers-listener configuration is used).
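The timing argument above can be sketched numerically. This is only an illustration under stated assumptions, not the FAQ's own calculation: t1 is estimated with the Woodworth spherical-head formula, the head radius (8.75 cm) and speed of sound (343 m/s) are assumed values, only the first arrival at each ear is considered (a crude stand-in for the precedence effect), and the function names are made up for this example.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, assumed (air at roughly 20 degrees C)

def woodworth_itd_us(azimuth_deg: float) -> float:
    """Far-field ITD in microseconds (Woodworth spherical-head approximation)."""
    theta = math.radians(azimuth_deg)
    return 1e6 * HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))

def first_arrival_itd_us(channel_delay_us: float,
                         speaker_azimuth_deg: float,
                         crosstalk: bool) -> float:
    """ITD between the first arrivals at the two ears.

    The clap leads in the left channel; the right channel carries the
    same clap delayed by channel_delay_us.
    """
    t1 = woodworth_itd_us(speaker_azimuth_deg)
    left_ear = [0.0]                # left speaker -> left ear (time reference)
    right_ear = [channel_delay_us]  # right speaker -> right ear
    if crosstalk:
        right_ear.append(t1)                    # left speaker -> right ear
        left_ear.append(channel_delay_us + t1)  # right speaker -> left ear
    return min(right_ear) - min(left_ear)

print(f"ITD of a real source at 50 degrees: {woodworth_itd_us(50):.0f} us")
print(f"With crosstalk:    {first_arrival_itd_us(400.0, 30, True):.0f} us")
print(f"Crosstalk removed: {first_arrival_itd_us(400.0, 30, False):.0f} us")
```

With crosstalk present, the first arrival at the right ear comes from the left speaker, so the leading ITD collapses to t1 (the ITD of a source at the speaker position). With the crosstalk paths removed, the recorded 400 microseconds survive intact.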
    You can easily verify the above claim that XTC improves the spatial accuracy of Type B recordings using BACCH-dSP: First, make a recording of someone walking, speaking, or clapping around you while you have the BACCH-BM microphones in your ears. This first recording would be the reference binaural recording. Then make a second recording of the same performance, but this time hold one capsule in each hand, spaced about 6 inches apart. Since the BACCH-BM capsules are essentially omnidirectional, this is tantamount to a "Type B recording" (spaced omnis). After the recordings are done, play the reference binaural recording while toggling the BACCH filter on and off (which in BACCH-dSP can easily be done with a click of the mouse) and observe how the spatial accuracy is greatly improved when the BACCH filter is on. Finally, play the Type B recording while toggling the BACCH filter on and off, and you will also hear a significant enhancement in the spatial accuracy when the BACCH filter is on, as discussed above.
    In conclusion, XTC greatly benefits the spatial accuracy not only of the speakers-based playback of binaural recordings, but also of Type A and Type B recordings (and therefore of virtually all well-made stereo acoustical recordings in real acoustical spaces), as it allows both the ILD and ITD cues to be reproduced more correctly at the ears. If XTC worked only for binaural recordings, as some people who have not carefully listened to proper XTC have wrongly surmised, no one would be interested in BACCH, as binaural recordings are a minuscule fraction of available commercial recordings.
    There remains the important question of whether XTC can benefit the spatial rendering of recordings that are produced “artificially” by mixing audio stems (which is the vast majority of popular music). This question is addressed in the following section (to be added very soon).

  2. How does BACCH enhance the spatial imaging of "studio-mixed" recordings without altering the sound intended by the mixing engineer?
    In light of the arguments in FAQ #14 above, we can now address the case of “studio-mixed” recordings, which represent the vast majority of commercially available recordings. In such recordings, the mixing engineer (sometimes with input from the artist(s) and/or producer(s) and, to a lesser extent, the mastering engineer) concocts an artificial stereo image from stems (most often mono stems), mostly through level panning (and, much less often, time or phase panning) between the left and right channels. Mixing to produce a realistic, pleasing or engaging stereo image is an art involving both technical knowhow and esthetic decisions.
    Many mixing engineers are truly ingenious masters. It goes without saying that their final product deserves the utmost respect and that a good hi-fi reproduction system should not degrade or fundamentally alter their construct. It is also very true that virtually all commercially available mixed recordings were mixed while monitoring on monitors without XTC.
    Depending on the techniques used and esthetic decisions made, these concocted recordings range over a wide spectrum: on one end of the spectrum are recordings aiming to emulate a real acoustic environment (e.g. a jazz club). Let us call this end of the spectrum the “pseudo-realistic end”. On the other end of the spectrum are recordings that have no binding ties to realism, and instead aim to evoke sensations, or project certain esthetic expressions (e.g. the chimes in Pink Floyd’s well-known Time track on their Dark Side of the Moon album). Let us refer to this end of the spectrum as the “artificial end”.
    We will now consider what happens when such recordings are played back through XTC.
    On the pseudo-realistic end of that spectrum, most of the arguments made in FAQ#14 above hold, to some extent, since the mixing engineer is essentially using at least an analog of ILD and ITD to produce a “realistic” stereo image like a stereo mic would, and all that XTC does is remove the artificial ceiling on the ILD and ITD imposed by the speakers during playback. Most relevant in this context is reverb. During mixing, reverb is added algorithmically or through convolution with a real space impulse response (with the latter technique yielding far more realistic reverb). In both cases XTC unlocks the perceived reverberation from the speakers and projects it into 3D space. It does so because the perception of a realistic 3D reverb is caused by late reflections (the diffuse field) arriving at the left and right ears at almost random arrival times (i.e. with low L-R correlation, in the parlance of acoustics), and without XTC the sound at the right and left ears would be highly correlated since the sound from each of the L or R channels reaches both ears. Such highly L-R correlated sound causes the listener to perceive the reverb to be largely restricted spatially to a region that is mostly where the speakers are. It is hard to imagine a mixing engineer who would object to his mix being reproduced with a reverb that is more 3D and less “stuck to the speakers” (as long as the tonal and level balance between the direct and reverberant sound is not altered). (BACCH is a patented form of advanced XTC that causes no alteration whatsoever to that balance, as described in this standard, but highly technical, book chapter.) In fact, one of the most noticeable and striking aspects of listening through a BACCH filter for the first time is the immediate sense of being in a real 3D space due to the higher L-R sound decorrelation that reverb is meant to cause at the ears.
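The link between crosstalk and L-R correlation at the ears can be illustrated with a small closed-form sketch. It is a simplification, not the FAQ's model: it assumes the two channels carry fully uncorrelated, equal-power reverb, and it treats crosstalk as a frequency-independent gain with no delay. The function name `ear_correlation` is made up for this example.

```python
def ear_correlation(xtc_db: float) -> float:
    """Zero-lag correlation coefficient between the two ear signals.

    Assumptions (illustrative only): L and R carry uncorrelated,
    unit-power reverb; the crosstalk path is a pure gain g with no
    delay and no frequency dependence.
    """
    g = 10 ** (-xtc_db / 20)  # amplitude gain of the crosstalk path
    # ear_L = L + g*R and ear_R = R + g*L, so
    # covariance = 2*g and variance of each ear signal = 1 + g**2
    return 2 * g / (1 + g * g)

for xtc in (0, 5, 20, 40):
    print(f"XTC {xtc:>2} dB -> ear correlation {ear_correlation(xtc):.3f}")
```

Even perfectly decorrelated reverb in the two channels ends up strongly correlated at the ears when crosstalk attenuation is only a few dB, and the correlation falls towards zero as the attenuation grows.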
    On the “artificial end” of the studio-mixed recordings spectrum defined above, the mixing engineer concocts an image whose panned sources constitute an artificial stereo image that does not aim to be a reflection of a reality, but rather an esthetic or artistic construct. While mixing that image the engineer is choosing to place sources in a space that is largely between the two speakers. However, as is well-known by audiophiles, even a stereo playback system without XTC can image in a 3D, albeit relatively restricted, spatial region around the speakers (often called “the soundstage”). The main reason such imaging occurs without active XTC is that the listener’s head, by shadowing the contralateral ear from the loudspeaker (i.e. the speaker on the opposite side), creates a natural crosstalk cancellation that is highly effective at higher frequencies (i.e. frequencies whose wavelengths are smaller than the size of the human head). It should be clear that this natural XTC (which can be seen in the measurement shown in the first plot in FAQ#14) depends on the span between the speakers, the distance between the head and the speakers, the radiation pattern of the speakers, and the extent and relative strength of reflections in the room. A larger speaker span, a shorter distance to the head, a more directive speaker, and a higher ratio of direct-to-reflected sound all lead to higher values of this natural XTC. This is mainly why different stereo systems in different rooms, with different listener-speaker placements, can achieve different levels of “3D imaging”.
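For a sense of scale, the frequency at which the wavelength shrinks to the size of the head can be estimated in a couple of lines. The head diameter (17.5 cm) and speed of sound (343 m/s) are assumed round figures, not values from the FAQ, and the result is only an order-of-magnitude marker since head shadowing sets in gradually.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed

def head_size_frequency_hz(head_diameter_m: float = 0.175) -> float:
    """Frequency whose wavelength equals the assumed head diameter."""
    return SPEED_OF_SOUND / head_diameter_m

print(f"Wavelength equals head size near {head_size_frequency_hz():.0f} Hz")
```

With these assumptions the crossover sits at about 2 kHz, which is consistent with natural crosstalk cancellation being mostly a high-frequency effect.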
    A mixing engineer in a given studio with a certain set of stereo speakers concocts a stereo image while hearing a soundstage the spatial extent of which depends largely on the above listed parameters of the particular monitoring setup in the studio. An audiophile playing back the resulting recording through a good hi-fi stereo system at home generally has no way of knowing what these parameters were when the mix was produced, but still strives to get a good measure of a 3D soundstage. Indeed, “3D soundstage” imaging of a playback system is one of the holy grails for audiophiles and audio critics. By choosing and tuning his gear and listening room to enhance such a soundstage, the audiophile does not betray the intent of the mixing engineer as long as the enhancement of the spatial extent of the soundstage does not come at the expense of a change in the spatial balance or tonal content of the recording during playback. It is very possible that an audiophile’s playback system has significantly better 3D imaging capability than the system used by the engineer while monitoring the mix. No one would object if this were the case, or accuse the audiophile of betraying the engineer's intent.
    For such recordings (on the “artificial end” of the spectrum), XTC cannot pretend to enhance realism during playback since the stereo image was artificially concocted in the first place. However, as in the case of natural XTC, adding more XTC actively to enhance the spatial extent of the soundstage without altering the balance or tonal content of the recording (which is the essential characteristic of BACCH XTC) does not strictly betray the intent of the mixing engineer, since the spatial extent of the artificial soundstage was not prescribed by him. Of course, this argument becomes more tenuous if XTC leads to extreme spatial panning, which can only happen for hard left or right panned sources in the absence of reflections (e.g. in an anechoic chamber, a hard left or right panned sound source played back through a pair of speakers with high levels of XTC, without any ILD or spectral cues added to the sound, would lead to the sound being perceived to be very close to the left or right ear of the listener, as if wearing headphones). Such extreme imaging does not occur in real listening rooms with typical direct-to-reflected sound ratios.
    Of course, the level of active XTC during playback can be dialed down (in BACCH-dSP there is an “XTC percentage” slider that allows doing just that), but it should be clear from the above arguments that this is not recommended for acoustic recordings or for recordings on the “pseudo-realistic end” of the “studio-mixed” recordings spectrum. Moving towards the “artificial end” of the spectrum, the question of betraying the original intent of the engineer does indeed become a valid objection, but only to the extent to which XTC alters the tonal character and spatial balance of the recording (which BACCH, by design, does not do at all) and to the extent to which high levels of XTC can result in jarring, extremely panned images, which can occur with BACCH but only in near-anechoic environments and with recordings having extremely panned mono images. The latter issue can be addressed by dialing back the XTC level (or, in extreme but very rare cases, by bypassing XTC!).
There is no way I can get a word in edgewise and get on top of this conversation, but you are all mistaken in thinking that there is a single spatial audio theory combining everything we know about the psychoacoustics of auditory perspective systems. As explained by Jens Blauert, there are two distinct and separate systems, or methods, or theories of how to do it based on the difference between a direct sensory input (binaural) and reproducing the object itself (sound fields in rooms). These are distinctly different processes and one has nothing to do with the other. One is recorded with a dummy head and the other is recorded with normally deployed microphones. One is always and only two channels, one for each ear, and the other is an unlimited number of microphones, channels, and speakers. One is presented directly to the ears, the other is sounds placed in ROOMS, not directly into ears. Crosstalk is NOT a problem with stereo and in fact has nothing whatsoever to do with it. Surround sound is the best counterexample to this erroneous thinking. We have many speakers and channels with surround sound, placed in geometrically similar positions to the original sources and the ambience in the concert hall. Crosstalk has nothing to do with this system and the same principle applies to two channel stereo legacy recordings but it gets confusing because normal stereo uses two channels as well, but if it wasn't recorded binaurally it is not the same thing and has nothing to do with crosstalk and should not be played on loudspeaker binaural. It is my mission in life to explain this confusion and correct it so we can stop all of this nonsense and get on with studying how to do each system to its best capabilities.
 
As explained by Jens Blauert, there are two distinct and separate systems, or methods, or theories of how to do it based on the difference between a direct sensory input (binaural) and reproducing the object itself (sound fields in rooms). These are distinctly different processes and one has nothing to do with the other.

That is an interesting explanation, thank you. Where did Blauert explain this? Do you have a link, or is it in his book? It has been out of print for some time now.
 
“A question of particular importance to communications engineers is the technical feasibility of transmitting a particular spatial impression as faithfully as possible across a distance of space and time. The purely acoustical or electroacoustical part of this task is identical with the task of reproducing exactly at the listener’s position in the playback room the spatial, temporal, qualitative constellation of auditory events that occurred in another space or position and at another point in time.

“In principle two approaches to solving this problem are possible. One consists of generating a sound field in the playback room that corresponds largely to that in the recording room. Such an electroacoustically generated sound field is called a ‘synthetic sound field.’ The second approach proceeds from the assumption that an optimal acoustical reproduction is attained if the subject’s ear input signals are identical to the ear input signals that would be generated at the position and time of sound collection. To this end, ear input signals are collected, transmitted, and reproduced. Processes employing this technique are called binaural or ‘head-related’ since a head, usually a dummy head, is used in collecting the ear input signals.”
 
One is presented directly to the ears, the other is sounds placed in ROOMS, not directly into ear. Crosstalk is NOT a problem with stereo and in fact has nothing whatsoever to do with it. Surround sound is the best counterexample to this erroneous thinking. We have many speakers and channels with surround sound, placed in geometrically similar positions to the original sources and the ambience in the concert hall. Crosstalk has nothing to do with this system and the same principle applies to two channel stereo legacy recordings but it gets confusing because normal stereo uses two channels as well, but if it wasn't recorded binaurally it is not the same thing and has nothing to do with crosstalk and should not be played on loudspeaker binaural. It is my mission in life to explain this confusion and correct it so we can stop all of this nonsense and get on with studying how to do each system to its best capabilities.
Eliminating crosstalk with BACCH software in my room, even with regular (non-binaural) recordings, noticeably enhances the sound—to my taste. It seems to overcome some inherent limitations of stereo playback in my setup: the soundstage becomes more precise, wider, deeper, and overall more immersive "3D". Given these improvements, why should I stop playing stereo recordings in a binaural manner on my system?

When I listen to the rare binaural recordings with the same crosstalk-cancellation software, the 3D effects of these unique recordings become even more pronounced and immersive. That's fun, but not why I use crosstalk cancellation software.

BTW: please avoid the obvious suggestion of upgrading my system hardware, room, or listening position—those aspects are already well-optimized.
 
Crosstalk is NOT a problem with stereo and in fact has nothing whatsoever to do with it. Surround sound is the best counterexample to this erroneous thinking.
Are you comparing Record-Playback stereo (non-binaural/binaural) with Record-Playback surround sound?
 
Eliminating crosstalk with BACCH software in my room, even with regular (non-binaural) recordings, noticeably enhances the sound—to my taste. It seems to overcome some inherent limitations of stereo playback in my setup: the soundstage becomes more precise, wider, deeper, and overall more immersive "3D". Given these improvements, why should I stop playing stereo recordings in a binaural manner on my system?

I can't see anyone saying that you should stop using BACCH in your system. If you like the effect the BACCH program creates, you should keep on using it, but there should be no harm in knowing it will only act as an "effect plugin" for everything you play that is not a binaural recording.

Depending on how a non-binaural recording was made, the effect can sound more or less convincing. It tends to suit productions where the dominant pickup heard on the recording comes from a wide-spaced pair of microphones. In that case the "widening effect" that occurs in the listening room comes closer to simulating the actual spacing of the microphones used during the recording, which is wider than the physical spacing of the loudspeakers in the listening room.



In many of these discussions, it seems that many people who very much like the effect that crosstalk-canceling programs have (even on audio material that is not binaurally recorded) have some sort of "ego obstacle" to accepting it as "just" an effect. They want so much for it to be considered a fix to a universal problem, as if all recordings, no matter how they were made, will then be "heard as they were all meant to be heard".

When I listen to the rare binaural recordings with the same crosstalk-cancellation software, the 3D effects of these unique recordings become even more pronounced and immersive. That's fun, but not why I use crosstalk cancellation software.

Those rare binaural recordings are the ones that are obviously in need of a playback system with crosstalk cancelation to sound as they are intended to sound, otherwise they will simply not work. But for everything else, crosstalk cancelation will have a more or less convincing-sounding effect, and there is nothing wrong if someone likes and prefers what that effect is adding, and they should definitely keep on using it if it enhances their experience. There is nothing wrong with that.
 