Also, I've just read the Prof. Choueiri paper, "Optimal Crosstalk Cancellation for Binaural Audio with Two Loudspeakers", and while I think the basics of BACCH are now reasonably clear to me, I still don't understand where/how the listener's HRTF (or the generic HRTF) fit into the processing of the signal. Can anyone sketch out what is going on here please?
Thank you andreasmaaan for asking this good question. It is one that we are asked often enough that I think it should go on our FAQ page.
First let me clarify that, unless you are referring to the version of Prof. Choueiri's paper that appears in chapter 5 of the book Immersive Sound (see my earlier citation of that book), the earlier paper does not discuss the role of the HRTF in designing BACCH filters. That earlier paper explains the fundamental problems in crosstalk cancellation (XTC) and, for the sake of clarity, uses an idealization: two point sources for the speakers and two points in space for the listener's ears, neglecting the presence of the head. There was therefore no HRTF in that explanation (and none is really needed to explain the fundamental problems of XTC).
In chapter 5 of the book Immersive Sound, Prof. Choueiri updates the research presented in that earlier paper and adds a section titled "Individualized BACCH Filters", where he explains how to design an individualized (or custom) BACCH filter for a given listener's HRTF. That is precisely the method used in BACCH4Mac to make BACCH filters based on HRTF measurements.
If you are mathematically minded (and you seem to be) you can read that section of that chapter and you would learn exactly how BACCH filters are produced by BACCH4Mac.
For the sake of others and people who do not have access to that book, and in order to answer your question in plain English, I provide below a "simple" explanation of the role of the HRTF in the design of BACCH filters by BACCH-dSP, which is the Mac application at the heart of the BACCH4Mac product (https://www.theoretica.us/bacch-dsp/).
To make a BACCH filter, the listener sits in the intended sweet spot and inserts the BACCH-BM binaural microphones in his ears so that each of the two small microphone capsules sits at the entrance of an ear canal. He then clicks a button that starts the measurement process, which sends an exponential sine sweep (from 20 Hz to 20 kHz) from each of the two loudspeakers in turn. The sound is recorded by the microphones as it reaches the entrances to the ear canals after having interacted with the head, the torso and the pinnae of the listener. This measurement is a (small) sample of the individual's HRTF. Strictly speaking, the HRTF (Head-Related Transfer Function) is the whole set of many (often more than 1500) such measurements, each taken in an anechoic environment with the sound source (the speaker) at a different location on a virtual sphere surrounding the listener. Luckily, making a BACCH filter does not require the entire HRTF (which would take a very long time to measure and would require the listener to remain still throughout). It needs only two elements of the whole HRTF set: the two measurements corresponding to the locations of the two speakers in question.
These two measurements are processed using a standard mathematical method called deconvolution to yield a set of 4 impulse responses (IRs), one for each combination of speaker and ear. This set of 4 IRs is often called the BRIR in the research literature, which stands for "Binaural Room Impulse Response". (In its default setting BACCH-dSP windows the BRIR to exclude reflections in the room, so strictly speaking it should be called the BIR or, more accurately, the "Binaural Impulse Response of the Speakers".)
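For readers who like to see the sweep-and-deconvolve step in code, here is a minimal Python sketch. It is not BACCH-dSP code. The function names, the toy 5 ms sweep, the 48 kHz sample rate, the made-up 6-tap "speaker-to-ear" response and the regularized spectral division are all assumptions chosen only to keep the example small. It covers one speaker and one ear; a real measurement repeats this for all four speaker-ear pairs.

```python
import cmath
import math

def exp_sine_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz lasting `duration` seconds."""
    k = math.log(f2 / f1)
    n = int(round(duration * fs))
    return [math.sin(2 * math.pi * f1 * duration / k
                     * (math.exp(k * (i / fs) / duration) - 1.0))
            for i in range(n)]

def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (adequate for a toy example)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def convolve(a, b):
    """Direct linear convolution, used here only to simulate the recording."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def deconvolve(recorded, sweep, n_fft, eps=1e-9):
    """Estimate an impulse response by regularized spectral division Y/X."""
    X = dft(sweep + [0.0] * (n_fft - len(sweep)))
    Y = dft(recorded + [0.0] * (n_fft - len(recorded)))
    floor = eps * max(abs(v) ** 2 for v in X)  # avoids dividing by ~0
    H = [y * x.conjugate() / (abs(x) ** 2 + floor) for x, y in zip(X, Y)]
    return [v.real for v in dft(H, inverse=True)]

FS = 48000                                         # assumed sample rate
sweep = exp_sine_sweep(20.0, 20000.0, 0.005, FS)   # toy 5 ms sweep
true_ir = [0.0, 0.0, 0.0, 1.0, 0.0, -0.4]          # made-up response
recorded = convolve(sweep, true_ir)                # simulated microphone signal
estimated_ir = deconvolve(recorded, sweep, n_fft=256)
peak_index = max(range(len(estimated_ir)), key=lambda i: abs(estimated_ir[i]))
```

In this noise-free toy case the estimated response closely matches the made-up one, with its main peak at the 3-sample delay. Real recordings contain noise and room reflections, which is where windowing of the kind mentioned above comes in.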
The BRIR is then used by the BACCH algorithm to produce a digital crosstalk cancellation (XTC) filter called the BACCH filter (in the form of a Finite Impulse Response (FIR) filter), following a method detailed in the last section of chapter 5 of the book. (For the technically minded: the method consists of a pseudo-inversion of the transfer function represented by the BRIR, optimizing a cost function consisting of XTC, tonal distortion and dynamic range, and using extreme frequency-dependent regularization that ensures that the amplitude response of the filter is perfectly flat.) A BACCH filter is a unique and special type of XTC filter that has no coloration (i.e. causes zero tonal distortion).
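To give a feel for what "regularized pseudo-inversion" means, here is a small Python sketch at a single frequency. It is deliberately not the BACCH method: it uses the idealized head-less two-point-source model, with a plain constant regularization parameter (Tikhonov regularization) instead of the frequency-dependent regularization and cost-function optimization described above. The values of g, tau and beta and all function names are illustrative assumptions.

```python
import cmath
import math

def transfer_matrix(freq_hz, g=0.985, tau=6.8e-5):
    """Idealized 2x2 speaker-to-ear matrix: two point sources, no head.
    g is the assumed far-ear/near-ear level ratio, tau the assumed extra delay (s)."""
    x = g * cmath.exp(-2j * math.pi * freq_hz * tau)
    return [[1.0 + 0j, x], [x, 1.0 + 0j]]

def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def hermitian(a):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def inverse(a):
    """Inverse of a 2x2 complex matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def xtc_filter(h, beta):
    """Constant-parameter regularized pseudo-inverse: (H^H H + beta I)^-1 H^H."""
    hh = hermitian(h)
    gram = matmul(hh, h)
    gram[0][0] += beta
    gram[1][1] += beta
    return matmul(inverse(gram), hh)

def peak_gain(m):
    """Largest magnitude among the four filter entries."""
    return max(abs(v) for row in m for v in row)

h = transfer_matrix(50.0)                # low frequency: H is nearly singular
exact = xtc_filter(h, beta=0.0)          # perfect XTC, but very large gain
regularized = xtc_filter(h, beta=0.05)   # less XTC, much smaller gain
at_ears = matmul(h, exact)               # should be close to the identity
```

With beta set to zero the filter cancels the crosstalk exactly in this toy model, but at the price of a large boost at frequencies where the matrix is nearly singular. A positive beta trades some cancellation for a much tamer filter. That trade-off between XTC, coloration and dynamic range is what the optimization described above is managing.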
The BACCH filter thus produced by BACCH-dSP (it takes the embedded C++ program a fraction of a second to produce the filter) is then automatically loaded into an FFT-based 64-bit convolver inside BACCH-dSP. The convolver convolves the input audio signal with the BACCH filter in real time (convolution being the standard mathematical process that applies a finite impulse response filter to a signal) to effect crosstalk cancellation at the ears of the listener sitting in the sweet spot.
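Here is a minimal Python sketch of what applying a 2x2 FIR crosstalk-cancellation filter to a stereo signal amounts to. It uses plain direct convolution for readability rather than the FFT-based real-time convolver described above (the two give the same output, and the FFT route is simply much faster for long filters). The function names and the tiny filter taps are made up for illustration and are not a real BACCH filter.

```python
def convolve(signal, fir):
    """Direct linear convolution of a signal with an FIR filter."""
    out = [0.0] * (len(signal) + len(fir) - 1)
    for i, s in enumerate(signal):
        for j, c in enumerate(fir):
            out[i + j] += s * c
    return out

def apply_xtc(left, right, filt):
    """Apply a 2x2 matrix of FIR filters to a stereo signal.
    filt[o][i] is the FIR from input channel i to output (speaker) channel o.
    Assumes both inputs have equal length and all four FIRs have equal length."""
    out_left = [a + b for a, b in zip(convolve(left, filt[0][0]),
                                      convolve(right, filt[0][1]))]
    out_right = [a + b for a, b in zip(convolve(left, filt[1][0]),
                                       convolve(right, filt[1][1]))]
    return out_left, out_right

# Made-up 2-tap filters: direct paths pass the signal unchanged, cross paths
# add a delayed, inverted, attenuated copy of the opposite channel.
toy_filter = [[[1.0, 0.0], [0.0, -0.5]],
              [[0.0, -0.5], [1.0, 0.0]]]

left_in = [1.0, 0.0, 0.0]    # an impulse in the left channel only
right_in = [0.0, 0.0, 0.0]
left_out, right_out = apply_xtc(left_in, right_in, toy_filter)
```

The point to notice is that each speaker feed is a mix of both input channels: the left speaker plays the filtered left input plus a filtered version of the right input, and vice versa.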
(A similar filter design process is done to produce a set of more than 40 BACCH filters by interpolation for the case when head tracking is desired, with the requirement that 2 additional BRIRs be measured - one at each end of the desired area for head tracking - while the head tracking camera is recording the location of the measurements.)
A very good question often asked is “Does every listener require his/her own individualized BACCH filter?”
The short answer is generally no, as long as the playback system is the same and the speaker and listening locations are also the same. In other words, the individual aspect of the HRTF used to make the filter is not nearly as important as (in decreasing order of importance): 1) the listening geometry and 2) the impulse response of the speakers themselves. This is strictly true if the speakers span an angle of less than about +/- 40 degrees measured at the listening location (which is often the case in serious stereo systems). Only if the speakers are at larger spans (or, which is unlikely, at a significant angle above or below the azimuthal [aka horizontal/equatorial] plane) does one need to make a BACCH filter for each listener.
To understand why this is so, one must understand a fundamental fact about spatial hearing illustrated in the two figures shown below (taken from the AES paper: Takeuchi et al., "Influence of Individual HRTF on the Performance of Virtual Acoustic Imaging Systems", Audio Engineering Society Convention 104, May 1998). Figure 1 is a plot showing the subjective testing results of many listeners who were asked to locate a sound projected through a virtual acoustic imaging system (using the listener's own HRTF) to a location in the azimuthal plane. Figure 2 is a plot of the subjective test results using a dummy-head HRTF instead of the individual HRTFs used in Figure 1. Figure 1 illustrates the fact that a virtual acoustic imaging system designed with an individualized HRTF gives excellent spatial fidelity (the data mostly line up on the straight line joining the lower left corner to the upper right corner of the plot). Figure 2 shows that with non-individualized (or mismatched) HRTFs, spatial fidelity is obtained only within a span of about +/- 40 degrees (as evidenced by the departure of the data from the straight line past that angular value).
In other words, for sound sources (in the azimuthal plane) located within a span of +/- 40 degrees straight ahead of a listener, there is no need to bother using an individualized HRTF: any HRTF (e.g. that of a dummy head) would do. This is largely due to the fact that sound from sources within a relatively small span straight ahead, and in the same horizontal plane as the head, interacts least with the pinnae (the outer part of the ear), which are the most individualized part of the ear's morphology.
Therefore, as long as the speakers are within a span that is not larger than +/- 40 degrees (most hi-fi systems have their speakers nearer the standard +/- 30 degree span), and as long as the speakers are not significantly (say more than 25 degrees) above or below the horizontal plane where the listener's head is located, there should be no need to make an individual BACCH filter for every listener (i.e. all listeners could use the same BACCH filter). Under such conditions the difference between the spatial imaging perceived through an individualized and a non-individualized BACCH filter, if audible, is subtle. Some BACCH users have reported detecting such differences (which are most likely due to departures from the conditions stated above) and prefer making an individualized BACCH filter for every person (wife, friend, visitor) to whom they wish to give the experience of 3D imaging with the highest possible spatial fidelity.
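If you want to check where your own setup falls relative to these rules of thumb, a few lines of Python will do it. This is only a convenience sketch. The function names are made up, the setup is assumed to be symmetric, and the 40 and 25 degree thresholds are simply the approximate figures quoted above, not hard limits.

```python
import math

def half_span_deg(speaker_spacing_m, listening_distance_m):
    """Angle of each speaker from straight ahead, for a symmetric setup.
    listening_distance_m is measured from the listener to the midpoint
    of the line joining the two speakers."""
    return math.degrees(math.atan2(speaker_spacing_m / 2.0,
                                   listening_distance_m))

def generic_filter_likely_sufficient(half_span, elevation_deg,
                                     max_half_span=40.0, max_elevation=25.0):
    """Rule of thumb from the text: a shared filter should do if the speakers
    are within about +/- 40 degrees and not more than about 25 degrees
    above or below the listener's horizontal plane."""
    return (abs(half_span) <= max_half_span
            and abs(elevation_deg) <= max_elevation)

# Example: speakers 2 m apart, listener about 1.73 m back from the speaker
# line, which is the standard equilateral stereo triangle.
angle = half_span_deg(2.0, 1.7320508075688772)
shared_ok = generic_filter_likely_sufficient(angle, elevation_deg=0.0)
```

For the standard equilateral triangle this gives a half span of about 30 degrees, comfortably inside the range where a shared filter should work.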
Incidentally, the u-BACCH filters (where "u" stands for "universal") used in the Intro edition of BACCH4Mac are pre-made generic BACCH filters designed using a BRIR obtained from a generic HRTF (that of the standard dummy head KEMAR), with the speakers assumed to behave like theoretical point sources.
I hope the above is clear and helpful.
Buddy
Senior Development Engineer @ Theoretica