
Thread for binaural virtualization users.

Lion♡
Senior Member · South Korea
Across global audio communities—including ASR—there are listeners who use headphones or IEMs to recreate a speaker setup via binaural virtualization. Although you can find bits of information and discussion, it’s scattered and often hard to track down. And even within audio forums, the typical reactions—“Why are you using headphones?” or “That’s not a real speaker”—make it tough for people to share their experiences and connect with one another.

So instead of “Why?”, I’d love to see a thread that keeps the conversation flowing with questions like “How?” and “Which?”, where people can share their information, successful experiences, failures, and more.

Whether you’re using a dummy-head SOFA or a personal SOFA for virtualization, the Smyth Realiser, Impulcifer, Genelec Aural ID, BacchHP, or any other DSP—public or personalized—we welcome participation and discussion from all binaural virtualization users.

Any topic related to binaural virtualization is welcome.

 
Since this is the first post in the thread, let’s keep the opening topic simple so everyone can jump in easily:

Which headphones (or IEMs) do you use for binaural virtualization? (Feel free to mention any models that didn’t work well, too.)

I used to virtualize mostly with Sennheiser cans—especially the HD 800S. The spacious earcups felt amazing. The only downside was that my headphone amp couldn’t drive them properly, which led me to pick up a Topping L70.
Not exactly headphones, but I also really liked using the Koss KSC75 on-ear clips combined with the Parts Express adapter. Sure, you can hear distortion at high volumes and in the bass, but they gave me the most “headphone-less” sensation of any headphone setup I’ve tried, and they were incredibly comfortable. (Back when I was into the KSC75, I’d wear them over 12 hours a day.)

Nowadays I’ve retired my full‑size cans and use IEMs for binaural virtualization. My go‑to is Apple’s EarPods—they’re always on my desk, they’re super comfortable, and they reproduce everything except the deepest bass cleanly, so they work great for virtualization.
Most of the time I listen on EarPods (for easy, relaxed listening), but sometimes I’ll switch to CIEMs. I have a custom HIDITION Vineto‑B from Korea, which I break out when I’m in the mood to really dive deep into the music.
I also tried Seeaudio IEMs in the past—despite being universal-fit, they were very comfortable. But if I recall correctly, their bass quality wasn’t great.

I’ve tried many other headphones and IEMs beyond those I mentioned, and at first I based my choices on each device’s measured specs. But now, after a few years, I’ve come to realize that the single most important metric for binaural virtualization performance—at least for me—is comfort.
 
Hi, I'm very happy with the Koss KSC75 for virtualization, although specifically in games.
I'm using a PS5 console, which outputs a 5.1 Dolby Digital stream over optical to a Creative GC7 device.

The GC7 has an SXFI virtual surround simulation system, which results in much more reverb and "room feeling" than Creative's previous SBX system or the PS5's bundled Tempest system. It has a few "headphone FR correction" presets for a few models, but I get the best results with the default preset plus some bass boost and the Koss KSC75.
There is a big boost to the sub-bass, some mid-bass suckout, a big boost in the 3-4 kHz range, and some treble reduction.
This particular preset makes normal headphones (K371, K702, HD700, with or without EQ) bassy, dark and shouty, but somehow it creates a satisfying experience with the KSC75.

Playing games like Returnal, Horizon Forbidden West, Ghost of Tsushima, or Cyberpunk 2077 (which already have good directional audio even with Sony's Tempest system) was quite the experience.

With my PC I like to use HeSuVi. Besides what's available in the core package and the ASH Listening Set, I find the C-Media Xear Large Room impulse response to be very nice for movies. Quite reverby and spacious, but also somewhat "cinematic".

On my phone, I sometimes use the Rootless JamesDSP app, which has parametric EQ but also convolution-based stereo room simulation.
It's based on a popular set of impulse responses from JoeBloggs, originally published at Head-Fi.
It works very well with my 7Hz Zero 2.
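For anyone curious what that convolution actually does under the hood, here is a minimal sketch of "true stereo" room convolution. The 4-channel IR file and its LL/LR/RL/RR channel order are assumptions for illustration, not JamesDSP's actual internals:

```python
# Minimal sketch of "true stereo" room convolution (the kind of processing a
# convolver like the one in JamesDSP performs). File names and the 4-channel
# LL/LR/RL/RR ordering are assumptions for illustration.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

def to_float(sig):
    """Convert integer PCM to floats in [-1, 1]; pass float data through."""
    if sig.dtype.kind == "i":
        return sig.astype(np.float64) / np.iinfo(sig.dtype).max
    return sig.astype(np.float64)

fs, ir = wavfile.read("room_true_stereo.wav")  # hypothetical 4-channel IR
fs2, x = wavfile.read("music.wav")             # stereo source material
assert fs == fs2, "IR and source must share a sample rate"
ir, x = to_float(ir), to_float(x)

ll, lr, rl, rr = ir.T  # paths: Lspk->Lear, Lspk->Rear, Rspk->Lear, Rspk->Rear
left  = fftconvolve(x[:, 0], ll) + fftconvolve(x[:, 1], rl)  # left-ear mix
right = fftconvolve(x[:, 0], lr) + fftconvolve(x[:, 1], rr)  # right-ear mix
out = np.stack([left, right], axis=1)
out /= np.abs(out).max()  # crude peak normalization to avoid clipping
wavfile.write("virtualized.wav", fs, (out * 32767).astype(np.int16))
```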
 
Hi, I'm very happy with the Koss KSC75 for virtualization, although specifically in games.
I'm using a PS5 console, which outputs a 5.1 Dolby Digital stream over optical to a Creative GC7 device.

The GC7 has an SXFI virtual surround simulation system, which results in much more reverb and "room feeling" than Creative's previous SBX system or the PS5's bundled Tempest system. It has a few "headphone FR correction" presets for a few models, but I get the best results with the default preset plus some bass boost and the Koss KSC75.
There is a big boost to the sub-bass, some mid-bass suckout, a big boost in the 3-4 kHz range, and some treble reduction.
This particular preset makes normal headphones (K371, K702, HD700, with or without EQ) bassy, dark and shouty, but somehow it creates a satisfying experience with the KSC75.

Playing games like Returnal, Horizon Forbidden West, Ghost of Tsushima, or Cyberpunk 2077 (which already have good directional audio even with Sony's Tempest system) was quite the experience.

With my PC I like to use HeSuVi. Besides what's available in the core package and the ASH Listening Set, I find the C-Media Xear Large Room impulse response to be very nice for movies. Quite reverby and spacious, but also somewhat "cinematic".

On my phone, I sometimes use the Rootless JamesDSP app, which has parametric EQ but also convolution-based stereo room simulation.
It's based on a popular set of impulse responses from JoeBloggs, originally published at Head-Fi.
It works very well with my 7Hz Zero 2.
Thank you for sharing your great experience—I’m glad you’re happy with the KSC75.
I used to joke with friends that if Koss upgraded the KSC75 drivers and released a Parts Express–style model (like the KPH40) with a slightly more premium feel but the same incredible comfort, it would probably be the ultimate headphone.
That’s how outstanding I think Koss’s comfort is.
It looks like you’re using public impulse responses. Have you run into any issues with RootlessJamesDSP? For me, since last year it stopped working in some apps like Netflix—it seemed to be a problem not just with my convolution files but with JamesDSP itself.
 
There are some binaural recordings but of course the selection is limited. Apparently it works reasonably well but if you turn your head the image will turn. And turning your head is a normal part of localizing (real) sounds.

I suspect that different people experience it differently because people experience regular stereo differently and most people don't get a realistic soundstage illusion. Headphone Soundstage Survey

I tried the old Dolby Headphone once with a test DVD and I didn't get the illusion of the sound coming from behind. Some people say they do hear the rear channels coming from behind but Dolby never claimed that it would do that. I haven't tried the Atmos version but I doubt that would "fool me" either. And again, Dolby doesn't claim that it does. Dolby just says it's "immersive".
 
I am using headphone virtualization for regular stereo recordings.
My setup is on a Mac, processing is done in Element. I am using the Supperware plugin and headtracker. I do not use the reverb or room features of the Supperware, just use it for headtracking/scene rotating and HRTF processing. The ambience is generated with a Waves plugin. In my experience it is the most realistic multichannel reverb for real acoustic places.
Regarding externalization, I found that on-ear headphones are the best, maybe because they reduce the interference of the outer ear compared to over-ear headphones.
My favorite is the Yamaha HP-1 with some bass boost and EQ.
Nowadays I can compare the sound and the sensation of the virtual headphone setup to my 8 channel ambiophonic speaker system.
 

Attachments: Untitled.jpg
I am not very experienced in binaural, only getting to it more. I do own a Realiser though from the kickstarter days.
I am using it for (classical) music.
But I only got around to doing a personal BRIR once in my room and it was way too reverberant. So I stuck to the included generic SOFA files. This is already not a bad experience and the head tracking is just amazing.
I tried to recreate this on my laptop, but it was a bumpy journey, as on a Mac there is not so much software available. And there are so many other things to do.

I found the SOFA files included in the Realiser, as they are public, and "recreated" a "true stereo" crossfeed (without head tracking).
I started using Rogue Amoeba's Audio Hijack to build my crossfeed, but now I use Element because it can do Atmos much more conveniently, though it is not as robust as Hijack.

In the meantime I found dearVR monitor (that plugin is free now) from Neumann for rendering. This works with Atmos (streaming) too, and that can be quite an improvement.
After that I went back to measuring my HRTF in the near field and tweaking its bass, FR, and phase.
With my personal HRTF (measured with 6mm capsules) the result is much more "precise" in location and size of instruments and auditory scene than with the generic versions. This is no surprise if I look at my HRTF and compare it with what is used by Neumann. HRTFs can be very different.
I pimped the Neumann plugin by rerouting the frontal channels (LCR) through my stereo simulation.

All these experiments are missing the head tracking thing of course. But if I had to decide between head tracking and personal HRTF the decision would probably fall on the personalised version, the sound is that much better. And as I close my eyes when listening with concentration, head moving is not such a big thing for me.
However, my next project would be to get something similar into the Realiser. Maybe I start with only combining

But I can do things in Element that I cannot do with the Realiser. I experimented with the level of crossfeed and with cutting off the direct sound crossfeed altogether. (Like a "true ambiophonics" simulation)
And I merged my HRTF with a flat, direct bass at 300 Hz. (Something I copied from the Realiser, but it does that at 80/120 Hz, which leaves quite some unevenness.)
The bass is one of the best things about this solution. No way to get something remotely close with speaker/subs in a room, especially with Atmos, just awesome.
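For anyone who wants to try that bass merge outside the Realiser, here is a rough sketch of the idea: splice a flat, time-aligned impulse below a ~300 Hz crossover into the measured BRIR. The file name and the crude level match are placeholders; in practice you would tune the merge by ear or in REW:

```python
# Rough sketch of merging flat, direct bass below ~300 Hz into a measured
# BRIR: complementary Linkwitz-Riley-style low/high split, with the flat
# branch being a unit impulse aligned to the BRIR's direct-sound peak.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfilt

fs, brir = wavfile.read("brir_L.wav")  # hypothetical mono BRIR
brir = brir.astype(np.float64)

fc = 300.0  # merge point in Hz
sos_lp = butter(2, fc, "low", fs=fs, output="sos")
sos_hp = butter(2, fc, "high", fs=fs, output="sos")

peak = int(np.argmax(np.abs(brir)))
flat = np.zeros_like(brir)
flat[peak] = np.max(np.abs(brir))  # crude level match; refine in REW/by ear

# Applying each 2nd-order Butterworth twice gives LR4-style complementary slopes.
low = sosfilt(sos_lp, sosfilt(sos_lp, flat))
high = sosfilt(sos_hp, sosfilt(sos_hp, brir))
wavfile.write("brir_L_bassmerged.wav", fs, (low + high).astype(np.float32))
```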

As for phones, I use IEMs (Zero 2, Gear, etc.) and headphones (LCD2C, HD800s). The IEMs are cheap and very mobile (that was my reason to look for a laptop solution in the first place), the headphones are more comfortable for me. I like the bass of the Audeze, and the HD800s is (as always?) a bit more spacious than other headphones, no matter what the use case.

There is another interesting thread

EDIT: I forgot. Thanks a lot for creating this thread!

 
Great posts! I wasn't aware of the Element VST host, or that the dearVR plugins have become free. That's great, since they're reasonably good sounding for how little configuration is required.
 
There are some binaural recordings but of course the selection is limited. Apparently it works reasonably well but if you turn your head the image will turn. And turning your head is a normal part of localizing (real) sounds.

I suspect that different people experience it differently because people experience regular stereo differently and most people don't get a realistic soundstage illusion. Headphone Soundstage Survey

I tried the old Dolby Headphone once with a test DVD and I didn't get the illusion of the sound coming from behind. Some people say they do hear the rear channels coming from behind but Dolby never claimed that it would do that. I haven't tried the Atmos version but I doubt that would "fool me" either. And again, Dolby doesn't claim that it does. Dolby just says it's "immersive".

Thank you for joining the thread. Yes, what you’re talking about seems to be binaural recordings. I haven’t tried the DVD you mentioned, but the cues people use to distinguish front from back vary from person to person. Every headphone’s own response—and the HpCF (headphone compensation filter) used to EQ it—interacts differently with each individual, so it’s understandable that many feel disappointed by those demos.
I don’t put much stock in binaural tracks myself. Trying to infer precise localization from extreme ILD cues in near-field sounds (like scissors right next to your head) isn’t very meaningful—and it’s hard to internalize given all the unpersonalized variables. Still, if you go in with no expectations and treat a binaural recording merely as an effect—kind of like how people enjoy ASMR on YouTube—it can be an enjoyable experience.
With distant recordings, individual differences decrease, so you can get a sense of spatial depth that regular stereo lacks—but there still aren’t that many of those sources. It also depends on the context. I’m not much of a gamer, but I know public HRTFs have been used in FPS games for quite some time. Accuracy aside, users tend to adapt. Especially when sounds are constantly moving—gunshots or grenades going off nearby—some people want pinpoint precision, while many others just register that it came from roughly that direction and focus on the action.
While the binaural recording approach itself is attractive, excellent, and fascinating, I believe it still needs more time before it can be commercialized.


My setup is on a Mac, processing is done in Element. I am using the Supperware plugin and headtracker
From the image you attached, your setup is very clear and will be helpful to anyone interested. Thank you.
I set up the same configuration in multiple programs and switch between them as needed.

[screenshot]


The EQ APO–based approach is widely used and familiar among the Korean users I interact with, so I also continue to stick with EQ APO.

[screenshot]


I like Hangloose because everything—from input/output routing onward—is so intuitive.

[screenshot]


I use Reaper for VST compatibility and whenever I want to test various setups. (But no matter how much I use it, I just can’t get used to Reaper’s UI. :facepalm:)

I do not use the reverb or room features of the Supperware, just use it for headtracking/scene rotating and HRTF processing. The ambience is generated with a Waves plugin. In my experience it is the most realistic multichannel reverb for real acoustic places.
Regarding externalization, I found that on-ear headphones are the best, maybe because they reduce the interference of the outer ear compared to over-ear headphones.
My favorite is the Yamaha HP-1 with some bass boost and EQ.
Nowadays I can compare the sound and the sensation of the virtual headphone setup to my 8 channel ambiophonic speaker system.
I agree with your personal impressions about externalization. That’s exactly why I switched to using IEMs at the start of this thread, and why I’ve been recommending them to other Korean users.
My own hypothesis is that, in virtualization setups that aren’t perfectly accurate or consistent, an over‑ear headphone’s native pinna response (the parts not equalized by the HpCF, plus fit variations) can actually give you some sense of externalization. Conversely, that same pinna response is always subtly in play—and it can end up hindering externalization. (When I tested this, I even extracted only simple pinna‑reflection impulses—rather than a full BRIR—and listened through IEMs.)
An HP‑1, huh? I’ve always wanted to get my hands on one.
 
I am not very experienced in binaural, only getting to it more. I do own a Realiser though from the kickstarter days.
I am using it for (classical) music.
But I only got around to doing a personal BRIR once in my room and it was way too reverberant. So I stuck to the included generic SOFA files. This is already not a bad experience and the head tracking is just amazing.
I tried to recreate this on my laptop, but it was a bumpy journey, as on a Mac there is not so much software available. And there are so many other things to do.
Welcome to your binaural journey.
I know that many people find the Realiser’s initial cost a burden, but I’m also aware that plenty are really satisfied thanks to its decoding, headtracking, compatibility, and performance.
When you measure your own BRIR in your room and it’s overly reverberant, you can tweak the impulse response yourself—but that’s just a minor part of the story.

In the meantime I found dearVR monitor (that plugin is free now) from Neumann for rendering. This works with Atmos (streaming) too, and that can be quite an improvement.
After that I went back to measuring my HRTF in the near field and tweaking its bass, FR, and phase.
With my personal HRTF (measured with 6mm capsules) the result is much more "precise" in location and size of instruments and auditory scene than with the generic versions. This is no surprise if I look at my HRTF and compare it with what is used by Neumann. HRTFs can be very different.
I pimped the Neumann plugin by rerouting the frontal channels (LCR) through my stereo simulation.
So far, I’ve calibrated and auditioned hundreds of impulse responses—including my own measurements, dummy‑head captures, and individual physiological responses from people in my country—but the one conclusion that always comes to mind is this: everyone’s response really is different.
Of course, despite these differences, you can adapt to some extent. But as we discussed a few days ago in another thread, adapting means your brain has to work—and when your brain is doing unnecessary work, it ultimately undermines immersion, becoming an unwanted effort to spot the differences from reality.
I’m glad you’re satisfied with your near‑field HRTF. I tried using dearVR when I first started working with BRIR, but for some reason it wasn’t compatible with my setup at the time—though that may have been on my end. So I convolve each channel myself.

All these experiments are missing the head tracking thing of course. But if I had to decide between head tracking and personal HRTF the decision would probably fall on the personalised version, the sound is that much better. And as I close my eyes when listening with concentration, head moving is not such a big thing for me.
However, my next project would be to get something similar into the Realiser. Maybe I start with only combining
I also would choose personalization every time, even if I had to pick again a hundred times. Headtracking definitely helps, and I did consider integrating it into my setup, but it offered me little benefit.
The interesting thing about binaural virtualization is that headphones (or IEMs) effectively become the speakers (plus the space). Speaker headtracking serves to correct and stabilize variations in listener position, whereas headphone (IEM) headtracking does the opposite: it freezes the 3D space and introduces variation.
Putting externalization aside, their purposes are essentially opposite.
However, you can’t ignore its impact. As a simple example, I can’t find the exact measurements now, but I remember observing Apple’s spatial binaural measurements taken at various angles from data I received previously.
Some level of personal scanning is applied and it’s not bad, but the details of the early reflections—their intensity, timing, and ITD/ILD variations—weren’t impressive; they felt like a weakly applied reverb VST, which I think actually undercuts the realism.
Still, many people are satisfied with Apple’s Spatial Audio and Samsung’s spatial audio. It isn’t just the feeling of sound right next to your ears through IEMs or headphones; it gives you a sound image positioned somewhat in front, with elements that fluidly shift as you move your head.
There’s plenty to talk about here, but it seems really difficult to put into words. :rolleyes:

But I can do things in Element that I cannot do with the Realiser. I experimented with the level of crossfeed and with cutting off the direct sound crossfeed altogether. (Like a "true ambiophonics" simulation)
And I merged my HRTF with a flat, direct bass at 300 Hz. (Something I copied from the Realiser, but it does that at 80/120 Hz, which leaves quite some unevenness.)
The bass is one of the best things about this solution. No way to get something remotely close with speaker/subs in a room, especially with Atmos, just awesome.

As for phones, I use IEMs (Zero 2, Gear, etc.) and headphones (LCD2C, HD800s). The IEMs are cheap and very mobile (that was my reason to look for a laptop solution in the first place), the headphones are more comfortable for me. I like the bass of the Audeze, and the HD800s is (as always?) a bit more spacious than other headphones, no matter what the use case.
That kind of virtual synthesis is something I truly love. (I rely heavily on REW for this work.) You can replace responses whose early reflections occur around 4–5 ms with your own realistic, anechoic response—and compared to the effort and expense of overcoming room interactions with multi‑subwoofer setups or a DBA (Double Bass Array), you can achieve a tighter, cleaner result.
Plus, you’re not tied to mono‑bass routing on a multi‑woofer system—you can reproduce all Atmos channels full‑range down to 20 Hz, which can be a major advantage (though it’s optional).
You can also manipulate everything from the direct‑to‑reflected sound ratio to the shapes of the ITDG and ETC curves, and even synthesize the reflections themselves. I find the real appeal is being able to take theories from various research papers, apply them directly, and actually hear the difference.
(On a side note, one user recorded in a typical small, untreated living room using only a small bookshelf speaker, and when I adjusted his response—without prompting—he said it sounded like a large Atmos cinema. It wasn’t actually recorded in a theater, but the modified response stimulated his perception that way, and that’s how he experiences it.)
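For reference, the ETC mentioned above is easy to inspect yourself: it is essentially the envelope of the impulse response. A minimal sketch, assuming a mono IR file; the ITDG estimate is deliberately crude:

```python
# Minimal sketch: ETC (energy-time curve) of an IR via the analytic-signal
# envelope, plus a crude ITDG estimate. Assumes a mono IR file.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert

fs, ir = wavfile.read("brir_L.wav")
ir = ir.astype(np.float64)
env = np.abs(hilbert(ir))                        # envelope of the IR
etc_db = 20 * np.log10(env / env.max() + 1e-12)  # plot against t_ms for the ETC
t_ms = np.arange(len(ir)) / fs * 1e3

# ITDG: gap between the direct sound and the first strong reflection, here
# naively taken as the first post-peak sample within 20 dB of the peak.
peak = int(np.argmax(env))
skip = peak + int(0.001 * fs)                    # ignore 1 ms around the peak
first_refl = skip + int(np.argmax(env[skip:] > 0.1 * env.max()))
print(f"ITDG ~ {(first_refl - peak) / fs * 1e3:.2f} ms")
```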
It also seems that quite a few people are using Audeze.

There is another interesting thread
I also remember jumping into that thread briefly before. Crossfeed can help a lot of headphone/IEM users, but its simplifications sometimes lead people to say, “Crossfeed (or crosstalk) and speakers just aren’t my thing.” If you’re really lucky and your HPTF (your headphone transfer response when you’re actually wearing them) happens to have nuances similar to your real HRTF, you can get surprisingly convincing externalization with a simple crossfeed plus a touch of early reflections—no complex BRIR required. (In my case, Apple’s EarPods pull this off.) But matches like that are rare, so results vary headphone to headphone and users end up tweaking settings. Ironically, the “ideal” crossfeed and EQ almost always point back toward using HRIR (or BRIR). And even though crossfeed is simply about adding the opposite‑ear channel, there’s far more to consider than you’d think.

EDIT: I forgot. Thanks a lot for creating this thread!
You’re welcome. As I mentioned at the start of the thread, even among binaural virtualization users, conversations can break down based on which DSP each person uses. There really wasn’t anywhere to discuss this before. Thanks for joining the thread.

Great posts! I wasn't aware of the Element VST host, or that the dearVR plugins have become free. That's great, since they're reasonably good sounding for how little configuration is required.
Oh, it’s become free? I didn’t know that. That’s great info for anyone who needs DearVR.
 
I know that many people find the Realiser’s initial cost a burden, but I’m also aware that plenty are really satisfied thanks to its decoding, headtracking, compatibility, and performance.
When you measure your own BRIR in your room and it’s overly reverberant, you can tweak the impulse response yourself—but that’s just a minor part of the story.
I would never have got a Realiser if it weren't for the Kickstarter, when the price was in a different ballpark.
There are options to tweak the BRIR in the Realiser but they seem quite limited to me. No way to reduce the amount of early reflections as far as I can tell. Or do you have/know of tools to get a tweaked BRIR into the Realiser convolver?

Speaker headtracking serves to correct and stabilize variations in listener position, whereas headphone (IEM) headtracking does the opposite: it freezes the 3D space and introduces variation.
This I do not understand. What is "speaker headtracking"?

Head tracking is great (as provided from the Realiser's factory presets) but it involves a lot of measurements that are probably quite cumbersome when I have to try to get around room deficiencies in the process.
(On a side note, one user recorded in a typical small, untreated living room using only a small bookshelf speaker, and when I adjusted his response—without prompting—he said it sounded like a large Atmos cinema. It wasn’t actually recorded in a theater, but the modified response stimulated his perception that way, and that’s how he experiences it.)
That is about my situation. I used KEF R3 to measure my BRIRs, placing them and my head in the center of the room with a bit under 1 m distance. As I listen to classical 99% of the time, I am aiming not so much for a "big cinema" but for the acoustics of the recording venue. So I wanted the early reflections to be as late and as few as possible.
So far I was kind of cautious and did not touch the impulse responses too much. But it seems that one can work them more than I did without getting unwanted artifacts.
Looking at the impulse responses that Supperware or dearVR create, one can obviously do wild things and it still kind of works.

I also remember jumping into that thread briefly before. Crossfeed can help a lot of headphone/IEM users, but its simplifications sometimes lead people to say, “Crossfeed (or crosstalk) and speakers just aren’t my thing.”
Crossfeed to me is just a word for all kinds of solutions where sound from a source channel ends up in the ear on the opposite side. That can be crude and simple, or more complex (BRIR in stereo or even multichannel).
And that is mainly what was discussed in the other thread, where I learned first about Supperware and dearVR.

In respect to the experiences with (commercial) binaural recordings, these never worked for me. There is a binaural recording of Tallis' "Spem in Alium" (Suzy Digby and ORA singers) that is ok, but this probably depends on the strong room reflections and the spatialisation for me is not that much different to the stereo version.
With all the other (demo) recordings (Chesky and Barber Shop and such) I never get a frontal localisation, everything is in the back or around the head or similar.
Speaker-Room simulation is different somehow. My guess is, it is due to the room reflections stabilising the spatial perception.
And then the use of a personal HRTF/BRIR is even better. For me it is like bringing a fuzzy flat picture "into focus" and getting depth and contours.

I did have an astonishing experience with binaural recording though, when I recorded with in-ear microphones in my own ear canals. Even without proper EQ for the modifications of the FR, when I listened to the recording through (generic) headphones the result was just stunning. Voices of the persons in the room (family and friends) all of a sudden sounded "right" in a way never experienced with normal recordings. And the spatial impression was convincing in a way hard to believe. The same for outdoor recordings. Listening to a recording made on a park bench, I could hear single footsteps passing by with a precision I did not know my ears could provide.
And another surprise came when I switched channels. My ears do not seem to be so different, but with the left recording to the right ear and vice versa, all of a sudden everything was in the back! It still sounded good and with convincing timbre. But all the frontal localisation gone.
That made clear to me how important the/my personal HRTF is.
I have no hope for dummy recordings in big halls to work for me. My idea would be to record in (higher order) ambisonics and then create an individual signal for the user by convolving with his/her own HRIR. But that will not happen I guess.
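That ambisonics-to-personal-binaural idea is at least easy to sketch at first order: decode the B-format to a ring of virtual loudspeakers, then convolve each feed with your own HRIR pair for that direction. Everything below (the square layout, the placeholder signal, the fake HRIRs) is illustrative only:

```python
# Sketch of "ambisonics -> personal binaural": decode first-order B-format to
# virtual loudspeakers, then convolve each feed with the listener's own HRIR
# pair. The layout, placeholder signal, and fake HRIRs are illustrative only.
import numpy as np
from scipy.signal import fftconvolve

fs, n = 48000, 48000 * 4
b = np.random.randn(n, 4) * 0.01  # stand-in for W, X, Y, Z channels

def my_hrir(azimuth_deg):
    """Placeholder for the listener's measured HRIR pair at this azimuth."""
    hl = np.zeros(256); hl[10] = 1.0
    hr = np.zeros(256); hr[10 + abs(azimuth_deg) // 5] = 0.7  # fake ITD/ILD
    return hl, hr

out = np.zeros((n + 255, 2))
for az in (45, 135, -135, -45):  # a square of virtual speakers
    rad = np.deg2rad(az)
    # Basic first-order decode (FuMa W scaling) for a horizontal layout.
    feed = 0.5 * (np.sqrt(2) * b[:, 0]
                  + b[:, 1] * np.cos(rad) + b[:, 2] * np.sin(rad))
    hl, hr = my_hrir(az)
    out[:, 0] += fftconvolve(feed, hl)
    out[:, 1] += fftconvolve(feed, hr)
```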
 
I would never have got a Realiser if it weren't for the Kickstarter, when the price was in a different ballpark.
There are options to tweak the BRIR in the Realiser but they seem quite limited to me. No way to reduce the amount of early reflections as far as I can tell. Or do you have/know of tools to get a tweaked BRIR into the Realiser convolver?
Well, I’m not sure—how is convolution implemented in the Realiser? I haven’t used it, so I don’t know. Modifying early reflections requires a lot of care, which is why I use the tool I’m familiar with, REW, to manipulate the impulse responses.

[REW screenshots: the original (green) impulse response vs. the tweaked (pink) one]

I control it like this. (Of course, you can also tweak it to be more realistic or even more extreme.)

This I do not understand. What is "speaker headtracking"?

Head tracking is great (as provided from the Realiser's factory presets) but it involves a lot of measurements that are probably quite cumbersome when I have to try to get around room deficiencies in the process.
Unless you clamp your head in a vise, a speaker’s sound will vary with your listening position (or the angle of your face).
The video below shows some head tracking with CTC (crosstalk cancellation) in action.



That is about my situation. I used KEF R3 to measure my BRIRs, placing them and my head in the center of the room with a bit under 1 m distance. As I listen to classical 99% of the time, I am aiming not so much for a "big cinema" but for the acoustics of the recording venue. So I wanted the early reflections to be as late and as few as possible.
So far I was kind of cautious and did not touch the impulse responses too much. But it seems that one can work them more than I did without getting unwanted artifacts.
Looking at the impulse responses that Supperware or dearVR create, one can obviously do wild things and it still kind of works.

Yes. Although it depends on the quality and characteristics of the room reflections, even in the same space, if you intentionally increase the ITDG or alter the shape of the ETC, the brain interprets it differently. But in a real room, it’s not easy to achieve the ETC you want.
If you reduce early reflections, the linked late reflections will inevitably decrease as well (especially in a small room), and because the speaker and listener are closer together, the overall direct‑to‑reflected sound ratio changes too.
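As a sanity check, that direct-to-reflected balance can be quantified straight from the IR. A rough sketch, assuming a mono IR file; the 5 ms direct window is a convention, not a standard:

```python
# Quick check of the direct-to-reverberant ratio (DRR) of an IR: energy in a
# short window around the direct peak vs. everything after it.
import numpy as np
from scipy.io import wavfile

fs, ir = wavfile.read("brir_L.wav")  # assumed mono IR
ir = ir.astype(np.float64)
peak = int(np.argmax(np.abs(ir)))
split = peak + int(0.005 * fs)       # direct window ends 5 ms after the peak

direct = np.sum(ir[max(0, peak - int(0.001 * fs)):split] ** 2)
reverb = np.sum(ir[split:] ** 2)
print(f"DRR = {10 * np.log10(direct / reverb):.1f} dB")
```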
However boldly I experimented, as long as I applied changes according to theoretically sound criteria, I was able to experience certain conditions without unwanted artifacts. Of course, the process is somewhat tedious and cumbersome, so even I have been putting off creating various demos for Korean users—ranging from the presence or absence of early reflections, to delay times, to different reflection shapes on horizontal and vertical planes—for years now. It’s hard to motivate myself because it’s such a hassle.


Crossfeed to me is just a word for all kinds of solutions where sound from a source channel ends up in the ear on the opposite side. That can be crude and simple, or more complex (BRIR in stereo or even multichannel).
And that is mainly what was discussed in the other thread, where I learned first about Supperware and dearVR.
Yes, I agree.
I just find it unfortunate that so many people end up disappointed—claiming “crossfeed (or crosstalk) isn’t for me”—because their current headphone response doesn’t suit them, isn’t properly compensated, and the crossfeed setup itself lacks any individualized rationale or persuasiveness.

In respect to the experiences with (commercial) binaural recordings, these never worked for me. There is a binaural recording of Tallis' "Spem in Alium" (Suzy Digby and ORA singers) that is ok, but this probably depends on the strong room reflections and the spatialisation for me is not that much different to the stereo version.
With all the other (demo) recordings (Chesky and Barber Shop and such) I never get a frontal localisation, everything is in the back or around the head or similar.
I also don’t expect precise localization from binaural recordings unless they were made by me. While using multiple mics and speakers for recording and playback is similar to binaural audio, it’s not the same. Binaural’s advantage is that, if both recording and playback are personalized, you can enjoy full spatial benefits with just two “speakers”—but as I mentioned earlier, that level of commercialization is still a long way off.
When you say you only hear sound “around your head,” were you listening on regular headphones or on ones with a BRIR applied? With standard headphones or IEMs, the pinna cues are bypassed or so dominated by proximity that it tends to internalize easily. If you listened with a BRIR, you’d need crosstalk cancellation—but even without overwhelmingly strong crosstalk on playback, and provided the recording isn’t near‑field binaural, it still feels different from regular stereo.
But if that’s not the case, your experience may be something entirely different, and this is just my speculation. I’m still listening to what you have to say.
Here’s a link to a binaural video you can easily find on YouTube.


Speaker-Room simulation is different somehow. My guess is, it is due to the room reflections stabilising the spatial perception.
And then the use of a personal HRTF/BRIR is even better. For me it is like bringing a fuzzy flat picture "into focus" and getting depth and contours.

Multiple academic papers have also found that when comparing direct listening in an anechoic chamber with playback of HRIRs recorded on-site, the HRIR playback tended to be more internalized. (I’ve tested it myself and found the same—though it doesn’t feel as “inside my head” as with ordinary headphones or IEMs, since there’s still the pinna’s frontal, far‑field response.)
That’s why early reflections are especially important for externalization in BRIRs. Beyond a certain threshold, they can match—or even exceed—our perception of reality.
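To make that concrete, here is a toy sketch that grafts a few synthetic early reflections onto a dry response. The delays, gains, and lowpass are made up for illustration, not a recipe:

```python
# Toy sketch: graft a few synthetic early reflections onto a dry (anechoic)
# response. Delays, gains, and the lowpass are made up for illustration.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfilt

fs, dry = wavfile.read("hrir_L_anechoic.wav")  # hypothetical dry IR
dry = dry.astype(np.float64)

reflections = [(6.5e-3, 0.30), (9.2e-3, 0.22), (13.8e-3, 0.15)]  # (delay s, gain)
out = np.zeros(len(dry) + int(0.05 * fs))
out[:len(dry)] = dry
sos = butter(2, 6000, "low", fs=fs, output="sos")  # surfaces absorb highs
for delay, gain in reflections:
    start = int(delay * fs)
    refl = sosfilt(sos, dry) * gain
    out[start:start + len(refl)] += refl
wavfile.write("hrir_L_with_er.wav", fs, out.astype(np.float32))
```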
I completely agree about personalization. Generic HRTF/HRIR/BRIR can have an effect, but to me they felt more like an “effect” and didn’t sound truly realistic. There’s no way around that.
But if someone asked me, “Does your BRIR suit your taste?” I’d respond: “Do speakers and rooms even have a ‘taste’?” Of course, the characteristics of any speaker or room can vary according to personal preference.

I did have an astonishing experience with binaural recording though, when I recorded with in-ear microphones in my own ear canals. Even without proper EQ for the modifications of the FR, when I listened to the recording through (generic) headphones the result was just stunning. Voices of the persons in the room (family and friends) all of a sudden sounded "right" in a way never experienced with normal recordings. And the spatial impression was convincing in a way hard to believe. The same for outdoor recordings. Listening to a recording made on a park bench, I could hear single footsteps passing by with a precision I did not know my ears could provide.
Absolutely—convincing spatialization and realism are what binaural is all about. It’s been a truly great experience. Good for you.

And another surprise came when I switched channels. My ears do not seem to be so different, but with the left recording to the right ear and vice versa, all of a sudden everything was in the back! It still sounded good and with convincing timbre. But all the frontal localisation gone.
That made clear to me how important the/my personal HRTF is.
This varies from person to person—some people have really drastic individual differences. In my case, the left–right disparity is relatively small.
[attached photo: ear impressions]


I have no hope for dummy recordings in big halls to work for me. My idea would be to record in (higher order) ambisonics and then create an individual signal for the user by convolving with his/her own HRIR. But that will not happen I guess.
That’s why so many people wrestle with this. But personally—if I may cautiously share what I’ve been thinking since last year—it goes like this:
When I set up a two‑channel speaker system in my small room, am I truly listening to the speakers themselves? And even more so, am I really hearing every interaction happening in that tiny space? By experimenting with impulse‑response manipulation as I mentioned earlier, I’ve come to realize how much gets hidden in the time domain—because in reality it’s nearly impossible to both excite reflections as you wish and keep them fully under control.



 
Modifying early reflections requires a lot of care, which is why I use the tool I’m familiar with, REW, to manipulate the impulse responses.
Yes, REW is my tool of choice too. The Realiser convolves with a DSP chip inside, and as far as I can tell it uses proprietary formats. So you can measure with the provided microphones, and then there are (quite restricted) options to truncate the IR and to merge it with a flat one in the bass (80 Hz or 120 Hz merging point). But that is it, unless there is something I did not find. If only one could open the "PRIR" as a WAV, modify it, and load it back!

Unless you clamp your head in a vise, a speaker’s sound will vary with your listening position (or the angle of your face).
The video below shows some head tracking with CTC (crosstalk cancellation) in action.
There still seems to be a misunderstanding. I understand the need for head tracking, but you talked about a difference between "speaker head tracking" and "earphone head tracking". Or did you mean speaker-based crosstalk cancellation like in Ambiophonics/Bacch?

This is interesting. If I understand correctly, you started with the green IR and tweaked it to get to the pink one?
If so, how did you do it? Did you cut the IR into pieces and reassemble them after the changes, or is there a clever function to do this in one go?

If you reduce early reflections, the linked late reflections will inevitably decrease as well (especially in a small room), and because the speaker and listener are closer together, the overall direct‑to‑reflected sound ratio changes too.
The way I see it, that is not necessarily so. In the time domain one can modify the time slices more or less independently (in theory), and even during recording, early reflections can be tweaked without changing the power response (later reflections).
With the placement as far away as possible from all walls I arrived at this.
So far the sound is not bad at all for me with this BRIR (L_channel->L_ear).
[attached image: BRIR_olieb.png]

Much more reflections than in the BRIRs you showed above. Interesting.
The BRIR for crossfeed (R_channel->L_ear) looks similar, and I experimented with attenuating the crossfeed as well as with "cutting off" the initial peak (direct sound) completely with a suitable window to reduce the inherent stereo flaws a bit (creating ambiophonics instead of stereo). Still working on it.

When you say you only hear sound “around your head,” were you listening on regular headphones or on ones with a BRIR applied? With standard headphones or IEMs, the pinna cues are bypassed or so dominated by proximity that it tends to internalize easily. If you listened with a BRIR, you’d need crosstalk cancellation—but even without overwhelmingly strong crosstalk on playback, and provided the recording isn’t near‑field binaural, it still feels different from regular stereo.
But if that’s not the case, your experience may be something entirely different, and this is just my speculation. I’m still listening to what you have to say.
Here’s a link to a binaural video you can easily find on YouTube.
I was listening to binaural recordings (most of the time made with Neumann KU100) with headphones and without my BRIR. The idea being to hear with "KU100's ears". The result was for instance for the basketball scene that everything happened in the back of my head. I remember a recording of a jazz band where the players all sounded to me as if they were placed hanging up at the ceiling in the corners. But at least they were kind of frontal.
It never occurred to me to use a BRIR on top of the recording through dummy ears. Seemed like putting ears on ears to me.

I listened to the recording you linked. It is not clear to me where the dummy head is placed, I cannot see it in the video. I only see ORTF stereo mics hanging above the orchestra. So it is not clear what to expect "realistically" in respect to clarity, proximity and spatial differentiation.
The sound is quite nice, more like in a seat (no surprise ;) than what a typical stereo recording tries to achieve. No surprise because a binaural recording from a good seat would have too much reverb for stereo speaker reproduction.
Everything is a compromise, a balance and a "product". My preferred place for a binaural recording would probably not be an actual seat either.
Interestingly there was an improvement when routing it through my BRIR (direct, no additional crossfeed).
Without BRIR the orchestra was rather fuzzy and the saxophone was floating somehow above the orchestra.
With the BRIR active the orchestra got more "body" and everything became clearer (similar to using a personal BRIR in room simulation) but the sax still was a few meters above. How realistic the imaging of the orchestra is, I cannot tell as the reference is not there.

This varies from person to person—some people have really drastic individual differences. In my case, the left–right disparity is relatively small.
It is not as if I have one big and one small ear. ;-), but I do not have such nice castings of my ear canals. Cool stuff! My HRTFs look quite similar and in the room simulation I can use one BRIR for both ears with only moderate change.
Therefore my surprise with this drastic effect of the whole scene switching backwards (not left<->right as expected), but this was for an outdoor recording. Indoors the reflections provide a lot of room stability.
 
Yes, REW is my tool of choice too. The Realiser convolves with a DSP chip inside, and as far as I can tell it uses proprietary formats. So you can measure with the provided microphones, and then there are (quite restricted) options to truncate the IR and to merge it with a flat one in the bass (80 Hz or 120 Hz merging point). But that is it, unless there is something I did not find. If only one could open the "PRIR" as a WAV, modify it, and load it back!
I searched for the manual on Google and skimmed it for a bit, but all the endless English terms suddenly gave me a headache. It really seems to use its own proprietary format. =(

There still seems to be a misunderstanding. I understand the need for head tracking, but you talked about a difference between "speaker head tracking" and "earphone head tracking". Or did you mean speaker-based crosstalk cancellation like in Ambiophonics/Bacch?
It was both.
I’ll admit I see the value in head tracking, but when I think it through, I wonder: with headphones we’re deliberately adding dynamic motion and changes—aside from the slight boost to externalization, is that really necessary? A few years ago I asked another user, “If you move your body wildly, will head tracking correct all of that?” Their answer was that there are definite limits.
So I asked myself: what if, with speakers, head tracking were truly perfect and the imaging remained locked in place? Isn’t that essentially what a headphones/IEM BRIR does? No matter how much you shake your head or even stand on your hands, the sound stays fixed. That was just a passing thought—not a major point.

This is interesting. If I understand correctly, you started with the green IR and tweaked it to get to the pink one?
If so, how did you do it? Did you cut the IR into pieces and reassemble them after the changes, or is there a clever function to do this in one go?
Yes. That was probably someone else’s response—I grabbed the example from my folder and captured what I’d posted in the Korean community. And yes, I started with the green IR and tweaked it into the pink one.
Since this example is about cutting down reflections, I’ll keep this brief. In this case, my (or the other person’s) goal was to reduce reflections so much that it’s almost anechoic.
Completely removing reflections is easy—just gate the IR and you’re done. (Of course, that chops off the low end.)
Another approach is the MTW feature in the REW beta: it lets you apply different gating settings to each frequency band.
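In code, the plain gate looks something like this (a minimal sketch, assuming a mono IR file; REW's MTW generalizes the same idea with longer windows at low frequencies):

```python
# The plain gate: keep ~4 ms after the direct peak, then fade to zero.
import numpy as np
from scipy.io import wavfile

fs, ir = wavfile.read("brir_L.wav")  # assumed mono IR
ir = ir.astype(np.float64)
peak = int(np.argmax(np.abs(ir)))
keep = int(0.004 * fs)               # "direct sound" region after the peak
fade = int(0.001 * fs)               # 1 ms half-Hann fade-out

gated = ir.copy()
gated[peak + keep:peak + keep + fade] *= 0.5 * (1 + np.cos(np.linspace(0, np.pi, fade)))
gated[peak + keep + fade:] = 0.0     # reflections gone (and the low end with them)
wavfile.write("brir_L_gated.wav", fs, gated.astype(np.float32))
```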

[REW MTW screenshots]



Something like this: MTW can be applied very simply. After all, you can correct or even synthesize the low end. Usually you only need about 5 ms—roughly up to 4 ms of the impulse response—so you can export it that way. The problem is that, in reality, even an anechoic chamber won’t decay that fast (since there are no reflections).
The reason I don’t worry much about the low-frequency band is that you can just synthesize it, and even though torso and chest reflections clearly contribute, you can still rely on the pattern of your original response for those as well.

In any case, you can do it easily like this, or experiment with other methods.
It’s like cancelling specific reflections or room modes with something like ART. But it’s surprisingly sensitive—low frequencies aren’t a big issue (and honestly it’s easier to just synthesize those), whereas high frequencies are much more finicky.
In an ideal room where side walls, front wall, back wall, floor, and ceiling are all spaced out at precisely the right time intervals and you can observe them in the impulse response, it’d be fairly easy to control. But in a typical home environment, that’s not the case—most reflections from all around hit you almost simultaneously.

[screenshot]


So a simpler—but more realistic—approach than MTW is to literally attenuate the reflections’ response. How much you do this depends on the speaker’s directivity, the anechoic‑room conditions, and your goals, but in the example I shared, “realistic” meant leaving only the direct sound.
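In code, that simple attenuation is just a gain ramp over the tail of the IR. A sketch with placeholder values; pick the split point and depth to taste:

```python
# The simpler approach: leave the direct sound alone and pull the whole
# reflection tail down by a fixed amount (-20 dB here; pick your own).
import numpy as np
from scipy.io import wavfile

fs, ir = wavfile.read("brir_L.wav")
ir = ir.astype(np.float64)
peak = int(np.argmax(np.abs(ir)))
split = peak + int(0.004 * fs)  # direct sound assumed over ~4 ms past the peak
fade = int(0.002 * fs)          # short crossfade into the attenuated tail
gain = 10 ** (-20 / 20)

weights = np.ones_like(ir)
weights[split + fade:] = gain
weights[split:split + fade] = np.linspace(1.0, gain, fade)
wavfile.write("brir_L_attenuated.wav", fs, (ir * weights).astype(np.float32))
```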


If you follow the link, you’ll likely find a text file there.

[screenshots]


To keep it simple, apply a short MTW to your original IR and export it, then re-import it. That way, all the reflections will be cut off.


[screenshots]


And if you apply the TXT file you downloaded from the link as the calibration curve to the MTW‑processed IR, you’ll see something like that. That’s essentially the target.

[screenshots]


And then, on the original response (with no MTW applied), you gate the reflection portion, apply a bit of smoothing, and then EQ it. In the example, I just used REW’s Auto EQ. Since the goal is to kill the reflections and push them below the audible threshold, there’s no need for precision—just make it look reasonably good.

[screenshot]


And when you combine the EQ’d response with the MTW‑processed IR using alignment (or A+B), the reflections end up attenuated in a realistic, almost anechoic way.
That’s just the start—you can also tweak individual reflections or even shape the ETC to your liking.
There’s a lot you can do with REW, but since almost no one uses REW for BRIR, it’s mostly just me (and a few Korean users) running repeated experiments and validations against the theories in published acoustics papers.

There’s a lot you can do this way, but most people want a “one‑click solution.” In Korea, almost nobody actually rolls up their sleeves, experiments, and then tries it again—I know of only one or two who do. (I can’t speak for PhD‑level experts, but in the general audio community, they’re virtually nonexistent.) So I’ve spent countless hours studying BRIR, replicating paper methods, and validating them—thousands, even tens of thousands of times—and whenever I hit a snag, there was no one to ask and nowhere to find the answers. That’s why I started this thread: I’m certain there are others like me around the world.

The way I see it, that is not necessarily so. In the time domain one can modify the time slices more or less independently (in theory), and even during recording, early reflections can be tweaked without changing the power response (later reflections).
With the placement as far away as possible from all walls I arrived at this.
So far the sound is not bad at all for me with this BRIR (L_channel->L_ear)
What I meant was before any post-correction. I was referring to treating reflections in the real world through physical absorption.

Much more reflections than in the BRIRs you showed above. Interesting.
The BRIR for crossfeed (R_channel->L_ear) looks similar, and I experimented with attenuating the crossfeed as well as with "cutting off" the initial peak (direct sound) completely with a suitable window to reduce the inherent stereo flaws a bit (creating ambiophonics instead of stereo). Still working on it.
It’s actually quite easy to emulate a simple physical barrier in a BRIR. Muting the opposite‑ear channel entirely might seem intuitive, but in practice it doesn’t work that way and is far too extreme. Here’s the key:
Imagine a 30° angle: the ILD will vary across frequency bands. Adjust your EQ to achieve a uniform attenuation of about –16 dB, but leave everything below roughly 700–800 Hz untouched. You can easily approximate this with a simple EQ. (To check it more easily, it helps to view the response in normalized mode.)
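In DSP terms that recipe is essentially a high-shelf cut on the contralateral feed. A sketch using the RBJ cookbook high-shelf, with the corner and depth taken from the description above; the ITD delay is left as a comment since it depends on the simulated angle:

```python
# The "-16 dB above ~700-800 Hz, untouched below" contralateral attenuation
# is essentially a high-shelf cut on the crossfeed path. RBJ audio-EQ-
# cookbook high-shelf biquad; corner and depth from the description above.
import numpy as np
from scipy.signal import lfilter

def high_shelf(fc, gain_db, fs):
    """RBJ cookbook high-shelf biquad (b, a), Q = 1/sqrt(2)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / np.sqrt(2)
    cosw = np.cos(w0)
    b = np.array([A * ((A + 1) + (A - 1) * cosw + 2 * np.sqrt(A) * alpha),
                  -2 * A * ((A - 1) + (A + 1) * cosw),
                  A * ((A + 1) + (A - 1) * cosw - 2 * np.sqrt(A) * alpha)])
    a = np.array([(A + 1) - (A - 1) * cosw + 2 * np.sqrt(A) * alpha,
                  2 * ((A - 1) - (A + 1) * cosw),
                  (A + 1) - (A - 1) * cosw - 2 * np.sqrt(A) * alpha])
    return b / a[0], a / a[0]

fs = 48000
b, a = high_shelf(fc=750.0, gain_db=-16.0, fs=fs)
right = np.random.randn(fs)               # stand-in for the opposite channel
crossfeed_to_left = lfilter(b, a, right)  # then delay by the ITD appropriate
                                          # for the simulated angle (~0.25 ms
                                          # for a source around 30 degrees)
```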
You can apply it just to the direct sound, or include reflections—but in my experience, when you stray from what’s realistically possible or necessary, you can feel that something’s off even in the BRIR. In other words, BRIR post‑processing offers tremendous freedom, but if you indulge in that freedom, you can end up doing things you shouldn’t—things that are impossible in reality.
For example: if no direct sound is played, how can any reflections exist? If you remove the direct‑sound peak in the opposite‑ear channel—i.e. no direct sound was ever played or recorded—how would any reflections be generated?
Of course it’s technically possible to post‑process, synthesize, and create it that way, but it ceases to reflect a realistic acoustic scenario.

I was listening to binaural recordings (most of the time made with Neumann KU100) with headphones and without my BRIR. The idea being to hear with "KU100's ears". The result was for instance for the basketball scene that everything happened in the back of my head. I remember a recording of a jazz band where the players all sounded to me as if they were placed hanging up at the ceiling in the corners. But at least they were kind of frontal.
It never occurred to me to use a BRIR on top of the recording through dummy ears. Seemed like putting ears on ears to me.

I listened to the recording you linked. It is not clear to me where the dummy head is placed, I cannot see it in the video. I only see ORTF stereo mics hanging above the orchestra. So it is not clear what to expect "realistically" in respect to clarity, proximity and spatial differentiation.
The sound is quite nice, more like in a seat (no surprise ;) than what a typical stereo recording tries to achieve. No surprise because a binaural recording from a good seat would have too much reverb for stereo speaker reproduction.
Everything is a compromise, a balance and a "product". My preferred place for a binaural recording would probably not be an actual seat either.
Interestingly there was an improvement when routing it through my BRIR (direct, no additional crossfeed).
Without BRIR the orchestra was rather fuzzy and the saxophone was floating somehow above the orchestra.
With the BRIR active the orchestra got more "body" and everything became clearer (similar to using a personal BRIR in room simulation) but the sax still was a few meters above. How realistic the imaging of the orchestra is, I cannot tell as the reference is not there.

Yes. Depth and spatial elements are what characterize binaural sources or recordings. Of course, you can listen with headphones or IEMs with nothing applied. But as I mentioned earlier, IEMs bypass the pinna and headphones sit too close, so the sound tends to hover around your head and internalize easily.
Also, the dummy‑head’s response and each individual’s own response will obviously differ, and if there’s no crosstalk and the compensation is mismatched, you could end up hearing it as if it were suspended from the ceiling, as you described.
To listen to binaural through speakers (or HRIR/BRIR), you should ideally reduce or remove crosstalk—but since the ILD in far‑field recordings isn’t that extreme, you can just listen without too much trouble. Most binaural sources include DF compensation, so they’re rich in the ITD and ILD cues we care about, and that’s exactly what you’re hearing.
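For completeness, crosstalk cancellation in its textbook form is a regularized inversion of the 2x2 speaker-to-ear transfer matrix, done per frequency bin. The sketch below is generic, not any particular product's implementation:

```python
# Textbook crosstalk cancellation: per frequency bin, invert the 2x2 matrix
# of speaker->ear transfer functions, with Tikhonov regularization so the
# inverse stays bounded where the matrix is ill-conditioned.
import numpy as np

def ctc_filters(h_ll, h_lr, h_rl, h_rr, beta=0.005, nfft=4096):
    """CTC filter spectra from the four speaker->ear IRs (rows = ears)."""
    H = np.array([[np.fft.rfft(h_ll, nfft), np.fft.rfft(h_rl, nfft)],
                  [np.fft.rfft(h_lr, nfft), np.fft.rfft(h_rr, nfft)]])
    C = np.zeros_like(H)
    for k in range(H.shape[-1]):
        Hk = H[:, :, k]
        # C = (H^H H + beta*I)^-1 H^H : regularized least-squares inverse
        C[:, :, k] = np.linalg.inv(Hk.conj().T @ Hk + beta * np.eye(2)) @ Hk.conj().T
    return C  # take irfft + window these before use in a real convolver

# Toy symmetric example: direct paths are unit impulses, crosstalk is ~-10 dB.
d = np.zeros(256); d[0] = 1.0
C = ctc_filters(d, 0.32 * d, 0.32 * d, d)
```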
Of course, the YouTube link I shared might sound great, or it might sound bad, or it might evoke no sensation at all—it’s simply a difference. And yes, it feels closer to direct listening. If, based on that feeling, you recreate the space in a high‑quality, tightly controlled multichannel environment and record it on your body, then play it back through two speakers or headphones/IEMs, you can fully experience that spatial sense with only two channels. I think that’s a difference in type of recording, not in quality.

It is not as if I have one big and one small ear. ;-), but I do not have such nice castings of my ear canals. Cool stuff! My HRTFs look quite similar and in the room simulation I can use one BRIR for both ears with only moderate change.
Therefore my surprise with this drastic effect of the whole scene switching backwards (not left<->right as expected), but this was for an outdoor recording. Indoors the reflections provide a lot of room stability.
In the case of outdoor recordings, I can’t say exactly how that happens, but it’s an intriguing phenomenon. And yes—as you tested—you can use a single BRIR, adjust it for both ears, and still get a proper spatial image.
The human body naturally has asymmetries, and our brains learn from that data. Even so, if your left‑and‑right responses were to match perfectly, you’d still be able to hear directionality. (I’ve seen some people misunderstand this: they think the left‑ear response literally makes sounds come from the left. Of course, you need to account for reflections and their interactions in the recorded space, but occasionally people oversimplify and assume the IR itself “originates” on one side.)
 
I’ll admit I see the value in head tracking, but when I think it through, I wonder: with headphones we’re deliberately adding dynamic motion and changes—aside from the slight boost to externalization, is that really necessary? A few years ago I asked another user, “If you move your body wildly, will head tracking correct all of that?” Their answer was that there are definite limits.
Well, it might not be necessary, but it is more than nice to have, even with its limits, which are not really that important during critical listening.
One (big) advantage on top of the added realism is what j_j mentioned in the Totem Acoustics Rainmaker Speaker thread. Every time you put on the phones and play a recording, the brain has to learn the (virtual) acoustics of the situation to a degree. This happens much quicker (and better?) if you can move your head "inside" a stable simulation; it is a bit like "looking around".
Whether that is worth the effort? Everybody has to decide for himself.
Since this example is about cutting down reflections, I’ll keep this brief. In this case, my (or the other person’s) goal was to reduce reflections so much that it’s almost anechoic.
Completely removing reflections is easy—just gate the IR and you’re done. (Of course, that chops off the low end.)
Another approach is the MTW feature in the REW beta: it lets you apply different gating settings to each frequency band.
And then, on the original response (with no MTW applied), you gate the reflection portion, apply a bit of smoothing, and then EQ it. In the example, I just used REW’s Auto EQ. Since the goal is to kill the reflections and push them below the audible threshold, there’s no need for precision—just make it look reasonably good.
Aiming for almost anechoic, I understand.
I tried chopping off the reflections entirely too; the result sounded about the same as a corresponding minimum-phase EQ. The spatial differences from plain stereo vanished almost completely.
So you do cut and split the IR and remerge after modifications. Thank you so much for the detailed instructions and the introduction to the MTW feature, which I did not know about.
I understood only half of it and might have to dig deeper when time allows.

What I meant was before any post-correction. I was referring to treating reflections in the real world through physical absorption.
I see; for absorption this is true, of course. But deflection or diffusion works a bit differently, even in the physical world.

You can apply it just to the direct sound, or include reflections—but in my experience, when you stray from what’s realistically possible or necessary, you can feel that something’s off even in the BRIR. In other words, BRIR post‑processing offers tremendous freedom, but if you indulge in that freedom, you can end up doing things you shouldn’t—things that are impossible in reality.
For example: if no direct sound is played, how can any reflections exist? If you remove the direct‑sound peak in the opposite‑ear channel—i.e. no direct sound was ever played or recorded—how would any reflections be generated?
Of course it’s technically possible to post‑process, synthesize, and create it that way, but it ceases to reflect a realistic acoustic scenario.
This is exactly the point: the freedom is enormous, but that makes it very easy to screw up.
My idea behind cutting off the direct sound (but keeping reflections) in the opposite ear looks like this (ambiophonics the pedestrian way :)
[attached diagram: listener and two speakers with a barrier blocking the direct sound path to the opposite ear]

The barrier can be made bigger.
This is a nice physical idea for improving on some of the deficiencies of stereo: no direct crosstalk, only crosstalk via reflections. You can place the speakers more toward the front and reduce the HRTF azimuth effects (while still keeping the ITD from the recording unaltered).
You (and I) probably do not want to listen like this, but in binaural room simulation it can be done in the BRIRs.
It does not work equally well for all recordings.
If the ILD becomes too high (hard-panned tracks), this does not sound good, as it is unnatural to have zero direct sound at the opposite ear. But with most classical recordings, and if the panning is done carefully, it works quite well.
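As a rough illustration of doing the "barrier" in the BRIR rather than physically, here is a minimal Python sketch: mute only the direct-sound portion of the opposite-ear channel of one speaker's BRIR while keeping its reflections. The channel layout, file names, and the 3 ms direct-sound window are assumptions, not values from the posts.

```python
# Sketch only: the "barrier" applied in the BRIR. Assumes a stereo WAV for the
# LEFT speaker with channels [left ear, right ear], an IR that extends well
# past the direct sound, and an (illustrative) 3 ms direct-sound window.
import numpy as np
import soundfile as sf

brir, fs = sf.read("brir_left_speaker.wav")   # hypothetical file, shape (samples, 2)
contra = brir[:, 1].copy()                    # right ear = opposite ear for this speaker

peak = np.argmax(np.abs(contra))              # direct-sound arrival in the contra channel
end = peak + int(fs * 0.003)                  # assume the direct sound is over after ~3 ms
fade = int(fs * 0.0005)                       # 0.5 ms fade-in for the surviving reflections

contra[:end] = 0.0                                     # remove the direct sound entirely
contra[end:end + fade] *= np.linspace(0.0, 1.0, fade)  # smooth re-entry of the tail

brir[:, 1] = contra
sf.write("brir_left_speaker_barrier.wav", brir, fs)
```

The ipsilateral channel is left untouched, so the ITD encoded in the recording survives, as noted above.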

Yes. Depth and spatial elements are what characterize binaural sources and recordings. Of course, you can listen with headphones or IEMs with nothing applied. But as I mentioned earlier, IEMs bypass the pinna and headphones sit too close, so the sound tends to hover around your head and internalize easily.
But in a binaural recording there is already a collection of BRIRs "applied": those of the dummy used are included in the recording (crosstalk included). When using IEMs, it is just that you are hearing through "other ears". And you cannot get rid of these ears; you can only place your own ears (BRIR) on top. I am astonished that this even works halfway, but so it does.
However, the result was not better than (though different from) some good acoustic stereo recordings. At least a part of those (I assume these are the ones with a dominant stereo pair and good placement) can sound very good and have a good spatial image.
 
Well, it might not be necessary, but it is more than nice to have, even with its limits, which do not really matter that much during critical listening.
One (big) advantage on top of the added realism is what j_j mentioned in the Totem Acoustics Rainmaker Speaker thread. Every time you put on the phones and play a recording, the brain has to learn the (virtual) acoustics of the situation to a degree. This happens much quicker (and better?) if you can move your head "inside" a stable simulation; it is a bit like "looking around".
Whether that is worth the effort? Everybody has to decide for himself.
Yes, I agree. That’s why, even though I see the value in head tracking, I also shared my own perspective. From head tracking to haptic bass, if you chase realism you end up strapping on gear like Iron Man (just kidding), and it risks undermining the original goal of listening with a light body and mind. That’s why I find myself wrestling with this.

Aiming for almost anechoic, I understand.
I tried chopping off the reflections entirely too; the result sounded about the same as a corresponding minimum-phase EQ. The spatial differences from plain stereo vanished almost completely.
So you do cut and split the IR and remerge after modifications. Thank you so much for the detailed instructions and the introduction to the MTW feature, which I did not know about.
I understood only half of it and might have to dig deeper when time allows.
Actually, there are quite a few other approaches besides that. Rather than shooting for a truly anechoic result—which most people will never experience, and few would know what an ideal ETC should look like—I was brainstorming quick-and-dirty methods that anyone could implement.
I also have IRs with virtually no reflections within 40 ms (essentially free‑field conditions), but when I’m running these tests I usually apply the technique to other people’s responses first, not just my own. Sure, tweaking a pristine, “perfect” IR is a matter of personal satisfaction, but a method that works reliably on IRs measured in less‑than‑ideal conditions? To me, that’s a genuinely useful approach.
And leaving only the direct sound is just the beginning—reflections mark the start of a whole new journey.

I see; for absorption this is true, of course. But deflection or diffusion works a bit differently, even in the physical world.
Of course, I know it doesn’t work that simply—I was just giving an example. Not all of us have the luxury of moving everything and installing exactly what we want. We have to make do with limited resources, the shape of the space, and fixed speaker positions to get the best possible result.

The barrier can be made bigger.
This is a nice physical idea for improving on some of the deficiencies of stereo: no direct crosstalk, only crosstalk via reflections. You can place the speakers more toward the front and reduce the HRTF azimuth effects (while still keeping the ITD from the recording unaltered).
You (and I) probably do not want to listen like this, but in binaural room simulation it can be done in the BRIRs.
It does not work equally well for all recordings.
If the ILD becomes too high (hard-panned tracks), this does not sound good, as it is unnatural to have zero direct sound at the opposite ear. But with most classical recordings, and if the panning is done carefully, it works quite well.
Yes, that’s exactly what I posted in the thread. Each method has its own pros and cons and is worth trying—you can actually set it up more flexibly than a physical barrier, and it’s quite effective. Personally, I incorporate Bacch’s frequency‑dependent regularization concept and implement acoustically transparent crosstalk cancellation in a DIY setup.
But lately, I’ve been more interested in the space itself than in XTC.

But in a binaural recording there is already a collection of BRIRs "applied": those of the dummy used are included in the recording (crosstalk included). When using IEMs, it is just that you are hearing through "other ears". And you cannot get rid of these ears; you can only place your own ears (BRIR) on top. I am astonished that this even works halfway, but so it does.
However, the result was not better than (though different from) some good acoustic stereo recordings. At least a part of those (I assume these are the ones with a dominant stereo pair and good placement) can sound very good and have a good spatial image.
Yes. Because crosstalk is already inherent in a binaural recording, the playback system must remove or attenuate that crosstalk to hear it correctly. When you say you can’t remove “these ears,” are you referring to the physiological (body’s) response? If you mean someone else’s—or a dummy head’s—response, binaural recordings are generally pre‑equalized using diffuse‑field (DF) compensation.
At close distances, mismatches in ITD and ILD become more pronounced, and we may find those discrepancies unappealing, hearing them as mere effects. With in‑ear monitors (IEMs), because they bypass the pinna, they tend to internalize the sound—as noted earlier in the thread—whereas speakers tend to externalize it well.
So, as I’ve said before, it’s simply a difference in recording method and format; I never meant to imply that one is inherently higher quality than the other.
Modern recordings are really well made these days, and I enjoy listening to them too.
So, if by the "quality" of a recording you mean including all those various binaural and personal compensation factors, then I can't really answer that. I'm not someone who creates or records audio sources myself. (Sadly. :rolleyes: However, I once joked while chatting with a Korean user that, in the end, when it comes to binaural/BRIR, we'll have no choice but to learn mixing ourselves and start producing music.)

There are various recording and playback formats—mono, stereo, multichannel, binaural, and so on—and regarding this, Dr. Floyd Toole of ASR often says, “Stereo is a medium lacking in spatiality.”
Below, I’ve included a few links for you to check out.











You can also check j_j's comment.
 
I binauralize Atmos 7.1.4 from Apple Music with Binauralizer Studio 2 in Reaper (discussed a bit here).
Starting from the Atmos master and binauralizing it would be better, but alas that is not feasible with streaming music.
Binauralizer Studio lets you upload SOFA files and switch between them quickly, so you can determine your subjective preference properly.
I ended up preferring the Neumann KU100 dummy head, and in fact I'm not the only one; it seems to be the statistical favorite.
I am already satisfied as is, so I did not want to resort to personal HRTF estimation (as is possible with some software).
I believe that the path of subjective preference, rather than analytical determination of the HRTF, is still the most valid, for three reasons:
- Room divergence factor
- Repeatability of HRTF measurement
- There is no absolute spatial reference for the audio track: it is mastered to sound credible on a wide variety of systems, from 2.0 to 9.1.6 to headphones, so there is intrinsic variability. Sometimes they aren't even mastered well, so...

On the BRIR side, I'm still digging deeper. For some reason APL Virtuoso gets the statistical preference, and I think this has to do with its integrated BRIR.

The binauralization engine shouldn't make any difference; after all, it is a convolver.
I know there are some binauralizers that calculate analytically... there was one in beta from some Korean developers that got good reviews, but then it seemed to be abandoned... I can't find it now.
 
I binauralize Atmos 7.1.4 from Apple Music with Binauralizer Studio 2 in Reaper.
Starting from the Atmos master and binauralizing it would be better, but alas that is not feasible with streaming music.
Binauralizer Studio lets you upload SOFA files and switch between them quickly, so you can determine your subjective preference properly.
I ended up preferring the Neumann KU100 dummy head, and in fact I'm not the only one; it seems to be the statistical favorite.
I am already satisfied as is, so I did not want to resort to personal HRTF estimation (as is possible with some software).
I believe that the path of subjective preference, rather than analytical determination of the HRTF, is still the most valid, for three reasons:
- Room divergence factor
- Repeatability of HRTF measurement
- There is no absolute spatial reference for the audio track

On the BRIR side, I'm still digging deeper. For some reason APL Virtuoso gets the statistical preference, and I think this has to do with its integrated BRIR.

The binauralization engine shouldn't make any difference; after all, it is a convolver.
I know there are some binauralizers that calculate analytically... there was one in beta from some Korean developers that got good reviews, but then it seemed to be abandoned... I can't find it now.
Welcome to this thread. I've often seen your username in other DSP and ART discussions as well. As noted at the start of the thread, I respect and listen attentively to every user's virtualization experiences, whether based on public or personalized data.
And I agree with you. Since most convolution engines perform well, the setup you use will depend on personal preference—what’s most comfortable for you is best. I think the KU100 is great and widely used for various binaural sources, including ASMR. However, when I first started working with BRIR and tried dummy heads like the KU100 or other people’s responses, I didn’t achieve the level of externalization (indistinguishable from reality) I was hoping for, so I continued with personalized measurements. There are a number of complex reasons for that. I clicked one of the software links and saw it’s Impulcifer—the one I use—and Korean users enjoy that too.
 
From head tracking to haptic bass, if you chase realism you end up strapping on gear like Iron Man (just kidding), and it risks undermining the original goal of listening with a light body and mind. That’s why I find myself wrestling with this.
Haha, I like the idea of an Iron Man armour for hi-fi. But for the whole-body bass fanatics, this (from pud.com) might be a more practical alternative:
[attached image from pud.com]

Actually, there are quite a few other approaches besides that.
Yes, I understood that. It just explained why the BRIR had so few reflections.
I also have IRs with virtually no reflections within 40 ms (essentially free‑field conditions), but when I’m running these tests I usually apply the technique to other people’s responses first, not just my own.
Interesting: 40 ms is a long time. Are these from big halls (with only late reflections) or with no reflections at all?
In my experience, a free-field HRIR does not work so well. Stereo is a compromise for reproduction in a (listening) room, isn't it?
Sure, tweaking a pristine, “perfect” IR is a matter of personal satisfaction, but a method that works reliably on IRs measured in less‑than‑ideal conditions? To me, that’s a genuinely useful approach.
And leaving only the direct sound is just the beginning—reflections mark the start of a whole new journey.
Yes, I fully agree.
The question for me is: what does a BRIR with reflections (see above) look like that is as neutral and restrained as possible, one that lets the spatial characteristics and cues from the recording present themselves as clearly as possible?
Personally, I incorporate Bacch’s frequency‑dependent regularization concept and implement acoustically transparent crosstalk cancellation in a DIY setup.
This is with speakers?
Yes. Because crosstalk is already inherent in a binaural recording, the playback system must remove or attenuate that crosstalk to hear it correctly. When you say you can’t remove “these ears,” are you referring to the physiological (body’s) response? If you mean someone else’s—or a dummy head’s—response, binaural recordings are generally pre‑equalized using diffuse‑field (DF) compensation.
What I mean is that all the characteristics of a BRIR (FR signature, ITD, and frequency-dependent ILD) are incorporated in the recorded signal of a binaural recording made with a dummy head. The diffuse-field compensation is a rather broad FR correction that (among other things) is a natural way to prevent the ear gain from piling up when listening either through speakers or through headphones (which recreate the ear gain - Harman curve). In other words, it is similar to the EQ I apply after convolving with my BRIR (from ear-canal microphones) to get a neutral-sounding signal to use with headphones/earphones, in the same way one would use the direct stereo signal.
My idea would not be to delete the crosstalk from the recording, as this is a natural part of binaural listening. The correction I think is necessary is to "correct" the HRTF from the dummy's to my own; not only in a broad sense, but taking the signatures of the pinnae (the dummy's and mine) into account. This is hardly possible, as the dummy's is normally not known. This correction would be a simple EQ without further crossfeed.
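Just to make the "simple EQ without further crossfeed" concrete, here is a minimal Python sketch under the (admittedly unrealistic, as noted above) assumption that both the dummy's and your own ear magnitude responses are known. All numbers are made up for illustration.

```python
# Sketch only: if both magnitude responses were known, the dummy-to-own-ear
# correction would be the per-frequency ratio own/dummy. Toy 4-bin numbers.
import numpy as np

def correction_eq(own_mag, dummy_mag, floor=1e-3):
    """Per-bin linear gain that re-maps the dummy's ear signature to your own."""
    return own_mag / np.maximum(dummy_mag, floor)   # floor avoids divide-by-zero

own = np.array([1.0, 1.2, 2.0, 0.8])      # hypothetical own-ear magnitudes
dummy = np.array([1.0, 1.0, 1.5, 1.0])    # hypothetical dummy-ear magnitudes
gain_db = 20 * np.log10(correction_eq(own, dummy))  # -> [0.0, 1.58, 2.50, -1.94] dB
```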

At close distances, mismatches in ITD and ILD become more pronounced, and we may find those discrepancies unappealing, hearing them as mere effects. With in‑ear monitors (IEMs), because they bypass the pinna, they tend to internalize the sound—as noted earlier in the thread—whereas speakers tend to externalize it well.
Well, speakers are external sound sources that will be perceived externally in a natural way (unless such trickery as Bacch is used). Somehow I do not get the point of this comparison.
So, as I’ve said before, it’s simply a difference in recording method and format; I never meant to imply that one is inherently higher quality than the other.
Modern recordings are really well made these days, and I enjoy listening to them too.
I agree about many "modern recordings" being quite good, but there are differences, and some are not good. And I would prefer a good Atmos mix over the corresponding stereo mix (both over virtualization) any time. And a good binaural recording - IF made with my own ears - WILL be better than a good stereo recording (again over virtualization) in my view (spatiality, auditory envelopment, realism...). I absolutely agree with Toole about stereo. But it will be different too, as the artistic and technical goals are different (not least listening over speakers, of course). That is a question of preference.
 
Welcome to this thread. I've often seen your username in other DSP and ART discussions as well. As noted at the start of the thread, I respect and listen attentively to every user's virtualization experiences, whether based on public or personalized data.
And I agree with you. Since most convolution engines perform well, the setup you use will depend on personal preference—what’s most comfortable for you is best. I think the KU100 is great and widely used for various binaural sources, including ASMR. However, when I first started working with BRIR and tried dummy heads like the KU100 or other people’s responses, I didn’t achieve the level of externalization (indistinguishable from reality) I was hoping for, so I continued with personalized measurements. There are a number of complex reasons for that. I clicked one of the software links and saw it’s Impulcifer—the one I use—and Korean users enjoy that too.
The KU100 integrates diffuse-field equalization, so perhaps the statistical preference has to do with that.
I think the round-robin test (aka Club Fritz) I linked is interesting because it highlights that HRTF measurement uncertainty is well above the limits of perceptual neutrality; therefore, many studies that include HRTF measurements must be read with the same uncertainty in mind.
We cannot say that we have an absolutely representative KU100 HRTF set, for example.
The Room Divergence Effect is, in my opinion, also not negligible, because it plays a more relevant role in binauralization than expected.
Although this whole topic is ultimately about listening pleasure, the technical side has a certain charm for me; perhaps it is the part of immersive audio that I am most passionate about and like the most.
 