
Binaural virtualization users' thread.

[Attached image: 1747295362515.png]


I’d had my eye on the ButtKicker Gamer2 since last year, and I was lucky enough to pick one up used at a great price.

[Attached image: 1747295423614.png]


Since my legs are short and I use my chair at a very low height, there was no clearance underneath, so I installed it on the backrest instead. And I was quite pleased when listening through the BRIR: when the genuine cues, the spurious cues, and the information synthesized from the impulse all combine with that tactile sensation, everything really feels boosted. I still need to tweak the settings a bit more, but I’m very satisfied.

[Attached image: 1747295573671.png]


I read the “Tactile Response” thread on AVSForum and played tones directly in REW, applying EQ as shown in the image. It’s much better. It reproduces (and you can feel) down to 10 Hz very well, and even 8 Hz is possible, but below that it doesn’t work. It’s probably because the Gamer2 is an entry-level model.
Another user in Korea recommended the LFE model, but after seeing videos of couches shaking, I thought that would be overkill for me, so I chose the Gamer2. (In practice, it provides more than enough sensation for me.)
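For anyone who wants to reproduce this kind of low-frequency check without REW, here is a minimal Python sketch of a tone generator (the function name, fade length, and file name are my own choices, not anything from REW): it writes a 10 Hz sine to a WAV file that you can route to the transducer's amp.

```python
import wave

import numpy as np


def make_tone(freq_hz, seconds=5.0, rate=48000, amp=0.5):
    """Sine test tone with short fade-in/out to avoid clicks and thumps."""
    t = np.arange(int(seconds * rate)) / rate
    x = amp * np.sin(2 * np.pi * freq_hz * t)
    fade = min(int(0.1 * rate), len(x) // 2)   # 100 ms ramps
    ramp = np.linspace(0.0, 1.0, fade)
    x[:fade] *= ramp
    x[-fade:] *= ramp[::-1]
    return x


# 10 Hz: around the practical lower limit reported above for the Gamer2
tone = make_tone(10.0)
with wave.open("tone_10hz.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)               # 16-bit PCM
    f.setframerate(48000)
    f.writeframes((tone * 32767).astype(np.int16).tobytes())
```

Start at a low amplitude and work up; sustained single tones can overheat a tactile transducer at high drive levels.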

I’m not sure if it’s an issue with my setup, but I noticed about an 80 ms delay, so I adjusted the delay separately to align it with my BRIR. (I don’t know why—when I asked other users in Korea, most didn’t have this issue. Even with nothing applied—just the IEM and the kicker—the delay is definitely noticeable.)
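For reference, aligning the two paths amounts to padding the earlier one with silence. A minimal sketch, assuming a hypothetical 80 ms lag and a 48 kHz sample rate (a real setup would apply this per block in the audio callback, or via the player's delay setting):

```python
import numpy as np

RATE = 48000
MEASURED_LAG_MS = 80.0  # hypothetical: tactile path arrives ~80 ms late


def delay_samples(x, ms, rate=RATE):
    """Delay a signal by prepending `ms` milliseconds of silence."""
    n = int(round(ms * rate / 1000.0))
    return np.concatenate([np.zeros(n, dtype=x.dtype), x])


# Stand-in for the rendered BRIR output; delay it so it lines up
# with the late-arriving transducer signal.
brir_out = np.ones(10)
aligned = delay_samples(brir_out, MEASURED_LAG_MS)
```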

Anyway, this is really enjoyable. I don’t usually like just listening to songs—I prefer processing impulses, synthesizing, and creating sounds—but it’s been a long time since I’ve had this much fun listening to music.
 
I also recommend checking out the ASH dataset for Equalizer APO

ASH Toolset is also worth a try, providing a bunch of options to configure the sound to your liking.

The BRIRs included in HeSuVi weren't as convincing to me, but that varies from person to person, of course.
I'm also using my own HRIR, measured with Impulcifer plus in-ear mics, but without seeing the speakers in front of me, the sound often flips backward.
It's a physiological phenomenon: visual cues are MUCH more involved in spatial hearing than many assume, and we can get quite confused if the HRIR doesn't match the room/conditions we actually use for listening.
I've tried a lot of different HRIRs, including my own, Creative SXFI, PS5 Tempest 3D and so on, and would never go back to headphone listening without any binauralisation applied.
For mobile listening I'm mostly using a rooted phone with Viper4Android, and Equalizer APO on the PC at home, also for watching 4K UHDs.
 
I also recommend checking out the ASH dataset for Equalizer APO

ASH Toolset is also worth a try, providing a bunch of options to configure the sound to your liking.

The BRIRs included in HeSuVi weren't as convincing to me, but that varies from person to person, of course.
I'm also using my own HRIR, measured with Impulcifer plus in-ear mics, but without seeing the speakers in front of me, the sound often flips backward.
It's a physiological phenomenon: visual cues are MUCH more involved in spatial hearing than many assume, and we can get quite confused if the HRIR doesn't match the room/conditions we actually use for listening.
I've tried a lot of different HRIRs, including my own, Creative SXFI, PS5 Tempest 3D and so on, and would never go back to headphone listening without any binauralisation applied.
For mobile listening I'm mostly using a rooted phone with Viper4Android, and Equalizer APO on the PC at home, also for watching 4K UHDs.
Thanks for sharing your experience. When you say the sound flips backward, do you mean it seems to come from behind? Of course, in an anechoic chamber such confusion can happen depending on the test signal, but it varies by case. The importance of visual cues is clear, but it looks like externalization is lacking due to an HPCF mismatch. Even if you measured your speaker + room and your headphones with in-ear mics, you’ll still need to apply additional EQ on the headphone side.
So when the EQ is fully applied and compensated correctly, you won’t experience front-back confusion even without visual cues. (To be precise, it’s not true front-back confusion but a kind of disorientation caused by the interaction between under-externalization and perception of early reflections.)
I suggest a simple test for users who encounter this: put on headphones, play the BRIR, and shake your head vigorously from side to side. Even without head-tracking, you’ll immediately notice the residue of an improperly compensated HPCF—especially around 6,000–12,000 Hz—clinging to your ears or the headphones.
In other words, when it’s properly compensated, you feel as though you’re wearing nothing; there’s no sensation of sound being stuck on your ears or headphones.
 
but without seeing the speakers in front of me, the sound often flips backward.
It's a physiological phenomenon: visual cues are MUCH more involved in spatial hearing than many assume
Some even recommend putting two pieces of paper on the wall, with circles drawn on them, to visualize the left and right speakers.

But I question this since numerous scientific listening tests with speakers in rooms have been conducted in which speakers in front were made invisible by means of acoustically transparent curtains in order to hide any clues about the location or type of speaker. For example, Harman did extensive blind testing of different speakers like that. To my knowledge, none of the subjects reported that they heard the speakers behind because they couldn’t see them.
There are exceptions, though, but I attribute these to so-called poor localizers, i.e., people who have difficulty localizing real sound sources in some directions.

There is an even stronger argument against it: an important job of the human auditory system is threat detection, and evolution ensures that those people who hear the tiger behind them when it actually approaches from the front reproduce less…
 
Some even recommend putting two pieces of paper on the wall, with circles drawn on them, to visualize the left and right speakers.

But I question this since numerous scientific listening tests with speakers in rooms have been conducted in which speakers in front were made invisible by means of acoustically transparent curtains in order to hide any clues about the location or type of speaker. For example, Harman did extensive blind testing of different speakers like that. To my knowledge, none of the subjects reported that they heard the speakers behind because they couldn’t see them.
There are exceptions, though, but I attribute these to so-called poor localizers, i.e., people who have difficulty localizing real sound sources in some directions.

There is an even stronger argument against it: an important job of the human auditory system is threat detection, and evolution ensures that those people who hear the tiger behind them when it actually approaches from the front reproduce less…
You make a valid point.
As I mentioned above, people who experience this in BRIR are mostly doing so because proper externalization hasn’t occurred—due to issues with the headphone compensation curve—and what they’re actually perceiving is a combination of the amplitude response and ITD/ILD cues, the early reflections, and the brain’s imagination trying to interpret them.

To put it simply, in a diagram:

[Attached image: 1747381392902.png]


When the HPCF isn’t applied correctly, when it’s not your own HRTF, or for any number of other reasons externalization fails to occur, it ends up being internalized like this. At that point your brain does its best to fill in the gaps through imagination. Of course, since binaural cues are still present, you can perceive a certain degree of spatial localization.

[Attached image: 1747381477313.png]


But if everything is correctly aligned and calibrated, sounds meant to come from the front will indeed be heard as coming from the front—there’s no reason for them to remain “inside your head.”

[Attached image: 1747381526502.png]


This applies equally to the rear as it does to the front.

So while visual cues do influence perceived factors like auditory source width (ASW), I don’t believe they’re related to front–back confusion.
Of course, they can help—after all, they provide the brain with extra information to judge by (the brain was already compensating for mismatched cues through its imagination).

So, although it can vary case by case, from helping many users in Korea I’ve found that about 99% of the problems were due to the HPCF—and once that was fixed, proper externalization occurred. (Sometimes people get confused and differentiate BRIR/HRIR from loudspeakers. If virtualization is done correctly, they’re essentially the same. And for externalization, there’s no need for a separate criterion—it should sound just like speakers in a real space.)
Another important reason is that, when sound is bouncing around inside your head and your brain is trying to project it outward through imagination, the sense of space, width, and distance can also become somewhat inaccurate—often perceived as slightly compressed.

It might look like simple inverse compensation, but the point is that HRIR/BRIR is applied by first erasing the response you get when wearing headphones/IEMs and then layering the spatial filters on top of that. In other words, if it isn’t correctly compensated, traces of the headphone’s response remain—effectively “sticking” the sound to your ears or head, which conflicts with the magnitude response and binaural cues of the HRIR/BRIR. The result, as in the diagram I shared, is that the sound ends up perceived inside your head, and the brain then tries to “fill in” the missing external space through the initial reflections.
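The signal chain described here (first erase the worn headphone response, then layer the spatial filters on top) can be sketched as a pair of convolutions. The impulse responses below are toy placeholders, purely for illustration, not real measurements:

```python
import numpy as np


def render_binaural(x, brir_l, brir_r, hpcf_ir):
    """Mono in, binaural out: convolve with each ear's BRIR, then with the
    headphone-compensation impulse response (the inverse of the worn
    headphone response). Convolution commutes, so the order does not
    matter numerically; conceptually the HPCF 'erases' the headphone
    before the spatial filters are layered on."""
    left = np.convolve(np.convolve(x, brir_l), hpcf_ir)
    right = np.convolve(np.convolve(x, brir_r), hpcf_ir)
    return left, right


x = np.zeros(8)
x[0] = 1.0                      # unit impulse as test input
brir_l = np.array([1.0, 0.5])   # toy left-ear BRIR
brir_r = np.array([0.8, 0.6])   # toy right-ear BRIR
hpcf = np.array([1.0])          # identity HPCF: headphones already "disappear"
L, R = render_binaural(x, brir_l, brir_r, hpcf)
```

With a wrong HPCF, the leftover headphone response is convolved into both ears identically, which is exactly the "stuck to the ears" coloration described above.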
 
Btw, I cleared yesterday’s EQ and, as I replayed it today, I adjusted the Buttkicker settings.

[Attached image: 1747395798019.png]


At 10 Hz, I set the amp’s maximum volume to my preferred level, then tweaked the tone as I listened (and felt it with my body) to dial in a flat response.

[Attached image: 1747395881773.png]

And this is the filter configuration that resulted. Even without applying EQ, it’s interesting—since BRIR alone lacks this physical sensation—but it was somewhat unbalanced, with certain frequencies so overemphasized that it felt like a bouncy ball bouncing around (an unpleasant, dull, and unnatural vibration).
And after EQ adjustment, it sounds incredibly natural. I tested it on everything from the movie Top Gun to regular music, and it delivers that high-SPL subwoofer feel with a balanced blend—smooth yet powerful and perfectly natural within the BRIR.
 
So, although it can vary case by case, from helping many users in Korea I’ve found that about 99% of the problems were due to the HPCF—and once that was fixed, proper externalization occurred. (Sometimes people get confused and differentiate BRIR/HRIR from loudspeakers. If virtualization is done correctly, they’re essentially the same. And for externalization, there’s no need for a separate criterion—it should sound just like speakers in a real space.)
Another important reason is that, when sound is bouncing around inside your head and your brain is trying to project it outward through imagination, the sense of space, width, and distance can also become somewhat inaccurate—often perceived as slightly compressed.

It might look like simple inverse compensation, but the point is that HRIR/BRIR is applied by first erasing the response you get when wearing headphones/IEMs and then layering the spatial filters on top of that. In other words, if it isn’t correctly compensated, traces of the headphone’s response remain—effectively “sticking” the sound to your ears or head, which conflicts with the magnitude response and binaural cues of the HRIR/BRIR. The result, as in the diagram I shared, is that the sound ends up perceived inside your head, and the brain then tries to “fill in” the missing external space through the initial reflections.
May I ask how do you fix the HPCF? Is it by measuring the pinnae + headphone response via in-ear mics? Furthermore, is the "Diffuse Field" target curve good for BRIR simulation in your experience?
I find that when I try to correct my headphones (Samson SR850) to match the DF curve, it removes a tad more treble than I'd like (and this headphone is already bright out of the box), so I instead use a high-shelf filter at 6 kHz and above to dial in the treble level by listening to actual music.
 
May I ask how do you fix the HPCF? Is it by measuring the pinnae + headphone response via in-ear mics? Furthermore, is the "Diffuse Field" target curve good for BRIR simulation in your experience?
I find that when I try to correct my headphones (Samson SR850) to match the DF curve, it removes a tad more treble than I'd like (and this headphone is already bright out of the box), so I instead use a high-shelf filter at 6 kHz and above to dial in the treble level by listening to actual music.
Even after measuring with in-ear mics, you still need to listen and tweak the EQ by ear. There are many approaches, but I prefer to trust my ears when adjusting EQ—which is why I stopped using in-ear mics altogether, even when I get new IEMs or headphones.
With IEMs, the measurements are reasonably close (to some extent—of course, not perfectly), but headphones are another story: they really vary from person to person.
I may have misunderstood what you meant. Are you trying to match your binaural room response to a diffuse-field (DF) target curve, or are you matching the headphone response you’re using to a DF target? If it’s the latter, you don’t need to worry about DF or any other specific target curves. For BRIR work, your headphones and IEMs should effectively “disappear”—their response must be fully equalized.
Listening and adjusting as you go is great, but don’t be afraid to use more aggressive EQ. Try peaking filters, too. And use whatever test signal feels most comfortable—simple music, noise, sweeps—whatever works best for you.
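Peaking filters of the kind suggested here are usually built from the well-known RBJ audio-EQ-cookbook formulas. A minimal sketch (the sample rate and the example frequency/gain/Q are arbitrary choices of mine):

```python
import numpy as np


def peaking_biquad(f0, gain_db, q, fs=48000.0):
    """Peaking-EQ biquad coefficients (b, a) per the RBJ audio-EQ cookbook."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]   # normalize so a[0] == 1


# e.g. cut 6 dB at 8 kHz with Q = 2; apply with scipy.signal.lfilter(b, a, x)
# or enter the same parameters into any parametric EQ
b, a = peaking_biquad(8000.0, -6.0, 2.0)
```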
 
Are you trying to match your binaural room response to a diffuse-field (DF) target curve, or are you matching the headphone response you’re using to a DF target? If it’s the latter, you don’t need to worry about DF or any other specific target curves.
Yes, I use the DF target curve to correct the headphone response.
For BRIR work, your headphones and IEMs should effectively “disappear”—their response must be fully equalized.
Can you tell me more about how you would personally equalize to make the headphones "disappear"?
 
Yes, I use the DF target curve to correct the headphone response.
Let me give a simple example with a diagram.

[Attached image: 1747699445178.png]

Let’s assume that when you wear a certain pair of headphones, you get the green response curve. Simply inverting that curve gives you the blue one. In doing so, you’ve effectively equalized the headphone’s own response, right?
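The inversion described above can be sketched numerically. The curve values here are made up for illustration, and the boost cap is a common precaution of mine (a narrow null in a measurement would otherwise demand an enormous, risky gain):

```python
import numpy as np


def inverse_eq_db(measured_db, max_boost_db=12.0):
    """Invert a measured magnitude response (dB, referenced to its own
    mean level) to get the compensating EQ curve. The boost is capped so
    narrow measurement nulls don't demand huge gains."""
    rel = np.asarray(measured_db) - np.mean(measured_db)
    return np.minimum(-rel, max_boost_db)


# Hypothetical "green" measured curve at a few frequencies (dB)
green = np.array([3.0, -2.0, 5.0, -20.0, 0.0])
blue = inverse_eq_db(green)   # the inverted "blue" compensation curve
```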

[Attached image: 1747699576038.png]

And the equalized response must be in place before you can hear your binaural room response.
Let’s walk through another simple example. This time, imagine the response curve is completely bizarre.

[Attached image: 1747699696545.png]


This response curve might look like a bizarre, jellyfish-like shape, but it doesn’t matter what the target is—our only goal is to equalize that response.
So, would the headphone responses in the two examples now be the same?
Yes—ideally they’d be virtually identical. If you’ve equalized them properly, the worn IEM/headphone response is flattened, so the curves match almost exactly (aside from any device quirks or leaks from an imperfect fit). At that point, they’re just acting as playback devices for your binaural room impulses.

Can you tell me more about how you would personally equalize to make the headphones "disappear"?
There are lots of different approaches. Some people prefer David Griesinger’s two-tone method, though I personally found it awkward.
What I usually do is just tweak by ear: I listen to the inverted headphone (or IEM) response without any BRIR to establish a basic EQ curve. Then, with the BRIR applied, I fine-tune while listening to music, pink noise, and sine sweeps. Or, when I’m feeling lazy, I just throw on the BRIR plus the partially corrected headphone response and listen to music right away. I know how things should sound to me and what a flat response feels like—my brain and body both recognize it as the perfect reference—so I can immediately hear anything that’s off. That’s why I sometimes skip all the complex steps and just EQ on the spot.
Of course, if you record the headphone response with an in-ear mic, most of it shows up in your HPCF—but some elements get left out (like certain eardrum reflections or your personal hearing characteristics), and that also depends on how you make the measurements.
Or you could extract the consistent differences between standard measurement rigs (GRAS, 5128, etc.) and your own actual HRTF, then use that as a sort of calibration curve. I’ve tried various methods for others, and since the results were almost always the same, I personally prefer to just listen and tweak by ear. It’s best to find the method that feels most comfortable to you.
(And every so often, some users worry that applying too much EQ during the BRIR process will introduce phase distortion. But the fact is, if you haven’t applied a proper, individualized HPCF, you’re already listening to a distorted response. No matter how dramatic your EQ moves are (–10 dB, +10 dB, etc.), if the result matches what you actually hear in reality, you’re not adding distortion—you’re removing it.)

I’m doing my best to understand your situation, but it’s hard to get a clear picture from such a brief message. If you could share details about your room response and headphone response, what software and hardware you’re using, how you measured everything, and so on, we’d be much better able to grasp your concerns.

+++
Also, writing this has made me curious about your situation—seeing that you’re calibrating your headphone response to a DF target.
1. Are you using the binaural room response you recorded at your ears together with your headphone response to perform the virtualization?
2. Or are you trying to take a response that was made for another user or a dummy head (even though the headphones are the same model) and adjust it to better match what you hear?
3. Or are you simply listening to a standard binaural recording or video through your headphones, rather than using a personalized virtualization captured from your own ears?
 
On the BRIR side, I'm still digging deeper instead. For some reason APL Virtuoso comes out on top in statistical preference, and I think this has to do with its integrated BRIR.
I want to thank you for this recommendation big time!
I checked the study about the virtualization methods and preferences and gave APL Virtuoso a try.
It is so much better than dearVR that I now understand why the dearVR plugins have been abandoned.
Not only are the provided rooms excellent, there are powerful tweaking options too.
And then there is the possibility to load different third-party HRTFs in the standard .sofa format. It is a dream.

I went with your "statistical tip" and used a KU100 (Cologne version) first, but that is the point about statistics: they do not mean anything for a single situation. It did not cut it for me: the sound was all smeared, fuzzy, and dull.
Then I switched through the provided HRTFs, and these were immediately better. I guess I got really lucky: HRTF E is quite good for me. I would say I am at least 50% of the way from "generic" to "fully personal".
Great sound!
In the meantime I got a head tracker (Waves NX) too, and with all this I have come quite a bit closer to my goal of binaural nirvana:
excellent room + reasonable fit of HRTF + Atmos + head tracking.
My next project will be to measure my personal HRTF as well as possible.
Thanks again.
The point relates to actual measurement capability. It seems we cannot get measurements with uncertainty low enough to avoid perceptual alteration, so working with absolute HRTFs is not entirely reliable, and it is necessary to resort to statistics.
This is why I say that focusing too much on HRTF-related aspects is not worth much. As long as we are in the entertainment field, a better approach is one that includes verification of subjective preference.
I read this several times and had it translated too, but I do not understand.
What kind of statistics are you talking about?
The fact that there are measurement errors and deviations does not mean that HRTFs are not useful. But their quality lies in the signature/structure, not in a one-size-fits-all result. Finding a fitting HRTF will always be a personal solution; statistics will only get you so far. (But maybe I completely missed what you were saying.)
 
I guess I got really lucky and HRTF E is quite good for me.
Just curious, do you have the phantom center in front of you at eye level?
Are the virtual speakers externalized like real ones, i.e., do they seem to be 2–3 meters away?
 
Just curious, do you have the phantom center in front of you at eye level?
Are the virtual speakers externalized like real ones, i.e., do they seem to be 2–3 meters away?
Yes, for me there is externalisation in front of me. That even works with the KU100 HRTF, just that the (phantom) sources are cloudy and smeared; switching to HRTF E makes them much clearer and more defined, in particular in size and "contour". It is much more convincing (and better sound).
The distance perception depends to some degree on the recording. With mixed recordings there can be this eerie feeling that a voice recorded with the mic centimetres away and electronically "enhanced" is "somehow" localised about 2 m in front. That is not convincing in some cases. Sometimes it results in the perception of a ballooned source; sometimes all I hear is a sound cloud without body or physicality.
And the distance of (phantom) sources can be adjusted with tone adjustments: if I tilt down the frequency response, the distance of an orchestra or ensemble grows.
(Just as it happens with speakers.)
I would say that in good recordings one does not hear the speakers as such too obviously. But if I play a signal through a single virtual speaker, then yes, it gives the impression of a speaker at a distance. Sitting in the kitchen, though, it is obvious these virtual speakers are not "here".
As a general trend I would say that recordings with close-up miking produce sources about 1.5–3 m away in front of me.
All this is with the APL room and rather "dry" parameters (ambience 25% and RT 0.15 s).
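The "tilt down the FR" trick for pushing sources farther away corresponds to a broadband spectral tilt. A minimal sketch of such a gain curve (the 1 kHz pivot and the slope value are arbitrary choices of mine):

```python
import numpy as np


def tilt_db(freqs_hz, slope_db_per_octave, pivot_hz=1000.0):
    """Gain (dB) of a broadband spectral tilt, 0 dB at the pivot frequency."""
    return slope_db_per_octave * np.log2(np.asarray(freqs_hz) / pivot_hz)


f = np.array([125.0, 1000.0, 8000.0])
g = tilt_db(f, -1.5)   # tilt the treble down 1.5 dB per octave
# -> +4.5 dB at 125 Hz, 0 dB at 1 kHz, -4.5 dB at 8 kHz
```

In practice the same effect can be approximated with a pair of gentle shelving filters in any parametric EQ.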
 
Yes, for me there is externalisation in front of me.
You must be very lucky that the selected HRTF fits your own so well, then. I tried Virtuoso and my experience didn’t even come close to that, no matter which HRTF I tried. Everything appeared far too elevated and just maybe half a meter away.
One more question: is your experience based on the use with head-tracker?
If yes, does it work so well without head-tracker, too?
 
In my experience head tracking is more important than HRTF matching. Changing the response in reaction to small head movements is more convincing than the exact HRTF match.
 
# After writing this, I realized that I’d somewhat broken the conversational style I’d set for the thread (using “How?” instead of “Why?”), so I’m correcting it. I apologize.
I’ll keep my thoughts brief. When the HRTF doesn’t match, the brain tries to reconcile the conflicting cues as convincingly as possible. In this context, head-tracking can help enhance each person’s illusion.
But personally, it feels like asking, “Which came first, the chicken or the egg?” As for drawing further conclusions from a state where the cues and conditions don’t align, my reaction is just, “Hmm…”
 
In my experience head tracking is more important than HRTF matching.
Just curious, have you experienced binaural rendering with proper HRTF personalization such as Smyth Realizer or Impulcifer already?
 
Not those, but I've tried a bunch of less accurate personalized HRTF solutions, and I'm currently working on my Mesh2HRTF project.
Which is more important, head movement or proper HRTF matching? Just flip the yaw value to negative on a head tracker, and suddenly the whole stage moves to the back, regardless of non-matching HRTF cues.
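The yaw-flip experiment is easy to model: a renderer selects HRTF cues by the source's azimuth relative to the head, so negating the tracker's yaw mirrors the whole stage. A toy sketch (the sign convention, positive azimuth to the listener's left, is an assumption of mine):

```python
def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a virtual source relative to the listener's head,
    wrapped to [-180, 180). Convention: positive = to the left."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# Source straight ahead; listener turns the head 30 degrees to the left.
correct = relative_azimuth(0.0, +30.0)   # source shifts to the right (-30)
flipped = relative_azimuth(0.0, -30.0)   # sign-flipped tracker: shifts left (+30)
```

With the flipped sign, every head movement produces the cue change expected of a source behind the head, which is why the stage folds to the back even when the static HRTF cues say "front".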
 
Not those, but tried a bunch of less accurate HRTF personalized solutions
In this case I agree with Lion and recommend that you try those solutions before making a decision. I’m very sure that you will be surprised how well that works without head-tracking.

just flip the yaw value to negative on a head tracker, and suddenly the whole stage moves to the back
Correct, and this proves that head-tracking overrules HRTF and is in fact used to compensate for a lack of HRTF personalization. But it comes at the price that
- you need to move your head permanently,
- distance perception is flawed and in turn
- the feeling of being immersed in sound is limited.

If head-tracking were as important as you think, why can the human auditory system accurately locate sound sources in all directions with the head held fixed?
I know one of Europe’s famous immersive audio sound engineers and he uses the Smyth Realizer for on-site recording sessions. He told me that he always uses it without the head-tracker that comes with it because he doesn’t see the need for it.
 
Not those, but I've tried a bunch of less accurate personalized HRTF solutions, and I'm currently working on my Mesh2HRTF project.
Which is more important, head movement or proper HRTF matching? Just flip the yaw value to negative on a head tracker, and suddenly the whole stage moves to the back, regardless of non-matching HRTF cues.
What do you mean by "less accurate personalized HRTF solutions"?
What is the criterion for "accurate" here?

Let me give one example.
Imagine doing a rough recording with the microphone placed shallowly—like wearing AirPods or EarPods—instead of inserting it deeply into the ear canal (or barely inserting it at all). In this case, the speaker + room and the headphone playback share the same overarching curve characteristics. Yes, of course this is inherently an unstable measurement setup: several elements—such as certain eardrum resonances and ear-canal responses—are unstable or missing. Yet, even so, you still capture the pinna’s response and each individual’s ILD and ITD.
These cues are like an individual’s fingerprint.

Suppose we have HRTFs obtained in various ways—those measured with such an unstable setup, those measured at a typical level, those measured with extreme precision, or those estimated by different methods. What would happen if the headphone (or IEM) compensation curve (HPCF) we use were correctly applied to each of these? Ideally, regardless of the measurement method, they would sound nearly identical. But conversely, even though the HRTF is truly based on my own anatomy, if the HPCF is incorrect, then no matter how precise the measurement was, it simply won’t function properly.

As I mentioned a few days ago, let’s now assume this isn’t your own HRTF. HRTF and HPCF aren’t separate—they’re intertwined, with response tendencies and patterns as unique as fingerprints. But if it’s not your own HRTF, it’s already mismatched. A slight error in ITD or ILD we can adapt to, and it usually isn’t a big problem—after all, in real life turning your head causes many changes. Yet an offset is one thing; having a completely different pattern is another.
In that case, our brain, faced with this pattern misalignment, tries to fill in the gaps—using inherent early reflections and other cues—to arrive at the best possible percept. And if the audio source (say, in a game or a movie) is dynamic, the brain is even more easily fooled. Add head tracking on top of that, and the illusion strengthens. (In games, most players will immediately return fire when a grenade goes off beside them or a shot sounds about 45° in front, rather than pausing to evaluate the accuracy of the spatial cues—though personally, I’m not a gamer.)
But conversely, with a static sound source, we’re not so easily deceived, because we continuously sense the mismatch in the pattern.

So it will vary depending on the situation and clearly has an effect. But if you get off to a better start, you gain an advantage not by making a better illusion of an illusion, but by adding realism to something that is already real. (Head-tracking “assists” externalization in BRIR—it doesn’t mean that without head-tracking there is no externalization at all.)

So let’s return to the beginning of the discussion: is head-tracking more important than HRTF?
-> I’d summarize my view like this: they’re both important. I’d gently suggest that pitting one against the other and choosing only one may be an inherently flawed comparison.


Correct, and this proves that head-tracking overrules HRTF and is in fact used to compensate for a lack of HRTF personalization. But it comes at the price that
- you need to move your head permanently,
- distance perception is flawed and in turn
- the feeling of being immersed in sound is limited.

If head-tracking were as important as you think, why can the human auditory system accurately locate sound sources in all directions with the head held fixed?
I know one of Europe’s famous immersive audio sound engineers and he uses the Smyth Realizer for on-site recording sessions. He told me that he always uses it without the head-tracker that comes with it because he doesn’t see the need for it.
I agree.
 