Applying an HRTF on top of another pseudo-HRTF created by the head-worn device.
No... There is a compensation process. It seems like you might be misunderstanding it...
I'm not sure if this is the right place to discuss this in a crossfeed thread, but since it was mentioned, I'll write about it.
Such virtualization first equalizes the headphone/IEM's inherent response curve (based on raw measurements). On top of that, a specific target response is convolved; nothing more.
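As a rough illustration of that two-step chain (headphone equalization followed by convolution with a target response), here is a minimal numpy sketch. The headphone impulse response, the target BRIR, and the input signal are all synthetic placeholders, and the regularized frequency-domain inversion is just one simple way to build the compensation filter, not any particular product's method:

```python
import numpy as np

fs = 48000
n = 1024

rng = np.random.default_rng(0)
# Placeholder measurements: a decaying-noise headphone IR and a longer target BRIR.
headphone_ir = rng.standard_normal(n) * np.exp(-np.arange(n) / 50.0)
target_brir = rng.standard_normal(n) * np.exp(-np.arange(n) / 400.0)
dry = rng.standard_normal(fs)  # 1 second of input audio (placeholder)

# Step 1: regularized inverse of the headphone response in the frequency domain.
H = np.fft.rfft(headphone_ir, 2 * n)
eps = 1e-3 * np.max(np.abs(H)) ** 2          # regularization avoids dividing by ~0
inverse_eq = np.fft.irfft(np.conj(H) / (np.abs(H) ** 2 + eps))

# Step 2: chain — input -> headphone compensation -> target response convolution.
compensated = np.convolve(dry, inverse_eq)
virtualized = np.convolve(compensated, target_brir)
```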
That equalization needs to be accurate, and even for the same headphones, it differs from person to person...
Additionally, it seems like you are using the terms HRTF, HRIR, and BRIR interchangeably.
To put it very simply, HRTF is the frequency representation of how you (or someone else) hear sound.
Even if you have an HRTF, without reflections you cannot perceive distance or spatiality (an HRTF changes with distance, but it represents only the direct sound from a specific angle; its recorded time-domain impulse is the HRIR).
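To make the terminology concrete, here is a small sketch of the HRIR/HRTF relationship: the HRTF is simply the frequency-domain view of the measured time-domain impulse (the HRIR). The impulse values below are invented purely for illustration:

```python
import numpy as np

fs = 48000
hrir = np.zeros(256)       # time-domain impulse response (HRIR), placeholder
hrir[3] = 1.0              # direct-sound arrival only — no reflections included
hrir[10] = -0.4            # a pinna-related feature, purely illustrative

# The HRTF is the frequency representation of that same impulse.
hrtf = np.fft.rfft(hrir)                                  # complex response
magnitude_db = 20 * np.log10(np.maximum(np.abs(hrtf), 1e-12))
freqs = np.fft.rfftfreq(len(hrir), 1 / fs)                # bin frequencies in Hz
```

The inverse FFT of the HRTF recovers the HRIR exactly; they are two views of the same data.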
I’m not sure to what extent the reverb you’re referring to applies, but it seems that you and I are not talking about the same thing.
In reality, we are always listening to reflections, whether in your room or in the natural forest.
The strength of the reflections, the ratio between direct sound and reflections, the time intervals and density, the overall structure, and the reverberation time all influence auditory and spatial impressions. Therefore, it cannot be simply stated that reverb is unnecessary.
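As a concrete example of one of those cues, the ratio between direct sound and reflections (the direct-to-reverberant ratio, DRR) can be estimated from an impulse response. This is only a rough sketch: the IR is synthetic, and the 2.5 ms split between "direct" and "reverberant" parts is an arbitrary choice for illustration:

```python
import numpy as np

fs = 48000
t = np.arange(int(0.5 * fs))
rng = np.random.default_rng(1)
ir = rng.standard_normal(t.size) * np.exp(-t / (0.15 * fs))  # decaying tail
ir[0] = 5.0  # strong direct sound at t = 0

# Direct part: first ~2.5 ms; everything after counts as reverberant energy.
split = int(0.0025 * fs)
direct_energy = np.sum(ir[:split] ** 2)
reverb_energy = np.sum(ir[split:] ** 2)
drr_db = 10 * np.log10(direct_energy / reverb_energy)
```

A higher DRR tends to read as "closer/drier"; a lower DRR as "farther/more reverberant", which is part of why reflections cannot simply be dismissed.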
Of course, there are differences between personalized HRTFs and generalized HRTFs.
Each person has a different way of hearing, different ears, different physical characteristics, different levels of hearing ability, and even different brain compensation data for interpreting sounds.
However, while these differences can be significant, people also adapt quite well to them.
For example, most modern FPS games incorporate HRTF.
If you doubt its accuracy, it’s natural to question it. But is it so inconsistent that it prevents you from functioning within the game environment? No, it’s not. Most people adapt to such information.
Of course, if you examine it closely, generalized data and personalized, directly measured data are naturally different.
I’ve observed this while calibrating many people's BRIR data, and it indeed varies significantly depending on their physical characteristics.
When I listen to their BRIRs, sometimes I hear sounds as if they are coming from above my head.
And even if I forcibly EQ someone else's BRIR (which includes all HRTFs across the time domain, from direct sound to reflections, as you mentioned), it still doesn’t match what I hear.
While small ITD errors are usually tolerable, ILD errors prevent accurate localization and sometimes even result in spectral distortion. And because our brain is very adept at identifying such subtle issues, it eventually realizes that the sound doesn’t match how we naturally hear in reality, reducing it to a mere sound effect.
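To make the ITD/ILD distinction concrete, here is a sketch of how both cues might be estimated from a binaural pair; the left/right impulse responses are synthetic placeholders (one ear hears the sound earlier and louder):

```python
import numpy as np

fs = 48000
n = 512
left = np.zeros(n); left[10] = 1.0    # near ear: earlier and louder
right = np.zeros(n); right[34] = 0.5  # far ear: 24 samples (~0.5 ms) later, quieter

# ITD: lag of the cross-correlation peak between the two ears.
xcorr = np.correlate(left, right, mode="full")
lag = np.argmax(xcorr) - (n - 1)      # negative lag here means the left ear leads
itd_ms = 1000 * lag / fs

# ILD: broadband level difference between the ears, in dB.
ild_db = 10 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
```

With these placeholder signals the left ear leads by 0.5 ms and is about 6 dB louder; it is the ILD (and the spectral detail riding on it) that the brain is especially unforgiving about.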
However, the important point is not whether it is "wrong," but rather recognizing it as an attempt to apply a common, averaged approach to how we hear.
Therefore, while achieving reproduction that matches how we naturally hear in reality is an excellent goal, it cannot be accomplished with a simple crossfeed.
For it to sound like reality, all of your HRTFs need to be incorporated, from the direct sound through the early reflections to the late reflections, across all angles. Only then can you experience sound as you do in reality.
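The idea above can be sketched as follows: each virtual channel is rendered through its own left/right BRIR pair (which already carries the direct sound and reflections for that angle), and the binaural results are summed. The channel layout and all signals/BRIRs below are synthetic placeholders:

```python
import numpy as np

fs = 48000
n_brir = 2048
rng = np.random.default_rng(2)

# Hypothetical 5-channel layout (angles in degrees) with 1 s of audio per channel,
# and one (left, right) BRIR pair per virtual speaker angle.
channels = {a: rng.standard_normal(fs) for a in (-110, -30, 0, 30, 110)}
brirs = {a: (rng.standard_normal(n_brir), rng.standard_normal(n_brir))
         for a in channels}

out_len = fs + n_brir - 1
left = np.zeros(out_len)
right = np.zeros(out_len)
for angle, signal in channels.items():
    bl, br = brirs[angle]
    left += np.convolve(signal, bl)    # this pair encodes direct sound + reflections
    right += np.convolve(signal, br)   # for one specific angle
```

A real renderer would use measured BRIRs and partitioned convolution for efficiency, but the structure — one BRIR pair per angle, summed binaurally — is the point here.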
So, if the goal is not such realism, but rather to generate virtual channel information in a binaural, crosstalk-free state so that the brain perceives a sound that feels more familiar, then even though people's characteristics are indeed highly individual, the differences are "not as significant" as one might think when evaluated by broader criteria.
And these differences tend to manifest as certain "trends."