
Crossfeed for headphones

My experience too, so I gave up on crossfeed: I could not find a satisfactory setting that worked across different recordings and different headphones.
I just take headphones as they are (or with EQ), which works fine for me. I don't think headphones can be a replacement for speakers, so I see no point in chasing what will always be a failed effort, however well it may work for others.
 
It's not meant to replace speakers. Rather, the aim is to pursue the experience of music outside your head, as you would hear it in nature. For this reason I still use the HD800sdr plus EQ and a nice amp.
Before writing it off, try the one I suggested. I only just discovered it, and it really opens up the soundfield without affecting the frequency response the way traditional crossfeed does.

Different albums and genres sound better with different DSPs (or none at all). With Foobar's DSP switcher, you can flick between different presets quickly to find the best sound for that album or track.
I got my son and wife to try it and both said it sounded like music was coming from the room.

Granted, it isn't perfect: I find it somewhat diffuse, and it resamples high-res tracks to 44 kHz.
If your brain is highly attuned to how headphones sound, the out-of-head experience will take some getting used to.
 
Hi, I predict that when a really effective DSP crossfeed becomes available, it will take the world by storm, from video gamers to home-theater listeners. Even in a movie theater, everyone could enjoy a personalized experience.
The test is always to listen for sounds coming from the front.
I have never heard that yet; it still feels unnatural.
 
These pictures demonstrate the difference between speakers and headphones nicely.

Room.jpg


Headphone.jpg


The "problem" indeed is that stereo on a headphone becomes STEREO.
The soundstage is to wide, it is to detailed, to bright, etc, etc.

Tried a couple of crossfeeds (ToneBoosters, MathAudio) but somehow never found a satisfying solution.

Having listened on headphones for two decades now, I simply got used to it.
Now I have the reverse problem: if I listen to my speakers, the soundstage is too narrow, it lacks detail, it sounds a bit dull, and so on.
I would argue that room reflections should NOT be one of the aspects we try to emulate, because they make the sound worse in every case. We put so much effort into absorbing reflections in live playback rooms and theatres, so I am not sure about the value of emulating them in a crossfeed simulation.

The main aspects of a crossfeed simulation that are vital, thinking from first principles, could be:

1. Narrowing of the stereo field, because speakers are not at 180-degree separation the way headphones are.
2. Delay in the audio reaching the opposite ear, similar to what happens in real life with speakers.
3. Attenuation of the audio reaching the opposite ear, due to the head blocking part of it.
4. Filtering of higher frequencies to simulate their absorption in air.
5. Filtering of higher frequencies in the audio reaching the opposite ear, to simulate their absorption by the head.
6. The shape of the ears, pinna gain and ear-canal influences on the audio, and level; I can imagine that an IEM will vary more here than an over-ear headphone, in comparison to a speaker.

I am not an audio scientist, so I am not sure how valid item 4 is. Nevertheless, each of these possible influences has to be modelled in the crossfeed, along with some determination of how much of it to apply (a minimal sketch of items 2-5 follows below).
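To make items 2-5 concrete, here is a minimal Python/NumPy sketch of that kind of crossfeed: each output channel is the original channel plus a delayed, attenuated, low-pass-filtered copy of the opposite channel. The specific values (0.25 ms delay, -6 dB level, 700 Hz first-order low-pass) are only illustrative assumptions, not the settings of any particular product.

```python
import numpy as np
from scipy.signal import butter, lfilter

def simple_crossfeed(left, right, fs=44100, delay_ms=0.25, gain_db=-6.0, cutoff_hz=700.0):
    """Feed a delayed, attenuated, low-passed copy of each channel to the other ear."""
    delay = int(round(fs * delay_ms / 1000.0))        # item 2: interaural delay
    gain = 10.0 ** (gain_db / 20.0)                   # item 3: head-shadow attenuation
    b, a = butter(1, cutoff_hz / (fs / 2.0), "low")   # items 4/5: gentle HF roll-off

    def shadowed(x):
        delayed = np.concatenate([np.zeros(delay), x[:len(x) - delay]])
        return gain * lfilter(b, a, delayed)

    out_l = left + shadowed(right)
    out_r = right + shadowed(left)
    peak = max(np.max(np.abs(out_l)), np.max(np.abs(out_r)), 1e-9)
    return out_l / peak, out_r / peak                 # normalise to avoid clipping
```

Item 1 (narrowing the overall width) and item 6 (pinna/ear-canal effects) are deliberately left out here, and a first-order low-pass is a very crude stand-in for head and air absorption.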

The question is: how relevant is each of these features to an acceptable simulation of speakers, and how much of each should be applied? Stereo width, for example: 30 degrees or 45 degrees from centre for each speaker?

There is of course the question of relevance: which of these features delivers the most significant part of the simulation of listening to a pair of speakers in a room? For some of them there will be diminishing returns, and we may not bother to include them.

Is there an opportunity with headphones and IEMs to actually go beyond the quality of what a pair of speakers delivers to the listener? I say this because a stereo pair of speakers is already a kind of simulation that is not like real life, requiring the ear/brain to perform tricks in our nervous system to produce a virtual soundstage. A stereo pair is taking advantage of a kind of biological artificial intelligence.

Is it possible that headphone/IEM listening has some unexplored biological "AI" of its own that could be emphasized with audio processing tools to amplify the realism, or the virtual sense of clarity, soundstage, frequency response and so on, whether using the features above or not?

TBC, to make this easier to read.
 
Hi, I would like to suggest an old but still great test disc from Sheffield Lab, the one below.

1730746367666.png


https://redrumrecords.ca/products/sheffield-lab-the-sheffield-xlo-test-burn-in-cd-cd-album
There is a track called "Walkaround" that is useful for evaluating soundstage depth.
Towards the end of the track, a person walks forward while talking, starting from the back of the recording room.
With speakers the effect can be exciting, with the voice standing right between the two speakers at the end of the walk.
With headphones the effect simply does not exist: you cannot locate the voice in the space in front of you at all.
 
I would argue that room reflections should NOT be one of the aspects we try to emulate, because they make the sound worse in every case. [...]

I have tried out at least 10 different approaches, i.e. products or software DSP tools that deliver some kind of crossfeed/speaker simulation in headphones, and over time:

1. I do not think the delayed signal from one speaker to the opposite ear is a significant aspect of the emulation. It muddies the sound, in my opinion. It adds a "reverb" which can make the end result less distinct, but the ear gets used to this "reverb" and subtracts it.

2. I also don't think the addition of room reverb adds any benefit. In a real room, our hearing makes every effort to "subtract" that reverb; e.g. when you move into a new home, over time you hear the sound of that space less and less, as our hearing removes it from anything heard in that room.

3. I find the narrowing of extreme stereo width to be one of the more important aspects of any speaker simulation: reducing the hard-left and hard-right presentation, which is NOT how most people hear speakers or live musicians (a minimal mid/side sketch follows below).
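For what it's worth, here is a minimal mid/side sketch of that narrowing step; the 0.6 side gain is an arbitrary illustration, not a recommended value.

```python
import numpy as np

def narrow_stereo(left, right, side_gain=0.6):
    """Reduce stereo width by scaling the side (L-R) component."""
    mid = 0.5 * (left + right)       # common (centre) content
    side = 0.5 * (left - right)      # difference (width) content
    side *= side_gain                # < 1.0 narrows; 1.0 leaves the width unchanged
    return mid + side, mid - side    # back to L/R
```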
 
Is there any evidence to support this?
A subjective assertion needs no evidence, as it is not based on any objective evidence. Hint, hint: the word I used, "significant", denotes an open-ended qualification, i.e. a subjective assertion.
 
A subjective assertion needs no evidence, as it is not based on any objective evidence. Hint, hint: the word I used, "significant", denotes an open-ended qualification, i.e. a subjective assertion.
hmm... okay.


Regarding the room reverb aspect, I think that’s a fair point. An imprecise crossfeed or virtualization can sometimes create a perceived sense of distance with the addition of head tracking or slight early reflections. Such reverb can be helpful at times, but it is ultimately a matter of preference, and I respect that.

However, if these two elements, ITD and ILD, are not important, then there’s essentially nothing left in crossfeed.
So, I’m not entirely sure what characteristics of crossfeed you’re referring to.
(I personally use BRIR recordings with personalized impulse responses. Recently, however, I’ve been revisiting various communities and papers to create a crossfeed solution that could benefit headphone users. While doing so, I came across this thread.)
 
1735368691225.png


LINK

If anyone is interested (if you use EQAPO), simply download all the txt and wav files from the provided link and load cross.txt via the "Include" option.
(If you use Roon or another DAW (such as Reaper) with a VST-based crossfeed that you prefer, it might be a bit different from the purpose of what I'm testing. However, try enabling your crossfeed and convolving it with noise.wav from my link. Even in that setup, I expect it will sound noticeably different from what you're used to.)
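For anyone who prefers to audition this offline rather than in EqualizerAPO, a rough Python sketch of the convolution step is below. It assumes noise.wav holds one short impulse response per channel (a mono file is reused for both channels); that assumption may not exactly match the files behind the link, and the input file name is a hypothetical placeholder.

```python
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

music, fs = sf.read("some_track.wav", always_2d=True)   # hypothetical input file
ir, fs_ir = sf.read("noise.wav", always_2d=True)
assert fs == fs_ir, "resample one of the files first"

# Convolve each music channel with the matching IR channel, trimming the tail
out = np.column_stack([
    fftconvolve(music[:, ch], ir[:, min(ch, ir.shape[1] - 1)])[:len(music)]
    for ch in range(music.shape[1])
])
out /= max(np.max(np.abs(out)), 1e-9)                    # avoid clipping
sf.write("some_track_convolved.wav", out, fs)
```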

As I mentioned earlier, I have been exploring various existing crossfeeds, but I haven’t noticed significant differences when listening to them. (Since the components of crossfeeds are already quite limited.)
However, I recently combined the noise I had been using while implementing upmixing codes with the crossfeed, and it produces quite interesting results. Give it a listen.
The file 비교용원본코드.txt ("original code for comparison") contains the original crossfeed code. It's just a typical crossfeed-style code; I simply combined it with the noise.
(This crossfeed uses the preset created by Peter, the developer of Peace, as part of the Peace configuration.)



1735369111570.png


I believe that the main issue with most crossfeeds is that the soundstage collapses excessively towards the center, which I think is caused by the blending of information from the opposite channel.

If you look at my diagram, when listening to only the L channel, virtual LL and LR are active (this is how a typical crossfeed works).
However, as the diagram shows, it doesn’t work as we imagine (like the black lines in the diagram). Ideally, you should only hear the left speaker, but it becomes strange when the left speaker's sound is heard in the left ear and the right speaker's sound is heard in the right ear.
Now, thinking in reverse, you could simply delay the sound of the L channel, but when virtualization happens with IEMs/headphones it doesn't work that way; if you just duplicate the sound in the L channel, the result sounds even worse.
So, inevitably, the opposite channel has to be mixed in, but as a result, the combined sound from L and R becomes more prominent, which is likely why the soundstage collapses towards the center. (This might be the reason many people dislike crossfeed.)
However, by using noise to decorrelate the information from the left and right channels, the negative effects of crossfeed seem to be significantly mitigated (a rough sketch of the idea follows below).
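For anyone who wants to experiment with the idea rather than the shared files, here is a rough Python sketch of one way to do it (I have not inspected the actual EqualizerAPO configuration, so treat this only as an interpretation): the cross-fed copy of the opposite channel is convolved with a short, decaying noise burst, a different burst per ear, so the two contributions no longer sum coherently in the centre. Burst length, decay and mix level are guesses.

```python
import numpy as np
from scipy.signal import fftconvolve

def decorrelating_crossfeed(left, right, fs=44100, burst_ms=5.0,
                            cross_gain=0.35, seed=0):
    """Crossfeed where the opposite-channel feed goes through a noise-burst IR."""
    rng = np.random.default_rng(seed)
    n = int(fs * burst_ms / 1000.0)
    env = np.exp(-np.linspace(0.0, 6.0, n))              # fast exponential decay

    def noise_ir():
        ir = rng.standard_normal(n) * env
        return ir / np.sqrt(np.sum(ir ** 2))             # unit-energy impulse response

    ir_l, ir_r = noise_ir(), noise_ir()                  # independent burst per ear
    out_l = left + cross_gain * fftconvolve(right, ir_l)[:len(left)]
    out_r = right + cross_gain * fftconvolve(left, ir_r)[:len(right)]
    peak = max(np.max(np.abs(out_l)), np.max(np.abs(out_r)), 1e-9)
    return out_l / peak, out_r / peak
```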
I’ve shared these ideas in other communities where I sometimes participate, but people generally didn’t show much interest. So, I’m leaving this here on the ASR thread. Please listen and share your thoughts.
Fortunately, a few people who listened said it felt very different from a typical crossfeed and that it sounded good.
 
Here is my amateur take on this. I am not an acoustics expert, nor do I have any credentials in the science of human hearing, but I have experimented personally with at least 20 different approaches over recent years, predominantly free ones and trial versions of paid apps. It would take me too long to remember and list all of them here. Suffice to say, I abandoned and restarted the quest many times, because most of the solutions did not sound great.

The question is always: of all the various factors that contribute to our perception of listening to speakers, which ones are important to emulate, i.e. which deliver most of an acceptable result?

I've also done some simple analysis with FR analysis plugins to appreciate the various kinds of frequency-response changes these tools apply. What's missing in my analysis is a closer look at the time-delay aspects of each solution. I just have not bothered much with this; not that it is unimportant, but the tools to analyse it are not as readily available as simple FR tools.

So, there are various elements of this illusion which one is aiming to create on headphones, to simulate what we hear on speakers.

1. The first component is clearly the ITD. There has to be some kind of delay between the arrival of audio at one ear and its arrival at the other. To compute this we need (see the ITD sketch after this list):
1.1 Distance of the speaker
1.2 Angle of the speaker
1.3 Size of the head

2. ILD - because the head blocks some of the sound arriving at the other ear, the level there is lower.

3. Frequency change at the other ear, because the head absorbs the higher frequencies.

4. The outer ear, ear canal and shape of the head/torso have their own impact on the frequency response, which some refer to as the HRTF. This is probably where most solutions get things wrong; it is pretty hard to define a universal HRTF that works for everyone. My thinking is that, especially for IEMs (which is what I use most for head-worn listening), the IEM itself is responsible for implementing much of the "transform" of the outer ear, ear canal, pinna gain and so on, and each device succeeds or fails based on its tuning. If a tool implements an HRTF of its own choosing, then using a normal IEM or headphone, which adds its own pseudo-HRTF component on top of this, skews the effect. In other words, if the speaker-simulation software has its own HRTF, it should assume that the device worn on the head is an essentially flat transducer. It seems obvious to me, but no one seems to discuss or have identified this double jeopardy: applying an HRTF on top of another pseudo-HRTF created by the head-worn device.

Ideally, if an HRTF is applied, it is best to use a custom one based on one's own measurements. But that is rare, typically only available to well-heeled people or academics. In my opinion, assuming any other HRTF, one not perfectly matched to ours, is a bit of a gamble and causes more problems than it solves. My best results have come from avoiding any tool that uses an HRTF, since I do NOT have a custom HRTF measured, and relying on the "pseudo-HRTF" created by the frequency response of the head-worn device.

I believe the human brain adapts, and we have examples: a reverberant room, after a while, does not sound quite as reverberant, because the ear zones out the excess. I expect the same kind of thing happens when we put on an IEM or headphone, which introduces something that is not exactly the same as our own transfer function. But compounding this by adding another layer in the audio path, in software, is like pouring salt into a wound.

In my experience, every tool I have heard that introduces an HRTF just sounds odd.

5. Some tools also include EQ correction to optimise the sound of the headphone, but this is a broad-stroke attempt, because of unit-to-unit variance between copies of a headphone model. It is a lot of effort and cost to measure the FR of each headphone/IEM, and of each earpiece with the pad or eartip in use, and to remeasure if any of these accessories change.

6. Some tools then attempt to add room reverberation. This is clearly a matter of personal preference: too much, and everything sounds terrible, like salt.
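As a rough illustration of component 1, the sketch below estimates the ITD from exactly those three quantities (speaker distance, speaker angle, head size) using plain straight-line geometry; it ignores diffraction around the head, so it only gives a first-principles ballpark figure.

```python
import math

def itd_seconds(speaker_distance_m, speaker_angle_deg,
                head_width_m=0.175, speed_of_sound=343.0):
    """Path-length difference between the two ears for one speaker, in seconds."""
    a = head_width_m / 2.0                       # ear offset from the head centre
    theta = math.radians(speaker_angle_deg)
    sx = speaker_distance_m * math.sin(theta)    # speaker position, head at origin
    sy = speaker_distance_m * math.cos(theta)
    near = math.hypot(sx - a, sy)                # path to the ear on the speaker side
    far = math.hypot(sx + a, sy)                 # path to the opposite ear
    return (far - near) / speed_of_sound

# Example: a speaker 2 m away at 30 degrees gives roughly 0.25 ms
print(f"{itd_seconds(2.0, 30.0) * 1e3:.3f} ms")
```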

How well each tool, or combination of tools, succeeds at achieving one or more of these is the million-dollar question. My approach would be to start simple, not try to do everything, and just focus on doing a few of these acceptably.

One may have to use different tools.

OK, hypothesis over. I'll talk about solutions in the next post.
 
Applying an HRTF on top of another pseudo-HRTF created by the head-worn device.
No... There is a compensation process. It seems like you might be misunderstanding it...

I'm not sure if this is the right place to discuss this in a crossfeed thread, but since it was mentioned, I'll write about it.
Such virtualization naturally equalizes the headphone/IEM's inherent response curve (including raw measurements). On top of that, a specific response is convolved—nothing more.
That equalization needs to be accurate, and even for the same headphones, it differs from person to person...

Additionally, it seems like you are using the terms HRTF, HRIR, and BRIR interchangeably.
To put it very simply, HRTF is the frequency representation of how you (or someone else) hear sound.
Even if you have an HRTF, without reflections you cannot perceive distance (the HRTF changes with distance, but it only represents the direct sound from a specific angle; its time-domain recording is the HRIR), nor can you perceive spatiality.
I’m not sure to what extent the reverb you’re referring to applies, but it seems that you and I are not talking about the same thing.
In reality, we are always listening to reflections, whether in your room or in the natural forest.
The strength of the reflections, the ratio between direct sound and reflections, the time intervals and density, the overall structure, and the reverberation time all influence auditory and spatial impressions. Therefore, it cannot be simply stated that reverb is unnecessary.

Of course, there are differences between personalized HRTFs and generalized HRTFs.
Each person has a different way of hearing, different ears, different physical characteristics, different levels of hearing ability, and even different brain compensation data for interpreting sounds.
However, while these differences can be significant, people also adapt quite well to them.

For example, most modern FPS games incorporate HRTF.
If you doubt its accuracy, it’s natural to question it. But is it so inconsistent that it prevents you from functioning within the game environment? No, it’s not. Most people adapt to such information.

Of course, if you examine it closely, generalized data and personalized, directly measured data are naturally different.
I’ve observed this while calibrating many people's BRIR data, and it indeed varies significantly depending on their physical characteristics.
When I listen to their BRIRs, sometimes I hear sounds as if they are coming from above my head.

And even if I forcibly EQ someone else's BRIR (which includes all HRTFs across the time domain, from direct sound to reflections, as you mentioned), it still doesn’t match what I hear.
While small ITD errors are usually tolerable, ILD errors prevent accurate localization and sometimes even result in spectral distortion. And because our brain is very adept at identifying such subtle issues, it eventually realizes that the sound doesn’t match how we naturally hear in reality, reducing it to a mere sound effect.

However, the important question is not whether it is "wrong" but rather recognizing it as an attempt to apply a common approach to how we hear.

Therefore, while achieving reproduction that matches how we naturally hear in reality is an excellent goal, it cannot be accomplished with a simple crossfeed.
For it to sound like reality, all your HRTFs need to be reflected, from direct sound to early reflections to late reflections, across all angles. When this is done, you can experience sound as you do in reality.
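For readers unfamiliar with how that is usually put into practice, a common approach is to convolve each input channel with a measured binaural room impulse response (BRIR) for the corresponding virtual speaker and ear, then sum per ear. A bare-bones sketch follows; the BRIR arrays stand in for your own measurements, and the file names in the comment are hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_brir(left, right, brir_ll, brir_lr, brir_rl, brir_rr):
    """left/right: input channels; brir_xy: impulse response from speaker x to ear y."""
    n = len(left)                                 # truncate convolution tails for simplicity
    ear_l = fftconvolve(left, brir_ll)[:n] + fftconvolve(right, brir_rl)[:n]
    ear_r = fftconvolve(left, brir_lr)[:n] + fftconvolve(right, brir_rr)[:n]
    peak = max(np.max(np.abs(ear_l)), np.max(np.abs(ear_r)), 1e-9)
    return ear_l / peak, ear_r / peak

# Hypothetical usage: load e.g. "brir_Lspeaker_Lear.wav" and the other three
# measurements as mono arrays, then pass them in as brir_ll, brir_lr, brir_rl, brir_rr.
```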

So, if the goal is not to achieve such realism but rather to generate virtual channel information in a binaural state without crosstalk, allowing the brain to perceive a sound that feels more familiar, then while people's characteristics are indeed highly individual, if we evaluate the criteria more broadly, the differences are "not as significant" as one might think.
And these differences tend to manifest as certain "trends."
 
[...] OK, hypothesis over. I'll talk about solutions in the next post.

The best solution I have found, which gives a good balance without messing things up, has been to use a binaural approach; more exactly, an ambisonic approach which, combined with a binaural rendering, definitely produces acceptable results. And it is simple.

OK, for those who may not be familiar: ambisonic theory enables the virtualisation of a set of recorded audio inputs in an immersive environment such as a room, with microphones receiving input from many angles, and it can do its best to translate such multi-channel recordings to a set of speakers or headphones, adapting to the number of microphones on the input as well as the number of outputs.

In theory, the higher the number of inputs and outputs, the more realistic the perception of immersive listening.

So, in theory, for stereo listening:

Step 1 - transform the stereo input to a multi-channel ambisonic audio stream. Lots of encoders are available for this, and many are free. It is a very well-defined science, so in my experience it does not matter much which encoder you use; the ones I have tried null perfectly against each other. I use the IEM encoder. Hint: varying the speaker angle to taste is an important optimisation step.

Step 2 - transform the ambisonic audio stream to a binaural one. Now, this is where the cookie crumbles, because everyone does it differently. Some use an HRTF, some do not. I use abDecoder Light.

Audio from Step 1 feeds into Step 2 (a much-simplified first-order sketch of the idea follows below).
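For anyone curious what those two steps amount to in the simplest possible case, here is a heavily simplified first-order Python sketch. It is not what the IEM encoder or abDecoder do internally (the chain above uses 3rd-order encoding); it just illustrates the idea: encode the two stereo channels as sources at ±30 degrees into horizontal B-format, then decode with two virtual cardioid microphones aimed at the ears, i.e. an HRTF-free decode. The angles and patterns are my own illustrative choices.

```python
import numpy as np

def encode_stereo_to_bformat(left, right, speaker_angle_deg=30.0):
    """Step 1: treat L and R as sources at +/- speaker_angle and encode to W, X, Y."""
    az = np.radians(speaker_angle_deg)
    w = (left + right) / np.sqrt(2.0)            # omnidirectional component
    x = (left + right) * np.cos(az)              # front/back component
    y = (left - right) * np.sin(az)              # left/right component (left = +y)
    return w, x, y

def decode_bformat_to_ears(w, x, y, ear_angle_deg=90.0):
    """Step 2: two virtual cardioids at +/- ear_angle, a crude HRTF-free binaural decode."""
    az = np.radians(ear_angle_deg)
    left_ear = 0.5 * (np.sqrt(2.0) * w + np.cos(az) * x + np.sin(az) * y)
    right_ear = 0.5 * (np.sqrt(2.0) * w + np.cos(az) * x - np.sin(az) * y)
    return left_ear, right_ear
```

With this toy decode the result is essentially a width-narrowing matrix; the interesting differences between real decoders come from how Step 2 is carried out.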
 
No... There is a compensation process. It seems like you might be misunderstanding it... [...]
The biggest challenge comes when a tool has tons of controls and tries to be a jack of all trades; it just leads to a mess, both from a usability standpoint and in the results.

I've shared what works for me. It is an approximation which, from my frequency-analysis checks, avoids any HRTF and also avoids any damping of higher frequencies.

Another solution, if one wants everything in a single tool with very few controls, is Airwindows Cans. Like salt, one has to be conservative with the settings. From my own analysis it provides a little of everything: delay, frequency damping, no HRTF to the best of my knowledge, and a room effect, with very few controls.

This is probably the best all-rounder for anyone who wants to tweak, while not handing us too much rope to hang ourselves with, sonically speaking.

I still prefer the combination of the IEM Ambisonic Stereo Encoder followed by abDecoder. To add, I'm using 3rd-order encoding in the IEM encoder. This combination has no real sonic artefacts; it stays true to the original and just tightens up the extreme stereo of a head-worn playback device.
 
Here is the impact of Airwindows Cans - the green "straight line" is the original, while the wiggly line, with damped higher frequencies and lower overall level, is the opposite ear.

1735393460111.png
 
The best solution I have found, which gives a good balance without messing things up, has been to use a binaural approach. [...]

I respect your attempts and the approach you shared in your follow-up post.
Many people, when experiencing failure (or inconsistency with generalized data) in IEM/headphone virtualization, tend to focus solely on the differences between their own HRTF and someone else’s HRTF.
Of course, that’s a valid concern. However, as I mentioned earlier, a significant factor in BRIR playback lies in the "compensation process for each headphone/IEM."
Surprisingly, even if you have a highly precise HRTF, HRIR, or BRIR measured specifically for your ears, if it’s played back through a headphone/IEM that hasn’t been properly pre-equalized, the result will differ from how you naturally hear.
And yet, our brain continues to function and, despite the data being personalized, it identifies those inconsistencies and recognizes the result as "fake."

Here is the impact of Airwindows Cans - the green "straight line" is the original, while the wiggly line, with damped higher frequencies and lower overall level, is the opposite ear.


That spectrum does not perfectly reflect what you are currently hearing.
You mentioned that the straight line mainly represents the ear you are listening with, and the rest of the graph represents the opposite ear...

1735393695919.png


The spectrum you’ve shown only displays a rough response that mimics ILD, similar to traditional crossfeeds.
What you mentioned about bypassing HRTF (although, in reality, it’s more likely that you previously listened to a file that wasn’t pre-processed or equalized for your headphones) is also evident in the current Crossfeed graph.
However, traditional crossfeeds or the spectrum you provided cannot predict how your headphones/IEMs will interact with your face and ears.

Of course, if the actual response of your headphones happens to align well with how you naturally hear sound from a specific angle in reality, then it may work by sheer luck.

1735393865781.png

1735393876597.png


I can show you a graph representing a part of my body.
 
No... There is a compensation process. It seems like you might be misunderstanding it... [...]
Not wishing to disagree with you, but based on my own listening I have not found a single HRTF-based solution (freeware or paid) that sounds correct. Of course I have not heard all of them, but I explored a few examples of the well-known tools, none of which were satisfactory to my hearing.

1. Realphones
2. CanOpener
3. Redline Monitor (112dB)
4. Sonarworks (it looks like all they do is modify frequency response: no time delay, and a level change on the opposite side)
5. Virtuoso
6. Acustica Sierra
7. Waves Nx (there are several of these based on different professional studio mixing rooms; I tried them all)

+ a couple of freeware options whose names I can't recall immediately.

The challenge with these is that they are moving targets; who knows whether their newer versions have improved. The versions I tested, with the exception of Virtuoso, were from a while ago, at least a year. Virtuoso was within the last few days, so your mileage may vary.


What has worked best for me.

1. IEM Encoder + abDecoder Light
2. Airwindows Cans
3. Bs2R, a modified version of Bs2B by Liqube, with some presets (Windows only)
4. Bs2B version 1.2, the legacy version
5. Meier Crossfeed, by Case
6. ReaSurround, a Reaper-only plugin; adjusting the X settings narrows the stereo field, making it perceptually more speaker-like
7. DeeSpeaker by Dotec Audio, supposedly a virtual NS10 speaker emulation

All of these are free and easy to implement, with either no controls or very few. Some may be Windows only.

I have them all setup in my audio path, and can switch between them.
 
I highly appreciate the theoretical side, but at the end of the day it's all about the results. In my case, the tools that attempt to do the most sound the worst. Most of these seem to be the commercial ones, which, sorry to say, are all just black boxes: somewhat snake-oily, sold on "reputation", with no one really knowing what's going on inside them. With enough of a herd mentality, just like some revered headphones, IEMs and speakers, they develop a cult following. I've been down that road too.

At the end of the day, I can only describe what I am hearing; I have provided the list of things that did not work for me and the list of things that did, which I still use today. Others are welcome to try them and come to their own conclusion. That is the proof of the pudding: which one works for them.

Special mention to ToneBoosters Morphit, which does not do any crossfeed but has a really good EQ-correction solution for headphones that are in its database. IEMs were not yet a "thing" when I used it a lot with my AKG K702 over-ear headphones, but after moving to IEMs, most of which were not in the database, I had to stop using Morphit.
 