
Headphones with APL Virtuoso vs Studio Monitors in an untreated room

Volutrik · Member · Joined Apr 2, 2023 · Messages: 70 · Likes: 33
Hi all, I'm wondering what you guys think would perform the best for mixing/production:

AKG K371 or SHP9500 + Virtuoso, essentially a virtual mixing environment

OR

Something like the Adam Audio D3V or Edifier MR3 in an untreated bedroom

Curious to know what your takes are on whether one option is better than the other :)
 
Neither is really ideal and both have significant strengths and weaknesses.

The headphones are going to have more and cleaner bass (no modes), lower distortion at relatively high SPL, but the stereo and tonality will only resemble that of real monitors to an extent. You can't know for sure whether the response you're hearing is really flat or not.

The D3V is a decent monitor as far as it goes and should have a genuinely neutral response, but it will be really limited in bass and SPL, and your ability to pick out fine details will be limited by reflected sound in the room.

If your goal is to produce good mixes, either will be better than nothing, but you'll have to make an extra strong effort to use good references and check mixes on multiple systems aside from your own.
 
Pros almost always recommend not using headphones as your main monitors. But small monitors with no subwoofer aren't ideal either.

You might want to use both. You can use headphones to check/balance the bass.... The worst room problems tend to be in the bass range.

I suspect any headphone "enhancements" other than "corrective" EQ will make your job harder.

Here is my collection of excerpts regarding mixing & mastering on headphones:

This is from Recording Magazine "Readers Submissions", where readers send in their recordings for evaluation:
As those of you who have followed this column for any length of time can attest, headphone mixing is one of the big no-no's around these parts. In our humble opinion, headphone mixes do not translate well in the real world, period, end of story. Other than checking for balance issues and the occasional hunting down of little details, they are tools best left for the tracking process.

And this is from a mixing engineer, also Recording Magazine:
Can I mix on headphones?

No. But in all seriousness, headphones can be a secret weapon and it really doesn’t matter what they sound like…

Over time, after constantly listening back to my work from different studios on those headphones I really started to learn them. They became sort of a compass. Wherever I went… It became a pattern for me to reference these headphones to see if what I was hearing was “right”…

I learned them, I knew them, I trusted them. It didn’t matter whether or not I loved them…

So, can you mix on headphones? Probably. I just think you really need to put some time into learning them first…

This is from Floyd Toole's book, Sound Reproduction:
Headphones entertain masses of people. Professionals occasionally mix on them when conditions demand it. Both rely on some connection to sound reproduction, that is, loudspeakers in rooms, because that's how stereo is intended to be heard. Stereo recordings are mixed on loudspeakers.

This is from Ethan Winer's book, The Audio Expert:

(Headphones) are not usually recommended for mixing music because you can hear everything too clearly. This risks making important elements such as the lead vocal too soft in the mix. Mixes made with (headphones) also tend to get too little reverb, because we hear reverb more clearly when music is played directly into your ears than when it's added to natural room ambience...

...It is good practice to verify edits using (headphones) to hear very soft details such as clicks or part of something important being cut off early.
 
The major problem with binaural speaker virtualization is that you need to measure your HRTF. Using a generic HRTF leaves the result up to chance.

I am exploring this right now so I have no practical advice to give. I've mixed on speakers and headphones, and I prefer speakers. But headphone audio and speaker virtualization are so different I would almost consider them to be in different categories.

Edit: It's also important to determine the HpTF (headphone transfer function). In ear microphones are required for this.
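In code terms, the binaural speaker virtualization discussed here boils down to convolving each virtual speaker's signal with the impulse response from that speaker to each ear. A minimal numpy sketch; the HRIRs below are toy placeholders, where a real setup would load measured or simulated responses from a SOFA file:

```python
import numpy as np

def virtualize(left, right, hrir):
    """Render a stereo signal binaurally.

    hrir[src][ear] is the impulse response from virtual speaker
    src ('L'/'R') to ear 'L'/'R'. Each ear hears the sum of both
    speakers, each filtered by the corresponding response.
    """
    out_l = (np.convolve(left,  hrir['L']['L']) +
             np.convolve(right, hrir['R']['L']))
    out_r = (np.convolve(left,  hrir['L']['R']) +
             np.convolve(right, hrir['R']['R']))
    return out_l, out_r

# Toy HRIRs: the same-side (ipsilateral) path arrives earlier and
# louder than the opposite-side (contralateral) path.
ipsi   = np.array([0.0, 1.0, 0.3, 0.0])
contra = np.array([0.0, 0.0, 0.5, 0.15])
hrir = {'L': {'L': ipsi,   'R': contra},
        'R': {'L': contra, 'R': ipsi}}

# An impulse on the left channel reaches the left ear via the
# ipsilateral path and the right ear via the contralateral one.
out_l, out_r = virtualize([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], hrir)
```

This is only the anechoic part; as discussed later in the thread, a full virtualizer also needs room reflections (a BRIR instead of a bare HRIR) and head tracking.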
 
I actually have measured my own HRTF using Mesh2HRTF. I don't know how accurate it is, though: I've done it twice and got similar results, but I emphasize the word 'similar' because, even though they're close, I can still hear some difference between them.

Also, I saw an interview with Hyunkook Lee himself, and he said that, based on his research, the actual HRTF profile doesn't matter much as long as head tracking is being utilized. There are other aspects to how convincing the experience can be, such as the environment, which can trick the user more easily if there are actual speakers in front of them. Imagine listening to a sound that's supposed to mimic a studio while you're outdoors, for example; not really convincing.

Oh, also saw him talking about IEMs:
In principle IEM can work as well as over ear headphones. It is more about how well the frequency response at the ear drum follows a target response. The HPC EQ or any other correction filter you could find from the AutoEQ library could help make the response more suitable for binaural listening.

Tonality of the HRTF itself is a whole subject of its own, too. While we can get used to a certain HRTF profile, the tonality won't be as accurate if it's not your own HRTF.

It's a pretty interesting topic, and there is so much new information that, at least for me, it raises exactly this kind of question. It becomes even more complex given that I've never owned nor even listened to studio monitors in real life, so I find myself asking, "is this really accurate, or would studio monitors in an untreated room be better?"
 
Also, I saw an interview with Hyunkook Lee himself, and he said that, based on his research, the actual HRTF profile doesn't matter much as long as head tracking is being utilized. There are other aspects to how convincing the experience can be, such as the environment, which can trick the user more easily if there are actual speakers in front of them. Imagine listening to a sound that's supposed to mimic a studio while you're outdoors, for example; not really convincing.

Tonality is a whole subject of its own, too. While we can get used to a certain HRTF profile, the tonality won't be as accurate if it's not your own HRTF.
I've seen research that contradicts that: https://www.researchgate.net/public...Id=5d0e96b5458515c11cf0e68a&showFulltext=true

The important part of using your personal HRTF is localization accuracy. Users apparently get used to using foreign or generic HRTFs but localization suffers.

Like I said I'm in the middle of setting this up for myself so I can't comment on how important believability of the auditory scene or head tracking is right now.
 
Curious to know what your experience will be with this :D Let's keep in touch
 
While the concerns about headphone mixing are true, I would say it is a far better option than mixing in a bad room.
At least you get a clean signal.
The rest you can get used to, IF you listen to a LOT of reference material before AND during the mixing process.
 
This is from Recording Magazine "Readers Submissions", where readers send in their recordings for evaluation:

And this is from a mixing engineer, also Recording Magazine:

This is from Floyd Toole's book, Sound Reproduction:

This is from Ethan Winer's book, The Audio Expert:
But I would be surprised if any of these comments are in relation to binaural virtualization. As @Curvature mentioned, these setups are in different leagues.
Also, I saw an interview from Hyunkook Lee himself and he said that, based on his research, the actual HRTF profile doesn't matter much as long as headtracking is being utilized. There are other aspects to how convincing the experience can be, such as the environment, which can trick the user more easily if there are actual speakers in front of them.
This is certainly a good point but this relates to a "convincing" experience in relation to externalisation and localisation ("Being there"). Even with a "wrong" HRTF this can work well. But the problem of a natural balance in tonality and timbre is a different thing. Our hearing can (and will) adjust for some errors in this. But this is not what you want for mixing.
Tonality of the HRTF itself is a whole subject of its own, too. While we can get used to a certain HRTF profile, the tonality won't be as accurate if it's not of your own HRTF.
That!
I actually have measured my own HRTF using Mesh2HRTF. I don't know how accurate it is, though: I've done it twice and got similar results, but I emphasize the word 'similar' because, even though they're close, I can still hear some difference between them.
With speakers you will get different results too, either from different rooms, different positions or different monitors (and in an untreated room these differences will be significant). It is just a lot more difficult to compare in those cases.

EDIT:
I actually have measured my own HRTF using Mesh2HRTF.
Would you mind sharing your workflow? Are you satisfied with the result?
 
Last edited:
I did several head scans and ran Mesh2HRTF at varying resolutions and evaluation grids recently; these are my personal experiences.

Mesh2HRTF has excellent tutorials. My biggest problem was that I tried to run it on a Mac, and I haven't done programming for a long, long time. Git and CMake were a bigger issue than running Mesh2HRTF. For Windows there are prebuilt binaries.

The basic workflow is the following:
1. Scan your head with an iPhone. Do it once, do it right; you don't have to splice in high-res ear scans. It will make no difference, it just makes things complicated.
2. Use Meshmixer to prepare the model. It will solve most of the problems, which are much more complicated in other apps.
3. Position the head in the coordinate system in Blender.
4. Run hrtf_mesh_grading to the desired resolution.
5. Export the left and right models created in the previous step from Blender with the HRTF export script. It will create two work directories for the two ears.
6. Run NumCalc on each.
7. Run finalize_hrtf_simulation.

The first 5 steps can be done in 20 minutes after a couple of tries. NumCalc can run from 4 hours to a week depending on resolution and computer. The last step takes 30 seconds.

- A personalized HRTF is not a magic bullet, just one part of the puzzle.

- HRTF simulation is quite consistent across head scans and different resolution models up to ~8 kHz. Above that it has a general trend with wild peaks and valleys, which in real life, I think, depend on air temperature, your hairdo, clothing, how much your head was deformed during sleep, etc. It should change when you turn your head toward the source, but the exact details do not seem to be important.

- I made some measurements with loudspeakers in a room to compare with the simulated HRTF, and they are quite consistent, within the limits of my measurement rig and room.

- A personal HRTF mostly fixes tonality and the response to big head movements. It also helps with front/rear confusion and elevation localization, but those are not as important for me for binauralizing stereo recordings. My biggest differences compared to the generic HRTF are a stronger rise after 800 Hz and a broader peak at 4 kHz at the sides. These have a significant effect for me on the punchiness and loudness of the sound compared to the generic HRTF.

- Externalization is much more dependent on head tracking (or motion) than on an accurate HRTF. No wonder all the generic headphone binauralization demos are full of motion and sounds swooshing around you. It is much easier to localize or externalize a sound source when it is moving around you or you can move your head.

- Head tracking does not mean you always have to move your head. There is a lock-in effect: when you are confused about where the sound is coming from, a proper change in the binauralization in response to a small head movement can lock in the externalized sound, even if you don't move your head afterwards. But even when you think you are still, the head tracker shows several degrees of small head movements to which the binauralizer should respond.

- You need a virtual room to externalize. Just adding your own or a generic HRTF is not enough; that is already done by your headphone if it tries to comply with a Harman or diffuse-field curve, yet it still sounds inside your head. Every binauralizer app adds its own "room" with more or less success. You can pick different BRIRs: your room, a studio, a performance venue. It is up to taste.

- One of the biggest advantages of proper binauralization is the cocktail-party effect, where you can pick up individual conversations from a noisy background. It makes stereo records that have a big mushy sound in stereo reproduction sound well defined, with perfectly followable separate melodic lines in binaural.

- Picking different rooms changes the perception of the performance more than anything mentioned above. Changing the venue of the binauralization can make the same performance slow, or majestic, or rushed, or well played, or muddy.

- Headphone EQ: SOFA files are produced for a flat response at the plugged ear canal, so you have to EQ your headphone flat at your plugged ear canal. You need an in-ear mic for this. In-ear measurements can be wildly different from published headphone measurements; you cannot rely on those to make your EQ. Also, most headphones cannot be EQ'd to flat. You can pre-apply your calculated inverse diffuse-field EQ to the SOFA file and EQ the headphone to your diffuse-field curve, which in its trends is usually not that far from a regular headphone response.
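The EQ step above amounts to negating your measured in-ear error while refusing to boost into deep nulls the driver cannot fill. A toy numpy sketch; the measurement values and the +6 dB boost cap are made-up illustrations, not calibration data:

```python
import numpy as np

def inverse_eq_db(measured_db, max_boost_db=6.0):
    """Turn an in-ear magnitude measurement (in dB, relative to the
    target curve) into per-band correction gains.

    Boosts are clamped because deep nulls usually cannot be filled
    by EQ without wasting headroom or damaging the driver.
    """
    correction = -np.asarray(measured_db, dtype=float)
    return np.minimum(correction, max_boost_db)

# Toy measurement: a +4 dB bump and a -15 dB null relative to target.
eq = inverse_eq_db([0.0, 4.0, -15.0, 1.0])
# The null would need +15 dB of boost; the cap limits it to +6 dB.
```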

Binauralization can create believable replicas of real acoustic spaces, down to your living room. It is not easy and not as straightforward as they are selling it, mostly because you need personalized measurements (but that looks like a standard requirement now for loudspeakers too), and with the ability to pick a BRIR you will realize that there is no one hi-fi truth.

You can emulate speakers in a room with binaural, but why would you want that?
 
Head tracking does not mean you always have to move your head. There is a lock-in effect: when you are confused about where the sound is coming from, a proper change in the binauralization in response to a small head movement can lock in the externalized sound, even if you don't move your head afterwards. But even when you think you are still, the head tracker shows several degrees of small head movements to which the binauralizer should respond.
What happens if you keep your head completely still? For example using a clamp or headrest or something similar?
 
What happens if you keep your head completely still? For example using a clamp or headrest or something similar?
I’ve tried it. I feel I slowly start to lose the sense of space around me. It gets more front-focused. The directions are still clear, and it is still definitely out of head, but the distance of the musician loses definition.
In the other direction, when I start to play without head tracking, it takes a couple of seconds of mental effort to build up the space around me. External distractions can ruin the effect, and I have to build it up again. Turn on head tracking and it suddenly becomes more real, no questions asked.
The more auditory cues are correct, the easier it is to believe the scene.
 
Would you mind sharing your workflow? Are you satisfied with the result?
The comment from @fcserei is pretty much what I would write hahah


I’ve tried it. I feel I slowly start to lose the sense of space around me. It gets more front-focused.
I have the same impression, and maybe that's why I don't find virtualization very immersive: because I don't have a head tracker. When I listen to the regular headphone sound for a while and then play sound through virtualization software, I get the sense of listening to speakers, true, but it does not last long. After a while, I actually get so used to the still "speaker" sound that it's like I'm listening to the regular headphone sound again.
 
The basic workflow is the following:
1. Scanning your head ...
2. Use meshmixer ...
3. ... blender.
4. Run hrtf_mesh_grading ...
5. ... hrtf export script
6. Run NumCalc
7. Run finalize_hrtf_simulation
Hmm, I had the hope that Mesh2HRTF would make the process available to people without a professional career in numerical maths, BEM modelling and programming. Seems I was wrong.

So I still shy away from a serious attempt to do this. I will probably use Windows (on my mac) to do it with the tutorials.
Mesh2HRTF has excellent tutorials. My biggest problem was that I tried to run it on a Mac, and I haven't done programming for a long, long time. Git and CMake were a bigger issue than running Mesh2HRTF. For Windows there are prebuilt binaries.

-Personal hrtf fixes tonality and response to big head movements mostly.
I don't know about you, but that is quite a big fix for me.
- Externalization is much more dependendent on headtracking (or motion), than accurate hrtf.
My experience is that it is more complex.
I was always interested in binaural recordings, most of the time with disappointment. When I listen, e.g., to the Chesky demos, the whispering in the ear works convincingly, but most everything else (basketball..) is externalised not in the front but in the back. The music recordings seem to have been made in a funny room with the musicians placed under the ceiling, and so on.
I never knew what was wrong until I made some test recordings with in-ear mics. Well, that was a surprise. The realism (even without proper EQ) was stunning. Hearing back a recording of a conversation in the living room fooled me several times about who was speaking (or not). It worked outdoors too.
So I would estimate the influence of the HRTF to be bigger than you seem to. Of course, a binaural virtualization of stereo listening is a different thing than a (true) binaural recording.

No doubt there are several things that contribute to a convincing presentation of stereo via binaural:
- HRTF
- virtual room reflections
- head tracking

The second and third points will probably be similar for most people. But it is very difficult to gauge the first.
A "generic" HRTF can always be more or less similar to one's own HRTF, so the effect of changing to a personal solution can be quite different.
I tried Virtuoso with the KU100 first and was a bit underwhelmed. Then I tried the integrated HRTFs and (luckily) found those to be much better. So again there is a considerable difference from the HRTF for me, as everything else stays the same.
Head tracking is great and gives that extra realism that gets stunning with greater head movements while the sound stays put. But even with head tracking, the KU100 HRTF never gave a presentation with the same clarity and precision as the integrated HRTF (without tracking). And my self-tinkered crossfeed solution from in-ear measurements sounds even better to me in that regard.
(But Virtuoso gives me a very good room, head tracking and multichannel, so it comes out in front in many cases.)

About differences in HRTFs: at some point I traced a collection of curves from Oksanen et al., "Estimating individual sound pressure levels at the eardrum in music playback over insert headphones". Here are the HRTFs and the differences from the averaged HRTF as an illustration.
[Attached images: hrtfs.jpg, differences.jpg]
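For reference, the "self-tinkered crossfeed" mentioned above can be as little as mixing a delayed, attenuated copy of each channel into the opposite ear. A toy numpy sketch; the delay and gain values are illustrative, not derived from any in-ear measurement:

```python
import numpy as np

def crossfeed(left, right, delay=8, gain=0.35):
    """Simple crossfeed: each ear gets its own channel plus a
    delayed, attenuated copy of the other one, roughly imitating
    how a speaker reaches both ears. `delay` (in samples) and
    `gain` are illustrative; a low-pass filter on the crossfed
    path would be the obvious next refinement.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    pad = np.zeros(delay)
    out_l = np.concatenate([left, pad]) + gain * np.concatenate([pad, right])
    out_r = np.concatenate([right, pad]) + gain * np.concatenate([pad, left])
    return out_l, out_r

# An impulse panned hard left leaks into the right ear, later
# and quieter, as it would from a real left speaker.
out_l, out_r = crossfeed([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], delay=2)
```

Unlike full HRTF convolution, this ignores pinna filtering entirely, which is why crossfeed alone reduces fatigue but does not externalize the image.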
 
When I listen to the regular headphone sound for a while and then play sound through virtualization software, I get the sense of listening to speakers, true, but it does not last long. After a while, I actually get so used to the still "speaker" sound that it's like I'm listening to the regular headphone sound again.
That is so incredibly strange. I'm excited to start playing with it.
 
So I still shy away from a serious attempt to do this. I will probably use Windows (on my mac) to do it with the tutorials.
I can give you the Mac ARM NumCalc binary. The rest is just using 3D modelling software and running a couple of Python scripts.

The Chesky music demos are really underwhelming for me too, hardly more than an overly echoey headphone sound in the back of my head. I think the problem is not in you.

Also, I have a hard time externalizing other static KU100 recordings in the front. There is no problem hearing what is happening around me and in the back, but the music, which is supposed to be out in front, sits on the sides close to my head, with nothing in the front center. Interestingly, I have no externalization problem binauralizing with the KU100 HRTF with head tracking.

I’ve tried almost every plugin that is accessible, and most of them are underwhelming for me. Mostly because they try to emulate studio sound with headphones, which is basically trying to simulate a psychoacoustically wrong (stereo or multichannel) speaker reproduction with binauralization. I bet your in-ear mic recording does not sound like any playback studio at any price.
 
I can give you the mac arm NumCalc binary.
That would be great. How big is it?
Interestingly, I have no externalization problem binauralizing with the KU100 HRTF with head tracking.
Yes, head tracking is a great enabler of frontal localisation (it just does not help that much for me with timbre and clarity).
But I get frontal localisation with Virtuoso and the KU100 SOFA even without head tracking, IF there are enough room reflections present. Setting the T60 too low makes it less convincing for me.

Mostly because they try to emulate studio sound with headphones, which is basically trying to simulate a psychoacoustically wrong (stereo or multichannel) speaker reproduction with binauralization.
What alternative are you thinking of? The two-channel signal is mixed for stereo, which in turn needs a "room" to work properly.
 
How big is it?

396 kB

… IF there are enough room reflections present.
The two-channel signal is mixed for stereo, which in turn needs a "room" to work properly

Yes, you should add ambience, because stereo recordings are inherently dry. But the ambience has to come with all the right directions, timing and tonality, not just spill echo over the recording as most plugins do.
It does matter what kind of room you are in or simulate. Some rooms can sound in your head in real life if you close your eyes.
 
Curious to know what your takes are on whether one option is better than the other
Both options are flawed in my experience:

Binaural renderers that are based on a generic or somebody else's HRTF in combination with recorded reflections tend to come with
  • localization errors (front-back / up-down confusion),
  • errors in tonality / timbre,
  • errors in apparent source distance (too near),
  • a lack of immersion, and
  • deep nulls in the bass frequency range from room modes that tend to be asymmetrical between left and right channels.
Only the first problem can be solved with head tracking. Even if your HRTF accidentally fits the one used, it will most likely do so in one direction only.

Speakers in untreated rooms come with peaks and nulls from room modes in the bass frequency range (which DSP can solve to some extent), flutter echoes, and reflections that are typically too strong, too early and asymmetrical.
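The room-mode problem can be put in numbers: the axial modes of a rectangular room sit at f(n) = n·c / (2L) for each dimension. A quick sketch for an assumed 4.0 × 3.0 × 2.5 m bedroom (the dimensions are hypothetical, picked only to show where the modes land):

```python
SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def axial_modes(length_m, count=3):
    """First `count` axial mode frequencies for one room dimension,
    using f(n) = n * c / (2 * L)."""
    return [n * SPEED_OF_SOUND / (2.0 * length_m)
            for n in range(1, count + 1)]

# Assumed bedroom: 4.0 m long, 3.0 m wide, 2.5 m high.
modes = {dim: axial_modes(dim) for dim in (4.0, 3.0, 2.5)}
# Lowest modes land near 43, 57 and 69 Hz, i.e. squarely in the
# bass range this thread keeps coming back to.
```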

In your situation I would recommend binaural rendering with individualized binaural room impulse responses (BRIR), either from measurements with appropriate microphone capsules or from sufficiently accurate approximations. Naturally, for measurements you will need to have access to a treated room with decent speakers for calibration, otherwise you will end up with the same problems outlined above.

Philipp
 