
AES 2025 Paper: New targets for the B&K 5128 GRAS 45CA-10

Ignoring deviations from improper measurements like improper seating on the fixture etc., big measurable difference of the "same" HP on different fixtures would lead to the following conclusions:
-if not the same exact headphones were used in each of these tests, the unit variation between HPs of this model is fairly high *or*
-the HP in question shows a fairly large variation between individual listeners *and*
-the HP was developed on the fixture where the measurements correlate well to the expected target curve

How people can draw the conclusion "the 5128 is an improper measurement fixture that produces unreliable and inaccurate data", while variation between humans is far greater than variation between the two test fixtures discussed here plus B&K's incredible track record of producing high quality measurement equipment for decades is beyond me. Do you think Sean would put so much effort into research on this platform if he came to the same conclusion?

I can’t really speak to people’s motivations, but what I do find problematic is that 5128 measurement shows deviations that later disappear without any clear explanation - and the EQ ends up much closer to Oratory’s or GRAS measurement.

If the first measurement was the result of an improper fit or seating issue, what’s the value of publishing it at all? And if it was an early prototype or a pre-production unit, that should have been clearly stated. Likewise, if it’s simply unit variation, that’s an important detail for a review - it has direct implications for how consistent the product is. I'm sure @listener650 can correct me.

As a consumer, I look at Amir’s review, compare it with Oratory’s measurements, and watch Resolve’s or Listener’s impressions - more information is always welcome. But if the measurement standards and target curves differ substantially, I’ll rely on what works for me...in this case Amir/Oratory.
 
As a consumer, I look at Amir’s review, compare it with Oratory’s measurements, and watch Resolve’s or Listener’s impressions - more information is always welcome. But if the measurement standards and target curves differ substantially, I’ll rely on what works for me...in this case Amir/Oratory.
I agree this is usually a sensible approach - and when the GRAS/B&K measurements mostly align, it probably means a much higher likelihood that the given headphone will sound similar on your head. When the measurements diverge, though, it probably means there will be much higher variability on people's heads. I think what @MayaTlab and others have been saying is that designing a headphone which is more consistent on more heads should be the goal, not that a headphone matches a specific target on one measurement rig. As far as I can see, one of the many factors (but one that might be measurable in reviews) is the headphone's acoustic impedance.
 
I agree this is usually a sensible approach - and when the GRAS/B&K measurements mostly align, it probably means a much higher likelihood that the given headphone will sound similar on your head. When the measurements diverge, though, it probably means there will be much higher variability on people's heads. I think what @MayaTlab and others have been saying is that designing a headphone which is more consistent on more heads should be the goal, not that a headphone matches a specific target on one measurement rig. As far as I can see, one of the many factors (but one that might be measurable in reviews) is the headphone's acoustic impedance.

Headphone consistency should absolutely be the goal. Headphone FR measurements ideally shouldn't be just a pass/fail check on a single measurement rig with one specific target. Who wouldn’t agree with that?

But I understand Amir's position - what is the benefit of introducing another variable to headphone measurement? How does that move us closer to consistent headphone response across heads? Until there’s proper validation and correlation data, adding an unproven variable creates more confusion than clarity.

Personally, I couldn't care less about measurement rigs, but I just don't understand the hostility from people who do. At the end of the day, these are just tools worth studying and discussing - not something to turn into some sort of ideological battle.
 
Headphone consistency should absolutely be the goal. Headphone FR measurements ideally shouldn't be just a pass/fail check on a single measurement rig with one specific target. Who wouldn’t agree with that?
Absolutely, in an ideal world a well designed headphone should be consistent on all rigs (and that would give greater confidence that it is consistent on more human heads).
Until there’s proper validation and correlation data, adding an unproven variable creates more confusion than clarity.
Yes, but I think we are seeing in these recent research papers another dimension that is needed in headphone measurements, in a similar way to how off-axis measurements of speakers brought new insights into the actual sound of speakers in a room, as opposed to just the direct FR measurements.
 
Personally, I couldn't care less about measurements rigs, but I just don't understand the hostility from people that do. At the end of the day, these are just tools worth studying and discussing - not turning into some sort of ideological battle.
I had little involvement in that, as my basic questions were dismissed and even demonized by others, which I found confusing. Later, I learned that people involved in Amir's evaluation of the BK5128 failed to disclose potential conflicts of interest. It also appeared that these issues have been brushed aside as if nothing had happened. I'm not forcing narratives or making speculations, I've shared specifics in other threads and don't want to derail this one further. Ultimately, such narratives distract from more important matters, like:

Independently quantifying the predictability of listener preference that the rig offers. Is it comparable to or better than established GRAS 45CA and equivalent rigs? I prefer that all focus regarding BK5128 and other IEC 60318-7 rigs will center on this going forward. Prematurely dismissing GRAS 45CA and equivalent rigs isn't productive and goes against reasonable standards of evidence.
 
Independently quantifying the predictability of listener preference that the rig offers. Is it comparable to or better than established GRAS 45CA and equivalent rigs? I prefer that all focus regarding BK5128 and other IEC 60318-7 rigs will center on this going forward. Prematurely dismissing GRAS 45CA and equivalent rigs isn't productive and goes against reasonable standards of evidence.

I understand that the idea that measuring a pair of headphones on x fixture and deriving a form of predictive score vs a target using a set of factors, or even just eye-balling it, is very seductive, but if you critically engage with the Harman papers, you'll realise that this was actually never really tested except in one of the articles, despite what has been said about that research.

What Harman really tested is "if you apply a number of EQ profiles to a pair of HD800 / HD518 / modded K712 / Momentum IEM, which one is preferred ?".

Now these over-ears were specifically chosen by Harman because in their own tests they proved to be a more stable platform than some other headphones, e.g. they had a more consistent bass response across individuals. The HD800, in particular, was a very good choice (the HD518 and K712 maybe less so, but the latter was modded to increase clamp force, which perhaps helped). The Momentum in-ear was paired with a MEMS mic to check the seating in each individual's ears, a major improvement over a lot of previous studies that did not bother to check for seal when using IEMs as test devices.

If the headphones you measure on a GRAS fixture (ideally with the Welti pinna, but Amir's rig is mostly rather similar when looking at the big picture) also behave similarly nicely across individuals as the headphones Harman used, and also translate nicely from the rig to real humans, you can probably be a lot more confident in predicting whether or not a pair of headphones will be preferred over another one. But if the headphones under test are poor performers in that regard and their response varies significantly when the load varies, your capacity to predict preference can rather quickly go out the window. With the exception of one paper, these HPTF issues weren't involved in Harman's papers as far as their influence on preference goes, as the "real" headphones weren't even used during the listening tests.

The irony is that for all the talk about how inconsistent headphones are when measured on the 5128 fixture... well that just isn't the case for these rather stable headphones, in fact Amir had very few issues getting consistent traces for these when he tested the 5128. They also tend to produce a response that's not too far off the GRAS rigs anyway up to a few kHz (nothing unexpected here), and they also tend to have a more consistent transfer function between the 5128 and GRAS rigs, which makes a rough translation of the Harman target with that method rather easy anyway up to a few kHz.

The 5128 seems more eager to trigger leakage scenarios than the "flat plate around the pinna" rigs like the 45-CA, no surprise then that leak-intolerant headphones (most closed back HPs are in that basket) proved harder to measure in a consistent fashion. Personally I see this as a benefit as good headphones should be designed to handle worst case scenarios as well as possible anyway (since these leakage effects will be more or less prevalent on real humans, more for some, less for others), but I wish that more publications had continued to use their GRAS rigs alongside the 5128 for over-ears and systematically presented measurements done on both fixtures, it's one proxy way among others to start having some educated guess as to which headphones are more or less likely to avoid introducing undesirable in situ variation across individuals.

For IEMs the fact that the 711 coupler is very much, at best, an outlier, and quite far from the average human ear, starts becoming a pretty big problem when applying the knowledge we've gained from Harman's work to IEMs with a significantly different source impedance, ANC headphones being, in a way, the extreme example of that problem, as the IE target "bakes in" this offset from the average, while feedback systems will more or less nullify the offset. The issue here isn't even a question of preference, it's that the error curve against that target will simply be invalid and mis-represent the actual in situ experience for most individuals (so of course it won't help in making good predictions).

Again, I understand that it's seductive to think that tracing a single line against a target can mean something in terms of predicting users' preferences, but it's only going to have some measure of predictability if the in-situ response is predictable to begin with, and if it's predictable to begin with, then the 5128 fixture won't cause any major issue to get more or less consistent, repeatable traces anyway, and finding a half-decent translation of Harman's target for over-ears isn't a huge challenge either. So the problem of being able to predict preferences is a headphones problem rather than a fixture problem. Which is why it's very useful to get, via any means possible, even a rough idea of how a pair of headphones behaves when exposed to different loads, leakage scenarios, positions, etc., so that you can know how confident you can be in your predictions.

A fairly extreme example of the fact that the notion of predictability is a headphones problem first, a fixture second, is with advanced active IEMs like the APP2 / APP3 : they will deliver nearly the same SPL and FR curve regardless of whether they're measured in a 711 coupler or a 5128 up to 4-5kHz or so (provided they're primed properly and have the same source and device volume :D). For these headphones, the fixture doesn't even matter to derive predictions. (Well it isn't actually exactly true, there's the issue that the exact same SPL in the 1-5kHz range across individuals likely isn't desirable, something the APP3 (and Bose CustomTune IEMs) seem to try to tackle, but that's worthy of its own thread :D - these two companies have long gone past the idea of evaluating headphones in a single ear simulator against a single target and are so far ahead of the level of discussion we're having it's a bit disheartening).

designing a headphone which is more consistent on more heads should be the goal and not that a headphone matches a specific target on one measurement rig.

The DCA Stealth is such a good poster child for this. Measures exceptionally well on a GRAS fixture, has very high inter-individual variation across listeners and an average in situ response that's quite far off what's measured on said rigs. Makes predicting people's preferences for it by plotting its error curve against Harman a crapshoot.
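To make the distinction between "error against a target on one rig" and "variation across heads" concrete, here is a minimal sketch. Every number in it is made up for illustration: `TARGET` is not the Harman curve, `RIG` is not a real fixture measurement, and `LISTENERS` are hypothetical in-situ responses. It only shows how a headphone can score a small error on the rig while its per-listener spread is large.

```python
from statistics import mean, pstdev

# All values are hypothetical, in dB, at a few illustrative frequencies (Hz).
FREQS = [100, 1000, 3000, 8000]
TARGET = [4.0, 0.0, 8.0, 2.0]   # made-up target, NOT the Harman curve
RIG = [4.0, 0.5, 8.0, 1.5]      # made-up single-fixture measurement

# Made-up in-situ responses of the same headphone on four listeners.
LISTENERS = [
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 1.0, 10.0, -1.0],
    [-2.0, 0.5, 7.0, 3.0],
    [3.0, 0.5, 12.0, 0.0],
]


def error_curve(response, target):
    """Per-frequency deviation of a response from the target, in dB."""
    return [r - t for r, t in zip(response, target)]


def mean_abs_error(response, target):
    """Average absolute deviation from the target, in dB."""
    return mean(abs(e) for e in error_curve(response, target))


def average_response(responses):
    """Per-frequency mean across listeners."""
    return [mean(col) for col in zip(*responses)]


def spread_per_frequency(responses):
    """Per-frequency population standard deviation across listeners."""
    return [pstdev(col) for col in zip(*responses)]


rig_error = mean_abs_error(RIG, TARGET)
in_situ_error = mean_abs_error(average_response(LISTENERS), TARGET)
spread = spread_per_frequency(LISTENERS)

for f, s in zip(FREQS, spread):
    print(f"{f} Hz: spread across listeners = {s:.2f} dB")
print(f"rig error = {rig_error:.2f} dB, average in-situ error = {in_situ_error:.2f} dB")
```

With data like this, the single-rig error says little about what any individual hears; the per-frequency spread is the number that tells you how much to trust the prediction.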
 
The DCA Stealth is such a good poster child for this. Measures exceptionally well on a GRAS fixture, has very high inter-individual variation across listeners and an average in situ response that's quite far off what's measured on said rigs. Makes predicting people's preferences for it by plotting its error curve against Harman a crapshoot.
Bolded for emphasis/relevance. It seems Sean would agree with the broad idea that fixating on target adherence when headphones as a class of devices by-and-large don't achieve that target response consistently on humans is The Bigger Problem here (which I assume is why he posted the images included in this BlueSky post earlier in the thread here as well).
[Attached image: Screenshot 2025-10-20 at 3.59.35 PM.jpg]
 
it seems Sean would agree with the broad idea that fixating on target adherence when headphones as a class of devices by-and-large don't achieve that target response consistently on humans is The Bigger Problem here (which I assume is why he posted the images included in this BlueSky post earlier in the thread here as well).
The room dominates the low frequency response of any speaker. Is this justification to not ask for anechoic flat on-axis response in speakers?
 
It would save money on expensive couplers and theoretically it could generate a personalized frequency response for prospective buyers. The same way people get custom IEM impressions, maybe they could get their ears measured for acoustic impedance. Then a web tool like squig.link could have the added functionality of generating a personal response from uploaded measured ear canal impedance.
Looking at the significant variations in the same headphone's response depending on the coupler used, knowing your own acoustic impedance would help with purchasing decisions.
Maybe 4 simple couplers is overkill. Maybe an algorithm could do the math based on two or three?
I certainly don't know all there is to know on this topic, but I don't think acoustic impedance is that simple. If the headphone has a different angle due to angled pads it's gonna influence the results. I should know more about acoustic impedance, but I don't think you can take all factors into account. As far as I know about the only way you can really go to town with getting headphones perfect & mimicking speakers is with the Smyth Realiser system whereby you would optimally measure a perfect speaker setup in a room with in ear mics, then measure your headphone's frequency response with in ear mics, then the system will make sure your headphone is EQ'd to hear the same thing that it's receiving from the optimised in room speaker setup you've measured. I suppose that's only gonna be as good as your speaker system that you're measuring, but that's where you take the care. I've heard of people taking all the gear to a good studio and getting it all measured so they can replicate it using Smyth Realiser. I don't think all that's necessary to enjoy music through headphones, but really that's taking everything into account.
 
The room dominates the low frequency response of any speaker. Is this justification to not ask for anechoic flat on-axis response in speakers?
A speaker can have great axial FR at one point but if the directivity is poor, strange, or ill-fitted to the listening circumstance, the end result may sound meaningfully different (and potentially much worse) in practice. And by the same coin, a headphone can have perfectly Harman-like response on a 45CA, but if said headphone when loaded by an actual human ear measures dramatically unlike the original Harman curve did in people's ears, the end result may also sound meaningfully different (and potentially much worse) in practice.
 
A speaker can have great axial FR at one point but if the directivity is poor, strange, or ill-fitted to the listening circumstance, the end result may sound meaningfully different (and potentially much worse) in practice. And by the same coin, a headphone can have perfectly Harman-like response on a 45CA, but if said headphone when loaded by an actual human ear measures dramatically unlike the original Harman curve did in people's ears, the end result may also sound meaningfully different (and potentially much worse) in practice.
I don't see how that is the answer to my question. The room impacts the response of a speaker and that is different in every room. Same "room" exists for a headphone in the way it radiates into your ears. We don't give up on our targets for speakers because of differences in rooms people have. Why should the same not be true for headphones?

The solution for rooms is EQ and maybe acoustic treatment. The former applies just the same for headphones.

So no, this is not a reason to give up on a target response. We badly need a frequency response for headphones as we have for speakers. That it also serves for preference is a major bonus. We don't get to throw the baby out with the bath water because we can't solve every issue with such a standard. Because if we do, the result is a disaster:

[Attached image]


You really think because there are variations between humans when wearing headphones we discard that blue line and just run with such horrors?
 
Again, I understand that it's seductive to think that tracing a single line against a target can mean something in terms of predicting users' preferences, but it's only going to have some measure of predictability if the in-situ response is predictable to begin with, and if it's predictable to begin with, then the 5128 fixture won't cause any major issue to get more or less consistent, repeatable traces anyway, and finding a half-decent translation of Harman's target for over-ears isn't a huge challenge either.
I am up to some 150 headphones/IEMs tested. In the vast majority of cases, what I see on the GRAS 45CA either directly or closely correlates with my listening tests and EQ verification.

Dan Clark purchased his 5128 a while back and has been sharing his measurements with me on his new designs. I have yet to see any of the 5128 results speak the truth more than the GRAS 45CA. At best it shows differences that are a tie. I have noted this at times:
[Attached image]


Here are my EQ/listening test results:

"I had to dial down the bass filter at 111 Hz as the predicated deviation while nice on some tracks, took the impact away from bass heavy tracks. With the reduced amount, you have a tighter bass response while still having 80% of the impact. The sum total of the rest of the filters gave me the impression of more separation of instruments though the effect is very subtle.

Above was sighted. In ad-hoc blind testing, I guessed correctly only 1 out of 3 as to which was stock and which was the EQ! So the effect is quite small and subtle. In that sense, I don't think I can make a strong case that these deviations are real. At the same time, I can't say they are not either. To wit, I am listening with the EQ on."
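For readers weighing the "1 out of 3" blind result above, a small sketch of what pure guessing would produce may help. It assumes each trial is an independent two-way choice with a 50% chance of guessing right; the helper names are illustrative, not from any review tool.

```python
from math import comb


def prob_exactly(k, n, p=0.5):
    """Binomial probability of exactly k correct answers in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k, n, p=0.5):
    """Probability of k or fewer correct answers in n trials."""
    return sum(prob_exactly(i, n, p) for i in range(k + 1))


def prob_at_least(k, n, p=0.5):
    """Probability of k or more correct answers in n trials."""
    return sum(prob_exactly(i, n, p) for i in range(k, n + 1))


n_trials = 3
p_one_or_fewer = prob_at_most(1, n_trials)   # result as bad as 1/3, by guessing
p_all_correct = prob_at_least(3, n_trials)   # a perfect 3/3, by guessing

print(f"P(<=1 of {n_trials} correct by guessing) = {p_one_or_fewer:.3f}")
print(f"P({n_trials} of {n_trials} correct by guessing) = {p_all_correct:.3f}")
```

Under that assumption, even a perfect score in three trials happens by chance one time in eight, so three trials can neither confirm nor rule out an audible difference, which is consistent with the cautious wording of the quoted result.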


As you see above, it is really not hard to analyze differences between the two measurements with listening tests. Yet I see none of this from advocates of 5128.

Of note, Dan Clark was part of this study because he genuinely likes to get to the bottom of this. I don't see others lifting a finger to do the same.

So the problem of being able to predict preferences is a headphones problem rather than a fixture problem. Which is why it's very useful to get, via any means possible, even a rough idea of how a pair of headphones behaves when exposed to different loads, leakage scenarios, positions, etc., so that you can know how confident you can be in your predictions.
So you want reviewers and companies to go and buy a $35K fixture, with no reliable target, to serve this need??? We can't even get them to buy the $14K GRAS fixture to produce consistent results. Now we expect them to spend more and deal with the confusion that creates?

No. Anything you are asking about, you or proponents of your theory need to prove. It is not anybody else's job to jump in the pool.
 
How people can draw the conclusion "the 5128 is an improper measurement fixture that produces unreliable and inaccurate data", while variation between humans is far greater than variation between the two test fixtures discussed here plus B&K's incredible track record of producing high quality measurement equipment for decades is beyond me.
B&K has no history of generating preference targets. It is Harman that did this and they didn't use the 5128. Building a fixture and performing listening tests and creating targets are two different things.
Do you think Sean would put so much effort into research on this platform if he came to the same conclusion?
The B&K marketing story was appealing so they got an early unit. Per above, none of their research used it. So I don't know what it means when you say "so much effort." That effort was all concentrated on GRAS, not 5128.
 
I don't see how that is the answer to my question. The room impacts the response of a speaker and that is different in every room. Same "room" exists for a headphone in the way it radiates into your ears. We don't give up on our targets for speakers because of differences in rooms people have. Why should the same not be true for headphones?
There's a bit of a difference here. Speakers have the chance to be free from the influence of the room. We have anechoic chambers, which allow us to focus on the characteristics of the speakers themselves. However, headphones always work together with the ear canals. If the acoustic character of a room is not good, we can do acoustic treatments or change to another room, but people don't get the chance to replace their ear canals. The most important thing is that, aside from personal taste preferences, the target for speakers applies to everyone. Because everyone listens to speakers through their own ears, and at the same time, everyone listens to live music through the same pair of ears. Both the speakers and the real world sound sources are filtered the same. For speakers, the job is done as long as they can produce the same sound as the live performance. But the way headphones and real world sound sources are affected by the ears is completely different. The headphone targets for different individuals are all unique, and we can only try our best to cover the majority.
 
I don't see how that is the answer to my question. The room impacts the response of a speaker and that is different in every room. Same "room" exists for a headphone in the way it radiates into your ears. We don't give up on our targets for speakers because of differences in rooms people have. Why should the same not be true for headphones?

The solution for rooms is EQ and maybe acoustic treatment. The former applies just the same for headphones.

So no, this is not a reason to give up on a target response. We badly need a frequency response for headphones as we have for speakers. That it also serves for preference is a major bonus. We don't get to throw the baby out with the bath water because we can't solve every issue with such a standard. Because if we do, the result is a disaster:

[Attached image]


You really think because there are variations between humans when wearing headphones we discard that blue line and just run with such horrors?
No one is saying we have to give up on the Harman target or research, I’m not sure why that’s what it always has to come back to when we are all very clearly appreciators of the literature and its authors. We are not on opposite sides here.

All I’m saying is that because of the variation I mention and differences in preference that Harman makes explicit, the reality of “the target”—what it looks like in practice—actually looks like a family of curves across fixtures and heads that likely look similar, sound similar and are similarly good.

Being more interested in characterizing what that family of curves actually looks like is not “giving up” on the research that fundamentally informs the task to begin with. We cannot accurately replicate the test conditions that would allow proper use of the 2018 curve anyway (since again, none of us have the Welti ear), so to me that means there’s really no choice but to engage with the less “fixture-specific” conclusions of Harman’s work… of which there is still plenty to chew on (people prefer more bass/less treble than flat DF, the ideal sound of speakers in rooms is likely a good starting point for “good” in headphones, listener preference varies, etc.)

Acknowledging this is not “throwing out” Harman, and it’s not giving a pass to terrible headphones. It’s just acknowledging the truth that Harman relayed to us: over-extending the specifics of their work to conditions outside those that were tested (including using their target with a different ear or, as we see in this paper, with a different replicator headphone) may not result in the outcomes we’d hope.

It’s the best we have though, so naturally we’re all ignoring this advice and using Harman’s work in whatever ways we believe work best for us :p

Frankly I think it’s lovely that Sean encourages us all to keep using it to help as many people as possible, even if we’re technically all being a bit naughty by doing so in our different, but all still technically wrong ways.
 
All I’m saying is that because of the variation I mention and differences in preference that Harman makes explicit, the reality of “the target”—what it looks like in practice—actually looks like a family of curves across fixtures and heads that likely look similar, sound similar and are similarly good.
I have repeatedly addressed that. Let's say there is a +- 2 dB band around the target. Where is the research that says +2 dB in bass and -2 dB in treble sounds the same as the 0 dB target? How about if we inverse that offset? You think that 4 dB differential at both ends sounds similar???
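The arithmetic behind that objection can be sketched as follows. The ±2 dB band and the two-point "bass/treble" description are simplifications taken from the example in the post; the curve names are hypothetical.

```python
BAND_DB = 2.0  # half-width of the hypothetical tolerance band around the target

# Offsets from the target, in dB, at the two ends of the spectrum.
curve_a = {"bass": +BAND_DB, "treble": -BAND_DB}  # bass-heavy edge of the band
curve_b = {"bass": -BAND_DB, "treble": +BAND_DB}  # the inverse offset


def within_band(curve, band):
    """True if every offset stays inside +/- band dB of the target."""
    return all(abs(v) <= band for v in curve.values())


def tilt(curve):
    """Bass-minus-treble offset relative to the target, in dB."""
    return curve["bass"] - curve["treble"]


def difference(a, b):
    """Per-region difference between two curves, in dB."""
    return {k: a[k] - b[k] for k in a}


print("both inside band:", within_band(curve_a, BAND_DB) and within_band(curve_b, BAND_DB))
print("tilt of A vs target:", tilt(curve_a), "dB")
print("A minus B:", difference(curve_a, curve_b))
```

Both curves sit inside the band, yet they differ from each other by 4 dB at each end, so "inside the band" alone does not imply "sounds similar".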

If you mean that people should accept minor variations in the target, then that is how the graph needs to be read. The target is heavily smoothed so no actual measurement can hug it from one extreme to the other.

Acknowledging this is not “throwing out” Harman, and it’s not giving a pass to terrible headphones.
That is precisely what happens to any reasonable band around the target curve. You are rationalizing responses that have not been studied.

And how does that solve any problem? Let's say you did draw that band. Where is my curve in that band?

All of this forgets the strong case I have made for a single target: standard. We must have a standard to solve the larger problem of inconsistency among headphones.
 
When I started testing speakers, I thought we could rely 100% on measurements. In practice, this is not how it worked out. I found it necessary to listen to speakers, develop EQ and perform AB tests to assess the audibility of variations from the target.

I had to do the same thing for headphones. Here, the job is actually simpler because the room equation is not there.

The notion that drawing gray bars around target, getting more fixtures, etc. will get us to a better numerical solution is just a pipe dream. We have to accept the gift we have in front of us (the targets) which tells us which are the real dogs. Beyond that, we need to deploy other tools. With the help of measurements, we can easily perform device specific tests to get to a high confidence answer. I know of no other solution.
 
The room dominates the low frequency response of any speaker. Is this justification to not ask for anechoic flat on-axis response in speakers?
It certainly is not! The anechoic on-axis response of a speaker is informative.
Now what's the headphones' equivalent to a speaker's anechoic frequency response? For the speaker we removed the room, so for headphones we need to remove the listener's (or test fixture's) "head", as you have correctly identified previously:
The room impacts the response of a speaker and that is different in every room. Same "room" exists for a headphone in the way it radiates into your ears.
So what you suggest is a simple pressure field measurement of the headphones. No pinnae, no ear canal, no possible variation, just raw data.
It's an interesting proposal and I wouldn't be surprised if headphone manufacturers actually take measurements like this at least somewhere in their process (quite possibly for QC because that doesn't require an expensive HATS), but I'm not sure how informative that data would be for us consumers.
 
The room dominates the low frequency response of any speaker. Is this justification to not ask for anechoic flat on-axis response in speakers?
Now this is just a ridiculous analogy; you know it isn't in any way related to Listener's message and yet you still post it. Not trying to come across as rude... it just doesn't help fix your image here.
 
So what you suggest is a simple pressure field measurement of the headphones. No pinnae, no ear canal, no possible variation, just raw data.
It's an interesting proposal and I wouldn't be surprised if headphone manufacturers actually take measurements like this at least somewhere in their process (quite possibly for QC because that doesn't require an expensive HATS), but I'm not sure how informative that data would be for us consumers.
For production line QC, they use IEC711 fixtures to test IEMs and IEC318 fixtures to test over-ear headphones. These fixtures don't have to be B&K or GRAS models. Fixtures produced in China are commonly used to reduce costs. But both IEC711 and IEC318 are designed to simulate the human ear. You can't discuss the pressure field without considering the volume of the cavity. In other words, when the air pressure, temperature, and humidity are consistent, there are countless possibilities for the pressure field, but the free field is unique.
 