
AES 2025 Paper: New targets for the B&K 5128 GRAS 45CA-10

But I rather think that the 711 standard just isn't well suited to test IEMs IMO given the increasing amount of concordant evidence we have on the fact that it doesn't represent well the behaviour of an average ear (the original impetus of B&K's research that led to the 5128).
This goes 100% against the incredible development we have had in low-cost IEMs following the Harman target. User satisfaction has been through the roof, unearthing levels of fidelity unheard of before. It demonstrates the efficacy of having a standard, and the validity of the Harman IEM response at a high level.

That aside, I don't know why averaging is promoted so much in these discussions. An average is a low pass filter of the dataset. By definition then, it reduces resolution. The reason it is often used is to make the data easier for humans to digest. But as targets, it just reduces specificity as we have seen in this paper.

You can average the entire human population and still be nowhere with respect to what a single person represents. If you have a problem with measurements, averaging is not going to solve it for you, lest you want to close your eyes on detail that may be important.

The above is why I don't average multiple seatings of the headphone on the fixture.
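The smoothing effect of averaging described above can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the five "seatings", the 6 dB Gaussian-shaped resonance, and the frequencies at which it lands are invented numbers, not measurements from any fixture.

```python
import math

def bump_db(freq_hz, center_hz, gain_db=6.0, width_oct=0.15):
    """Gaussian-shaped peak (in dB) on a log-frequency axis (illustrative shape only)."""
    octaves = math.log2(freq_hz / center_hz)
    return gain_db * math.exp(-0.5 * (octaves / width_oct) ** 2)

# Log-spaced grid from 1 kHz to 16 kHz, 48 points per octave.
freqs = [1000.0 * 2 ** (i / 48) for i in range(4 * 48 + 1)]

# Hypothetical seatings: the same 6 dB resonance lands at a different frequency each time.
centers_hz = [6000.0, 7000.0, 8000.0, 9000.0, 10000.0]
seatings = [[bump_db(f, c) for f in freqs] for c in centers_hz]

# Point-by-point average across seatings.
average = [sum(column) / len(column) for column in zip(*seatings)]

peak_single = max(seatings[0])
peak_average = max(average)
print(f"tallest peak in one seating: {peak_single:.2f} dB")
print(f"tallest peak in the average: {peak_average:.2f} dB")
```

In this toy case the averaged trace shows a lower, broader hump than any individual seating, which is the loss of detail the post is talking about. Whether that detail matters for a given headphone is a separate question that the sketch does not answer.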
 
This goes 100% against the incredible development we have had in low-cost IEMs following the Harman target. User satisfaction has been through the roof, unearthing levels of fidelity unheard of before. It demonstrates the efficacy of having a standard, and the validity of the Harman IEM response at a high level.

We already know - thanks to methods for estimating the response at DRP or even direct measurements - that we still have a long way to go in most cases to reach particularly excellent "levels of fidelity" (unless you think that an in-situ response that can uncontrollably vary by around 5-8 dB, even right down across the mids, is good enough).

Harman's work to determine an IE target for passive IEMs was done nearly a decade ago. It was a major improvement then over what we had, but a lot of knowledge has been gained in the meantime and things are moving fast in the R&D labs of companies like Bose or Apple (and hopefully Harman will get there too soon enough). I wish you remained curious about more recent research and did not get stuck in 2016.

That aside, I don't know why averaging is promoted so much in these discussions.

What do you think happens when this:

Screenshot 2025-10-18 at 09.32.00.png


Meets an active system, either like classic feedback systems for ANC headphones, or more advanced systems like what we have in Bose or Apple IEMs? And what does it mean about the validity of a target designed for passive IEMs on a fixture that's a poor representation of the average human for these headphones?

(Now, the same could be said even for passive IEMs whose source impedance differs vastly from that of the IEM Harman used to design the IE target.)
 
O.K.

  1. I wish for less unpleasantness, as it is never helpful in any discussion.
  2. Humans differ, perception differs, preference differs, viewpoints differ, test methods differ, results differ, interpretation differs.
  3. It is very clear... there is a definite delta between test fixtures with different headphones and for that reason there cannot be a single 'conversion curve'.
    The paper is clear about this, and all of the experiences from 'measurebators' are clear about it.
  4. There is a difference between EDRP measurements and every test fixture. They differ in different ways. None of them, nor ears, is perfect or a true standard.
    One picks one (or more of them) based on ... well, preference for fixture(s) based on specific reasons. That may differ for different 'measurebators'.
    There is no best or perfect choice. All choices are trade-offs. Every 'measurebator' defends their choices in fixture and target(s).
  5. I like 'measurebators' for what they are doing and their approaches, regardless of how they differ. Humans differ, there is room for different opinions.
    Amir, Resolve, Mad_economist, Dr. Olive and Oratory are all highly rated in my opinion and all try to do their 'measurebating' in somewhat different ways, and I firmly believe that all are doing what they do to better understand the correlation between headphone measurements and perceived sound (there is some, but no single correct correlation).
    Thanks to all you guys for your continued efforts.
  6. Accept differences in approach, vision, interpretation of data and acquisition of it and keep this 'heated' topic a bit cooler.
  7. There is no single 'best' target nor measurement method/fixture. One can elevate a method to a standard, but that simply means there will be several standards.
    At best there might be a specific test fixture/target that is the 'best possible fit for a certain average' but there won't be a target/measurement that will be fine for all people.
    The reason being ... perception/preference and variability of the human auditory 'sensor'.
Accept each other's flaws and methods and please ... discuss politely, regardless of how much the 'differences' rub against one's personal core principles.
(B.t.w. this last bit does not apply to Dr. Olive as he has been nothing but polite and respectful IMHO).

I simply hate to see people going for each other's throat simply because they have different opinions about 'science'.

Standards are standards and can differ.
Preference can differ and this is fine, in the end tune to your liking. Great if it coincides with some standard that seems to work for you.
Perception can differ too.
Opinions can differ but in the end are just opinions, regardless how valid they may seem to the holder of that opinion.

/rant
 
Meets an active system, either like classic feedback systems for ANC headphones, or more advanced systems like what we have in Bose or Apple IEMs? And what does it mean about the validity of a target designed for passive IEMs on a fixture that's a poor representation of the average human for these headphones?
It all means one thing: there will never be an exact solution to the problem. Ever. Averaging certainly doesn't get you there.

You need to become comfortable with ambiguity and deal with that differently than chasing a dream measurement/target which doesn't exist. For me, that is to always follow measurements with filtering and listening tests. It is only then that I can convince myself if the variability I see is real or not. I perform these tests blind if I need to.

Staying with measurements alone won't work. Nor will some random subjective review (with or without peeking at measurements).
 
Or perhaps even better, we'd measure headphones on a cohort of dummy heads (for example, 12), that have been evaluated as a good, balanced representation of a larger population of real humans, and evaluate them against the ideal target for each of these dummy heads
Again, another excellent post. I would love to see this - I think this type of representation would be amazing and much more accurate than one or other fixture being used. Of course, to address @amim's concerns, we would need 12 properly done research studies, one for each of these 12 heads, to correctly derive a target for each.

But something like this (maybe it starts with just 3 different heads), combined with the head / ear scanning that Apple (and others) are doing to align an individual HRTF nearer to one of these heads, and finally combined with some of the acoustic impedance metrics about a given headphone (like a confidence factor about the headphone's consistency on a given fixture), would be the ideal. I feel reviews with this level of detail about a headphone might then give the same level of confidence that we get with a Klippel speaker review from Amir or Erin.
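To make the cohort idea a bit more concrete, here is a minimal sketch of how per-head results might be summarised. All names and numbers are hypothetical (three made-up heads, four bands, invented dB values and targets). It simply assumes each head has its own target, and it uses the spread across heads as a crude stand-in for the "confidence factor" mentioned above.

```python
from statistics import mean, pstdev

bands_hz = [100, 1000, 3000, 8000]

# Hypothetical data (dB, normalised at 1 kHz): one headphone on three dummy heads.
measured_db = {
    "head_A": [4.0, 0.0, 9.0, 2.0],
    "head_B": [3.5, 0.0, 7.5, 5.0],
    "head_C": [4.5, 0.0, 10.0, -1.0],
}
# Hypothetical per-head targets at the same bands.
target_db = {
    "head_A": [5.0, 0.0, 10.0, 3.0],
    "head_B": [5.0, 0.0, 9.0, 4.0],
    "head_C": [5.0, 0.0, 11.0, 2.0],
}

def deviations(measured, target):
    """Deviation from target, computed separately for each head."""
    return {
        head: [m - t for m, t in zip(measured[head], target[head])]
        for head in measured
    }

dev = deviations(measured_db, target_db)
per_band = list(zip(*dev.values()))          # regroup: one tuple of head values per band
mean_dev = [mean(band) for band in per_band]  # average deviation across the cohort
spread = [pstdev(band) for band in per_band]  # disagreement between heads

for f, m, s in zip(bands_hz, mean_dev, spread):
    print(f"{f:>5} Hz: mean deviation {m:+.2f} dB, spread across heads {s:.2f} dB")
```

A band with a small mean deviation but a large spread would be one where a single-fixture review says little about what a given listener gets, which is the kind of caveat such a presentation could surface.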
 

This would be like buying 12+ 5128 fixtures - hence not really feasible - and would require a sh*tload of data that companies like Bose, Apple, Harman, etc. would in all likelihood prefer to keep for themselves :D.

But some publications already try to present additional data such as positional variation, try to test for leakage, etc. (unheardlab, for example). This can help to characterise a pair of headphones' behaviour and inform how you should interpret the data you obtain (and your confidence in that interpretation). For example, for IEMs, I like to measure them with custom DIY canal extensions to keep everything ceteris paribus except the length of the canal and its volume (and, by extension, the impedance presented to the IEM, albeit with a similar impedance curve, without modifying the side volumes). For active IEMs, for example, it's very useful for determining the range over which active systems will try to keep the SPL constant regardless of the load impedance, and it helps me understand how to interpret measurements (preferably done by others, not on my own clone coupler :D).

Quarks raw diff.jpg
A40 diff.jpg
A40 ANC on raw diff.jpg
Screenshot 2025-09-21 at 08.30.39.png
 
From 1 to 2 kHz, it is either the same or very small difference. So I don't think we need to mess with that.
I disagree, because it's from 1-3kHz and it's a fairly significant change. If we look at the Susvara, for instance, you can see that the EQ filter used to correct that area is pretty significant: Peak Filter 2249Hz, -1.8dB, Q1.492. That's an audible difference, especially as it's a wide filter and a nearly 2dB change. OK, some portion of that is eaten up by the positive filter right next to it (when looking at the Total EQ Curve), but it's still a significant change. But of course it's up to you how you approach & assess the impact of this research on what you do here.

1760779680922.png
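For a feel of how wide the quoted filter (2249Hz, -1.8dB, Q 1.492) reaches, here is a small sketch that evaluates a single peaking filter on its own. It assumes the filter follows the widely used RBJ "Audio EQ Cookbook" peaking biquad and a 48 kHz sample rate; the actual EQ software may define Q differently, and the neighbouring positive filter mentioned above is not modelled.

```python
import cmath
import math

def peaking_gain_db(freq_hz, f0_hz, gain_db, q, fs_hz=48000.0):
    """Magnitude (dB) of an RBJ-cookbook peaking biquad at freq_hz (assumed filter form)."""
    a = 10 ** (gain_db / 40.0)
    w0 = 2 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    # Numerator and denominator coefficients of the biquad.
    b0, b1, b2 = 1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a
    a0, a1, a2 = 1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a
    # Evaluate H(z) on the unit circle, z^-1 = exp(-j*w).
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / fs_hz)
    num = b0 + b1 * z1 + b2 * z1 * z1
    den = a0 + a1 * z1 + a2 * z1 * z1
    return 20 * math.log10(abs(num / den))

# Filter parameters as quoted in the post.
F0_HZ, GAIN_DB, Q = 2249.0, -1.8, 1.492

for f in (500, 1000, 1500, 2249, 3000, 4000, 6000):
    print(f"{f:>5} Hz: {peaking_gain_db(f, F0_HZ, GAIN_DB, Q):+.2f} dB")
```

Printing the gain at a handful of frequencies shows how much of the 1-3kHz region the cut touches. The combined effect with the adjacent boost would need both filters summed in dB.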
 
I don't disagree. It's more a question of whether or not a slightly different presentation would benefit reviews and direct readers to ask more questions of their own experience.

In that vein, this target deviation plot could be quite useful for those additional traces. I could see readers not being clear on what kind of spectral tilts have been supported by research.

View attachment 483958
Personally I don't like seeing this graph as a main or only means of understanding the frequency response. I find it far more useful and intuitive to see the entire raw frequency response shown along with the target curve.
 
Well, I certainly agree that Dr Olive is so chilled, he doesn't get riled! But from an intuition and personal experience perspective I'm sure that some targets are better than others at pleasing me & likely better at pleasing larger swathes of the population - so I don't think all targets (& fixtures) are created equal in their ability to satisfy the largest proportion of people. The ideal scenario is proper studies being done in the creation of the targets & choice of the fixture. We have that for the Harman Headphone Curve, and now some additional info on where the frequency response can differ when translating back to the original fixture that was used by Harman - so to me it looks like we've got two bits of solid work that relate to the application of Harman on the GRAS KB5000. To me it's just a question of how best we apply those two bits of information together in assessing headphone measurements going forward on the GRAS device that Amir uses (& Oratory and others use it too, of course; Resolve at headphones.com has access to that GRAS as well).
 
from an intuition and personal experience perspective I'm sure that some targets are better than others at pleasing myself & likely better at pleasing larger swathes of the population - so I don't think all targets (& fixtures) are created equal in their ability to satisfy the largest proportion of people.
Exactly my point.
And as Dr. Olive says, the Harman target suits the majority of people; it even comes with a percentage (60% or thereabouts), and even within that 60% there is an 'acceptance band'.
This is with the fixture with the home-brew pinna and not the KB5000 (though they seem close).

That leaves 40% of the people included in the testing who prefer a different tonality for whatever reason.

The ideal scenario is proper studies being done in the creation of the targets & choice of the fixture, and we have that for the Harman Headphone Curve and now some additional info with regard to where the frequency response can differ when translating back to the original fixture that was used by Harman - so to me it looks like we've got two bits of solid work that relate to application of Harman on GRAS KB5000.
Yep, solid work was done, which has clearly shown that not all people prefer it but that it suits the preference of the majority (about 60%) of the people who were included in the testing.

To me it's just a question of how best we apply those two bits of information together in assessing headphone measurements going forward on the GRAS device that Amir uses (& Oratory uses & others use it too of course including Resolve at headphones.com has access to that GRAS too).
For Amir this is clear.
I don't see any objections to exploring other fixtures like the more 'human-like' 5128 and how that one translates to perception.
This requires creating a target, which, as the article Amir mentioned shows, is not possible without accepting a rather large tolerance.

Different fixtures = different measurements with the same headphone.
Different targets = looking for another (better ?) match with how headphones are perceived.
Consider placement on the head, seal issues, pad wear, product variances/tolerances, (silent) updates, reaction to different source impedance.

Then consider that the Harman research included lots of different people, a number of whom (as I recall) were not particularly headphone enthusiasts, that a limited number of actual headphones was used, and that the simulated headphones may not have been 'accurate' enough.

I can see some headphone enthusiasts with (expensive) HATS or hammers might be looking for a grail that is a bit holier to them.
All fun and exciting, but I can see objections being made.

My point is that all of this can be discussed in a friendly and open way.
 
Agree with the first half of your message, for the rest I'm ambivalent.
 
5128 and 4620 reveal perceptually relevant information about high acoustic z devices that 711 systems do not. That is why we use this.
This has been your guys' claim from the start, before independent validation existed for it, which presents a credibility issue on your part. Further evidence is the fact that your guy rage-banned me from all your Discord servers early on in your BK5128 adoption when I pointed out a statistical delta maximum at 14 kHz, discovered independently by csglinux, which I was right to point out to your colleague @_listener_, who was judging BK5128 graphs as if they were 711 graphs. This historical revisionism is getting tiresome.
 
@Robbo99999 I think the point is the resolution of the targets is lower than the difference between them. You can't manufacture something to a higher accuracy than that of the standard it is based on, and if you try you are just guessing, and therefore not increasing the accuracy.
 
This is not at all the topic of the thread, but a friend of mine let me know that you've posted this and said I should probably defend myself. I kind of agree that it's worth commenting, because I don't want people to think my reasoning for your removal was quite so base. It had everything to do with how you were arguing your take, including strawmanning me with weird race/ethnicity-based comments that showed, at the time at least, you weren't interested in discussing the issue in good faith.
Image attached below, and people can go check out the discussion in Crinacle's server that led to this as well if they'd like.

Screenshot 2025-10-18 at 11.49.54 AM.jpg


However, this was now 2 years ago at this point, and I've learned a lot about Markanini in the time since; after engaging with him and reading his contributions in other spaces, it became very clear to me that he was absolutely capable of reasonable discussion about these things, and that we actually had a lot more in common than we had different (after all, we are both deeply interested in the science surrounding the measurement of headphones and IEMs).

So after learning more about him, I saw fit to unban him from Crinacle's server, and apologized + acknowledged that I could've handled our previous discussions better. Happy to post my apology as well if people think this is (for some reason) the appropriate place to air stuff like this out—I do worry this is already monumentally off topic. I do want to patch things up with Markanini though, because over time we have both actually arrived to very similar viewpoints, and I would be lying if I said he hasn't said interesting things that made me think critically about my views... and acrimony in the space isn't really something I want to leave festering if I can change it.

Screenshot 2025-10-18 at 11.57.58 AM.jpg

Unfortunately I think Markanini misrepresenting this dialogue on ASR a while back as "the tyrannical B&K mafia silencing an opposing viewpoint" did lead to my original account here being banned, but I hope as someone who is deeply interested in the science surrounding audio metrology (that I would like to contribute to myself one day) that hasn't quite done what I've been accused of, I can continue to remain here (probably with a "Reviewer" tag so people understand my positioning with proper context) for when I want to discuss interesting contributions to the space, like Sean and Dan's new paper, or Sean and Etienne's new paper, or Sean and Floyd's upcoming 4th Edition of Sound Reproduction.

Bringing it back to the topic of the thread, I do think what's interesting about this paper is the question that gets raised about how these targets are defined. Specifically that any of the targets we've used up to this point have been, for all intents and purposes, a smoothed headphone frequency response after EQ has been applied. Now that we know the "headphone frequency response" part of that equation is more squishy, and prone to change based on the input load of the test fixture, I think it does behoove us to think a bit more about what kind of headphone ought to be used as the "blank slate" upon which we define the preference changes. @Sean Olive seems to think this is worth mulling over as well, or at least he'd indicated interest in doing so in our prior discussions.

He also had a BlueSky post that may interest people reading the paper/following this thread. Highly recommend following him on BlueSky by the way, he posts a ton of little nuggets of wisdom and it's great to hit him up and ask questions/learn more about his work by discussing things with him directly. OK, sorry for the tangent
 
@Robbo99999 I think the point is the resolution of the targets is lower than the difference between them. You can't manufacture something to a higher accuracy than that of the standard it is based on, and if you try you are just guessing, and therefore not increasing the accuracy.
How do you mean, which targets? If you're talking about the graphs of the 7 headphones that Amir put up, then the differences between the way the headphones measured on the original GRAS Welti vs GRAS KB5000 are not imagined: the differences are there to see, and the trends of the differences are similar, with a depression between 1-3kHz and an excess between 3-8kHz. If the resolution wasn't great enough then they wouldn't be able to show those differences. Yes, the amount of differentiation in those two areas is different for different headphones, but the trend is there nevertheless. It's different enough that you could hear the difference in an EQ.
 
Yes, the GRAS Welti and KB5000. I think Amir did say earlier that he was considering a boost at 3-8kHz. For the other region I interpreted his description of "either the same or very small difference" to mean within the limits of measurement error. I personally agree and would not worry.

N.B. Can anyone explain what the "_7" and "_20" mean in the graphs, and whether the GRAS system Sai uses at unheardlab matches any of those tested?
 
This all sounds a bit squishy to me.

I do not see any claims at statistical equivalency among the test devices.

Where is the audio science?
 
It was everything to do with how you were arguing your take, including strawmanning me with weird racial/ethnicity based comments that showed, at the time at least,
There is nothing remotely racist in stating the fact that a Danish population was used to develop the 5128. They should have spot-checked the rest of the world, as a minimum, to quantify differences.
 