
AES 2025 Paper: New targets for the B&K 5128 and GRAS 45CA-10

I think you may have erred where I previously did in assuming the lack of correlation to be a bug instead of a feature. I was also deeply confused about 5128 measurements as recently as two months ago, but as I've learned more and spoken with people who know more than I do about headphone measurements, it's become apparent that even in the absence of a "Harman target" for the 5128, there are useful ways to evaluate the measured performance of headphones in a way that people can still understand, and the 5128 is a very powerful tool for characterizing the behavior of these devices.
There is nothing "powerful" about it unless it better predicts listener preference. Currently, it has no ability to do so. See this thread on Dr. Olive's past research on it: https://www.audiosciencereview.com/...ment-talks-from-head-fi-and-sean-olive.27017/

The B&K 5128 underreported bass frequencies by a substantial amount, and in a sloping way: the lower the frequency, the less bass energy it reported. This caused the preference score of a headphone to drop 45 points, from the high 80s down to the 40s! There were also some differences at high frequencies, but those were not as meaningful or emphasized in the presentation.

Even B&K itself doesn't sell it for this purpose. In my direct conversations with them, they said the fixture is useful for companies doing internal R&D, not as a tool for reviewers, where a target curve is necessary. I appreciated their honesty in this regard.
 
I'm not sure this is true, sir. You absolutely did introduce at least one random target at the beginning of your time with the 5128 before there was any scientific support for it (DF + 8dB tilt, if memory serves). I don't think this was unreasonable, but it did happen. Can you share the paper where the 10dB tilt was rated similarly to Harman?

Big fan of your content, by the way.

So again, it was by no means random. It was to tilt the baseline transfer function of the measurement rig by the same bass to treble delta found in the Harman research. That ends up being 8-12dB. If memory serves, Dr. Olive had noted that in-room flat was very similar to DF, and we had that for the system. If you strictly mean "didn't have the Harman listening tests done with it", then sure. But that's true of traditional 711 systems too. Everyone just used it because they assumed HpTF trends wouldn't vary as much as they did.

It's technically equivalent to what we're doing now, it's just that we're not baking in the tilt to the compensation. The concept was and still is to include both the ear transfer function and listener preference in the methodology.
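To make the "DF baseline + tilt" idea concrete, here is a minimal sketch. It assumes the tilt is a straight line on a log-frequency axis between 20 Hz and 20 kHz, which is an illustrative choice and not something stated in the posts above. The function names and the DF values are hypothetical placeholders, not measured data.

```python
import math

def tilt_db(freq_hz, total_tilt_db=-10.0, f_lo=20.0, f_hi=20000.0):
    """Gain in dB of a straight-line tilt on a log-frequency axis.

    The line is 0 dB at f_lo and total_tilt_db at f_hi, so a negative
    total means more bass than treble. The linear-in-log-frequency
    shape is an assumption made for this sketch.
    """
    position = math.log10(freq_hz / f_lo) / math.log10(f_hi / f_lo)
    return total_tilt_db * position

def tilted_target(freqs_hz, df_db, total_tilt_db=-10.0):
    """Add the tilt to a diffuse-field (DF) curve, point by point."""
    return [df + tilt_db(f, total_tilt_db) for f, df in zip(freqs_hz, df_db)]

# Placeholder DF curve (made-up numbers, NOT data from any real fixture)
freqs = [20.0, 200.0, 2000.0, 20000.0]
df = [0.0, 0.0, 8.0, 0.0]

target = tilted_target(freqs, df, total_tilt_db=-10.0)
for f, t in zip(freqs, target):
    print(f"{f:>8.0f} Hz  {t:+.2f} dB")
```

Whether the tilt is added to the target or subtracted from the compensated measurement is only a bookkeeping choice; the deviation of the headphone from the target comes out the same either way.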

Anyway here's the convention paper that tested DF + tilt: https://aes2.org/publications/elibrary-page/?id=22903

I don't exactly put much stock in it because it's a small sample size, and again the most relevant part of listener preference has been bass to treble delta, so it's not really surprising to me that it scored similarly. The spatial stuff is... not what I would have expected.
 
Even B&K itself doesn't sell it for this purpose. In my direct conversations with them, they said the fixture is useful for companies doing internal R&D, not as a tool for reviewers, where a target curve is necessary. I appreciated their honesty in this regard.
That detail certainly makes the push toward BK5128 adoption seem more aggressive than it needs to be. Before I knew this it merely raised questions about why those advocating for it are connected to audio tech retail with varying degrees of separation.
 
I do want to also note, at no point did we ever "jump and create a new random curve". Our whole thing was always to use the DF baseline calibration with the preference tilt from Harman. Coincidentally... in a recent convention paper, even our very early effort of DF + 10dB tilt was tested and rated similarly to Harman. Though that's not why we did it.
Dr. Olive's research last year shows that all of the targets based on 5128 underperformed the original research fixture:

[Attached image]


Before any adoption, such studies needed to be performed. I did my own ad-hoc analysis and found this underperformance with 5128. You did nothing but jump in with both feet because Jude had.

As to that paper, the target they used for "HARMAN" was not actually from Harman, as such a target did not exist yet (and there is still not an official one). They used what Mad_Economics had cooked up while I was evaluating the 5128. That you think you did close to it then means little. I was so worried about them running with that curve that I did not want my name associated with it in their paper!
 
Well, I always do that. But no, what I meant is that most of the curve has not changed so if I were to show a variation, it would only be where the deviation is noticeable which is above a few kHz.
What about having a number of new traces representing the kind of tone controls that @Sean Olive mentioned for bass and highs? Upper and lower bounds, plus a few in between and the original target. Would give a sense of where the headphone lands.
 
Dr. Olive's research last year shows that all of the targets based on 5128 underperformed the original research fixture:

[Attached image]


Before any adoption, such studies needed to be performed. I did my own ad-hoc analysis and found this underperformance with 5128. You did nothing but jump in with both feet because Jude has.

As to that paper, the target they used for "harman" was not actually from Harman, as such a target did not exist yet. They used what Mad_Economics had cooked up while I was evaluating the 5128. That you think you did close to it then means little.

This was actually due to a mislabeling issue, which Dr. Olive noted subsequently. So nothing we were doing in our content at the time actually got tested here. But more importantly, if this is the last thing you've seen from Dr. Olive, there's quite a bit more. There were a number of additional listening experiments done that actually showed alternative targets like the one from SoundGuys, which are nearly identical to the JM-1 baseline many IEM reviewers use, 'outperformed' Harman IE. I say that in scare quotes because, just like in the image quoted here, they were basically tied. Attempts like this to wield some sort of target authority are misguided at best and bad faith at worst.

More importantly, what people should take away from that listening experiment, and I specifically asked Sean about this at the presentation of that data, is that the same people can prefer very different targets similarly. And the findings of the very research this thread is about, on the effects of acoustic impedance, lend even more support to the need to move on from the single line target paradigm. Something tells me the folks conducting the research also agree.


As far as us buying a 5128 because Jude did... yeah, that's who we're trying to compete with. You realize we also purchased two official GRAS fixtures with KB5000 pinnae, are about to purchase a third, and also purchased wait for it... 40 clone couplers - to test to see how suitable they actually are for community use. I'd love to see your rationale as to how we did that to compete with Jude. This is a moronic conversation.
 
Buy 400 or 4000 clone couplers, but you can't expect people who are paying attention to ignore that you're overselling BK5128 adoption.

Well quite frankly I wouldn't want to use 711 for IEMs or other acoustic impedance sensitive devices, for precisely the reasons that have been discussed in this thread. For open back over-ear headphones I have no issue using 711 and do so regularly. And I'll be the first person to say that the specific B&K pinna effects are by no means a better match to population average ear effects than what you get from GRAS. We've done several videos about this.

No, it bears repeating, the 5128 and 4620 reveal perceptually relevant information about high acoustic z devices that 711 systems do not. That is why we use this. Not because it's a more accurate ear, not because of Jude or some other weirdo conspiracy theory, but because of the specific differences the research in this thread is about. And shoutouts to the folks behind this one for facilitating it.
 
As far as us buying a 5128 because Jude did... yeah, that's who we're trying to compete with. You realize we also purchased two official GRAS fixtures with KB5000 pinnae, are about to purchase a third, and also purchased wait for it... 40 clone couplers - to test to see how suitable they actually are for community use. I'd love to see your rationale as to how we did that to compete with Jude.
It is all along the same theme. That is, an arms race to nudge the other guy out of the running so as to say, "we are more right because we have all this gear." And as I mentioned, to throw more confusion about measurements out there so as to lessen their value.

Or maybe it is yet another attempt to try to justify the already incorrect decision of buying 5128.
This is a moronic conversation.
As I mentioned, the alternative is to think that you all have no logical sense to have bought a $30K fixture that has brought no more clarity to headphone measurements.

Or maybe clickbait has more value than I think. See this slide in Dr. Olive's presentation:
[Attached image: slide from Dr. Olive's presentation]


It is this kind of hubris that astonishes me. It is crap that you only see when youtubers think all of a sudden they are experts by reading papers and not once attempting to conduct that actual research to earn the credibility.
 
More importantly, what people should take away from that listening experiment, and I specifically asked Sean about this at the presentation of that data, is that the same people can prefer very different targets similarly.
There is the money line folks: that measurements have less value because two different measurements could mean the same thing.

So no, these are the actual conclusions from the paper:

9 Conclusions
The Harman Target curve is a preferred headphone frequency response based on an older GRAS 45CA RA045-S1 test fixture with a custom pinna that is not widely available. Applying this target to measurements from newer fixtures can cause errors due to differences in their acoustic transfer impedance. The purpose of this paper was to define Harman target equivalents for these new test fixtures that account for the acoustic transfer impedance differences and produce similar sound quality. A calibration method was described that produced equivalent Harman target curves for the GRAS 45CA-10 and B&K Type 5128 fixtures. The calibration method can be applied to other test fixtures with different ear simulators and pinna to produce an equivalent Harman target curve for that fixture.

The listening test results provide evidence the Harman equivalent targets for these fixtures produced near-identical sound to the Harman target defined on the original fixture. Increasing the average number of headphones used in the calibration process reduced similarity ratings. This indicates that a calibration tailored to each headphone works best. However, the equivalent Harman targets on the newer fixtures for each of the 7 headphones are very similar and could be easily matched with simple bass and treble shelf filters. Future work will aim to develop more generalized calibrations and targets.

----
Nowhere does it say the spin that you just put on it. At best, the 5128 can have a target that gets similar results. At worst, it requires calibration for each headphone being tested. There is no sign of superiority of this instrument.
 
What about having a number of new traces representing the kind of tone controls that @Sean Olive mentioned for bass and highs? Upper and lower bounds, plus a few in between and the original target. Would give a sense of where the headphone lands.
As I have explained in other threads like this, creating such bands leads to misleading results. Let's say a headphone has too much bass but still within such a range, and too little treble, but still in said range. No research backs this type of outcome being just as good as one without deviation from target. That headphone would sound boomy and dull.

A much better solution is to accept the variability as a fact and let your eyes judge its degree. If there is doubt, create a filter and test for audibility difference as I do in every review. Don't create limits for measurements unless there is data that backs all of the headphones in those ranges having equal preference.
 
It is all along the same theme. That is, an arms race to nudge the other guy out of the running so as to say, "we are more right because we have all this gear." And as I mentioned, to throw more confusion about measurements out there so as to lessen their value.

Or maybe it is yet another attempt to try to justify the already incorrect decision of buying 5128.

As I mentioned, the alternative is to think that you all have no logical sense to have bought a $30K fixture that has brought no more clarity to headphone measurements.

Or maybe clickbait has more value than I think. See this slide in Dr. Olive's presentation:
[Attached image: slide from Dr. Olive's presentation]

It is this kind of hubris that astonishes me. It is crap that you only see when youtubers think all of a sudden they are experts by reading papers and not once attempting to conduct that actual research to earn the credibility.

Oh no that's just you saying this. By all means I welcome you to continue using 711 and whatever interpretation of the Harman target you please. Truly I do not care if you do not wish to adopt a new standard.
 
there is data that backs all of the headphones in those ranges having equal preference.
From what I remember of Olive's paper, there was a cluster analysis showing that headphone response preferences demonstrated a few different groups. So illustrative low Q adjustments in line with those wouldn't be inappropriate.
 
Truly I do not care if you do not wish to adopt a new standard.
There is no standard. New or otherwise. Sometimes I think you all are on payroll of B&K with all your advertorials.
 
From what I remember of Olive's paper, there was a cluster analysis showing that headphone response preferences demonstrated a few different groups. So illustrative low Q adjustments in line with those wouldn't be inappropriate.
There is but there is no simultaneity. Some do want more or less bass/treble. My solution for this is to say that EQ is mandatory and listeners should go and adjust to taste. I get them close with what I test and the rest is up to them. Much like speakers.

Put another way, measurements let us discard the real dogs and we can readily do that with GRAS 45CA. As response gets closer to our target, then there is nothing one can do to validate a perfect conclusion. The system has too much variability to get to a pure answer.
 
Adding on, I face that very situation with Dan Clark headphones. There are small variations and all I could do is develop filters and perform AB tests. No change of fixture, modifying the target, etc. would solve what to do here:

[Attached image]
 
There is but there is no simultaneity. Some do want more or less bass/treble. My solution for this is to say that EQ is mandatory and listeners should go and adjust to taste. I get them close with what I test and the rest is up to them. Much like speakers.

Put another way, measurements let us discard the real dogs and we can readily do that with GRAS 45CA. As response gets closer to our target, then there is nothing one can do to validate a perfect conclusion. The system has too much variability to get to a pure answer.
I don't disagree. It's more a question of whether or not a slightly different presentation would benefit reviews and direct readers to ask more questions of their own experience.

In that vein, this target deviation plot could be quite useful for those additional traces. I could see readers not being clear on what kind of spectral tilts have been supported by research.

[Attached image: target deviation plot]
 
My question here is: for defining something like a Harman curve for the 5128, do you think it would make sense for the GRAS systems to have a "high impedance" and "low impedance" version of the target? Like if we take a "low variation" headphone like the HE1000 in the study and a "high variation" headphone like the DCAs or other closed back headphones, EQ them to the same target on the 5128, and then measure the headphones on the GRAS system, one would assume that the GRAS system would show similar behavior under 1 kHz for the "low variation" headphone but more energy in the "Definitely problematic" region. I think something like this could be one way to attack the problem of the GRAS systems' behavior in the "Definitely problematic" band, but I guess we would need to measure the output impedance of the headphone to confirm which target to use.

I hope I'm not misunderstanding you, but I am not certain a binary approach like this is ideal given that we're dealing with issues best expressed in degrees?

Ideally, at least at low frequencies, all publications would do what Rtings is doing (and severely punish headphones that can't deliver a constant bass response), but it's probably not feasible in practice for most. Or perhaps even better, we'd measure headphones on a cohort of dummy heads (for example, 12), that have been evaluated as a good, balanced representation of a larger population of real humans, and evaluate them against the ideal target for each of these dummy heads. But that's even less feasible and might for now be limited to the R&D labs of large companies :D.

Otherwise I'd prefer to see GRAS (or 5128 for that matter) measurements supplemented with additional measurements aiming at characterising the headphones' behaviour under various circumstances such as leakage (either a controlled "quantity" of leakage, or a consistent "physical" mod like glasses, or both - for example a pair of headphones might be quite susceptible to leakage effects, and yet have a very good physical / ear pad design that reduces the quantity of leakage present over most heads), pad compression, positional variation, etc. This can help to determine the parts of the spectrum that are consistent under these circumstances and the parts that aren't, and can inform the interpretation of the measurements so that it's not misleading and that the right conclusions are reached : some headphones might not hit the target well, but might turn out to be very stable and desirable platforms for EQ, while others might hit the target well on a given ear simulator, and yet turn out to be poorly engineered because they can't deliver a consistently desirable response (DCA Stealth for example).
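As a rough illustration of the "which parts of the spectrum are consistent" idea, the sketch below takes several measurements of one headphone under different conditions and reports the max-minus-min spread at each frequency. The function names, the 3 dB threshold, and all of the numbers are hypothetical and made up for the example; none of them come from a real measurement or from any publication's method.

```python
def spread_per_frequency(measurements_db):
    """Max-minus-min level (dB) at each frequency across all measurements.

    measurements_db is a list of curves, one per condition, all sampled
    at the same frequencies.
    """
    return [max(col) - min(col) for col in zip(*measurements_db)]

def unstable_frequencies(freqs_hz, measurements_db, threshold_db=3.0):
    """Frequencies whose spread exceeds threshold_db.

    The 3 dB default is an arbitrary illustrative threshold.
    """
    spread = spread_per_frequency(measurements_db)
    return [f for f, s in zip(freqs_hz, spread) if s > threshold_db]

# Made-up example curves, relative to the "good seal" measurement
freqs = [20.0, 100.0, 1000.0, 5000.0]
runs = [
    [0.0, 0.0, 0.0, 0.0],      # good seal
    [-8.0, -3.5, 0.2, 0.5],    # leakage, e.g. glasses
    [-2.0, -1.0, -0.1, 1.0],   # compressed pads
]

spread = spread_per_frequency(runs)
flagged = unstable_frequencies(freqs, runs)
print(spread)
print(flagged)
```

A headphone whose flagged list is empty under these conditions would be the "stable platform for EQ" case, even if its average response misses the target.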

Where I can see some value in using two targets for 711 test systems is for IEMs, for active vs passive IEMs, but even then it's fraught with difficulties (in particular since active systems operate over different ranges from one IEM to another and the transition region where the passive behaviour takes over can behave rather oddly in some cases). I've seen a lot of publications test IEMs like the AirPods Pro 2 in a 711 coupler against the Harman IE target (designed using a passive IEM) and this is just plain wrong and misleading.

But I rather think that the 711 standard just isn't well suited to test IEMs IMO given the increasing amount of concordant evidence we have on the fact that it doesn't represent well the behaviour of an average ear (the original impetus of B&K's research that led to the 5128). Harman recently published an article on AES regarding a methodology to estimate the response at the eardrum up to several kHz when using IEMs with an inward facing mic, and the results comparing real ears with 711 and 5128 are another (expected) piece of evidence to that effect: https://aes2.org/publications/elibrary-page/?id=22943

[Attached screenshot: figures from the article]

(To be clear, figures 9 and 10 are estimations, not direct in situ measurements, but they are concordant with what we already know, and if I understand the article well enough, the estimation method's error was tested against actual measurements in the two ear simulators.) Another point of interest to me here is that we can see one failing of the 5128 fixture when measuring IEMs: it often presents "wiggles" in the lower mids that seem far less prevalent in real ears, and I wish B&K fixed that. For now the only thing we can do is ignore them and "connect the dots" across these wiggles.

The whole thing kind of leaves me scratching my head about why they used the DCA headphones. Even in this study, they observed other headphones to have less variation between fixtures.

In fairness this study shows that one of DCA's headphones could be very consistent, but indeed the X / XO / Noire / Stealth have already been measured as being quite undesirably inconsistent across individuals (and fixtures) and should absolutely not be used in any capacity to translate Harman's work to the 5128 if that's the method one prefers to choose to design a target for the 5128.
 
I find these announcements of new research and subsequent discussion valuable.

I do wish for less hostility in the discussions all around though. Making and listening to music is clearly important to all of us here but not in a way that justifies some of the language used.

There are situations where looking deeply into the intentions behind people's words and actions, politics for example, can be the astute thing to do. I doubt that being so suspicious of @Resolve and the headphones.com guys is necessary or productive. I think starting off with the faith that people are broadly trying their hardest for a similar goal, even if methods vary, goes a long way and should require real evidence to allege the contrary.

Sorry I don't have much to say regarding the actual content of the paper.
 
I don't disagree. It's more a question of whether or not a slightly different presentation would benefit reviews and direct readers to ask more questions of their own experience.
Maybe it could help to take the raw measurements on the HAT (dBSPL) and subtract the HAT Diffuse Field so that we are left with a graph similar to how speakers are represented? We could then easily observe the linearity of the headphones, and it would be easier to argue over how much tilt the graph should have to be a good sounding headphone.
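The subtraction suggested here is simple to sketch. The example below assumes the raw measurement and the fixture's diffuse-field (DF) curve are sampled at the same frequencies, and it normalizes the result to 0 dB near 1 kHz purely for display. The function names and every number are hypothetical placeholders, not real fixture data.

```python
def compensate(raw_db_spl, df_db):
    """Subtract the fixture's diffuse-field curve from a raw measurement."""
    if len(raw_db_spl) != len(df_db):
        raise ValueError("curves must share the same frequency points")
    return [r - d for r, d in zip(raw_db_spl, df_db)]

def normalize_at(freqs_hz, curve_db, ref_hz=1000.0):
    """Shift the curve so the point nearest ref_hz sits at 0 dB.

    Normalizing at 1 kHz is a display convention assumed for this sketch.
    """
    idx = min(range(len(freqs_hz)), key=lambda i: abs(freqs_hz[i] - ref_hz))
    return [v - curve_db[idx] for v in curve_db]

# Made-up values for illustration only
freqs = [100.0, 1000.0, 3000.0, 10000.0]
raw = [94.0, 90.0, 100.0, 88.0]   # raw measurement, dB SPL
df = [0.0, 1.0, 11.0, 3.0]        # fixture DF response, dB

compensated = compensate(raw, df)
relative = normalize_at(freqs, compensated)
print(relative)
```

In this form a flat line would mean "matches DF", and any preferred bass-to-treble tilt shows up as a visible slope that can be argued about directly.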
 