
Resolve's B&K 5128 Headphone Target - you can try the EQ's.....

We have already had discussions with Sean in which he accepts that the similar work done for speakers needs additional work, and hence accepts my reason for not using it. I think you are buying yourself a world of hurt by promoting single-metric preference ratings for headphones. This is visible even in Sean's paper:

[Attachment: scatter plot of predicted vs. actual preference ratings from Sean's paper]

Predicted score of 60 refers to actual ratings from 25 up to 70+! Sean also correctly states that trained listeners produce far different scores than normal listeners:
It fits quite well in my analysis (blue)
 

[Attachment: Screenshot_20230504_221021_Chrome.jpg]
It fits quite well in my analysis (blue)
There are around 8 significant outliers from that trend, I'd say, by eyeballing it and counting them. So 8 outliers out of 32 headphones, or 25% of them not really following the trend. Personally I'm not a fan of the Predicted Score; I don't find it that useful for assessing headphone performance, and I'd rather just interpret the raw frequency response measurement against the Harman Target. So use of the Predicted Score in reviews should be taken with a grain of salt. (Although I appreciate that Harman had to come up with a numbered rating system in order to show correlation in the data, and in that respect it's useful, it certainly has limitations for detailed headphone reviews and appraisal.)
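For reference, here is my rough understanding of how that single number is computed in the published 2018 over-ear model (a sketch only; the exact band and slope-fitting procedure may differ from the paper's, and the array names are mine):

```python
# Rough sketch of the 2018 over-ear Predicted Score: a linear function of
# the standard deviation and slope of the error curve, i.e. the measured
# response minus the Harman target.
import numpy as np

def predicted_preference(freq_hz, measured_db, target_db):
    """Score from the error curve, here over a 50 Hz - 10 kHz band."""
    band = (freq_hz >= 50) & (freq_hz <= 10_000)
    error = measured_db[band] - target_db[band]
    sd = np.std(error, ddof=1)                                # spread of error (dB)
    slope, _ = np.polyfit(np.log10(freq_hz[band]), error, 1)  # tilt of error
    return 114.49 - 12.62 * sd - 15.52 * abs(slope)
```

Two error curves with very different audible problems can land on the same spread-and-tilt numbers, which is exactly why I prefer reading the raw response.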
 
The second thing I'll say here is that industry buy-in from Dr. Olive and Harman may be a requirement for many who are familiar with their research, but there is additional research on this topic outside of what Harman has done.
Careful about that "research." That is what Mad_ told me, so I let him create a target for the 5128 based on one of these papers, only to realize the results were not making sense once I started testing headphones. This is an awfully tricky area to get one's arms around. It took significant effort across many years and countless research projects for Dr. Olive and crew to come up with reasonable target curves. One-shot research may not apply, I am afraid. Don't be tempted to make my mistake.

But as it relates to buy-in, I was speaking with Sean about all of this, and he did indicate they want to do research with the 5128 as well, since that is set to become the new measurement standard. I think we can all look forward to that.
If so, the prudent thing would have been to wait until Sean completed that work instead of cooking up your own based on one or two papers. Indeed, that is where I went after our own made-up target failed. Alas, it became clear that Harman/Sean had their own business areas of research to pursue, and no solution for the 5128 would be forthcoming. That is when I sent the fixture back and told B&K that they need to produce such a target (their answer was that this was outside the scope of what they were doing). You had a perfectly good fixture to use for measurements, so there was no reason to jump on the 5128 bandwagon.
 
Yeah, we're currently beta testing a bunch of slopes, so the focus has been on the 5128. The thing is, we've measured Grados in the past on the 43AG already. For new products (or at least new to our lab) they get both.
Ok then. Waiting for new reviews containing both sets of measurements.
As much as I disagree with you on many aspects of the headphones that you discuss in the subjective part of the videos, I highly value your measurements and the fact that you at least try to find correlation between measurements and subjective impressions.
99% of YouTubers don't do that, and 90% of the time I find their reviews to be abysmal.
 
Don't be tempted to make my mistake.
In all fairness... several years have passed, and one can only guess how much geeky science on this subject Mad_ has since added to his arsenal.

If so, the prudent thing would have been to wait until Sean completed that work instead of cooking up your own based on one or two papers.
Unless one had the cash to spare and was eager to find out whether the (arguably more 'accurate') acoustic impedance could lead to more accurate results, rather than waiting until Harman, which already has a well-researched target/fixture, finally becomes interested in repeating its extensive tests, or perhaps applies its existing knowledge, sees how the 5128 responds with real headphones, and only then starts to create a target.
It seems to me that Harman, or some university wanting to research the crap out of the 5128 combined with perception, are not the only ones who may want to spend time on it. Time spent and knowledge may well be the key to reaching clear and valid research.

It appears from those guys' replies that they are merely working on it now and then out of personal interest. Still... I don't see why they should not be looking at it while they can actually compare it to their own GRAS in all kinds of ways and be open about their experiments.
Sure, Resolve (consumer/reviewer viewpoint) will look at headphones and fixtures differently than Mad (science geek, maybe blinded by science) will. That, to me, makes it interesting to see where they end up. After that I will make up my mind on whether or not they succeeded.

I can understand why you don't see it that way and don't want to waste time on this.
 
One issue in waiting for Harman is whether their legal department will let them release any such research. See how they censored the target curve for 5128 in Sean's presentation:

[Image: slide from Sean's presentation with the 5128 target curve blacked out]


Also note that even with the adjusted target, one does not get the same preference score with the two fixtures (84 vs 77). One can only imagine how impossible it would be, then, to create a preference score using Sean's linear regression but with a DIY target! We went through the same thing with speakers, where my Klippel NFS measurements show more variation than Harman's anechoic ones and so result in different scores.
 
There are around 8 significant outliers from that trend, I'd say, by eyeballing it and counting them. So 8 outliers out of 32 headphones, or 25% of them not really following the trend. Personally I'm not a fan of the Predicted Score; I don't find it that useful for assessing headphone performance, and I'd rather just interpret the raw frequency response measurement against the Harman Target. So use of the Predicted Score in reviews should be taken with a grain of salt. (Although I appreciate that Harman had to come up with a numbered rating system in order to show correlation in the data, and in that respect it's useful, it certainly has limitations for detailed headphone reviews and appraisal.)
It's not even a trend line, is it? The dotted line is just x = y, showing where the predicted score would equal the actual score if a data point fell on it. @Chocomel is misinterpreting it by just drawing a circle around the line; it doesn't really mean anything (unless it was meant in jest).

I'm following both sides with interest, but @amirm's interpretation of the graph, 'Predicted score of 60 refers to actual ratings from 25 up to 70+', is at least reading it correctly.
 
It's not even a trend line, is it? The dotted line is just x = y, showing where the predicted score would equal the actual score if a data point fell on it. @Chocomel is misinterpreting it by just drawing a circle around the line; it doesn't really mean anything (unless it was meant in jest).

I'm following both sides with interest, but @amirm's interpretation of the graph, 'Predicted score of 60 refers to actual ratings from 25 up to 70+', is at least reading it correctly.

I think it could be; the actual linear trend line aligns well:

[Image: predicted vs. actual scores with a fitted linear trend line]


But we're not testing the correlation between two measured independent variables; we're testing the correlation between a measured variable and a mathematical model purposefully designed to predict it, so it's to be expected that it should at least provide a sensible trend line :D! That such a model has a decent correlation with the actual scores, and trends sensibly, is the least we should expect from it (duh!).

Because of its predictive purpose, wouldn't that also be why the presence of more or less significant outliers is an issue? If one wants to apply that model as a way to grade a specific pair of headphones, or to rank headphones, and the number of significant outliers is too large, it will create a rating system that is at best useless and at worst misleading. How do you know that the specific model of headphones you're currently reviewing based on that system isn't going to be a more or less significant outlier? You've just slapped a "52" score on a pair of headphones; what if, when virtualised and subjected to listening tests, it would actually score a "70"?

An example, as a ranking system, showing how the predictive system results in quite significant shifts in ranking order versus the actual scores with this data set:

[Image: table comparing predicted vs. actual ranking order for the data set]


All of that does not look like something that is effectively doing a lot of predicting, in my view?

And I know that by this point I'm just a parrot but we're not even considering HPTF issues.
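For what it's worth, here is a quick sketch that puts numbers on both complaints using made-up data (not the actual Harman dataset): the vertical distance of each point from the y = x identity line, and how much the predicted ranking shuffles relative to the actual one:

```python
# Made-up predicted/actual scores for 32 headphones: count the big misses
# from the identity line and compare value vs. rank agreement.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
actual = rng.uniform(25, 90, 32)             # 32 hypothetical listener scores
predicted = actual + rng.normal(0, 8, 32)    # a model that errs by ~8 points

residuals = predicted - actual               # vertical distance from y = x
outliers = np.sum(np.abs(residuals) > 10)    # count the big misses
r = stats.pearsonr(predicted, actual)[0]     # agreement of the values
rho = stats.spearmanr(predicted, actual)[0]  # agreement of the rankings

print(f"{outliers}/32 miss by >10 points, r = {r:.2f}, rank rho = {rho:.2f}")
```

A model can post a respectable correlation coefficient while still shuffling the rank order enough to mislead a buyer choosing between mid-pack headphones, which is the scenario I care about.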
 
It's not even a trend line, is it? The dotted line is just x = y, showing where the predicted score would equal the actual score if a data point fell on it. @Chocomel is misinterpreting it by just drawing a circle around the line; it doesn't really mean anything (unless it was meant in jest).

I'm following both sides with interest, but @amirm's interpretation of the graph, 'Predicted score of 60 refers to actual ratings from 25 up to 70+', is at least reading it correctly.
It was just a joke, ya :). I would never actually cherry-pick data like that.
 
Also note that even with the adjusted target, one does not get the same preference score with the two fixtures (84 vs 77).
Yep, that is the consequence of fixtures not reacting the same way to a headphone.

One question could be which of the two is more in line with reports from the public, rather than with a predictive model. Of course... Harman research showed a relation, but a different fixture will, in the best case, just shuffle the number ranking and maybe also the 'tilt' number. The question is whether that will match reports from 'the public' better or worse (taken over a very large number of 'layman' reports). We can only say something once the 'target' has been determined and more than enough 'reports' have been analyzed.
One fixture will certainly be closer to the old standard, while the 5128 might be closer to a future standard.
Speculation or hunches are one thing; data is another. Data is not yet available, as the target is not defined yet; there are only some preliminary ones that will very likely differ from the one Mad_ came up with in the days when a 5128 came to your house for a visit and a date with the headphones you had at hand.

Also, according to Dr. Olive, preference ratings should differ by at least 7 points to become meaningful (the scale's resolution is finer than what listeners can reliably distinguish).
77 and 84 happen to have exactly a 7-point difference, so would that be on the edge of being appreciable?

Of course there will be people buying a headphone that is rated, say, 74 but has awful ergonomics over a very comfortable headphone rated 73 (which may even have a different slope), and still believe they have 'the better' headphone.
This is the biggest gripe I have with ratings: they reflect only some aspects of a measurement on a specific fixture, taken over a rather limited frequency band.

Indicative? Sure: a really poor headphone will very likely be at the bottom of the list (and some folks might still like it), and the 'better' ones will be higher up in the ranking.

Like you, I have stayed out of the conversations, both here, at headphones.com, and at other site(s) where nerds converge on this subject.
My excuse is that I know too little about the actual science, which differs from your standpoint. I see no harm in them trying. It does not invalidate the present science; it could be an addition, or it could even show that it does not improve anything and just shuffles ratings a bit and results in different 'computer-generated EQ'.
 
One issue in waiting for Harman is whether their legal department will let them release any such research. See how they censored the target curve for 5128 in Sean's presentation:

[Image: slide from Sean's presentation with the 5128 target curve blacked out]


Also note that even with the adjusted target, one does not get the same preference score with the two fixtures (84 vs 77). One can only imagine how impossible it would be, then, to create a preference score using Sean's linear regression but with a DIY target! We went through the same thing with speakers, where my Klippel NFS measurements show more variation than Harman's anechoic ones and so result in different scores.
Yeah, doesn't seem like a good omen at all for Harman sharing any future research! :(
 
Headphones.com's measurements are indeed their IP - so too are my measurements (that is, those done on my personal gear), but I'm quite unconcerned with people using mine. This is pretty standard - e.g. @oratory1990's measurements are the IP of his firm.
It's fair for the originator of the data to decide. @oratory1990 has allowed his data to be collected in a few third-party projects. With headphones.com data, it was met with a takedown request. So the way to access it is scattered across reviews on different platforms, forums, and Discord. It would help if the 43AG measurements were published.
 
It's fair for the originator of the data to decide. @oratory1990 has allowed his data to be collected in a few third-party projects. With headphones.com data, it was met with a takedown request. So the way to access it is scattered across reviews on different platforms, forums, and Discord. It would help if the 43AG measurements were published.
Yep, best places for that are the Headphones.com forum and Discord (perhaps obviously, given it’s their platform)
 
Yep, best places for that are the Headphones.com forum and Discord (perhaps obviously, given it's their platform)

Yeah, we don't care that our data gets shared around in various places; in fact, that's a good thing for everyone. The issue is that it was publicly represented in someone's database without our permission, and we're perfectly justified in requesting it be removed.


Now... I don't think it was done with any malicious intent, and I expect the person didn't actually know this kind of thing is protected under copyright (as are data scrapings in general... turns out). But we're also building something to better represent our data right now and would prefer it not exist in other public databases that aren't ours.
 
It's fair for the originator of the data to decide. @oratory1990 has allowed his data to be collected in a few third-party projects. With headphones.com data, it was met with a takedown request.
Insofar as I'm aware, none of Konstantin's measurements have been uploaded to squiglink - if this is wrong, please let me know so I can let him know.

So the way to access it is scattered across reviews on different platforms, forums, and Discord. It would help if the 43AG measurements were published.
I think you'll be quite pleased with something we've got in the works at the moment for displaying our and other folks' data! That's about all I can say for now, but keep your eyes peeled.
 
Careful about that "research." That is what Mad_ told me, so I let him create a target for the 5128 based on one of these papers, only to realize the results were not making sense once I started testing headphones.
I'm not terribly keen to jump to the defense of a curve I threw together from a paper in a moment... but what the heck, let's look at some comparisons.

This is the HD650 you measured in 2020 on the 5128, fed into the 2018 model but with my "synthetic" 5128 Harman target as the target response:
[Image: HD650 measurement on the 5128 scored against the synthetic 5128 target]


Here, by comparison, is your measurement on the 45CA against the 2015 target (which was the basis for the target I provided you in 2020):

[Image: HD650 measurement on the 45CA scored against the 2015 target]

And here is Keith Howard's measurement of the HD650, which found a spread in PPR encompassing both results (likely, based on the differing ear gain in his measurements, due to either very different placement or a severe defect in one channel).

If it would be of interest to people, I can do the same for the HE400i, Ether CX, and so on - coupling variation will of course be a factor (and, again, should people like, we can replicate this sort of thing with our 43AG and 5128 - indeed, this is part of my intended internal testing of our targets).
It took significant effort across many years and countless research projects for Dr. Olive and crew to come up with reasonable target curves. One-shot research may not apply, I am afraid. Don't be tempted to make my mistake.
This really isn't true, though - the "RR1_G" target was produced in 2013, in one of the first few papers, and was strongly preferred! While the Harman work refined subsequent targets, its primary notable feature was controlling for a ludicrous number of variables across subsequent studies (which was a very good thing to do, and we all owe Sean a debt of gratitude for his efforts).

Also note that even with the adjusted target, one does not get the same preference score with the two fixtures (84 vs 77)
This adjusted target is simply based on an average difference for headphones in the dataset, so tautologically, unless they all varied the same way, scores were going to vary substantially - in this case, the headphone in the plot had smaller variations than several of the outliers left in the dataset, otherwise it would have come closer. Ultimately, using headphones on different rigs cannot reveal a transfer function between them (because no such thing exists), but that also isn't what we really should be looking for to begin with...
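To make that construction concrete, here is a minimal sketch of such an average-difference target, assuming the same N headphones measured on both rigs and interpolated to a common frequency grid (all names are placeholders, not our actual pipeline):

```python
# Average-difference target transfer: average the per-headphone deltas
# between the two rigs and shift the existing target by that mean curve.
import numpy as np

def shifted_target(target_rig_a_db, meas_rig_a_db, meas_rig_b_db):
    """target: (F,) array in dB; meas_*: (N, F) arrays of the same N headphones."""
    mean_delta = np.mean(meas_rig_b_db - meas_rig_a_db, axis=0)
    return target_rig_a_db + mean_delta   # candidate target for rig B
```

Because every headphone couples to each rig differently, a single mean delta necessarily under- or over-corrects individual models - which is why the same headphone can still land on different scores on the two fixtures.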

All of that does not look like something that is effectively doing a lot of predicting, in my view?
From my POV this is pretty harsh - all predictive models are flawed, and there are always outliers, but the PPR model is pretty good! It gets the "shape" of things generally right, with good headphones trading places somewhat, and a couple of headphones rated as being much worse than they were heard to be - odds are you could avoid those "false negatives" with some refinements to the algorithm (and, indeed, the slightly more complex in-ear model has fewer of them).

You're very unlikely to build a perceptual model that's perfect in predicting human responses to complex stimuli, IMO - but one which can look at FR and spit out a generally not-too-far-off ranking of headphones is quite an innovation! An imperfect but still significantly better than guessing (and, if you want my two bits, probably better than "dead reckoning" based on plots) model is a useful starting point for future development!
 
The biggest question is (and will remain for a while) which of the two is closer to reality.
The differences are all above a few kHz, where neither plot may look anything like what the wearer of the HD650 in question actually experiences.
Based on my crappy $1 FP mic measurements, my bet (closest to reality) would be on the 5128 in the comparison you just made above 8 kHz, but at 5 kHz my measurement agrees more with the GRAS.
The HD650 is one of the easiest headphones to measure consistently IME.

The generated number tells me nothing, really, other than that this headphone could well sound good to me; it won't tell me whether I might prefer another headphone with a similar number (not looking at slope).
 
From my POV this is pretty harsh - all predictive models are flawed, and there are always outliers, but the PPR model is pretty good!
So we thought with the similar metric for speakers, except that correlation was much better than for headphones. Reality then set in when we started to test, listen to, and EQ countless speakers. The conclusion is now clear that a single number cannot possibly predict the infinite variations we see in speaker responses, nor does it take into account the psychoacoustics of our hearing system. As I quoted earlier, Dr. Toole now agrees with us that this score is not something you want to chase. Do otherwise at your peril, especially when you are going to develop your own target, which itself can have significant errors compared to what was used to develop this simple linear model.
 
Let me add a key point: the listener scores are themselves variable. What is used on the vertical axis is the 95th percentile value. This is NOT how we normally find trends, where we start with measurements, not subjective assessments:
[Image: plot of listener preference ratings used for the vertical axis]


Further complicating matters is that the preference score is not from a real headphone but from a surrogate one EQed to the response of the test headphone. The real headphone will perform differently, so that is another source of variability.

Given all of this, even if we had 100% accuracy in our predicted score, we would still be facing errors due to listener data themselves not being precise!
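To illustrate the point with simulated numbers (not the actual Harman listener data), notice how much the choice of summary statistic alone moves the "actual" score the model is asked to hit:

```python
# Thirty made-up listener ratings for one hypothetical headphone: the
# "actual" score depends on which summary of the distribution you pick.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(60, 12, 30)   # simulated ratings, mean 60, sd 12

print(f"mean:            {np.mean(scores):.1f}")
print(f"median:          {np.median(scores):.1f}")
print(f"95th percentile: {np.percentile(scores, 95):.1f}")
```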
 
Based on my crappy $1 FP mic measurements, my bet (closest to reality) would be on the 5128 in the comparison you just made above 8 kHz, but at 5 kHz my measurement agrees more with the GRAS.
After testing some 300+ headphones and speakers, I find the story is told by the time you get to 5 or 6 kHz. I have applied EQ corrections above that, but the effect is incredibly small. There is just not a ton of content above those frequencies, and what is there is more a matter of taste. Get the response correct to 5 or 6 kHz and you are golden. The worst offenses happen from 500 Hz to 3 kHz, really.
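One way to sanity-check this on any measurement (my illustrative framing, not a formal metric) is to compare the RMS deviation from target in the 500 Hz to 3 kHz band against the band above 6 kHz:

```python
# Band-limited RMS error against a target curve (illustrative sketch).
import numpy as np

def band_rms_error(freq_hz, measured_db, target_db, lo_hz, hi_hz):
    band = (freq_hz >= lo_hz) & (freq_hz <= hi_hz)
    err = measured_db[band] - target_db[band]
    return np.sqrt(np.mean(err ** 2))

# e.g. band_rms_error(f, m, t, 500, 3_000)     # where the worst offenses live
#      band_rms_error(f, m, t, 6_000, 20_000)  # mostly taste up here
```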
 