ASR Headphone Testing and BK 5128 Hats Measurement System

DDF · Aug 14, 2020

bobbooo · Aug 14, 2020

MZKM said:
at least 5 measurements of slightly different positions (including tilting the headband) should be done

Oratory measures every headphone 6-10 times with positional variation (and multiple headphone units of a model if he has access to them). Combined with the highly realistic anthropometric KB5000 pinna he uses (on a GRAS 43AG), his measurements are likely the most generalizable of those currently publicly available, and the closest to the average response heard from any given purchased unit.

JohnYang1997 · Aug 14, 2020

bobbooo said:
Oratory measures every headphone 6-10 times with positional variation. Combined with the highly realistic anthropometric KB5000 pinna he uses (on a GRAS 43AG), of the ones currently publicly available his measurements are likely to be the most generalizable and closest to the average response heard.

What's needed is typical not averaged.

Robbo99999 · Aug 14, 2020

JohnYang1997 said:
What's needed is typical not averaged.

median vs average? I'm not entirely sure how you'd apply that in this case.

JohnYang1997 · Aug 14, 2020

Robbo99999 said:
median vs average?

Roughly.

Jimbob54 · Aug 14, 2020

JohnYang1997 said:
Roughly.

Mode

bobbooo · Aug 14, 2020

JohnYang1997 said:
What's needed is typical not averaged.

I never said which kind of average.

Maybe he does calculate the median instead of mean response, I wouldn't put it past him.

bobbooo · Aug 14, 2020

Jimbob54 said:
Mode

The mode is primarily for discrete variables - measured SPL at a given frequency is a continuous variable, so there likely won't be any common values to take as the mode.
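To make the mean/median/mode distinction concrete, here is a minimal sketch of how repeated reseats of one headphone could be summarized per frequency bin. The data is made up for illustration, the function name is hypothetical, and averaging dB values directly (rather than averaging power) is an assumption; this is not a description of how Oratory or anyone else actually processes their measurements.

```python
import numpy as np

def summarize_reseats(spl_db):
    """Summarize repeated measurements of one headphone.

    spl_db: array-like of shape (n_reseats, n_freq_bins), SPL in dB.
    Returns the per-bin mean, median and min-to-max spread.
    """
    spl_db = np.asarray(spl_db, dtype=float)
    return {
        "mean": spl_db.mean(axis=0),
        "median": np.median(spl_db, axis=0),
        "spread": spl_db.max(axis=0) - spl_db.min(axis=0),
    }

# Hypothetical data: 5 reseats x 3 frequency bins.
# The last reseat has a leaky seal that drops the first (bass) bin by ~10 dB.
reseats = np.array([
    [90.0, 85.0, 80.0],
    [90.5, 85.2, 79.0],
    [89.5, 84.8, 81.0],
    [90.2, 85.1, 80.5],
    [80.0, 84.9, 79.5],
])

summary = summarize_reseats(reseats)

# Every measured value in the first bin is distinct, so no value repeats
# and a mode is not meaningful for this kind of continuous data.
distinct_values_bin0 = len(np.unique(reseats[:, 0]))

print("mean  :", summary["mean"])
print("median:", summary["median"])
```

In this toy data the single leaky reseat pulls the mean of the first bin down by about 2 dB, while the median stays with the four well-sealed reseats, which is roughly the "typical, not averaged" distinction being argued above.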

Mad_Economist · Aug 14, 2020

Robbo99999 said:
That's some interesting data re the measured variance of human subject HRTF (Fig 5). I find the standard deviation graph to be one of the most useful there, as it shows the bracket within which 68% of the population would fall. Up to 3 kHz it shows that 68% of humans would fall within only +/-1 dB of the Target Response (mean), so you can fairly say a HATS is valid in terms of HRTF up to 3 kHz. From 4-7 kHz you've got about +/-2.5 dB variation, and above 7 kHz about +/-5 dB. Taking all that into consideration, I would think measurements are meaningless above 7 kHz unless individuals' responses above 7 kHz sit a predictable, consistent "shelf" number of dB above or below the mean, in which case a person could experiment with shelf EQs above and below the calculated EQ (e.g. Oratory1990's)?

My previous paragraph just addresses HRTF calculated from "speakers in a room", but then you've got the HpTF you mentioned on top of that as an additional variable and source of variance within a population. This is what you were showing in Fig 6, I think. The variance of HpTF in those graphs seems to be about the same level as that of HRTF, and follows the same pattern of greater variance at higher frequencies.

So we have two sources of variance: HRTF (calculated from "speakers in a room"), which is used to create any Target Frequency Response we use for headphone EQ, and HpTF, which shows the variance of a given headphone's Frequency Response between individuals. Where does that leave us in terms of how accurate EQs based on Dummy Head measurements can be? I suppose combining the two variances of HRTF and HpTF further magnifies the overall error. You could work out mathematically how much the variance increases when you combine them, given that the variance of each variable seems about equal - would it double the overall variance, or is it something like a 1.5 factor? EDIT: although in a previous post you said you've found HpTF and HRTF are often linked, which would indicate that the combined variance is not as large as one would initially think. What kind of increased variance factor do you think it would be?

Going back to my observation in the first paragraph that most variance is above 7 kHz: can we use shelf EQs above 7 kHz to adjust EQs that were created on dummy heads, experimenting ourselves to find what sounds most accurate? Or does an individual's ear frequency response above 7 kHz not follow the general trends at all, so that the "shelf EQ technique" above 7 kHz holds no water?

Again, a lot of the validity of headphone measurements, along with the validity of Target Responses, will come down to how far an individual's physical anatomy varies from the dummy standard as a whole, although it seems pretty darn reliable and indisputable up to 2 kHz.

As for the validity of the B&K 5128 and of the ASR headphone project: I think we have to accept that there are all these variances within the population that we've talked about here, and if we're going to do it, it would have to be something above and beyond what is offered on other sites. If the B&K 5128 itself is not inherently more "accurate" than the equipment used by other sites, then that's not a differentiation point either. A cursory evaluation suggests the B&K 5128 gives very similar results to other sites for the HD650 (I've not looked at this in detail, just gone off other members' good posting in this thread). So any differentiation from other sites will come down to what we do with the measurements: interpretations of comparative quality between different headphones (covering things like distortion and other measured variables, including frequency response), and perhaps a headphone EQ service offering filters for people. I think we've got to do something different if the B&K 5128 itself isn't proving to be "next gen" or "anything special".

Some good questions here.

In broad terms, I think the premise that high-frequency equalization should mostly take the form of shelf or very low Q peak filters (broad adjustments in level, essentially) is quite advisable. I personally take this to a relative extreme; I don't even aim to notch the peak from my HD800, because I've found that doing so without putting a hole in the surrounding treble somewhere is troublesome. For most people's practical purposes, so long as it doesn't sound bad, I'd say that relatively fine/high Q equalization is reasonable up to 8-10 kHz, although it depends on a number of variables (particularly how the headphone's response varies with position on the wearer's head).
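For anyone who wants to experiment with the kind of broad treble adjustment described here, the following is a minimal sketch of a digital high-shelf filter using the widely circulated Audio EQ Cookbook (Robert Bristow-Johnson) biquad formulas. The 48 kHz sample rate, 7 kHz corner and -3 dB gain are illustrative assumptions taken loosely from the figures discussed in this thread, not a recommended setting, and the function names are hypothetical.

```python
import cmath
import math

def high_shelf_coeffs(fs, f0, gain_db, slope=1.0):
    """Return normalized biquad coefficients (b, a) for a high shelf.

    fs: sample rate in Hz, f0: shelf midpoint frequency in Hz,
    gain_db: shelf gain in dB (negative to cut), slope: shelf slope (1.0 = steepest
    without overshoot in the cookbook formulation).
    """
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2 * math.sqrt((amp + 1 / amp) * (1 / slope - 1) + 2)
    k = 2 * math.sqrt(amp) * alpha

    b0 = amp * ((amp + 1) + (amp - 1) * cos_w0 + k)
    b1 = -2 * amp * ((amp - 1) + (amp + 1) * cos_w0)
    b2 = amp * ((amp + 1) + (amp - 1) * cos_w0 - k)
    a0 = (amp + 1) - (amp - 1) * cos_w0 + k
    a1 = 2 * ((amp - 1) - (amp + 1) * cos_w0)
    a2 = (amp + 1) - (amp - 1) * cos_w0 - k

    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def magnitude_db(b, a, fs, freq):
    """Magnitude response of the biquad at one frequency, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))

# Illustrative setting: a gentle 3 dB treble cut centred on 7 kHz.
FS = 48000
b, a = high_shelf_coeffs(fs=FS, f0=7000, gain_db=-3.0)

for f in (100, 1000, 7000, 15000):
    print(f, "Hz:", round(magnitude_db(b, a, FS, f), 2), "dB")
```

Stepping the gain up and down by a dB or two and listening is the sort of personal experiment the shelf approach allows, without introducing the narrow holes that a high Q notch can leave.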

Regarding the intersection of HRTF and HpTF, let's consider two extremes and how they would impact things:

First, let's imagine a world where headphones are "HRTF chameleons" - by some process, whether acoustic or computer controlled, they perfectly approximate the individual wearer's HRTF in the target sound field. In this world, you could see extremely wide variation in HRTFs and HpTFs, but have zero variation in subjective timbre of headphones, because headphone subjective frequency response is equal to HpTF minus HRTF. This world is almost entirely reconcilable with Hammershøi & Møller's data.

Second, let's imagine a world where headphones are "HRTF blind" - perhaps in this world a trend of very deeply inserted in-ear monitors dominates, but for whatever reason, the HpTF is absolutely constant between wearers, even as individual HRTF varies. Subjective frequency response would track roughly according to the scenario you're outlining in this post - up to around the peak of the ear resonance there'd be agreement, and then it would all go south. A headphone that sounds peaky to one person would be smooth to another, and we'd have a great deal of trouble comparing our subjective impressions of headphones at all in the higher frequencies.

Now, in reality, we don't live in either of those worlds - some of the aspects of individual anatomy that influence HRTF also influence HpTF, so they aren't uncorrelated, but they also don't coincide perfectly. There's also the "x factor" of whether headphones have individualized and atypical interactions with some anatomy that doesn't relate in any way to HRTF - unarguably this happens at low frequencies with headphones with high acoustic impedance when a leak is present in the pad volume, but it could also happen at higher frequencies, and this would introduce another source of response variation.
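The earlier question about how the two variances combine can be sketched numerically. Taking the statement above that subjective response is HpTF minus HRTF (in dB), the variance of a difference of two variables is the sum of their variances minus twice their covariance. The 5 dB standard deviations below are just the rough "above 7 kHz" figure read off the graphs earlier in the thread, and the correlation values are purely illustrative assumptions, since the real correlation is not given anywhere here.

```python
import math

def subjective_sd(sd_hptf, sd_hrtf, rho):
    """Standard deviation (dB) of HpTF - HRTF across listeners.

    sd_hptf, sd_hrtf: standard deviations in dB at some frequency.
    rho: correlation between HpTF and HRTF across listeners (-1..1).
    Uses Var(X - Y) = Var(X) + Var(Y) - 2 * rho * sd(X) * sd(Y).
    """
    var = sd_hptf ** 2 + sd_hrtf ** 2 - 2 * rho * sd_hptf * sd_hrtf
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors

# "HRTF chameleon" world: HpTF tracks HRTF perfectly.
chameleon = subjective_sd(5.0, 5.0, rho=1.0)

# "HRTF blind" world: HpTF does not vary between wearers at all.
blind = subjective_sd(0.0, 5.0, rho=0.0)

# Equal spreads, completely uncorrelated: variance doubles, SD grows by sqrt(2).
independent = subjective_sd(5.0, 5.0, rho=0.0)

# Equal spreads, partially correlated (rho = 0.5 is an arbitrary example).
partial = subjective_sd(5.0, 5.0, rho=0.5)

print(chameleon, blind, round(independent, 2), partial)
```

So with equal spreads and no correlation the variance doubles (the standard deviation rises by a factor of about 1.41), and any positive correlation between HpTF and HRTF pulls the combined spread back down, reaching zero in the chameleon case.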

Pragmatically, I don't think we can use measurements of headphones on population-average measurement fixtures to make some of the projections we can make about other types of equipment, e.g. "DUT A will sound the same as DUT B". Non-individualized equalization is inevitably going to be an area where caution is wise; erring towards broader filters and leaving the high Q features mostly alone is likely to yield better results on average. Equally, though, headphones are such radically, audibly different devices that we don't need the degree of consistency between test and in situ that we have with amplifier or DAC measurements to make very reasonable extrapolations about what will sound better.

This all said, I still don't entirely grasp what Amir's primary goal with this headphone testing project is; if it's to redefine headphone metrology and take it out of the dark ages...well, we weren't in the dark ages to begin with, so that'd be pretty hard.

If it's to present an additional impartial source of reliable headphone measurements alongside what presently exists (Oratory, Resolve/Headphones.com, Clarityfidelity/Speakerphone, Keith Howard/HeadphoneTestLab, Brent Butterworth/Soundstage Solo, etc), then we've already got validation of concept. If it's to improve on the current state of the art in headphone metrology that's very conceivable - the 5128 can reasonably claim to be the most accurate way to measure headphones that presently exists, and there's still plenty of room for innovation in methodology if Amir is interested in that.

Mad_Economist · Aug 14, 2020

DDF said:

Griesinger's approach is interesting but flawed - see again re: subjective loudness matching, it just does not yield the frequency response adjustments that will make for subjectively neutral timbre in headphones, and using a free field-ish reference just compounds that issue.

Prutser said:
Maybe Tyll Hertsens is looking for something to do again; maybe he could team up with ASR.

Let poor Tyll enjoy his retirement! He worked hard for us for years, he deserves some time partying in a van in the wilderness.

DDF · Aug 15, 2020

Mad_Economist said:
Griesinger's approach is interesting but flawed - see again re: subjective loudness matching, it just does not yield the frequency response adjustments that will make for subjectively neutral timbre in headphones, and using a free field-ish reference just compounds that issue.

It won't give a free field calibration because the room response in the modal frequencies will tend to get baked in, for good (e.g. the Harman boost < 100 Hz) or bad (nulls/peaks).

Short of measuring your own HRTF, I think it would be more neutral than an off the shelf response meant for some global typical. But I can be convinced otherwise and would listen to a solid technical counterargument.

Mad_Economist · Aug 15, 2020

DDF said:
It won't give a free field calibration because the room response in the modal frequencies will tend to get baked in, for good (e.g. the Harman boost < 100 Hz) or bad (nulls/peaks).

Do note, the <100 Hz Harman boost is entirely a product of EQ; it isn't the "natural" low-frequency response of the Revels in the Harman room, which starts to rise earlier and is much less substantially raised.

I would intuitively tend to think that there would be some meaningful indirect contributions to the net response, but since I've put together the spreadsheet using Chris's HRTF-and-room-and-speaker formula, I decided to have a look. In the video, David is perhaps a meter from the speaker, in a relatively sizable room. I left the room volume matching the Harman reference room (because I am lazy - I'm happy to try a different volume if you like), plugged in the directivity of the JBL 104 as representative of a small loudspeaker (again, if you've got another speaker's directivity data you'd like used, please feel free to suggest it), used an RT60 of 0.5 s per Bradley 1986, and used the IEEE 1652 free and diffuse field HRTFs as the references for direct and indirect sound. This was the result:

As you can see, it tracks fairly closely with the free field response for those parameters.
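For readers without the spreadsheet, the general idea of weighting direct against reverberant sound can be sketched with the textbook critical-distance approximation. This is explicitly not the formula used in the spreadsheet mentioned above, which is not reproduced in this thread. The room volume of 100 m³ is a placeholder (not the Harman reference room's actual volume), the directivity factor Q = 1 describes an omnidirectional source (a real speaker's Q rises with frequency), the free and diffuse field levels are made-up numbers, and all function names are hypothetical.

```python
import math

def critical_distance_m(volume_m3, rt60_s, directivity_q=1.0):
    """Distance at which direct and reverberant energy are equal.

    Textbook approximation: Dc ~= 0.057 * sqrt(Q * V / RT60),
    with V in cubic metres and RT60 in seconds.
    """
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)

def direct_fraction(distance_m, volume_m3, rt60_s, directivity_q=1.0):
    """Fraction of total energy at the listener that arrives as direct sound."""
    dc = critical_distance_m(volume_m3, rt60_s, directivity_q)
    drr = (dc / distance_m) ** 2  # direct-to-reverberant energy ratio
    return drr / (1.0 + drr)

def blended_target_db(free_field_db, diffuse_field_db, direct_frac):
    """Energy-weighted blend of free field and diffuse field HRTF levels (dB)."""
    power = (direct_frac * 10 ** (free_field_db / 10)
             + (1.0 - direct_frac) * 10 ** (diffuse_field_db / 10))
    return 10 * math.log10(power)

# Placeholder room: 100 m^3 with the 0.5 s RT60 mentioned in the post.
VOLUME_M3 = 100.0
RT60_S = 0.5

for distance in (0.5, 1.0):
    frac = direct_fraction(distance, VOLUME_M3, RT60_S)
    # Made-up example levels for one frequency band: 12 dB free field, 9 dB diffuse field.
    level = blended_target_db(12.0, 9.0, frac)
    print(distance, "m: direct fraction", round(frac, 2), "blend", round(level, 2), "dB")
```

The closer the listener sits, and the more directional the speaker, the larger the direct fraction and the more the blended result leans towards the free field reference.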

DDF said:
Short of measuring your own HRTF, I think it would be more neutral than an off the shelf response meant for some global typical. But I can be convinced otherwise and would listen to a solid technical counterargument.

There's some ambiguity here re: how significantly you think that on-head HpTF relates to listener HRTF - in the case of an in-ear monitor, I'd be somewhat more inclined to agree with you than with a circumaural headphone. This said, IMO when we consider that free field targets have substantially more treble than is subjectively preferred, the SLD effect/the inherent error from comparing levels between two different perceived acoustic sources, and the fact that generic compensations like the Harman target generally correlate well with subjective preference, I think there's a strong enough case to be made.

The more difficult to argue against position would be something more like "level matched against good speakers in a farfield listening setup by ear" - I'm legitimately unsure of what would be preferred there. And at the point where we're measuring speakers in listening rooms at the ears (e.g. with intra-aural microphones), I'd definitely fall on the side that the individualized response would almost certainly tend to be preferred - so it really depends on what scenarios we're comparing here.

Edit: Pardon me, just watched a bit of the video again - David says around 18" from the loudspeaker, so I've corrected my spreadsheet with a 0.5 m speaker distance:

I've also attached the zipped .xlsx so you can have a play with it yourself, if you'd like.

amirm · Aug 15, 2020

Mad_Economist said:
This all said, I still don't entirely grasp what Amir's primary goal with this headphone testing project is; if it's to redefine headphone metrology and take it out of the dark ages...well, we weren't in the dark ages to begin with, so that'd be pretty hard. If it's to present an additional impartial source of reliable headphone measurements alongside what presently exists (Oratory, Resolve/Headphones.com, Clarityfidelity/Speakerphone, Keith Howard/HeadphoneTestLab, Brent Butterworth/Soundstage Solo, etc), then we've already got validation of concept. If it's to improve on the current state of the art in headphone metrology that's very conceivable - the 5128 can reasonably claim to be the most accurate way to measure headphones that presently exists, and there's still plenty of room for innovation in methodology if Amir is interested in that.

What I want to do is data dependent. There are two questions here:

1. Do we get into headphone testing at all?

2. Do we do it with 5128?

To go with 5128, it needs to show substantial value over existing rigs. If the difference is very minor, then it makes no sense to pay its huge premium.

We have to somehow quantify 5128's value. In a sea of measurement variability this is challenging but we need to do that in the next few days before I have to return the thing.

Mad_Economist · Aug 15, 2020

amirm said:
What I want to do is data dependent. There are two questions here:

1. Do we get into headphone testing at all?

2. Do we do it with 5128?

To go with 5128, it needs to show substantial value over existing rigs. If the difference is very minor, then it makes no sense to pay its huge premium.

We have to somehow quantify 5128's value. In a sea of measurement variability this is challenging but we need to do that in the next few days before I have to return the thing.

While I agree, this makes your definition of "very minor" extremely important here - from my standpoint working with HATS, we are close enough to anthropomorphic systems already that there are only relatively minor improvements remaining in our emulation of heads and ears: more flexibility/collapsibility that better mirrors real pinnae, matching real ears' acoustic impedance to a higher frequency, etc. I'm quite confident that the 5128 provides this - I'm not sure if that means it's more than a very minor improvement relative to, e.g., a GRAS fixture to you.

Of course, one caveat is that there are not many 5128s in the wild at present, so we haven't had a lot of time for data eccentricities to shake out - seeing how the headphones you have on-hand at present measure will be interesting, at least.

Mad_Economist · Aug 15, 2020

Reflecting on comparative value a bit, I will say that unless you plan to measure prolifically, $40k USD on headphone metrology is probably better spent on a $10k fixture and hiring an underling to measure a thousand or so headphones for you, but that does admittedly discount the substantial negative value of having to manage underlings...

crinacle · Aug 15, 2020

Mad_Economist said:
Reflecting on comparative value a bit, I will say that unless you plan to measure prolifically, $40k USD on headphone metrology is probably better spent on a $10k fixture and hiring an underling to measure a thousand or so headphones for you, but that does admittedly discount the substantial negative value of having to manage underlings...

I'd gladly be a hired gun for the chance to fondle a 5128

Tks · Aug 15, 2020

Mad_Economist said:
Reflecting on comparative value a bit, I will say that unless you plan to measure prolifically, $40k USD on headphone metrology is probably better spent on a $10k fixture and hiring an underling to measure a thousand or so headphones for you, but that does admittedly discount the substantial negative value of having to manage underlings...

Worse in my book. You know that feel you get when you buy something really expensive, and then realize it's only the beginning? Same here, I feel like if I bought this thing, I want to clean up my work and presentation and put out very professional reviews with lots of formatting and interactivity. Ain't gonna be no thing like with DACs and AMPs where sometimes a handful of measurements are missing, while on some reviews you see metrics never before employed.

For something like this fixture, I'd like laymen to come by and see not only the value innate in the potential the measurement rig provides, but also cleanly presented data that gives them a reason to stay here rather than on other sites. And this has nothing to do with some sort of explicit monetary incentive; I'd just feel like I'd be insulting myself if the headphone reviews I made with this rig weren't basically the best reviews on the internet.

Mad_Economist · Aug 15, 2020

Tks said:
Worse in my book. You know that feel you get when you buy something really expensive, and then realize it's only the beginning? Same here, I feel like if I bought this thing, I want to clean up my work and presentation and put out very professional reviews with lots of formatting and interactivity. Ain't gonna be no thing like with DACs and AMPs where sometimes a handful of measurements are missing, while on some reviews you see metrics never before employed.

For something like this fixture, I'd like laymen to come by and see not only the value innate in the potential the measurement rig provides, but also cleanly presented data that gives them a reason to stay here rather than on other sites. And this has nothing to do with some sort of explicit monetary incentive; I'd just feel like I'd be insulting myself if the headphone reviews I made with this rig weren't basically the best reviews on the internet.

I'll just note here, these things are not contradictory - indeed, to some degree they align well. RTings, which I would argue has the slickest and most extensive (albeit not necessarily best) presentation of headphone measurements online, started moving the grunt work of measurements to lower level people fairly early, leaving more time for developing the presentation for the folks up top.

Of course, there are many merits to the "full stack" approach, exemplified by Tyll Hertsens and Brent Butterworth - particularly the ability to present "one-off" measurements as pertinent to the product, and an ability to easily "cross-check" the subjective and objective analyses against each other - but it's time consuming as well, and Amir's time is finite. Measuring headphones can, honestly, be kind of a pain in the butt, and generally headphone measurement databases don't "catch on" until and unless they have dozens or hundreds of common points of interest measured, so there's a quality to quantity here.

Edit: FWIW, Amir seems to be indefatigable in general reviewing, so I'm certain that if he did do things himself and was as dogged as he has been with speakers, his headphone measurements and reviews would stack up quite quickly.

Beershaun · Aug 15, 2020

Things like reliability, repeatability, and ease of use will go a long way to justifying a premium price. In many cases where you are buying precision equipment, the cheapest way is to spend your money once on the most expensive, highest quality device, instead of spending it over and over again as you learn through trial and error why the other, cheaper devices were cheaper.

amirm · Aug 15, 2020

Created a new thread with measurements. Please see: https://www.audiosciencereview.com/...easurements-using-brüel-kjær-5128-hats.15352/
