HBK Headphone Measurement Talks from Head-Fi and Sean Olive

GaryH · Oct 26, 2021

MayaTlab said:
Inasmuch as I agree, I'm not quite certain that we're there yet or any time soon. Have you tried the N90Q ?

Yes. They do seem to sound smoother post-calibration (a view Tyll of Innerfidelity shared), but I do not have confidence in that subjective evaluation, so it would be good to see proper measurements of them calibrated on-HATS and on-head (but both measured on-HATS) to see what TruNote is doing exactly.

MayaTlab said:
In-ear measurements will have to make do in the meantime, and while they have their own limitations, as I've already explained to you, there are ways to cut through the noise and extract actionable results, even above 1kHz.

And as I've already explained, I remain unconvinced those convoluted methods have achieved that.

MayaTlab said:
That's good to know, but I am not aware of an actual article that evaluated that. If you're aware of it I'll take it.

Not that I know of, but there could be. I doubt Harman publish every single internal investigation they do as an AES paper though. It would make sense that a training program that delineates sound quality into specific areas as How to Listen does would result in the participant being more adept at identifying and using these same specific terms to accurately describe sound quality:

Band Identification
Peaks, Dips, Peaks & Dips, Low and High pass filters
Bright-Dull
Full-Thin
Coloration
Reverberation
Noisy/Noise-free
Hum/Hum-free
Left/Right Balance (stereo mode)
Front/Rear Balance (surround mode)

MayaTlab · Oct 26, 2021

GaryH said:
And as I've already explained, I remain unconvinced those convoluted methods have achieved that.

You haven't because you very specifically avoided responding to my explanations in the relevant thread.

GaryH said:
these same specific terms to accurately describe sound quality:

These specific terms have very little to do with the terminology that we've been talking about (i.e. the stuff audio reviewers love to use, such as "slam", "imaging", "detail", "soundstage"), and you know it. Goalpost shifting?

I'm bored with that fruitless exchange anyway. Have a nice day.

ADU · Oct 27, 2021

Re the Harman listening tests... if you think that it could help you to improve or hone your listening and reviewing skills, then I would suggest doing it.

Test results can sometimes be faked though with a bit of practice, as Amir demonstrated in the video below. So I personally would not put too much stock in a reviewer's test results on something like the Harman listening tests. I suppose that something like this might give a newbie to the subject of headphones maybe a degree of greater confidence in a reviewer's abilities or competency. If you are a fairly experienced user/listener of headphones though, then it isn't that difficult to separate the reviewers who have some idea what they're doing and talking about from those who do not, by simply reading or listening to their reviews for a number of different headphones.

ADU · Oct 27, 2021

Re comparative listening tests between different headphones... Fwiw, I agree with both Resolve and Gary H, and also Tyll Hertsens about the need and importance of this.

When I'm buying a new pair of headphones, I always try to listen to as many different models as possible to get what I'm personally after in the way of both sound quality and frequency response. Graphs help (if you actually know how to read and interpret them, which a lot of people do not). But they aren't a replacement for actual listening imo.

If a headphone is considerably off of a neutral response, then I would hope that I'd have at least some ability to discern that by simply listening to that one headphone all by itself. And maybe playing around with some EQ-ing on it. Audio memory is poor though (at least mine is anyway). So I always prefer to have several different headphones to use as a basis for comparison. And based on my own experiences on this, I'd probably have much less faith in headphone reviews where a reviewer DID NOT perform these types of comparisons.

As I've mentioned previously, the ability to compare and analyze the sound of several different headphones when doing a review was the main reason why Tyll maintained his Inner Fidelity headphone "wall of fame".

solderdude · Oct 27, 2021

ADU said:
If you are a fairly experienced user/listener of headphones though, then it isn't that difficult to separate the reviewers who have some idea what they're doing from those who do not, by simply reading or listening to their reviews for a number of different headphones.

That's certainly the case. When you often find yourself in agreement with certain subjective reviewers then you get a feel for who to trust.

The issue is that when one hasn't got much experience with headphones that have been reviewed by these reviewers then you have no way of knowing who to trust.

I don't think flashing your hard earned 'Harman level' is going to mean much for newbies looking for headphones nor for reviewers you often do not agree with.

If anything... when you are interested in finding out your hearing capabilities I do recommend doing this for yourself.

ADU said:
Re comparative listening tests between different headphones... Fwiw, I agree with both Resolve and Gary H, and also Tyll Hertsens about the need and importance of this.

Yes.. never ever judge the sound of a headphone without a reference to compare it to and to 'reset' your hearing.
It is very easy to get used to a certain sound signature. Compare anything against it and it will sound 'incorrect' at least for a while. Even when this is a true reference headphone.
Hearing a single headphone somewhere without a reference is almost guaranteed to leave you with the wrong impression.

ADU · Oct 27, 2021

solderdude said:
I believe Harman research showed that experienced/trained listeners are better at correctly identifying good sound quality.

This may be your conclusion from the Harman research. But to the best of my knowledge, Harman made no such determinations.

They concluded that the trained listeners were more consistent in their responses to the subjective tests. Which is not the same thing as "correctly identifying good sound quality" imo.

To the best of my knowledge, there isn't really a "correct" or "incorrect" response when it comes to subjective listening tests. Because all you're really looking at in those kinds of tests are people's preferences. Which may or may not align precisely with a "neutral response" (whatever that is).

If you are using an "average preference" as your benchmark for what is the "correct" response though, then I believe that the trained listeners as a group did not fare as well as the listeners as a whole in some of the early subjective listening tests. Because the trained listeners generally seemed to prefer both less bass and less treble than the average preferences of all of the listeners.

Whether that was also the case in some of their later tests though, I don't really recall. Perhaps Dr. Olive can shed some more light on that.

solderdude · Oct 27, 2021

So Harman training was not useful and gave no benefits other than they were more consistent, which differs from becoming better at evaluating sound ?

ADU · Oct 27, 2021

solderdude said:
So Harman training was not useful and gave no benefits other than they were more consistent, which differs from becoming better at evaluating sound ?

This is probably a question better posed to Dr. Olive. But I suppose that could be one possible conclusion you could draw from some of their early tests.

ADU · Oct 27, 2021

solderdude said:
The issue is that when one hasn't got much experience with headphones that have been reviewed by these reviewers then you have no way of knowing who to trust.

Fwiw, I'd agree with this. If the reviewer has not done reviews of headphones that I've actually heard or used myself, then it's a little harder to assess their potential biases and whether they actually know what they're talking about.

One way to help ameliorate that on the user/listener's end is simply to listen to more headphones, to better expand your knowledge and experience on that end.

GaryH · Oct 27, 2021

ADU said:
This may be your conclusion from the Harman research. But to the best of my knowledge, Harman made no such determinations.

They concluded that the trained listeners were more consistent in their responses to the subjective tests. Which is not the same thing as "correctly identifying good sound quality" imo.

They didn't just find trained listeners were more consistent/reliable, they found they were more discriminating too. This does result in them being able to identify good sound quality (to them) better, and here's why:

[Attached image: graph of preference ratings with error bars for trained and untrained listeners]

Look carefully at both the size of the error bars (which is indicative of listener reliability), and how spread out the ratings are (indicative of listener discrimination). Now let's zoom in to headphones 26-28. Both untrained and trained listeners on average rated these in the same order of preference. However, due to the lower rating spread (poor discrimination) and bigger error bars (poor reliability), statistically you cannot conclude that the untrained listeners could distinguish between them, rating them equally within margin of error. Conversely, the trained listeners demonstrated better discrimination (bigger spread) and reliability (smaller error bars), allowing you to conclude that they could in fact distinguish between all three headphones, and so accurately identify which had better sound quality (to them).
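To make the error-bar argument above concrete, here is a minimal Python sketch. All of the ratings in it are made up for illustration and are not taken from the Harman study, and the helper names (`mean_ci`, `intervals_overlap`) are my own. It treats the error bars as approximate 95% confidence intervals on the mean (an assumption; the graph may use a different convention) and simply checks whether the intervals for two headphones overlap. That is a rough, conservative check rather than a proper significance test, and it ignores the fact that the same listeners rated every headphone.

```python
import math
import statistics

def mean_ci(ratings, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    m = statistics.mean(ratings)
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return m, z * sem

def intervals_overlap(a, b):
    """True if the confidence intervals of two lists of ratings overlap."""
    mean_a, half_a = mean_ci(a)
    mean_b, half_b = mean_ci(b)
    return abs(mean_a - mean_b) <= half_a + half_b

# Hypothetical 0-100 preference ratings for two headphones, A and B.
# Untrained group: small gap between the means, large scatter between listeners.
untrained_a = [55, 70, 40, 65, 50, 75, 45, 60]
untrained_b = [50, 68, 38, 60, 47, 72, 41, 56]
# Trained group: larger gap between the means, small scatter between listeners.
trained_a = [58, 62, 60, 61, 59, 63, 60, 61]
trained_b = [48, 52, 50, 51, 49, 53, 50, 51]

print("untrained overlap:", intervals_overlap(untrained_a, untrained_b))
print("trained overlap:", intervals_overlap(trained_a, trained_b))
```

With these invented numbers both groups rank A above B, but only the trained group's intervals are clear of each other, which is the situation described for headphones 26-28.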

ADU · Oct 27, 2021

GaryH said:
They didn't just find they were more consistent/reliable, they found they were more discriminating too. This does result in them being able to identify good sound quality (to them) better, and here's why:

[Attached image: preference-rating graph]

Look carefully at both the size of the error bars (which is indicative of listener reliability), and how spread out the ratings are (indicative of listener discrimination). Now let's zoom in to headphones 26-28. Both untrained and trained listeners on average rated these in the same order of preference. However, due to the lower rating spread (poor discrimination) and bigger error bars (poor reliability), statistically you cannot conclude that the untrained listeners could distinguish between them, rating them equally within margin of error. Conversely, the trained listeners demonstrated better discrimination (bigger spread) and reliability (smaller error bars), allowing you to conclude that they could in fact distinguish between all three headphones, and so accurately identify which had better sound quality (to them).

Thank you for posting this, GaryH.

If you are using the size of the scores as your gauge for a listener's ability to discriminate, then it appears that the trained listeners performed slightly better than the untrained listeners (and also listeners as a whole) on that, based on the above graph.

I believe this graph is from one of Harman's early studies though, which was based on a relatively small sampling of listeners that included some Harman employees (including possibly Dr. Olive?). So I'd discourage people from trying to draw too many broad-based conclusions from the results in this test about headphone users and listeners as a whole. If you use the degree of spread or difference between the highest and lowest ranked headphones in the study though as your gauge for which group is the most discriminating, then the untrained listeners performed about the same as the more trained listeners. Because there is about a 50 point spread between the highest and lowest rated HPs in both groups.

The (apparent) ability of the trained listeners to better separate some of the headphone rankings which are more in the middle of the preference range than the untrained listeners is interesting. But I'm not sure that necessarily translates to a better ability at correctly identifying "good sound quality" on their part. What it might possibly indicate is some increased ability to parse, separate, or form opinions on headphones with different types of mediocre sound quality... Which could be a very useful skill for some headphone reviewers.

-----------------------------------

The generally smaller error bars on the group of trained listeners are also not that surprising. Because if you take a group of people and train them to listen for certain qualities or characteristics in a transducer (for example), then you'd expect their responses to be more in sync or consistent with one another than in a group of individuals that has not been trained to look for those specific qualities or traits. So that just makes sense.

Notice also that the error bars are about the same size in the two groups on the two most highly rated headphones in the test (which are Harman's "Target", and "Target 2"). And that the differences in the size of the error bars between the two groups only starts to become more apparent or significant as you begin to go down the scale towards the headphones with lower preference ratings than the top two.

Imo, that indicates a similar ability to discriminate the sound quality of the "better sounding" headphones in the two groups (which is perhaps a more useful skill). And a somewhat reduced ability to form those same kinds of opinions with the "poorer sounding" headphones in the untrained listeners. To put this another way, the trained listeners were better at forming the same or similar opinions to one another when listening to what they perceived as a headphone with poorer or less-pleasing sound quality. (Which is also not surprising, because that's undoubtedly what the Harman training taught them to do!)

It takes an additional step (or maybe even two, or possibly three) of logic or imagination though, imho, to reach the conclusion that this would somehow also make the trained listeners better at correctly identifying (and discriminating) "good sound quality" in a more general sense, as solderdude seemed to imply in his previous comment.

-----------------------------------

As far as objective gauges or metrics for a headphone's overall sound quality are concerned, I think the only conclusion that the Harman research really reached was that people generally seemed to prefer the sound of a headphone that more closely resembled the frequency response of a pair of well-extended, neutral, anechoically flat loudspeakers in a room than a headphone which did not... Which is basically the same conclusion they also reached in their loudspeaker tests. Beyond that, I think the only other sound quality characteristic or issue they looked at a little bit on headphones (and made public) was nonlinear distortion. And I believe in the one or two tests they did on that, they found a relatively low correlation between that characteristic and a headphone listener's preferences. Some more research is probably needed on this though.

Going back to the FR testing... If I remember right, I think the first Harman Target on the above graph, which was ranked the highest by the untrained listeners, was the headphone that probably came the closest to the measured in-ear response of a good loudspeaker in a room. And Harman's "Target 2" was based on a similar response, but with both the bass and treble reduced a bit. So if you were using the (unaltered) response of a neutral loudspeaker in a room as your metric for "good sound quality", then it would seem from the above graph that the untrained listeners were actually better at identifying that correctly than the trained listeners. Because they gave the first Harman Target a higher score than the modified Harman Target 2.

I'm going mostly by memory on a lot of this though. So I'm not entirely sure this is all correct.

ADU · Oct 27, 2021

ADU said:
I think the only other sound quality characteristic or issue they looked at a little bit on headphones (and made public) was nonlinear distortion. And I believe in the one or two tests they did on that, they found a relatively low correlation between that characteristic and a headphone listener's preferences. Some more research is probably needed on this though.

This is slightly OT, but this is the one study related to distortion (using headphones) that Harman was involved in that I'm aware of (and was referencing in the above comments)...

And I think this is the AES paper from which this was derived...

https://www.aes.org/e-lib/browse.cfm?elib=17441

pozz · Oct 27, 2021

ADU said:
Thank you for posting this, GaryH.

If you are using the size of the scores as your gauge for a listener's ability to discriminate, then it appears that the trained listeners performed slightly better than the untrained listeners (and also listeners as a whole) on that, based on the above graph.

I believe this graph is from one of Harman's early studies though, which was based on a relatively small sampling of listeners that included some Harman employees (including possibly Dr. Olive?). So I'd discourage people from trying to draw too many broad-based conclusions from the results in this test about headphone users and listeners as a whole. If you use the degree of spread or difference between the highest and lowest ranked headphones in the study though as your gauge for which group is the most discriminating, then the untrained listeners performed about the same as the more trained listeners. Because there is about a 50 point spread between the highest and lowest rated HPs in both groups.

The (apparent) ability of the trained listeners to better separate some of the headphone rankings which are more in the middle of the preference range than the untrained listeners is interesting. But I'm not sure that necessarily translates to a better ability at correctly identifying "good sound quality" on their part. What it might possibly indicate is some increased ability to parse, separate, or form opinions on headphones with different types of mediocre sound quality... Which could be a very useful skill for some headphone reviewers.

-----------------------------------

The generally smaller error bars on the group of trained listeners are also not that surprising. Because if you take a group of people and train them to listen for certain qualities or characteristics in a transducer (for example), then you'd expect their responses to be more in sync or consistent with one another than in a group of individuals that has not been trained to look for those specific qualities or traits. So that just makes sense.

Notice also that the error bars are about the same size in the two groups on the two most highly rated headphones in the test (which are Harman's "Target", and "Target 2"). And that the differences in the size of the error bars between the two groups only starts to become more apparent or significant as you begin to go down the scale towards the headphones with lower preference ratings than the top two.

Imo, that indicates a similar ability to discriminate the sound quality of the "better sounding" headphones in the two groups (which is perhaps a more useful skill). And a somewhat reduced ability to form those same kinds of opinions with the "poorer sounding" headphones in the untrained listeners. To put this another way, the trained listeners were better at forming the same or similar opinions to one another when listening to what they perceived as a headphone with poorer or less-pleasing sound quality. (Which is also not surprising, because that's undoubtedly what the Harman training taught them to do!)

It takes an additional step (or maybe even two, or possibly three) of logic or imagination though, imho, to reach the conclusion that this would somehow also make the trained listeners better at correctly identifying (and discriminating) "good sound quality" in a more general sense, as solderdude seemed to imply in his previous comment.

-----------------------------------

As far as objective gauges or metrics for a headphone's sound quality are concerned, I think the only conclusion that the Harman research really reached was that people generally seemed to prefer the sound of a headphone that more closely resembled the frequency response of a pair of well-extended, neutral, anechoically flat loudspeakers in a room than a headphone which did not... Which is basically the same conclusion they also reached in their loudspeaker tests. I think the only other sound quality characteristic or issue they looked at a little bit on headphones (and made public) was nonlinear distortion. And I believe in the one or two tests they did on that, they found a relatively low correlation between that characteristic and a headphone listener's preferences. Some more research is probably needed on this though.

Going back to the FR testing... If I remember right, I think the first Harman Target on the above graph, which was ranked the highest by the untrained listeners, was the headphone that probably came the closest to the measured in-ear response of a good loudspeaker in a room. And Harman's "Target 2" was based on a similar response, but with both the bass and treble reduced a bit. So if you were using the (unaltered) response of a neutral loudspeaker in a room as your metric for "good sound quality", then it would seem from the above graph that the untrained listeners were actually better at identifying that correctly than the trained listeners. Because they gave the first Harman Target a higher score than the modified Harman Target 2.

I'm going mostly by memory on a lot of this though. So I'm not entirely sure this is all correct.

I think you're somewhat complicating things. The Harman research question initially was: do listeners have different preferences for headphones? They got their answer based on trials and developed the Harman target, etc.

The objective aspect of trained listening is that these subjects can describe in technical terms anything they hear which seems off (elevated midrange around 1kHz, Q of 5) instead of relying on adjectives, and their opinions are consistent. This is in part because they are also screened for hearing damage, which as it increases also makes subject assessments less consistent.

Regarding preference scores given by trained vs. untrained listeners: untrained listeners tend to rate everything higher, trained lower. This does not mean that untrained listeners prefer things more, only that they use the scale differently.
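As a rough illustration of the scale-use point, here is a small Python sketch. The ratings are invented for the example and the function names (`standardize`, `ranking`) are hypothetical, not anything from the Harman papers. It converts each listener's ratings to z-scores, one common way (assumed here, not necessarily what Harman did) of removing differences in how people use the scale before comparing what they actually prefer.

```python
import statistics

def standardize(ratings):
    """Convert one listener's ratings to z-scores (mean 0, sample stdev 1)."""
    m = statistics.mean(ratings)
    s = statistics.stdev(ratings)
    return [(r - m) / s for r in ratings]

def ranking(ratings):
    """Headphone indices ordered from most preferred to least preferred."""
    return sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)

# Hypothetical ratings of the same four headphones by two listeners.
untrained = [85, 75, 70, 60]  # sits in the upper part of the scale
trained = [60, 40, 30, 10]    # rates everything lower

print(ranking(untrained) == ranking(trained))
print([round(z, 3) for z in standardize(untrained)])
print([round(z, 3) for z in standardize(trained)])
```

In this made-up case the raw scores look very different, yet the preference order and the standardized scores are the same, so the higher raw numbers from the untrained listener say nothing about liking the headphones more.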

Edit: Typo.

ADU · Oct 27, 2021

pozz said:
The objective aspect of trained listening is that these subjects can describe in technical terms anything they hear which seems off (elevated midrange around 1kHz, Q of 5) instead of relying on adjectives, and their opinions are consistent. This is in part because they are also screened for hearing damage, which as it increases also makes subject assessments less consistent.

Thank you for the reply, pozz. ~~I think the untrained listeners may also have been screened for normal hearing. But not totally sure about that.~~

pozz said:
Regarding preference scores given by trained vs. untrained listeners: untrained listeners tend to rate everything higher, trained lower. This does not mean that untrained listeners prefer things more, only that they use the scale differently.

I'm not really an expert on this kind of testing. But I suppose that makes some sense.

pozz said:
I think you're somewhat complicating things. The Harman research question initially was: do listeners have different preferences for headphones? They got their answer based on trials and developed the Harman target, etc.

Fwiw, I'm just looking at the data, and giving you some of my impressions based on what I'm seeing there. Others' impressions may be very different though.

It probably also bears repeating that I believe the graph that GaryH posted above with the different headphone preference ratings was based on a relatively small sampling of listeners. Which, if so, would make it difficult to really reach any definitive conclusions from the results.

GaryH · Oct 27, 2021

pozz said:
I think you're somewhat complicating things.

Indeed.

ADU said:
It probably also bears repeating

It doesn't, because that's incorrect. The graph is from a 2017 study of IEMs with 71 listeners (36 trained and 35 untrained). A study of over-ear headphones with even more listeners (238) had similar findings:

[Attached image: preference-rating graph from the over-ear headphone study]

Here the closeness of the 'bunching' of the lines is indicative of better consistency/reliability, and the spread of the ratings, as previously, is indicative of better discrimination.

pozz · Oct 27, 2021

ADU said:
This is slightly OT, but this is the one study related to distortion (using headphones) that Harman was involved in that I'm aware of (and was referencing in the above comments)...

And I think this is the AES paper from which this was derived...

https://www.aes.org/e-lib/browse.cfm?elib=17441

IIRC, the conclusion was that distortion was not a reliable indicator of preference.

Like all research on this topic so far, it's not the final word and requires a lot more work. I have a few ideas, but they are far fetched (e.g., bypassing acoustics totally and modelling the auditory brain response) and outside of my abilities.

ADU said:
Thank you for the reply, pozz. I think the untrained listeners may also have been screened for normal hearing. But not totally sure about that.

I'm not really an expert on this kind of testing. But I suppose that makes some sense.

Fwiw, I'm just looking at the data, and giving you some of my impressions based on what I'm seeing there. Others' impressions may be very different though.

It probably also bears repeating that I believe the graph that GaryH posted above with the different headphone preference ratings was based on a relatively small sampling of listeners. Which, if so, would make it difficult to really reach any definitive conclusions from the results.

I think that's fair.

GaryH · Oct 28, 2021

GaryH said:
If reviewers want their subjective judgements to be some kind of useful data point, they must in some way expect or at least hope their readers trust in their ability to adeptly discern good sound quality. In the same way that we require measurement rigs to conform to industry standards that demonstrate the accuracy and reliability of their readings, if subjective reports are to have any utility to readers, these 'measurement rigs' i.e. the reviewers ears should be subject to provable standards of accuracy too, and being a Harman level 8 trained listener is as good a standard for this as I can see. So calling all reviewers, @Resolve , @metal571 , @crinacle , @antdroid , Jude (I know you're watching ), care to give Harman's How to Listen a go and post your results?

Only one (potential) taker so far in @Resolve ? Ok I'll open this up to not just headphone reviewers - @hardisj , @Gene DellaSala , @joentell , @napilopez , @John Atkinson , @Kal Rubinson , willing to give it a try?

pozz · Oct 28, 2021

GaryH said:
Only one (potential) taker so far in @Resolve ? Ok I'll open this up to not just headphone reviewers - @hardisj , @Gene DellaSala , @joentell , @napilopez , @John Atkinson , @Kal Rubinson , willing to give it a try?

Start a new thread. This really doesn't belong in here.

It can stand as an open challenge. I don't see any downside for pro reviewers to train themselves.

ADU · Oct 28, 2021

GaryH said:
They didn't just find trained listeners were more consistent/reliable, they found they were more discriminating too. This does result in them being able to identify good sound quality (to them) better, and here's why:

[Attached image: preference-rating graph]

Look carefully at both the size of the error bars (which is indicative of listener reliability), and how spread out the ratings are (indicative of listener discrimination). Now let's zoom in to headphones 26-28. Both untrained and trained listeners on average rated these in the same order of preference. However, due to the lower rating spread (poor discrimination) and bigger error bars (poor reliability), statistically you cannot conclude that the untrained listeners could distinguish between them, rating them equally within margin of error. Conversely, the trained listeners demonstrated better discrimination (bigger spread) and reliability (smaller error bars), allowing you to conclude that they could in fact distinguish between all three headphones, and so accurately identify which had better sound quality (to them).

GaryH said:
ADU said:

It probably also bears repeating

It doesn't, because that's incorrect. The graph is from a 2017 study of IEMs with 71 listeners (36 trained and 35 untrained). A study of over-ear headphones with even more listeners (238) had similar findings:

Thank you for clarifying this, GaryH.

If you had mentioned this graph was for IEMs to begin with though, then I probably wouldn't have bothered to respond on it. Because imho IEMs are a much less reliable tool for analyzing and predicting users' frequency response preferences than over-ear headphones (or speakers). So I'm not sure why it would be that relevant in this discussion. (?)

Dr. Olive briefly covered this particular study in his presentation for the HBK online conference though. And I've cued the video below to that point, in case others would like to hear his comments and a few more details on it.

Based on his remarks, there were 71 listeners in the study altogether, approximately half of which were trained (as GaryH mentions above). And all of the participants in the study were US Harman employees, with a median age of 35. It appears that the untrained listeners in this study were NOT tested for normal hearing though, like the trained listeners were. Which is interesting, and would tend to introduce more variables into the results imo than if they had also been screened for this.

This was Dr. Olive's last thought on the above test though...

"Whether you're trained or untrained, it seems as though what headphone you like is very consistent."

pozz · Oct 28, 2021

ADU said:
This was Dr. Olive's last thought on the above test though...

"Whether you're trained or untrained, it seems as though what headphone you like is very consistent."

This is to do with the overall rating of the headphones compared to each other, not a subject's ability to provide the same rating when presented with the same stimulus.

It's the same thing that's found in the speaker research. Not sure if you've read Toole's book, but in the third edition, the introductory chapters make it a point to go over the testing procedure in a lot of detail.
