
"Bias" of some members towards headphone measurements?

What we don't see is speakers that are generally considered good causing clear dividing lines along genre. There is no such thing as the "Genelec Classical Army" and the "KEF Rock Armada"; what we observe is fans of all genres who variously enjoy (or don't) any given "good" speaker. You don't see Hip-Hop fans raving about their Neumanns but warning Rock fans to stay away, or vice-versa.
You may be interested in this ASR thread where someone asked which speakers classical music pros are using:
Spoiler: There's a LOT of B&W photos in there.

I think accurate discrimination and judging quality are almost the same thing.
The Harman study/studies we're talking about specifically were designed to assess discriminatory ability. The only relationship to "judging quality" would be the ability to differentiate the quality between the speakers. In other words, the listeners were more likely to be able to assign 3, 5, 9 instead of 3, 4, 6 to the same 3 speakers.

As I understand it, "We can look at FR charts and reliably determine how good a speaker will sound" is a statement of being able to read FR charts and probably a statement about the PIR model being close to real-world in-room results. As I read it, it's not a statement about being able to predict individual preference for specific speakers.
What I observe is that enthusiasts believe they can confidently predict whether a speaker will have high quality sound based on eyeballing FR charts, and they seem to think that Harman research allows them to do this reliably.

Think about it: if a speaker truly has high quality sound, then blind listeners should be able to assign it a high score. That's one of the reasons Harman decided to measure "perceived sound quality" as blind listener preference ratings.
 
What I observe is that enthusiasts believe they can confidently predict whether a speaker will have high quality sound based on eyeballing FR charts,
Well, I think you'd also want to see the distortion charts. But yeah.

I can tell a good speaker from a bad one pretty easily by looking at a few graphs. Lots of people can. I assume practically everyone posting on this site can.

Can I tell a 95th percentile from 97th percentile speaker from looking at graphs? No, and I'm not sure that idea even makes sense. It's a question of degree.

and they seem to think that Harman research allows them to do this reliably.
Speaking only for myself, I think only the PIR simulation helps with that, which wasn't done at Harman AFAIK.

Spoiler: There's a LOT of B&W photos in there.
I have seen that thread, I am not sure if it's a smoking gun, exactly.
 
I can tell a good speaker from a bad one pretty easily by looking at a few graphs. Lots of people can. I assume practically everyone posting on this site can.

Can I tell a 95th percentile from 97th percentile speaker from looking at graphs? No, and I'm not sure that idea even makes sense. It's a question of degree.
Obviously. Bottom line is that Harman research demonstrated that computerized analysis of a series of spinorama measurements was only able to account for 74% of the variation in listener preference scores. If anyone is claiming to be able to predict sound quality with even greater confidence while only eyeballing a single FR chart, then they are making a claim that is way beyond what Harman suggests is possible. Such an astronomical claim "should" give people pause, yet it doesn't. And that, in my opinion, represents a measurement bias here, supporting the OP's original assertion.

Speaking only for myself, I think only the PIR simulation helps with that, which wasn't done at Harman AFAIK.
No, the Harman papers DID, in fact, analyze PIR charts. Two of the four variables in the regression formula used to predict listener preferences from measurements were based on PIR chart analysis.
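For concreteness, here is a minimal sketch of the kind of four-variable model being referred to, in the form commonly attributed to Olive's 2004 papers. The coefficients are the ones widely quoted in secondary sources, and the inputs are placeholders you would have to compute from the spinorama yourself, so treat it as illustrative rather than as Harman's actual implementation:

```python
# Sketch of the commonly quoted four-variable Olive preference model.
# Coefficients as reported in secondary sources; verify against the original
# AES papers before relying on them.
#   NBD_ON  - narrow-band deviation of the on-axis response (lower = flatter)
#   NBD_PIR - narrow-band deviation of the predicted in-room response (PIR)
#   LFX     - log10 of the low-frequency extension in Hz (lower = deeper bass)
#   SM_PIR  - smoothness (regression fit) of the PIR (higher = smoother)

def predicted_preference(nbd_on: float, nbd_pir: float,
                         lfx: float, sm_pir: float) -> float:
    """Predicted preference rating on roughly a 0-10 scale."""
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir

# Hypothetical example: a well-behaved speaker with bass extension near 40 Hz.
print(round(predicted_preference(nbd_on=0.3, nbd_pir=0.25, lfx=1.6, sm_pir=0.9), 1))
```

Two of the four inputs (NBD_PIR and SM_PIR) are computed from the PIR curve, which is the point being made here.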
 
EXACTLY!! BINGO!!! And THIS is why the demonstrated relationship between measurements and listener preferences may not be fully generalizable to different types of music. As you point out, the type of playback music influences listener preferences of the same speaker. Scroll up to see my rap/hip-hop illustration.

I don't think anyone here (including me) disagrees that measurements can give you an idea of what a speaker sounds like. For instance, you can easily figure out "no bass" or "bright" from FR charts, particularly if they are obvious. But more precisely, what we ultimately want to do is to predict perceived sound quality from measurements. Harman chose to target sound quality by means of blinded listener ratings (or "preferences"). I agree with this. Perhaps you know better than Harman?

Obviously. Bottom line is that Harman research demonstrated that computerized analysis of a series of spinorama measurements was only able to account for 74% of the variation in listener preference scores. If anyone is claiming to be able to predict sound quality with even greater confidence while only eyeballing a single FR chart, then they are making a claim that is way beyond what Harman suggests is possible. Such an astronomical claim "should" give people pause, yet it doesn't. And that, in my opinion, represents a measurement bias here, supporting the OP's original assertion.

Why are you making a strawman argument that ASR folks are "eyeballing a single FR chart"? @amirm's speaker reviews show on-axis and off-axis FR, multiple horizontal and vertical directivity measures, and distortion performance at various SPLs. All of that might not be sufficient - but it doesn't need to be proven sufficient for your critique of "eyeballing a single FR chart" to be irrelevant, since that's not what anyone is doing.

As for "the demonstrated relationship between measurements and listener preference may not be fully generalizable to different types of music," that is indeed possible. But I also think it misses the point. Bass performance very well might be more decisive with hip-hop and other bass-heavy genres. But that doesn't mean that it's useful or particularly meaningful to call a speaker whose F3 is 35Hz but has a broad suckout from 1-3kHz and poor directivity "a better sound quality performer for hip hop" than a linear, well-behaved speaker whose F3 is 50Hz. (Not to mention, saying a speaker is better for hip-hop because of its bass extension is precisely "predicting perceived sound quality from measurements,")

It's not about listener preference and its relationship to measurements. It's about how we philosophically or conceptually want to approach the question of evaluating speakers. That poorly performing F3-35Hz speaker offers "better sound quality for hip-hop" than the F3-50Hz speaker only if our only two choices are those speakers.

And the entire point of the kinds of speaker measurement Amir (and Erin, and other measurement-oriented reviewers) does is to help everyone see what the options are out there in the market, so folks can find an F3-35Hz speaker for similar money that is more linear and has better directivity. Or so they can determine whether or not it's possible to obtain such a speaker at a given price point, or in a given size, or if they need to spend more or use a subwoofer.

The way you are using listener preference and musical genre in this argument seems to ignore this fundamental issue of performance benchmarks that - as all reviews and measurements must do - take into account the possibility of the end user listening to all manner of different genres, not to mention different individual recordings within each genre since those vary widely in production style and spectral content.

So we can, in fact, understand quite a lot about a speaker's quality by examining the measurements. The link between those measurements and listener preference might very well be weaker - but contrary to what you have been repeatedly asserting, that does not mean that our claims about speaker quality are foolish, biased, or misguided. Speaker quality, like any audio-gear benchmark, is about establishing a reference. If we cannot achieve that reference, then sure, we can talk about how different listeners who prefer different genres might make different trade-offs depending on personal preference or genre tendencies. But they are trade-offs from the ideal of that quality reference point.

So my view is that many of your points are quite valid in and of themselves - but I don't see them as persuasive evidence against the utility and validity of the kinds of measurements that comprise the speaker reviews here.
 
If anyone is claiming to be able to predict sound quality with even greater confidence while only eyeballing a single FR chart, then they are making a claim that is way beyond what Harman suggests is possible.
I think this is a bit of a strawman. Predicting the preferences of the general population (Harman), predicting how a speaker will sound (I've seen a few FR plots in my day), and defining "sound quality" are really three different things.
No, the Harman papers DID, in fact, analyze PIR charts
What I meant is that I think PIR itself was developed at NRC, not Harman.

At the end of the day, I don't think the Harman research actually has anything to say about one individual's ability to figure out how a speaker sounds from looking at graphs. It certainly doesn't suggest an upper limit on interpreting the data.

Suppose I somehow developed the ability to create a perfect mental model for the sound of a speaker based on graphs. This wouldn't necessarily help me tell you if you would like the speaker.

And then defining sound quality is a philosophical exercise to an extent.
 
If anyone is claiming to be able to predict sound quality with even greater confidence while only eyeballing a single FR chart, then they are making a claim that is way beyond what Harman suggests is possible.

I had asked you ...

How, exactly, do you define "sound quality"?

... and you had said ...

In the context of what I was saying, it's my subjective preference for the perceived sound reproduction when listening to familiar material

So your statement therefore becomes ...

If anyone is claiming to be able to predict my subjective preference with even greater confidence while only eyeballing a single FR chart, then they are making a claim that is way beyond what Harman suggests is possible.

... which might be true or might not. :)
 
I think this is a bit of a strawman. Predicting the preferences of the general population (Harman), predicting how a speaker will sound (I've seen a few FR plots in my day), and defining "sound quality" are really three different things.

And then defining sound quality is a philosophical exercise to an extent.
If we cannot agree on how to define "sound quality," and you are arguing that it is a "philosophical exercise," then I don't see how we can continue a conversation about how measurements may predict "sound quality." I'm using Olive's definition, which is something I happen to agree with.
 
I had asked you ... and you had said ... So your statement therefore becomes ... which might be true or might not.

At this point, I must ask you whether it is possible that you do not fully understand your own preferences. The Harman studies were conducted blind, were they not? Would you consider undergoing a double-blind test to verify that your personal subjective (sighted) preferences are or are not in line with the data published by Harman?

If it turns out that a DBT verifies that your subjective preferences are in line with the Harman tests, then I would say that a graph with enough information to show whether a speaker is or is not in line with the qualities Harman espouses would also show whether it does or does not predict your preferences accurately.

As you said, "BINGO!". :)

Addendum: It is also possible that you are an outlier. However, I would have expected you to postulate that long, long ago, rather than engage in criticism.
Sorry, Jim, I read your response 3 times and I honestly could not follow what you are trying to convey. Perhaps you can rephrase or someone else can paraphrase.
 
If we cannot agree with how to define "sound quality," and you are arguing that it is a "philosophical exercise,"
The philosophical aspect comes in when you have to make a decision on when to give up on the notion of fidelity to the original, as maintaining an exact relationship between the "original" (which original) and the sound field in your room becomes impossible.
 
your critique of "eyeballing a single FR chart" is irrelevant since that's not what anyone's doing.
Really? Not even in this headphone forum where a single FR chart is posted? Nobody is commenting on the SQ based on that single chart?

As for "the demonstrated relationship between measurements and listener preference may not be fully generalizable to different types of music," that is indeed possible.
Thank you for acknowledging this.
So we can, in fact, understand quite a lot about a speaker's quality by examining the measurements. The link between those measurements and listener preference might very well be weaker - but contrary to what you have been repeatedly asserting, that does not mean that our claims about speaker quality are foolish, biased, or misguided.
Nobody is arguing that you can't learn a lot about a speaker's sound quality by examining measurements. The Harman research demonstrated this relationship and further quantified how strong a correlation there was between measurements and perceived sound quality. Thank you for acknowledging that the link between measurements and listener preference is not perfect.
So my view is that many of your points are quite valid in and of themselves - but I don't see them as persuasive evidence against the utility and validity of the kinds of measurements that comprise the speaker reviews here.
You may want to re-read what I've written. I never argued that measurements of speakers lacked utility or validity. Good grief.
 
The philosophical aspect comes in when you have to make a decision on when to give up on the notion of fidelity to the original, as maintaining an exact relationship between the "original" (which original) and the sound field in your room becomes impossible.
Unless you're listening to a recording of a live, unmixed performance, what would you consider the "original?" Think about it. That's why with consumer speakers/headphones, Harman was targeting listener preferences as the closest way to represent "perceived sound quality." For solid state devices, yes, fidelity/transparency (which can be roughly defined as the inability to distinguish in blinded, controlled listening tests). But transducers, no.

Consider the possibility that Olive and the Harman group had already considered all of the other ways to define/measure "sound quality" before landing on listener preference scores. Also consider that many of the conclusions Olive and Toole have conveyed about which measurement qualities predict better-sounding speakers are based on their research using listener preference scores to define "sound quality." Then decide if you know better.
 
I worry about being misunderstood on this one. I do not believe in magic.
But while the ASR community is partly a bastion against snake oil and subjectivism, I think some ASR users tend to overrate the meaningfulness of measurable data when it comes to headphones. (Maybe this also goes for other audio devices, but I almost only read the headphone topics here.)

a) Most of us agree that the frequency response is the most important parameter. But it is all about the frequency response in your ear, not on some measurement rig (you cannot know it exactly beforehand). I have quite a few IEMs and over-ears, and when I tune them to Harman, they all sound different - some quite significantly - as they interact with my ear in a different way (also HRTF, hair, glasses, etc.). Soundstage also seems a bit random.
b) The Harman Target is a very helpful standard, but it is not the perfect target for everyone.
c) Distortion is important if it exceeds a certain amount. But many people completely overestimate how well they can hear it. Besides, it is irrelevant if a headphone has high distortion at 114 dB SPL if you never listen to it at 114 dB SPL anyway.

So basically, my point is that, while all this data surely is more helpful than highly subjective reviews, our ears still are not measurement rigs. The things you hear come from an interaction between the headphone and your ear, and not everything that is measurable really influences your listening experience.

- If a headphone exactly hits the Harman target, that doesn't mean that it will sound perfect to you (or that there is something wrong with your ear if it does not sound perfect to you).
- You probably cannot hear in a blind test whether one headphone has less distortion than another unless one of them performs badly.
- If you like a headphone that was reviewed with average/mediocre results, you were not necessarily fooled. It is not necessarily a good idea to buy a "better" headphone if you didn't feel something was wrong before you read the review.
The alternative of using unmeasurable criteria isn't an attractive one. That's not a bias so much as a rational choice.
 
Unless you're listening to a recording of a live, unmixed performance, what would you consider the "original?"
Right, that's the bit that gets philosophical because there is no obviously suitable 'original' once transducers are in the game.

I'm not questioning anything Toole or Olive did here, the point I was trying to make is that "sound quality" is a nebulous term with a lot of different definitions depending on context.

the link between measurements and listener preference is not perfect.
This is where I think the straw man is.

The Harman research, as I understand it, does not provide a model for individual preference, just people in the aggregate.

I'd argue that if you can read measurements well enough, and you have enough experience listening, measurements can (in theory) perfectly predict your personal preference, without reference to Harman. I don't think many people can do this, but I think it's a thing people could do.

Where we agree is that we can't really take that understanding and also predict another person's preferences. What we can do based on Harman is say that most people would prefer X or Y, which may be overstated sometimes, I guess?
 
Right, that's the bit that gets philosophical because there is no obviously suitable 'original' once transducers are in the game.

I'm not questioning anything Toole or Olive did here, the point I was trying to make is that "sound quality" is a nebulous term with a lot of different definitions depending on context.


This is where I think the straw man is.

The Harman research, as I understand it, does not provide a model for individual preference, just people in the aggregate
As it should. What good would it be to have measurements correlate with a single listener's preference?

I'd argue that if you can read measurements well enough, and you have enough experience listening, measurements can (in theory) perfectly predict your personal preference, without reference to Harman. I don't think many people can do this, but I think it's a thing people could do.
I completely disagree, based solely on the fact that a person analyzing a series of spin charts by eye is orders of magnitude less capable than the computerized analysis that Harman used. For instance, how do you weight 20 different individual peaks and valleys at various frequencies with different dB deviations and Qs? And do this over multiple charts. And do this with hundreds of speakers, trying to memorize which spin imperfections correlated with how you liked a particular speaker. And so forth. Completely beyond human capabilities.

Where we agree is that we can't really take that understanding and also predict another person's preferences. What we can do based on Harman is say that most people would prefer X or Y, which may be overstated sometimes, I guess?
I would say that if we used computerized spin analysis and applied Olive's formula, then yes, we can predict with some (but not complete) accuracy how unbiased listeners might rate the sound quality. But with just an enthusiast eyeballing a series of spin charts for 60 seconds, that reliability in predicting listener preference drops way down. That's my opinion.
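To make the "weighting peaks and valleys" point concrete, here is a toy sketch of the kind of band-by-band deviation metric a computer can tally consistently across many curves and many speakers. This is not Olive's actual NBD implementation; the band limits, spacing, and synthetic data are my own assumptions, and it is only meant to illustrate the bookkeeping involved:

```python
import numpy as np

def band_deviation(freqs_hz, spl_db, f_lo=100.0, f_hi=12000.0, bands_per_octave=2):
    """Toy NBD-style metric: for each half-octave band between f_lo and f_hi,
    take the mean absolute deviation (dB) from that band's own average level,
    then average across bands. Lower = flatter. Illustrative only."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    spl_db = np.asarray(spl_db, dtype=float)
    n_bands = int(np.ceil(np.log2(f_hi / f_lo) * bands_per_octave))
    edges = f_lo * 2.0 ** (np.arange(n_bands + 1) / bands_per_octave)
    deviations = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spl_db[(freqs_hz >= lo) & (freqs_hz < hi)]
        if band.size:
            deviations.append(np.mean(np.abs(band - band.mean())))
    return float(np.mean(deviations))

# Synthetic example: a perfectly flat curve scores 0.0; adding a narrow +3 dB
# bump around 2 kHz raises the score.
f = np.geomspace(20, 20000, 500)
flat = np.zeros_like(f)
peaky = flat + 3.0 * np.exp(-(np.log2(f / 2000.0) ** 2) / 0.1)
print(band_deviation(f, flat), band_deviation(f, peaky))
```

Running that tally over an on-axis curve, a PIR curve, and dozens of speakers is trivial for a computer and very hard to do consistently by eye, which is the argument being made above.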
 
Sorry, Jim, I read your response 3 times and I honestly could not follow what you are trying to convey. Perhaps you can rephrase or someone else can paraphrase.

By your own admission, you conflate the terms "sound quality" and "my subjective preference". That's confusing. They are not the same. It would be best for you to refrain from using the term "SQ" and simply use the term "my preference".

You seem to be very angry and frustrated. That leads you to exaggerate your criticisms. If I were to distill your criticism without exaggeration and invective, I would do so thus:

"I don't understand how some people can look at a series of graphs and make quick judgements about speakers or headphones."

There are members here who engineer tube gear. They can look at schematics and point out characteristics, good or bad, and predict how they affect the sound of the unit.
I can't do that.
There are members here who have a deep understanding of speaker driver characteristics. They can look at test results (including, but not limited to, Thiele-Small parameters) and tell you what the driver can and cannot do.
I can't do that.
There are members here who are professional recording engineers; they have a deep understanding of recording requirements. They can tell how mic placements can give varied results.
I can't do that.

I think if you were to slow down and commit to learning more about correlation (how your personal characteristics relate to scientific standards) you would not only find answers to your questions, but also understand more about the capabilities of science in general. ;)
 
I don't know how we can endlessly debate some of these issues.
Using modern measurement tools we can measure speakers and closely determine how well they can reproduce a waveform, both anechoically and into some known room parameters.
Preference is preference, but when we discuss a tool for High Fidelity Music REPRODUCTION, a speaker is little different from any other source.
Its output should mirror its input as accurately as possible.
Unfortunately speakers are as yet a long way from being as accurate as an amp, and the room interface is a huge unknown factor.
But still, within small margins we are able to measure and determine which of those speakers come quite close, and which miss by a mile.
If accuracy is important to the listener, we then have to choose, from within the "very good" measurement group, which speakers will best suit our personal preferences and needs for room size, SPL, etc.
If accurate reproduction isn't what you want, you can even use the specs to find those that will please the boom-and-sizzle crowds.
We've come a long way, baby. :)
 
While we really want a more open exchange on the forum, threads like this one get to be a challenge for moderators. @AdamG has already issued 2 recent warnings. Am considering this a 3rd strike and baseball fans know what that means.

Going to give this thread a rest for a bit.
 
After further consideration, can see that some members may have missed that the thread topic is about headphones. Many recent posts have been regarding speakers. Lacking a comparable speaker-specific thread, perhaps someone could create a thread for them? If so, will move them to it.

Otherwise, these posts are clogging a thread that headphone folks may want to view without the speaker discussion. Please try to stay on topic. Thanks!
 
I completely disagree, based solely on the fact that a person analyzing a series of spin charts by eye is orders of magnitude less capable than the computerized analysis that Harman used. For instance, how do you weight 20 different individual peaks and valleys at various frequencies with different dB deviations and Qs? And do this over multiple charts. And do this with hundreds of speakers, trying to memorize which spin imperfections correlated with how you liked a particular speaker. And so forth. Completely beyond human capabilities.

It was not until 2016 that computers were able to beat top humans at the game of Go, a game that has more possible game states than there are atoms in the observable universe.

And not for lack of trying, either. Computers only became better than humans after Google poured millions into developing a revolutionary AI (which, in hindsight, was the grandfather of all the AI stuff today).

I like this discussion, but don’t underestimate humans. Finding patterns in enormous amounts of data seemingly “by intuition” is exactly what humans excel at.

And looking at dips and valleys in two FR plots (horizontal and vertical), a distortion plot, and a waterfall for good measure? That's not even that much data, and even less for headphones.

It does take a lot of experience to build that intuition (i.e., listening to speakers whose plots you know). But if you remember the three biggest weaknesses in the plots and compare them to your experience, and repeat that for 10 speakers, that should already do a lot.
 
Yes. That we don't have a standard to measure something yet doesn't mean that that thing necessarily doesn't exist.

Absence of proof doesn't mean proof of absence.

At least some users here like to think that soundstage doesn't exist. I prefer the Rtings approach, because they are trying to measure it, with more or less success.
I like your comment. I've read on RTings that the soundstage is created, not recreated, by the equipment. A measurement protocol would therefore pick it up as distortion.
 