
I cannot trust the Harman speaker preference score

Do you value the Harman quality score?

  • 100% yes

  • It is a good metric that helps, but that's all

  • No, I don't

  • I haven't decided


Results are only viewable after voting.

IPunchCholla

Major Contributor
Forum Donor
Joined
Jan 15, 2022
Messages
1,116
Likes
1,400
Well, yes, the problem is that this limitation leads to some rather bizarre results that don't always match reality.
In what way do they not match reality? Do we know people prefer B&Ws more than AirPods? I would think they would measure better, but I have no idea if more people would prefer the sound.
 

afranta

Member
Joined
Jan 14, 2022
Messages
6
Likes
8
Those who understand science would automatically be aware of the limitations of those findings based on the methodologies of the test.

The problem is that on this and other sites where science-literate people gather, there are also many others who join in who do not understand the limitations of science- and statistics-based testing, and who extrapolate limited conclusions into conclusive ones. And then the endless arguments begin...
I understand enough of the science to distinguish between preference scores and metrics that might usefully predict my preferences. Which is what I wrote.
 

Asinus

Member
Joined
Mar 30, 2020
Messages
75
Likes
90
The formula is a regression over the dataset available at the time the original paper was published. It was not meant to be a silver bullet for deciding which speaker is better for all eternity, but to show that objective measurements (the spinorama in particular) contain enough information to predict subjective preferences in double-blind tests consistently and reliably.
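
For reference, the published regression from Olive's 2004 paper, which as far as I know is also what Pierre's spinorama site implements, boils down to four metrics and five coefficients. A minimal sketch in Python (the extraction of the metrics from the spin is omitted):

Code:
# Olive (2004) preference regression: four metrics derived from the spinorama.
#   nbd_on  - narrow-band deviation of the on-axis response
#   nbd_pir - narrow-band deviation of the predicted in-room response
#   lfx     - log10 of the low-frequency extension in Hz (-6 dB point)
#   sm_pir  - smoothness (r^2 of a line fitted to the PIR)
def olive_preference_score(nbd_on: float, nbd_pir: float,
                           lfx: float, sm_pir: float) -> float:
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir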

I think we must be careful comparing Olive scores calculated from a HATS anechoic-chamber spin vs. a Klippel NFS vs. manual gated measurements when the scores are close, due to differences in smoothing, resolution and calibration. Scores calculated from Amir's measurements and from Harman's published data for the same speaker are not the same, even though the spins look pretty much identical apart from the smoothing in Harman's spin.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,722
Likes
241,618
Location
Seattle Area
Scores calculated from Amir's measurements and from Harman's published data for the same speaker are not the same, even though the spins look pretty much identical apart from the smoothing in Harman's spin.
This is clearly one of our problems. I am confident a different set of coefficients would have been computed if our data had been used instead of the original Harman measurements. We also have a far richer set of speakers measured now, which might again have led to changes in the formula. Alas, we don't have controlled listening-test data to compute the parameters ourselves.
 

jae

Major Contributor
Joined
Dec 2, 2019
Messages
1,208
Likes
1,510
While there may be good-sounding speakers that don't score as expected, it is fair to say that the scoring system does not let any bad-sounding or ill-preferred speakers near the top. If you are honest with yourself and can acknowledge this, then you must also admit the methodology is more "useful" than flawed, which is far from useless.

What is more important: the magnitude of the score itself, or understanding why something scored what it did with respect to the determinant parameters? I like using Pierre's site (https://pierreaubert.github.io/spinorama/scores.html), and in my opinion it is very easy to see why a seemingly good- or bad-sounding speaker did or did not get a certain score, and why all the top- and low-scoring speakers got the scores they did. If, subjectively, one of the parameters isn't that important to you for whatever reason, but you understand how it biases the score, you can still use the scoring system effectively along with the available data to judge speakers. A score alone can tell us quite a bit, but it is only one figure, which is its biggest limitation.

My time is finite and valuable. If I have to buy speakers, make a shortlist for audition, or make a recommendation to another person, I start at the top of this list without exception, then move down considering budget, listening setting/environment, and so on. The subtleties, the nitpicking and the waxing about personal preference when it comes to one versus the other(s) can come afterwards. I have no interest in playing the game of looking for proverbial "diamonds in the rough" in the middle of the chart that perform well in my special room at 11 o'clock on a Tuesday, when all the best speakers are, with a high degree of certainty, already on a list for me.
 

pierre

Addicted to Fun and Learning
Forum Donor
Joined
Jul 1, 2017
Messages
965
Likes
3,069
Location
Switzerland
This is clearly one of our problems. I am confident a different set of coefficients would have been computed if our data had been used instead of the original Harman measurements. We also have a far richer set of speakers measured now, which might again have led to changes in the formula. Alas, we don't have controlled listening-test data to compute the parameters ourselves.

The score is useful in a variety of ways:
- it takes into account the flatness of the on-axis, listening-window and predicted in-room responses
- it takes into account how much bass you have and some properties of that bass
- it is not an opinion

A lot of companies design for a good score (KEF, Genelec, etc.) because it matches what we know about precise reproduction of sound.

The score is not useful in some ways:
- it is about preference, which is statistically significant only across listeners, and people overestimate how far they are from the median
- it is not precise: +/-0.5 lands in the same bucket, statistically speaking
- it was derived from monopole bookshelf and tower speakers in far-field listening: it doesn't mean much for in-walls, omnis, surrounds ...
- it does not take SPL and distortion into account: it ranks large and small speakers the same if their spinoramas are close
- it is unclear whether preferences persist when the score is high and the SPL is good enough for the test. For example: put a Genelec and a KEF in the same room; both are very good after EQ and sound different, but the divergence in preferences decreases (my observation).

How I use the score:
- it must be over 6
- SPL and distortion must match my requirements for the room

Then I have a good speaker.
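
As a toy sketch of that filter (all names, scores and SPL figures below are made up for illustration):

Code:
# Hypothetical shortlisting: keep speakers scoring over 6 whose max SPL
# meets the room requirement. All data here is invented.
speakers = [
    {"name": "A", "score": 6.4, "max_spl_db": 105},
    {"name": "B", "score": 5.2, "max_spl_db": 110},
    {"name": "C", "score": 6.8, "max_spl_db": 96},
]
required_spl_db = 100  # hypothetical requirement for my room and listening level
shortlist = [s["name"] for s in speakers
             if s["score"] > 6 and s["max_spl_db"] >= required_spl_db]
print(shortlist)  # -> ['A']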
 

pierre

Addicted to Fun and Learning
Forum Donor
Joined
Jul 1, 2017
Messages
965
Likes
3,069
Location
Switzerland
This is clearly one of our problems. I am confident a different set of coefficients would have been computed if our data had been used instead of the original Harman measurements. We also have a far richer set of speakers measured now, which might again have led to changes in the formula. Alas, we don't have controlled listening-test data to compute the parameters ourselves.

I have been wondering for a long time whether the parameters would be very different or not. I think not. We would increase precision, but the score would still be strongly correlated with a flat PIR, a flat LW and as much bass as possible. SPL, distortion and possibly time alignment or group delay would be great to mix into the score.
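
If we ever did get controlled listening data, the refit itself would be straightforward. A sketch with synthetic stand-in data (the ratings below are fabricated for illustration, not real listening results):

Code:
import numpy as np

rng = np.random.default_rng(1)
n = 70                                    # roughly the size of the original dataset
metrics = rng.random((n, 4))              # stand-ins for NBD_ON, NBD_PIR, LFX, SM_PIR
weights = np.array([-2.49, -2.99, -4.31, 2.32])              # published coefficients
ratings = 12.69 + metrics @ weights + rng.normal(0, 0.3, n)  # fake listening ratings

# Ordinary least squares recovers a new intercept and four coefficients.
X = np.column_stack([np.ones(n), metrics])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print(coef)  # intercept followed by the four refit weights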
 
  • Like
Reactions: kuf

DanielT

Major Contributor
Joined
Oct 10, 2020
Messages
4,836
Likes
4,785
Location
Sweden - Слава Україні
Directivity and FR are perhaps a matter of opinion, subjectively what one likes. BUT distortion? Does anyone like speakers with audible distortion?

But I know someone who likes it:

"I'll teach it!" - and he slashed the speaker cone. It changed the sound of my guitar.

But that is a consciously created musical expression. That's another thing. In that case it is instead a question of a yummy distortion.

Edit:
FR can be changed according to taste (at least on axis, to a certain extent). Granted, a straight on-axis FR from the start makes it easier to EQ to taste later. But distortion cannot be changed. It is what it is.

Directivity you have to test for yourself. For example dipoles: some like them, others don't like them at all. Same thing with electrostatic speakers. That is a good thing, by the way. The world would be a sad place if everyone liked exactly the same things.
 
Last edited:

Frgirard

Major Contributor
Joined
Apr 2, 2021
Messages
1,737
Likes
1,043
How many of us have gone to a dealer and asked to listen to a single speaker?
“Ah, this one sounds better, I’ll have two to take away, please…”
Or brought a single speaker home to test?
It’s absurd.
No. It's a habit worth changing, especially since dealers will lend speakers.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,722
Likes
241,618
Location
Seattle Area
A lot of companies design for a good score (KEF, Genelec, etc.) because it matches what we know about precise reproduction of sound.
I have not heard of any of these companies or even Harman using the score for any speaker design. Where have you read this?

What companies are designing to are the fundamentals of a flat on-axis response and smooth directivity. That is very sound and ideal. But it doesn't equate to computing the score and trying to push the number up.
 

abdo123

Master Contributor
Forum Donor
Joined
Nov 15, 2020
Messages
7,447
Likes
7,956
Location
Brussels, Belgium
It’s not like we have 23 mathematical models to calculate speaker preference.

We work with what we have, imo. And with the ever-decreasing interest in audio reproduction, I doubt there will ever be research of that magnitude in the future.

If the choice is between nothing and the score, I would always pick the score, but at the same time we must understand its limitations. Dr. Olive himself said he would not trust a score difference below one point to be significant.
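
To make that caution concrete, a minimal sketch (the one-point threshold is Dr. Olive's reported rule of thumb; the sample scores are the ones from the opening post of this thread):

Code:
# Treat score differences below one point as a statistical tie.
def compare(a: float, b: float, threshold: float = 1.0) -> str:
    return "tie" if abs(a - b) < threshold else ("first" if a > b else "second")

print(compare(5.6, 5.5))  # KEF Reference 2C Meta vs Sonos Roam -> 'tie'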
 

jonfitch

Senior Member
Joined
Sep 2, 2019
Messages
481
Likes
534
The preference score is just a score of the frequency response. Obviously anyone listening to the Sonos Roam will hear its deficiencies compared to other speakers: the lack of dynamic punch from the DSP limiters and the extremely loud hiss, all of which detract from the actual perceived performance of the speaker.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Sorry what is absurd?

The formula, or more to the point, the methodology used to determine listening preference.

This is the study I was referring to:

[attached: two screenshots from the study]


Toole's reasoning, which I disagree with (for assessing stereo reproduction), is that mono is more discriminating because it produces larger differences.
In my view, the differences mostly come from differences in directivity (which affect perceived width and spaciousness) when listening to a single speaker (which is not the intended use in a stereo system), but also from inadequate positioning of the speaker (dipoles work with the wall behind them).
He is also a proponent of multi-channel, which in a way conflicts with those of us who use 2-channel only.

It appears that @tuga does not accept the reasoning for testing tonality and directional characteristics with a single loudspeaker rather than a stereo pair. That reasoning is complex and I understand it pretty well, but not well enough to be the one who explains it.

It is a matter of fitness for purpose.
Single-speaker assessment is effective for determining frequency-response characteristics and identifying issues/distortions, but one of the joys of stereo is the "three-dimensional" illusion, which you can't evaluate by listening to a single speaker.
On top of that, placing a single speaker in the middle of a room will not "show" how it (or a pair) will interact with the room when positioned as a pair (which depends on the speaker's directivity and also on the room-boundary absorption).

Other aspects have to do with having people assess speakers from a theatre-like set of seats, not always on-axis or on the correct axis for a particular speaker, and with a complete disregard for optimal positioning of speaker and listener in terms of bass, even though Toole maintains that bass is what people rate highest.
The shuffler is a good attempt but a flawed one, in my view. If blind testing does not use an adequate methodology, it will still produce the wrong results. Unbiased, yes, but wrong.

I have discussed this a lot in the forum; an example here:

https://www.audiosciencereview.com/...ts-of-room-reflections.13/page-13#post-447274

I started doing this in shops because of this site and let me tell you, evaluating a single speaker is far far more useful than evaluating in stereo.

Stereo evaluations hide speaker flaws; that's all they accomplish.

E: sry for double post, stupid phone.

See reply above.
 
Last edited:

TimVG

Major Contributor
Forum Donor
Joined
Sep 16, 2019
Messages
1,200
Likes
2,651
Just as a single curve cannot encompass loudspeaker performance, neither can a single number.
While it may separate the obviously flawed from the rest, the score is meaningless without the accompanying data, and ironically, once you have that data, the score itself becomes redundant. In other words, I have no use for it.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Just as a single curve cannot encompass loudspeaker performance, neither can a single number.
While it may separate the obviously flawed from the rest, the score is meaningless without the accompanying data, and ironically, once you have that data, the score itself becomes redundant. In other words, I have no use for it.

Agreed.

I would go a step beyond and say that a Spinorama is insufficient to characterise (audible) loudspeaker performance.
 

Frank Dernie

Master Contributor
Forum Donor
Joined
Mar 24, 2016
Messages
6,454
Likes
15,809
Location
Oxfordshire
The more I look into it, the less I can trust the Harman speaker quality score. IMHO it is a totally meaningless metric. I know the background; I read all the papers, even before Harman was involved. However, it works so badly that IMHO it is a useless metric.

Here are some scores that I took from this database to prove why.

The KEF Reference 2C Meta: 5.6
Sonos Roam: 5.5
JBL M2: 5.1

The 17cm x 6cm smart speaker Sonos Roam scores just shy of a true reference speaker from KEF, while a JBL flagship that weighs 60kg scores less than that smart speaker.

Do I have a case or not?
I agree.
Maybe useful in reducing the size of a short list, but not much more IME.
 

abdo123

Master Contributor
Forum Donor
Joined
Nov 15, 2020
Messages
7,447
Likes
7,956
Location
Brussels, Belgium
The more I look into it, the less I can trust the Harman speaker quality score. IMHO it is a totally meaningless metric. I know the background; I read all the papers, even before Harman was involved. However, it works so badly that IMHO it is a useless metric.

Here are some scores that I took from this database to prove why.

The KEF Reference 2C Meta: 5.6
Sonos Roam: 5.5
JBL M2: 5.1

The 17cm x 6cm smart speaker Sonos Roam scores just shy of a true reference speaker from KEF, while a JBL flagship that weighs 60kg scores less than that smart speaker.

Do I have a case or not?

Not sure if this has been brought up yet, but picking out these particular speakers is a bit of an extreme selection from the entire database.

The 2C Meta is a vendor measurement, so the score is inflated because the measurements are of lower resolution compared to the near-field scanner's.

The Sonos Roam honestly measures quite well, but its score is also inflated because it comes from a time-gated measurement (lower resolution).

The JBL M2 measurements are from a near-field scanner, but honestly the speaker just doesn't measure that well for a digital active design with tens of filters applied. It just shows how good the near-field scanner is compared to other measurement methods, in particular the 'Harman Audio Test System'.

[image: Spin - JBL M2 (missing on-axis data)]


[image: CEA2034 -- JBL M2 (Crown iTech 5000 Amp; M2 Base Configuration)]
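
As a toy illustration of that resolution effect (this is not Harman's or Amir's actual pipeline, and the band construction below only loosely follows Olive's NBD definition):

Code:
import numpy as np

def nbd(freqs, spl, f_lo=100.0, f_hi=12000.0):
    # Narrow-band deviation: mean absolute deviation from each band's mean,
    # averaged over half-octave bands (loosely following Olive 2004).
    n_bands = int(np.ceil(np.log2(f_hi / f_lo) * 2))
    edges = f_lo * 2.0 ** (0.5 * np.arange(n_bands + 1))
    devs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spl[(freqs >= lo) & (freqs < hi)]
        if band.size:
            devs.append(np.mean(np.abs(band - band.mean())))
    return float(np.mean(devs))

rng = np.random.default_rng(0)
freqs = np.geomspace(100.0, 12000.0, 500)
raw = 85.0 + rng.normal(0.0, 1.5, freqs.size)             # jagged fake response
smooth = np.convolve(raw, np.ones(21) / 21, mode="same")  # crude smoothing
# Smoothing removes narrow dips and peaks, so NBD drops and the
# regression score rises even though the "speaker" did not change.
print(nbd(freqs, raw), ">", nbd(freqs, smooth))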
 
Last edited:

TimVG

Major Contributor
Forum Donor
Joined
Sep 16, 2019
Messages
1,200
Likes
2,651
The JBL M2 measurements are from a near-field scanner, but honestly the speaker just doesn't measure that well for a digital active design with tens of filters applied. It just shows how good the near-field scanner is compared to other measurement methods, in particular the 'Harman Audio Test System'.

Well, for starters, in traditional measurement methods the oblique angles are missing entirely. Then there's the matter of a reference sample vs. production samples.

I would go a step beyond and say that a Spinorama is insufficient to characterise (audible) loudspeaker performance.

In a sense I agree. It's a great tool, better than any score or single curve, to weed out the bad from the good, but there is more to it than that. Otherwise their own blind-test champion would also have the best-looking spinorama, which it doesn't.
 

abdo123

Master Contributor
Forum Donor
Joined
Nov 15, 2020
Messages
7,447
Likes
7,956
Location
Brussels, Belgium
Well, for starters, in traditional measurement methods the oblique angles are missing entirely. Then there's the matter of a reference sample vs. production samples.

Can you explain what you mean by "the oblique angles are missing"?

Also, I wouldn't argue sample-to-sample variation when there are several resonances in the 200Hz to 500Hz region which, from my perspective, seem strategically smoothed out.
 
OP
sarumbear

Master Contributor
Forum Donor
Joined
Aug 15, 2020
Messages
7,604
Likes
7,324
Location
UK
Your first post states that it's useless, which does imply wanting to get rid of it. If that's not what you meant, then you should use a different word, because yes, you did say you want to get rid of it by calling it useless. Useless literally means it has no use. I don't get why you always try to hide behind semantics in so many of your posts. It's very weird and counterproductive.
I said "IMHO it is totally meaningless metric." You showed your ignorance by failing to read and understand that what I said then blamed me by hiding behind semantics.

But you are rude as well. What right do you have to tell a fellow member what words to use and what not to use?

Anyway, whatever. These threads never go anywhere useful.
Please take your clairvoyance abilities elsewhere and do not pollute this thread if you have nothing positive to add to it.
 