I cannot trust the Harman speaker preference score

mcdn · Mar 21, 2022

Nice graphs from @Bjorn . My conclusion from them would be that the in-room response is so room dominated that actually the Harman score is probably only relevant for purchasing decisions if you're going to apply room correction.

Vuki · Mar 21, 2022

Of course - why bother with listening test when we have preference prediction score.

Bjorn · Mar 21, 2022

mcdn said:
Nice graphs from @Bjorn . My conclusion from them would be that the in-room response is so room dominated that actually the Harman score is probably only relevant for purchasing decisions if you're going to apply room correction.

They would be more even and much closer to the estimated response if the speaker design was better, which is very possible and something I've shown and partially explained. And that's my whole point: The Spinorama ends up giving high score to fairly mediocre speaker designs and doesn't differentiate between these and better ones.

And that's only in regards to the frequency response. If we also look at the how the speaker works in regards to the time domain in the room (early arriving high gain specular energy) with its directivity, intermodulation distortion, dynamics, coherency, cabinet diffraction and cabinet resonances, etc. the data is even more lacking.

If people buy speakers based on this and thinking they are getting SOTA because of a high score, they are being misleaded.

BTW: One can't equalize these short comings with a great result. That's simply not possible. You can, however, treat the room acoustically to minimize the problems but it will never be like avoiding them in the first place.

Newman · Mar 21, 2022

Vuki said:
Of course - why bother with listening test when we have preference prediction score.

What listening test? 90% of the world does not have access to a local store where they can set up a listening test, and remember, even then, if you want to know which sound waves you prefer, the listening tests need to be controlled for non-sonic factors. Good luck with that.

Holmz · Mar 21, 2022

Newman said:
What listening test? 90% of the world does not have access to a local store where they can set up a listening test, and remember, even then, if you want to know which sound waves you prefer, the listening tests need to be controlled for non-sonic factors. Good luck with that.

^+^

I was at a couple of shops in SoCal in the last month, and one in Perth, WA Australia 6 months ago, as well as at a manufactures place.
But this is rare…
It always involves a plane trip to seek out equipment, and if we are on vacation, it is not easy to drag the Haus-Boss into every shop I see.
And then most everything is Best Buy, Harvey Norman, or Dick Smith shops… so one needs to know what they need to look for, and find the hidden doors.

A chart, or excel spreadsheet of speakers with ratings is a dandy start.
Follow that up with the distortion and compression, and maybe impulse response… Run then all through some cost filter… and one is then pretty well equiped by getting a list of hundreds down to half a dozen or less.

At the bottom end < $2000 it is difficult.
At the high end of 10k to 200k is also somewhat difficult.
But in the $2k - $10k range the metric give some pretty decent insight as to what to investigate further and what to not bother with (IME).

The cost of supporting Amir or Erin in a patron sense, is stupidly great value for people that are not camped out in NYC (or other HiFi Mecca sites).

Vuki · Mar 21, 2022

As if any loudspeaker manufacturer provides harman's score... It's calculated from the spinorama measurements, and once you are provided with spinorama results more often than not you also get a set of other valuable measurements (THD, dynamic compression etc.) and sometimes even listening impressions from trained listener

Guess all that gets you much more complete picture than single deceiving number.

sarumbear · Mar 21, 2022

Newman said:
The issue is not the research, not the scoring tool - it is the data quality and how it is used.

That is my point and why I started my OP. The issue is the score. I believe it was a good and honest start to solve a problem but due to total lack of follow up research, during the many decades that fallowed, it is now nothing but an interesting research. This is what I think and it is in no way disrespecting Dr. Olive or anyone else involved in the research.

BoredErica · Mar 21, 2022

What I learned is anti score people tend to be rude and super annoying, whether or not the score is good.

srrxr71 · Mar 21, 2022

The apple HomePod is a bad example as it EQs itself very differently as a single or as a pair. I bet it even changes it’s dispersion.

Floyd Toole · Mar 22, 2022

The origin of the "Harman Score" - it is not what you may have thought.

Truth is, we have Consumer Reports magazine to thank for motivating the correlation work.

Within Harman we created the spinorama and used the data for loudspeaker design and evaluation, interpreting it using educated eye/brain systems. For our design engineers this was enough; they could see more data than is presented in the final "spinorama" set of curves and knew what it all meant.

As I describe in Section 5.7 in the 3rd edition, the influential Consumer Reports loudspeaker ratings had long been a problem for the loudspeaker industry. Their ratings often did not make sense, even though they had the appearance of science, being based on anechoic measurements, psychoacoustic transformations and so on. One day I got a call from the Harman CEO saying, essentially, that they were paying me a lot of money to make Harman products good, so why does he see some low ratings for Harman products in the current issue of Consumer Reports? As corporate VP of Acoustical Engineering this was my responsibility. I had a long discussion with him and Sidney Harman, explaining that their “science” was wrong, and that our products were fine. I also explained that they had been told, by me among others, that their process was flawed but did nothing about it. They still sold magazines and their magazines influenced consumers. In fact, two CR engineers visited me at the NRCC in 1985 and I showed them that in one of their magazines there was a negative 0.7 correlation between their scores and the results of my double-blind listening tests – customers should turn the ratings page upside down to get closer to the truth. They did nothing. They did no - NO - controlled listening tests, only 1/3-octave sound power measurements and calculations to arrive at an accuracy score. I visited their facility shortly afterwards and saw where and how their science was developed - not impressive.

Challenged with finding a way to put this right, we agreed that it was time to put some effort into definitively proving that there was a better way to rate loudspeakers, and that Harman would spend the money necessary to do it. Sean and I had long discussions, decided that we had a lot of information about the meaning of measurements, and he set off on a new project that culminated in his two benchmark papers:

Olive, S.E. (2004a). “A multiple regression model for predicting loudspeaker preference using objective measurements: part 1 – listening test results”, 116th Convention, Audio Eng. Soc., Preprint 6113.

Olive, S.E. (2004b). “A multiple regression model for predicting loudspeaker preference using objective measurements: part 2 – development of the model”, 117th Convention, Audio Eng. Soc., Preprint 6190.

The 13 bookshelf loudspeakers in the well-controlled test were those evaluated by CR in their review. Sean’s calculated sound-quality ratings correlated with double-blind subjective ratings with a correlation coefficient of 0.995 (perfection!) with a high significance p < 0.0001. The Consumer Reports ratings had a correlation coefficient of – 0.22 (low and negative) with a low significance of p = 0.46. The highest rated product in the Harman subjective evaluations was in fact the lowest "calculated" rating on the Consumer Reports scale. In essence their ratings were slightly negatively correlated and substantially random. When they read the papers they stopped publishing loudspeaker reviews. They spent some effort trying to upgrade their methods, adding something distinctive, but eventually abandoned it. They could not simply accept the word of a mere "manufacturer".

So, the “Harman score” has served several purposes:

First, the motivation for doing it was satisfied; there were no more misleading Consumer Reports loudspeaker reviews. I would like to think that this was a favor to humankind.

Second, the value of spinorama data was absolutely confirmed, which was gratifying to us and reassurance to others contemplating using it in their own product development and product evaluations. Eventually, Harman competitors - at least some of them must have done some subjective/objective comparisons - concluded that the spinorama was worthy of being incorporated into an industry standard. It is now included in professional loudspeaker measurement hardware and software.

Third, the spinorama is now appearing in internet forums and some manufacturer literature as a performance descriptor. This is an infinite improvement on 27 Hz to 23 kHz +/- 3 dB, or a solitary on-axis curve of questionable origin which is still too common. Marketing people have low expectations of consumers – instead of giving them information that all of them might not understand, they give information that is worthless even to the educated.

Harman has not promoted the score as anything more than is described above, so I find it disconcerting that some are saying it is “worthless” – to whom and for what? Harman has provided spinorama data on request for many products, but not single-number ratings that I am aware of. It is neither a marketing item, or an engineering item – but it has become an “internet” item. As far as I am concerned it is, as the British would say, “a storm in a teacup”. Not worth the energy being expended on it.

As a combination metric embracing neutrality/absence of resonances, low-frequency extension, and smoothness of directivity, it is best at identifying the very best and the very worst products, but eye/brain analysis of a spinorama is necessary to provide clarity in products that are not well behaved in all respects. If one adds personal biases such as preferences for wide or narrow dispersion it is obvious that the rating cannot provide the required guidance. There can be identically rated products with different overall directivities. At the moment, there is only loose anecdotal data on listener preferences for loudspeakers with substantially different directivities in environments of different sizes and acoustical properties. Such discussions are truly debates among people with individual playback systems in different rooms, yielding different opinions. It feeds endless commentary on forums. More real research could add statistical data on customer preferences with their preferred program material in their kinds of rooms, but such an effort is prohibitively expensive. Is there a reason to expect additional customer satisfaction, or profit, from such an effort? We must wait and see. All of this for stereo, a system known to be significantly compromised from the get-go.

Obviously, the simple rating does not take into account non-linear behavior or power compression – those are research topics in their own right. There are no definitive metrics at the moment, but measurements are done, and many people feel gratified to see conventional data even if it is questionable or, some of it, arguably meaningless for this purpose.

In conclusion, the single-number “Harman score” has turned out to be a major component in the process of improving our trust in and understanding of anechoic measurements. It combines evaluations of multiple factors so it is necessarily ambiguous in cases where customers have particular interests in, for example, directivity, or bass extension. The information is in the spinorama, but it requires manual – eye/brain – interpretation. The somewhat argumentative experience this forum has been through could have been avoided, which is a pity, except that I hope there has been some added perspective.

Cheers to all,
Floyd.

Floyd Toole · Mar 22, 2022

Bjorn said:
It's coming from an understanding of how a speaker operates in regards to boundaries. With these designs and directivity, it's unavoidable. I have touched on this before in the thread and why the Harman score is quite misleading.

There are of course in some cases where the dips will be less because of peaks in the same area or the opposite, but generally you'll se this trait. As mentioned previously, google the in room response of many of the high score speakers. There are few examples below from Erin's reviews but I encourage you too look for a lot more examples to get the general picture.

View attachment 193885

View attachment 193886
View attachment 193887

View attachment 193888

While the estimated response isn't totally off, we can see that it deviates quite a bit. It's the nature of how the speaker designs work.

Thank you for displaying these curves Bjorn. They look totally familiar, in that the predicted room curve is a good match above about 500 Hz, and is corrupted by room interactions at lower frequencies. All of this is discussed in AES papers dating from the mid 1980s, and my books. Obviously all setups and rooms yield different curves at lower frequencies, so individual attention is required for each installation if maximum performance is expected. Listeners have a powerful ability to adapt to, and “listen through” rooms to the extent that, in a given room, they are able to rate and rank the inherent sound quality of loudspeakers in a relative sense – the good ones still win, the bad ones still lose. However, in setting up one’s personal system the job is not done. To maximize the satisfaction from any loudspeaker, whatever its inherent performance, the huge influence of the room at lower frequencies must be addressed, and we know that bass is a substantial factor (30 %) in our perception of sound quality.

At frequencies below transition/Schroeder room resonances must be tamed, and if done successfully, the sound quality will improve, and it will be improved for multiple listeners, not just in the sweet spot. Multiple subwoofers in a bass managed system is the best solution. Bass management high-pass filters the satellite (bookshelf?) loudspeakers allowing them to play louder. This is inconvenient or impossible for dedicated two-channel listeners, but bass management is included in all multichannel processors. This alone might be motivation to have the option of occasional multichannel experiences or upmixing – 5.1 systems can be impressive; Atmos is overkill, but is commercially driven by you-know-who.

Notions that multichannel systems are incapable of superb two-channel reproduction are wrong. With bass management and competently designed subwoofer systems a system based on good bookshelf units can be seriously good. For me, video music concerts of all musical genres can provide high quality entertainment – there are many mediocre examples, unfortunately, but the best of them have excellent multichannel sound tracks putting you at the performance – and I like seeing the performers.

Still not at all as good as the live LA Phil concert we attended recently.

Pdxwayne · Mar 22, 2022

Floyd Toole said:
The origin of the "Harman Score" - it is not what you may have thought.

Truth is, we have Consumer Reports magazine to thank for motivating the correlation work.

Within Harman we created the spinorama and used the data for loudspeaker design and evaluation, interpreting it using educated eye/brain systems. For our design engineers this was enough; they could see more data than is presented in the final "spinorama" set of curves. As I describe in Section 5.7 in the 3rd edition, the influential Consumer Reports loudspeaker ratings had long been a problem for the loudspeaker industry. Their ratings often did not make sense, even though they had the appearance of science, being based on anechoic measurements, psychoacoustic transformations and so on. One day I got a call from the Harman CEO saying, essentially, that they were paying me a lot of money to make Harman products good, so why does he see some low ratings for Harman products in the current issue of Consumer Reports? As corporate VP of Acoustical Engineering this was my responsibility. I had a long discussion with him and Sidney Harman, explaining that their “science” was wrong, and that our products were fine. I also explained that they had been told, by me among others, that their process was flawed but did nothing about it. They still sold magazines and their magazines influenced consumers. In fact, two CR engineers visited me at the NRCC in 1985 and I showed them that in one of their magazines there was a negative 0.7 correlation between their scores and the results of my double-blind listening tests – customers should turn the ratings page upside down to get closer to the truth. They did nothing. They did no - NO - controlled listening tests, only 1/3-octave sound power measurements and calculations to arrive at an accuracy score.

Challenged with finding a way to put this right, we agreed that it was time to put some effort into definitively proving that there was a better way to rate loudspeakers, and that Harman would spend the money necessary to do it. Sean and I had long discussions, decided that we had a lot of information about the meaning of measurements, and he set off on a new project that culminated in his two benchmark papers:

Olive, S.E. (2004a). “A multiple regression model for predicting loudspeaker preference using objective measurements: part 1 – listening test results”, 116th Convention, Audio Eng. Soc., Preprint 6113.

Olive, S.E. (2004b). “A multiple regression model for predicting loudspeaker preference using objective measurements: part 2 – development of the model”, 117th Convention, Audio Eng. Soc., Preprint 6190.

The 13 bookshelf loudspeakers in the well-controlled test were those evaluated by CR in their review. Sean’s calculated sound-quality ratings correlated with double-blind subjective ratings with a correlation coefficient of 0.995 (perfection!) with a high significance p < 0.0001. The Consumer Reports ratings had a correlation coefficient of – 0.22 (low and negative) with a low significance of p = 0.46. The highest rated product in the Harman subjective evaluations was in fact the lowest "calculated" rating on the Consumer Reports scale. In essence their ratings were slightly negatively correlated and substantially random. When they read the papers they stopped publishing loudspeaker reviews. They spent some effort trying to upgrade their methods, adding something distinctive, but eventually abandoned it. They could not simply accept the word of a mere "manufacturer".

So, the “Harman score” has served several purposes:

First, the motivation for doing it was satisfied; there were no more misleading Consumer Reports loudspeaker reviews. I would like to think that this was a favor to humankind.

Second, the value of spinorama data was absolutely confirmed, which was gratifying to us and reassurance to others contemplating using it in their own product development and product evaluations. Eventually, Harman competitors - at least some of them must have done some subjective/objective comparisons - concluded that the spinorama was worthy of being incorporated into an industry standard. It is now included in professional loudspeaker measurement hardware and software.

Third, the spinorama is now appearing in internet forums and some manufacturer literature as a performance descriptor. This is an infinite improvement on 27 Hz to 23 kHz +/- 3 dB, or a solitary on-axis curve of questionable origin which is still too common. Marketing people have low expectations of consumers – instead of giving them information that all of them might not understand, they give information that is worthless even to the educated.

Harman has not promoted the score as anything more than is described above, so I find it disconcerting that some are saying it is “worthless” – to whom and for what? Harman has provided spinorama data on request for many products, but not single-number ratings that I am aware of. It is neither a marketing item, or an engineering item – but it has become an “internet” item. As far as I am concerned it is, as the British would say, “a storm in a teacup”. Not worth the energy being expended on it.

As a combination metric embracing neutrality/absence of resonances, low-frequency extension, and smoothness of directivity, it is best at identifying the very best and the very worst products, but eye/brain analysis of a spinorama is necessary to provide clarity in products that are not well behaved in all respects. If one adds personal biases such as preferences for wide or narrow dispersion it is obvious that the rating cannot provide the required guidance. There can be identically rated products with different overall directivities. At the moment, there is only loose anecdotal data on listener preferences for loudspeakers with substantially different directivities in environments of different sizes and acoustical properties. Such discussions are truly debates among people with individual playback systems in different rooms, yielding different opinions. It feeds endless commentary on forums. More real research could add statistical data on customer preferences with their preferred program material in their kinds of rooms, but such an effort is prohibitively expensive. Is there a reason to expect additional customer satisfaction, or profit, from such an effort? We must wait and see. All of this for stereo, a system known to be significantly compromised from the get-go.

Obviously, the simple rating does not take into account non-linear behavior or power compression – those are research topics in their own right. There are no definitive metrics at the moment, but measurements are done, and many people feel gratified to see conventional data even if it is questionable or, some of it, arguably meaningless for this purpose.

In conclusion, the single-number “Harman score” has turned out to be a major component in the process of improving our trust in and understanding of anechoic measurements. It combines evaluations of multiple factors so it is necessarily ambiguous in cases where customers have particular interests in, for example, directivity, or bass extension. The information is in the spinorama, but it requires manual – eye/brain – interpretation. The somewhat argumentative experience this forum has been through could have been avoided, which is a pity, except that I hope there has been some added perspective.

Cheers to all,
Floyd.

I always wonder why CR stopped rating speakers.....You caused it!

: )

Newman · Mar 22, 2022

It’s all in the book!

sarumbear · Mar 22, 2022

Floyd Toole said:
Harman has provided spinorama data on request for many products, but not single-number ratings that I am aware of. It is neither a marketing item, or an engineering item – but it has become an “internet” item. As far as I am concerned it is, as the British would say, “a storm in a teacup”. Not worth the energy being expended on it.

As a combination metric embracing neutrality/absence of resonances, low-frequency extension, and smoothness of directivity, it is best at identifying the very best and the very worst products, but eye/brain analysis of a spinorama is necessary to provide clarity in products that are not well behaved in all respects. If one adds personal biases such as preferences for wide or narrow dispersion it is obvious that the rating cannot provide the required guidance.
[...]
In conclusion, the single-number “Harman score” has turned out to be a major component in the process of improving our trust in and understanding of anechoic measurements. It combines evaluations of multiple factors so it is necessarily ambiguous in cases where customers have particular interests in, for example, directivity, or bass extension. The information is in the spinorama, but it requires manual – eye/brain – interpretation.

I have edited my post using the words of @Floyd Toole to clarify what I had said in my OP. Italic text are new new.

The more I look into it the less I can trust the Harman speaker quality score. IMHO it is a totally meaningless metric by itself. I know the background, I read all the papers even before Harman was involved, and their patent. However, it ~~works~~ lacks clarity so badly that IMHO it is a meaningless metric to classify speakers other than identifying the very best and the very worst products.

fineMen · Mar 22, 2022

sarumbear said:
I have edited my post using the words of @Floyd Toole to clarify what I had said in my OP. Italic text are new new.

The more I look into it the less I can trust the Harman speaker quality score. IMHO it is a totally meaningless metric by itself. I know the background, I read all the papers even before Harman was involved, and their patent. However, it ~~works~~ lacks clarity so badly that IMHO it is a meaningless metric to classify speakers other than identifying the very best and the very worst products.

=> Level of scales

I cannot emphasise enough that one must not ignore the very basics of addressing test subjects regarding their opinions. Before the infamous 'algorithms' on 'social networks' took over, some scientific analysis of the then new, now outdated field was invested.

I've not seen anybody mentioning the 'level of scales' even. So "the score" wasn't categorised to begin with. Subsequently it all became chaotic, barely causal heat, just as on the unreasonable surface of a dark star.

Good to know that Floyd was able stop misusing scientific results in inappropriate ways. "The score" was, as I read it, just a proof of the model, not the other way round. And the model was to approve the significant parameters to be optimised by a well-meaning manufacturer.

At least but not to the last we've got an accepted standard. That is something to write home about, dear pals!

Better think of equalising Your speakers. In case of lacking interest (!) get the help of competent personnel in, means a trustworthy dealer. To do even better than any score could possibly predict.

srrxr71 · Mar 22, 2022

Trustworthy dealer is like an oxymoron.

Vuki · Mar 22, 2022

fineMen said:
=> Level of scales

...

At least but not to the last we've got an accepted standard. That is something to write home about, dear pals!

...

What accepted standard?

Floyd Toole · Mar 22, 2022

ANSI-CTA-2034 "Standard Method of Measurement for In-Home Loudspeakers"

Vuki · Mar 22, 2022

Floyd Toole said:
ANSI-CTA-2034 "Standard Method of Measurement for In-Home Loudspeakers"

I know, but unfortunately many manufacturers ignore it.

Cote Dazur · Mar 22, 2022

It is not every day that one has the opportunity to write a post, in a thread where Floy Toole is writing as well, so I could not resist.

To be in presence of so much knowledge and wisdom is a privilege. I only recently read the book and watched the video, but have known about the influence on home music reproduction for a very long time.

Science, in any field, has limitations, to be effective in using science, we need to understand those limitations. It is a tool, how we use it is as important as what it can do.

The major findings from Floyd Toole research are lighting up the way for us to find a better path, it is up to us make the most of what we are provided. Giving it more power than it has is as bad, maybe even worst than not recognizing its virtues.

Is the rating reflecting all the virtues, I don't know, probably not. Maybe we need more data before we can do a rating, or read the data more in depth before issuing a rating.

On the face of it, the rating does not look like a major tool that I can use in my quest for better speakers for my 2 speakers stereo home music reproduction system.

I cannot trust the Harman speaker preference score

Do you value the Harman quality score?

100% yes

It is a good metric that helps, but that's all

No, I don't

I don't have a decision

Addicted to Fun and Learning

Senior Member

Major Contributor

Major Contributor

Major Contributor

Senior Member

Master Contributor

Addicted to Fun and Learning

Major Contributor

Senior Member

Senior Member

Major Contributor

Major Contributor

Master Contributor

Major Contributor

Major Contributor

Senior Member

Senior Member

Senior Member

Addicted to Fun and Learning

Similar threads