The origin of the "Harman Score" - it is not what you may have thought.
Truth is, we have Consumer Reports magazine to thank for motivating the correlation work.
Within Harman we created the spinorama and used the data for loudspeaker design and evaluation, interpreting it using educated eye/brain systems. For our design engineers this was enough; they could see more data than is presented in the final "spinorama" set of curves and knew what it all meant.
As I describe in Section 5.7 in the 3rd edition, the influential Consumer Reports loudspeaker ratings had long been a problem for the loudspeaker industry. Their ratings often did not make sense, even though they had the appearance of science, being based on anechoic measurements, psychoacoustic transformations and so on. One day I got a call from the Harman CEO saying, essentially, that they were paying me a lot of money to make Harman products good, so why was he seeing low ratings for Harman products in the current issue of Consumer Reports? As corporate VP of Acoustical Engineering this was my responsibility. I had a long discussion with him and Sidney Harman, explaining that CR's “science” was wrong, and that our products were fine. I also explained that CR had been told, by me among others, that their process was flawed, but had done nothing about it. They still sold magazines, and their magazines influenced consumers. In fact, two CR engineers visited me at the NRCC in 1985 and I showed them that in one of their magazines there was a negative 0.7 correlation between their scores and the results of my double-blind listening tests – customers should have turned the ratings page upside down to get closer to the truth. They did nothing. They did no – NO – controlled listening tests, only 1/3-octave sound power measurements and calculations to arrive at an accuracy score. I visited their facility shortly afterwards and saw where and how their science was developed – not impressive.
Challenged with finding a way to put this right, we agreed that it was time to put some effort into definitively proving that there was a better way to rate loudspeakers, and that Harman would spend the money necessary to do it. Sean and I had long discussions, decided that we had a lot of information about the meaning of measurements, and he set off on a new project that culminated in his two benchmark papers:
Olive, S.E. (2004a). “A multiple regression model for predicting loudspeaker preference using objective measurements: part 1 – listening test results”, 116th Convention, Audio Eng. Soc., Preprint 6113.
Olive, S.E. (2004b). “A multiple regression model for predicting loudspeaker preference using objective measurements: part 2 – development of the model”, 117th Convention, Audio Eng. Soc., Preprint 6190.
The 13 bookshelf loudspeakers in the well-controlled test were those evaluated by CR in their review. Sean’s calculated sound-quality ratings correlated with the double-blind subjective ratings with a correlation coefficient of 0.995 (near perfection!) at high significance, p < 0.0001. The Consumer Reports ratings had a correlation coefficient of –0.22 (low and negative) at low significance, p = 0.46. The highest-rated product in the Harman subjective evaluations was in fact the lowest-rated on the Consumer Reports scale. In essence their ratings were slightly negatively correlated and substantially random. When they read the papers they stopped publishing loudspeaker reviews. They spent some effort trying to upgrade their methods, adding something distinctive, but eventually abandoned it. They could not simply accept the word of a mere "manufacturer".
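For readers unfamiliar with the statistic being quoted: the correlation coefficient r is the standard Pearson measure of linear association between two sets of ratings, ranging from –1 (perfectly opposed, the "turn the page upside down" case) through 0 (random) to +1 (perfect agreement). A minimal pure-Python sketch (the function name is mine, not from the papers; the quoted p-values additionally come from a significance test not shown here):

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length
    sequences of ratings. Returns a value in [-1, +1]."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance term and the two standard-deviation terms.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

So r = 0.995 means the predicted and measured preference ratings ordered the loudspeakers almost identically, while r = –0.22 means the CR scores were weakly anti-correlated with what listeners actually preferred.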
So, the “Harman score” has served several purposes:
First, the motivation for doing it was satisfied; there were no more misleading Consumer Reports loudspeaker reviews. I would like to think that this was a favor to humankind.
Second, the value of spinorama data was absolutely confirmed, which was gratifying to us and reassuring to others contemplating using it in their own product development and product evaluations. Eventually, Harman competitors – at least some of them must have done some subjective/objective comparisons – concluded that the spinorama was worthy of being incorporated into an industry standard. It is now included in professional loudspeaker measurement hardware and software.
Third, the spinorama is now appearing in internet forums and some manufacturer literature as a performance descriptor. This is an infinite improvement on "27 Hz to 23 kHz ±3 dB", or a solitary on-axis curve of questionable origin, which is still too common. Marketing people have low expectations of consumers – instead of giving them information that not all of them might understand, they give information that is worthless even to the educated.
Harman has not promoted the score as anything more than is described above, so I find it disconcerting that some are saying it is “worthless” – to whom and for what? Harman has provided spinorama data on request for many products, but not single-number ratings that I am aware of. It is neither a marketing item nor an engineering item – but it has become an “internet” item. As far as I am concerned it is, as the British would say, “a storm in a teacup”. Not worth the energy being expended on it.
As a combination metric embracing neutrality/absence of resonances, low-frequency extension, and smoothness of directivity, it is best at identifying the very best and the very worst products, but eye/brain analysis of a spinorama is necessary to provide clarity for products that are not well behaved in all respects. If one adds personal biases, such as preferences for wide or narrow dispersion, it is obvious that the rating alone cannot provide the required guidance. There can be identically rated products with different overall directivities. At the moment, there is only loose anecdotal data on listener preferences for loudspeakers with substantially different directivities in environments of different sizes and acoustical properties. Such discussions are truly debates among people with individual playback systems in different rooms, yielding different opinions, and they feed endless commentary on forums. More real research could add statistical data on customer preferences with their preferred program material in their kinds of rooms, but such an effort is prohibitively expensive. Is there a reason to expect additional customer satisfaction, or profit, from such an effort? We must wait and see. All of this for stereo, a system known to be significantly compromised from the get-go.
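For concreteness, the combination metric being discussed is the linear regression model from Olive (2004b). As commonly cited, it combines four spinorama-derived metrics: narrow-band deviations of the on-axis response (NBD_ON) and of the predicted in-room response (NBD_PIR), low-frequency extension (LFX, on a log scale), and smoothness of the predicted in-room response (SM_PIR). A minimal sketch under that assumption – the coefficients are the published ones as widely reproduced, but computing the four input metrics from a spinorama is a separate exercise not shown here:

```python
def olive_preference_rating(nbd_on, nbd_pir, lfx, sm_pir):
    """Predicted preference rating from four spinorama-derived metrics.

    nbd_on  -- narrow-band deviation of the on-axis curve (smaller is better)
    nbd_pir -- narrow-band deviation of the predicted in-room response
    lfx     -- log10 of the low-frequency extension frequency in Hz
               (smaller, i.e. deeper bass, is better)
    sm_pir  -- smoothness of the predicted in-room response (larger is better)

    Coefficients as commonly cited from Olive (2004b), Preprint 6190.
    """
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir
```

The signs make the point of the surrounding paragraph: the model rewards flat, smooth responses and deep bass extension, but two loudspeakers with quite different directivity patterns can land on the same number, which is why eye/brain inspection of the full spinorama remains necessary.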
Obviously, the simple rating does not take into account non-linear behavior or power compression – those are research topics in their own right. There are no definitive metrics at the moment, but measurements are done, and many people feel gratified to see conventional data even if it is questionable or, in some cases, arguably meaningless for this purpose.
In conclusion, the single-number “Harman score” has turned out to be a major component in the process of improving our trust in and understanding of anechoic measurements. It combines evaluations of multiple factors so it is necessarily ambiguous in cases where customers have particular interests in, for example, directivity, or bass extension. The information is in the spinorama, but it requires manual – eye/brain – interpretation. The somewhat argumentative experience this forum has been through could have been avoided, which is a pity, except that I hope there has been some added perspective.
Cheers to all,
Floyd.