"this comparison tells us exactly nothing of use about speaker performance or listener preference for different speaker response profiles."
I agree wholeheartedly with this part of your comment, but you seem to be reading more intent into my series of questions than was intended. My questions are meant to ensure the Preference Score is understood comprehensively, with science and not hand-waving.
If the preference score were all that was needed to capture speaker performance, Dr. Toole wouldn't have had to write a 490-page, 2.58-pound book on sound reproduction.
When recommending a new computer or laptop to friends or family, if I don't want to be life-long tech support for that person, I'll make a conservative/safe recommendation even though it may not be the right choice for me personally. Along those same lines, when recommending a speaker to a friend, going with something that has a high preference score generally ensures that you'll have a happy friend. When designing a speaker for the market, going with something that has a high preference score generally ensures that you'll have favorable reviews.
Companies like Bowers & Wilkins, Dynaudio, and Magnepan don't design to the preference score "standard." You could argue that B&W and Dynaudio have seen ownership changes over the years, but you cannot count this as a negative when Harman has also seen ownership changes. You have to be an attractive acquisition target to be acquired; no one is eager to buy the assets of OceanGate. Magnepan has been in continuous operation since 1969.
To address the question of: "what is the point of producing products that do not meet the standard?"
My answers to my own questions:
How many people were used to create the standard? Is this representative of the membership here? What does the standard say? (What is the effect size? MCID?)
The preference score was designed around a study with a sample of 42 listeners, of which only 28 were actually used for the development of the preference score.
Subsequent studies conducted by Harman have shown that it's generally still valid. But it doesn't account for outliers.
A fair question is why 14 of the original study group were excluded. These were listeners who had "moderate or high judgment variability," but these are customers too. Was the variability between identical speakers playing identical music, where one day the listener liked a speaker and another day did not? Or was the variability that they really liked a speaker with some content and really didn't like it with other content? It's not well defined.
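To make the distinction concrete, here is a minimal sketch of the two readings of "judgment variability." The numbers are entirely invented for illustration; the point is only that session-to-session spread and content-dependent spread are different statistics, and the published description doesn't tell us which one disqualified those 14 listeners.

```python
from statistics import stdev

# Hypothetical fidelity ratings (0-10) for ONE listener and ONE speaker,
# rated three times (three sessions) on each of two tracks.
ratings_by_track = {
    "Track A": [7.5, 4.0, 7.0],  # same speaker, same music, three sessions
    "Track B": [7.2, 4.2, 6.8],
}

# Reading 1 -- day-to-day variability: spread of repeated ratings of the
# same speaker playing the same program material.
day_to_day = {t: stdev(r) for t, r in ratings_by_track.items()}

# Reading 2 -- content-dependent variability: spread of the listener's
# *average* rating across different program material.
track_means = [sum(r) / len(r) for r in ratings_by_track.values()]
content_spread = stdev(track_means)

print(day_to_day)      # large -> inconsistent from session to session
print(content_spread)  # large -> preference depends on the music played
```

With these made-up numbers the listener is erratic session to session but consistent across content; a listener could just as easily show the reverse pattern, and the two cases arguably deserve different treatment.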
What's so special about those 28 people? "These were the listeners whose 'fidelity ratings' showed the greatest consistency within individuals and the closest agreement across the group of individuals." That creates potential for selection bias if the audiophiles you are testing are those who a priori prefer, and listen for, smoothness in frequency response.
That is, if John Doe liked speaker A >> B > C > D, William Doe liked A >> B > C >> D, Jane Doe liked B >> C >> A > D, and Janet Doe liked B >> D > C >> A, they wouldn't have agreement across the group.
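"Agreement across the group" can be quantified with a rank correlation. Here is a small sketch using the four hypothetical listeners above and a hand-rolled Kendall tau (no ties), where +1 means identical ordering and -1 means reversed ordering:

```python
from itertools import combinations

# Rankings from the hypothetical example (1 = most preferred).
rankings = {
    "John":    {"A": 1, "B": 2, "C": 3, "D": 4},
    "William": {"A": 1, "B": 2, "C": 3, "D": 4},
    "Jane":    {"B": 1, "C": 2, "A": 3, "D": 4},
    "Janet":   {"B": 1, "D": 2, "C": 3, "A": 4},
}

def kendall_tau(r1, r2):
    """Kendall rank correlation for tie-free rankings over the same items."""
    pairs = list(combinations(r1.keys(), 2))
    # +1 for each concordantly ordered pair, -1 for each discordant pair.
    score = sum(
        1 if (r1[a] - r1[b]) * (r2[a] - r2[b]) > 0 else -1
        for a, b in pairs
    )
    return score / len(pairs)

for l1, l2 in combinations(rankings, 2):
    print(l1, l2, round(kendall_tau(rankings[l1], rankings[l2]), 2))
```

John and William agree perfectly (tau = 1.0), while John and Janet are negatively correlated (tau about -0.33). A panel selected for high mutual tau is, by construction, a panel that shares a taste profile.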
"Not all listeners auditioned all loudspeakers and not all loudspeakers were included in each experiment."
Maybe the judgment variability only matters for certain speakers -- but then the whole group was thrown out.
Dr. Toole himself acknowledges that Bose, Briggs, and Harwood have done quality research showing that reverberant speakers camouflage peaks in response and disguise the audible effects of technical imperfections, and how rarely low-Q resonances are actually audible.
In the worst-case scenario, the 14 who were excluded should have been included. Even then, the majority of people's preferences align with the Preference Score as the standard of choice, and following it is the logical business decision as well. However, it also means as much as 1/3rd of the market may end up preferring something different.
The companies making products that "do not meet the standard" are going after this 1/3rd of the market.
We can actually go further with Dr. Toole's direct statements:
"There is a trade-off, it seems, between the loudspeaker directivity required to preserve the illusion of truly compact sound sources in specifically localizable stereo images and that required to give the listener the impression of being immersed in another acoustic space."
The preference score helps to guide the "ideal" balance in this trade-off for the majority of people who had agreement across the group, and for the majority of people Harman has sampled since. But it is well within science to recognize that individuals may lean one way or the other in their preference.
In one extreme direction of the trade-off, you have Bose. His research showed that the "spatial property of the sound incident upon a listener is a parameter ranking in importance with the frequency spectrum of the incident energy for the subjective appreciation of music." In the other direction, you have headphones which offer vanishingly low distortion and the ability to fine tune the frequency response but eliminate the impression of being immersed in another acoustic space.
Why didn’t the highest numerical scoring product win?
The preference score is a predictor of speaker likability based on monaural listening tests, using a standardized music selection in a standardized room, with a sub-selection of customers whose speaker-preference rankings were consistent as a group and who represent at least a 2/3rds majority of all customers.
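For reference, the commonly cited form of Olive's (2004) model, as used by sites that compute scores from spinorama data, regresses preference on four measurement-derived terms. The coefficients below are as commonly published; the input values in the example are made up purely for illustration:

```python
import math

def olive_preference_score(nbd_on, nbd_pir, lfx_hz, sm_pir):
    """Commonly cited form of Olive's (2004) preference model.

    nbd_on  -- narrow-band deviation of the on-axis response (dB)
    nbd_pir -- narrow-band deviation of the predicted in-room response (dB)
    lfx_hz  -- low-frequency extension in Hz (the model uses log10 of it)
    sm_pir  -- smoothness (regression r^2) of the predicted in-room response
    """
    return (12.69
            - 2.49 * nbd_on
            - 2.99 * nbd_pir
            - 4.31 * math.log10(lfx_hz)
            + 2.32 * sm_pir)

# Illustrative, invented inputs for a well-behaved speaker:
print(round(olive_preference_score(0.3, 0.2, 40.0, 0.9), 2))  # -> 6.53
```

Note what is and isn't in that function: deviations and smoothness of measured responses, and bass extension. Nothing about the room, the listener, or the program material survives into the formula, which is exactly why it predicts the panel average rather than any individual.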
My recordings didn't reflect this. My microphone offers "perfect" consistency within the "individual"; it doesn't hear differently from one day to the next. It just doesn't hear what you hear, and once you change rooms or program material, the results can vary.
The preference score is a very powerful, industry-changing standard. But it's not an immutable law of the Universe.
None of what I'm saying is anti-science or anti-preference score.
First of all, spinorama is not Harman's score. Harman's score is a recommendation for evaluating the objectively measured data in regard to human perception.
I guess I don't understand your comment that "This product is again a deviation from the standard, whose basis I see in a certain, quite simple, inability."
The spinorama is a consistent method of measurement. I don't think Dynaudio or any other company would say that the spinorama data itself is in question. The only question is what your "goals" should be for the data collected by the spinorama measurement process. Dr. Toole's preference score is a majority-validated, science-driven answer to the goals of those measurements -- but the majority isn't everyone.
I'm not a historian, but if I'm not mistaken, no one since the founding fathers has won the US presidency with a 2/3rds majority in the popular vote. If we accept that reasonable people can vote for the losing candidate (again, talking about the last 100 years and not specific election years), it's reasonable to accept that reasonable audiophiles may prefer speakers that don't match the popular choice.