The authors found a strong correlation between subjective ratings and the predictions of two models in particular ("DS" and "R(nonlin)").
They also examined how well the older THD and IMD metrics correlate with listener ratings and found, as expected, very poor correlation (listener rating on the y-axis, distortion according to the given metric on the x-axis):
It is worth taking a close look at what the study actually compares.
It compares how well a single overall distortion score (THD score, IMD score, DS and R(nonlin) scores) correlates with listeners' auditory impressions.
Specific measurement methods were chosen for this purpose.
But this has nothing to do with the usual way loudspeaker distortion is measured and evaluated.
For harmonic distortion, HD2, HD3, HD4, HD5 (and THD) are usually measured over almost the entire frequency range (as far as the measuring system allows) and then evaluated by looking at the resulting plots.
To determine IMD (which is part of multitone distortion), the common practice for loudspeakers is to use a multitone signal with a specific distribution of excitation frequencies across the audible spectrum. Here, too, the evaluation is usually done by examining the measurement plot.
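To make the multitone idea concrete, here is a minimal Python sketch, assuming a periodic test signal whose tones sit exactly on DFT bins: every bin that was not excited (apart from DC) then holds only distortion products, and their summed energy relative to the excited bins gives one multitone distortion figure. The period length, tone bins and the polynomial "driver" are hypothetical choices for illustration, not the signal from the study; a real measurement system would use an FFT and a much denser excitation.

```python
import cmath
import math
import random


def multitone(n, bins, seed=0):
    """Equal-amplitude sines placed exactly on DFT bins, random phases to
    keep the crest factor reasonable; normalised to a peak of 1.0."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in bins]
    x = [
        sum(math.sin(2.0 * math.pi * k * t / n + p) for k, p in zip(bins, phases))
        for t in range(n)
    ]
    peak = max(abs(v) for v in x)
    return [v / peak for v in x]


def dft_power(x):
    """Power in bins 0..n/2 via a direct DFT (slow, but dependency-free)."""
    n = len(x)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [
        abs(sum(x[t] * twiddle[(k * t) % n] for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def multitone_distortion_db(y, excited_bins):
    """Energy in all non-excited bins (DC excluded) relative to the excited bins, in dB."""
    power = dft_power(y)
    excited = set(excited_bins)
    signal = sum(p for k, p in enumerate(power) if k in excited)
    distortion = sum(p for k, p in enumerate(power) if k not in excited and k != 0)
    if distortion == 0.0:
        return float("-inf")
    return 10.0 * math.log10(distortion / signal)


# Hypothetical setup: 1024-sample period, roughly octave-spaced tones.
N = 1024
BINS = [5, 11, 23, 47, 97, 163]  # 3 * 163 < N/2, so products up to 3rd order do not alias
x = multitone(N, BINS)

# Hypothetical "drivers": one purely linear, one with a memoryless polynomial nonlinearity.
y_linear = [0.8 * v for v in x]
y_nonlinear = [v + 0.05 * v**2 + 0.02 * v**3 for v in x]

print(round(multitone_distortion_db(y_linear, BINS), 1))
print(round(multitone_distortion_db(y_nonlinear, BINS), 1))
```

Some third-order products can land on excited bins and are then missed; practical multitone designs choose the excitation frequencies to keep such collisions rare.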
What was done in the study?
The THD score was determined as follows:
So the harmonic distortion was determined only at 1 kHz and then summed in the usual way.
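For reference, the arithmetic behind a single-frequency THD figure can be sketched as follows, assuming the common THD_F definition (root-sum-square of the harmonic amplitudes divided by the fundamental); whether the study used exactly this variant is an assumption here, and the amplitudes are hypothetical. A loudspeaker HD plot repeats the same per-harmonic calculation at every frequency of the sweep instead of at 1 kHz only.

```python
import math


def hd_percent(fundamental, harmonic):
    """Single harmonic (e.g. HD2) relative to the fundamental, in percent."""
    return 100.0 * harmonic / fundamental


def thd_percent(fundamental, harmonics):
    """THD_F: root-sum-square of H2, H3, ... divided by the fundamental, in percent."""
    return 100.0 * math.sqrt(sum(h * h for h in harmonics)) / fundamental


def percent_to_db(percent):
    """Distortion percentage expressed in dB relative to the fundamental."""
    return 20.0 * math.log10(percent / 100.0)


# Hypothetical amplitudes measured at 1 kHz (same units for all):
h1, h2, h3 = 1.0, 0.003, 0.004
print(round(hd_percent(h1, h2), 3))          # 0.3
print(round(thd_percent(h1, [h2, h3]), 3))   # 0.5
print(round(percent_to_db(0.5), 2))          # -46.02
```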
The IMD score was determined as follows:
This method is, of course, not very useful for determining the IMD of loudspeakers.
The study clearly shows that single-number THD and IMD scores of the kind used to describe amplifier distortion, for example (e.g. 0.1% THD @ 1 kHz), do not correspond to listener ratings, and that better methods exist.
However, this does not mean that HD and MD measurements of loudspeakers are useless or meaningless. The "DS" and "R(nonlin)" methods are themselves nothing more than evaluations of MD measurements.
Here is the multitone excitation used in the study:
Of course it would be great, especially for non-technical consumers, if there were a single score for loudspeaker distortion.
All the better if we could simply calculate a post-EQ preference score using either the Harman target curve or other selectable curves. Presumably there are limits to how much EQ a speaker can take before something else happens, like additional harmonic distortion or changes in directivity (or even phase distortion depending on how the EQ is implemented, right?). Otherwise we would all be EQing our speakers to perform like Genelecs.
The Olive score is only a rough guide that describes, to a certain degree, the likely sound and potential of a loudspeaker.
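The Olive score is a linear regression over four metrics derived from the spinorama. The sketch below uses the coefficients as they are commonly quoted for Sean Olive's model, with LFX taken as log10 of the -6 dB low-frequency extension point in Hz; the input values are hypothetical, and computing NBD, SM and the extension point from measured data is not shown. Note that none of the four terms describes distortion.

```python
import math


def olive_preference(nbd_on, nbd_pir, lfx_hz, sm_pir):
    """Predicted preference rating (Olive model, coefficients as commonly quoted).

    nbd_on  -- narrow-band deviation of the on-axis curve (dB)
    nbd_pir -- narrow-band deviation of the predicted in-room response (dB)
    lfx_hz  -- low-frequency extension, -6 dB point (Hz); the model uses log10 of it
    sm_pir  -- smoothness of the predicted in-room response (0..1)
    """
    lfx = math.log10(lfx_hz)
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir


# Hypothetical metric values for an imaginary bookshelf speaker:
score = olive_preference(nbd_on=0.4, nbd_pir=0.35, lfx_hz=45.0, sm_pir=0.8)
print(round(score, 2))  # 5.38
```

Because the model only sees frequency response and directivity, two speakers with identical metrics but very different distortion get the same score.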
If @amirm comes to a different conclusion in his personal evaluation, it is unlikely to be due to distortion.
It's simply that a loudspeaker with less-than-optimal directivity can still sound very good if the crossover is carefully tuned.
And a loudspeaker with very good directivity but poor crossover tuning can also sound very good if the listener carefully adjusts the EQ.
But why can't Amir do this with every speaker?
Distortion can certainly play a role under certain conditions. If a -5 dB dip occurs at 1 kHz due to a suspension resonance, it is usually not a good idea to compensate for it with EQ, so the "error" cannot be corrected.
If the crossover is poorly designed and, for example, the filter slope of the low-midrange driver sets in too early or is too steep, it is also problematic to fill in the resulting dip with EQ, because the tweeter then gets overloaded.
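The cost of filling a dip with EQ is simple dB arithmetic: a boost of B dB multiplies the drive voltage by 10^(B/20) and the electrical power by 10^(B/10), and in the linear regime cone excursion at a given frequency scales with the drive voltage too. The minimal sketch below uses hypothetical numbers (a 5 dB boost on a driver that previously received 5 W at that frequency) and ignores thermal and mechanical limits.

```python
def boost_factors(boost_db):
    """Voltage (and, in the linear regime, excursion) and power multipliers for an EQ boost."""
    voltage = 10 ** (boost_db / 20)
    power = 10 ** (boost_db / 10)
    return voltage, power


def power_after_boost(watts_before, boost_db):
    """Electrical power a driver must accept after an EQ boost of boost_db."""
    return watts_before * 10 ** (boost_db / 10)


v, p = boost_factors(5.0)
print(f"+5 dB: voltage/excursion x{v:.2f}, power x{p:.2f}")  # x1.78, x3.16
print(f"5 W before EQ -> {power_after_boost(5.0, 5.0):.1f} W after")  # 15.8 W
```

When the dip sits near the crossover, a large share of that extra power goes to the tweeter close to its lower limit, which is where it is least able to handle it.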
But what is probably far more common is that you simply can't find the optimal EQ without spending a lot of time.