Yeah, and those reasons absolutely apply to us:
"Two outliers were found in the model (HP6 and HP30) that produced higher predicted preference ratings than observed. Both headphones have audible medium Q resonances that we believe are underestimated in the model."
See? That says the model is wrong because it can't predict those preferences correctly, not that the model is correct and something is wrong with the measurements. The Caldera suffers from exactly the problem they describe:
I consider the trough between 800 Hz and 2 kHz to be "medium Q." In my equalization, I picked a Q of 4.
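For readers wondering what "a Q of 4" means in practice, here is a minimal sketch using the standard peaking-filter formulas from the RBJ Audio EQ Cookbook. It converts Q to bandwidth in octaves and builds a peaking biquad. The sample rate, center frequency (roughly the geometric mean of 800 Hz and 2 kHz) and gain used below are hypothetical illustration values, not the settings from my Caldera EQ.

```python
import cmath
import math


def q_to_octaves(q: float) -> float:
    """Bandwidth in octaves of a peaking filter with quality factor q (RBJ cookbook relation)."""
    return 2.0 / math.log(2.0) * math.asinh(1.0 / (2.0 * q))


def peaking_biquad(fs: float, f0: float, gain_db: float, q: float):
    """RBJ cookbook peaking EQ; returns (b, a) coefficients normalized so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)          # square root of the linear gain
    w0 = 2.0 * math.pi * f0 / fs            # center frequency in radians/sample
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b, a, f: float, fs: float) -> float:
    """Evaluate the biquad's magnitude response (dB) at frequency f."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * f / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


# Hypothetical example: +4 dB boost at ~1265 Hz with Q = 4 at 48 kHz.
b, a = peaking_biquad(48000.0, 1265.0, 4.0, 4.0)
print(round(q_to_octaves(4.0), 2))                  # bandwidth in octaves
print(round(magnitude_db(b, a, 1265.0, 48000.0), 2))  # gain at the center frequency
```

A Q of 4 works out to a bandwidth of roughly 0.36 octave, and the filter reaches its full set gain only at the center frequency, tapering to essentially 0 dB far away from it.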
They go on to say:
"An updated version of the model will address this issue in the future."
Until that model exists, you are dealing with a rather inexact model.
When I evaluate a response by eye, I don't just look at the deviation; I apply my knowledge and experience of psychoacoustics to what I am seeing. This is precisely what Dr. Toole, Dr. Olive, and others would do if you showed them a measurement of a headphone or speaker. They would NOT rush to compute the model to give you a good/bad answer.
As backup for this: Harman never used the preference score to design speakers, even though it is more robust than this headphone model.
The purpose of the model is to say, "look, frequency response matters and we can even model it using linear prediction."
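To make "model it using linear prediction" concrete, here is a minimal sketch of how such a predictor can be structured, assuming its inputs are deviation statistics of the error curve (measured minus target), namely the standard deviation of the error and its overall slope across a band. The 50 Hz to 10 kHz band, the function names, and the intercept and weights are hypothetical placeholders, not Harman's published coefficients.

```python
import math


def error_curve_features(freqs_hz, measured_db, target_db, f_lo=50.0, f_hi=10000.0):
    """Return (sd, slope) of the error curve (measured - target) within [f_lo, f_hi].

    sd:    population standard deviation of the error, in dB.
    slope: least-squares slope of the error vs log10(frequency), in dB per decade.
    """
    pts = [(math.log10(f), m - t)
           for f, m, t in zip(freqs_hz, measured_db, target_db)
           if f_lo <= f <= f_hi]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the band")
    n = len(pts)
    xs = [x for x, _ in pts]
    es = [e for _, e in pts]
    mean_x = sum(xs) / n
    mean_e = sum(es) / n
    sd = math.sqrt(sum((e - mean_e) ** 2 for e in es) / n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (e - mean_e) for x, e in zip(xs, es)) / sxx
    return sd, slope


def linear_score(sd, slope, intercept, w_sd, w_slope):
    """Hypothetical linear predictor: larger deviation or tilt lowers the score."""
    return intercept - w_sd * sd - w_slope * abs(slope)


# Hypothetical usage with made-up data points and weights.
sd, slope = error_curve_features([100, 1000, 10000], [4.0, 6.0, 8.0], [0.0, 0.0, 0.0])
print(linear_score(sd, slope, intercept=100.0, w_sd=10.0, w_slope=10.0))
```

Note that `sd` collapses the whole error curve into a single number, so a narrow medium-Q resonance and a broad, gentle ripple can produce similar values. That is one way such a model can underestimate an audible resonance.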
To be sure, bad numbers likely mean a bad headphone, and good numbers likely mean a good headphone. But that is it. So please don't go computing two-digit numbers like that and defending them against an experienced eye analyzing the response measurements.