All good questions. I think the Olive score is like the hour hand on a clock: the minute hand is missing, and we have to supply that judgement ourselves. With so much objective data now available, I hope that once the pandemic is behind us, various groups of audiophiles get together and do some controlled testing to see what their subjective scores are.
To extend the time analogy, from my perspective the Olive preference score is more like the day than the hour.
It seems to miss a huge amount of what is important to me in a fairly large room, sitting between 11 and 21 feet from the speakers in the listening area, though the rearmost seats are normally only used when watching a film.
Almost all of the good-scoring speakers so far would be completely hopeless in here.
Even these.
I like the time analogies, but I think they miss something. The more useful analogy is not time resolution (i.e. minutes vs. hours or days), but geographic range. The Olive score is useful when comparing speakers in the same "place" in the market. It does not allow for even approximate comparisons between widely differing designs; rather, it allows for reasonably precise comparisons between like designs.
In this respect, it may be best seen as a tool for manufacturers to compare speakers of similar cost, size, and component quality against others in their class: if the speaker you've designed performs well relative to other speakers of similar size, similar design topology, and similar-quality components (and therefore, presumably, similar SPL limits, levels of nonlinearity, etc.), then you know you've designed a good speaker.
From the point of view of consumers who aren't able to interpret measurements, a poor Olive score can be used to rule a speaker out, so it can be useful in that respect. But it can't be used to do the inverse: a high score doesn't by itself rule a speaker in.
EDIT: all this notwithstanding the widely discussed deficiencies of the model, such as its mathematical inconsistencies, questions about its applicability to non-conventional designs, etc.