I'm not sure what you mean by "rationale." Can you explain further?
In the meantime, let me outline my scientific concerns about how you (and, really, most of ASR) seem to be using this score, which is already stretched beyond its original formulation by adding EQ and/or a subwoofer. Toole himself notes that "predicted preference ratings correlated with those from listening tests with a correlation of 0.86, with a very high statistical significance (p = <0.0001)." But scores like this, derived from regression analyses, will show their best results on the data sets from which they were derived.
Here is an example from the medical field:
https://pubmed.ncbi.nlm.nih.gov/10608376/
Clinical prediction rules require follow-up validation studies with a new set of data:
https://pubmed.ncbi.nlm.nih.gov/15292409/
Note: "Clinical prediction rules typically demonstrate diminished performance in a new patient population because they are optimally modeled to the original data set." Here is another paper that found different results:
https://pubmed.ncbi.nlm.nih.gov/15118038/
Nonetheless, Kocher's prediction rule quickly came to be known as the "Kocher criteria," as seen here:
https://pubmed.ncbi.nlm.nih.gov/30950940/
Validation data for Olive's second preference prediction model was never published. From the second edition of Toole's Sound Reproduction: "Figure 18.17 shows samples of two excellent, high-priced loudspeakers, that do almost everything well. To these should be added loudspeakers “R” and “I” in Figure 18.14. Collectively, these are examples of the present-day “kings of the hill.” There are others, of course, but the measurements do not look very different. When they are put against each other in double-blind tests, the audible differences are small, somewhat program dependent, and listener ratings tend to vary slightly and randomly around a high number. In the end there may be no absolute winner that is revealed with any statistical confidence; the differences in opinion are of the same size as those that could occur by chance."
The two speakers in Figure 18.17 are the Salon2 and the Array 1400, which have different preference prediction scores, yet Toole himself notes that they performed similarly in double-blind listening tests. I think this absolutely undermines any rationale for using the score to predict that a speaker with a slightly higher prediction score WILL outperform one with a slightly lower prediction score in double-blind listening tests (in rooms configured similarly to Harman's, etc.).
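To put the 0.86 figure in perspective: for a linear predictor, the scatter of actual ratings around the predicted ones is at best about s_y * sqrt(1 - r^2), where s_y is the spread (SD) of the listener ratings. A quick sketch of that arithmetic, ignoring the small-sample degrees-of-freedom correction; the second function additionally assumes the two speakers' prediction errors are independent and equal in size, which is my assumption, not something Toole or Olive state. Function names are hypothetical.

```python
import math


def unexplained_sd_fraction(r: float) -> float:
    """Residual SD of ratings around a linear predictor, as a fraction of the
    rating SD: sqrt(1 - r^2). Ignores the degrees-of-freedom correction."""
    return math.sqrt(1.0 - r * r)


def difference_sd_fraction(r: float) -> float:
    """SD of the error on the difference between two speakers' predictions,
    ASSUMING the two prediction errors are independent and equal in size."""
    return math.sqrt(2.0) * unexplained_sd_fraction(r)


r = 0.86  # correlation quoted by Toole
print(f"single speaker:    {unexplained_sd_fraction(r):.2f} x rating SD")  # 0.51
print(f"difference of two: {difference_sd_fraction(r):.2f} x rating SD")  # 0.72
```

So even taking r = 0.86 at face value, roughly half the spread of the ratings is left unexplained for any one speaker, and under the independence assumption the uncertainty on a head-to-head comparison is larger still. A small gap in predicted scores therefore says little about which speaker will win a blind test.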
Young-Ho