Many people's resistance is born of their not liking the findings, usually because the findings contradict their contaminated/sighted listening experiences (which they have used as their sole guide on a long journey of learning and very large hifi expenses). For these people, no sample size or amount of replication will ever be adequate.
Not me. I like, believe, and follow these findings, but standards of statistical inference stand apart from my predilections (particularly on this evidence-based site). As I indicated before, and as you sort of suggest, the findings make sense and are by far the best we’ve got. I treat them as a strong Bayesian prior, not as fully settled science. So far, though, I haven’t seen the scale required for very strong claims about everyone’s preferences. I may be wrong and that scale may be there, but what I see are majorities within small samples and a somewhat limited range of speakers and listeners. Unfortunately, doing more would be pretty expensive and difficult. I suppose there will always be some uncertainty about what an individual listener will prefer with loudspeakers in rooms and even, perhaps, with certain limited distortion in the signal chain.
The “probably” in your quote suggests a similar view. Toole and Olive seem to be pretty careful with their claims. Anyway, his statement is slightly different (“build a good-sounding speaker based on measurements”) from what I was suggesting (“we can’t reliably generalize to the entire population on Toole and Olive’s preference findings, but it is the best-supported hypothesis so far”).