And if you want to critique the science of it all, consider this entirely valid speculation: per Toole himself, an elite sub-group of listeners - pro engineers who compare live sound with loudspeakers all day every day as their job - often found all the tested speakers inadequate, to the point where, when pressed for a preference (and with no "none of the above" option available), they expressed the hopelessness of their task by jotting down scores that later appeared inconsistent and random.
But this didn't fit with subliminal expectation, so an explanation had to be found ... wait ... I know! They've got hearing damage! You'll note that despite three editions of the book, this remains a mere assertion. No evidence was offered. No data. No audiograms. Which, per ASR's professed standards, is mere handwaving. It's in the same category as "oh, your system isn't resolving enough."
"No audiograms"? What have you (not) been reading? The books are naturally summary presentations of my and other researchers' work, but even they show data related to the audiograms that were measured on all of the participants in those experiments. BTW, audiograms have been routine screening for regular listeners in tests since those early tests in 1985.
Figure 3.5 in the 3rd edition shows summary relationships between hearing loss in different frequency ranges and judgment variability, and Figure 3.6 shows actual audiometric measurements on a number of listeners exhibiting high variability in their judgments. These listeners were clearly not exhibiting randomness in their judgments because of frustration and hopelessness over what they were hearing. They simply were not hearing all of the sound, and because of that made mistakes.
For more data and analysis you must go back to the original 1985 publication, where I would like to think there is much more than your asserted "handwaving":
Toole, F. E. (1985). “Subjective measurements of loudspeaker sound quality and listener preferences”, J. Audio Eng. Soc., 33, pp. 2-31.
It still surprises me that this was 36 years ago - a lifetime - yet it is still not well understood. It was not the last word; more like the "first word" on the topic that I am aware of in the context of audio. More work needed to be done, but it raised a flag.
For an independent view on the role of hearing performance and the results of listening tests, there is the 2006 book: "Perceptual Audio Evaluation" by Bech and Zacharov, Wiley. Section 5.4.3 addresses "Subject selection", and part of the screening is that subjects exhibit hearing thresholds in both ears that are within 15 dB of those of otologically normal persons. They comment that "approximately 50% of a university's student population (male and female) will not pass an audiometric test using a 15 dB rule". What have they been listening to?
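To make the 15 dB rule concrete, here is a minimal sketch of how such a screen might be applied to an audiogram. It assumes thresholds are expressed in dB HL (so 0 dB is the otologically normal reference) and that the limit applies at every tested frequency in both ears. The function name, data layout, test frequencies and the two example listeners are all hypothetical, made up for illustration; they are not taken from Bech and Zacharov or from any real test.

```python
# Hypothetical audiogram screen based on a 15 dB rule.
# Assumption: thresholds are in dB HL, i.e. relative to otologically normal hearing.
SCREEN_LIMIT_DB = 15


def passes_screening(audiogram, limit_db=SCREEN_LIMIT_DB):
    """Return True if every threshold in both ears is within limit_db.

    audiogram: {"left": {freq_hz: threshold_db_hl}, "right": {...}}
    """
    return all(
        threshold <= limit_db
        for ear in ("left", "right")
        for threshold in audiogram[ear].values()
    )


# Invented example listeners (not measured data).
listener_a = {
    "left": {500: 5, 1000: 0, 2000: 10, 4000: 10, 8000: 15},
    "right": {500: 0, 1000: 5, 2000: 5, 4000: 10, 8000: 10},
}
listener_b = {
    "left": {500: 5, 1000: 5, 2000: 10, 4000: 30, 8000: 35},
    "right": {500: 5, 1000: 0, 2000: 10, 4000: 10, 8000: 15},
}

print(passes_screening(listener_a))
print(passes_screening(listener_b))
```

In this sketch a single elevated threshold in one ear is enough to fail the screen, which is one plausible reading of "both ears within 15 dB"; a real protocol may specify the frequencies and tolerances differently.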
Chapter 17 in the 3rd edition describes the criteria applied by OSHA and NIOSH for their occupational hearing conservation programs. These are often thought to prevent hearing loss, but they don't. They allow hearing loss to accumulate over a working lifetime, aiming to preserve enough that at the end of a career one can carry on a conversation at 1 m distance in quiet surroundings. In audiometric terms that translates into about 25 dB threshold elevation at audiometric frequencies 1, 2 and 3 kHz, resulting in an estimated 10% loss of understanding of entire sentences and a 50% misunderstanding of "PB" words. HiFi hearing is long gone.
Chapter 17 also describes something relatively new: Hidden hearing loss. It affects the binaural directional/spatial hearing system, independently of audiometric threshold elevations.
Hearing is fragile - preserve it!