- - - - - - - WARNING ! LONG POST ! PLEASE TAKE YOUR TIME - - - - - - -
Ok, @March Audio , you and others have brought up “The Toole Research” so many times, pointing to his time at NRC decades ago, that I think I will use an often-cited Toole article from that era to illustrate some of my points.
In “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2” from 1986, written when Toole was at NRC, he wrote the following (see this link: https://www.pearl-hifi.com/06_Lit_A...blications/LS_Measurements_Listener_Prefs.pdf):
“In all, there were 42 listeners in these tests, but the data used here pertain only to the 28 who exhibited low judgment variability”.
First, this vox populi type of research started out with 42 participants. In many other sciences that’s a good start, but hardly a sample size that makes for a golden rule or constitutes final evidence for a general theory.
A bit later, Toole wrote:
“Not all listeners auditioned all loudspeakers and not all loudspeakers were included in each experiment”.
So we had 42 listeners who did not evaluate all combinations in the test setup. There was an incompleteness there that an outside reader cannot fully assess.
But what’s more important has already been stated: one third of the participants were removed from the data set because they exhibited random and/or highly variable stated preferences.
A guy I met many years ago, who did his PhD under a professor who later became a Nobel laureate, told me: “My professor insisted that the data must speak for themselves even if I don’t like what the data is saying”.
What the young PhD student found, and was encouraged by the older professor to uncover and make public for all to see, was a phenomenon that the Nobel laureate later called the “most embarrassing” thing his own theory could face.
The research quoted above, Toole (1986), is often referred to as the evidence that Toole produced while at NRC. Do I need to explain why 42 participants at the outset, reduced by 33 percent to 28 due to the high “judgment variability” of the discarded participants, can only serve as a good starting point for forming an opinion, not as a golden rule or as evidence for a general theory?
Now, let me get a bit more technical. When you remove outliers from a data set, the mean and the median start to converge. It looks as if Toole (1986) used simple averages of speaker ratings, though he didn’t specify how the preference scores were produced. I have previously explained the power of the median: in a machine producing random output, the median option (say, a speaker) will tend to achieve the best average percentile rank over time. This phenomenon works well for characteristics represented on a linear scale, but I am uncertain how the method behaves for characteristics that span several dimensions or carry more complexity (colours, smells?). I am not saying that the power of the median explains the frequency-response results in Toole (1986), but it had the potential to do so in a data set that was shallow to begin with and was then manipulated by discarding one third of the voices in this vox populi contest of speakers. So we cannot know whether the preference for a flat frequency response in Toole (1986) reflects preference=truth, statistical chance, manipulation of the data set, or a demonstration of the power of the median.
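To make the convergence claim concrete, here is a minimal, hypothetical simulation (my own construction, not Toole’s method and not his data): 28 “consistent” listeners score a speaker around a common value, while 14 “erratic” listeners score uniformly at random. Averaged over many trials, discarding the erratic third shrinks the gap between mean and median:

```python
import random
import statistics

def mean_median_gap(values):
    """Absolute distance between the mean and the median of a sample."""
    return abs(statistics.mean(values) - statistics.median(values))

def one_trial(rng):
    # Hypothetical scores on a 0-10 preference scale (illustrative numbers only):
    # a consistent majority clustered near 6, and an erratic third scoring anywhere.
    consistent = [rng.gauss(6.0, 0.5) for _ in range(28)]
    erratic = [rng.uniform(0.0, 10.0) for _ in range(14)]
    full = consistent + erratic      # all 42 listeners
    trimmed = consistent             # the high-variability third discarded
    return mean_median_gap(full), mean_median_gap(trimmed)

rng = random.Random(1986)
trials = [one_trial(rng) for _ in range(2000)]
avg_full = statistics.mean(t[0] for t in trials)
avg_trimmed = statistics.mean(t[1] for t in trials)

print(f"average mean-median gap, all 42 listeners: {avg_full:.3f}")
print(f"average mean-median gap, 28 kept listeners: {avg_trimmed:.3f}")
```

The asymmetric mixture (a tight cluster plus a flat spread of erratic votes) pulls the mean away from the median; once the erratic third is removed, the two summary statistics sit much closer together, which is all the convergence point above asserts.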
Adding to the troubles in Toole (1986), the author made a large and diverse set of technical measurements (frequency response, phase, power, etc.) that are supposed to correlate with listener preferences. Isn’t it obvious that only a few of these measurements may be picked up by ear and brain in a given situation, while other specifications may need a different test setup to reveal themselves as audible, or as good or bad preference-wise? Did Toole (1986) use a shotgun to cast light on subtle speaker qualities that are best taken down by finer tools?
In discourses on ASR, certain characteristics carry weight in separating the sinners from the believers. As soon as someone quotes Toole (1986), it’s as if further inquiry has to stop. This is not the fault of gentleman Toole, but of his followers, who behave as if they belonged to a religious camp.
My point is that it cannot hurt to have a more open stance on questions where the consensus research is incomplete, and to be more critical of consensus, for instance by digging into how many samples constitute “evidence” in audio science.
What Toole (1986) could have shown, and what would make for quite an interesting avenue for a research program guided by Preference as opposed to Truth, is what happens when you go down the alley of highly variable preferences, i.e. the one third of the population that Toole (1986) started out with. As I have previously shown, modern medicine is moving away from the idea of the average treatment toward more focus on the individual. So an area which draws upon the largest research funds by far has already concluded that one can do better than the average in certain respects. If a research program is Preference seeking, as opposed to Truth seeking, this line of thinking should be highly interesting for anyone with the means to offer individualised products instead of a one-size-fits-all offering.
The fact that Toole (1986) discarded 1/3 of the population due to an unwanted characteristic - highly variable preference - could also be interpreted as “evidence” for subjectiveness in the hobby. Paradoxically, one could argue that the subjectivists have had the science, i.e. Toole (1986), on their side all along, because their preferences cannot be met by the one-size-fits-all product that would come out of vox populi research. If the preferences of about one third of the population are as highly variable as Toole (1986) hinted, it doesn’t make sense for a rational man of diverging or variable preferences to settle for the average. Could this paradox lie behind another observation by Toole, where he hints at highly variable preferences (see his book on sound reproduction):
“However, many of us have seen evidence of such listener preferences in the “as found” tone control settings in numerous rental and loaner cars”.
Source: Floyd Toole, Sound Reproduction (latest ed.), chapter 12.3
In other words, the rational subjectivist could have had science on his side all along. No need to ridicule him, right? Should we reflect upon the norms and standards for discourse on ASR, after all?
I could also make the case that a follower of Preference seeking research will be more confused than the rational subjectivist. While the rational subjectivist knows his preferences, to the extent that he owns a highly coloured setup suited to his stable preferences for his whole lifetime, the follower of Preference seeking research will constantly be given new signals that his setup diverges from the new neutral (in terms of preferences). Note that the preferences of a population can change even if each individual’s preferences are stable! So the follower of Preference seeking research will change his gear with every cycle. It’s as if the true believer in Preference seeking science would need to start listening to hip hop in the 2000s, having listened to jazz and blues in the 1960s.
So who is more rational? The rational subjectivist or the follower of Preference seeking research?
Needless to say, wouldn’t it be simpler if research and science were Truth seeking, not Preference seeking?