- - - - - - - WARNING ! LONG POST ! PLEASE TAKE YOUR TIME - - - - - - -
Ok, @March Audio , you and others have brought up “The Toole Research” so many times, pointing to his time at NRC decades ago, that I think I will use an often-cited Toole article from that era to illustrate some of my points.
In “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2” from 1986, written when Toole was at NRC, he wrote the following (see this link: https://www.pearl-hifi.com/06_Lit_A...blications/LS_Measurements_Listener_Prefs.pdf):
“In all, there were 42 listeners in these tests, but the data used here pertain only to the 28 who exhibited low judgment variability”.
First, this vox populi type of research started out with 42 participants. In many other sciences that’s a good start, but hardly a sample size that makes for a golden rule or constitutes final evidence for a general theory.
A bit later, Toole wrote:
“Not all listeners auditioned all loudspeakers and not all loudspeakers were included in each experiment”.
So we had 42 listeners who did not evaluate all combinations in the test setup. There was an incompleteness there that an outside reader cannot fully assess.
But what’s more important has already been stated: one third of the participants were removed from the data set because they exhibited random and/or highly variable stated preferences.
A guy I met many years ago, who did his PhD under a professor who later became a Nobel laureate, told me: “My professor insisted that the data must speak for themselves even if I don’t like what the data is saying”.
What the young PhD student found, and was encouraged by the older professor to uncover and make public for all to see, was a phenomenon that the Nobel laureate later called the “most embarrassing” thing his own theory could face.
The research quoted above, Toole (1986), is often referred to as the evidence that Toole produced while at NRC. Do I need to explain why 42 participants at the outset, reduced by 33 percent to 28 due to the high “judgment variability” of the discarded participants, can only serve as a good starting point for forming an opinion, not as a golden rule or as evidence for a general theory?
Now, let me get a bit more technical. When you remove outliers from a data set, the mean and the median start to converge. It looks as if Toole (1986) used simple averages of speaker ratings, though he didn’t specify how the preference scores were produced. I have previously explained the power of the median: in a machine producing random output, the median option (say, a speaker) will tend to achieve the best average percentile rank over time. This phenomenon works well for characteristics represented on a linear scale, but I am uncertain how the method behaves for characteristics that span several dimensions or carry more complexity (colours, smells?). I am not saying that the power of the median explains the frequency-response results in Toole (1986), but it had the potential to do so in a data set that was shallow to begin with and was then manipulated by discarding one third of the voices in this vox populi contest of speakers. So we cannot know whether the preference for a flat frequency response in Toole (1986) reflects preference=truth, statistical chance, manipulation of the data set, or a demonstration of the power of the median.
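To make the convergence claim concrete, here is a minimal, hypothetical simulation (my own construction, not Toole’s method and not his data): 28 “consistent” listeners score a speaker around a common value, while 14 “erratic” listeners score uniformly at random. Averaged over many trials, discarding the erratic third shrinks the gap between mean and median:

```python
import random
import statistics

def mean_median_gap(values):
    """Absolute distance between the mean and the median of a sample."""
    return abs(statistics.mean(values) - statistics.median(values))

def one_trial(rng):
    # Hypothetical scores on a 0-10 preference scale (illustrative numbers only):
    # a consistent majority clustered near 6, and an erratic third scoring anywhere.
    consistent = [rng.gauss(6.0, 0.5) for _ in range(28)]
    erratic = [rng.uniform(0.0, 10.0) for _ in range(14)]
    full = consistent + erratic      # all 42 listeners
    trimmed = consistent             # the high-variability third discarded
    return mean_median_gap(full), mean_median_gap(trimmed)

rng = random.Random(1986)
trials = [one_trial(rng) for _ in range(2000)]
avg_full = statistics.mean(t[0] for t in trials)
avg_trimmed = statistics.mean(t[1] for t in trials)

print(f"average mean-median gap, all 42 listeners: {avg_full:.3f}")
print(f"average mean-median gap, 28 kept listeners: {avg_trimmed:.3f}")
```

The asymmetric mixture (a tight cluster plus a flat spread of erratic votes) pulls the mean away from the median; once the erratic third is removed, the two summary statistics sit much closer together, which is all the convergence point above asserts.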
Adding to the troubles in Toole (1986), the author made a large and diverse set of technical measurements (frequency response, phase, power, etc.) that are supposed to correlate with listener preferences. Isn’t it obvious that only a few of these measurements may be picked up by ear and brain in a given situation, while other specifications may need a different test setup to reveal themselves as audible, or as good or bad preference-wise? Did Toole (1986) use a shotgun to cast light on subtle speaker qualities that are best taken down by finer tools?
In discourses on ASR, certain characteristics carry weight in separating the sinners from the believers. As soon as someone quotes Toole (1986), it’s as if further inquiry has to stop. This is not the fault of gentleman Toole, but of his followers, who behave as if they belonged to a religious camp.
My point is that it cannot hurt to have a more open stance on questions where the consensus research is incomplete, and to be more critical of consensus, for instance by digging into how many samples constitute “evidence” in audio science.
What Toole (1986) could have shown, and what would make for quite an interesting avenue for a research program guided by Preference as opposed to Truth, is what happens when you go down the alley of highly variable preferences, i.e. the one third of the population that Toole (1986) started out with. As I have previously shown, modern medicine is moving away from the idea of the average treatment toward more focus on the individual. So an area which draws upon the largest research funds by far has already concluded that one can do better than the average in certain respects. If a research program is Preference seeking, as opposed to Truth seeking, this line of thinking should be highly interesting for anyone with the means to offer individualised products instead of a one-size-fits-all offering.
The fact that Toole (1986) discarded 1/3 of the population due to an unwanted characteristic - highly variable preference - could also be interpreted as “evidence” for subjectiveness in the hobby. Paradoxically, one could argue that the subjectivists have had the science, i.e. Toole (1986), on their side all along, because their preferences cannot be met by the one-size-fits-all product that would come out of vox populi research. If the preferences of about one third of the population are as highly variable as Toole (1986) hinted, it doesn’t make sense for a rational man of diverging or variable preferences to settle for the average. Could this paradox lie behind another observation by Toole, where he hints at highly variable preferences (see his book on sound reproduction):
“However, many of us have seen evidence of such listener preferences in the “as found” tone control settings in numerous rental and loaner cars”.
Source: Floyd Toole, Sound Reproduction (latest ed.), chapter 12.3
In other words, the rational subjectivist could have had science on his side all along. No need to ridicule him, right? Should we reflect upon the norms and standards for discourse on ASR, after all?
I could also make the case that a follower of Preference seeking research will be more confused than the rational subjectivist. While the rational subjectivist knows his preferences, to the extent that he owns a highly coloured setup suited to his stable preferences for his whole lifetime, the follower of Preference seeking research will constantly be given new signals that his setup diverges from the new neutral (in terms of preferences). Note that the preferences of a population can change even if each individual’s preferences are stable! So the follower of Preference seeking research will change his gear with every cycle. It’s as if the true believer in Preference seeking science would need to start listening to hip hop in the 2000s, having listened to jazz and blues in the 1960s.
So who is more rational? The rational subjectivist or the follower of Preference seeking research?
Needless to say, wouldn’t it be simpler if research and science were Truth seeking, not Preference seeking?