NORMS AND STANDARDS FOR DISCOURSE ON ASR

Thomas_A · Aug 8, 2019

Variable listeners could mean anything, but if they are not consistent between trials on an individual basis, they should be discarded before the test is performed. But never during a test.

SIY · Aug 8, 2019

Thomas_A said:
Variable listeners could mean anything, but if they are not consistent between trials on an individual basis, they should be discarded before the test is performed. But never during a test.

This is why in tests like this, you include replicates. But inevitably, that cannot be analyzed until after the test. If the criteria for exclusion are set ahead of time, this is absolutely basic experimental procedure.
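For illustration, here is a minimal sketch of what "replicates plus an exclusion criterion set ahead of time" could look like in practice. Everything in it is hypothetical: the data layout, the names `listener_variability` and `apply_exclusion`, and the cutoff `MAX_SD` are assumptions made for the example, not anything taken from a specific study.

```python
from statistics import mean, pstdev

# Hypothetical cutoff, fixed BEFORE any data are collected.
MAX_SD = 1.0


def listener_variability(ratings_by_speaker):
    """Mean within-speaker standard deviation across replicate trials."""
    return mean(pstdev(reps) for reps in ratings_by_speaker.values())


def apply_exclusion(ratings, max_sd):
    """Split listeners into kept/excluded using only their own consistency."""
    kept, excluded = {}, {}
    for listener, by_speaker in ratings.items():
        sd = listener_variability(by_speaker)
        (kept if sd <= max_sd else excluded)[listener] = sd
    return kept, excluded


# Hypothetical data: ratings[listener][speaker] = replicate ratings.
ratings = {
    "A": {"spk1": [7.0, 7.5], "spk2": [4.0, 4.5]},  # repeats own ratings
    "B": {"spk1": [8.0, 3.0], "spk2": [2.0, 7.0]},  # contradicts own ratings
}

kept, excluded = apply_exclusion(ratings, MAX_SD)
print("kept:", kept)
print("excluded:", excluded)
```

Note that the criterion looks only at agreement within each listener's own replicates, so a listener whose ranking differs from everyone else's but who repeats it consistently would stay in.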

watchnerd · Aug 8, 2019

This thread has wandered from:

-Importance vs Hardness (aka the "bikeshedding problem")
-gold standards
-vox populi testing methods
-audio as a narrow discipline vs interdisciplinary
-why audio subjectivists exhibit anchoring behavior.
-why bad listeners get discarded from a listening test

I'm starting to feel like this is all just a giant troll with no actual purpose.

Thomas_A · Aug 8, 2019

SIY said:
This is why in tests like this, you include replicates. But inevitably, that cannot be analyzed until after the test. If the criteria for exclusion are set ahead of time, this is absolutely basic experimental procedure.

As long as you don't exclude those with consistent results between trials despite different ranking between individuals, it should be ok.

Thomas_A · Aug 8, 2019

As written

watchnerd said:
This thread has wandered from:

-Importance vs Hardness (aka the "bikeshedding problem")
-gold standards
-vox populi testing methods
-audio as a narrow discipline vs interdisciplinary
-why audio subjectivists exhibit anchoring behavior.
-why bad listeners get discarded from a listening test

I'm starting to feel like this is all just a giant troll with no actual purpose.

Even double Nobel laureate Linus Pauling was wrong in claiming the benefits of vitamin C. He missed a bunch of articles in his systematic review.

watchnerd · Aug 8, 2019

Thomas_A said:
As written

Even double Nobel laureate Linus Pauling was wrong in claiming the benefits of vitamin C. He missed a bunch of articles in his systematic review.

Pauling also wanted people with genetic diseases to be tattooed so that two people with the same bad genes wouldn't procreate....

Tinder would be much more interesting in that case.

PierreV · Aug 8, 2019

Blumlein 88 said:
I've worked where a process had a dozen sensors along a batch reaction. You'd expect small regular changes along that batch. If one gets noisy and is reading all out of whack from the others, I ignore it. I don't see this as much different.

Well, the result of the study is often summarized as

"it has been proven that most people prefer..."

not even considering the data set and the stats themselves, a better summary is probably

"in that test 70% of the people preferred..."

now considering the exclusions/drop out, let's say 28 remaining out of 42 an even better summary could be

"42% from the initial contingent preferred..."

(don't shoot me on numbers, I am partly trusting svart here, partly vaguely remembering the study)

Now, in terms of a market study, you could, of course, decide that the result that interests you is that 70% of the people who "cared or were able to discriminate" expressed a consistent preference. That may be the market you decide to aim and optimize for.

But another way of looking at it can also be that 60% of the initial contingent either had other preferences, did not care or weren't able to discriminate.
That may also be the market you decide to aim for: sell them anything, they can't tell anyway, will love Monday, hate Wednesday and replace Saturday.

Now, looking at the real world out there, and the chaos in the audiophile Internet, I am tempted to say, what? Only 60%?
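As a quick check on the arithmetic, here is a small sketch using the post's own illustrative figures (42 listeners at the start, 28 retained, 70% of the retained expressing the preference). The 70% is the post's example number, not a value taken from the study, and the function name is made up for the illustration. With these inputs the share of the initial contingent comes out nearer 47% than the 42% recalled from memory, leaving roughly 53% for everyone else.

```python
def share_of_initial(initial, retained, share_among_retained):
    """Fraction of the initial group that expressed the preference."""
    preferring = share_among_retained * retained
    return preferring / initial


initial, retained = 42, 28  # figures quoted in the thread
share = share_of_initial(initial, retained, 0.70)  # 0.70 is illustrative
excluded_fraction = (initial - retained) / initial
everyone_else = 1.0 - share

print(f"excluded before analysis:    {excluded_fraction:.1%}")
print(f"preferring, of initial 42:   {share:.1%}")
print(f"everyone else, of initial:   {everyone_else:.1%}")
```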

svart-hvitt · Aug 8, 2019

watchnerd said:
This thread has wandered from:

-Importance vs Hardness (aka the "bikeshedding problem")
-gold standards
-vox populi testing methods
-audio as a narrow discipline vs interdisciplinary
-why audio subjectivists exhibit anchoring behavior.
-why bad listeners get discarded from a listening test

I'm starting to feel like this is all just a giant troll with no actual purpose.

I agree that themes have been shifting, to appear as if it’s all random. This is much like a creative process where researchers come up with new ideas. Research and science by diktat never worked, so some degree of freedom is a good thing, not a bad one, right?

The overarching theme of norms and standards of discourse has come up several times, though. In the latest post I showed how an omission in Toole (1986) could arguably be key to our understanding of that work; and omission was a point I started out with. So a certain common thread in the concepts used along the way can be seen in all the «randomness»?

Having said that, some of the concepts discussed or developed in the course of the thread are:

=> Vox populi research
=> Truth speaker
=> Preference speaker
=> Camps of audiophiles and their behaviour
=> «Basket of deplorables» in listening tests
=> Existence of gold standard
=> Is audio science Hard or Soft?
=> Statistics and the power of median to mediocrity or truth
=> Review of a pillar in audio research, Toole (1986)

There’s probably more.

So I hope there have been both intended and unintended effects and side effects as the thread mounted in size.

As for the overarching aim of being more open to potentially important ideas without statistical significance support, can the above mentioned themes and concepts be seen as a terminology to open doors that used to be closed?

Blumlein 88 · Aug 8, 2019

PierreV said:
Well, the result of the study is often summarized as

"it has been proven that most people prefer..."

not even considering the data set and the stats themselves, a better summary is probably

"in that test 70% of the people preferred..."

now considering the exclusions/drop out, let's say 28 remaining out of 42 an even better summary could be

"42% from the initial contingent preferred..."

(don't shoot me on numbers, I am partly trusting svart here, partly vaguely remembering the study)

Now, in terms of a market study, you could, of course, decide that the result that interests you is that 70% of the people who "cared or were able to discriminate" expressed a consistent preference. That may be the market you decide to aim and optimize for.

But another way of looking at it can also be that 60% of the initial contingent either had other preferences, did not care or weren't able to discriminate.
That may also be the market you decide to aim for: sell them anything, they can't tell anyway, will love Monday, hate Wednesday and replace Saturday.

Now, looking at the real world out there, and the chaos in the audiophile Internet, I am tempted to say, what? Only 60%?

Reminds me of a fellow I met who was selling some Acoustats. He had lots of exotic gear. His favorite speaker was the Bozak Concert Grand.
Here is one without the grill cloth. It is about 5 ft tall as I recall. Basically a fridge.

A pair with the cloth.

Stereophile reviewed a pair including modern measurements.
https://www.stereophile.com/historical/1005bozak/index.html

Okay so the guy loved these because they played loud with lots of bass. And boy did they have lots of uncontrolled bass. And he loved you could hear what the tweeter, midrange and woofer were doing separately. As in you clearly heard three different speakers doing something in three defined frequency regions. The Acoustats were not loud even with a ton of power, had no rolling bass, and everything was homogeneous with all the frequencies coming all from the same place over the whole panel. He couldn't find enough bad things to say about how boring and uninteresting that sound was. So bland, yuck.

So he played his pride and joy Bozaks which to me were appallingly bad. When he told me they cost him $2500 in the 60's I was appalled even more (a new Chevy Impala was about this much). I didn't tell him all this. I was courteous about his pride and joy. But it wasn't easy. Ever since that experience when I hear the name Bozak I think the Bozo of speakers. To this guy they were the pinnacle of loudspeaker development. He had in mind selling the Acoustats to fund some McIntosh speakers. Like these. Woofer and midrange in one cabinet and line array of tweeters in another. He thought they might be good.

Floyd Toole · Aug 8, 2019

Blumlein 88 said:
The sense in which some listeners were highly variable sounds like noise as others have mentioned. I don't recall the details, but some people had preference that varied so much you couldn't glean anything from the results. I thought some of those would rank a speaker highly this time and the same speaker lower next time.

I'd say if someone believes those variable listeners can show something important they need to do their research to show it is so. Perhaps @Floyd Toole could fill in some details about these variable listeners that isn't in his books or other writings.

Section 3.2 in the 3rd edition and the original papers explain what is being misconstrued in this discussion:
Toole, F. E. (1985). “Subjective measurements of loudspeaker sound quality and listener preferences”, J. Audio Eng. Soc., 33. pp. 2-31.

Toole, F. E. (1986). “Loudspeaker measurements and their relationship to listener preferences”, J. Audio Eng. Soc., 34, pt.1, pp. 227-235, pt. 2, pp. 323-348.

The listeners who exhibited high variations were of special interest - why was it happening? Fortunately I had done audiometric tests on all listeners and these outliers were found to be those with hearing loss. This was in fact how I discovered and quantified the effects of degraded hearing. They were not included in the final product ratings because when hearing loss is involved there are very strong individual effects - i.e different biases in addition to increased variation in judgments - Figure 3.8 illustrates this quite well. Because about 75% of the population qualifies as falling into what I called the "normal" hearing range it seemed like a reasonable decision. From that point on listeners were audiometrically screened and those exceeding about 20 dB threshold elevation were not invited back. As shown in the data, there was a clear trend of degradation even within that small range. The "normal hearing" people, then and now, consistently identify the least colored, most neutral, loudspeakers as their preferences.

As pointed out in the book, hearing loss is an occupational hazard in the audio industry, so it is a significant factor.

Unfortunately, for individuals with significant hearing loss the popular preference may or may not be appealing. In a very real sense, nothing can be guaranteed for such people because they don't hear small details or defects (Figure 3.6). See Chapter 17 and other discussions of the effects of hearing loss, which include significantly altered perceptions of spatial effects (hidden hearing loss). Spatial effects rank with lack of colorations (neutrality) in overall sound quality ratings. I cannot imagine any path for loudspeaker manufacturers to address the needs of hearing impaired individuals, but the availability of EQ/tone controls and forms of amplitude compression and expansion may be useful for those able to understand how to use them. This after all is what is used in hearing aids.

I have close association with some people needing hearing aids, and understand much of what happens in that domain. I have sat with highly regarded audiologists explaining basic auditory perception from an audio perspective - their training and goals are almost exclusively guided by concerns of speech intelligibility. Of the several I interacted with, only a few truly understood compression and expansion. Fitting hearing aids can be as much an art as a science, especially when binaural and signal-to-noise effects are involved. Satisfaction is not guaranteed for individuals with profound loss. One must presume that the same applies to their judgments of loudspeakers.

All that said, I cannot think of a rational reason to begin with anything other than a neutral reproducer of sound as a baseline. We can measure neutrality - it is hard data.
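To make the screening step concrete, here is a minimal sketch of an audiometric cutoff of "about 20 dB threshold elevation". The data layout, the listener names, and the rule of applying the limit to the worst tested frequency are all assumptions made for the example; the post does not say exactly how the limit was applied.

```python
# "About 20 dB" comes from the post; treating it as a hard limit on the
# worst tested frequency is an assumption made for this sketch.
THRESHOLD_DB = 20.0


def passes_screening(audiogram, limit_db=THRESHOLD_DB):
    """True if no tested frequency exceeds the threshold-elevation limit."""
    return max(audiogram.values()) <= limit_db


# Hypothetical audiograms: frequency in Hz -> threshold elevation in dB.
audiograms = {
    "listener_1": {500: 5, 1000: 5, 2000: 10, 4000: 15},
    "listener_2": {500: 10, 1000: 15, 2000: 25, 4000: 40},
}

invited_back = [name for name, a in audiograms.items() if passes_screening(a)]
print(invited_back)
```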

Blumlein 88 · Aug 8, 2019

Floyd Toole said:
Section 3.2 in the 3rd edition and the original papers explain what is being misconstrued in this discussion:
Toole, F. E. (1985). “Subjective measurements of loudspeaker sound quality and listener preferences”, J. Audio Eng. Soc., 33. pp. 2-31.

Toole, F. E. (1986). “Loudspeaker measurements and their relationship to listener preferences”, J. Audio Eng. Soc., 34, pt.1, pp. 227-235, pt. 2, pp. 323-348.

The listeners who exhibited high variations were of special interest - why was it happening? Fortunately I had done audiometric tests on all listeners and these outliers were found to be those with hearing loss. This was in fact how I discovered and quantified the effects of degraded hearing. They were not included in the final product ratings because when hearing loss is involved there are very strong individual effects - i.e different biases in addition to increased variation in judgments - Figure 3.8 illustrates this quite well. Because about 75% of the population qualifies as falling into what I called the "normal" hearing range it seemed like a reasonable decision. From that point on listeners were audiometrically screened and those exceeding about 20 dB threshold elevation were not invited back. As shown in the data, there was a clear trend of degradation even within that small range. The "normal hearing" people, then and now, consistently identify the least colored, most neutral, loudspeakers as their preferences.

As pointed out in the book, hearing loss is an occupational hazard in the audio industry, so it is a significant factor.

Unfortunately, for individuals with significant hearing loss the popular preference may or may not be appealing. In a very real sense, nothing can be guaranteed for such people because they don't hear small details or defects (Figure 3.6). See Chapter 17 and other discussions of the effects of hearing loss, which include significantly altered perceptions of spatial effects (hidden hearing loss). Spatial effects rank with lack of colorations (neutrality) in overall sound quality ratings. I cannot imagine any path for loudspeaker manufacturers to address the needs of hearing impaired individuals, but the availability of EQ/tone controls and forms of amplitude compression and expansion may be useful for those able to understand how to use them. This after all is what is used in hearing aids.

I have close association with some people needing hearing aids, and understand much of what happens in that domain. I have sat with highly regarded audiologists explaining basic auditory perception from an audio perspective - their training and goals are almost exclusively guided by concerns of speech intelligibility. Of the several I interacted with, only a few truly understood compression and expansion. Fitting hearing aids can be as much an art as a science, especially when binaural and signal-to-noise effects are involved. Satisfaction is not guaranteed for individuals with profound loss. One must presume that the same applies to their judgments of loudspeakers.

All that said, I cannot think of a rational reason to begin with anything other than a neutral reproducer of sound as a baseline. We can measure neutrality - it is hard data.

Thank you, that is what I remembered. Those with hearing loss were the same people with highly variable results. So excluding those makes plenty of sense. It would be like measuring something with a test instrument known to be of low sensitivity and erroneous readings.

It also might explain some of the seeming chaos in high end audio. Most people in it are older, say late middle age at least. You would have to think some number of them have some acquired hearing loss from age and exposure. Even without accounting for biasing from all the other factors.

PierreV · Aug 8, 2019

Blumlein 88 said:
Stereophile reviewed a pair including modern measurements.
https://www.stereophile.com/historical/1005bozak/index.html

Wow! just wow! Assuming that tweeter response wasn't the result of aging, I am just shaking my head.

In general, I guess the glorious bass we remember from our teenage years was probably what we would see as muddy 80-100Hz thumping nowadays.

DonH56 · Aug 8, 2019

Blumlein 88 said:
Reminds me of a fellow I met who was selling some Acoustats. He had lots of exotic gear. His favorite speaker was the Bozak Concert Grand.
Here is one without the grill cloth. It is about 5 ft tall as I recall. Basically a fridge.

Ah, at last I can contribute to this long thread despite not having the intellectual or communication skills of the cognoscenti, let alone the patience...

So here it is, my lone significant contribution...

I had a friend who owned Bozak Concert Grands, and worked with a dealer who sold them, way back in the late 1970's or so. I have listened to them and stood beside them. My contribution is:

They are about 4' tall, just a little over, not 5' tall.

Everybody exaggerates.

Oh, they weighed over 200 pounds each IIRC. I had to move the bloody things for demos and such; store owner was an older gent (probably younger then than I am now, time flies) and had a bad back.

Out - Don

Thomas_A · Aug 8, 2019

Blumlein 88 said:
Reminds me of a fellow I met who was selling some Acoustats. He had lots of exotic gear. His favorite speaker was the Bozak Concert Grand.
Here is one without the grill cloth. It is about 5 ft tall as I recall. Basically a fridge.
A pair with the cloth.

Stereophile reviewed a pair including modern measurements.
https://www.stereophile.com/historical/1005bozak/index.html

Okay so the guy loved these because they played loud with lots of bass. And boy did they have lots of uncontrolled bass. And he loved you could hear what the tweeter, midrange and woofer were doing separately. As in you clearly heard three different speakers doing something in three defined frequency regions. The Acoustats were not loud even with a ton of power, had no rolling bass, and everything was homogeneous with all the frequencies coming all from the same place over the whole panel. He couldn't find enough bad things to say about how boring and uninteresting that sound was. So bland, yuck.

So he played his pride and joy Bozaks which to me were appallingly bad. When he told me they cost him $2500 in the 60's I was appalled even more. I didn't tell him all this. I was courteous about his pride and joy. But it wasn't easy. Ever since that experience when I hear the name Bozak I think the Bozo of speakers. To this guy they were the pinnacle of loudspeaker development. He had in mind selling the Acoustats to fund some McIntosh speakers. Like these. Woofer and midrange in one cabinet and line array of tweeters in another. He thought they might be good.

Strange how an old speaker can measure so horribly. Reminds me of the opposite, an active speaker from 1978 with the ace-bass principle to reduce distortion in the bass region; Audio Pro A4-14. It did measure really well. My first speaker was a passive model of the same brand called 2-25.

svart-hvitt · Aug 8, 2019

Floyd Toole said:
Section 3.2 in the 3rd edition and the original papers explain what is being misconstrued in this discussion:
Toole, F. E. (1985). “Subjective measurements of loudspeaker sound quality and listener preferences”, J. Audio Eng. Soc., 33. pp. 2-31.

Toole, F. E. (1986). “Loudspeaker measurements and their relationship to listener preferences”, J. Audio Eng. Soc., 34, pt.1, pp. 227-235, pt. 2, pp. 323-348.

The listeners who exhibited high variations were of special interest - why was it happening? Fortunately I had done audiometric tests on all listeners and these outliers were found to be those with hearing loss. This was in fact how I discovered and quantified the effects of degraded hearing. They were not included in the final product ratings because when hearing loss is involved there are very strong individual effects - i.e different biases in addition to increased variation in judgments - Figure 3.8 illustrates this quite well. Because about 75% of the population qualifies as falling into what I called the "normal" hearing range it seemed like a reasonable decision. From that point on listeners were audiometrically screened and those exceeding about 20 dB threshold elevation were not invited back. As shown in the data, there was a clear trend of degradation even within that small range. The "normal hearing" people, then and now, consistently identify the least colored, most neutral, loudspeakers as their preferences.

As pointed out in the book, hearing loss is an occupational hazard in the audio industry, so it is a significant factor.

Unfortunately, for individuals with significant hearing loss the popular preference may or may not be appealing. In a very real sense, nothing can be guaranteed for such people because they don't hear small details or defects (Figure 3.6). See Chapter 17 and other discussions of the effects of hearing loss, which include significantly altered perceptions of spatial effects (hidden hearing loss). Spatial effects rank with lack of colorations (neutrality) in overall sound quality ratings. I cannot imagine any path for loudspeaker manufacturers to address the needs of hearing impaired individuals, but the availability of EQ/tone controls and forms of amplitude compression and expansion may be useful for those able to understand how to use them. This after all is what is used in hearing aids.

I have close association with some people needing hearing aids, and understand much of what happens in that domain. I have sat with highly regarded audiologists explaining basic auditory perception from an audio perspective - their training and goals are almost exclusively guided by concerns of speech intelligibility. Of the several I interacted with, only a few truly understood compression and expansion. Fitting hearing aids can be as much an art as a science, especially when binaural and signal-to-noise effects are involved. Satisfaction is not guaranteed for individuals with profound loss. One must presume that the same applies to their judgments of loudspeakers.

All that said, I cannot think of a rational reason to begin with anything other than a neutral reproducer of sound as a baseline. We can measure neutrality - it is hard data.

Thanks for clearing up! Having the author here is such a great asset for ASR!

From Toole (1986) I didn’t quite get it that 1/3 were suffering from impaired hearing. The description of the participants (“These listeners all had hearing threshold levels within 10 dB of the ISO audiometric zero at frequencies below 1 kHz, and within 20 dB, up to 6 kHz”) could be understood as the original 42, as well as the remaining 28?

It would have saved you a lot (33%) of work if those 14 “deplorables” were taken out prior to testing, if not the purpose of the measurements was to show that “deplorables” have irrelevant opinions too.

Interestingly, in democratic elections, it has been suggested to give voting rights only to people of certain characteristics. Those characteristics used to be wealth, sex, race - but in modern times it seems like focus on some sort of intelligence has replaced the sorting criteria of yesteryear.

In any case, defining “the deplorables” of listening tests as those of impaired hearing makes sense if Truth seeking is a goal. If Preference seeking is a goal, I am not quite certain if leaving out “a basket of deplorables” is the obvious way to go.

If I am not mistaken, it can be shown that some people can systematically hear beyond CD 16/44. Does that make Redbook an inferior format? At what time does the size of the deplorables become so big as to make their exclusion controversial when the goal is to uncover preferences?

Of course, if the goal is to chase Truth, one goes for the higher resolution of the original recording as well as the ideal sound reproduction gear characteristics only a few of us can hear. As a Truth seeker one would chase a Benchmark amplifier before a Schiit amplifier even if Schiit reproduces beyond most people’s ability to express a preference. In a way, Truth is simpler than Preference as long as costs aren’t prohibitive.

Floyd Toole · Aug 8, 2019

svart-hvitt said:
Thanks for clearing up! Having the author here is such a great asset for ASR!

From Toole (1986) I didn’t quite get it that 1/3 were suffering from impaired hearing. The description of the participants (“These listeners all had hearing threshold levels within 10 dB of the ISO audiometric zero at frequencies below 1 kHz, and within 20 dB, up to 6 kHz”) could be understood as the original 42, as well as the remaining 28?

It would have saved you a lot (33%) of work if those 14 “deplorables” were taken out prior to testing, if not the purpose of the measurements was to show that “deplorables” have irrelevant opinions too.

Interestingly, in democratic elections, it has been suggested to give voting rights only to people of certain characteristics. Those characteristics used to be wealth, sex, race - but in modern times it seems like focus on some sort of intelligence has replaced the sorting criteria of yesteryear.

In any case, defining “the deplorables” of listening tests as those of impaired hearing makes sense if Truth seeking is a goal. If Preference seeking is a goal, I am not quite certain if leaving out “a basket of deplorables” is the obvious way to go.

If I am not mistaken, it can be shown that some people can systematically hear beyond CD 16/44. Does that make Redbook an inferior format? At what time does the size of the deplorables become so big as to make their exclusion controversial when the goal is to uncover preferences?

Of course, if the goal is to chase Truth, one goes for the higher resolution of the original recording as well as the ideal sound reproduction gear characteristics only a few of us can hear. As a Truth seeker one would chase a Benchmark amplifier before a Schiit amplifier even if Schiit reproduces beyond most people’s ability to express a preference. In a way, Truth is simpler than Preference as long as costs aren’t prohibitive.

You said: "It would have saved you a lot (33%) of work if those 14 “deplorables” were taken out prior to testing, if not the purpose of the measurements was to show that “deplorables” have irrelevant opinions too."

The tests were conducted for the Canadian Broadcasting Corporation, assisting them in choosing small, medium and large monitor loudspeakers for use nationwide. Naturally they wanted to use their own recording engineers and producers as judges. I included some of my regular - experienced, but untrained as now - listeners who included interested colleagues and local audiophiles. Back then, and still, pro audio people think that they can learn to listen around their hearing damage. They were proved to be wrong. So, although it was not intended to include "deplorables" (please stop using that inappropriate and revolting term) it turned out to be a very valuable learning experience for all of us. Everybody went away with a new respect for hearing protection.

In fact, at the time I don't know how I would have been able to recognize the low-performing listeners in advance because nobody had published any data indicating a relationship between hearing loss and listening performance. I think it was a first - new knowledge.

So, listeners with hearing loss exhibit less consistency in their opinions of sound quality, and, perhaps, bias in those opinions. They still enjoy, and in many cases, create, music, but their opinions of reproduced sound quality are their own, not to be shared. I was an excellent listener for decades and continued to participate in listening tests from time to time while in my management position. However around age 60 I stopped. At Harman we track the performance of both loudspeakers and listeners (the F statistic). I was deteriorating, and frankly I was finding the task more challenging. I still very much enjoy music and I still have opinions but they are my own. Some people in these forums should do the same
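For readers unfamiliar with the F statistic mentioned here, the sketch below shows one common way to compute one per listener: a one-way ANOVA with loudspeaker as the factor and that listener's repeated ratings as the observations. Whether this matches the exact procedure used at Harman is an assumption; the ratings and the function name `f_statistic` are hypothetical.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F ratio: between-group over within-group mean square.

    `groups` holds one list of replicate ratings per loudspeaker, all from
    the same listener.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical ratings of three loudspeakers, three replicates each.
consistent_listener = [[7.0, 7.5, 7.0], [4.0, 4.5, 4.0], [6.0, 6.0, 6.5]]
variable_listener = [[7.0, 3.0, 5.0], [4.0, 8.0, 5.0], [6.0, 2.0, 7.0]]

f_consistent = f_statistic(consistent_listener)
f_variable = f_statistic(variable_listener)
print(f_consistent, f_variable)
```

A high F means the listener separates the loudspeakers clearly relative to the scatter in their own repeated judgments; a listener whose ratings of the same loudspeaker jump around gets a low F.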

svart-hvitt · Aug 8, 2019

Floyd Toole said:
You said: "It would have saved you a lot (33%) of work if those 14 “deplorables” were taken out prior to testing, if not the purpose of the measurements was to show that “deplorables” have irrelevant opinions too."

The tests were conducted for the Canadian Broadcasting Corporation, assisting them in choosing small, medium and large monitor loudspeakers for use nationwide. Naturally they wanted to use their own recording engineers and producers as judges. I included some of my regular - experienced, but untrained as now - listeners who included interested colleagues and local audiophiles. Back then, and still, pro audio people think that they can learn to listen around their hearing damage. They were proved to be wrong. So, although it was not intended to include "deplorables" (please stop using that inappropriate and revolting term) it turned out to be a very valuable learning experience for all of us. Everybody went away with a new respect for hearing protection.

In fact, at the time I don't know how I would have been able to recognize the low-performing listeners in advance because nobody had published any data indicating a relationship between hearing loss and listening performance. I think it was a first - new knowledge.

So, listeners with hearing loss exhibit less consistency in their opinions of sound quality, and, perhaps, bias in those opinions. They still enjoy, and in many cases, create, music, but their opinions of reproduced sound quality are their own, not to be shared. I was an excellent listener for decades and continued to participate in listening tests from time to time while in my management position. However around age 60 I stopped. At Harman we track the performance of both loudspeakers and listeners (the F statistic). I was deteriorating, and frankly I was finding the task more challenging. I still very much enjoy music and I still have opinions but they are my own. Some people in these forums should do the same

Thanks! That comment of yours is for the history books to write down. Much appreciated!

On «deplorables»: As you probably understood already, I like to provoke to get a clearer image of people’s positions. Sometimes, it also helps to use a certain word to see how it works in another setting where it’s not normally used. I always thought it was revolting to use the word «deplorables» on ca. 1/4 of the American population. But it’s quite common to have a desire to exclude some people, and it’s always been that way. We’re animals that should know better. In Truth seeking science, however, one has to make a more cynical choice, which is to exclude the noise and focus on the signal. Science is not democratic.

March Audio · Aug 9, 2019

svart-hvitt said:
- - - - - - - WARNING ! LONG POST ! PLEASE TAKE YOUR TIME - - - - - - -

Ok, @March Audio , you and others have brought up “The Toole Research” so many times, pointing to his time at NRC decades ago, that I think I will use an often-cited Toole article from that era to illustrate some of my points.

In “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2” from 1986, written when Toole was at NRC, he wrote the following (see this link: https://www.pearl-hifi.com/06_Lit_A...blications/LS_Measurements_Listener_Prefs.pdf):

“In all, there were 42 listeners in these tests, but the data used here pertain only to the 28 who exhibited low judgment variability”.

First, the vox populi type of research started out with 42 participants. In many other sciences that’s a good start, but hardly a number of samples that makes a golden rule or constitutes the final evidence for a general theory.

A bit later, Toole wrote:

“Not all listeners auditioned all loudspeakers and not all loudspeakers were included in each experiment”.

So we had 42 listeners that didn’t perform all combinations during the test setup. So there was incompleteness there, which an outside reader could not get a grip on.

But what’s more important has already been stated: One third of the participants were removed from the data set due to these participants having random and/or high variability in their stated preferences.

A guy I met many years ago, who made his PhD under a professor who later became a Nobel laureate, told me: “My professor insisted that the data must speak for themselves even if I don’t like what the data is saying”.

What the young PhD student found, which he was encouraged to discover, uncover and make public for all to see by the older professor, was a phenomenon that the Nobel laureate later has called the “most embarrassing” thing that his own theory could face.

The research I quote above, Toole (1986), is often referred to as the evidence that Toole produced while at NRC. Do I need to explain to people why 42 participants at the outset, reduced by 33 percent to 28 participants due to the high “judgment variability” of the discarded participants, can only work as a good start to form an opinion, not as a golden rule or as evidence for a general theory?

Now, let me get a bit more technical. What happens in a data set where you remove outliers, is that the average and the median in the data set start to converge. It looks as if Toole (1986) used simple averages in speaker characteristics, though he didn’t specify how he produced the preference scores. I have previously explained the power of the median, where you’ll see that a median (say speaker) will start to get the best average percentile rank over time (in a machine producing random output). This phenomenon works well when we talk about characteristics that are represented in a linear fashion, but I am uncertain how the method works when you have characteristics that draw upon several dimensions or contain more complexity (say colours, smell?). I am not saying that the power of median explained the results in Toole (1986) on frequency response, but it had the potential to do so in a data set that was shallow to begin with and was manipulated by discarding one third of the voices in the vox populi contest of speakers. So we cannot know if the preference for a flat frequency response in Toole (1986) is because preference=truth, statistical chance, due to manipulation of the data set or a demonstration of the power of the median.
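The narrow statistical claim above, that removing outliers pulls the average and the median together, can be checked with a small sketch. The ratings below are hypothetical and are not data from Toole (1986), and `trim_extremes` is an illustrative helper, not anything the paper describes. Note also that it trims extreme scores, which is not the same thing as excluding listeners for high trial-to-trial variability.

```python
import statistics


def mean_median_gap(scores):
    """Absolute difference between the mean and the median of the scores."""
    return abs(statistics.mean(scores) - statistics.median(scores))


def trim_extremes(scores, per_tail):
    """Return the sorted scores with `per_tail` values dropped from each end."""
    if per_tail < 0 or 2 * per_tail >= len(scores):
        raise ValueError("per_tail must leave at least one score")
    ordered = sorted(scores)
    return ordered[per_tail:len(ordered) - per_tail]


# Hypothetical preference ratings on a 0-10 scale with a few very low values.
ratings = [1.0, 1.5, 2.0, 6.0, 6.5, 6.5, 7.0, 7.0, 7.0, 7.5, 7.5, 8.0]

full_gap = mean_median_gap(ratings)

# Drop 2 values from each tail: 4 of 12, i.e. one third of the sample.
trimmed = trim_extremes(ratings, per_tail=2)
trimmed_gap = mean_median_gap(trimmed)

print(f"gap before trimming: {full_gap:.4f}")
print(f"gap after trimming:  {trimmed_gap:.4f}")
```

With this skewed sample the gap shrinks after trimming. That only shows the convergence effect exists; it says nothing about whether it drove any published result.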

Adding to the troubles in Toole (1986), the author has made a large and diverse set of technical measurements (frequency response, phase, power etc.) that are supposed to reflect upon, correlate with listener preferences. Isn’t it obvious that it may be only a few of these measurements that will be picked up by ear and brain in a given situation, while other specifications may need another test setup to reveal themselves as audible or good-bad preference wise? Did Toole (1986) use a shotgun to cast light on subtle speaker qualities that are best taken down by finer tools than the shotgun?

In discourses on ASR, there are certain characteristics that carry weight to separate the sinners and the believers. As soon as one quotes Toole (1986), it’s as if further inquiry needs to stop. This is not the fault of gentleman Toole, but the fault of his followers as if they belonged to a religious camp.

My point is that it cannot hurt to have a more open stance on questions where the consensus research is incomplete, as well as being more critical to consensus, like digging into how many samples constitute “evidence” in audio science.

What Toole (1986) could have shown, which would make for quite an interesting avenue for a research program guided by Preference as opposed to Truth, is what happens when you go down the alley of highly variable preferences, i.e. the one third of the population that Toole (1986) started out with. As I have previously shown, modern medicine is about to leave the idea of the average treatment for more focus on the individual. So an area, which draws upon the largest research funds by far (!?), has already come to the conclusion that one can do better than average in certain respects. If a research program is Preference seeking, as opposed to Truth seeking, I guess this line of thinking is highly interesting when one has the means to offer individualistic products instead of a one-size-fits all offering.

The fact that Toole (1986) discarded 1/3 of the population due to unwanted characteristics - i.e. highly variable preference - could also be interpreted as “evidence” for subjectiveness in the hobby. Paradoxically, one could argue that the subjectivists have had the science, i.e. Toole (1986), on their side all the time because their preferences cannot be met by the one-size-fits-all that would come out of vox populi research. If preferences of about one third of the population are so highly variable as Toole (1986) hinted, it doesn’t make sense for a rational man of diverging or variable preferences to settle for the average. Could this paradox lie behind another observation by Toole where he hints at highly variable preferences (see his book on sound reproduction):

“However, many of us have seen evidence of such listener preferences in the “as found” tone control settings in numerous rental and loaner cars”.
Source: Floyd Toole, Sound Reproduction (latest ed.), chapter 12.3

In other words, the rational subjectivist could have had science on his side all the time. No need to ridicule him, right? Should we reflect upon the norms and standards for discourse on ASR, after all?

I could also make the case that a follower of Preference seeking research will be more confused than the rational subjectivist. While the rational subjectivist will know his preferences, to the extent that he owns a highly coloured setup suited to his non-variable preferences for all of his lifetime, the follower of Preference seeking research will be given new signals all the time that his setup is diverging from the New neutral (in terms of preferences). Note that the preferences of a population can change even if the individual’s preferences are stable! So the follower of Preference seeking research will change his gear with every cycle. It’s as if the true believer in Preference seeking science would need to start listening to hip hop in the 2000s, while he listened to jazz and blues in the 1960s.

So who is more rational? The rational subjectivist or the follower of Preference seeking research?

Needless to say, wouldn’t it be more simple if research and science were Truth seeking, not Preference seeking?

For goodness sake. The research was continued beyond the NRC as you fully well know. There are valid reasons to exclude certain participants.

So can you quit your clear and repeated anti Toole research bias?

It's tedious and unjustifiable. In simple terms the “truth” correlated with preference. End of.

Also STOP saying that we are not open minded. It is patronising and plain wrong. You are conflating this with your inability to convince people to draw a different conclusion.
You have simply not established any compelling reason to disregard this or other data you are referring to. As I said, this is just an exercise in intellectual masturbation on your behalf.

It's clear you are on a mission to prove something, but you are failing miserably. Your endlessly lengthy posts don't demonstrate your intelligence. They do obfuscate your lack of a cogent position.

We see through that however.

BDWoody · Aug 9, 2019

svart-hvitt said:
I agree that themes have been shifting, to appear as if it’s all random. This is much like a creative process where researchers come up with new ideas. Research and science by diktat never worked, so some degree of freedom is a good, not a bad, right?

The overarching theme of norms and standards of discourse has come up several times, though. In the latest post I showed how an omission in Toole (1986) could arguably be key to our understanding of that work; and omission was a point I started out with. So a certain tendency of concepts used underway can be seen in all the «randomness»?

Having said that, some of the concepts discussed or developed in course of the thread are:

=> Vox populi research
=> Truth speaker
=> Preference speaker
=> Camps of audiophiles and their behaviour
=> «Basket of deplorables» in listening tests
=> Existence of gold standard
=> Is audio science Hard or Soft?
=> Statistics and the power of median to mediocrity or truth
=> Review of a pillar in audio research, Toole (1986)

There’s probably more.

So I hope there have been both intended and unintended effects and side effects as the thread mounted in size.

As for the overarching aim of being more open to potentially important ideas without statistical significance support, can the above mentioned themes and concepts be seen as a terminology to open doors that used to be closed?

Floyd Toole said:
You said: "It would have saved you a lot (33%) of work if those 14 “deplorables” were taken out prior to testing, if not the purpose of the measurements was to show that “deplorables” have irrelevant opinions too."

The tests were conducted for the Canadian Broadcasting Corporation, assisting them in choosing small, medium and large monitor loudspeakers for use nationwide. Naturally they wanted to use their own recording engineers and producers as judges. I included some of my regular - experienced, but untrained as now - listeners who included interested colleagues and local audiophiles. Back then, and still, pro audio people think that they can learn to listen around their hearing damage. They were proved to be wrong. So, although it was not intended to include "deplorables" (please stop using that inappropriate and revolting term) it turned out to be a very valuable learning experience for all of us. Everybody went away with a new respect for hearing protection.

In fact, at the time I don't know how I would have been able to recognize the low-performing listeners in advance because nobody had published any data indicating a relationship between hearing loss and listening performance. I think it was a first - new knowledge.

So, listeners with hearing loss exhibit less consistency in their opinions of sound quality, and, perhaps, bias in those opinions. They still enjoy, and in many cases, create, music, but their opinions of reproduced sound quality are their own, not to be shared. I was an excellent listener for decades and continued to participate in listening tests from time to time while in my management position. However around age 60 I stopped. At Harman we track the performance of both loudspeakers and listeners (the F statistic). I was deteriorating, and frankly I was finding the task more challenging. I still very much enjoy music and I still have opinions but they are my own. Some people in these forums should do the same.
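The post does not say how the listener F statistic is computed. One plausible reading, and it is only an assumption here, is a one-way ANOVA per listener with loudspeaker as the factor and repeated ratings as observations: a listener who separates the loudspeakers clearly and repeats their own ratings gets a large F. The function name `listener_f_statistic` and all ratings below are hypothetical.

```python
def listener_f_statistic(ratings_by_speaker):
    """One-way ANOVA F ratio for a single listener (assumed method).

    `ratings_by_speaker` maps a loudspeaker label to that listener's
    repeated ratings of it.
    """
    groups = [list(values) for values in ratings_by_speaker.values()]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 speakers and some repeated ratings")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation between loudspeakers: how well the listener discriminates.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation within a loudspeaker: how inconsistent the repeats are.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    if ss_within == 0:
        return float("inf")  # perfectly repeatable ratings
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical listeners rating three loudspeakers three times each.
consistent = {"A": [7.0, 7.5, 7.0], "B": [5.0, 5.5, 5.0], "C": [3.0, 3.0, 3.5]}
erratic = {"A": [7.0, 3.0, 5.0], "B": [4.0, 7.0, 3.0], "C": [6.0, 2.0, 5.0]}

f_consistent = listener_f_statistic(consistent)
f_erratic = listener_f_statistic(erratic)
print(f"consistent listener F: {f_consistent:.2f}")
print(f"erratic listener F:    {f_erratic:.2f}")
```

Under this reading, an exclusion rule fixed before the test could be a minimum F per listener, which is in line with the replicate-based screening discussed earlier in the thread.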

March Audio said:
For goodness sake. The research was continued beyond the NRC as you fully well know.

Can you quit your clear and repeated anti Toole research bias?

It's tedious and unjustifiable. The “truth” correlated with preference. End of.

Also STOP saying that we are not open minded. It is patronising and plain wrong. You have simply not established any compelling reason to disregard this or other data you are referring to. As I said, this is just an exercise in intellectual masturbation on your behalf.

It's clear you are on a mission to prove something, but you are failing miserably. Your endlessly lengthy posts don't demonstrate your intelligence. They just obfuscate your lack of a cogent position. We see through that however.

I've learned a lot in this thread from the incredible depth of people who actually chose to give the OP a lot more mental energy than he deserved.

Unfortunately, I had to wade through an almost indescribably self important, self indulgent stream of consciousness from someone with that dangerous, tenuous grasp of too few facts, combined with some inner belief system that almost makes it manic...but I digress.

Thanks to all of the responders. Learned a lot from you.

March Audio · Aug 9, 2019

svart-hvitt said:
I am quite surprised you and others don’t get the point.

On ASR, subjectivists and people who question research like Toole (1986) are often met with hostility. This is a reaction to be understood in a social context; we’re animals, after all.

We get your points, well after decoding them from the diarrhoea of text. We just don't agree with them.

No they aren't met with hostility.

However, as with any other subject, if they have a different POV or if they disagree they are expected to provide cogent and demonstrable reasons why.

If they can't then they may well receive a "stop wasting our time" response.

Currently you are getting that response. Is there anyone in this thread agreeing with your position? Perhaps you should reflect on that. I can assure you it has nothing to do with social context.
