
Four Speaker Blind Listening Test Results (Kef, JBL, Revel, OSD)

CtheArgie

Active Member
Forum Donor
Joined
Jan 11, 2020
Messages
237
Likes
292
Location
Agoura Hills, CA.
This has been the topic of countless research projects and peer reviewed papers. The correlation between key aspects of objective anechoic measurements and listener preference is strong. Ignore it at your own peril just as you would some medication because it did not have 100% uniform efficacy for the entire population!
My job is in clinical trial issues. You can tell me about psychoacoustics but please refrain from teaching me about trials.
 

CtheArgie

Active Member
Forum Donor
Joined
Jan 11, 2020
Messages
237
Likes
292
Location
Agoura Hills, CA.
Yep. The term is "medical reversal." And in medicine there are also lots of examples where the results of smaller studies ARE subsequently confirmed by larger ones. The hesitation to act and change practice based on results of the smaller studies can, in some cases, deprive patients of the benefits of an otherwise effective therapy. And even once definitive trials are published, it can often take years before it translates into common medical practice (17 years is the commonly cited figure). So let's not automatically dismiss the results of this listening test on the basis of its sample size alone - we all know that it isn't perfect, but as someone else pointed out, this is a hobby, not life and death.
I’m sure that you know how much work is necessary for validation of results, why the FDA requires two large and well controlled studies, etc.

My point is that you have to be careful when interpreting smaller and poorly controlled experiments. I did find this one interesting but not overwhelming to me.
 

CtheArgie

Active Member
Forum Donor
Joined
Jan 11, 2020
Messages
237
Likes
292
Location
Agoura Hills, CA.
This has been the topic of countless research projects and peer reviewed papers. The correlation between key aspects of objective anechoic measurements and listener preference is strong. Ignore it at your own peril just as you would some medication because it did not have 100% uniform efficacy for the entire population!
It is still PREFERENCE.
 

Semla

Active Member
Joined
Feb 8, 2021
Messages
169
Likes
303
I’m sure that you know how much work is necessary for validation of results, why the FDA requires two large and well controlled studies, etc.

My point is that you have to be careful when interpreting smaller and poorly controlled experiments. I did find this one interesting but not overwhelming to me.
The sample size needed for a given test is related to both the size of the effect and the variability of the outcome. The sample size for this particular outcome in this study is not small - more than 200 observations made by 10 listeners (the effective sample size is a bit smaller due to the grouping structure). That's why there is a statistically significant different score between certain speaker pairs.

In addition this is not the first experiment in this field - it builds on and confirms the results from all those earlier studies at Harman.
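To put a rough number on the "effective sample size is a bit smaller" remark, here is a minimal sketch using the textbook design-effect approximation for clustered observations. It is not the model from post #64. The 200 observations and 10 listeners are rounded from the figures above, and the intraclass correlation (ICC) values are purely hypothetical since the thread does not report one. The formula describes the penalty for estimating an overall mean from clustered data; for within-listener comparisons between speakers the penalty can be quite different, so treat this as intuition only.

```python
def effective_sample_size(n_obs: int, n_clusters: int, icc: float) -> float:
    """Approximate effective sample size for equally sized clusters.

    Uses the design effect 1 + (m - 1) * icc, where m is the number of
    observations per cluster (here: ratings per listener).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    m = n_obs / n_clusters                 # ratings per listener
    design_effect = 1.0 + (m - 1.0) * icc  # inflation of variance due to grouping
    return n_obs / design_effect


if __name__ == "__main__":
    # ICC values below are assumptions chosen only for illustration.
    for icc in (0.0, 0.1, 0.3):
        n_eff = effective_sample_size(n_obs=200, n_clusters=10, icc=icc)
        print(f"ICC={icc:.1f}: effective n is about {n_eff:.1f}")
```

With no correlation between a listener's ratings the effective n stays at 200; the more alike each listener's ratings are, the fewer independent observations you really have.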
 

CtheArgie

Active Member
Forum Donor
Joined
Jan 11, 2020
Messages
237
Likes
292
Location
Agoura Hills, CA.
@Semla. Glad you commented. How did you calculate the scores per speaker? Did you use the 50 observations per speaker, or did you use the average per person?
You also know that you want to have a stat plan BEFORE the experiment to make sure that you don't find the results you want by peeling the onion.

Again, I am pleased with the results. But they are not conclusive to me.

Harman/Toole's work is also interesting. The premise was that a speaker with a flat frequency response in an anechoic chamber and with certain dispersion characteristics when placed in a simulated living room would be preferred by a set of "educated" listeners. A preference score is reached by scoring various different attributes. I don't know if Toole used stepwise regression or whatever. But he states that certain characteristics contribute to the preference score and that the score is not "perfect". It gives a pretty good number.

I FOLLOW that preference in my speaker choice because I (hope) think that in the recording process they will use speakers with those characteristics to produce it. Thereby, I will get in my room the sound that the "musicians and producers" chose for their recordings.

But to follow those guidelines doesn't mean that I am not aware of the potential limitations of the work.
 

CtheArgie

Active Member
Forum Donor
Joined
Jan 11, 2020
Messages
237
Likes
292
Location
Agoura Hills, CA.
...within the confines of a bell curve. Don't make it sound like it's wildly all over the map. The research shows people mostly like the same attributes in sound - that is, neutral and uncolored.
Who said it was a bell curve?
And no, most people do NOT like the same attributes. This is why Toole published the different curves.
Certain speaker manufacturers make their speakers sound like the "uneducated" preference curve because of commercial reasons. There are probably more uneducated listeners in the world...
 

beefkabob

Major Contributor
Forum Donor
Joined
Apr 18, 2019
Messages
1,056
Likes
1,278
@Semla. You also know that you want to have a stat plan BEFORE the experiment to make sure that you don't find the results you want by peeling the onion.

Well, if you work for a drug company, you have a stat plan before the experiment to make sure that the results you want are guaranteed.
 

CtheArgie

Active Member
Forum Donor
Joined
Jan 11, 2020
Messages
237
Likes
292
Location
Agoura Hills, CA.
Well, if you work for a drug company, you have a stat plan before the experiment to make sure that the results you want are guaranteed.
Nope. That is not the reason you do this. ANY experiment will have a stat plan BEFORE you initiate it. You can never "guarantee" the results. This is one more reason you want the stat plan. You want to make sure that you are doing the proper analysis. How many times have you seen publications where the primary outcome failed and the authors then state that any other analysis is speculative because it is secondary and the primary failed? It happens a lot. Look at studies in lupus. Most fail.
 

Semla

Active Member
Joined
Feb 8, 2021
Messages
169
Likes
303
@Semla. Glad you commented. How did you calculate the scores per speaker? Did you use the 50 observations per speaker, or did you use the average per person?
See post #64 for a full description of the model.

You also know that you want to have a stat plan BEFORE the experiment to make sure that you don't find the results you want by peeling the onion.
Ideally, yes. But in reality even late-stage and other confirmatory studies are designed for a limited number of primary outcomes. Despite the regulated context lots of ad-hoc analyses are performed both on primary and secondary outcomes, even though from a purely theoretical point of view they should be considered worthless. That is clearly not the case.

In an ideal world every experiment would be properly designed and planned for every single outcome. But one needs to recognize that this is not the reality. Interesting things have been learned from badly designed experimental setups, studies without a formal statistical analysis plan, even from weird lab accidents etc. At the very least new hypotheses will be generated. The goal of a statistical analysis is to enable a decision under uncertainty - it needs to be good enough, it does not need to be perfect (and it will never be perfect).

Again, I am pleased with the results. But they are not conclusive to me.
Experiments are always performed in a context, in this case the Harman studies. One needs to look at evidence in this wider context, maybe perform a meta-analysis if you want to be quantitative. If you're still not convinced you can now design your own confirmatory experiment thanks to @MatthewS's hard work. He has shared all the data you need to come up with an appropriate design, model and sample size requirements. I hope that this work serves as an inspiration for other members.
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
1,287
Likes
827
Location
Sweden
See post #64 for a full description of the model.


Ideally, yes. But in reality even late-stage and other confirmatory studies are designed for a limited number of primary outcomes. Despite the regulated context lots of ad-hoc analyses are performed both on primary and secondary outcomes, even though from a purely theoretical point of view they should be considered worthless. That is clearly not the case.

In an ideal world every experiment would be properly designed and planned for every single outcome. But one needs to recognize that this is not the reality. Interesting things have been learned from badly designed experimental setups, studies without a formal statistical analysis plan, even from weird lab accidents etc. At the very least new hypotheses will be generated. The goal of a statistical analysis is to enable a decision under uncertainty - it needs to be good enough, it does not need to be perfect (and it will never be perfect).


Experiments are always performed in a context, in this case the Harman studies. One needs to look at evidence in this wider context, maybe perform a meta-analysis if you want to be quantitative. If you're still not convinced you can now design your own confirmatory experiment thanks to @MatthewS's hard work. He has shared all the data you need to come up with an appropriate design, model and sample size requirements. I hope that this work serves as an inspiration for other members.

Agreed. The results of this small study follow previous ones, perhaps with some exceptions, and while a correctly made meta-study is of highest value, there are also logical reasons to prefer speakers with a smooth frequency response. Comparisons are not without complications though, since speakers may be designed with different room placement requirements. For example, comparing a Carlsson speaker with a conventional speaker would not be fair if the prerequisite is that the speakers must be placed in the same position.
 

CtheArgie

Active Member
Forum Donor
Joined
Jan 11, 2020
Messages
237
Likes
292
Location
Agoura Hills, CA.
@Semla. I agree with what you have stated. No worries. As I said many times, this is an interesting observation. I have seen many studies that had no plan and found new things, of course. I also know of studies that have been called milestone and were never able to be replicated. I am sure you are aware of Amgen's project to replicate many cancer studies, which found that too many could not be replicated.

Dom Purpura, who was the Dean of the Albert Einstein College of Medicine, told me this joke.

God sent an application for a grant to the NIH. After three years he gets a letter back from the NIH that states:
"Your proposal was fascinating, the advisory groups really like what you proposed but after careful consideration, unfortunately, we will not support your work for three reasons.
1. You published your work but it was not in a peer review journal.
2. Your work was done a very long time ago.
3. No one has replicated your results."

It encompasses a lot of real criticism of scientific work.
 

KEFCarver

Member
Joined
Mar 29, 2021
Messages
18
Likes
21
Location
Tucson, AZ
Sounds like you had a fun time doing this, and thanks for sharing your process and results. I have no complaints with how you did it or your results, but I only took one course in statistics and probability in college, and that was many, many moons ago. Hopefully you will try it again with some larger speakers :D
 
OP

MatthewS

Member
Forum Donor
Joined
Jul 31, 2020
Messages
39
Likes
244
Location
Greater Seattle
@Semla I also know of studies that have been called milestone and were never able to be replicated. I am sure you are aware of Amgen's project to replicate many cancer studies, which found that too many could not be replicated.

The level of "thumbing the scales" that occurs in drug trials during cohort selection is not to be underestimated.

Anyone who wants to learn a lot about how to review medical studies might want to start here: Studying Studies: Part I – relative risk vs. absolute risk
 

CtheArgie

Active Member
Forum Donor
Joined
Jan 11, 2020
Messages
237
Likes
292
Location
Agoura Hills, CA.
@MatthewS , funny you bring this up. I had a huge issue with one company because they were designing a study using relative risk and I was trying to convince them that the absolute risk made the study unlikely to be successful. Too many patients with low absolute risk. So even if they reached the relative risk reduction it wasn’t going to work. They were unhappy with me. The study failed because of this. They didn’t want to enter “riskier” patients, those with a higher absolute risk as it would have delayed enrollment.
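The arithmetic behind that disagreement is easy to sketch. The numbers below are hypothetical (a 25% relative risk reduction, baseline risks of 2% and 20%, 1,000 patients per arm) and have nothing to do with the actual study mentioned above; they only show how the same relative reduction yields very different absolute effects and event counts.

```python
def absolute_risk_reduction(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Absolute risk reduction implied by a relative reduction at a given baseline risk."""
    return baseline_risk * relative_risk_reduction


def number_needed_to_treat(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Patients who must be treated to prevent one event (1 / ARR)."""
    return 1.0 / absolute_risk_reduction(baseline_risk, relative_risk_reduction)


if __name__ == "__main__":
    rrr = 0.25          # hypothetical relative risk reduction
    n_per_arm = 1000    # hypothetical arm size
    for label, baseline in (("low-risk", 0.02), ("high-risk", 0.20)):
        arr = absolute_risk_reduction(baseline, rrr)
        nnt = number_needed_to_treat(baseline, rrr)
        control_events = baseline * n_per_arm
        treated_events = (baseline - arr) * n_per_arm
        print(
            f"{label}: ARR={arr:.3f}, NNT={nnt:.0f}, "
            f"expected events {control_events:.0f} vs {treated_events:.0f}"
        )
```

In the low-risk population the two arms differ by only a handful of expected events, which is why a trial enrolling mostly low-absolute-risk patients can miss its endpoint even when the relative effect is real.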
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
34,981
Likes
130,229
Location
Seattle Area
My job is in clinical trial issues. You can tell me about psychoacoustics but please refrain from teaching me about trials.
Psychoacoustic science is created using listening tests. Since you don't seem to know that, I suggest not making remarks like that. Your work has little in common with this research.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
34,981
Likes
130,229
Location
Seattle Area
Harman/Toole's work is also interesting. The premise was that a speaker with a flat frequency response in an anechoic chamber and with certain dispersion characteristics when placed in a simulated living room would be preferred by a set of "educated" listeners.
For the third time, no. People from every background have been part of this research which well predates Harman.
 