
Four Speaker Blind Listening Test Results (Kef, JBL, Revel, OSD)

OP

MatthewS

Member
Forum Donor
Joined
Jul 31, 2020
Messages
95
Likes
862
Location
Greater Seattle
Are you able to post the raw scores? I'd like to do a bit of statistical analysis on them.

I've attached the Excel file with the raw results.

Please note, Listener 2 and Listener 5 did not rate on a 10-point scale; they ranked the speakers in order of preference, 1-4. I excluded their data from my analysis and graphs. There were 12 participants in total, but only 10 participants' data were used.
 

Attachments

  • BlindSpeakerTestResults.xlsx.zip
    15.9 KB · Views: 133
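For anyone who wants to run their own numbers, here is a minimal sketch of loading the attachment and applying the same exclusion; the column names (Listener, Speaker, Rating) are assumptions about the sheet layout, not confirmed from the file:

```python
# Minimal sketch: load the raw results and drop the two listeners who
# ranked 1-4 instead of rating on the 10-point scale. Column names
# (Listener, Speaker, Rating) are assumptions about the sheet layout.
import pandas as pd

# Unzip BlindSpeakerTestResults.xlsx.zip first.
df = pd.read_excel("BlindSpeakerTestResults.xlsx")

df = df[~df["Listener"].isin([2, 5])]  # exclude Listeners 2 and 5

print(df.groupby("Speaker")["Rating"].mean().round(1))
```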

FeddyLost

Addicted to Fun and Learning
Joined
May 24, 2020
Messages
752
Likes
543
To EQ the speakers as well as possible (FR, not just level) and re-test, to see how much audible difference remains.

I think that's a good idea for a manufacturer's research center (e.g. when choosing a DSP preset), but not for evaluating commercially available speakers with a mono downmix at a reasonable level.
The differences would be too small to give good results.
 
OP

MatthewS

Member
Forum Donor
Joined
Jul 31, 2020
Messages
95
Likes
862
Location
Greater Seattle
Did you measure the speakers in the room? Just curious as to what that might look like. Very enjoyable to read about. Thanks for sharing.

I didn't--I briefly considered it, but ran out of time. If I set the test up again for other folks, I'll make sure I do.

I do have this graph of the Revels playing with my 2 subwoofers engaged. I used MSO to optimize their response; there isn't any EQ applied above 180 Hz. The room is only about 11x11, so it suffers from some pretty miserable room modes, but the EQ on the multiple subwoofers really works some magic. I've graphed the predicted in-room response on top of the measured response. At some point I'll write up another big post detailing the build and measurements.

Spinpredictvsmeasured.jpg
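A quick way to redraw that kind of comparison from exported data, assuming both curves are saved as two-column frequency/SPL text files (REW-style exports, with header lines prefixed by "*"; file names below are placeholders):

```python
# Sketch: overlay a predicted in-room response on a measured one.
# File names and the two-column freq/SPL format are assumptions; REW's
# text export looks like this, with header lines prefixed by "*".
import numpy as np
import matplotlib.pyplot as plt

pred = np.loadtxt("predicted_response.txt", comments="*")
meas = np.loadtxt("measured_response.txt", comments="*")

plt.semilogx(pred[:, 0], pred[:, 1], label="Predicted in-room")
plt.semilogx(meas[:, 0], meas[:, 1], label="Measured at listening position")
plt.xlabel("Frequency (Hz)")
plt.ylabel("SPL (dB)")
plt.legend()
plt.show()
```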
 

respice finem

Major Contributor
Joined
Feb 1, 2021
Messages
1,867
Likes
3,777
I think that's a good idea for a manufacturer's research center (e.g. when choosing a DSP preset), but not for evaluating commercially available speakers with a mono downmix at a reasonable level.
The differences would be too small to give good results.
I mean EQing for FR in the given room, and then finding out how much difference remains. This is something manufacturers could do in their facilities, comparing with competitors' products, and maybe they even do, but probably not many would want to publish the results (just speculation).
I think this would be interesting because a) most of us will be listening in "normal", untreated or partly treated rooms, b) to music and not test signals, and c) I guess many of us are using room EQ/DSP, or at least have the possibility to do it.
 
OP

MatthewS

Member
Forum Donor
Joined
Jul 31, 2020
Messages
95
Likes
862
Location
Greater Seattle
I mean EQing for FR in the given room, and then finding out how much difference remains.

Above the transition frequency, we wouldn't want to apply EQ based upon room measurements. We can EQ off the anechoic data, though, which we have for all of these speakers. It's on my to-do list to build an above-the-transition-frequency EQ from the spin data for the Revels and do a blind in-room comparison.

I tried applying EQ to the OSD speakers for giggles and it was a disaster. If you try to fix some of the issues you end up with some scary sounds coming out--it might be distortion, but it sounds more like a dying animal.

I've already mostly EQed the room modes, as best I can with only 2 subwoofers. You can see the results above.
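A sketch of what "EQ off the anechoic data" could look like; the file name, CSV layout, and 200 Hz transition point are assumptions, and only the idea (flat target above transition, boosts capped so you don't chase deep dips) comes from the thread:

```python
# Sketch: derive a correction curve from anechoic on-axis data above
# the transition frequency, rather than from room measurements.
# The file name, CSV layout, and 200 Hz transition are assumptions.
import numpy as np

freq, spl = np.loadtxt("revel_anechoic_on_axis.csv",
                       delimiter=",", unpack=True)

mask = freq >= 200.0                 # only correct above the transition
target = spl[mask].mean()            # flat target at the mean level
correction = target - spl[mask]      # positive = boost, negative = cut

# Cap boosts hard: trying to fill deep dips is what produces the
# "dying animal" sounds on a speaker that can't take the drive.
correction = np.clip(correction, -6.0, 3.0)
```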
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,559
Likes
1,703
Location
California
Pretty awesome experiment.
Unfortunately, it demonstrates that untrained/inexperienced listeners were unable to reliably differentiate their loudspeaker preferences. In other words, the predicted preference scores didn't really predict their blinded preferences; they were only predictive for a subset of your listener sample.
My takeaway was that predicted preference scores may not be terribly helpful for predicting the preferences of the average consumer. Ouch.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,559
Likes
1,703
Location
California
Average rating across all songs and participants:
Revel W553L: 6.6
KEF Q100: 6.2
JBL Control X: 5.4
OSD 650: 5.2


Plotted:
View attachment 147692

You can see that the Kef and Revel were preferred and that the JBL and OSD scored worse.

No, I don't see that. The medians are so close and there's so much overlap in your box-and-whisker plot that my initial interpretation was that your listeners were unable to reliably differentiate between the 4 speakers under blind conditions. You'd have to show statistics to say that any speaker was preferred.
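Since every listener rated all four speakers, a repeated-measures test is a natural first check; here is a hedged sketch using a Friedman test, with column names that are assumptions about the spreadsheet layout:

```python
# Sketch: Friedman test across the four speakers. Each listener rated
# every speaker, so treat listeners as repeated measures. Column names
# (Listener, Speaker, Rating) are assumptions about the sheet layout.
import pandas as pd
from scipy.stats import friedmanchisquare

df = pd.read_excel("BlindSpeakerTestResults.xlsx")

# One mean rating per listener per speaker, averaging over songs.
pivot = df.pivot_table(index="Listener", columns="Speaker", values="Rating")

stat, p = friedmanchisquare(*[pivot[c] for c in pivot.columns])
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")
```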
 
OP

MatthewS

Member
Forum Donor
Joined
Jul 31, 2020
Messages
95
Likes
862
Location
Greater Seattle
No, I don't see that. The medians are so close and there's so much overlap in your box-and-whisker plot that my initial interpretation was that your listeners were unable to reliably differentiate between the 4 speakers under blind conditions. You'd have to show statistics to say that any speaker was preferred.

I posted the raw data in post 22.

I think @amirm is going to run some additional analysis. Please weigh in with your own analysis as well.
 

sprellemannen

Active Member
Forum Donor
Joined
Jul 21, 2018
Messages
259
Likes
554
The "results" of the blind test, that is, the bar graphs, are of very little value: there is no statistical analysis, and the results may be due to random variation. I do not think there is a statistical justification for the test's "quasi-statistical" setup.
As a statistician, I advise Amir to remove this "test" from ASR's list of reviews.
 

MCH

Major Contributor
Joined
Apr 10, 2021
Messages
2,642
Likes
2,252
Very interesting indeed, but in principle I agree with preload and sprellemannen: if one wants to draw conclusions, better to carry out a statistical analysis. Otherwise, at first sight, that graph could mean there is no difference.
But in any case, very interesting, and surely a good way to start a debate about speakers from which many of us (at least I) will learn a thing or two! :D
 

DuncanTodd

Active Member
Joined
Nov 2, 2020
Messages
226
Likes
145
Well done!
I'm curious why Hunter was added. Is it a track that appears in other listening tests, or was it chosen for certain characteristics?
It's my own go-to track that I picked fairly at random years ago, as I do zero listening to typical audiophile tracks. For me, the LF and string parts are what make it an interesting test track.
 

thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,873
Likes
16,833
Here is a photo of what the setup looked like after we unblinded and presented results back to the participants:
Nice work and effort, so first of all, sincerely thank you for that. On the other hand, what I have to criticise is the placement of the loudspeakers on the inner part of the table: please place them at least at its front edge next time (even better would be on separate stands), as the large, close, reflective horizontal surface will muddle up their FR and imaging.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,595
Likes
239,612
Location
Seattle Area
I think @amirm is going to run some additional analysis. Please weigh in with your own analysis as well.
I ran a quick F-test between a couple of the samples. First was the JBL Control X against the KEF Q100: that difference is statistically significant, with a p-value of 0.001.

The same test for the KEF against the Revel gives a p-value of 0.1, so it fails the typical p = 0.05 (95% confidence) threshold.
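One way to reproduce a comparison of this kind (not necessarily the exact F test run above) is a one-way ANOVA on the pooled per-song ratings; the speaker labels and column names below are assumptions:

```python
# Sketch: pairwise comparison of two speakers' pooled ratings with a
# one-way ANOVA, which reports an F statistic and p-value. This may
# not be the exact test used above; names are assumptions.
import pandas as pd
from scipy.stats import f_oneway

df = pd.read_excel("BlindSpeakerTestResults.xlsx")

jbl = df.loc[df["Speaker"] == "JBL Control X", "Rating"]
kef = df.loc[df["Speaker"] == "KEF Q100", "Rating"]

f_stat, p = f_oneway(jbl, kef)
print(f"F = {f_stat:.2f}, p = {p:.4f}")
```

Note that pooling songs treats each rating as independent; a repeated-measures approach, as sketched earlier in the thread, is stricter.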
 

FeddyLost

Addicted to Fun and Learning
Joined
May 24, 2020
Messages
752
Likes
543
I mean EQing for FR in the given room, and then finding out how much difference remains.
Then we would have to decide exactly how and what gets EQ'd, and that just adds a lot of variables. Automatic DRC will measure and apply different filters with unpredictable results.
For example, it can boost the LF of sealed speakers; at low SPL this makes them more preferable than they really are, but at some higher SPL they will bottom out.
Manual EQ requires a good understanding of what is going on in the room, and IMO you need to have the speakers at the same point in the room.
So, briefly: I think this would take a lot of effort and move us far away from the speakers as the main subject of investigation.

I guess many of us are using room EQ/DSP, or at least have the possibility to do it.
Not many, really, as far as I can see.
Auto DRC in an AVR, most probably. The conscientious customer is a rare species now.

Unfortunately, it demonstrates that untrained/inexperienced listeners were unable to reliably differentiate their loudspeaker preferences
For sure. They have no hard reference (live sound), no soft reference (good studio sound), and no notion of what "good mastering" means. Speakers are artistic tools for them.
I think a good idea is to use recordings of familiar voices; otherwise newbies will be unable to judge "neutrality".
 

maty

Major Contributor
Joined
Dec 12, 2017
Messages
4,596
Likes
3,167
Location
Tarragona (Spain)
The KEF Q100 sounds better with the bass-reflex port closed. Some mods help too, but the first is easy, as the foam plugs come with the loudspeakers.

If you do not have a subwoofer, you can compensate for the loss of bass with a little equalization.

IMG_2589.png
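For illustration, a minimal sketch of that "little equalization" as a low-shelf boost using the RBJ audio-EQ-cookbook biquad; the corner frequency, gain, and slope are illustrative guesses, not tuned values:

```python
# Sketch: low-shelf boost to offset bass lost when the port is plugged.
# Uses the RBJ audio-EQ-cookbook low-shelf biquad. The 80 Hz corner
# and +4 dB gain are illustrative guesses, not tuned values.
import numpy as np
from scipy.signal import lfilter

def low_shelf(fs, f0, gain_db, S=1.0):
    """Return (b, a) biquad coefficients for an RBJ low shelf."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / 2 * np.sqrt((A + 1 / A) * (1 / S - 1) + 2)
    cosw, sqA = np.cos(w0), np.sqrt(A)
    b = np.array([A * ((A + 1) - (A - 1) * cosw + 2 * sqA * alpha),
                  2 * A * ((A - 1) - (A + 1) * cosw),
                  A * ((A + 1) - (A - 1) * cosw - 2 * sqA * alpha)])
    a = np.array([(A + 1) + (A - 1) * cosw + 2 * sqA * alpha,
                  -2 * ((A - 1) + (A + 1) * cosw),
                  (A + 1) + (A - 1) * cosw - 2 * sqA * alpha])
    return b / a[0], a / a[0]

b, a = low_shelf(fs=48000, f0=80.0, gain_db=4.0)
# y = lfilter(b, a, x) applies the shelf to an audio signal x.
```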
 

PeteL

Major Contributor
Joined
Jun 1, 2020
Messages
3,303
Likes
3,846
I'm curious as to whether A weighting wouldn't be a better choice. After all, if Speaker A had a bit more bass than Speaker B, using C weighting means the ear's sensitive 1-5 kHz range will probably be quieter for Speaker A during the listening tests. Wouldn't that (in general, if not every single time) lead to a preference for Speaker B, by dint of being set to play louder in the ear's sensitive band?

Applying my theory to the in-room responses you show, I would predict a preference order of KEF first, then Revel (close), then JBL, then OSD (with its suppressed 1-3 kHz).

That holds pretty close to your listening test result. Which means the preference order might have been due to the use of C weighting instead of A weighting for the level matching.

Interesting?
In a discussion that I can't find anymore, Amir quoted some excerpts from an Olive study recommending C weighting for level matching, but I don't know the theory behind it.
Edit: sorry, B weighting was mentioned. It's a bit more unusual; not many meters have it.
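For reference, the standard IEC 61672 A- and C-weighting curves can be evaluated directly; this sketch just shows how differently the two count bass toward a level match:

```python
# Sketch: IEC 61672 A- and C-weighting in dB. C weighting keeps bass
# energy in the level match that A weighting largely discards, which
# is the crux of the objection above.
import numpy as np

def a_weight_db(f):
    f = np.asarray(f, dtype=float)
    ra = (12194.0**2 * f**4) / ((f**2 + 20.6**2)
         * np.sqrt((f**2 + 107.7**2) * (f**2 + 737.9**2))
         * (f**2 + 12194.0**2))
    return 20 * np.log10(ra) + 2.00

def c_weight_db(f):
    f = np.asarray(f, dtype=float)
    rc = (12194.0**2 * f**2) / ((f**2 + 20.6**2) * (f**2 + 12194.0**2))
    return 20 * np.log10(rc) + 0.06

for f in (50, 100, 1000, 3000):
    print(f"{f} Hz: A = {a_weight_db(f):+5.1f} dB, C = {c_weight_db(f):+5.1f} dB")
```

At 50 Hz, A weighting attenuates by roughly 30 dB while C weighting attenuates by about 1 dB, so a bass-heavy speaker measures "louder" on a C-weighted meter and ends up turned down relative to a leaner one.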
 
Last edited:

Ellebob

Senior Member
Forum Donor
Joined
Nov 21, 2020
Messages
368
Likes
573
I would be curious to see rankings with a sub to reduce the difference of bass between the speakers.
 