
Four Speaker Blind Listening Test Results (KEF, JBL, Revel, OSD)

uwotm8

Senior Member
Joined
Jul 14, 2020
Messages
406
Likes
463
Thanks! Great job here.
However, I'd change some aspects of the test.
First of all, the music. Such tracks/genres generally sound good (or at least acceptable) on almost every speaker.
Test Tracks:
Fast Car – Tracy Chapman
Just a Little Lovin – Shelby Lynne
Tin Pan Alley – Stevie Ray Vaughan
Morph the Cat – Donald Fagen
Hunter – Björk

The test tracks were all selected from Harman's list of recommended tracks, except for Hunter.
Add some metal (or stoner) tracks and enjoy half of the speakers (if not all) failing in a variety of ways.

Second: one-speaker listening tests. No offence, but IMO it's just not right. You're totally missing the real spatial-image differences (produced by directivity, reflections, etc.; whatever the mechanism, they exist and you can't ignore them). At the same time, tonality is affected too (one speaker vs. two speakers in a real room).
 

Bear123

Addicted to Fun and Learning
Joined
Nov 27, 2019
Messages
796
Likes
1,370
Second: one-speaker listening tests. No offence, but IMO it's just not right.

Do you have any data to back up this opinion? There is a lot of data showing that mono listening accurately predicts listener preference under all conditions (mono and stereo), but reveals preferences more easily and accurately. Are you familiar with the research? Just saying, this topic has been covered by folks who spent their life's work researching subjective loudspeaker preference... it may be a bit brazen to say "nuh-uh" based simply on "I just don't think so".
 

sweetsounds

Active Member
Joined
Apr 24, 2019
Messages
141
Likes
283
Thanks so much for this test and its proper execution!
The choice of music is also right, although I agree with @uwotm8 that I'd like one hard rock piece included, even if it is AC/DC.

In my experience, with bookshelf speakers you will first of all hear the bass extension. Here the Revel, being an in-wall speaker, should have an advantage; it also was apparently mounted higher, which made it stand out from the crowd.

I was nevertheless shocked by the results. They suggest that only trained, musical individuals can tell a good speaker from a bad one; the OSD has shameful measurements.

My few disagreements with Toole, who is nevertheless the reference in objective assessment:
- All testers are treated as the same: people don't have similar tastes (see horns vs. electrostats). Preference derives from attention to different elements of the music (detail vs. punch, voice vs. drums), with many casual listeners preferring a bathtub-shaped frequency response for punch and detail. A listener will rate a reproduction favorably when they find their preferred element pronounced.

- The definition of good reproduction: the highest compliment I ever got was a guest in the kitchen telling me, "I just walked into your living room because I thought someone was playing piano there, but it was just the stereo." This realism is what I look for, which is not necessarily the best sound.

- The capability to identify good reproduction: our perception of how music should be reproduced is skewed by the omnipresence of sound from phones, headphones and car radios, which have defined the norms for how reproduced music should sound. Artefacts like compression and loudness become sought-after features.
A nice quote from one of the Audiophiliac interviews with a sound engineer: at concerts, musicians asked him "to make me sound on stage like I sound on the recording" :)
A prime example is vinyl records, which I rarely use and which sound odd to me with their noise floor; yet they are the reference for many.
As such, Harman's use of their own employees as judges is doubtful: aren't they listening for the house sound?
Is it OK to re-use the same pool of listeners over and over again in blind tests and still call the results representative?

- The use of mono only: above a certain speaker quality, I personally discern speakers better by their spatial and timing differences than by their spectral behavior. For many musicians it might be different. So for me stereo is the way to test speakers, and then the room and positioning become very important.
 

Semla

Active Member
Joined
Feb 8, 2021
Messages
170
Likes
328
It's actually much better than it looks at first sight. You must adjust for repeated measures across listeners, though.

A quick and dirty analysis shows that you can find statistically significant differences between the Revel and the JBL (p = 0.0068), the Revel and the OSD (p = 0.0008), and the KEF and the OSD (p = 0.028). The p-values are from an additive mixed model with speaker and song as covariates, Kenward-Roger degrees of freedom and Bonferroni post-hoc correction.
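For illustration, here is roughly how such a model could be set up in Python with statsmodels. This is only a sketch, not the exact analysis: statsmodels has no Kenward-Roger correction, so its default Wald tests stand in for it, and the column names are assumed.

```python
# Sketch: additive mixed model with a random intercept per listener.
# Assumes long-format data with columns: listener, speaker, song, score.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("ratings.csv")  # hypothetical file name

model = smf.mixedlm("score ~ C(speaker) + C(song)",  # fixed effects
                    data, groups=data["listener"])   # random intercept per listener
result = model.fit()
print(result.summary())

# For the pairwise speaker comparisons, multiply each p-value by the
# number of comparisons (6 pairs for 4 speakers) to apply Bonferroni.
```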
 

PeteL

Major Contributor
Joined
Jun 1, 2020
Messages
3,303
Likes
3,846
Do you have any data to back up this opinion? There is a lot of data showing that mono listening accurately predicts listener preference under all conditions (mono and stereo), but reveals preferences more easily and accurately. Are you familiar with the research? Just saying, this topic has been covered by folks who spent their life's work researching subjective loudspeaker preference... it may be a bit brazen to say "nuh-uh" based simply on "I just don't think so".
And this research is?
 

magicscreen

Senior Member
Joined
May 21, 2019
Messages
300
Likes
177
So was this a failed blind test?
I see nobody was able to reach a 10/10 result.
So do all speakers sound the same?
 

PeteL

Major Contributor
Joined
Jun 1, 2020
Messages
3,303
Likes
3,846
So was this a failed blind test?
I see nobody was able to reach a 10/10 result.
So do all speakers sound the same?
The answer is no, but did you read the article? The goal was to give each speaker a score from 1 to 10. Why would a 10/10 result mean anything?
 

sweetsounds

Active Member
Joined
Apr 24, 2019
Messages
141
Likes
283
Some analysis of the data. Here are the max, min and average scores for each speaker per listener:
[Chart: max, min and average score per speaker, for each listener]


Score by speaker and song, excluding the wrong-rating listeners:
[Chart: score by speaker and song]


Quite a spread, but KEF and Revel are preferred, and a per-listener preference pattern is visible.
 

uwotm8

Senior Member
Joined
Jul 14, 2020
Messages
406
Likes
463
Do you have any data to back up this opinion
My personal listening experience with one and two speakers. They sound different for both mono and stereo recordings.
How can I evaluate the virtual scene/spatial qualities with one speaker, a single point source (not really, but OK)?
More importantly, in a typical real environment the tonality can and will change too with two speakers.
Are you familiar with the research?
Unfortunately yes, and I find it rather tailored to justify their vision. Same as the Harman target for headphones: it made things better on 1 of 4 headphones I tried it with (the peaky AKG K701); the others turned dull, bass-heavy and boring. That said, thanks Harman, a Samsung company, but I don't like what you're selling me :)

To be clear: I'm neutral about the Harman research, but not about zealots who promote it as the Holy Grail and The Only Way of True Sound :p
Btw, as commercial sound for the masses, I find the Harman preference target just perfect.
 

FeddyLost

Addicted to Fun and Learning
Joined
May 24, 2020
Messages
752
Likes
543
Add some metal (or stoner) tracks and enjoy half of the speakers (if not all) failing in a variety of ways
For a proper evaluation of any metal you'll need 1) good recordings and 2) listeners familiar with the genre.
Both are really troublesome.
To separate recording effects from speaker distortion, some symphonic metal with a live orchestra, choir and decent DR might be preferable.
 

Audioagnostic

Member
Joined
Dec 1, 2018
Messages
88
Likes
115
This is very interesting, but there are IMHO some methodological concerns.

There is an incredible spread in scores between the different reviewers. Some reviewers consistently rate lower than others. This could be corrected by normalizing the scores in some way, I guess. However, the different "calibration" of the listeners is a problem that makes interpretation difficult.
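Something as simple as z-scoring each listener's ratings would be one way to do that normalization. A minimal sketch in Python/pandas, assuming a long-format table with listener and score columns (the file name is hypothetical):

```python
import pandas as pd

data = pd.read_csv("ratings.csv")  # hypothetical file name

# Put a harsh grader and a generous grader on the same scale:
# mean 0, standard deviation 1 per listener.
data["score_z"] = data.groupby("listener")["score"].transform(
    lambda s: (s - s.mean()) / s.std()
)
```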

If you are doing a follow-up experiment, I suggest a triangle test: play the same track once through one speaker and twice through another, and have the listeners decide which is the odd one out. If you take speakers with comparable Klippel preference scores, this would allow you to answer the question of whether these scores are indeed predictive of preference.
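Scoring such a test is simple: a pure guess picks the odd one out with probability 1/3, so a one-sided binomial test tells you whether the panel beat chance. A sketch in Python (the trial counts are made up):

```python
from scipy.stats import binomtest

# Triangle test: the chance of naming the odd speaker by guessing is 1/3.
n_trials = 30   # hypothetical number of trials
n_correct = 14  # hypothetical number of correct picks
result = binomtest(n_correct, n_trials, p=1/3, alternative="greater")
print(f"p = {result.pvalue:.3f}")  # small p => the speakers are distinguishable
```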
 

dualazmak

Major Contributor
Forum Donor
Joined
Feb 29, 2020
Messages
2,850
Likes
3,045
Location
Ichihara City, Chiba Prefecture, Japan
Full orchestra in ppp to fff, some tracks with soprano solo and nice choir; how about this CD?

Schubert "Rosamunde (Complete)", Kurt Masur (conductor), Gewandhausorchester Leipzig, with Elly Ameling (sopraso) and Rundfunkechor Leipzig (Leipzig Radio Chorus), Philips ASIN: B00000E2SS;

I always use this CD as a reference for full-orchestra sound with soprano solo and chorus.

This album was recorded in December 1983 with really amazing recording quality. The first track, "Overture", is pp to fff full-orchestra sound with very nice 3D perspective; I feel as if I am sitting in the best seat in the Neues Gewandhaus Leipzig. Within the comfortable, silky yet vivid and gentle orchestral sound, we can identify individual violinists in the full orchestra. A well-tuned audio system is rewarded with really excellent sound from this amazing recording.

Elly Ameling sings only on track 5, "Romance", for just 3 min 47 sec, and her poise and beloved voice come through at their best...

The famous ppp orchestral piece on track 7, "Entr'acte to Scene 3", is always a great challenge for an audio system: orchestral strings reproduced with a good S/N and without distortion.

Track-9 "Chorus of Shepherds" is another challenge to audio system for the balance of 4-part chorus and orchestra in the nice acoustic hall. In the middle of the track, each of the solo singer from soprano, alto, tenor and bass sing in the center-back of the stage, and we need excellent 3D perspectives and sound resolution for very much impressive listening experience.
 

Semla

Active Member
Joined
Feb 8, 2021
Messages
170
Likes
328
For a proper evaluation of any metal you'll need 1) good recordings and 2) listeners familiar with the genre.
Both are really troublesome.
To separate recording effects from speaker distortion, some symphonic metal with a live orchestra, choir and decent DR might be preferable.

I checked it for this particular experiment.

There is no difference between songs, and none of the speakers is particularly better for any specific song (i.e., neither the main effect nor the interaction term for song was significant). You could say that, at least for this experiment, there is no genre effect.
 

Semla

Active Member
Joined
Feb 8, 2021
Messages
170
Likes
328
There is an incredible spread in scores between the different reviewers. Some reviewers consistently rate lower than others. This could be corrected by normalizing the scores in some way, I guess. However, the different "calibration" of the listeners is a problem that makes interpretation difficult.

That's why you need to adjust for repeated measures (technically speaking, include listener as a random intercept in the statistical model). There are differences between listeners, but each listener is quite consistent in their own ratings.
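That consistency can even be quantified: from a fitted random-intercept model, the intraclass correlation tells you how much of the variance sits between listeners rather than within them. A sketch, assuming the `result` object from the statsmodels snippet earlier in the thread:

```python
# Variance components from a fitted statsmodels MixedLM result.
var_listener = result.cov_re.iloc[0, 0]  # between-listener (random intercept) variance
var_residual = result.scale              # within-listener (residual) variance

# A high ICC means listeners differ in overall level but are consistent
# around their own mean, which is exactly the repeated-measures point.
icc = var_listener / (var_listener + var_residual)
print(f"ICC = {icc:.2f}")
```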
 

uwotm8

Senior Member
Joined
Jul 14, 2020
Messages
406
Likes
463
For a proper evaluation of any metal you'll need 1) good recordings and 2) listeners familiar with the genre
Not really. If there's some MF-HF coloration you'll notice it immediately, and it can turn out simply unacceptable, while you can still enjoy non-heavy genres on those speakers. The closer we get to pink noise in terms of spectral saturation, the harder it gets :)
 

magicscreen

Senior Member
Joined
May 21, 2019
Messages
300
Likes
177
The answer is no, but did you read the article? The goal was to give each speaker a score from 1 to 10. Why would a 10/10 result mean anything?
Thanks. Now I understand everything.
So you only need a 10/10 result when comparing cheap basic cables with expensive snake-oil cables.
 

beaRA

Active Member
Joined
Apr 16, 2021
Messages
223
Likes
315
The answer is no, but did you read the article? The goal was to give each speaker a score from 1 to 10. Why would a 10/10 result mean anything?
Don't feed the troll who doesn't understand this isn't an ABX test. They are not asking these questions in good faith.
 

PeteL

Major Contributor
Joined
Jun 1, 2020
Messages
3,303
Likes
3,846
Thanks. Now I understand everything.
So you only need a 10/10 result when comparing cheap basic cables with expensive snake-oil cables.
I am no statistician, but that's not correct either; the goal is not the same. Here we wanted to rank the speakers from least preferred to most preferred. In the other case you are trying to see whether they are distinguishable at all. For that you need a score significantly higher than pure luck, according to mathematical probability, but not 10 out of 10.
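To make "significantly higher than pure luck" concrete: with 50% guessing odds per trial, a small calculation shows 9 correct out of 10 already gives a one-sided p below 0.05, so 10/10 is not required. A sketch in Python:

```python
from scipy.stats import binom

# ABX: each trial is a 50/50 guess. Find the smallest number of correct
# answers out of 10 trials that beats chance at the 0.05 level.
n = 10
for k in range(n + 1):
    p = binom.sf(k - 1, n, 0.5)  # P(X >= k) under pure guessing
    if p < 0.05:
        print(f"{k}/{n} correct, p = {p:.4f}")  # prints: 9/10 correct, p = 0.0107
        break
```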
 

ROOSKIE

Major Contributor
Joined
Feb 27, 2020
Messages
1,935
Likes
3,520
Location
Minneapolis
I'd suggest building some amount of trickery into the testing.
Sometimes play the same speaker more than once; sometimes adjust the same speaker's location and replay it.
I'd wager something really interesting would happen.
There is no reason every listener needs to hear each speaker.
Also, the aforementioned ABX testing could be really fun: can the listener determine which speaker is "X" (X being a second playing of either A or B)?
That would let you create a much simpler version of this test, and perhaps a more accurate one in terms of listeners' ability to hear variation.
As for level matching: when I do testing, I level-match the midrange using pink noise generated by REW, band-limited to 500 Hz to 2500 Hz. I record the unweighted SPL and double-check by eye that the responses overlay on an MMM RTA measurement.
I find this works well.
Obviously, in the end, level matching is nearly impossible to get exactly right with all the variables in play, and choosing how to do it is a judgment call, not an exact science. I doubt anyone has a method that doesn't have some issues (as far as speaker testing goes).
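If anyone wants to generate that kind of band-limited pink noise outside REW, here is a rough sketch in Python (numpy/scipy) using the same 500 Hz to 2500 Hz band; the sample rate and length are arbitrary choices:

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48000   # sample rate in Hz (arbitrary)
n = fs * 10  # 10 seconds of noise

# Pink noise by spectral shaping: scale a white spectrum by 1/sqrt(f).
white = np.random.default_rng().standard_normal(n)
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(n, d=1 / fs)
spectrum[1:] /= np.sqrt(freqs[1:])  # skip the DC bin
pink = np.fft.irfft(spectrum, n)

# Band-limit to the 500 Hz - 2500 Hz midrange used for level matching.
sos = butter(4, [500, 2500], btype="bandpass", fs=fs, output="sos")
signal = sosfilt(sos, pink)
signal /= np.abs(signal).max()  # normalize to full scale
```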
 