
Four Speaker Blind Listening Test Results (KEF, JBL, Revel, OSD)

uwotm8

Senior Member
Joined
Jul 14, 2020
Messages
406
Likes
463
Thanks! Great job here.
However, I'd change some aspects of the test.
First of all, the music. Such tracks/genres generally sound good (or at least acceptable) on almost every speaker.
Test Tracks:
Fast Car – Tracy Chapman
Just a Little Lovin – Shelby Lynne
Tin Pan Alley – Stevie Ray Vaughan
Morph the Cat – Donald Fagen
Hunter – Björk

The test tracks were all selected from Harman's list of recommended tracks, except for Hunter.
Add some metal (or stoner) tracks and enjoy half of the speakers (if not all) failing in a variety of ways.

Second: one-speaker listening tests. No offence, but IMO it's just not right. You're totally missing the real spatial-image differences (produced by directivity, reflections, etc.; whatever the mechanism, they exist and you can't ignore them). At the same time, tonality is affected too (one speaker vs. two speakers in a real room).
 

Bear123

Addicted to Fun and Learning
Joined
Nov 27, 2019
Messages
796
Likes
1,370
Second: one-speaker listening tests. No offence, but IMO it's just not right.

Do you have any data to back up this opinion? There is a lot of data showing that mono listening accurately predicts listener preference under all conditions (mono and stereo), but reveals preferences more easily and accurately. Are you familiar with the research? Just saying, this topic has been covered by folks who spent their life's work researching subjective loudspeaker preference... it may be a bit brazen to say "nuh-uh" based simply on "I just don't think so".
 

sweetsounds

Active Member
Joined
Apr 24, 2019
Messages
141
Likes
283
Thanks so much for this test and its proper execution!
The choice of music is also right, although I agree with @uwotm8 that I'd like one hard rock piece included, even if it is AC/DC.

In my experience, with bookshelf speakers you will first of all hear the bass extension. Here the Revel, being an in-wall speaker, should have an advantage; it also was apparently mounted higher, which made it stand out from the crowd.

I was nevertheless shocked by the results. They suggest that only trained, musical individuals can tell a good speaker from a bad one; the OSD has shameful measurements.

My few disagreements with Toole, who is nevertheless the reference in objective assessment:
- All testers are treated as the same: people don't have similar tastes (see horns vs. electrostats). Preference derives from attention to different elements of the music (detail vs. punch, voice vs. drums), with many casual listeners preferring a bathtub-shaped frequency response for punch and detail. A listener will rate a reproduction favorably when they find their preferred element pronounced.

- The definition of good reproduction: the highest compliment I ever got was a guest in the kitchen telling me, "I just walked into your living room because I thought someone was playing piano there, but it was just the stereo." This realism is what I look for, which is not necessarily the best sound.

- The capability to identify good reproduction: our perception of how music should be reproduced is skewed by the omnipresence of sound from phones, headphones and car radios, which have defined the norms for how reproduced music should sound. Artefacts like compression and loudness become sought-after features.
A nice quote from one of the Audiophiliac interviews with a sound engineer: at concerts, musicians asked him "to make me sound on stage like I sound on the recording" :)
A prime example is vinyl records, which I rarely use and which sound odd to me with their noise floor; yet they are the reference for many.
As such, Harman's use of their own employees as judges is doubtful: aren't they listening for the house sound?
Is it OK to re-use the same pool of listeners over and over again in blind tests and still call the results representative?

- The use of mono only: above a certain speaker quality, I personally discern speakers better by their spatial and timing differences than by their spectral behavior. For many musicians it might be different. So for me stereo is the way to test speakers, and then the room and positioning become very important.
 

Semla

Active Member
Joined
Feb 8, 2021
Messages
170
Likes
328
It's actually much better than it looks at first sight. You must adjust for repeated measures across listeners, though.

A quick and dirty analysis shows that you can find statistically significant differences between the Revel and the JBL (p = 0.0068), the Revel and the OSD (p = 0.0008), and the KEF and the OSD (p = 0.028). The p-values are from an additive mixed model with speaker and song as covariates, Kenward-Roger degrees of freedom and Bonferroni post-hoc correction.
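For illustration, here is roughly how such a model could be set up in Python with statsmodels. This is only a sketch, not the exact analysis: statsmodels has no Kenward-Roger correction, so its default Wald tests stand in for it, and the column names are assumed.

```python
# Sketch: additive mixed model with a random intercept per listener.
# Assumes long-format data with columns: listener, speaker, song, score.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("ratings.csv")  # hypothetical file name

model = smf.mixedlm("score ~ C(speaker) + C(song)",  # fixed effects
                    data, groups=data["listener"])   # random intercept per listener
result = model.fit()
print(result.summary())

# For the pairwise speaker comparisons, multiply each p-value by the
# number of comparisons (6 pairs for 4 speakers) to apply Bonferroni.
```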
 

PeteL

Major Contributor
Joined
Jun 1, 2020
Messages
3,303
Likes
3,846
Do you have any data to back up this opinion? There is a lot of data showing that mono listening accurately predicts listener preference under all conditions (mono and stereo), but reveals preferences more easily and accurately. Are you familiar with the research? Just saying, this topic has been covered by folks who spent their life's work researching subjective loudspeaker preference... it may be a bit brazen to say "nuh-uh" based simply on "I just don't think so".
And this research is?
 

magicscreen

Senior Member
Joined
May 21, 2019
Messages
300
Likes
177
So was this a failed blind test?
I see nobody was able to reach a 10/10 result.
So do all speakers sound the same?
 

PeteL

Major Contributor
Joined
Jun 1, 2020
Messages
3,303
Likes
3,846
So was this a failed blind test?
I see nobody was able to reach a 10/10 result.
So do all speakers sound the same?
The answer is no, but did you read the article? The goal was to give each speaker a score from 1 to 10. Why would a 10/10 result mean anything?
 

sweetsounds

Active Member
Joined
Apr 24, 2019
Messages
141
Likes
283
Some analysis of the data. Here are the max, min and average scores for each speaker per listener:
[Chart: max, min and average score per speaker, for each listener]


Score by speaker and song, excluding the wrong-rating listeners:
[Chart: score by speaker and song]


Quite a spread, but KEF and Revel are preferred, and a per-listener preference pattern is visible.
 

uwotm8

Senior Member
Joined
Jul 14, 2020
Messages
406
Likes
463
Do you have any data to back up this opinion
My personal listening experience with one and two speakers. They sound different for both mono and stereo recordings.
How can I evaluate the virtual scene/spatial qualities with one speaker, a single point source (not really, but OK)?
More importantly, in a typical real environment the tonality can and will change too with two speakers.
Are you familiar with the research?
Unfortunately yes, and I find it rather tailored to justify their vision. Same as the Harman target for headphones: it made things better on 1 of 4 headphones I tried it with (the peaky AKG K701); the others turned dull, bass-heavy and boring. That said, thanks Harman, a Samsung company, but I don't like what you're selling me :)

To be clear: I'm neutral about the Harman research, but not about zealots who promote it as the Holy Grail and The Only Way of True Sound :p
Btw, as commercial sound for the masses, I find the Harman preference target just perfect.
 

FeddyLost

Addicted to Fun and Learning
Joined
May 24, 2020
Messages
752
Likes
543
Add some metal (or stoner) tracks and enjoy half of the speakers (if not all) failing in a variety of ways
For a proper evaluation of any metal you'll need 1) good recordings and 2) listeners familiar with the genre.
Both are really troublesome.
To separate recording effects from speaker distortion, some symphonic metal with a live orchestra, choir and decent DR might be preferable.
 

Audioagnostic

Member
Joined
Dec 1, 2018
Messages
88
Likes
115
This is very interesting, but there are IMHO some methodological concerns.

There is an incredible spread in scores between the different reviewers. Some reviewers consistently rate lower than others. This could be corrected by normalizing the scores in some way, I guess. However, the different "calibration" of the listeners is a problem that makes interpretation difficult.
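Something as simple as z-scoring each listener's ratings would be one way to do that normalization. A minimal sketch in Python/pandas, assuming a long-format table with listener and score columns (the file name is hypothetical):

```python
import pandas as pd

data = pd.read_csv("ratings.csv")  # hypothetical file name

# Put a harsh grader and a generous grader on the same scale:
# mean 0, standard deviation 1 per listener.
data["score_z"] = data.groupby("listener")["score"].transform(
    lambda s: (s - s.mean()) / s.std()
)
```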

If you are doing a follow-up experiment, I suggest a triangle test: play the same track once through one speaker and twice through another, and have the listeners decide which is the odd one out. If you take speakers with comparable Klippel preference scores, this would allow you to answer the question of whether these scores are indeed predictive of preference.
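Scoring such a test is simple: a pure guess picks the odd one out with probability 1/3, so a one-sided binomial test tells you whether the panel beat chance. A sketch in Python (the trial counts are made up):

```python
from scipy.stats import binomtest

# Triangle test: the chance of naming the odd speaker by guessing is 1/3.
n_trials = 30   # hypothetical number of trials
n_correct = 14  # hypothetical number of correct picks
result = binomtest(n_correct, n_trials, p=1/3, alternative="greater")
print(f"p = {result.pvalue:.3f}")  # small p => the speakers are distinguishable
```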
 

dualazmak

Major Contributor
Forum Donor
Joined
Feb 29, 2020
Messages
2,850
Likes
3,045
Location
Ichihara City, Chiba Prefecture, Japan
Full orchestra in ppp to fff, some tracks with soprano solo and nice choir; how about this CD?

Schubert "Rosamunde (Complete)", Kurt Masur (conductor), Gewandhausorchester Leipzig, with Elly Ameling (sopraso) and Rundfunkechor Leipzig (Leipzig Radio Chorus), Philips ASIN: B00000E2SS;

I always use this CD as a reference for full-orchestra sound with soprano solo and chorus.

This album was recorded in December 1983 with really amazing recording quality. The first track, "Overture", is pp to fff full-orchestra sound with very nice 3D perspective; I feel as if I am sitting in the best seat in the Neues Gewandhaus Leipzig. Within the comfortable, silky yet vivid and gentle orchestral sound, we can identify individual violinists in the full orchestra. A well-tuned audio system is rewarded with really excellent sound from this amazing recording.

Elly Ameling sings only on track 5, "Romance", for just 3 min 47 sec, and her poise and beloved voice come through at their best...

The famous ppp orchestral piece on track 7, "Entr'acte to Scene 3", is always a great challenge for an audio system: orchestral strings reproduced with a good S/N and without distortion.

Track-9 "Chorus of Shepherds" is another challenge to audio system for the balance of 4-part chorus and orchestra in the nice acoustic hall. In the middle of the track, each of the solo singer from soprano, alto, tenor and bass sing in the center-back of the stage, and we need excellent 3D perspectives and sound resolution for very much impressive listening experience.
 

Semla

Active Member
Joined
Feb 8, 2021
Messages
170
Likes
328
For a proper evaluation of any metal you'll need 1) good recordings and 2) listeners familiar with the genre.
Both are really troublesome.
To separate recording effects from speaker distortion, some symphonic metal with a live orchestra, choir and decent DR might be preferable.

I checked it for this particular experiment.

There is no difference between songs, and none of the speakers is particularly better for any specific song (i.e., neither the main effect nor the interaction term for song was significant). You could say that, at least for this experiment, there is no genre effect.
 

Semla

Active Member
Joined
Feb 8, 2021
Messages
170
Likes
328
There is an incredible spread in scores between the different reviewers. Some reviewers consistently rate lower than others. This could be corrected by normalizing the scores in some way, I guess. However, the different "calibration" of the listeners is a problem that makes interpretation difficult.

That's why you need to adjust for repeated measures (technically speaking, include listener as a random intercept in the statistical model). There are differences between listeners, but each listener is quite consistent in their own ratings.
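That consistency can even be quantified: from a fitted random-intercept model, the intraclass correlation tells you how much of the variance sits between listeners rather than within them. A sketch, assuming the `result` object from the statsmodels snippet earlier in the thread:

```python
# Variance components from a fitted statsmodels MixedLM result.
var_listener = result.cov_re.iloc[0, 0]  # between-listener (random intercept) variance
var_residual = result.scale              # within-listener (residual) variance

# A high ICC means listeners differ in overall level but are consistent
# around their own mean, which is exactly the repeated-measures point.
icc = var_listener / (var_listener + var_residual)
print(f"ICC = {icc:.2f}")
```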
 

uwotm8

Senior Member
Joined
Jul 14, 2020
Messages
406
Likes
463
For a proper evaluation of any metal you'll need 1) good recordings and 2) listeners familiar with the genre
Not really. If there's some MF-HF coloration you'll notice it immediately, and it can turn out simply unacceptable, while you can still enjoy non-heavy genres on those speakers. The closer we get to pink noise in terms of spectral saturation, the harder it gets :)
 

magicscreen

Senior Member
Joined
May 21, 2019
Messages
300
Likes
177
The answer is no, but did you read the article? The goal was to give each speaker a score from 1 to 10. Why would a 10/10 result mean anything?
Thanks. Now I understand everything.
So you only need a 10/10 result when comparing cheap basic cables with expensive snake-oil cables.
 

beaRA

Active Member
Joined
Apr 16, 2021
Messages
223
Likes
315
The answer is no, but did you read the article? The goal was to give each speaker a score from 1 to 10. Why would a 10/10 result mean anything?
Don't feed the troll who doesn't understand this isn't an ABX test. They are not asking these questions in good faith.
 

PeteL

Major Contributor
Joined
Jun 1, 2020
Messages
3,303
Likes
3,846
Thanks. Now I understand everything.
So you only need a 10/10 result when comparing cheap basic cables with expensive snake-oil cables.
I am no statistician, but that's not correct either; the goal is not the same. Here we wanted to rank the speakers from least preferred to most preferred. In the other case you are trying to see whether they are distinguishable at all. For that you need a score significantly higher than pure luck, according to mathematical probability, but not 10 out of 10.
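To make "significantly higher than pure luck" concrete: with 50% guessing odds per trial, a small calculation shows 9 correct out of 10 already gives a one-sided p below 0.05, so 10/10 is not required. A sketch in Python:

```python
from scipy.stats import binom

# ABX: each trial is a 50/50 guess. Find the smallest number of correct
# answers out of 10 trials that beats chance at the 0.05 level.
n = 10
for k in range(n + 1):
    p = binom.sf(k - 1, n, 0.5)  # P(X >= k) under pure guessing
    if p < 0.05:
        print(f"{k}/{n} correct, p = {p:.4f}")  # prints: 9/10 correct, p = 0.0107
        break
```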
 

ROOSKIE

Major Contributor
Joined
Feb 27, 2020
Messages
1,935
Likes
3,520
Location
Minneapolis
I'd suggest building some amount of trickery into the testing.
Sometimes play the same speaker more than once; sometimes adjust the same speaker's location and replay it.
I'd wager something really interesting would happen.
There is no reason every listener needs to hear each speaker.
Also, the aforementioned ABX testing could be really fun: can the listener determine which speaker is "X" (X being a second playing of either A or B)?
That would let you create a much simpler version of this test, and perhaps a more accurate one in terms of listeners' ability to hear variation.
As for level matching: when I do testing, I level-match the midrange using pink noise generated by REW, band-limited to 500 Hz to 2500 Hz. I record the unweighted SPL and double-check by eye that the responses overlay on an MMM RTA measurement.
I find this works well.
Obviously, in the end, level matching is nearly impossible to get exactly right with all the variables in play, and choosing how to do it is a judgment call, not an exact science. I doubt anyone has a method that doesn't have some issues (as far as speaker testing goes).
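If anyone wants to generate that kind of band-limited pink noise outside REW, here is a rough sketch in Python (numpy/scipy) using the same 500 Hz to 2500 Hz band; the sample rate and length are arbitrary choices:

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48000   # sample rate in Hz (arbitrary)
n = fs * 10  # 10 seconds of noise

# Pink noise by spectral shaping: scale a white spectrum by 1/sqrt(f).
white = np.random.default_rng().standard_normal(n)
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(n, d=1 / fs)
spectrum[1:] /= np.sqrt(freqs[1:])  # skip the DC bin
pink = np.fft.irfft(spectrum, n)

# Band-limit to the 500 Hz - 2500 Hz midrange used for level matching.
sos = butter(4, [500, 2500], btype="bandpass", fs=fs, output="sos")
signal = sosfilt(sos, pink)
signal /= np.abs(signal).max()  # normalize to full scale
```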
 