
Blind Listening Test 2: Neumann KH 80 vs JBL 305p MkII vs Edifier R1280T vs RCF Arya Pro5

vole-boy

Member
Forum Donor
Joined
Feb 22, 2020
Messages
23
Likes
67
Lovely to see an attempt at a decent experimental setup. Just a note on the statistics you used - you used ANOVA and I'm not sure that's 100% correct - it'll approximate the correct result but wouldn't stand up to peer review (at least not in my discipline). That's because ANOVAs are parametric tests that make assumptions about the underlying data that aren't actually supported when you use a Likert-type preference scale. It appears that the speakers were rated on a 1-10 scale, so the units are ordinal (they are ordered, but the sizes of the differences between the numbers might vary - for example, 2 is larger than 1 and 3 is larger than 2, BUT 2 is not necessarily twice the size of 1, and 3 is not necessarily three times larger than 1, etc.). The correct statistic in this case is an ordinal logistic regression (you can use the package ordinal by Christensen in R for this: https://cran.r-project.org/web/packages/ordinal/ordinal.pdf). These can be performed as repeated-measures analyses if needed. I work a LOT with survey data in my day job, and this is the correct statistical method for data derived from preference scores. I'll stop being a smart a*se now - thanks again for all the reviewing. Much appreciated!
 

vole-boy

Member
Forum Donor
Joined
Feb 22, 2020
Messages
23
Likes
67
I'm happy to supply some code, or do a bit of analysis if helpful!
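In the same spirit as vole-boy's suggestion (his recommendation is R's `ordinal` package; nothing below is from that package), a lighter-weight rank-based option is the Friedman test, the nonparametric analogue of a repeated-measures ANOVA: it uses only each listener's within-listener ranking of the speakers, so the spacing of the 1-10 scale never matters. A minimal stdlib-only sketch with made-up ratings (the thread's raw data isn't posted):

```python
def friedman_statistic(ratings):
    """Friedman chi-square for repeated measures on ordinal data.

    ratings[i][j] = listener i's score for speaker j.
    Scores are converted to ranks within each listener (average ranks
    for ties), so only the ordering matters, not the spacing.
    Compare the result against a chi-square with k-1 degrees of freedom.
    """
    n = len(ratings)      # listeners
    k = len(ratings[0])   # speakers
    rank_sums = [0.0] * k
    for row in ratings:
        order = sorted(range(k), key=lambda j: row[j])  # ascending by score
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            # extend j over a run of tied scores
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average of 1-based positions i..j
            for m in range(i, j + 1):
                ranks[order[m]] = avg
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    return (12.0 / (n * k * (k + 1))) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical ratings: 5 listeners x 4 speakers on a 1-10 scale
ratings = [
    [6, 5, 3, 4],
    [7, 6, 4, 5],
    [5, 6, 3, 4],
    [6, 4, 2, 5],
    [7, 5, 3, 4],
]
print(round(friedman_statistic(ratings), 2))  # 12.84, above the 7.81 cutoff for df=3 at p=.05
```

For publication-grade analysis the ordinal regression vole-boy describes is still the better tool; this is just the quickest sanity check that respects the ordinal scale.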
 

DJBonoBobo

Major Contributor
Forum Donor
Joined
Jan 21, 2020
Messages
1,391
Likes
2,919
Location
any germ
Very interesting, thanks!

The room is a limiting factor, though. My experience is that my own KH310s sound really bad without room treatment, but the more treatment I use, the more even subtle nuances become important.
You can see that room influences dominate the FR of every speaker.
So I wonder whether the results would have been clearer and the differences between speakers more pronounced in a better-treated room? Still an interesting experiment, of course, because all speakers had the same conditions.
 

jae

Major Contributor
Joined
Dec 2, 2019
Messages
1,209
Likes
1,514
2. High-pass each speaker (say at 80 Hz) to eliminate the variable bass output.
Thought this as well. Would also be interested in the average age of participants in each trial, and/or the average age for participants who chose each speaker as their #1.
 

computer-audiophile

Major Contributor
Joined
Dec 12, 2022
Messages
2,565
Likes
2,884
Location
Germany
In the direct near field, the influence of the room is relatively small. I compared my KH120 and the JBL 305p MkII in the near field, that's what they are both made for.
You could also bring in some women. (My wife, who is a trained audiophile, also finds the JBL rather better than the Neumann).
 

Omid

Member
Joined
Nov 8, 2019
Messages
22
Likes
14
You may have already discussed your methodology elsewhere, so sorry if I repeat something you’ve already considered.

For the next test may I suggest having each listener rate the same speaker on 3 different occasions, in blinded fashion? So if you test 5 speakers, you’d run the test 15 times randomly choosing speakers so they each get 3 turns (but the tester isn’t aware whether he has already rated a given speaker).

It makes the test even more tedious, but it allows you to look for internal consistency. If the same listener assigns scores that vary from 4 to 6 for the same speaker, it would put the validity of the test in question. If each tester consistently scores the same speaker at approximately the same value, you can feel more confident in the results.

Perhaps this is too impractical…
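The consistency check proposed above is cheap to automate once the repeated ratings are collected. A minimal sketch (listener names, speaker labels, and the 1-point threshold are all illustrative, not from the thread):

```python
def consistency_report(repeats, max_spread=1):
    """Flag listeners whose repeated blind ratings of the same speaker
    spread more than `max_spread` points.

    repeats: {listener: {speaker: [ratings from repeated blind presentations]}}
    Returns {listener: [(speaker, spread), ...]} for listeners exceeding
    the threshold on at least one speaker.
    """
    flagged = {}
    for listener, by_speaker in repeats.items():
        bad = []
        for speaker, scores in by_speaker.items():
            spread = max(scores) - min(scores)
            if spread > max_spread:
                bad.append((speaker, spread))
        if bad:
            flagged[listener] = bad
    return flagged

# Hypothetical data: each speaker rated on three blind occasions
repeats = {
    "listener_1": {"KH80": [6, 6, 5], "305p": [7, 7, 7]},
    "listener_2": {"KH80": [4, 6, 8], "305p": [5, 5, 6]},  # 4-point spread on KH80
}
print(consistency_report(repeats))  # flags listener_2 on the KH80
```

Ratings from flagged listeners could then be down-weighted or excluded before the main analysis, as the post suggests.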
 

JohnBooty

Addicted to Fun and Learning
Forum Donor
Joined
Jul 24, 2018
Messages
637
Likes
1,595
Location
Philadelphia area
The JBL 305/306 basically snapped me out of this hobby.

Sometimes the treble sounds a little rough to me, maybe, but they are just so correct for so little money. I have a small and weirdly shaped room with a lack of ideal seating positions so their polite off-axis behavior makes them the clear winners in that room for me.

Crossed over to a sub or two, you have an Extremely Correct™ full-range system for well under $1000 and possibly under $500 if you chase sales and don't need the lowest octave.

Listened to some great systems at an audio show in 2019, expecting to rack up a case of upgrade-itis and a big credit card bill. Some systems pulled off impressive tricks that surpassed the JBLs in various ways. But my conclusion was that I'd need a bigger and better listening room to really reap the benefits and that even in a "better" room, the JBLs would still sound better off-axis.

If I ever have a surplus of money I'll think about replacing them with Genelecs that have similarly pristine off-axis behavior. But that would probably be the only contender for me.
 

nowonas

Member
Joined
Jan 24, 2017
Messages
22
Likes
28
Thank you for sharing this, and for the effort of producing a good test!

A couple of things I found interesting:
- Speakers were difficult to separate from each other
- Even with a fairly good frequency response, "they all sound terrible" (comment at the end)

I know that there are many factors (like the room) that can contribute to the bad sound. But personally, I would find it very interesting to redo the same test with speakers that are not so similar to each other (these were all small and ported, with relatively good frequency response). It would, for instance, be very interesting to see how an active, DSP-controlled, closed-box speaker with less group delay and, say, a perfect step response would compare to the more "traditional" speakers in this test.

This is also a fairly common critique of Floyd Toole's research, which likewise used very "similar" speakers when developing his preference scale. To me it would be interesting to see someone address that critique and add more modern speaker designs (DSP-controlled and closed) into the mix, designs which were not available or common when Floyd Toole did his research.
 

Koeitje

Major Contributor
Joined
Oct 10, 2019
Messages
2,309
Likes
3,976
I know what you mean. And it is indeed the case that the JBL produces better bass, in my impression. I see it as an advantage, as I tried to describe in my older post. For example, piano sounds much more realistic, with more 'body', to my ears. After this test here, I like my little JBLs even more. :);)
Low-end extension is very important. So much so that I think that 5" is the absolute minimum for any speaker you want to use without a subwoofer. Below that you simply don't have the extension and SPL.
 

DJBonoBobo

Major Contributor
Forum Donor
Joined
Jan 21, 2020
Messages
1,391
Likes
2,919
Location
any germ
You can see the influence of the room in the measurements, for example the peak around 300 Hz.
 

Dennis_FL

Addicted to Fun and Learning
Forum Donor
Joined
Feb 21, 2020
Messages
535
Likes
424
Location
Venice, FL
I wonder if we will eventually give ChatGPT a voice?
 

Sokel

Master Contributor
Joined
Sep 8, 2021
Messages
6,234
Likes
6,361
The KH80 was in the test too. It has DSP and everything, but the room demolishes it as well.
The interesting thing would be the same speaker with room correction applied.
 

Eetu

Addicted to Fun and Learning
Forum Donor
Joined
Mar 11, 2020
Messages
763
Likes
1,182
Location
Helsinki
Good work! Can't wait for you to do more tests :p
an active DSP controlled, closed box speaker
Any examples? D&D, Kii, Buchardt A500 come to mind but not only are they a lot bigger but also significantly more expensive.

The Neumann KH80 is an active DSP design btw.
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
High-pass all speakers at say 80~100 Hz, and see how much the low end determines the result
The one thing that pops out from looking at the in-room responses is that the JBL was managing to produce a fair amount of energy at 40 Hz, whereas the others were well down by that frequency. It might be worth investigating how much of an effect that has on the preference score.

Of course that means even more work, but that's science for you: every well-performed experiment just lands you with more questions to answer.
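The high-pass idea floated in this thread can be sketched with a standard second-order (12 dB/oct) filter from the RBJ audio-EQ cookbook. This is a stdlib-only illustration, not anything from the actual test rig; cutoff and Q are the usual Butterworth defaults:

```python
import math

def highpass_biquad(samples, fs, f0=80.0, q=0.707):
    """Second-order high-pass, RBJ audio-EQ-cookbook coefficients.
    Removes content below ~f0 so speakers can be compared without
    their differing low-end extension."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    b0 = (1.0 + cosw) / 2.0
    b1 = -(1.0 + cosw)
    b2 = (1.0 + cosw) / 2.0
    a0 = 1.0 + alpha
    a1 = -2.0 * cosw
    a2 = 1.0 - alpha
    # Direct Form I, normalized by a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# A 40 Hz tone (an octave below the cutoff) comes out strongly attenuated
fs = 48000
tone = [math.sin(2 * math.pi * 40 * n / fs) for n in range(fs)]
filtered = highpass_biquad(tone, fs)
print(max(abs(s) for s in filtered[fs // 2:]))  # steady-state peak, well under 1.0
```

In practice you would apply the same filter ahead of every speaker's feed, so the 40 Hz advantage noted above is taken out of the comparison.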
 

PeteL

Major Contributor
Joined
Jun 1, 2020
Messages
3,303
Likes
3,849
Thanks much, quite an effort indeed.
I have to admit that I have not watched the whole video yet, so maybe this is answered, but would it be possible to add the testing conditions? Listening distance, SPL, and room size and characteristics are the first that come to mind as relevant data.
Thanks.
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
3,492
Likes
2,509
Location
Sweden
Shortly after completing the first blind listening test, @Inverse_Laplace and I started thinking about all the ways we’d like to improve the rigor and explore other questions. Written summary follows, but here is a video if you prefer that medium:

Speakers (preference score in parentheses):

Test Tracks:

  1. Fast Car – Tracy Chapman
  2. Bird on a Wire – Jennifer Warnes
  3. I Can See Clearly Now – Holly Cole
  4. Hunter – Björk
  5. Die Parade der Zinnsoldaten – Leon Jessel (Dallas Wind Symphony)

Unless noted below, we used the same equipment, controls, and procedures as last time; review that post for details.
  • Motorized turntable: 1.75s switch time between any two speakers
  • ITU-R BS.1770 loudness instead of C-weighting
  • Significantly larger listening room
  • 5 powered bookshelf/monitors (preference ratings from 2.1 to 6.2)
  • Room measurements of each speaker at multiple listening positions
By far the most significant improvement was the motorized turntable. We were able to rotate to any speaker in 1.75 seconds and keep the tweeter in the same location for each speaker. The control board also randomized the speakers for each track automatically and was controllable remotely from an iPad.
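The per-track speaker randomization described above can be sketched as drawing an independent permutation for each track (purely illustrative; the actual control-board code isn't shown in the thread, and the blinded labels are hypothetical):

```python
import random

def presentation_orders(speakers, tracks, seed=None):
    """Return an independent random speaker order for each track, so a
    listener cannot infer speaker identity from its position in the
    sequence. A fixed seed makes a session's order reproducible."""
    rng = random.Random(seed)
    return {track: rng.sample(speakers, len(speakers)) for track in tracks}

speakers = ["A", "B", "C", "D", "E"]  # blinded labels, not model names
tracks = ["Fast Car", "Bird on a Wire", "I Can See Clearly Now", "Hunter", "Parade"]
for track, order in presentation_orders(speakers, tracks, seed=1).items():
    print(track, order)
```

Each track gets every speaker exactly once, in a fresh random order.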

View attachment 275371
View attachment 275372


We only had time to conduct the listening test with a small number of people and ended up having to toss out the data from three individuals. The test was underpowered, and we did not achieve statistical significance (at the p < .05 level). That said, here are the results we collected:

View attachment 275373

Spinorama of speakers:


View attachment 275374

In-room response plotted against estimated:

View attachment 275375

Our biggest takeaways were:
  • Recruit a larger cohort
  • Schedule on a weekend
  • Well controlled experiments are hard
Some personal thoughts:

Once you get into well-behaved studio monitors, it becomes extremely difficult to tease the differences apart. It takes a lot of listening, and tracks that excite the small issues in each speaker. A preference score of 4 vs. 6 appears to be a significant difference, but depending on the nature of the flaws it can be extremely challenging to hear. It is easy to hear that the speakers sound different, but picking out the better speaker gets very difficult.

Running a well-controlled experiment is extremely difficult. We had to measure groups on different days and getting the level matching and all the bugs worked out was a challenge. We learned a lot and will apply it to our next set of tests.

Comments from the individual that ran the statistical analysis:
A repeated measures analysis of variance (ANOVA) found no significant difference in sound ratings for the 5 different speaker types, F(4, 16) = 1.68, p = .205, partial eta-squared = .295.

Paired samples t-tests were then run to compare the average sound ratings between each possible pair of speakers. For the most part, speakers showed no significant differences in sound ratings, ps > .12. However, there was a significant difference between sound ratings for the JBL versus EdifierEQ speakers, t(4) = 3.88, p = .018, such that participants reported significantly better sound ratings for the JBL speaker (M = 6.18, SE = 0.31) over the EdifierEQ speaker (M = 5.64, SE = 0.40).
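For reference, the paired-samples t statistic reported above is just the mean of the per-listener rating differences divided by its standard error. A stdlib-only sketch with hypothetical numbers (the thread's raw per-listener data isn't posted, so these values are made up):

```python
import math

def paired_t(a, b):
    """Paired-samples t statistic: each listener rates both speakers,
    so we test whether the per-listener differences average to zero.
    Returns (t, degrees_of_freedom)."""
    assert len(a) == len(b)
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of diffs
    se = math.sqrt(var / n)                              # standard error of mean diff
    return mean / se, n - 1

# Hypothetical per-listener average ratings for two speakers
jbl     = [6.4, 5.9, 6.5, 6.1, 6.0]
edifier = [5.8, 5.5, 6.0, 5.6, 5.3]
t, df = paired_t(jbl, edifier)
print(round(t, 2), df)
```

The pairing matters: because each listener rates both speakers, listener-to-listener differences in how the scale is used cancel out of the differences, which is why a paired test can find an effect that a between-subjects comparison of the same means would miss.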

An interesting observation: for one group of listeners, we had to level match the speakers again, and in our haste we used pink noise instead of the actual material. Pink noise spreads energy equally per octave, which isn't necessarily representative of the spectrum of the musical selections. The Neumann KH80 was a full 3 dB lower (ITU-R BS.1770) with the music tracks than most of the other speakers (we measured after the test, and we could clearly hear differences in the volume of each speaker). We threw out this data for our analysis, but the speaker with the lowest level was universally given awful ratings by each listener.
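The pink-noise mishap illustrates a general point: matching levels on one signal does not match them on another if the devices have different frequency responses where the second signal carries its energy. A toy stdlib-only illustration (two sine "bands" standing in for broadband signals; the 12 dB roll-off and band choices are invented, not measurements of any speaker here, and real ITU-R BS.1770 loudness additionally applies K-weighting and gating):

```python
import math

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq)

fs = 48000
n = fs  # one second of signal
low  = [math.sin(2 * math.pi * 50   * i / fs) for i in range(n)]
high = [math.sin(2 * math.pi * 2000 * i / fs) for i in range(n)]

def speaker_b(low_part, high_part):
    """Toy speaker that passes highs but attenuates the 50 Hz band by 12 dB."""
    g = 10 ** (-12 / 20)
    return [g * l + h for l, h in zip(low_part, high_part)]

# Flat-spectrum "alignment noise": equal energy in both bands
noise_flat = [l + h for l, h in zip(low, high)]
noise_rolled = speaker_b(low, high)
# Bass-heavy "music": most energy in the low band
music_low = [3 * s for s in low]
music_flat = [l + h for l, h in zip(music_low, high)]
music_rolled = speaker_b(music_low, high)

print(round(rms_db(noise_flat) - rms_db(noise_rolled), 1))  # gap seen on the alignment signal
print(round(rms_db(music_flat) - rms_db(music_rolled), 1))  # larger gap on the bass-heavy program
```

Matching gains on the first signal leaves a residual level error on the second, which is exactly the effect the KH80 showed against the music tracks.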

We are looking to conduct another test with a larger group, possibly this spring.
Very nice work.

Would be fun if you had the opportunity to do a binaural recording with in-ear microphones of a test session, and link the file here.
 