
Blind Listening Test 2: Neumann KH 80 vs JBL 305p MkII vs Edifier R1280T vs RCF Arya Pro5

Well, for one thing, the 24bit/96kHz apparatus is spatially ignorant, but your ears are not.
That doesn't apply to resonances originating in the speaker box.
Normalizing them to 1 kHz would also give the JBL ... That said, if music content had a lot of 50 Hz energy, you could also probably pick the JBL as better.
Not energy. Do you have some daily experience with the said speaker model? If not, the spinorama might force a decision: either the spinorama is valid and the test here was performed incorrectly, or vice versa. Level matching? Just have a short glance at the 1.8 kHz resonance. Look further into the sagging mids, the step up in tweeter response, let alone the harmonic distortion and intermodulation.
 
How on earth could the ear detect resonances, while a 24bit/96kHz apparatus cannot?

Do we need some new method to evaluate the sound detected by a microphone, replicating the data compilation done by the ear? No more Fourier, but bands and all?
Just a quick response: We have two ears, not one, and it is the binaural processing in the brain that allows listeners to discriminate among sounds arriving at different times from different directions in the listening room. An omni microphone simply sums them all together. A "dummy head" microphone is no help because the "dummy" lacks a brain to process the two sounds, and a sufficiently good model of binaural processing does not exist.

That is why no amount of processing of 24 bit/96kHz (or any) data measured in a room can replicate human perception, and that is what is relevant because we listen in rooms. However, comprehensive 360 deg. anechoic data that is post processed to allow us to anticipate the direct and reflected sounds arriving at a listener in a room turns out to be very highly correlated to subjective ratings of sound quality perceived in normally reflective rooms. At present the spinorama is the best guidance we have.

The spinorama also allows us to separate evidence of acoustical interference (usually relatively innocent) from evidence of resonances (which are annoying). The "secret" is in spatial averaging of multiple curves.

It's in "the book".
 
[replying to Dr Toole’s post] Thank you so much for coming back to this. At least it clarified for me that EQ for the PIR or the LW isn't that promising. The reasoning behind the listeners' preference appears to me personally a bit far-fetched. How on earth could the ear detect resonances, while a 24bit/96kHz apparatus cannot?
I am bemused by the take-home points you got from Dr Toole’s post.

Firstly, Dr Toole’s post explicitly states that what you call “a 24bit/96kHz apparatus” can detect resonances, and I quote, “Resonances are easily identified in comprehensive anechoic measurements”.

Secondly, the new learning for you personally, that I think you would have done better to focus on in Dr Toole’s post, was the paragraph beginning with “The second surprise was that the choice of program material mattered much less than was anticipated.” This contradicts your earlier insistence that listeners can’t evaluate speakers because of an ‘information gap’ where the listener doesn’t know how good the recording is, so only the sound engineer who made the recording could really evaluate speakers by listening to that recording.

An information gap which, by the way, as I tried to point out earlier, has been tested and is not the insurmountable barrier you think it is. To which you immediately replied with the suggestion that it would take magic, and demanded references. Notwithstanding the fact that you have Toole’s book “under your pillow” and claim to have read it, so you already have to hand all the references you need.
 
I am bemused by the take-home points you got from Dr Toole’s post.
You're welcome!
Just a quick response: ...
I didn't expect a reply, but thanks again. The "why" regarding tonal balance, that it can be detected listening to arbitrary (?) program material, was in part (?) explained by the ability of human hearing to find resonances in the speaker box itself. As you explained already, the detection of resonances needs a wide time window, presumably because otherwise the frequency resolution won't suffice. So far we are confronted with the mathematical rules of our measuring procedures. The two-ears thing won't help too much with this, I assume.

By what algorithm is the ear enabled to get beyond the limitations of Fourier analysis? Regarding a resonance originating (sitting) in a single loudspeaker box.
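The time-window point can be made concrete with a few lines of numpy: the frequency resolution of a Fourier analysis is roughly 1/T, so the apparent bandwidth of even an ideal pure tone shrinks as the analysis window grows. A minimal sketch (the 1 kHz tone, sample rate, and window lengths are arbitrary choices for illustration, not anything measured in this test):

```python
import numpy as np

fs = 48_000  # sample rate (arbitrary choice)

def apparent_bandwidth(duration):
    """Half-power width (Hz) of the spectral peak of a pure 1 kHz tone,
    as seen through a Hann-windowed FFT of the given duration."""
    n = int(fs * duration)
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * 1000.0 * t)
    power = np.abs(np.fft.rfft(x * np.hanning(n))) ** 2
    df = fs / n  # FFT bin spacing = 1 / duration
    return np.count_nonzero(power > 0.5 * power.max()) * df

print(apparent_bandwidth(0.05))  # short 50 ms window: peak ~20 Hz wide
print(apparent_bandwidth(1.0))   # long 1 s window: peak ~1 Hz wide
```

With a 50 ms window, even an ideal tone smears into a ~20 Hz wide peak, so a narrow resonance cannot be separated from its surroundings; listening across a longer stretch of program is, loosely, the ear being granted a longer window.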

First, the human tested can't identify the resonance in an instant. It takes time listening to flowing variations of the signal. And so forth ... it is not my field, obviously.

Second, I proposed to make an initially good speaker bad by setting some resonant filters. I once did that involuntarily, which anecdotally confirmed the case. Is it possible to make one speaker sound, in the direct field, as bad as another by replicating the resonances as they are displayed by the spinorama and evaluated for the ranking? Just as a cross-check.
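For what it's worth, the "resonant filter" idea is straightforward to try in software: a peaking biquad (RBJ Audio EQ Cookbook formulas) dropped into the signal path adds exactly the kind of narrow boost a cabinet resonance produces. A sketch in plain numpy; the 1.8 kHz centre, +8 dB gain, and Q = 5 are made-up values for illustration, not measurements of any speaker in this thread:

```python
import numpy as np

fs = 48_000  # sample rate

def peaking_biquad(f0, gain_db, q):
    """Peaking-EQ biquad coefficients per the RBJ Audio EQ Cookbook."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def biquad_filter(x, b, a):
    """Direct-form I difference equation (slow but dependency-free)."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for i, xi in enumerate(x):
        yi = b[0] * xi + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xi
        y2, y1 = y1, yi
        y[i] = yi
    return y

# Artificial "resonance" at 1.8 kHz: a tone at the centre frequency
# comes out ~8 dB hotter once the filter has settled.
b, a = peaking_biquad(1800.0, 8.0, 5.0)
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1800.0 * t)
boosted = biquad_filter(tone, b, a)
gain = 20 * np.log10(np.std(boosted[fs // 2:]) / np.std(tone[fs // 2:]))
print(round(gain, 2))  # ~8 dB
```

Sweeping f0, gain, and Q while listening to music through such a filter is a cheap way to probe your own detection thresholds for added resonances.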

May I remind us all that the OP acknowledged the insufficient test procedures in this case? In my "book" it is a polite way to show interest and respect when I'm truly looking for what comes next. Looking forward, if you will.
 
First, the human tested can't identify the resonance in an instant. It takes time listening to flowing variations of the signal. And so forth ... it is not my field, obviously.
In a multiple-loudspeaker double-blind comparison experienced listeners find the dominant resonances remarkably quickly. The borderline ones take longer to find in ever-changing program. If you read the resonance papers you will find that adding resonances to good loudspeakers was how the detection thresholds were determined. Reflections and reverberation increase our sensitivity to resonances - they are repetitions, giving the listener multiple "views". In recordings adding a bit of "reverb" enriches the sound.
 
In a multiple-loudspeaker double-blind comparison experienced listeners find the dominant resonances remarkably quickly. ...
OK, so it's finding patterns. Not impossible today, methinks. But the modeling has not yet been done.
 
That doesn't apply to resonances originating in the speaker box.
You'll need to clarify that, because as I read it, you are saying: "our ears can localize any sound except for a speaker box resonance."
 
... - they are repetitions, giving the listener multiple "views". In recordings adding a bit of "reverb" enriches the sound.
You'll need to clarify that, because as I read it, you are saying: "our ears can localize any sound except for a speaker box resonance."
So, you reminded me of Dr. Griesinger. As a former DIYer I have tons of spare parts; let's see if I can make a 5-2-5 system. I'm a representative of the "do not obey the platonic single-seat stereo command" camp.

Thanks!
 
That doesn't apply to resonances originating in the speaker box.

Not energy. Do you have some daily experience with the said speaker model? If not, the spinorama might force a decision: either the spinorama is valid and the test here was performed incorrectly, or vice versa. Level matching? Just have a short glance at the 1.8 kHz resonance. Look further into the sagging mids, the step up in tweeter response, let alone the harmonic distortion and intermodulation.
Not sure what you are after. I said the same thing in other words.
 
Hi,

First of all, thanks a lot for this very interesting comparison between speakers.

It may have already been reported, but I wonder if there is not a mistake in the first post about the ranking of the speakers. In #1 we can read:
Speakers (preference score in parentheses):
Neumann KH80 (6.2)
JBL 305P Mark II (5.2)
RCF Arya Pro5 (3.9)
Edifier R1280T (2.1)
Edifier R1280T w/ EQ (4.7)

But from the video, it seems that the JBL 305P Mark II comes first, then the Neumann KH80. If confirmed, I don't know if it is possible to correct.

Best regards,

JMF
 
It may have already been reported, but I wonder if there is not a mistake in the first post about the ranking of the speakers. ... But from the video, it seems that the JBL 305P Mark II comes first, then the Neumann KH80.
It is correct. No mistake here. There are two different ways of "classifying" speakers:

A) Preference scores are calculated values based on an algorithm which "only" takes the measured characteristics of the speaker (spinorama) as input.

B) In comparison, the listening tests produced the preference (score) of this particular group. And yes, the JBL is slightly ahead. However, as pointed out by the OP, there is not enough data, plus the Neumann and the JBL are so close to one another that the difference is easily within the margin of error.

The "only", but still very valuable, conclusion you can take from the test is that the speakers with the highest calculated preference values also came out on top. And that alone is a remarkable achievement and confirmation.
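For readers wondering what (A) actually computes: the scores in this thread appear to come from Sean Olive's 2004 preference regression, which maps four spinorama-derived metrics to a roughly 0-10 rating. A sketch of that published formula; the metric values below are invented for illustration (computing NBD, SM, and LFX from a real spinorama takes considerably more code, per Olive's paper):

```python
import math

def olive_preference(nbd_on, nbd_pir, lfx_hz, sm_pir):
    """Olive (2004) predicted preference rating from spinorama metrics.

    nbd_on  -- narrow-band deviation of the on-axis curve (dB)
    nbd_pir -- narrow-band deviation of the predicted in-room response (dB)
    lfx_hz  -- low-frequency extension: -6 dB point of the sound power (Hz)
    sm_pir  -- smoothness (r-squared of a line fit) of the PIR
    """
    lfx = math.log10(lfx_hz)
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir

# Invented metric values: holding everything else equal, deeper bass
# (smaller lfx_hz) raises the score via the -4.31 * log10(LFX) term,
# which is the heavy bass weighting mentioned elsewhere in the thread.
print(round(olive_preference(0.4, 0.3, 55.0, 0.9), 1))
print(round(olive_preference(0.4, 0.3, 70.0, 0.9), 1))
```

This also makes clear why (A) and (B) can disagree at the margin: the model is a regression fit, not a measurement of this particular group's preference.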
 
The Neumann starts rolling off at 70 Hz. The JBL starts rolling off at about 55 Hz. As @Floyd Toole has mentioned earlier, bass accounts for about 30% of the preference rating indicated by listeners.
That was my criterion for switching from the KH120A to the JBL 305P II. I could compare both at leisure over months at home. The KH120 goes a little deeper than the KH80, but the JBL provides an even fuller and more complete sound than the KH120 for me. They replaced the KH120 at a much lower price. It may be hard to believe, but I came to this conclusion even though I have an affinity for good studio equipment and don't need to be so frugal.
 
Could have added Adam Audio T5V or Sony SSCS5. But testing speakers in mono? Ew. What a horrible way to listen to music.
Ironically, I've looked at those Edifier Bluetooth speakers based on price and reviews, but I don't trust Amazon reviews; they give all audio equipment 4.5 stars, so it doesn't mean anything. They do look cool, though. Years ago I used to use the Roland MA line with the computer, but they definitely aren't reference speakers. It might actually be interesting to see how users would respond to a blind test of inexpensive Mackie, PreSonus, and M-Audio. All OK, but hardly Adam.
 
But testing speakers in mono? Ew. What a horrible way to listen to music.
It’s a way to test speakers, not a way to listen to music. That should have been obvious.
 
That was my criterion for switching from the KH120A to the JBL 305P II. ...
For years (decades, actually), when friends and family have asked me about speakers for their home, I've always made a point to never recommend studio monitors, ever, for precisely the same reason (I'm guessing) that you changed from the KH120 to the JBL 305P...
 
Shortly after completing the first blind listening test, @Inverse_Laplace and I started thinking about all the ways we’d like to improve the rigor and explore other questions. Written summary follows, but here is a video if you prefer that medium:

Speakers (calculated preference score in parentheses):

  Neumann KH80 (6.2)
  JBL 305P Mark II (5.2)
  RCF Arya Pro5 (3.9)
  Edifier R1280T (2.1)
  Edifier R1280T w/ EQ (4.7)

Test Tracks:

  1. Fast Car – Tracy Chapman
  2. Bird on a Wire – Jennifer Warnes
  3. I Can See Clearly Now – Holly Cole
  4. Hunter – Björk
  5. Die Parade der Zinnsoldaten – Leon Jessel (Dallas Wind Symphony)

Unless noted below, we used the same equipment, controls, and procedures as last time; review that post for details.
  • Motorized turntable: 1.75s switch time between any two speakers
  • ITU-R BS.1770 loudness instead of C weighting
  • Significantly larger listening room
  • 5 powered bookshelf/monitors (preference ratings from 2.1 to 6.2)
  • Room measurements of each speaker at multiple listening positions
By far the most significant improvement was the motorized turntable. We were able to rotate to any speaker in 1.75 seconds and keep the tweeter in the same location for each speaker. The control board also randomized the speakers for each track automatically and was controllable remotely from an iPad.

View attachment 275371
View attachment 275372


We only had time to conduct the listening test with a small number of people and ended up having to toss out data from three individuals. The test was underpowered: we did not reach statistical significance at the p < .05 level. That said, here are the results we collected:

View attachment 275373

Spinorama of speakers:


View attachment 275374

In-room response plotted against estimated:

View attachment 275375

Our biggest takeaways were:
  • Recruit a larger cohort
  • Schedule on a weekend
  • Well controlled experiments are hard
Some personal thoughts:

Once you get into well-behaved studio monitors, it becomes extremely difficult to tease apart the differences. It takes a lot of listening, and tracks that excite the small issues in each speaker. A preference score of 4 vs 6 appears to be a significant difference, but depending on the nature of the flaws it can be extremely challenging to hear. It is easy to hear that the speakers sound different, but picking out the better speaker gets very difficult.

Running a well-controlled experiment is extremely difficult. We had to measure groups on different days, and getting the level matching and all the bugs worked out was a challenge. We learned a lot and will apply it to our next set of tests.

Comments from the individual that ran the statistical analysis:
A repeated measures analysis of variance (ANOVA) found no significant difference in sound ratings for the 5 different speaker types, F(4, 16) = 1.68, p = .205, partial eta-squared = .295.

Paired samples t-tests were then run to compare the average sound ratings between each possible pair of speakers. For the most part, speakers showed no significant differences in sound ratings, ps > .12. However, there was a significant difference between sound ratings for the JBL versus EdifierEQ speakers, t(4) = 3.88, p = .018, such that participants reported significantly better sound ratings for the JBL speaker (M = 6.18, SE = 0.31) over the EdifierEQ speaker (M = 5.64, SE = 0.40).
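For anyone who wants to replay this kind of analysis on the attached raw data, a paired-samples t-test is only a few lines of code. A sketch with invented ratings (not the actual data from this test):

```python
import math

def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for two rating lists."""
    assert len(a) == len(b)
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance of the differences
    return mean / math.sqrt(var / n), n - 1

# Invented ratings from 5 listeners for two hypothetical speakers.
# Each listener rates both, so the test is run on the per-listener differences.
speaker_a = [6.5, 6.0, 6.3, 5.9, 6.2]
speaker_b = [5.8, 5.5, 5.9, 5.3, 5.7]
t, dof = paired_t(speaker_a, speaker_b)
print(t, dof)  # t well above the two-tailed critical value of ~2.776 at dof = 4
```

With only 5 listeners, only a large and consistent per-listener difference clears the critical value, which is why most of the pairs above came out non-significant.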

An interesting observation: for one group of listeners, we had to level match the speakers again, and in our haste we used pink noise instead of the actual material. Pink noise excites all frequencies equally (per octave), which isn't necessarily representative of the musical selections. The Neumann KH80 was a full 3 dB lower (ITU-R BS.1770) when playing the music tracks than most of the other speakers (we measured after the test, and we could clearly hear differences in the volume of each speaker). We threw out this data for our analysis, but the speaker with the lowest level was universally given awful ratings by each listener.
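The pink-noise mishap has a simple numerical illustration: two signals matched on broadband level diverge once a speaker's limited bass response filters them, because their energy is distributed differently across frequency. A toy sketch in numpy, with synthetic noises and a crude first-order high-pass standing in for a monitor's bass roll-off (everything here is invented; nothing models the actual speakers or tracks):

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n = 48_000, 1 << 16
freqs = np.fft.rfftfreq(n, 1 / fs)

def shaped_noise(tilt):
    """Noise whose amplitude spectrum falls as 1/f**tilt, normalized to unit RMS."""
    spec = np.fft.rfft(rng.standard_normal(n))
    weight = np.zeros_like(freqs)
    weight[1:] = freqs[1:] ** -tilt
    x = np.fft.irfft(spec * weight, n)
    return x / np.sqrt(np.mean(x ** 2))  # "level matched" on broadband RMS

def highpass(x, fc):
    """First-order high-pass magnitude, applied in the frequency domain."""
    h = (freqs / fc) / np.sqrt(1 + (freqs / fc) ** 2)
    return np.fft.irfft(np.fft.rfft(x) * h, n)

def db(x):
    return 10 * np.log10(np.mean(x ** 2))

pink = shaped_noise(0.5)    # ~pink noise (power falls as 1/f)
bassy = shaped_noise(0.75)  # bass-heavier "music" surrogate

# Both start at 0 dB broadband, but after a 70 Hz roll-off they no longer agree:
print(round(db(highpass(pink, 70.0)), 2), round(db(highpass(bassy, 70.0)), 2))
```

Matching on the actual program material (or on a loudness measure run over it, as the post describes) avoids the mismatch, because the measurement then weights frequencies the way the music does.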

We are looking to conduct another test with a larger group, possibly this spring.

EDIT:

REW In-Room Measurements
Attached: raw data of listener preferences for anyone who wants to look at it.
Amazing job! Do more!!!
 
For years (decades, actually), when friends and family have asked me about speakers for their home, I've always made a point to never recommend studio monitors ...
Why?
 