Blind Listening Test 2: Neumann KH 80 vs JBL 305p MkII vs Edifier R1280T vs RCF Arya Pro5

Scott Lanterman · Mar 28, 2023

MatthewS said:

I didn't explain this in the video--but it was actually the musical selection that sounded terrible. We wanted something classical and this piece was on Harman's list. But it wasn't recorded well and was extremely hard to pick out differences. It was more the recording sounded terrible, not the speakers.

The Neumann on "I Can See Clearly Now" was mindbogglingly good.


Excellent! This is what I think ASR is all about. I wish ASR could do something like this at AXPONA. Would any manufacturer be brave enough to put their speakers on the stand? I would love to see Wilson, KEF, Genelec, JBL, Neumann and others go into blind competitions so we can bring some reality to the audio scene. I suspect we might see something like a medium getting busted when someone turns the lights on at a seance. We all like head-to-head tests, right? We watch the Olympics for just this.

For near field tests it would be interesting to see if a dual concentric ranked higher since lobing effects may be more of a near field thing.

I hope to see more of this and the test set up perfected and standardized.

Bravo!

VintageFlanker · Mar 28, 2023

MatthewS said:

Tremendous work and good production for a first video about audio. Subscribed. Wanna see more!

uwotm8 · Mar 28, 2023

Wait.
Am I the only one who sees KEF LSX on the photo?
...
But what's their score then?

BDWoody · Mar 28, 2023

D700 said:
You tested studio quality speakers without Mark Knopfler and Steely Dan? Is that legal? I'm pretty sure that's on the warning label...

That's what I've always understood. The story of Cousin Dupree is often told around these parts, and is generally the first in the queue when I'm hooking up new gear. Knopfler is also definitely in that lineup.

Tangband · Mar 28, 2023

Interesting !
Just for the record, the JBL 305 sounds much worse than the Genelec 8330 (I have had them both at the same time for measurements), so you can't expect a very cheap speaker to be very good.

And in normal use you always listen to two speakers in stereo, which makes those quality differences more "different" in some ways than listening with only one speaker.

Talisman · Mar 28, 2023

I think the most important and most obvious fact to extrapolate from this test is the devastating influence of the room. Just look at the responses in the room to notice that they practically have a consistency and specific response patterns that flatten out any small differences between monitors. It really doesn't make sense to pay thousands more for a genelec or neumann for their incredible flat response when the room then totally devastates this response and nullifies any small advantage.
Even before the speakers, the first audio component to improve is the listening room.
The upside is that even a cheap but decent speaker can sound very good in a treated environment.

HarmonicTHD · Mar 28, 2023

Wow. Very nice.

Great to see despite the statistical limitations you pointed out that there seems to be a trend of speakers with high preference values also being among the top in your test (of course to be further confirmed).

Maybe you can use the exact tracks which Harman used, to exclude this variable from the tests as well. The majority of the tracks are part of the Harman study anyhow, but I noticed a few deviations, e.g. Björk. Or was there a reason to deviate?

Thread 'Critical (Best) Music Tracks for Speaker and Room EQ Testing'
https://audiosciencereview.com/foru...sic-tracks-for-speaker-and-room-eq-testing.6/

uwotm8 · Mar 28, 2023

Tangband said:
in normal use you always listen to two speakers in stereo, which makes those quality differences more "different" in some ways than listening with only one speaker

There's a risk of being hunted down and burned as a witch after saying that here

But I absolutely agree of course

MatthewS · Mar 28, 2023

uwotm8 said:
Wait.
Am I the only one who sees KEF LSX on the photo?
...
But what's their score then?

Yes, the LSX was supposed to be in the test, but we found out my LSX had a faulty analog input and we couldn't use it. That photo was from the start. We tried using a digital-to-analog converter to keep it in, but it had insane distortion that was very audible. I measured it with my MOTU M2 and it would probably be the lowest performing device measured to date on ASR.

The JBL took its place.

ROOSKIE · Mar 28, 2023

Talisman said:
I think the most important and most obvious fact to extrapolate from this test is the devastating influence of the room. Just look at the responses in the room to notice that they practically have a consistency and specific response patterns that flatten out any small differences between monitors. It really doesn't make sense to pay thousands more for a genelec or neumann for their incredible flat response when the room then totally devastates this response and nullifies any small advantage.
Even before the speakers, the first audio component to improve is the listening room.
The upside is that even a cheap but decent speaker can sound very good in a treated environment.

You don't 'hear' the steady-state in-room response.
If you did, making speakers would be a cakewalk.
Read Toole's book. 75% of it is dedicated to explaining, in various ways and with massive amounts of detailed data, that you do not hear that way.
He even states something along the lines of it being foolish to think we do.

Blumlein 88 · Mar 28, 2023

Tangband said:
Interesting !
Just for the record, the JBL 305 sounds much worse than the Genelec 8330 (I have had them both at the same time for measurements), so you can't expect a very cheap speaker to be very good.

And in normal use you always listen to two speakers in stereo, which makes those quality differences more "different" in some ways than listening with only one speaker.

Of course, the available research indicates that the intuitive idea that listening in stereo is more revealing is false. I understand the difficulty of accepting this. It appears single-speaker evaluation is more discriminating.

Talisman · Mar 28, 2023

ROOSKIE said:
You don't 'hear' the steady-state in-room response.
If you did, making speakers would be a cakewalk.
Read Toole's book. 75% of it is dedicated to explaining, in various ways and with massive amounts of detailed data, that you do not hear that way.
He even states something along the lines of it being foolish to think we do.

Of course not, as you recognize your wife's voice whether she's speaking in the living room or in the bathroom, where the two responses from the room are totally different.
The point is not whether you can recognize a timbre; the point is that there is no sense in chasing a perfectly flat response at all costs if your room will devastate that response.
When the room shuffles the cards on the table, which speaker ends up sounding more pleasant is just randomness.

MatthewS · Mar 28, 2023

Talisman said:
Of course not, as you recognize your wife's voice whether she's speaking in the living room or in the bathroom, where the two responses from the room are totally different.
The point is not whether you can recognize a timbre; the point is that there is no sense in chasing a perfectly flat response at all costs if your room will devastate that response.
When the room shuffles the cards on the table, which speaker ends up sounding more pleasant is just randomness.

This is not at all the conclusion anyone that participated would draw. The Edifier without EQ in most material stood out as significantly worse. The room dominates the bass response and the best fix for that is multiple subwoofers.

The room acts on the speakers in the same way. A better speaker is still a better speaker--why not start from something good. Could a room mode possibly repair a deficiency in a particular speaker? Yes, I guess it could--but above the transition frequency it is going to have a lot less impact.

Even in spite of the room modes, the listeners mentioned better bass on the JBL on their notes. They could hear that extra reach even if that measurement looks ugly.

I'll post a psychoacoustically smoothed image of the room modes when I have a minute. I added the raw data from REW.

Talisman · Mar 28, 2023

MatthewS said:
This is not at all the conclusion anyone that participated would draw. The Edifier without EQ in most material stood out as significantly worse. The room messes with bass response and the best fix for that is multiple subwoofers.

The room acts on the speakers in the same way. A better speaker is still a better speaker--why not start from something good. Could a room mode possibly repair a deficiency in a particular speaker? Yes, I guess it could--but above the transition frequency it is going to have a lot less impact.

Even in spite of the room modes, the listeners mentioned better bass on the JBL on their notes. They could hear that extra reach even if that measurement looks ugly.

I'll post a psychoacoustically smoothed image of the room modes when I have a minute. I will also post the raw data from REW.

In fact you are quoting the extremes. Many have suggested a high-pass filter to take bass out of the equation, because no matter the room response, more bass is clearly audible and affects preference.
The Edifier was obviously the busiest speaker of the bunch, and even in a context of random response it still had an overall messier response.
In the worst room in the world you'll always prefer a Genelec to a radio, but when the differences get smaller it becomes more complicated.

Now, purely hypothetically, if you could hear the Edifier in a perfectly acoustically treated room and then immediately, blind, hear the Neumann with the test room's response, would you still be certain that your preference would be for the Neumann?

And I ask you one more question: in an over-treated room with excessive absorption, almost anechoic, would you prefer a neutral speaker or one heavily boosted in the high frequencies or in the low frequencies?

Spocko · Mar 28, 2023

MatthewS said:
Shortly after completing the first blind listening test, @Inverse_Laplace and I started thinking about all the ways we’d like to improve the rigor and explore other questions. Written summary follows, but here is a video if you prefer that medium:

Speakers (preference score in parentheses):

Neumann KH 80 (6.2)

JBL 305P Mark II (5.2)

RCF Arya Pro5 (3.9)

Edifier R1280T (2.1)

Edifier R1280T w/ EQ (4.7)

Test Tracks:

Fast Car – Tracy Chapman

Bird on a Wire – Jennifer Warnes

I Can See Clearly Now – Holly Cole

Hunter – Björk

Die Parade der Zinnsoldaten – Leon Jessel (Dallas Wind Symphony)

Unless noted below, we used the same equipment, controls, and procedures as last time; review that post for details.

Motorized turntable: 1.75s switch time between any two speakers

ITU-R BS.1770 loudness instead of C-weighting

Significantly larger listening room

5 powered bookshelf/monitors (preference ratings from 2.1 to 6.2)

Room measurements of each speaker at multiple listening positions

By far the most significant improvement was the motorized turntable. We were able to rotate to any speaker in 1.75 seconds and keep the tweeter in the same location for each speaker. The control board also randomized the speakers for each track automatically and was controllable remotely from an iPad.

View attachment 275371
View attachment 275372

We only had time to conduct the listening test with a small number of people and ended up having to toss out data on three individuals. The test was underpowered. We did not achieve statistical significance (p-value < .05). That said, here are the results we collected:

View attachment 275373

Spinorama of speakers:

View attachment 275374

In-room response plotted against estimated:
View attachment 275375

Our biggest takeaways were:

Recruit a larger cohort

Schedule on a weekend

Well controlled experiments are hard

Some personal thoughts:

Once you get into well-behaved studio monitors, it becomes extremely difficult to tease apart the differences. It takes a lot of listening and tracks that excite small issues in each speaker. A preference score of 4 vs 6 appears to be a significant difference, but depending on the nature of the flaws it can be extremely challenging to hear the difference. It is easy to hear that the speakers sound different, but picking out the better speaker gets very difficult.

Running a well-controlled experiment is extremely difficult. We had to measure groups on different days and getting the level matching and all the bugs worked out was a challenge. We learned a lot and will apply it to our next set of tests.

Comments from the individual that ran the statistical analysis:
A repeated measures analysis of variance (ANOVA) found no significant difference in sound ratings for the 5 different speaker types, F(4, 16) = 1.68, p = .205, partial eta-squared = .295.

Paired samples t-tests were then run to compare the average sound ratings between each possible pair of speakers. For the most part, speakers showed no significant differences in sound ratings, ps > .12. However, there was a significant difference between sound ratings for the JBL versus EdifierEQ speakers, t(4) = 3.88, p = .018, such that participants reported significantly better sound ratings for the JBL speaker (M = 6.18, SE = 0.31) over the EdifierEQ speaker (M = 5.64, SE = 0.40).
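
As a sanity check on the figures above, here is a minimal Python sketch (not the analyst's actual workflow, which the post does not describe) that recomputes partial eta-squared from the reported F and degrees of freedom, and the two-sided p-value for t(4) = 3.88 by numerically integrating the Student t density. The function names and the Simpson step count are illustrative assumptions; only the F, t and df values come from the post.

```python
import math

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def t_pdf(x, df):
    """Student t probability density at x."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_value, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_value)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3       # P(0 < T < |t|)
    return 2 * (0.5 - area)    # probability in both tails

eta = partial_eta_squared(1.68, 4, 16)  # reported F(4, 16) = 1.68
p = two_sided_p(3.88, 4)                # reported t(4) = 3.88
print(f"partial eta-squared = {eta:.3f}")
print(f"two-sided p = {p:.3f}")
```

Both results land within rounding of the reported values (.295 and .018). The error degrees of freedom also imply that five listeners remained in the analysis, which is consistent with the test being described as underpowered.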

An interesting observation: for one group of listeners, we had to level match the speakers again and, in our haste, we used pink noise instead of the actual material. Pink noise has equal energy in every octave, which isn't necessarily representative of the musical selections. The Neumann KH 80 was a full 3 dB lower (ITU-R BS.1770) than most of the other speakers when using the music tracks (we measured after the test, and we could clearly hear differences in the volume of each speaker). We threw out this data for our analysis, but the speaker with the lowest level was given awful ratings by every listener.
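
To put the 3 dB figure in perspective, here is a small sketch that converts a level difference in dB to amplitude and power ratios. The helper names are made up for illustration; only the 3 dB value comes from the post.

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude (voltage) ratio."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Convert a level difference in dB to a linear power ratio."""
    return 10 ** (db / 10)

shortfall_db = -3.0  # the quieter speaker relative to the others
print(f"amplitude ratio: {db_to_amplitude_ratio(shortfall_db):.3f}")
print(f"power ratio:     {db_to_power_ratio(shortfall_db):.3f}")
print(f"gain needed to correct: x{db_to_amplitude_ratio(-shortfall_db):.3f}")
```

A 3 dB shortfall means the quieter speaker was playing at roughly half the power of the others.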

We are looking to conduct another test with a larger group, possibly this spring.

Amazing and THANK YOU, a few of my thoughts on scoring listening preferences in general:

"It is easy to hear that the speakers sound different but picking out the better speaker gets very difficult." You raised an excellent point with many nuanced issues!
- If the source was mastered in a way that does not appeal to the listener, a less accurate speaker that ends up "fixing" the source through happenstance will be scored as an improvement.
- What were your directions to the judges when scoring to determine the "better" speaker? Subjective preference between speakers or reference to memory of how the source "should sound"?
- Bass of the JBL looks to be near flat down to 50 Hz, whereas all the other speakers started falling off between 60 Hz and 70 Hz. Do you think that extra mid-bass power made a difference in sound quality? Maybe this alone would nudge the JBL ahead, all else being near equal.
- Would you have to cut off frequency response of the source material to take this into account otherwise you end up benefitting the speaker that has the best bass frequencies?

jensgk · Mar 28, 2023

Great test!
I would love to see a graph with the measured in-room response for each speaker in the same graph.
It would give an idea about:
* how similar they are or how they diverge from each other.
* if the volume level is mostly the same.
* how much influence the room has.

Geert · Mar 28, 2023

Talisman said:
I think the most important and most obvious fact to extrapolate from this test is the devastating influence of the room. Just look at the responses in the room to notice that they practically have a consistency and specific response patterns that flatten out any small differences between monitors. It really doesn't make sense to pay thousands more for a genelec or neumann for their incredible flat response when the room then totally devastates this response and nullifies any small advantage.

Strange conclusion, as the results show the speakers with the flattest anechoic response score best. Above the Schroeder frequency I also don't see the difference between the speakers being flattened out in the measurements.
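
For readers unfamiliar with the term, the Schroeder frequency is often estimated as f ≈ 2000·sqrt(RT60 / V). A minimal sketch follows; the thread does not give the test room's volume or reverberation time, so the values below are purely hypothetical placeholders.

```python
import math

def schroeder_frequency(rt60_s, volume_m3):
    """Common approximation: f ~= 2000 * sqrt(RT60 / V), RT60 in seconds, V in cubic metres."""
    return 2000.0 * math.sqrt(rt60_s / volume_m3)

# Hypothetical room, NOT the room used in the test
rt60 = 0.4     # seconds (assumed)
volume = 80.0  # cubic metres (assumed)
print(f"Schroeder frequency ~ {schroeder_frequency(rt60, volume):.0f} Hz")
```

Below this frequency individual room modes dominate; above it the speaker's own response has more say in what reaches the listener.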

AdamG · Mar 28, 2023

A most excellent effort and adherence to the science of blind testing. Interesting how close the scores were. I expect that a repeat of this test, but sighted, would result in completely different scores. Not only are there charlatans out there trying to fool us, but we are constantly fooling ourselves with all sorts of biases.

Bravo Zulu to you and your colleagues for the effort and sharing your experience with us. And thanks to Amir for providing his support to get you the speakers to test. A secondary use of donated tested products is fantastic. Hopefully this is a new emerging trend.

Talisman · Mar 28, 2023

Geert said:
Strange conclusion, as the results show the speakers with the flattest anechoic response score best. Above the Schroeder frequency I also don't see the difference between the speakers being flattened out in the measurements.

The point is that you're rolling the dice. Statistically, a speaker with a flatter response will maintain a better response in a room with random clutter, but only statistically, because specific interactions with a different room can skew these conclusions.
I'm not saying the Neumann can't turn out better even in an untreated room. I'm saying that with this level of clutter (comparing the Klippel response with the room response) it's hard to justify hundreds or thousands of euros more for a marginally flatter response. Better results are obtained with a less accurate speaker plus room treatment. And I don't see how that surprises you.

Floyd Toole · Mar 28, 2023

Just plain brilliant, you guys! Of course I am flattered to be credited with being a motivating factor, but mostly I am greatly encouraged that there are people willing to engage what is not a simple task - and succeed. Compared to the 99.9% of "listening tests" and "reviews" that are done without even the most basic controls in place this is a masterpiece. I see in the discussion some of the anticipated complaints that it was not done in stereo - I address this in my papers and books, but will include a different version below. Because the authors started with accurate measurements - something truly rare in the history of audio evaluations - they saw that one major factor in preferences was bass extension. This was revealed in my very early evaluations, published in JAES way back in 1986. More recently Sean Olive's correlations revealed that bass alone accounts for about 30% of overall sound quality ratings. Woofer design determines bass extension, and room acoustics dominate bass sound quality, so when evaluating loudspeakers with limited bandwidth, which varies, bass alone is more influential than if all systems being compared were "full bandwidth", had subwoofers, or, as has been suggested, were high-pass filtered to limit all to the same bass extension. Decisions, decisions . . . Anyone looking for great satisfaction from any of these small speakers would be advised to add subwoofer(s), preferably with bass management (which includes high-pass filtering allowing them to play louder).

Beyond bass, the Neumann clearly shows where its higher price went - a significantly smoother frequency response. The inexpensive JBL is compromised, but the residual irregularities and resonances are at or below the threshold of detection. Broadband trends are quite often quickly adapted to as they are not perceived as timbral colorations. This is the essence of good engineering, balancing the cost-determining factors to achieve maximum listener satisfaction at any price level. Pay more, get more, but only if the bass extension is sufficient. Properly integrated subwoofers are the great equalizers among the vast majority of loudspeakers. The next and final step is to tame the room resonances - a totally different topic.

Here is a little dissertation on stereo vs. mono. Sorry if it repeats things I have posted elsewhere.

Adding levels of complexity to the decision process is the awkward fact that stereo - two channel record/reproduction through loudspeakers - is fundamentally flawed. The default format for music, stereo, is not capable of delivering the sound from loudspeakers to our ears without significantly modifying the recorded waveforms. Amplitude and phase responses (the impulse response) are corrupted for the sounds generating all sound images appearing between the loudspeakers on the soundstage. Notions of “purity" in stereo listening are fanciful.

In "live" experiences there is only one direct sound arriving at each ear from a single sound source. In stereo all sound images between the loudspeakers are phantoms, created from identical sounds radiated by each loudspeaker – double-mono - with inter-channel amplitude or time differences to provide location cues for image position. Each ear receives two versions of the same sound separated by a delay and modified by head diffraction. The only exceptions are the hard-panned sounds emerging from the left and right of the soundstage; these are monophonic components and are timbrally and spatially distinctive. So stereo listening is a hybrid experience, partly mono but mostly double-mono with the inherent corruptions.

The panned-image “soundstage” is the dominant factor, and whether the “panning” is done with the common interchannel amplitude-difference pan pots (so-called multichannel mono), or by amplitude and/or time differences generated by the microphone arrays (the so-called ‘purist’ approach) the result is the same. Two time-separated sounds arriving in each ear generate acoustical interference, resulting in an audible dip around 2 kHz (enough to degrade speech intelligibility for the center image - usually the featured artist). Any notions of pristine waveforms, impulse response, amplitude and phase response in the direct sounds arriving at the ears can only exist for hard-panned mono left and right images. The inherent sound quality of the loudspeakers has been degraded for all “soundstage” images including the featured artist. Timbral perfection has been rendered impossible. But, is it good enough? Obviously, yes, because we have derived enormous pleasure from stereo reproduction for decades.
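
A rough back-of-envelope check of the 2 kHz figure, as a sketch only: it assumes a spherical-head (Woodworth) approximation for the interaural delay, a head radius of 8.75 cm, and it ignores head shadow. None of these values come from the post. For a center phantom image, each ear gets the near speaker's sound plus the far speaker's copy delayed by roughly the interaural delay, and the first cancellation falls where that delay equals half a period.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
HEAD_RADIUS = 0.0875    # m, assumed typical spherical-head radius

def interaural_delay(angle_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth spherical-head estimate of interaural time difference, in seconds."""
    theta = math.radians(angle_deg)
    return (radius / c) * (theta + math.sin(theta))

def first_notch_hz(delay_s):
    """First comb-filter notch when a signal is summed with a delayed copy of itself."""
    return 1.0 / (2.0 * delay_s)

delay = interaural_delay(30.0)  # speakers at +/- 30 degrees
print(f"delay ~ {delay * 1e6:.0f} microseconds")
print(f"first notch ~ {first_notch_hz(delay):.0f} Hz")
```

With these assumptions the delay comes out near 260 microseconds and the first notch a little below 2 kHz, in line with the dip described above. In practice head shadow attenuates the delayed copy, so the dip is shallower than a full cancellation.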

In addition, all direct sounds arrive from about +/- 30 deg. which provides HRTF characterization for the wrong incident angle - generating an unavoidable timbral error as well as possible localization confusion for familiar sounds. Put it all together and it is clear that the human brain has subconsciously adapted to accept multiple acoustic and psychoacoustic errors that exist only because of stereo reproduction. “Perfect” loudspeakers and electronics cannot remove them.

As has been well publicized, listeners are far more critical in their assessments of sound quality when listening to a single loudspeaker. This fact has generated a fairly constant stream of flak from those thinking that stereo is a more rigorous test and that listening in mono is all but irrelevant. However, decades of double-blind tests show that listeners hear problems in loudspeakers more readily when listening in mono. The superior sound quality was less clearly reflected in scores in stereo comparison tests, and even less in multichannel evaluations. Monophonic components exist in stereo and multichannel programs, so designing loudspeakers to meet the most stringent (mono) test was considered worthwhile. But the question remains: why? Were the audible defects more clearly revealed because the spatial complexity and inherent amplitude/phase (linear) distortions of stereo were absent? Is this why headphone listening has such an almost magical clarity? One sound to each ear, not two. It does seem reasonable.

This is the background within which the question is being asked. That there are people who think stereo is somehow a naturally superior form of reproduction is a testament to human tolerance and adaptability.

To this analysis of direct sound must be added the contributions of reflected sounds, and my instincts tell me that there may well be advantages to some added confusion - a sense of ambiguous spaciousness. Anechoic chamber stereo is not especially flattering.

Genuine envelopment – the impression of being in a different, larger, room, requires long-delayed sounds arriving from further to the side than either left or right stereo loudspeaker locations. This is what multichannel systems were created for by the film industry to persuade audiences that they were in the acoustic spaces shown on the screen. In the decades of double-blind listening tests in normally reflective rooms there is evidence of a generalized preference for loudspeakers with well-behaved off-axis dispersion. It seems that some reflected sound is desirable, but we lack definitive guidance about exactly what “well-behaved” means. Gross irregularities in frequency-dependent directivity should be avoided, but what are the tolerances? In fact, what is the preferred directivity? How wide should the dispersion be?

Multichannel does not completely solve the problem because there are still phantom images across the front soundstage, and elsewhere, but a real center loudspeaker is a start. That delivers "pristine" sound from three locations: center and hard-panned left and right. But the fact that we have adapted to the corrupted sound associated with phantom images remains a confounding factor. I notice that many programs deliver the "center" sound from all three loudspeakers across the front, and some recordings ignore the center speaker. Adaptability is clearly a required feature of human perception.
