
Blind Listening Test 2: Neumann KH 80 vs JBL 305p MkII vs Edifier R1280T vs RCF Arya Pro5

OP
MatthewS

Member
Forum Donor
Joined
Jul 31, 2020
Messages
95
Likes
862
Location
Greater Seattle
I’d like to see for next time:
1. Add Genelec speaker if you can
2. High-pass each speaker (say at 80 Hz) to eliminate the variable bass output. Perhaps run this as trial #2.
3. More test subjects to make data more statistically significant.
4. Test subjects should be sitting at the same ear height, to eliminate this variable

  1. If someone can get a Genelec to the Seattle, WA area, we'll include it. I'm not sure if @amirm has one.
  2. This was something we thought about, but went back and forth on the utility--also just didn't have time.
  3. Definitely!
  4. Maybe we can bring a bunch of seat cushions.
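
To get a feel for what the suggested 80 Hz high-pass would do to the bass-extension differences, here is a minimal sketch of the magnitude response of an ideal Butterworth high-pass. The 4th-order slope and the function name are assumptions for illustration only; nobody in the thread specified a filter type or order.

```python
import math

def butterworth_highpass_db(freq_hz, cutoff_hz=80.0, order=4):
    """Magnitude in dB of an ideal Butterworth high-pass at freq_hz.

    cutoff_hz=80 follows the suggestion above; order=4 is an assumed slope.
    """
    ratio = cutoff_hz / freq_hz
    return -10.0 * math.log10(1.0 + ratio ** (2 * order))

# Attenuation at a few bass frequencies (illustrative frequencies only).
for f in (40, 50, 60, 80, 160):
    print(f"{f:>4} Hz: {butterworth_highpass_db(f):6.1f} dB")
```

With such a filter in front of every speaker, output below roughly 60 Hz is attenuated heavily on all of them, so differences in native bass extension would matter much less to the scores.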
 
OP
MatthewS

Member
Forum Donor
Joined
Jul 31, 2020
Messages
95
Likes
862
Location
Greater Seattle
I'm happy to supply some code, or do a bit of analysis if helpful!
I uploaded the raw data to the first post if you want to play with it! There are only 6 users, though a lot of individual tests for each person. We actually had 9, but 3 were tossed out because of the SPL issue I mentioned.

Just because I don't think I mentioned it anywhere else, every song was a new randomization. Every time we started a new song, it randomized the speakers. Basically, no individual or song ever had the same speaker randomization. The control board automated all of this.
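
For anyone who wants to reproduce the idea, here is a minimal sketch of per-song randomization. It is not the control board's actual code, which isn't shown in this thread; the function names, the blind labels A to D, and the song placeholders are all hypothetical.

```python
import random

SPEAKERS = ["Neumann KH 80", "JBL 305p MkII", "Edifier R1280T", "RCF Arya Pro5"]

def assign_positions(speakers, rng):
    """Return a fresh random mapping of blind labels (A, B, C, ...) to speakers."""
    shuffled = list(speakers)
    rng.shuffle(shuffled)
    labels = [chr(ord("A") + i) for i in range(len(shuffled))]
    return dict(zip(labels, shuffled))

def build_session(songs, speakers, seed=None):
    """Draw an independent shuffle for every song in one listener's session."""
    rng = random.Random(seed)
    return {song: assign_positions(speakers, rng) for song in songs}

# Hypothetical usage: listeners only ever see the labels A-D.
session = build_session(["song 1", "song 2", "song 3"], SPEAKERS)
for song, mapping in session.items():
    print(song, mapping)
```

Note that independent shuffles of four speakers can repeat by chance (there are only 24 orderings), so this sketch does not by itself guarantee that no two songs share an order.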
 
OP
MatthewS

Member
Forum Donor
Joined
Jul 31, 2020
Messages
95
Likes
862
Location
Greater Seattle
What were your directions to the judges when scoring to determine the "better" speaker? Subjective preference between speakers, or reference to memory of how the source "should sound"?
  1. Bass of the JBL looks to be near flat down to 50 Hz, whereas all other speakers started falling off between 60 Hz and 70 Hz. Do you think that extra mid-bass power made a difference in sound quality, so that this alone would nudge the JBL ahead, all else being near equal?
    • Would you have to cut off the frequency response of the source material to take this into account? Otherwise you end up benefitting the speaker that has the best bass extension.

We just asked them to rate on a scale of 1 to 10 where 1 is the worst and 10 is the best.

The bass response on the JBL definitely helped it. @Floyd Toole already weighed in on the thread that bass accounts for about 30% of our preference.

The scores of the Neumann and the JBL seem to have swapped between the text at the start of the post and the graph of preference ratings.

The first list is the calculated preference score based on the spinorama data. The graph is the results of the listening test. It's just coincidence that some of the numbers were the same.
 

Tangband

Major Contributor
Joined
Sep 3, 2019
Messages
2,994
Likes
2,799
Location
Sweden
This testing is interesting, but has two major flaws that can lead people to draw the wrong conclusion:

1. In a normal room you always install the speaker at the correct distance (for best sound) from the front wall behind it, and also with the correct distance between the two stereo speakers. This is very important; 10 cm can make a real difference. This means that a speaker with little baffle step correction needs to be placed nearer the front wall to achieve a correct frequency response, while a speaker with full baffle step correction should be placed freestanding, away from the wall.

Done the wrong way, you can end up with very misleading listening results. Was every speaker in the test designed with full baffle step correction?


2. Another "trap" when testing loudspeakers in this way is that the listener only scores the preferred, most "likable" sound. The listener has no idea how the recordings really sounded at the recording venue or how the real instruments sounded in the concert hall.
As a semi-pro recording engineer and musician, I know that real instruments like strings or flute can sometimes sound rather hard in real life. A softer sound from the tested loudspeakers will be preferred in those cases.
 

thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,901
Likes
16,911
This testing is interesting, but has two major flaws that can lead people to draw the wrong conclusion:

1. In a normal room you always install the speaker at the correct distance (for best sound) from the front wall behind it, and also with the correct distance between the two stereo speakers. This is very important. This means that a speaker with little baffle step correction needs to be placed nearer the front wall to achieve a correct frequency response, while a speaker with full baffle step correction should be placed freestanding, away from the wall.

Done the wrong way, you can end up with very misleading listening results. Was every speaker in the test designed with full baffle step correction?
In the spinorama data he posted for each speaker in the OP, the mid-bass level is similar to or even higher than the lower-mids level, so I would say yes. Of course, as some others have written, in another experiment someone could equalise the loudspeakers in the modal region, but this would examine a different question.
 

Tangband

Major Contributor
Joined
Sep 3, 2019
Messages
2,994
Likes
2,799
Location
Sweden
In the spinorama data he posted for each speaker in the OP, the mid-bass level is similar to or even higher than the lower-mids level, so I would say yes. Of course, as some others have written, in another experiment someone could equalise the loudspeakers in the modal region, but this would examine a different question.
Yes, and it raises a further question: does Klippel system testing always favour loudspeakers with full baffle step correction in the listening test?
It's mandatory to place a speaker correctly in a room when listening to it. Some speakers are designed to be placed near a front wall.
 

thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,901
Likes
16,911
Yes, and it raises a further question: does Klippel system testing always favour loudspeakers with full baffle step correction?
The Klippel NFS just measures frequency responses, from which the spinorama and Harman score can be computed. It is preference-agnostic; the interpretation of the measurements is fully up to the users.

As Dr. Toole wrote above:
The preference score was an academic exercise to test an hypothesis - namely that there was a correlation between spinorama data and double-blind listener evaluations. It served its purpose, but in real life I know of nobody in Harman who uses it - including me - we use two eyes and a brain to examine the family of curves and arrive at conclusions that embrace trade-offs. For example, simple bass extension - a limited bass extension can result in a lower calculated score, but in the real world a properly integrated subwoofer system can elevate it to an even higher level. A single significant resonance or a spectral tilt can be equalized to attenuate its audibility, but many resonances would present a serious problem, yet both could yield similar scores. And so on. Look at the curves, learn to interpret them, forget about the all-embracing "score".
 

LTig

Master Contributor
Forum Donor
Joined
Feb 27, 2019
Messages
5,833
Likes
9,575
Location
Europe
Maybe.....
Those differences in output impedance interaction are usually smaller than the difference in one speaker having much deeper response or having some hump in response at either end of the spectrum. Or having a sloping treble or a hot treble etc. I think that is why Toole suggested using the more limited pink noise. I've done it both ways and the bandwidth limited way seems to work better.
Ok. I trust you because I have no experience with such tests.
I fear with speakers there may be no perfect way to match levels because unless FR is matched you really cannot fully match levels.
I totally agree. Either method (bandlimited or not) will lead to preference of one device over the other.
An example with a more precise device: comparing microphone preamps once, I think they had 9 of them. Anyway, one of them had a slope starting at about 8 kHz and was down 3 dB at 20 kHz. It also rolled off below 30 Hz. Another had reduced treble, though less than the previous one. All the others were flat enough to 20 kHz not to matter. Those two were the only ones detected reliably as different unsighted. If using pink noise, you'd slightly boost the one with roll-offs at both ends to get a match on voltage. So through the midrange, where most music is, it would be louder. It still sounds different, but probably would get preferred due to the louder midrange. In this test they matched a tone at 1 kHz, which I think was the right call. Speakers are more complex, but I think matching the core midrange of the music makes sense.

Now this is my opinion (at least somewhat informed), I'm in no position to say it is best or only way.
I fear you are right.
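
The level-matching point in the quoted preamp example can be put in numbers. This is a rough sketch under stated assumptions: ten octave bands, pink noise treated as equal power per octave, and invented band levels for a device that rolls off at both ends. None of these figures come from the test discussed here.

```python
import math

def power_sum_db(levels_db):
    """Total level in dB of bands that add as uncorrelated power."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

def broadband_match_gain_db(reference_db, device_db):
    """Gain to apply to the device so its broadband (pink noise) level matches."""
    return power_sum_db(reference_db) - power_sum_db(device_db)

# Hypothetical octave-band responses in dB relative to the midrange.
flat_device = [0.0] * 10
rolled_off = [-3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, -3.0]

gain = broadband_match_gain_db(flat_device, rolled_off)
print(f"broadband match boosts the rolled-off device by {gain:.2f} dB")
print("a 1 kHz tone match would apply 0.00 dB, leaving the midrange equal")
```

With these made-up numbers the broadband match raises the rolled-off device by about half a dB everywhere, including the midrange, which is the bias described above. Matching on a midrange tone avoids it at the cost of leaving the overall broadband level slightly lower.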
 

maty

Major Contributor
Joined
Dec 12, 2017
Messages
4,600
Likes
3,170
Location
Tarragona (Spain)
@Floyd Toole

These comparisons are mainly based on the frequency response curve, on tonality, but they miss an important factor: how is timbre measured? Or the sound layers in the room, which very few are able to appreciate due to their audio systems and the usually bad placement of the loudspeakers, very close/close to the walls?
 

sweetsounds

Active Member
Joined
Apr 24, 2019
Messages
143
Likes
284
Shortly after completing the first blind listening test, @Inverse_Laplace and I started thinking about all the ways we’d like to improve the rigor and explore other questions. Written summary follows, but here is a video if you prefer that medium:

Some personal thoughts:

Once you get into well-behaving studio monitors, it becomes extremely difficult to tease apart the differences. It takes a lot of listening and tracks that excite small issues in each speaker. A preference score of 4 vs 6 appears to be a significant difference but depending on the nature of the flaws it can be extremely challenging to hear the difference. It is easy to hear that the speakers sound different but picking out the better speaker gets very difficult.

Running a well-controlled experiment is extremely difficult. We had to measure groups on different days and getting the level matching and all the bugs worked out was a challenge. We learned a lot and will apply it to our next set of tests.

First of all, let me congratulate you. This is an awesome piece of work and I can't even imagine the discipline it took to execute.

My first observation is that the distribution is closer than the preference score suggests. So one hypothesis could be that listeners prefer a certain house sound. Yet, when looking at the data, the variation of some listeners between different songs on the same speaker can be large:

[Attached image: 1680073096159.png]


At first, listeners #5 and #6 look more consistent, but actually they simply scored average for all sessions, so they didn't differentiate a lot.
Take a look at their overall variation vs. the within-speaker variation. For a discerning individual, the overall variation should be larger than within-speaker.
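
Since the raw data is posted, this comparison is easy to script. Below is a minimal sketch, assuming one listener's scores are grouped by speaker; the data layout, function name, and the example scores are made up for illustration and are not taken from the actual results.

```python
from statistics import mean, pstdev

def variation_summary(scores_by_speaker):
    """Return (overall std dev, mean within-speaker std dev) for one listener."""
    all_scores = [s for scores in scores_by_speaker.values() for s in scores]
    overall = pstdev(all_scores)
    within = mean(pstdev(scores) for scores in scores_by_speaker.values())
    return overall, within

# Invented examples: one listener who separates the speakers, one who does not.
discerning = {"A": [8, 8, 7], "B": [3, 4, 3]}
undifferentiated = {"A": [5, 6, 5], "B": [5, 5, 6]}

for name, data in (("discerning", discerning), ("undifferentiated", undifferentiated)):
    overall, within = variation_summary(data)
    print(f"{name}: overall {overall:.2f}, within-speaker {within:.2f}")
```

In the first invented case the overall spread is several times the within-speaker spread; in the second the two are the same, which is the pattern described for listeners who score everything around average.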

At first sight, the music isn't a big differentiator either:
[Attached image: 1680074918086.png]


My take-away: Humans have difficulties in consistently differentiating sound of speakers. We have overconfidence in our own listening capabilities.
And why should I fare better than these 6 individuals? Yet, I, too, still believe that I do hear strong differences between speakers. Oh my.
 

MAB

Major Contributor
Joined
Nov 15, 2021
Messages
2,152
Likes
4,847
Location
Portland, OR, USA
This testing is interesting, but has two major flaws that can lead people to draw the wrong conclusion:

1. In a normal room you always install the speaker at the correct distance (for best sound) from the front wall behind it, and also with the correct distance between the two stereo speakers. This is very important; 10 cm can make a real difference. This means that a speaker with little baffle step correction needs to be placed nearer the front wall to achieve a correct frequency response, while a speaker with full baffle step correction should be placed freestanding, away from the wall.

Done the wrong way, you can end up with very misleading listening results. Was every speaker in the test designed with full baffle step correction?


2. Another "trap" when testing loudspeakers in this way is that the listener only scores the preferred, most "likable" sound. The listener has no idea how the recordings really sounded at the recording venue or how the real instruments sounded in the concert hall.
[Attached image: 1680073611852.png]

But seriously, what do you propose here? Have the live band right there on the carousel? Tower of Power, just rotate them in and make sure they're level matched.:) I think you state the obvious problem with all music reproduction, what are you reproducing? The original performance? If so, these are 5" monitors, very unrealistic goal. And, Hunter sounds the way Björk and her team decided it should sound, with lots of airy effects and all those neat things she does. And her live performance is great too, but I'm not sure I would or could get her involved with an experiment at the local hifi club.;)

I guess I don't really agree that any of your views are "flaws" in the OP's approach here. Maybe your ideas are extensions. Although referencing to a live instrument, with any repeatability, and then providing your audience with speakers to compare to, huh???o_O What are you proposing? Unless you can spell out what I seem to lack the imagination to cook up, it ends up a straw man. I think maybe you are talking about two different things. The ability to tell things apart by their technical qualities, and the quest for realism.
 

computer-audiophile

Major Contributor
Joined
Dec 12, 2022
Messages
2,565
Likes
2,881
Location
Germany
The bass response on the JBL definitely helped it.
Yes, the bass range is very pleasant. Since this test is about the complete speaker, I don't think it would be useful to cut the frequencies to judge only the midrange and treble. I don't use my JBL 305p MKII that way either.
the listener only score the prefered, most ”likable” sound
The opposite would be if the sound were unappealing. Can that be the goal? I don't think so. In my opinion, JBL does many things right. Training the ear on the original sound is of course enormously important with classical instruments, so that you can judge it at all.

By the way: when I bought my Neumann KH120 ten years ago, I could listen to many speakers of this class for hours alone in a studio and compare them. At that time I found them better (more right) than Genelec, which of course are also very good. In the studios I know and visit, also in the sound labs of music colleges etc. I find many Genelec and Neumann.
 

Palladium

Addicted to Fun and Learning
Joined
Aug 4, 2017
Messages
666
Likes
816
I got my pair of 305 Mk1 back in 2014. One speaker arrived at my house a day before the other.

And I was like "Wow, this sounds amazing even in mono and hooked to PC onboard!"
 

NeoZs99

Member
Joined
Jun 14, 2021
Messages
80
Likes
109
I got my pair of 305 Mk1 back in 2014. One speaker arrived at my house a day before the other.

And I was like "Wow, this sounds amazing even in mono and hooked to PC onboard!"
I still have mine. While I no longer use them, I'm still fond of them. My first proper bookshelf speakers.
 

computer-audiophile

Major Contributor
Joined
Dec 12, 2022
Messages
2,565
Likes
2,881
Location
Germany
I got my pair of 305 Mk1 back in 2014. One speaker arrived at my house a day before the other.

And I was like "Wow, this sounds amazing even in mono and hooked to PC onboard!"
And the MKIIs are even better. :)

It's nice that you don't have to spend so much money on usable hi-fi equipment nowadays. Even on DACs, for example, as you can see on ASR. (Of course, I have spent a lot in the past.)
 

MyCuriosity

Member
Joined
Jan 25, 2023
Messages
85
Likes
41
Read this post about that.
I get the argument of stereo vs mono, but my hobby is listening to music, not measuring speakers. As a listener, I think the mono measurements have merit, but not enough to evaluate the overall performance of a set of speakers in a 2-channel stereo arrangement.
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
3,469
Likes
2,466
Location
Sweden
Just plain brilliant, you guys! Of course I am flattered to be credited with being a motivating factor, but mostly I am greatly encouraged that there are people willing to engage what is not a simple task - and succeed. Compared to the 99.9% of "listening tests" and "reviews" that are done without even the most basic controls in place this is a masterpiece. I see in the discussion some of the anticipated complaints that it was not done in stereo - I address this in my papers and books, but will include a different version below. Because the authors started with accurate measurements - something truly rare in the history of audio evaluations - they saw that one major factor in preferences was bass extension. This was revealed in my very early evaluations, published in JAES way back in 1986. More recently Sean Olive's correlations revealed that bass alone accounts for about 30% of overall sound quality ratings. Woofer design determines bass extension, and room acoustics dominate bass sound quality, so when evaluating loudspeakers with limited bandwidth, which varies, bass alone is more influential than if all systems being compared were "full bandwidth", had subwoofers, or as has been suggested, high-pass filtered to limit all to the same bass extension. Decisions, decisions . . . Anyone looking for great satisfaction from any of these small speakers would be advised to add subwoofer(s), preferably with bass management (which includes high-pass filtering allowing them to play louder).

Beyond bass, the Neumann clearly shows where its higher price went - a significantly smooth frequency response. The inexpensive JBL is compromised, but the residual irregularities and resonances are at or below the threshold of detection. Broadband trends are quite often quickly adapted to as they are not perceived as timbral colorations. This is the essence of good engineering, balancing the cost-determining factors to achieve maximum listener satisfaction at any price level. Pay more, get more, but only if the bass extension is sufficient. Properly integrated subwoofers are the great equalizers among the vast majority of loudspeakers. The next and final step is to tame the room resonances - a totally different topic.

Here is a little dissertation on stereo vs. mono. Sorry if it repeats things I have posted elsewhere.

Adding levels of complexity to the decision process is the awkward fact that stereo - two channel record/reproduction through loudspeakers - is fundamentally flawed. The default format for music, stereo, is not capable of delivering the sound from loudspeakers to our ears without significantly modifying the recorded waveforms. Amplitude and phase responses (the impulse response) are corrupted for the sounds generating all sound images appearing between the loudspeakers on the soundstage. Notions of “purity" in stereo listening are fanciful.

In "live" experiences there is only one direct sound arriving at each ear from a single sound source. In stereo all sound images between the loudspeakers are phantoms, created from identical sounds radiated by each loudspeaker – double-mono - with inter-channel amplitude or time differences to provide location cues for image position. Each ear receives two versions of the same sound separated by a delay and modified by head diffraction. The only exceptions are the hard-panned sounds emerging from the left and right of the soundstage; these are monophonic components and are timbrally and spatially distinctive. So stereo listening is a hybrid experience, partly mono but mostly double-mono with the inherent corruptions.

The panned-image “soundstage” is the dominant factor, and whether the “panning” is done with the common interchannel amplitude-difference pan pots (so-called multichannel mono), or by amplitude and/or time differences generated by the microphone arrays (the so-called ‘purist’ approach) the result is the same. Two time-separated sounds arriving in each ear generate acoustical interference, resulting in an audible dip around 2 kHz (enough to degrade speech intelligibility for the center image - usually the featured artist). Any notions of pristine waveforms, impulse response, amplitude and phase response in the direct sounds arriving at the ears can only exist for hard-panned mono left and right images. The inherent sound quality of the loudspeakers has been degraded for all “soundstage” images including the featured artist. Timbral perfection has been rendered impossible. But, is it good enough? Obviously, yes, because we have derived enormous pleasure from stereo reproduction for decades.

In addition, all direct sounds arrive from about +/- 30 deg. which provides HRTF characterization for the wrong incident angle - generating an unavoidable timbral error as well as possible localization confusion for familiar sounds. Put it all together and it is clear that the human brain has subconsciously adapted to accept multiple acoustic and psychoacoustic errors that exist only because of stereo reproduction. “Perfect” loudspeakers and electronics cannot remove them.

As has been well publicized, listeners are far more critical in their assessments of sound quality when listening to a single loudspeaker. This fact has generated a fairly constant stream of flak from those thinking that stereo is a more rigorous test and that listening in mono was all but irrelevant. However, decades of double-blind tests show that listeners hear problems in loudspeakers more readily when listening in mono. The superior sound quality was less clearly reflected in scores in stereo comparison tests, and even less in multichannel evaluations. Monophonic components exist in stereo and multichannel programs, so designing loudspeakers to meet the most stringent (mono) test was considered worthwhile. But, the question remains; why? Were the audible defects more clearly revealed because the spatial complexity and inherent amplitude/phase (linear) distortions of stereo were absent? Is this why headphone listening has such an almost magical clarity? One sound to each ear, not two. It does seem reasonable.

This is the background within which the question is being asked. That there are people who think stereo is somehow a naturally superior form of reproduction is a testament to human tolerance and adaptability.

To this analysis of direct sound must be added the contributions of reflected sounds, and my instincts tell me that there may well be advantages to some added confusion - a sense of ambiguous spaciousness. Anechoic chamber stereo is not especially flattering.

Genuine envelopment – the impression of being in a different, larger, room, requires long-delayed sounds arriving from further to the side than either left or right stereo loudspeaker locations. This is what multichannel systems were created for by the film industry to persuade audiences that they were in the acoustic spaces shown on the screen. In the decades of double-blind listening tests in normally reflective rooms there is evidence of a generalized preference for loudspeakers with well-behaved off-axis dispersion. It seems that some reflected sound is desirable, but we lack definitive guidance about exactly what “well-behaved” means. Gross irregularities in frequency-dependent directivity should be avoided, but what are the tolerances? In fact, what is the preferred directivity? How wide should the dispersion be?

Multichannel does not completely solve the problem because there are still phantom images across the front soundstage, and elsewhere, but a real center loudspeaker is a start. That delivers "pristine" sound from three locations: center and hard-panned left and right. But the fact that we have adapted to the corrupted sound associated with phantom images remains a confounding factor. I notice that many programs deliver the "center" sound from all three loudspeakers across the front, and some recordings ignore the center speaker. Adaptability is clearly a required feature of human perception.
I could add that the blind challenges made on ASR with binaural recordings of stereo setups showed quite clearly which speakers were preferred. The first one had a clear winner in the top-line Revel speaker, and the other one in the Grimm speaker. Both have very good on- and off-axis behaviour.

So stereo may well be used, even via binaural recordings, which I find quite an interesting approach.
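
As an aside on the "audible dip around 2 kHz" in the quoted text: a back-of-the-envelope check is possible. This is only a sketch under assumptions that are mine, not from the post: a spherical head of 8.75 cm radius, Woodworth's approximation for the interaural delay, and two equal-level arrivals at the ear. In reality the head shadows the far-speaker arrival, so the dip is a partial cancellation rather than a full null.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly used average head radius (assumed)

def interaural_delay(angle_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth's spherical-head approximation of the interaural time difference."""
    theta = math.radians(angle_deg)
    return (radius / c) * (theta + math.sin(theta))

def first_null_hz(delay_s):
    """First cancellation frequency of two equal copies separated by delay_s."""
    return 1.0 / (2.0 * delay_s)

# Speakers at +/- 30 degrees, as in a standard stereo triangle.
delay = interaural_delay(30.0)
print(f"delay: {delay * 1e3:.2f} ms, first null: {first_null_hz(delay):.0f} Hz")
```

Under these assumptions the delay comes out at about a quarter of a millisecond and the first cancellation at roughly 1.9 kHz, consistent with the dip described for phantom center images.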
 

tktran303

Addicted to Fun and Learning
Forum Donor
Joined
Mar 27, 2019
Messages
685
Likes
1,199
The preference score was an academic exercise to test an hypothesis - namely that there was a correlation between spinorama data and double-blind listener evaluations. It served its purpose, but in real life I know of nobody in Harman who uses it - including me - we use two eyes and a brain to examine the family of curves and arrive at conclusions that embrace trade-offs. For example, simple bass extension - a limited bass extension can result in a lower calculated score, but in the real world a properly integrated subwoofer system can elevate it to an even higher level. A single significant resonance or a spectral tilt can be equalized to attenuate its audibility, but many resonances would present a serious problem, yet both could yield similar scores. And so on. Look at the curves, learn to interpret them, forget about the all-embracing "score".

Thank you for the clarification Floyd.

It’s wonderful to have your perspective.

The preference score has taken a whole new life of its own; with some people using it to guide their purchase decisions. I was confident it was never designed to do that and was always leery of claims that e.g. A 4” 2-way was purportedly as good as a much bigger speaker... simply because it had a very similar preference score.

My own designs and experiences have not shown that to be true. There’s still some unanswered science that I find fascinating e.g. what’s the ideal vertical directivity (if any), and what measurements can show why <insert favourite> tweeter may be better (or preferred). And do cone materials make a statistically significant difference (and why). And if distortion doesn’t make a difference, can we just get rid of the shorting rings?

Finally, do you have any inside info on when the Salon 3 will be released? I tried to purchase a Salon 2 last year but there was not a single one available in Australia. Even the AU distributor did not have any information.

Best regards,
Thanh
 

computer-audiophile

Major Contributor
Joined
Dec 12, 2022
Messages
2,565
Likes
2,881
Location
Germany
I get the argument of stereo vs mono, but my hobby is listening to music, not measuring speakers. As a listener, I think the mono measurements have merit, but not enough to evaluate the overall performance of a set of speakers in a 2-channel stereo arrangement.
I can assure you that they sound great in stereo as well. :)

[Attached image: 7tage1.jpg]
 