Blind Listening Test 2: Neumann KH 80 vs JBL 305p MkII vs Edifier R1280T vs RCF Arya Pro5

Newman · Mar 30, 2023

sejarzo said:
My guess is that many folks who don't listen to classical music, both small and large ensembles and in various venues, score speakers simply on "what sounds good to me" based on how they believe a multi-miked rock/pop recording should sound. I know that's what I did when I bought my first "hifi" speakers (BIC Venturi!) in 1976.

As my interests broadened, and I attended more classical performances, I almost quit using anything but classical and some small ensemble acoustic jazz to judge speakers. I know that if the timbre of instruments and overall balance isn't right, I will be unhappy with the speaker in the long run. Unfortunately, that means when I listen to rock/pop, I often wonder what the artist and producer were thinking because the tilt can vary between ridiculously dark or bright, and sometimes even on the same album. Yes, the circle of confusion is real and very big.

However, it is interesting that Dr Toole's research shows that speaker preferences don't really vary with the type of music used, or the type of music that the listener is used to. If they did vary, then his book would have said "this is what you need to look for in a speaker for classical music, and this (different criteria) is what you need for pop/rock". Instead, his book actually mocks the whole idea.

The real issue is judging speakers blind and level matched. Which practically no-one does.

Omid · Mar 30, 2023

MediumRare said:
I understand you screened out some participants after reviewing their scores, which is good.

This is just a general comment on experiments, not on this particular test (the data was excluded due to improper level matching), but when it comes to this comment above, it’s not strictly correct:
If you want a valid experiment, data cannot be screened out once a study has started. All answers need to be tabulated. Otherwise you can inadvertently start picking and choosing to get the answers you ‘re looking for.

This type of bias is why the person conducting the experiment is also blinded to the results (in a double blind test) until the statistics are done.

However, it is fair to have exclusion criteria that have been set out before the test starts to allow for the exclusion of a tester if their results do not meet certain standards.

I know it’s being nitpicky but I thought I’d mention it if anyone is interested in these technicalities.

Sernyl · Mar 30, 2023

Very interesting test, but it lacks some classical music pieces ; orchestral, and chamber music with winds (e.g. Schubert Octet).

Thomas_A · Mar 30, 2023

MatthewS said:
I'm concerned that this method is actually significantly more work and it isn't clear what question we are trying to answer. I'd want to start with the hypothesis we are trying to prove, then design a tests that support or disprove that. To rank speakers with this method would require a massive amount of permutations.

Addtiinally, focusing only on one track makes it very challenging. Not every track excites all parts of the frequency spectrum and from our experience (and I think prior research) you need to run through a few different bits of material to start to zero in on a ranking. Unless you have absolutely terrible speakers in the mix, you're going to see some speakers perform differently on different material.

We also have found we tend to get maybe an hour of someone's time (not everyone is as nuts as all of us). I'd be concerned this method would take more than an hour.

Why do you think this is more efficient?

We actually end up getting 4-5 "tests" per speaker out of each listener because we use multiple musical selections. If we got 20 people, that would be 100 data points on each speaker.

One can always do pairwise and take the winner against next contender. Will be four sessions for five speakers (or nx4 sessions if you split it with n different music). But then there is no real need of a random carousel, if used it would need special programming to allow for keeping the winner after each round. Not perfect and you need many listeners get statistics. You can also test the same person several times to see repeatability

computer-audiophile · Mar 30, 2023

Thomas_A said:
One can always do pairwise and take the winner against next contender. Will be four sessions for five speakers (or nx4 sessions if you split it with n different music).

This is the kind of technically simpler blind-hearing test method that I have already had the pleasure of taking. Such listening sessions have also helped me a lot already.
No great computer input was necessary for this either. But it's typical that young people do it that way today. (No criticism!)

Thomas_A · Mar 30, 2023

computer-audiophile said:
This is the kind of technically simpler blind-hearing test method that I have already had the pleasure of taking. Such listening sessions have also helped me a lot already.
No great computer input was necessary for this either. But it's typical that young people do it that way today. (No criticism!)

One drawback is that you only get a binary result and no ”scores” for the other speakers for a session. So it needs many tests and/or test subjects. But it can be a rather fast session time actually.

Sernyl · Mar 30, 2023

IIRC, Toole included "scores" as well.
They should be maintained, at least for the fun.

Newman · Mar 30, 2023

People overstate the limitations of the single-number score, because they imagine others are overstating its applications.

MediumRare · Mar 30, 2023

Omid said:
This is just a general comment on experiments, not on this particular test (the data was excluded due to improper level matching), but when it comes to this comment above, it’s not strictly correct:
If you want a valid experiment, data cannot be screened out once a study has started. All answers need to be tabulated. Otherwise you can inadvertently start picking and choosing to get the answers you ‘re looking for.

This type of bias is why the person conducting the experiment is also blinded to the results (in a double blind test) until the statistics are done.

However, it is fair to have exclusion criteria that have been set out before the test starts to allow for the exclusion of a tester if their results do not meet certain standards.

I know it’s being nitpicky but I thought I’d mention it if anyone is interested in these technicalities.

I agree with you, but consider this: you’re testing lipstick color preference and a test participant (candidate) is color blind. Do you continue with the expense of the test and keep their scores?

MyCuriosity · Mar 30, 2023

computer-audiophile said:
Understood!

But this test is about listening.

Well, then should be measured as intended. Stereo. This test tells how the single speaker disperses sound. Not how two speakers will work together in a stereo configuration. Anyway. All good and useful

Tom C · Mar 30, 2023

MyCuriosity said:
Well, then should be measured as intended. Stereo. This test tells how the single speaker disperses sound. Not how two speakers will work together in a stereo configuration. Anyway. All good and useful

The tests have been done, by educated, paid, professional, experienced researchers who do audio research for a living, and publish their results and compare to other researchers who publish their results. The conclusion is, that speakers that perform well in mono tests also perform well in stereo tests. And that when tests are done in stereo, the results are not different, and do not change, when compared to results obtained when testing in mono. You don’t have to wonder. The good news is that stereo testing has been considered, empirically evaluauted, and found to not give any advantage over mono testing.

sejarzo · Mar 30, 2023

Newman said:
However, it is interesting that Dr Toole's research shows that speaker preferences don't really vary with the type of music used, or the type of music that the listener is used to. If they did vary, then his book would have said "this is what you need to look for in a speaker for classical music, and this (different criteria) is what you need for pop/rock". Instead, his book actually mocks the whole idea.

The real issue is judging speakers blind and level matched. Which practically no-one does.

I'm not surprised. Other than overly strident strings in some recordings, the balance of classical music is more consistent, thus a neutral speaker would be preferable. Likewise, a more neutral speaker would best hit the midpoint for people who've become accustomed to the strangely broad mix of bass-heavy to treble-heavy mixes in pop/rock. I tend to find myself cutting treble more than cutting bass on those, but it does go both ways.

MyCuriosity · Mar 30, 2023

Tom C said:
The tests have been done, by educated, paid, professional, experienced researchers who do audio research for a living, and publish their results and compare to other researchers who publish their results. The conclusion is, that speakers that perform well in mono tests also perform well in stereo tests. And that when tests are done in stereo, the results are not different, and do not change, when compared to results obtained when testing in mono. You don’t have to wonder. The good news is that stereo testing has been considered, empirically evaluauted, and found to not give any advantage over mono testing.

Ok, you are right on that front. After all even the most respected reviewer (for me) who does measurements, uses similar methodology with a single speaker. I was just expecting something different that will provide information about real life characteristics when we use a stereo setup.

Omid · Mar 30, 2023

MediumRare said:
I agree with you, but consider this: you’re testing lipstick color preference and a test participant (candidate) is color blind. Do you continue with the expense of the test and keep their scores?

Yes, that totally makes sense it would reduce time and effort to exclude some participants.

I’m just saying that ideally you would state your exclusions [for example colour blind people] before you start the test. And then your conclusion will be that lipstick X is preferred by non-colorblind people.

If you want your testers to represent all comers, then you wouldn’t exclude anyone, including less discriminating people.

Not a big deal either way, we’re doing this just for fun…

ROOSKIE · Mar 30, 2023

Tom C said:
The tests have been done, by educated, paid, professional, experienced researchers who do audio research for a living, and publish their results and compare to other researchers who publish their results. The conclusion is, that speakers that perform well in mono tests also perform well in stereo tests. And that when tests are done in stereo, the results are not different, and do not change, when compared to results obtained when testing in mono. You don’t have to wonder. The good news is that stereo testing has been considered, empirically evaluauted, and found to not give any advantage over mono testing.

Howdy, these are some pretty large definitive claims, prolly better back them up with links rather than nods to professionalism.

I surely understand why Mono was chosen - or at least why I would use it. Relative Simplicity. Especially when speakers are to be used in any number of multichannel from 2-20.
Listening in Mono and conducting a double blind controlled testing on a large number of subjects in order to study a wide variety of sound reproduction traits is already a staggering task. Going beyond mono complicates this in unbelievable ways. (think 3 body problem)

Mono is a good way to ensure that the study is mangeable at best. I can also see how generally speaking a speaker that sounds great alone ought to sound great in stereo or multichannel and I beleive Toole that he did test this enough to feel comfortable sticking with mono.
That said it seems there has been only a little published on the effects of stereo reproduction, nothing like the large mono studies we refer to here at ASR.

I have read Toole's book and he only touches on the subject now and then.
The results of the limited tests I have seen published did change in stereo vs mono, it was not a linear increase. In my view your claim that they did not is false. However Toole is clear that the winner still won but typically by less. Section 3.4
I think this is important as it pertains to how the completed system sounds which as a consumer is what I will be actually using. Some speakers will be much less enjoyable in mono vs stereo and others will actaully not improve as much going from mono to stereo. Interesting stuff.

Why they changed did not seem to me to be able at the time to be deeply investigated (& understandbly so)

Additionally the speakers were not always able to placed in optimized positions based on the individual speaker nor were all listeners always in an optimized seating position. Dispersion alone is a huge factor if the speaker's position is not optimized or even if it is if the listener is off axis and or the dispersion is narrow vs wide. Playback SPL is a factor as the tonality is not perceived as increasing in a linear fashion as the volume goes up or down. Which can really affect bass and as bass is some 30ish% of perceived SQ then that is big.
Age of listeners/hearing ability such as people who are actually confused by the stereo soundstage in the same way they would be in a crowded restarount.

This and more. Plus as you implied money is involved. At some point this has to make someone money in our society, it wasn't charity work. There was a budget and constaints and limited time and finacial goals involved. All understandable aspects of a complex study(multiple studies), and it is conceivable a lot could still be addressed in further investigation & certainly traits that were or were not issues in this study may present differently in a finished, known install.

Newman said:
However, it is interesting that Dr Toole's research shows that speaker preferences don't really vary with the type of music used, or the type of music that the listener is used to. If they did vary, then his book would have said "this is what you need to look for in a speaker for classical music, and this (different criteria) is what you need for pop/rock". Instead, his book actually mocks the whole idea.

The real issue is judging speakers blind and level matched. Which practically no-one does.

Section 3.5.1.7 is a basic discussion on program material.

In a nutshell it does matter what program material is chosen for conducting accurate, repeatable testing.

and @sejarzo figure 3.15 & 3.16 in Toole book may be helpful as well as the summary that," complex productions with broadband, relatively constant spectra aid listeners in finding problems". He also says there are no hard and fast rules.

He mentions in section 3.4 that imaging(and thus listner preference) in particular is very different from say classical(lots of recorded ambiance) on a narrow or wide dispersion speaker. Or Jazz with lots of close miking and hard panning(highly controlled manufactured soundstage stage without real ambiance). This difference is true in mono and stereo.

At any rate I highly doubt that some specific music track or whole album(track not, genre) doesn't present better on some speakers vs others.
The circle of confusion almost requires it and this would require a decision to be made about testing that may or may not satisfy all.

That said,(sighted listening) within a reasonable range of variation my favorite speakers generally sound excellent on nearly everything I like and the worst ones I have heard tend to sound fairly bad often with a few tracks that seem to hit just right. (and a big lot of tracks where it is not really easy to form a solid preference)

ROOSKIE · Mar 30, 2023

Omid said:
Yes, that totally makes sense it would reduce time and effort to exclude some participants.

I’m just saying that ideally you would state your exclusions [for example colour blind people] before you start the test. And then your conclusion will be that lipstick X is preferred by non-colorblind people.

If you want your testers to represent all comers, then you wouldn’t exclude anyone, including less discriminating people.

Not a big deal either way, we’re doing this just for fun…

MediumRare said:
I agree with you, but consider this: you’re testing lipstick color preference and a test participant (candidate) is color blind. Do you continue with the expense of the test and keep their scores?

How would someone allow a color blind participant to get involved in a test of color? Would you want any info from a 'test' that allowed that to become an issue during the testing? Totally agree with @Omid , discarding results has to be carefully done and with established protocol that is followed. Seems OP did it for good reasons by the way.

No offense intended with the following statement.
This reminds me that I personally couldn't care less what a large section of the population thinks about sound quality.
I am not trying to sell 'X" product to the most consumers.

Many folks around now are relishing 'analog' sound via bluetooth turntables with no idea of the catch 22 that presents. All good fun of course, though it begs me to ask who is a good subject?

Upon showing off my hifi gear, among the impressed, I have had people tell something is wrong as different sounds are coming out of each speaker.
Multiple friends have on occasion thought the sound was distorted and it was determined they had never heard the nuances and details present in albums they were familiar with. To them it came across not as higher fidelity but rather as an issue. It also turns out some folks can not stand realistic bass or realistic treble levels.
These are just issues of familiarity.
Including a fun big one, I realized many casual listeners - even if big musicfiles - were not very familiar with imaging and soundstaging. When it didn't confuse them, it often blew them away a bit to much. I feel they would need time to acclimate to the nature of a high performace rig before I give a shyte what they think outside of hoping to offer them a cool experience.

What about hearing issues, plus some people are just poor listeners both in terms of music/content and just a good ol' conversation. Also some people just don't enjoy music as much as others. Same for hifi sound itself as some folks just don't perceive it as all that cool. Some folks are very insincere and others will surely be as earnest and as participatory as they possibly can.

Folks should be familiar with the room as well and given time to acclimate there.

Who knows what else, but who has a friend/aquaintance group big enough?

Anyway OP I estimate you are likley just happy to have people involved, beyond that what are your general interests here in terms of what population of people are you interested in gathering?

Floyd Toole · Mar 30, 2023

Floyd Toole said:
What would I change in future tests of this kind? Probably I would not sit with a hard reflective wall a foot behind my head.

I understand the pressures that the testers were under - they were not paid to do this, without elaborate custom designed facilities such as I and my colleagues had over the years - the generosity of Canadian taxpayers (while I was at the National Research Council) for 27 years and, the multimillion dollar investment that Harman International made for 17+ years - allowing the results to be published for our competitors to share. How many companies would do this? Altruism? Some, believe it or not, out of respect for the scientific tradition of sharing results, but also a realization that competently done research reflects well on the people involved and the brands associated with the sponsor. Sadly, a few people think it was just marketing. Sigh . . . The research group at Harman designed no products, it generated knowledge. Now, in my retirement and upon reflection, it is gratifying to see the NRC/Harman ideas and measurement methodology widely used and even "standardized" in the industry - all decided by Harman competitors. Would this have happened if we were not able to publish? Rising waters lift all boats.

The effort reported here reflects well on Matthew et al - it is through this kind of sincere and serious work that progress is made. Again, congratulations. I hope you can find the energy and time to do more, and that others may join in. As I said, and as was found, bass extension alone is a significant deciding factor among small loudspeakers, so having accurate spinorama data on the products is essential to a balanced interpretation of the results. Properly-integrated subwoofers are the "great equalizer" among smaller loudspeakers.

This comment was a suggestion for future improvements, but also a reminder to all audio enthusiasts that not only the locations of the loudspeakers matter. I hear and read frequent discussions and debates about loudspeaker location, but little attention to where the ears end up. Both matter, obviously. Strong reflections from behind the listener affect both imaging and sound quality; although, frankly, I find it remarkable the extent to which one can adapt to such a situation. Several times, though, I have placed a pillow or cushion behind a listener's head and heard exclamations of how much the soundstage improved. One needs at least 4 inches of fluff for best effect.

An additional thought: there are many different experimental methods, and approaches to statistical analysis. As I have mentioned a few times, in achieving trustworthy subjective ratings one need comparisons. The "take it home and listen to it" approach of many reviewers is seriously flawed. Paired, A v.s B, comparisons are classic, and many randomized pairs from a population of loudspeakers can produce useful data, but it is definitely labor and time intensive. By chance, I stumbled into my first evaluation using multiple comparisons - three or four at a time. Multiple comparisons are more efficient, and, importantly, they permit listeners to adapt to the listening space. Loudspeaker position remains a variable to be randomized in repeated presentations, which is still labor intensive. However, positional substitution, as in using a turntable or the elaborate "shuffler" at Harman, renders the room effect even more constant, and makes the test even more efficient. But then one is talking about a "facility" of some kind, even if it is the clever little turntable used here. Few people are willing or able to go this far. The result is less reliable data, and endless discussions.

One can be rightly concerned about positioning with respect to room boundaries. It matters, to be sure, but it appears to be one of the factors that humans adapt to in dealing with room acoustics, which we do all the time in real life. See Section 7.6.2 in the 3rd edition for perspective on room effect in sound quality evaluations. In a final setup, these adjacent boundary issues can be significantly alleviated by both positional adjustments and equalization - all discussed in Chapter 9 of the 3rd edition of my book. In the real world there are usually limitations on available loudspeaker positions, so EQ is necessary - which is needed to address room resonances in any event. Equalization at low frequencies is almost unavoidable.

As always, it is best to see an audiogram before putting faith in anyone's opinion about sound quality.

Enough for now - good luck, and more power to the elbow!

sejarzo · Mar 30, 2023

ROOSKIE said:
Multiple friends have on occasion thought the sound was distorted and it was determined they had never heard the nuances and details present in albums they were familiar with. To them it came across not as higher fidelity but rather as an issue. It also turns out some folks can not stand realistic bass or realistic treble levels.

Not surprising at all!

I was working a contract job far from home in 2006, and a coworker there was a huge classical fan. He had a private office and was playing classical CDs all day on his small Altec Lansing computer speakers. He talked about how he grew up playing the clarinet, living in the Philadephia area where attended live concerts all the time, both orchestra and small ensemble. He frequently told me how much he missed that.

One day I brought in my own laptop, DAC, headamp, and HD600s so he could hear what a highly rated set of cans sounded like. It took no more than 10 seconds for him to exclaim "Wow, you hear EVERYTHING with these!" but after listening to a variety of tracks over 20 minutes or so, he said "You might find this odd, but I am not sure that I like hearing everything."

I know that some folks just don't care for headphones, and that acclimating to the different perspective is not immediate, but I wonder how he would have reacted to speakers with a wide and flat response.

goat76 · Mar 30, 2023

Floyd Toole said:
The panned-image “soundstage” is the dominant factor, and whether the “panning” is done with the common interchannel amplitude-difference pan pots (so-called multichannel mono), or by amplitude and/or time differences generated by the microphone arrays (the so-called ‘purist’ approach) the result is the same. Two time-separated sounds arriving in each ear generate acoustical interference, resulting in an audible dip around 2 kHz (enough to degrade speech intelligibility for the center image - usually the featured artist). Any notions of pristine waveforms, impulse response, amplitude and phase response in the direct sounds arriving at the ears can only exist for hard-panned mono left and right images. The inherent sound quality of the loudspeakers has been degraded for all “soundstage” images including the featured artist. Timbral perfection has been rendered impossible. But, is it good enough? Obviously, yes, because we have derived enormous pleasure from stereo reproduction for decades.

In addition, all direct sounds arrive from about +/- 30 deg. which provides HRTF characterization for the wrong incident angle - generating an unavoidable timbral error as well as possible localization confusion for familiar sounds. Put it all together and it is clear that the human brain has subconsciously adapted to accept multiple acoustic and psychoacoustic errors that exist only because of stereo reproduction. “Perfect” loudspeakers and electronics cannot remove them.

The dip around 2 kHz is something that I, @Thomas_A, and many others have discussed a lot at the Swedish forum Faktiskt, and even if it's a little step outside the topic of the thread, I like to take the opportunity to ask you one question about it if you don't mind.

With modern multi-mono recordings nowadays, it's common that every single sound element in the mix is equalized individually until they sound exactly the way the mixing engineer wants them to sound while monitoring the mix on a pair of loudspeakers set up in a regular stereo configuration (which of course should reveal the problem in the same way as any other stereo setup should be able to do). Isn't it then a pretty big chance that the dip at 2 kHz is (almost automatically) taken care of if this flaw is obvious and sounds audible wrong?

I mean, the mixing engineer doesn't even need to know why this particular stereo flaw occurs, if it's an obvious audible problem he should clearly hear it and try to fix it among all the other things he already equalizes in the mix. I do understand why this stereo flaw can never be addressed in the speakers themself because that would also affect the hard-panned sound, not just the phantom sounds in the mix. But do you see any reason why this particular dip can't be addressed in the mix itself, and just for the phantom sounds that are affected by it?

Cote Dazur · Mar 30, 2023

ROOSKIE said:
Including a fun big one, I realized many casual listeners - even if big musicfiles - were not very familiar with imaging and soundstaging. When it didn't confuse them, it often blew them away a bit to much. I feel they would need time to acclimate to the nature of a high performance rig before I give a shyte what they think outside of hoping to offer them a cool experience.

So true, what we have access to, in a dedicated room with a well set up stereo system, is quite glorious and rich. Too rich for many, for non initiated and even here at ASR, when looking at some desktop system or speakers obviously not located where they can perform anything spectacular, makes me wonder how our supposed to be common experience is any relevant to any of us.

ROOSKIE said:
Folks should be familiar with the room as well and given time to acclimate there.

When knowing how strongly the room affect the sound that we hear, even if some think we acclimate (over time) that should be a must. Depending on the room, its effect might be so strong, that all tiny measured difference may very well be totally obliterated.

Blind Listening Test 2: Neumann KH 80 vs JBL 305p MkII vs Edifier R1280T vs RCF Arya Pro5

Major Contributor

Member

Member

Major Contributor

Major Contributor

Major Contributor

Member

Major Contributor

Major Contributor

Member

Major Contributor

Addicted to Fun and Learning

Member

Member

Major Contributor

Major Contributor

Senior Member

Addicted to Fun and Learning

Major Contributor

Addicted to Fun and Learning

Similar threads