Because I plan to do a few more blind listening comparisons, I wanted to ask here for advice on methodology to maximize the generalizability of such results. You can read the results of my first blind test (where I used the methodology described below) here, where I recently compared the KEF R3 vs. the Ascend Sierra 2EX.
However, I would greatly appreciate as much feedback and as many suggestions as possible for improving the methodology! I want to make sure as much as possible is covered. And, if this thread goes well enough, maybe we can take the combined ideas here and turn them into a loose, informal 'standard' for in-home blind speaker comparisons.
For any blind listening test, the results are only as good as the control tests: the negative and positive controls.
The Negative Control asks what fraction of listeners report "I heard no difference" when there really was no difference, in a test of speaker A vs. speaker A (or B vs. B). Stated conversely, the negative control measures how many false positive responses listeners make. Don't assume there will be no false positives when you can measure how many there are. A blind listening test is a test of human perception, and it isn't an unfair trick to find out how many false positive answers are given. Negative controls are very easy to run, since no extra technical effort is needed, but they do add to the number of blind trials listeners must sit through.
Each listener must sit through several such trials to determine what percentage of his answers are false positives. For example, if a listener claimed to hear differences between A and B on 75% of his trials, but his false positive rate was 45%, the corrected response rate would be 75% – 45% = 30%. (These are simply spitballed numbers, not the results of real tests.)
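The correction above is just a subtraction, but a few lines of Python make the bookkeeping concrete (a minimal sketch; the function name and the clamp at zero are my own additions):

```python
def corrected_response_rate(reported_rate, false_positive_rate):
    """Subtract a listener's false-positive rate (measured on A-vs-A
    or B-vs-B trials) from the rate at which he reported differences
    on A-vs-B trials, clamping at zero so a very trigger-happy
    listener doesn't produce a negative rate."""
    return max(0.0, reported_rate - false_positive_rate)

# Using the spitballed numbers from the text:
rate = corrected_response_rate(0.75, 0.45)
print(f"{rate:.0%}")  # 30%
```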
Similarly, Positive Control tests are needed. These are designed to estimate how many negative responses are false negatives. A positive control asks what fraction of listeners report "I did hear a difference" when one really existed, or in other words, how many false negatives there were.
Good positive control tests are harder to create than negative controls. Imagine a digital copy of well-recorded music. On top of it, add varying amounts of digital white or pink noise, so the listener hears the music with no added noise (0%) as well as a series of increasing noise levels (such as 2.5%, 5%, 7.5%, and 10%). What fraction of listeners hear the added noise at each of those levels? Without some testing, I'm not sure what would make a useful positive control, but I hope this illustrates what I mean.
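A stimulus set like that could be generated along these lines (a hypothetical sketch in pure Python; I'm defining "x% noise" as Gaussian white noise scaled to x% of the signal's RMS level, since the text leaves the exact definition open):

```python
import math
import random

def add_white_noise(samples, noise_fraction, seed=0):
    """Return a copy of the signal with Gaussian white noise mixed in,
    scaled to noise_fraction of the signal's RMS level (one plausible
    definition of 'x% added noise')."""
    rng = random.Random(seed)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return [s + rng.gauss(0.0, noise_fraction * rms) for s in samples]

# One second of a 1 kHz test tone at an 8 kHz sample rate,
# rendered at each of the control levels from the text:
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(8000)]
stimuli = {pct: add_white_noise(tone, pct / 100)
           for pct in (0, 2.5, 5, 7.5, 10)}
```

In a real test you'd apply the same mixing to the music tracks themselves and export each level as an audio file, but the math is the same.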
The positive control would reveal what subtle differences listeners actually can hear. It would also serve as an internal measure of how good the entire measurement system is, including the electronic gear, speakers, room acoustics, and variability among different listeners, as well as variable responses from a single listener over time due to burn-out, fatigue, inattention, etc. If people repeat the positive control test at different times using different gear or listeners, it can work as an internal standard that allows more meaningful comparisons among those different tests.
If suitable positive control tests are found, listening results can be judged further. How many listeners hear a difference on the positive control, and how many fail to, is an important measure of the effectiveness of the listening-test setup and of the variability among individual listeners. The fraction of listeners who meet these conditions might be taken as a measure of validity for the whole listening test. Ideally, every listener would hear a difference in the positive control and none would hear a difference in the A–A or B–B tests. However, it is possible to deviate from that ideal and still draw useful conclusions, as long as suitable controls are included for each listener to determine his false negative and false positive rates.