
Binaural blind comparison test of 4 loudspeakers

Which loudspeaker sound do you personally prefer?

  • Loudspeaker A

    Votes: 7 13.5%
  • Loudspeaker B

    Votes: 42 80.8%
  • Loudspeaker C

    Votes: 0 0.0%
  • Loudspeaker D

    Votes: 7 13.5%

  • Total voters
    52
  • Poll closed.

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,559
Likes
1,703
Location
California
That's what I thought coming into this thread, but it appears that somehow we not only have an ability to "hear through the room", but also to "hear through the recording equipment and room".

The data is simply too clear. 80% picked the speaker that measured the best (the Revel), 0% of people picked the speaker that measured the worst (Klipsch), and a few people picked the speakers that measured decently, but not great (B&W, Quad).

Seems really unlikely that it's simply a coincidence. But hopefully we can start gathering more data here soon. I'd like to start repeating this test with more speakers to see whether this was just a coincidence or not.
When evaluating research, we cannot just look at the results and stop there. The methodology also has to make sense. At the most basic level, the possibility of confounders must be controlled for - otherwise you can't be sure that the results are actually demonstrating what you think they're demonstrating. In this case, while the results match what everybody was hoping for, it doesn't follow that the methodology was valid to begin with. In fact, the burden runs the other way: if the methodology is suspect, you cannot accept the results, no matter how much you want to believe that the experiment "proved" your hypothesis.

In this case, the methodology, which essentially asks a listener to "evaluate" the frequency response of 4 loudspeakers while letting each listener choose their own playback speakers/headphones, each with its own frequency response deviations, is horribly flawed. I get that the majority of intelligent and thoughtful members of ASR do not evaluate experimental research in their "day jobs," but there are people who do.

There are just so many possible confounders I don't even know where to start.

The potential for an interaction between the transfer curve of the original loudspeaker and the playback transducer is large. For instance, suppose for the sake of argument that a single -3 dB BBC dip turns out to be a preferable and desired characteristic, and the playback speaker happens to have a -3 dB BBC dip of its own. A source loudspeaker without a BBC dip will then sound preferable, whereas a source loudspeaker with a -3 dB BBC dip will sound like it has a big midrange suckout when combined with that same playback speaker (the two dips stack into an exaggerated -6 dB dip). This is just one example.
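To make the arithmetic concrete, here is a minimal Python sketch (with made-up response values, purely illustrative) of how the source speaker's and playback transducer's deviations cascade - in dB terms they simply add:

Code:
# Illustrative only: hypothetical -3 dB "BBC dip" responses for the source
# speaker and the playback transducer. Cascaded magnitude responses add in dB.
import numpy as np

freqs = np.array([500, 1000, 2000, 3000, 4000])               # Hz, a few points
source_speaker_db   = np.array([0.0, 0.0, -3.0, -3.0, 0.0])   # source with a BBC dip
playback_speaker_db = np.array([0.0, 0.0, -3.0, -3.0, 0.0])   # playback chain with its own dip

combined_db = source_speaker_db + playback_speaker_db         # what the listener hears

for f, s, p, c in zip(freqs, source_speaker_db, playback_speaker_db, combined_db):
    print(f"{f:>5} Hz: source {s:+.1f} dB, playback {p:+.1f} dB, heard {c:+.1f} dB")
# The two -3 dB dips stack into a -6 dB suckout, while a dip-free source
# heard through the same chain retains the single "preferred" -3 dB contour.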

Secondly, we know nothing about the placement of the speakers in that room. If it was a magazine-conducted test (read: not scientific in rigor or protocol), and they were testing a long list of speakers, you can be sure that little attention was paid to properly placing each speaker in that room. The differences heard in perceived playback quality could easily reflect differences in room placement and not the speakers themselves.

Thirdly, we know that the room itself can affect the perception of loudspeaker quality and that different rooms can affect the relative ranking of a loudspeaker. This is Harman's own research, btw.

And I could go on and on. When considering whether these are small vs large methodological flaws, my opinion is that they are large.

I get that the need to validate one's own beliefs is strong. While it is certainly possible that speaker B would be preferred under actual live blind listening tests, the experiment here doesn't necessarily demonstrate that due to its deal-breaking poor methodology.
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
3,458
Likes
2,446
Location
Sweden
I don't think they were specific about that in the methodology.

I do not have access to this one so I cannot look. It would be good if someone could look at that specifically. It is of utmost importance that headphone evaluations of "speakers playing in a room" use a correct binaural compensation curve in peer-reviewed research of that type. Otherwise the timbre will be severely coloured.
 

TheHighContemplator

Active Member
Joined
Oct 14, 2020
Messages
135
Likes
250
Location
Canada
I'd be happy to man-splain it to you. First, go to www.google.com. Then in that search bar, it's the rectangle where you enter text, go ahead and type in "mansplaining." Then hit enter on your keyboard or click the search icon using your mouse. Go ahead and click one of the many articles from reputable sources that appear on the first few pages.
And when women exhibit the exact same behaviour, which they do, what is it called? Or do we not have a name for it because feminism is the most hypocritical ideology that has ever existed?

My point was there is no such thing as "mansplaining". It is a term generated to increase the divide between men and women all influenced by the parasitic doctrines of the Frankfurt School. Maybe you need to do a lot more research on your religion of feminism.

By the way, "research" stemming from gender studies and its ilk represents some of the least credible "academic" work available, because the vast majority of it fails to be reproduced, which is why they generally have low citations. That's why a few professors (Peter Boghossian, James Lindsay, and Helen Pluckrose) created preposterously fake papers that confirmed the biases of numerous journals who published them.


I won't say any more, as this is not the place for this discussion, nor is it a place to throw around utterly brain-dead terms such as "mansplaining". If you want to continue this, a private chat would be the best place.

Peace.
 

Triliza

Senior Member
Forum Donor
Joined
Jun 23, 2021
Messages
481
Likes
577
Location
Europe
In defense of the OP, he never said that this is a scientific experiment, so the result shouldn't be interpreted as such. Many participants nevertheless found the outcome interesting and drew some conclusions from it in the given context.

It would be great if any following discussions were about how to make future attempts like this more controlled. It's a given, as said above, that the fact alone that we use different headphones to listen to such recordings complicates things, but hey, at least they are fun, as fun as audio-related things can be anyway.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,559
Likes
1,703
Location
California
I do not have access to this one so I cannot look. It would be good if someone could look at that specifically. It is of utmost importance that headphone evaluations of "speakers playing in a room" use a correct binaural compensation curve in peer-reviewed research of that type. Otherwise the timbre will be severely coloured.
The researchers (Olive et al.) used a manikin-based binaural recording apparatus (called a KEMAR) outfitted with couplers to simulate the effect of the eardrum and ear canal to capture the loudspeaker recordings. Playback was via Etymotic ER-1M earphones, which deliver the recorded sound in close proximity to the eardrum. The closed-loop frequency response of this binaural recording apparatus + playback earphones was extremely flat from 20 Hz to 10 kHz, and is depicted below.

[Attached image: closed-loop frequency response of the binaural recording + playback chain]

Hopefully this provides some comparative context for what is necessary to accurately reproduce the sound of loudspeakers in a room via headphones for the purposes of evaluating preferences.
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,335
Likes
6,702
Similar to @PierreV, I'd really like to continue this with more comparisons, and hopefully try to control the other variables even better. I can purchase the binaural mics recommended earlier. Is that a good choice, or are there real gains to be had from spending a bit more?

Personally, the question I want to answer is, "How closely do recorded binaural blind comparisons correlate with in person blind comparisons?". Opportunities I see for future tests to help answer this question are:

1. Correlate them with actual in-person blind tests. Last year I bought the AVA ABX Comparator to make the blind comparisons I like to do from time to time easier to do (they are a pain in the ass to set up and run :D). I could very easily make binaural recordings of the same tracks the listeners heard for ASR to then listen to. I could then post both sets of results and we could find the correlations, if any exist at all :p (see the rough sketch after this list).

2. Use more tracks than just 2, with a way to vote differently per track. This might require setting up a separate poll outside the forum, but shouldn't be too hard to do. I could even make a site myself for tracking the data and providing a better UI. With only 2 tracks, it's quite possible that these just so happened to be mixed/mastered on very neutral speakers, which may be why the preferences correlated so well with the Harman research data.

3. Equalize the bass extension with either subs or EQ. I suppose this depends on what question you're trying to answer. If one wants to know the best speaker for a 2.0 configuration, then this is likely an unwanted alteration. Personally though, I'm more interested in bass-managed comparisons. Beyond that, I can use the measured extension to decide how much "bass-managed performance" I'm willing to sacrifice for the superior extension. I also just think it's more accurate, as it depends much less on the room, and it's also the configuration we have the least data for. In the two studies that Olive did comparing measured vs. subjective performance, the correlation was much better when bass and speaker type (monopole, except Bose?) were equalized (r = 0.99 vs. r = 0.86).

4. Stick to speakers we have NFS/anechoic spinorama measurements for. Not a dealbreaker, but I do think the best chance for us to learn would be by comparing the speakers for which we have the most detailed measurements. Luckily, we have pretty good measurements for these 4 speakers, so it's not really a criticism of this example. Sean Olive has talked about how the Olive score was mostly abandoned after it was made, and how if he were doing it again today, he could do it even better. I'm sure there are lots of ideas as to how one might improve it (frequency-weighted penalties, positive vs. negative weighted penalties, LF directivity control, etc.). It would be great if we could start to work towards a slightly better-correlated formula on our own :).
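As a rough illustration of what point 1 could look like, here is a hypothetical Python sketch that rank-correlates this thread's poll with an invented set of in-person results via Spearman correlation (the in-person numbers are placeholders, not real data):

Code:
# Hypothetical sketch: rank-correlate the binaural poll with an in-person
# blind test of the same 4 speakers. Only binaural_votes come from this
# thread's poll; in_person_votes are invented placeholders.
from scipy.stats import spearmanr

speakers = ["A", "B", "C", "D"]
binaural_votes  = [7, 42, 0, 7]    # this thread's poll
in_person_votes = [5, 30, 2, 10]   # placeholder: a future live blind test

rho, p_value = spearmanr(binaural_votes, in_person_votes)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# rho near +1 would suggest binaural comparisons track live preferences;
# with only 4 speakers the p-value is weak, so more speakers would help.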


We know that our brains can adapt to the flaws in our surroundings, such that we "hear through the room". Is it possible that our brains can also adapt to the flaws of the recording equipment? Prior to this test I would have said "certainly not"; now, I'm not so sure.
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,335
Likes
6,702
When evaluating research, we cannot just look at the results and stop there. The methodology also has to make sense. At the most basic level, the possibility of confounders must be controlled for - otherwise you can't be sure that the results are actually demonstrating what you think they're demonstrating. In this case, while the results match what everybody was hoping for, it doesn't follow that the methodology was valid to begin with. In fact, the burden runs the other way: if the methodology is suspect, you cannot accept the results, no matter how much you want to believe that the experiment "proved" your hypothesis.
I think you bring up some really great criticisms. Speaking personally, there is no doubt that there is a large element of results bias going on. Had there been little (or even negative) correlation with the established research (which is honestly what I expected), I would be taking these results much less seriously. I'd brush them off as not meaningful at all, which is how I felt before this.

I definitely wouldn't say that I was "hoping" for these results. If anything, it might be closer to the opposite. I've spent a good deal of time in YouTube comments telling people that these sorts of comparisons are useless for judging anything other than relative differences. Now I'm wondering if I was in the wrong :D. Mostly, there was no "hope" either way, but rather just an "expectation". I expected there would be little to no correlation at all. My thinking was that the speaker whose flaws were best obscured by the recording equipment's flaws would win. Of course, this is still possible, which is why I'm interested in gathering more data now. The results of this test really did surprise me.
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
3,458
Likes
2,446
Location
Sweden
I've been thinking a bit about how these kinds of recordings could be done, and I am sure there is a lot of research out there. Practically, and without buying expensive dummy heads, there is a choice between measuring just outside the ear/ear canal, which I just did, or with binaural microphones, e.g. Soundman or Roland, recording a bit closer to the ear canal entrance. What we would need is a transfer curve for these types of recordings vs. our most common (over-ear) headphones, and an EQ method so we could all adjust our headphones to that curve.

We cannot replicate it exactly, due to variations in our ears and ear canals, but we could perhaps come closer to the real timbre of the recording event.
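A minimal sketch of that idea in Python, assuming one had measured both curves (all numbers below are invented placeholders): the compensation EQ is just the inverse of the cascaded recording-chain + headphone response in dB.

Code:
# Minimal sketch, all numbers invented: the heard response is the cascade of
# the recording chain and the headphone (added in dB), so the compensation
# EQ is simply the inverse of that combined curve.
import numpy as np

freqs              = np.array([100, 1000, 3000, 8000])   # Hz, illustrative
recording_chain_db = np.array([0.0,  1.0,  4.0, -2.0])   # placeholder: binaural mic chain
headphone_db       = np.array([2.0,  0.0,  6.0, -5.0])   # placeholder: over-ear headphone

eq_db = -(recording_chain_db + headphone_db)             # gains to apply at playback

for f, g in zip(freqs, eq_db):
    print(f"{f:>5} Hz: apply {g:+.1f} dB")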
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
3,458
Likes
2,446
Location
Sweden
The researchers (Olive et al.) used a manikin-based binaural recording apparatus (called a KEMAR) outfitted with couplers to simulate the effect of the eardrum and ear canal to capture the loudspeaker recordings. Playback was via Etymotic ER-1M earphones, which deliver the recorded sound in close proximity to the eardrum. The closed-loop frequency response of this binaural recording apparatus + playback earphones was extremely flat from 20 Hz to 10 kHz, and is depicted below.

View attachment 157942
Hopefully this provides some comparative context for what is necessary to accurately reproduce the sound of loudspeakers in a room via headphones for the purposes of evaluating preferences.
Thanks for this. Seems that they covered the issue well, working close to the eardrum.
 

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,938
Location
Oslo, Norway
Very interesting discussion. I would also like to see more such tests. I also tend to agree that the consistency in the results - in spite of the large variation in transducers used and individual HRTFs - does indicate that we can listen "through" such things, in a way. This probably aligns with research on auditory adaptation.

I'm otherwise proud of the fact that I've so far resisted my inner mansplainer who keeps pushing me to correct misinformation that has been uttered in the thread on issues which have nothing to do with audio. Viva la resistencia
 

LTig

Master Contributor
Forum Donor
Joined
Feb 27, 2019
Messages
5,814
Likes
9,526
Location
Europe
Perhaps you're not attuned to the degree of biometric variation that exists in human beings. Consider hat sizes - they have small, medium, and large. Consider T-shirt sizes - they have XS up to 4XXL. And that's not even taking into consideration variation in external ear (pinna) shape and size. Not sure what constitutes an "average" person. How much does an "average person" weigh, pray tell?
I know about those variations, but in many cases they are not distributed uniformly but rather according to a Gaussian bell curve. The peak of this curve represents the average, and many more people fall into the range around this average than at the extremes.
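For what it's worth, a quick Python sketch of the standard Gaussian coverage figures behind that argument:

Code:
# Coverage of a Gaussian distribution around its mean.
from scipy.stats import norm

for k in (1, 2, 3):
    frac = norm.cdf(k) - norm.cdf(-k)
    print(f"within ±{k} standard deviations of the mean: {frac:.1%}")
# ~68% / ~95% / ~99.7% - most people do cluster near the average.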
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
3,458
Likes
2,446
Location
Sweden
I did some more fine-tuning of the EQ of my three songs, based on the room response of my speakers. Below is the room response at the moment, from a single sine sweep at the listening position. I am EQing my subwoofers (< 80 Hz) using Audyssey to bring down a 47 Hz room resonance, but the rest of the curve is just the speakers. As seen, there is a boost in the low bass, and the response slopes down from 1-2 kHz.

[Attached image: measured room response at the listening position]

So I used "inverted EQ" of that in Audacity:
[Attached image: the inverted EQ curve applied in Audacity]


And the resulting EQed file sounds a bit more balanced and more like what I hear in real life with respect to timbre.
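For anyone wanting to script the same "inverted EQ" idea rather than drawing it by hand in Audacity, here is a minimal Python sketch (placeholder measurement values); capping the boost is a common precaution so that deep room nulls aren't fought with unlimited gain:

Code:
# Minimal sketch with placeholder sweep values: flip the measured room
# response to get the corrective EQ, but cap the boost so room nulls
# aren't fought with unlimited (speaker-stressing) gain.
import numpy as np

freqs   = np.array([47, 100, 500, 1000, 2000, 8000])   # Hz
room_db = np.array([8.0, 3.0, 0.0, 1.0, -1.0, -4.0])   # placeholder sweep result

MAX_BOOST_DB = 6.0
inverse_db = np.minimum(-room_db, MAX_BOOST_DB)        # invert, then limit boost

for f, g in zip(freqs, inverse_db):
    print(f"{f:>5} Hz: {g:+.1f} dB")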

 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,559
Likes
1,703
Location
California
I decided to finally listen to the 2 tracks recorded from the 4 speakers. Based on some of the responses here, I was beginning to doubt my original prediction that there was absolutely no way anyone could accurately reproduce the sound of speakers using binaural recordings and playing back on uncalibrated headphones. I used my HD800sdr headphones and an RME ADI-2 PRO FS.

I am pleased to confirm that all 4 speakers sounded absolutely horrible on both tracks. As in not even listenable. None of them. If this is how the Revel, B&W, and Klipschorn actually sound in real life, I wouldn't even use them for the PC speaker that does the "beep" when you turn on your PC. Now to be fair, I don't know how the original tracks would have sounded when not re-recorded using mics, but unless the recording engineer was using a tape recorder like the police detectives use when interrogating suspects in the movies, there is no way the originals sounded that bad.

This was essentially an exercise in deciding which recorded speaker sounded horrible vs. which sounded horrible but slightly less so.

If there was anyone who also listened to those tracks and didn't buy into the Kool-Aid in this thread, but was overwhelmed by the groupthink and didn't say anything, let me be the first to say that you heard what I heard. Unbelievable.
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,335
Likes
6,702
I decided to finally listen to the 2 tracks recorded from the 4 speakers. Based on some of the responses here, I was beginning to doubt my original prediction that there was absolutely no way anyone could accurately reproduce the sound of speakers using binaural recordings and playing back on uncalibrated headphones. I used my HD800sdr headphones and an RME ADI-2 PRO FS.

I am pleased to confirm that all 4 speakers sounded absolutely horrible on both tracks. As in not even listenable. None of them. If this is how the Revel, B&W, and Klipschorn actually sound in real life, I wouldn't even use them for the PC speaker that does the "beep" when you turn on your PC. Now to be fair, I don't know how the original tracks would have sounded when not re-recorded using mics, but unless the recording engineer was using a tape recorder like the police detectives use when interrogating suspects in the movies, there is no way the originals sounded that bad.

If there was anyone who also listened to those tracks and didn't buy into the Kool-Aid in this thread but was overwhelmed by the groupthink, let me be the first to say that you heard what I heard. Unbelievable.
I thought they all sounded horrible too, but I assumed that was mostly due to the recording/playback equipment, and not the speakers themselves. With that in mind, I then just judged them relative to each other, ignoring how terrible they sound compared to in-person loudspeakers. While I enjoyed the Revel here the most (or hated it the least), judging it against what I hear at home/shows/dealers, it would be the worst speaker I've ever heard (by far). It was more a case of "which sounds the least terrible?" :p.

I don't think groupthink was a huge issue here, since we voted before we started talking about what we thought. Also, we could only see the score after we voted, and couldn't change our vote. Under those conditions, the fact that most of us said the same speaker sucked the least, and it happened to correlate with measurements, is pretty cool. Very unexpected (to me).
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,559
Likes
1,703
Location
California
I thought they all sounded horrible too, but I assumed that was mostly due to the recording/playback equipment, and not the speakers themselves.

Let me translate that for you: "the methodology was highly flawed."
 

restorer-john

Grand Contributor
Joined
Mar 1, 2018
Messages
12,674
Likes
38,770
Location
Gold Coast, Queensland, Australia
Let me translate that for you: "the methodology was highly flawed."

When anyone posts an acoustic recording, or series of recordings, the complaints start. And for what? It just discourages fun little experiments. After all, people use what is at hand and do their best in varying environments. If all that changes is the speakers in the same room, comparisons can be made quite successfully.

I took it for what it was: four recordings that all sounded pretty ordinary to me - on headphones (as it was supposedly 'binaural', why would I bother with speakers?). I picked the least bad-sounding speaker. 41 other people picked the same one. The other 14 people should hand back their Audiophile Card and sell their gear to invest in hearing aids. ;)
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,696
Likes
37,431
When anyone posts an acoustic recording, or series of recordings, the complaints start. And for what? It just discourages fun little experiments. After all, people use what is at hand and do their best in varying environments. If all that changes is the speakers in the same room, comparisons can be made quite successfully.

I took it for what it was: four recordings that all sounded pretty ordinary to me - on headphones (as it was supposedly 'binaural', why would I bother with speakers?). I picked the least bad-sounding speaker. 41 other people picked the same one. The other 14 people should hand back their Audiophile Card and sell their gear to invest in hearing aids. ;)
You notice I've not maligned the methodology, even though almost 50 people chose incorrectly. Panel speakers speak to a connoisseur's taste in sound. So the resulting totals are not surprising. :)
 
OP
thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,871
Likes
16,826
I thought they all sounded horrible too, but I assumed that was mostly due to the recording/playback equipment, and not the speakers themselves.
The recording equipment and the people doing the recordings were not amateur, and they also did some direct comparisons back then. So I would say the two main reasons are rather the very large, typical "high-end" listening distance, where the direct-to-reflected sound ratio is too low, and the music recordings used, which are not too great for such tests either.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,559
Likes
1,703
Location
California
When anyone posts an acoustic recording, or series of recordings, the complaints start. And for what? It just discourages fun little experiments.

Which is exactly the appropriate response. These aren't "fun little experiments." They are harmful because, due to their methodology, it's impossible to know whether the results are valid or false findings. And since very few people here appear to have an actual understanding of how to interpret experiments, they are misled into believing the results can be used to draw conclusions.

After all, people use what is at hand and do their best in varying environments. If all that changes is the speakers in the same room, comparisons can be made quite successfully.

The problem is, the OP didn't actually make the recordings himself, nor do we know the conditions of the recording room. The OP also did not provide proper attribution for the source of these recordings in his original post. Some people, including yourself, appear to believe that the OP made the recordings himself.
 