When evaluating research, we cannot just look at the results and stop there. The methodology also has to make sense. At the most basic level, possible confounders must be controlled for - otherwise you can't be sure that the results actually demonstrate what you think they're demonstrating. In this case, while the results match what everybody was hoping for, it doesn't follow that the methodology was valid to begin with. In fact, the burden runs the other way: if the methodology is suspect, you cannot accept the results, no matter how much you want to believe that the experiment "proved" your hypothesis.

That's what I thought coming into this thread, but it appears that somehow we not only have an ability to "hear through the room", but also to "hear through the recording equipment and room".
The data is simply too clear. 80% picked the speaker that measured the best (the Revel), 0% of people picked the speaker that measured the worst (Klipsch), and a few people picked the speakers that measured decent, but not great (B&W, Quad).
Seems really unlikely that it's simply a coincidence. But hopefully we can start gathering more data here soon. I'd like to start repeating this test with more speakers to see whether this was just a coincidence or not.
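To put the "unlikely to be a coincidence" intuition in rough numbers, here is a minimal sketch under a purely hypothetical assumption of 10 listeners (the post doesn't say how many actually voted), treating each pick as a random choice among the four speakers:

```python
from math import comb

# Hypothetical illustration only: assume a panel of 10 listeners,
# so 80% picking the Revel corresponds to 8 votes.
n_listeners = 10
revel_votes = 8
p_random = 1 / 4  # four speakers, so a blind random pick lands on any one 25% of the time

def binom_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent random trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance that 8 or more of the 10 listeners pick the same pre-specified speaker purely at random
p_at_least = sum(binom_pmf(k, n_listeners, p_random) for k in range(revel_votes, n_listeners + 1))
print(f"P(>= {revel_votes}/{n_listeners} pick one speaker by chance) ≈ {p_at_least:.5f}")  # ~0.0004
```

That only addresses chance agreement, though; it says nothing about whether the test measured what it claims to measure.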
In this case, the methodology, which essentially has a listener "evaluate" the frequency response of 4 loudspeakers but lets that listener choose their own playback speakers or headphones, each with its own frequency response deviations, is horribly flawed. I get that the majority of intelligent and thoughtful members of ASR do not evaluate experimental research in their "day jobs", but there are people here who do.
There are just so many possible confounders I don't even know where to start.
The potential for an interaction between the transfer curve of the original loudspeaker and that of the playback transducer is large. Suppose, for the sake of argument, that a single -3 dB BBC dip turns out to be a preferable and desired characteristic, and that the playback speaker itself has a -3 dB BBC dip. Then a source loudspeaker without a BBC dip will sound preferable, whereas a source loudspeaker with a -3 dB BBC dip will sound like it has a big midrange suckout through that same playback speaker, because the two dips stack into an exaggerated -6 dB dip. This is just one example.
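To illustrate the stacking, here is a minimal sketch (with a made-up smooth dip shape and an assumed ~2.5 kHz center, since no exact dip is specified) showing how deviations expressed in dB simply add when one speaker's response is heard through another's:

```python
import numpy as np

# Hypothetical illustration: when a recording of one speaker is replayed over another,
# the two responses cascade, so their deviations (in dB) add along the chain.

freqs = np.logspace(np.log10(20), np.log10(20000), 500)  # 20 Hz - 20 kHz

def bbc_dip(freqs, depth_db=-3.0, center_hz=2500.0, width_octaves=1.0):
    """A made-up smooth dip of depth_db centered at center_hz (deviation in dB vs frequency)."""
    octaves_from_center = np.log2(freqs / center_hz)
    return depth_db * np.exp(-(octaves_from_center / width_octaves) ** 2)

source_flat = np.zeros_like(freqs)   # source speaker with no dip
source_dipped = bbc_dip(freqs)       # source speaker with a -3 dB dip
playback = bbc_dip(freqs)            # playback speaker also has a -3 dB dip

# What the listener actually hears is the sum of the deviations in dB.
heard_flat_source = source_flat + playback      # net -3 dB dip (the supposedly "preferred" amount)
heard_dipped_source = source_dipped + playback  # net -6 dB dip: an exaggerated suckout

print(f"Deepest point, flat source through dipped playback:   {heard_flat_source.min():.1f} dB")
print(f"Deepest point, dipped source through dipped playback: {heard_dipped_source.min():.1f} dB")
```

The same additive logic applies to any response deviation anywhere in the chain, which is why the uncontrolled choice of playback transducer cannot be waved away.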
Secondly, we know nothing about the placement of the speakers in that room. If it was a magazine-conducted test (read: not scientific in rigor or protocol), and they were testing a long list of speakers, you can be sure that little attention was paid to placing each speaker properly in that room. The differences heard in perceived playback quality could easily reflect differences in room placement rather than the speakers themselves.
Thirdly, we know that the room itself can affect the perception of loudspeaker quality and that different rooms can change the relative ranking of a loudspeaker. This is Harman's own research, btw.
And I could go on and on. When considering whether these are small vs large methodological flaws, my opinion is that they are large.
I get that the need to validate one's own beliefs is strong. While it is certainly possible that speaker B would be preferred in an actual live blind listening test, the experiment here doesn't necessarily demonstrate that, given its deal-breakingly poor methodology.