Binaural blind comparison test of 4 loudspeakers

Which loudspeaker sound do you personally prefer?

  • Loudspeaker A

    Votes: 7 13.5%
  • Loudspeaker B

    Votes: 42 80.8%
  • Loudspeaker C

    Votes: 0 0.0%
  • Loudspeaker D

    Votes: 7 13.5%

  • Total voters
    52
  • Poll closed.

Pdxwayne

Major Contributor
Joined
Sep 15, 2020
Messages
3,219
Likes
1,172
I think "most people could identify the presence dip" is worded too strongly. But several people commented on the midrange of B and thought it was elevated - whereas it turns out that it was actually speaker A and to a certain extent D (the main competitors) which had less energy in the mids and/or presence region.

Below is how I framed it. It's not completely precise, of course, but I do think I perceived some objective characteristics of these loudspeakers. I perceived A and B as being closest to each other, which turned out to be correct - those are the two conventional box speakers in the bunch. I thought D was more "airy" and "less direct" - these descriptors are not unheard of with electrostatic speakers (but I was wrong about it having wider dispersion; it rather has a different dispersion). I thought B had more midrange, and even though I didn't write it, this was mainly in comparison with A, given that I went back and forth in listening between A and B after having ruled out C and D. Others had impressions of these speakers which turned out to be even more precise.
I would say using this survey to validate Harman curve preference is a big stretch....

B didn't fit the low sub bass requirements of the Harman curve anyway.

So, for the choices I have left (C is not a candidate):
*I select cleaner sound, but lack of sub bass and energy in certain bass notes
Or
*I select more full range sound, but lack of bass clarity.

In this particular case, I opt for cleaner sound and settle for a lack of low bass energy.

B with a good pair of room eq subs? Yes.

A with better placement to improve bass clarity? Yes, I will select A over B.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,560
Likes
1,705
Location
California
And no, I selected B not because of the Harman curve. B is deep-bass shy. It doesn't really fit the curve anyway. I would have selected one of the others if the bass had been cleaner for A or D in the room. I blame bad room placement. A has the deep bass that B lacks.
Good point - bass response (specifically -6dB frequency) is responsible for ~30% of perceived loudspeaker sound quality (Harman research). Without controlling for the playback speakers/headphones, how can we be sure that the listeners of this "experiment" (air quotes intended) were able to properly consider the bass response of the 4 speakers? All sorts of problems.
You are correct that the paper I posted here does not directly test binaural recordings of loudspeakers over headphones. But the authors do seem to think that their results carry some generalizability, including to loudspeakers reproduced through binaural recordings (edit: as they mention such studies in the scholarly context, and implicitly present their study as a test of an underlying assumption in these studies). And as @thewas said, this has actually been a common practice among psychoacoustic researchers, including heavyweights such as Floyd Toole (https://www.aes.org/e-lib/online/browse.cfm?elib=5537) and Sean Olive (https://www.aes.org/e-lib/browse.cfm?elib=7674). I haven't read the latter two papers in a while though.
Appreciate the papers, @oivavoi. The second paper appears to have examined something related - i.e. binaural recordings of loudspeakers played back via earphones. Olive et al reported that the relative ranking of 3 loudspeakers did not differ when recorded and played back on headphones vs live listening. However, it's difficult to know how to interpret that because if you look at the appendix figures, the actual preferences scores are so close together (i.e. all 3 being +/- 0.5 pref scores from each other), with speakers 2 and 3 being rated +/- 0.25 pref scores apart - so if the listeners could barely distinguish between the 3 speakers live, and they could barely distinguish between them using binaural recordings, I'm not sure what that really tells us. While super interesting, it strikes me as more of a preliminary "proof of concept" experiment than a solid confirmation that binaural recordings can substitute for live listening when rating loudspeakers.

I think the proof is in the pudding. As was acknowledged by Olive in the second paper, it is far more painstaking and time-intensive to perform listening experiments using live loudspeakers instead of binaural recordings. Yet using live loudspeakers is EXACTLY what Olive did 10 years later in his landmark multiple regression paper correlating measurements of 70 loudspeakers with blind listening preferences. I think the fact that they performed that gigantic study using loudspeakers on a rotating platform instead of using binaural recordings speaks for itself.

So the assumption that binaural recordings of loudspeakers can tell us something about actual preferences for loudspeakers is not crazy. Do we know this with absolute certainty? No, psychoacoustics and loudspeaker research is a very small academic field. But it's not an unreasonable assumption, given the available evidence.

Some differences have also been shown in the literature though: One study demonstrated that listeners may prefer more reverberation with loudspeakers than with headphones. https://www.aes.org/e-lib/browse.cfm?elib=16787
I am therefore somewhat skeptical towards deciding between wide and narrow directivity loudspeakers only by using binaural recordings, for example - even though we have no direct test of that. But given that wider directivity creates stronger reverberation/reflections, that seems like a reasonable assumption to me.
But for tonality, smoothness of dispersion, bass, general issues with loudspeakers etc, I think there is some evidence that binaural recordings of loudspeakers can serve their purpose, given the existing research.

edited for clarity
I can agree with the possibility that binaural recordings of loudspeakers can "tell us something" about actual preferences - I don't think that's in question. But the degree to which they can is still in question to me, and I would encourage others to be highly skeptical, particularly if the playback transducers cannot be controlled/standardized (I think someone else may have also made this recommendation).
 

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,940
Location
Oslo, Norway
Good point - bass response (specifically -6dB frequency) is responsible for ~30% of perceived loudspeaker sound quality (Harman research). Without controlling for the playback speakers/headphones, how can we be sure that the listeners of this "experiment" (air quotes intended) were able to properly consider the bass response of the 4 speakers? All sorts of problems.

Appreciate the papers, @oivavoi. The second paper appears to have examined something related - i.e. binaural recordings of loudspeakers played back via earphones. Olive et al reported that the relative ranking of 3 loudspeakers did not differ when recorded and played back on headphones vs live listening. However, it's difficult to know how to interpret that because if you look at the appendix figures, the actual preferences scores are so close together (i.e. all 3 being +/- 0.5 pref scores from each other), with speakers 2 and 3 being rated +/- 0.25 pref scores apart - so if the listeners could barely distinguish between the 3 speakers live, and they could barely distinguish between them using binaural recordings, I'm not sure what that really tells us. While super interesting, it strikes me as more of a preliminary "proof of concept" experiment than a solid confirmation that binaural recordings can substitute for live listening when rating loudspeakers.

I think the proof is in the pudding. As was acknowledged by Olive in the second paper, it is far more painstaking and time-intensive to perform listening experiments using live loudspeakers instead of binaural recordings. Yet using live loudspeakers is EXACTLY what Olive did 10 years later in his landmark multiple regression paper correlating measurements of 70 loudspeakers with blind listening preferences. I think the fact that they performed that gigantic study using loudspeakers on a rotating platform instead of using binaural recordings speaks for itself.


I can agree with the possibility that binaural recordings of loudspeakers can "tell us something" about actual preferences - I don't think that's in question. But the degree to which they can is still in question to me, and I would encourage others to be highly skeptical, particularly if the playback transducers cannot be controlled/standardized (I think someone else may have also made this recommendation).

Good response!
Now I need to spend the rest of the day writing a presentation in the academic field I actually work in... :) (which has nothing to do with audio whatsoever)
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
3,469
Likes
2,470
Location
Sweden
Good point - bass response (specifically -6dB frequency) is responsible for ~30% of perceived loudspeaker sound quality (Harman research). Without controlling for the playback speakers/headphones, how can we be sure that the listeners of this "experiment" (air quotes intended) were able to properly consider the bass response of the 4 speakers? All sorts of problems.

Appreciate the papers, @oivavoi. The second paper appears to have examined something related - i.e. binaural recordings of loudspeakers played back via earphones. Olive et al reported that the relative ranking of 3 loudspeakers did not differ when recorded and played back on headphones vs live listening. However, it's difficult to know how to interpret that because if you look at the appendix figures, the actual preferences scores are so close together (i.e. all 3 being +/- 0.5 pref scores from each other), with speakers 2 and 3 being rated +/- 0.25 pref scores apart - so if the listeners could barely distinguish between the 3 speakers live, and they could barely distinguish between them using binaural recordings, I'm not sure what that really tells us. While super interesting, it strikes me as more of a preliminary "proof of concept" experiment than a solid confirmation that binaural recordings can substitute for live listening when rating loudspeakers.

I think the proof is in the pudding. As was acknowledged by Olive in the second paper, it is far more painstaking and time-intensive to perform listening experiments using live loudspeakers instead of binaural recordings. Yet using live loudspeakers is EXACTLY what Olive did 10 years later in his landmark multiple regression paper correlating measurements of 70 loudspeakers with blind listening preferences. I think the fact that they performed that gigantic study using loudspeakers on a rotating platform instead of using binaural recordings speaks for itself.


I can agree with the possibility that binaural recordings of loudspeakers can "tell us something" about actual preferences - I don't think that's in question. But the degree to which they can is still in question to me, and I would encourage others to be highly skeptical, particularly if the playback transducers cannot be controlled/standardized (I think someone else may have also made this recommendation).

Has someone read the paper 2 and looked into which target curve the headphones were calibrated against? Binaural, free field?
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,560
Likes
1,705
Location
California
Has someone read the paper 2 and looked into which target curve the headphones were calibrated against? Binaural, free field?
I don't think they were specific about that in the methodology.
 

GXAlan

Major Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
3,942
Likes
6,086
B&W
- “Skyfall” from Adele. Mastered with B&W 802D

Revel
- Tracy Chapman, “Fast Car”

Klipschorn
- “Fanfare for the Common Man”, Copland.
(Spirit of the American Range, Oregon Symphony Orchestra)

Quad ESL
- “Hey Jude”, The Beatles.

Any thoughts, @krabapple ? It's a bit of a moot point now that we know this was done by a magazine in the "distant" past but it would be an interesting experiment to conduct to see if the popular speakers that "shouldn't" be popular based upon preference score happen to excel in the genres they are stereotypically good for.
 

LTig

Master Contributor
Forum Donor
Joined
Feb 27, 2019
Messages
5,856
Likes
9,616
Location
Europe
Sure, for starters, these binaural recordings, which were made using an artificial head/ear, do not necessarily replicate the ear shape, ear spacing, head size, and shoulder reflection of the general population. So already what you're recording does not account for the variation in humans and what they would "hear" if they were listening live.
I would expect that a maker of a dummy head designs it for an average person and not for a few outliers.
Secondly, and more importantly, you didn't control for playback. You and I both know that there are wide differences in the frequency response curves of headphones (vs a hypothetical "neutral" target) and loudspeakers. Which means if you're trying to evaluate FR, but you're also introducing another device in the evaluation chain that, itself, has large variations in frequency response, how do you know your results aren't simply a function of the variation in your playback device? These are some basic experimental questions that should come immediately to mind.
In my experience people adapt to the FR of their monitoring equipment, so an FR which deviates from flat appears as not neutral even if the FR of the monitoring is not flat.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,828
Likes
37,757
snip....
In my experience people adapt to the FR of their monitoring equipment, so an FR which deviates from flat appears as not neutral even if the FR of the monitoring is not flat.
I wonder about the exact parameters of that adaptation to non-Flat FR. I know people claim they learn and compensate, but do they really? And surely that has to muddy the water quite a bit. If you are talking a low end roll-off due to monitors being small I'd believe that easier to compensate for. If you are talking a roller coaster FR with lots of ups and down I highly doubt the ability to fully compensate. But maybe there is some mechanism like how our hearing largely filters out the early effects of the room.
 

LTig

Master Contributor
Forum Donor
Joined
Feb 27, 2019
Messages
5,856
Likes
9,616
Location
Europe
I wonder about the exact parameters of that adaptation to non-Flat FR. I know people claim they learn and compensate, but do they really? And surely that has to muddy the water quite a bit. If you are talking a low end roll-off due to monitors being small I'd believe that easier to compensate for. If you are talking a roller coaster FR with lots of ups and down I highly doubt the ability to fully compensate. [..]
I think we all rather underestimate the adaptability of the human brain.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,560
Likes
1,705
Location
California
I would expect that a maker of a dummy head designs it for an average person and not for a few outliers.
Perhaps you're not attuned to the degree of biometric variation that exists in human beings. Consider hat sizes - they have small, medium, and large. Consider, T-shirt sizes. They have XS up to 4XXL. And that's not even taking into consideration variation in external ear (pinna) shape and size. Not sure what constitutes an "average" person. How much does an "average person" weigh, pray tell?

In my experience people adapt to the FR of their monitoring equipment, so an FR which deviates from flat appears as not neutral even if the FR of the monitoring is not flat.
Interesting theory.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,828
Likes
37,757
I think we all rather underestimate the adaptability of the human brain.
It is very adaptable in the right conditions.

Remember the psychologist who wore upside-down glasses continuously for several days. After a day or two it looked normal to him, and briefly upon removing them things seemed wrong, though not for as long as the initial period of adjustment with the glasses on.

I think too of the experiment where volunteers had their spatial acuity in hearing tested. They then agreed to wear implants that reshaped the pinna. After the implants the spatial acuity was all messed up. In a few days the brain adapted, and at the end of 2 or 4 weeks, whichever it was, the spatial acuity was as good as in the original test. Upon removing the implants spatial acuity was wrong, but only briefly. The result was that the test subjects could reorient to either condition in minutes. It was as if two different filters had been stored: give the ear a few minutes to figure out which was right, and it was able to work with either condition equally well.

All of which supports the idea that people may adapt to their non-transparent gear and thereafter hear whether the source material is neutral or not. The difference is that physical reality would inform the new filters for upside-down lenses or different pinnae over time. I'm not sure the feedback is there with non-transparent speakers for the same result to take place or for it to be as precise. Then again, even after the reveal and listening over two headphones and a pair of speakers, the best-sounding speaker in this test, in my opinion, was the Quad 2805, which is an update of the very, very similar ESL-63s, which I owned for a dozen years.
 

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,725
Likes
6,032
Location
US East
In my experience people adapt to the FR of their monitoring equipment, so an FR which deviates from flat appears as not neutral even if the FR of the monitoring is not flat.
Interesting theory.

Griesinger agrees with @LTig [slide 40, http://www.davidgriesinger.com/intermod.ppt]

[Attached image: intermod.PNG]
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,560
Likes
1,705
Location
California

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,725
Likes
6,032
Location
US East
Fantastic. So why bother purchasing loudspeakers that are free of spectrum and timbre "errors" - when we could simply take any inexpensive pair of loudspeakers, listen to them for 10-20 mins to become "adapted," and transform them into a Gene-Revel Salon 8351!)
You certainly have not paid much attention to all the discussions we have here at ASR on the science. Why do we need level matched blind testing? It is because our perception bias can be completely overpowering. It is only when we compare with these controls in place that we can objectively differentiate which ones are better.

Now that you know about the power of perception bias and how poorly our two ears and a brain can tell the "objectively good" from the "objectively bad", you can use it wisely to your advantage*.

* Well, of course there is a limit to how far this can go. Truly atrocious gear is truly atrocious. Noise and insufficient output are also easily noticeable. Once the gear gets past these, and if you flat out refuse to step into a double blind test, and you've convinced yourself your set of gear is the very best, no one will be able to convince you otherwise.

So, don't do blind tests, don't get any listening training. Ignorance is bliss. You will hear all sorts of marvelous things, or, equally possible, countless numbers of veils.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,560
Likes
1,705
Location
California
You certainly have not paid much attention to all the discussions we have here at ASR on the science. Why do we need level matched blind testing? It is because our perception bias can be completely overpowering. It is only when we compare with these controls in place that we can objectively differentiate which ones are better.

Now that you know about the power of perception bias and how poorly our two ears and a brain can tell the "objectively good" from the "objectively bad", you can use it wisely to your advantage*.

* Well, of course there is a limit to how far this can go. Truly atrocious gear is truly atrocious. Noise and insufficient output are also easily noticeable. Once the gear gets past these, and if you flat out refuse to step into a double blind test, and you've convinced yourself your set of gear is the very best, no one will be able to convince you otherwise.

So, don't do blind tests, don't get any listening training. Ignorance is bliss. You will hear all sorts of marvelous things, or, equally possible, countless numbers of veils.
Make up your mind then. Are you saying it doesn't matter what playback headphones/speakers are used to evaluate binaural recordings of loudspeakers because the brain adapts? Or now it DOES matter. Pick one.

Also, unless you have an advanced degree in experimental research, I probably know more about blind testing than you do, and could do without the man-splaining. Will leave it at that.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,560
Likes
1,705
Location
California
Good grief. Someone has been indoctrinated to think this is actually a thing. I didn't expect to see these brain-dead buzzwords being used on a science-based forum.
I'd be happy to man-splain it to you. First, go to www.google.com. Then in that search bar, it's the rectangle where you enter text, go ahead and type in "mansplaining." Then hit enter on your keyboard or click the search icon using your mouse. Go ahead and click one of the many articles from reputable sources that appear on the first few pages.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,828
Likes
37,757
I'd be happy to man-splain it to you. First, go to www.google.com. Then in that search bar, it's the rectangle where you enter text, go ahead and type in "mansplaining." Then hit enter on your keyboard or click the search icon using your mouse. Go ahead and click one of the many articles from reputable sources that appear on the first few pages.
BTW, is this a sly way of telling us you are a woman? I didn't know, nor should it in general matter.

 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,338
Likes
6,710
Might as well walk into Best Buy, take a video recording of 4 big screen TVs using your iPhone, post it on an AV forum so people can look at the videos on their laptops, and decide which TV has the best picture quality.

That's what I thought coming into this thread, but it appears that somehow we not only have an ability to "hear through the room", but also "hear through the recording equipment and room".

The data is simply too clear. 80% picked the speaker that measured the best (the Revel), 0% of people picked the speaker that measured the worst (Klipsch), and a few people picked the speakers that measured decent, but not great (B&W, Quad).

Seems really unlikely that it's simply a coincidence. But hopefully we can start gathering more data here soon. I'd like to start repeating this test with more speakers to see if this was just a coincidence or not.
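
A rough way to put a number on "unlikely" is a one-sided binomial tail. It asks how often at least 42 of 52 voters would land on the same speaker if everyone picked at random. This is only a sketch, and its assumptions do not quite match the poll. It treats every voter as casting one independent vote with a 1-in-4 chance per speaker, whereas the poll recorded 56 votes from 52 voters, so some people chose more than one. The helper name `binomial_tail` is made up for illustration. The result only addresses whether the vote split looks random, not why people preferred B.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


voters = 52        # total voters reported by the poll
votes_for_b = 42   # votes for Loudspeaker B
chance = 0.25      # ASSUMPTION: one random pick among four speakers per voter

p_value = binomial_tail(voters, votes_for_b, chance)

print(f"Share of voters choosing B: {votes_for_b / voters:.1%}")
print(f"P(at least {votes_for_b} of {voters} by chance) = {p_value:.3g}")
```

Under those assumptions the tail probability is vanishingly small, which fits the impression that the split is not random. It does not rule out other explanations, such as voters reading earlier comments before voting, or differences in playback headphones that happened to favor one recording.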
 