
Binaural blind comparison test of 4 loudspeakers

Which loudspeaker sound do you personally prefer?

  • Loudspeaker A

    Votes: 7 (13.5%)
  • Loudspeaker B

    Votes: 42 (80.8%)
  • Loudspeaker C

    Votes: 0 (0.0%)
  • Loudspeaker D

    Votes: 7 (13.5%)

  • Total voters
    52
  • Poll closed.

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,223
Likes
3,834
If I had access to all of these speakers, I would pick four songs to go with the four speakers: pick the song that is stereotypically associated with each speaker, blind everyone, and then see what the preferences are :)

What songs would those be, for these speakers?
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,936
Likes
38,034
I recorded three short sequences today and applied an approximate EQ inverse of my room response, just to experiment further, again using two OM1 microphones taped to my ears. They should be listened to through headphones with a target curve similar to a room (e.g. the Harman target).

I think you are doing something right. These still sound a little reverby or echo-y. The tonality seems very good on them. I also get a nice soundstage just outside my head all the way around. More so than with the actual binaural recordings used initially in this thread.
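
For anyone who wants to try the same processing, here is a minimal sketch of the approach described above: invert a measured in-room magnitude response and apply it to the binaural file. The file names and the room response points are hypothetical placeholders; a real attempt would use octave-smoothed measurement data, and the sketch assumes a 16-bit stereo WAV at a typical 44.1/48 kHz rate.

```python
# Minimal sketch: apply an approximate inverse of a measured room
# response to a binaural recording. File names and the room response
# points are hypothetical placeholders; assumes a 16-bit stereo WAV.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve, firwin2

fs, rec = wavfile.read("binaural_recording.wav")      # hypothetical file
rec = rec.astype(np.float64) / 32768.0                # int16 -> float

# Hypothetical smoothed room magnitude response (Hz, dB); in practice
# export octave-smoothed data from a measurement tool.
freqs_hz = np.array([20, 100, 500, 1000, 4000, 10000, 20000])
room_db  = np.array([ 6,   3,   0,    0,   -2,    -4,    -6])

# Inverse EQ: negate the response in dB, cap the correction so narrow
# nulls are not boosted excessively, build a linear-phase FIR from it.
inv_db = np.clip(-room_db, -12.0, 12.0)
gains  = 10.0 ** (inv_db / 20.0)
nyq    = fs / 2
fir    = firwin2(2047, np.r_[0, freqs_hz, nyq] / nyq,
                 np.r_[gains[0], gains, gains[-1]])

eq = np.column_stack([fftconvolve(rec[:, ch], fir, mode="same")
                      for ch in range(rec.shape[1])])
eq /= np.max(np.abs(eq)) * 1.01                       # avoid clipping
wavfile.write("binaural_inverse_room_eq.wav", fs, (eq * 32767).astype(np.int16))
```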
 

GXAlan

Major Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
3,961
Likes
6,116
What songs would those be, for these speakers?

B&W
- Abbey Road Studios standardized on B&W early on
- let’s pick something from the modern era to test the “circle of confusion”
- “Skyfall” from Adele. Mastered with B&W 802D

Revel
- developed using Harman Science
- let’s choose a track from Harman’s blind testing selection
- Tracy Chapman, “Fast Car”

Klipschorn
- attempted to recreate the live orchestra feel at home
- let’s choose a modern orchestral recording with lots of dynamics
- “Fanfare for the Common Man”, Copland.
(Spirit of the American Range, Oregon Symphony Orchestra)

Quad ESL
- midrange purity without a lot of dynamics is the bias
- let’s choose something classic with low dynamic range
- “Hey Jude”, The Beatles.

There are plenty of other options, but all four tracks are recognizable, good tracks that you might listen to for pleasure independent of a test, and of different audiophile genres.

Score speakers based upon track choice and then overall.

If the science is correct, the Revel should still be 1st or 2nd in most tests. The question is whether, under blind testing, Revel takes 2nd place to something like the Klipschorn on a very dynamic orchestral piece while the Klipsch comes last for everything else. That would support the science behind Revel's strategy while also providing a scientific rationale for why the Klipschorn remains one of the longest continuously produced speakers: it's really good for a specific type of music.
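
To make the proposed scoring concrete, below is a minimal sketch of the tally: first-place votes per track, then an overall ranking across tracks. Every vote count is a hypothetical placeholder included only to show the mechanics, not a prediction of any result.

```python
# Minimal sketch of the proposed scoring: rank the four speakers per
# track, then overall. All vote counts are hypothetical placeholders.
from collections import defaultdict

# votes[track][speaker] = first-place votes (hypothetical numbers)
votes = {
    "Skyfall (B&W track)":     {"B&W": 10, "Revel": 14, "Klipschorn": 3,  "Quad": 5},
    "Fast Car (Revel track)":  {"B&W": 6,  "Revel": 18, "Klipschorn": 4,  "Quad": 4},
    "Fanfare (Klipsch track)": {"B&W": 5,  "Revel": 11, "Klipschorn": 13, "Quad": 3},
    "Hey Jude (Quad track)":   {"B&W": 7,  "Revel": 13, "Klipschorn": 2,  "Quad": 10},
}

overall = defaultdict(int)
for track, counts in votes.items():
    ranking = sorted(counts, key=counts.get, reverse=True)
    print(f"{track}: {' > '.join(ranking)}")
    for speaker, n in counts.items():
        overall[speaker] += n

print("Overall:", " > ".join(sorted(overall, key=overall.get, reverse=True)))
```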
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,565
Likes
1,715
Location
California
Might as well walk into Best Buy, take a video recording of 4 big-screen TVs using your iPhone, post it on an AV forum so people can look at the videos on their laptops, and decide which TV has the best picture quality.
 
OP
thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,934
Likes
17,078
Might as well walk into Best Buy, take a video recording of 4 big-screen TVs using your iPhone, post it on an AV forum so people can look at the videos on their laptops, and decide which TV has the best picture quality.
If you use a poor recording and playback, then it's your problem, but with high-quality, calibrated photo and video recording and playback you can, for example, similarly replicate colour "tonality" and various problems.

Research has shown that binaural preference tests seem to correlate well with preferences when listening to actual loudspeakers: https://www.audiosciencereview.com/...st-of-4-loudspeakers.26785/page-6#post-924548
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,565
Likes
1,715
Location
California
I'll just mention that it's a fairly common procedure for speaker testing these days to do it exactly like this: Make binaural recordings of loudspeakers, play them for people through headphones, ask them to rate what they hear. I remember at least one such test which was done with Sennheiser 650, without any equalization. These preference tests seem to correlate well with preferences for listening to actual loudspeakers, according to this study: https://www.aes.org/e-lib/online/browse.cfm?elib=16086
I just read the referenced paper. To be clear, the paper does not validate the use of binaural recordings of loudspeakers played back over headphones as a substitute for actual listening to the loudspeakers to determine listener preferences.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,565
Likes
1,715
Location
California
Research has shown that binaural preference tests seem to correlate well with preferences when listening to actual loudspeakers: https://www.audiosciencereview.com/...st-of-4-loudspeakers.26785/page-6#post-924548

The referenced paper absolutely does not say what you think it does. Wow. Did you actually read it and understand it? (serious question).

For starters, the two test conditions were simply different microphone capsules, each capturing the exact same audio source. At no point did the study capture binaural recordings of various loudspeakers themselves and ask listeners to provide a preference score or a ranking based on listening to those recordings via headphones.

If you're going to cite a reference with me, you'd better believe I'm going to read it.
 
OP
thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,934
Likes
17,078
I just read the referenced paper. To be clear, the paper does not validate the use of binaural recordings of loudspeakers played back over headphones as a substitute for actual listening to the loudspeakers to determine listener preferences.
The paper states that headphone playback generally keeps similarity judgments consistent and doesn't change preference judgments.

Sound reproduction over headphones is, because of its convenience, indifferently used to reproduce and assess a large variety of audio contents. Nevertheless, it is not yet proven that differences between sound sequences are equally perceived when played back through headphones as using dedicated loudspeaker systems. This study aims at evaluating whether differences and preferences between excerpts are equally perceived using these two reproduction methods. Various types of audio contents, issued by two different recording systems, were then to be compared on both headphones and loudspeaker setups. The results indicate that the two reproduction methods provided consistent similarity and preference judgments. This suggests that the features involved in similarity and preference assessments were preserved when reproducing these excerpts over headphones.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,565
Likes
1,715
Location
California
The paper states that headphone playback generally keeps similarity judgments consistent and doesn't change preference judgments.

Sound reproduction over headphones is, because of its convenience, indifferently used to reproduce and assess a large variety of audio contents. Nevertheless, it is not yet proven that differences between sound sequences are equally perceived when played back through headphones as using dedicated loudspeaker systems. This study aims at evaluating whether differences and preferences between excerpts are equally perceived using these two reproduction methods. Various types of audio contents, issued by two different recording systems, were then to be compared on both headphones and loudspeaker setups. The results indicate that the two reproduction methods provided consistent similarity and preference judgments. This suggests that the features involved in similarity and preference assessments were preserved when reproducing these excerpts over headphones.
I don't think you understand what the paper demonstrated. Anyone can copy/paste excerpts from a paper and take it out of context.

The authors tested two different MICROPHONE CAPSULES, A and B. They played back, on headphones and loudspeakers, live music that was captured simultaneously by microphones A and B. They found that listener preferences for the recordings captured by microphones A and B were similar whether heard via headphones or loudspeakers. Therefore it's reasonable to conclude that differences in microphone quality can be identified similarly with headphones and speakers.

But by no means did this experiment demonstrate applicability to evaluating differences in recorded speakers. And to be clear once again, the authors of this paper didn't do anything close to what you're attempting to do here. Not even a little close. In my line of work, I would consider the paper you cited to be preliminary and hypothesis generating, but definitely not generalizable or applicable.
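
For what it's worth, the kind of agreement the cited study reports can be quantified with a simple rank correlation between ratings gathered under the two playback methods. Below is a minimal sketch of one common way to do that; the rating arrays are hypothetical placeholders, not data from the paper.

```python
# Minimal sketch: check whether preference ratings collected over
# headphones track ratings collected over loudspeakers. The rating
# arrays are hypothetical placeholders, not data from the study.
import numpy as np
from scipy.stats import spearmanr

ratings_loudspeakers = np.array([7.1, 5.4, 6.2, 4.8, 8.0, 5.9])  # per excerpt
ratings_headphones   = np.array([6.8, 5.1, 6.5, 4.5, 7.7, 6.1])  # same excerpts

rho, p = spearmanr(ratings_loudspeakers, ratings_headphones)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
# A high rho says the two playback methods order the excerpts
# consistently; it does not by itself validate any wider use.
```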
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,565
Likes
1,715
Location
California
@thewas I would suggest that you disclose who made these recordings and under what conditions. Most people don't just happen to have these four large and expensive loudspeakers lying around in their living room.
 

Thomas_A

Major Contributor
Forum Donor
Joined
Jun 20, 2019
Messages
3,502
Likes
2,542
Location
Sweden
I think you are doing something right. These still sound a little reverby or echo-y. The tonality seems very good on them. I also get a nice soundstage just outside my head all the way around. More so than with the actual binaural recordings used initially in this thread.

Thanks. Ideally I would take the raw file and then use EQ at playback for the specific headphone. Do you know if the binaural target curve is the free-field target? If so, one could EQ the headphone to a free-field target.
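
As a rough illustration of that playback-side idea, here is a minimal sketch: take the difference between a chosen target curve and the specific headphone's measured response, and turn it into a compensation filter for the raw file. Both curves below are hypothetical placeholders; real ones would come from measurements of the actual headphone and whichever target (e.g. free field) turns out to be appropriate.

```python
# Minimal sketch: EQ a specific headphone toward a chosen target at
# playback. Both response curves are hypothetical placeholders.
import numpy as np
from scipy.signal import firwin2, fftconvolve

fs  = 48000
f   = np.array([20, 200, 1000, 3000, 8000, 16000])   # Hz
hp  = np.array([ 2,   0,    0,    4,   -3,    -8])   # headphone, dB (hypothetical)
tgt = np.array([ 4,   1,    0,    2,   -2,    -6])   # target, dB (hypothetical)

corr_db = np.clip(tgt - hp, -12.0, 12.0)             # target minus headphone
gain    = 10.0 ** (corr_db / 20.0)
nyq     = fs / 2
fir     = firwin2(1023, np.r_[0, f, nyq] / nyq, np.r_[gain[0], gain, gain[-1]])

def eq_for_playback(channel):
    """Apply the compensation filter to one channel of the raw file."""
    return fftconvolve(channel, fir, mode="same")
```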
 
OP
thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,934
Likes
17,078
But by no means did this experiment demonstrate applicability to evaluating differences in recorded speakers. And to be clear once again, the authors of this paper didn't do anything close to what you're attempting to do here. Not even a little close. In my line of work, I would consider the paper you cited to be preliminary and hypothesis generating, but definitely not generalizable or applicable.
Binaural recordings have been used in industry to replicate, compare and evaluate auditory scenes since the 80s; for example, the Stax Lambda Pro version with the higher polarisation voltage was created at the request of Mercedes-Benz, who needed higher SPLs to realistically replicate some events. Of course, as I write in the introduction of this thread, there are limitations to the method, for example in how well the spatial rendering can be replicated. On the other side, there is no doubt that tonal differences/problems, resonances and distortions will be captured and replicated, and as we know these count the most for the preference of a loudspeaker (as Toole says, if the frequency response is wrong, nothing else matters). Also, if you look at the responses of the ASR listeners who participated in this test, many correctly identified the presence dip of the B&Ws. And Sean Olive has shown a very good correlation of virtual headphone ratings to the ratings of real ones, which is why Harman uses virtual headphones for their preference comparisons: among other advantages, the headphones are not revealed, so they cannot bias listeners through their haptics/feel.

@thewas I would suggest that you disclose who made these recordings and under what conditions. Most people wouldn't just happen to have these four large and expensive loudspeakers in their living room just lying around.
They were done by a big German hi-fi magazine in the mid-2000s with a Head Acoustics artificial head in their listening room. They actually recorded many more loudspeakers, but these 4 were done in the same session with the same music, and they best matched the famous Harman blind test that included the B&W.
 

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,940
Location
Oslo, Norway
I don't think you understand what the paper demonstrated. Anyone can copy/paste excerpts from a paper and take it out of context.

The authors tested two different MICROPHONE CAPSULES, A and B. They played back, on headphones and loudspeakers, live music that was captured simultaneously by microphones A and B. They found that listener preferences for the recordings captured by microphones A and B were similar whether heard via headphones or loudspeakers. Therefore it's reasonable to conclude that differences in microphone quality can be identified similarly with headphones and speakers.

But by no means did this experiment demonstrate applicability to evaluating differences in recorded speakers. And to be clear once again, the authors of this paper didn't do anything close to what you're attempting to do here. Not even a little close. In my line of work, I would consider the paper you cited to be preliminary and hypothesis generating, but definitely not generalizable or applicable.

You are correct that the paper I posted here does not directly test binaural recordings of loudspeakers over headphones. But the authors do seem to think that their results carry some generalizability, including to loudspeakers reproduced through binaural recordings (edit: as they mention such studies in the scholarly context, and implicitly present their study as a test of an underlying assumption in these studies). And as @thewas said, this has actually been a common practice among psychoacoustic researchers, including heavyweights such as Floyd Toole (https://www.aes.org/e-lib/online/browse.cfm?elib=5537) and Sean Olive (https://www.aes.org/e-lib/browse.cfm?elib=7674). I haven't read the latter two papers in a while though.

So the assumption that binaural recordings of loudspeakers can tell us something about actual preferences for loudspeakers is not crazy. Do we know this with absolute certainty? No; psychoacoustics and loudspeaker research is a very small academic field. But it's not an unreasonable assumption, given the available evidence.

Some differences have also been shown in the literature though: One study demonstrated that listeners may prefer more reverberation with loudspeakers than with headphones. https://www.aes.org/e-lib/browse.cfm?elib=16787
I am therefore somewhat skeptical about deciding between wide and narrow directivity loudspeakers only by using binaural recordings, for example - even though we have no direct test of that. But given that wider directivity creates stronger reverberation/reflections, that seems like a reasonable assumption to me.
But for tonality, smoothness of dispersion, bass, general issues with loudspeakers etc, I think there is some evidence that binaural recordings of loudspeakers can serve their purpose, given the existing research.

edited for clarity
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,565
Likes
1,715
Location
California
Binaural recordings have been used in industry to replicate, compare and evaluate auditory scenes since the 80s; for example, the Stax Lambda Pro version with the higher polarisation voltage was created at the request of Mercedes-Benz, who needed higher SPLs to realistically replicate some events. Of course, as I write in the introduction of this thread, there are limitations to the method, for example in how well the spatial rendering can be replicated. On the other side, there is no doubt that tonal differences/problems, resonances and distortions will be captured and replicated, and as we know these count the most for the preference of a loudspeaker (as Toole says, if the frequency response is wrong, nothing else matters). Also, if you look at the responses of the ASR listeners who participated in this test, many correctly identified the presence dip of the B&Ws. And Sean Olive has shown a very good correlation of virtual headphone ratings to the ratings of real ones, which is why Harman uses virtual headphones for their preference comparisons: among other advantages, the headphones are not revealed, so they cannot bias listeners through their haptics/feel.
I hear you, and at the same time, nothing that you mentioned establishes the validity of attempting to obtain reliable preference ratings of loudspeakers by recording them and playing them back on listeners' personal headphones or speakers. And please don't invoke Harman and Sean Olive in a way that mistakenly implies that either have established or endorsed the validity of what you're trying to do.

Harman's virtual headphone method, since you bring it up, simply uses EQ of a known "replicator" headphone to simulate the FR curve of other headphones, which, once again, is NOT what you've done here.
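
For readers unfamiliar with the virtual headphone method described above: in magnitude terms the EQ is simply the simulated headphone's response minus the replicator's, applied to the replicator. A minimal sketch with hypothetical placeholder curves:

```python
# Minimal sketch of the "virtual headphone" idea: EQ a replicator
# headphone so its magnitude response matches the headphone being
# simulated. All curves are hypothetical placeholders.
import numpy as np

f          = np.array([20, 200, 1000, 3000, 8000, 16000])  # Hz
replicator = np.array([ 3,   1,    0,    5,   -2,    -7])  # dB (hypothetical)
simulated  = np.array([ 1,   0,    0,    3,   -4,   -10])  # dB (hypothetical)

virtual_eq_db = simulated - replicator   # what to dial into the replicator
print(dict(zip(f.tolist(), virtual_eq_db.tolist())))
# Listeners audition everything on the one physical headphone, so the
# model being rated is never revealed by its haptics or appearance.
```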
They were done by a big German hifi magazine in the mid 2000s with a Head Acoustics artificial head in their listening room, actually of many more loudspeakers but these 4 were done in the same session and with the same music and matched the famous Harman blind test with the B&W best.
Thank you. That still seems a little vague, and I'm not sure what to make of it. For instance, how was optimal placement of each loudspeaker assured, how was the toe-in set, etc.? Everybody and their mother knows that even small adjustments to location and toe-in make a dramatic difference in how a speaker sounds.
 
OP
thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,934
Likes
17,078
I hear you, and at the same time, nothing that you mentioned establishes the validity of attempting to obtain reliable preference ratings of loudspeakers by recording them and playing them back on a pair of headphones. And please don't invoke Harman and Sean Olive in a way that mistakenly implies that either have established or endorsed the validity of what you're trying to do.
Please answer my repeated question: why should a binaural recording not be able to replicate tonal, resonance and distortion issues? And why were many listeners here able to recognise the presence dip?

Harman's virtual headphone method, since you bring it up, simply uses EQ of a known "replicator" headphone to simulate the FR curve of other headphones, which, once again, is NOT what you've done here.
And why should that differ from an FR point of view? People have done comparisons of binaural recordings of headphones played back on the same headphones, and the audible differences can be very small.

Thank you. That still seems a little vague, and not sure what to make of it. For instance, how was optimal placement of each loudspeaker assured, how was the toe-in set, etc. Everybody and their mother knows that even small adjustments to location and toe-in make a dramatic difference in how a speaker sounds.
Well, they placed the Klipschorns in the corners, as they should, and the others free from the walls. I don't agree that small toe-in changes make dramatic differences, unless the loudspeaker is a mess in terms of directivity.

And of course there are, as said, limitations, as there also were for the Harman blind test, which gave similar results. On the other hand, you often write about the superiority of the B&W voicing, but to my knowledge there is not a single blind test which has shown it. B&W is a huge company; don't you think they would have presented one if they could? I did this test fully open to the results and was myself surprised how well they seem to correlate with the existing Harman research. As I wrote, I also hope there will be more and better such testing in the future; on the other hand, it's easy to (correctly) criticise existing research, but until there is something better, this is what we have. You say, for example, that you own both high-end B&Ws and Genelecs, so why don't you organise a simple blind test with a few ASR members close to you? If I had such speakers I would really love to do that, as I am honestly curious what the preference would be.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,565
Likes
1,715
Location
California
Please answer my repeated question: why should a binaural recording not be able to replicate tonal, resonance and distortion issues? And why were many listeners here able to recognise the presence dip?
Sure, for starters, these binaural recordings, which were made using an artificial head/ear, do not necessarily replicate the ear shape, ear spacing, head size, and shoulder reflection of the general population. So already what you're recording does not account for the variation in humans and what they would "hear" if they were listening live. Secondly, and more importantly, you didn't control for playback. You and I both know that there are wide differences in the frequency response curves of headphones (vs a hypothetical "neutral" target) and loudspeakers. Which means if you're trying to evaluate FR, but you're also introducing another device in the evaluation chain that, itself, has large variations in frequency response, how do you know your results aren't simply a function of the variation in your playback device? These are some basic experimental questions that should come immediately to mind.
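
To put the playback confound in concrete terms: in a linear chain, magnitude responses in dB add, so a listener's headphone deviation is superimposed on whatever each recording captured. A minimal sketch with hypothetical placeholder curves, showing both what survives and what shifts:

```python
# Minimal sketch of the playback chain: responses in dB add, so the
# headphone's deviation stacks on each recorded speaker response.
# All curves are hypothetical placeholders.
import numpy as np

f             = np.array([100, 1000, 3000, 8000])    # Hz
speaker_a     = np.array([0.0,  0.0, -3.0,  1.0])    # recorded speaker A, dB
speaker_b     = np.array([1.0,  0.0,  0.0, -1.0])    # recorded speaker B, dB
headphone_dev = np.array([2.0,  0.0,  4.0, -5.0])    # playback headphone vs neutral, dB

heard_a = speaker_a + headphone_dev
heard_b = speaker_b + headphone_dev
print("A-B difference:", heard_a - heard_b)  # unchanged by the headphone
print("absolute shift:", headphone_dev)      # the tonality every listener hears moves
```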

In one of the papers cited by @oivavoi, the experimenters used a specific set of IN-EAR headphones, of sufficient quality, that had been specifically EQ'd (probably to match the FR of the room for a perfect transfer function). Now THAT is a reasonable way to do it. But you didn't do that.
 
OP
thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,934
Likes
17,078
Sure, for starters, these binaural recordings, which were made using an artificial head/ear, do not necessarily replicate the ear shape, ear spacing, head size, and shoulder reflection of the general population. So already what you're recording does not account for the variation in humans and what they would "hear" if they were listening live.
The limitations of binaural heads and the related HRTFs, namely that they can only replicate an average head/pinna/HRTF, are known, but they are thankfully mostly relevant above 5 kHz, which is probably also why most people here could still clearly identify the presence dip.

In one of the papers cited by @oivavoi, the experimenters used a specific set of IN-EAR headphones, of sufficient quality, that had been specifically EQ'd (probably to match the FR of the room for a perfect transfer function).
The problem of the individual HRTF remains in that case; the issue is not the recording FR but the playback, although recently more and more methods exist to EQ headphones to individual HRTFs.
 

Pdxwayne

Major Contributor
Joined
Sep 15, 2020
Messages
3,219
Likes
1,172
The limitations of binaural heads and the related HRTFs, namely that they can only replicate an average head/pinna/HRTF, are known, but they are thankfully mostly relevant above 5 kHz, which is probably also why most people here could still clearly identify the presence dip.


The problem of the individual HRTF remains in that case; the issue is not the recording FR but the playback, although recently more and more methods exist to EQ headphones to individual HRTFs.
I didn't see "most people could clearly identify the presence dip". Who in this thread said this?

I for one did not say this.

And no, I didn't select B because of the Harman curve. B is shy on deep bass, so it doesn't really fit the curve anyway. I would have selected A or D if their bass had been cleaner in the room; I blame bad room placement. A has the deep bass that B lacks.
 

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,940
Location
Oslo, Norway
I didn't see "most people could clearly identify the presence dip". Who in this thread said this?

I for one did not say this.

And no, I didn't select B because of the Harman curve. B is shy on deep bass, so it doesn't really fit the curve anyway. I would have selected A or D if their bass had been cleaner in the room; I blame bad room placement. A has the deep bass that B lacks.

I think "most people could identify the presence dip" is worded too strongly. But several people commented on the midrange of B and thought it was elevated - whereas it turns out that it was actually speaker A and to a certain extent D (the main competitors) which had less energy in the mids and/or presence region.

Below is how I framed it. It's not completely precise, of course, but I do think I perceived some objective characteristics of these loudspeakers. I perceived A and B as being closest to each other, which turned out to be correct - those are the two conventional box speakers in the bunch. I thought D was more "airy" and "less direct" - these descriptors are not unheard of with electrostatic speakers (but I was wrong about it having wider dispersion; it rather has a different dispersion). I thought B had more midrange, and even though I didn't write it, this was mainly in comparison with A, given that I went back and forth between A and B after having ruled out C and D. Others had impressions of these speakers which turned out to be even more precise.
Ok, here are my impressions.

With the second track (Beethoven), however, I had a very clear preference for loudspeaker B. I assume it was easier for me to form a preference on this track because it is more intense and thus demands more of the loudspeaker. I won't claim that my listening provided me with a detailed technical analysis of what I was hearing, because it wasn't like that, and I didn't bother listening in detail once I formed a preference. In subjective terms, I perceived a higher presence of the music. I would guess that the mids may be somewhat elevated in this speaker compared to some of the others, but I may be wrong about that. The presentation also seemed more coherent in a way.

Second-best for me was loudspeaker A, which I thought was the one closest to B in overall sound.

C was the worst-sounding by far.

D was different - it sounded more airy, less direct. Bass was worse than B and A to my ears. I would guess that this is a loudspeaker with broad dispersion. I often like such speakers in a room, but reproduced through headphones it sounded strange to me. Or was it about a different frequency response, a more pronounced treble perhaps? I'm not sure.
 
OP
thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,934
Likes
17,078
I think "most people could identify the presence dip" is worded too strongly. But several people commented on the midrange of B and thought it was elevated - whereas it turns out that it was actually speaker A and to a certain extent D (the main competitors) which had less energy in the mids and/or presence region.
Exactly, here some more such examples:

Speaker A didn't have good bass. It was a little wooly and the midrange was unclear.

A could be a big BBC style monitor (Harbeth).

A: Almost neutral, but not as neutral as B, and maybe a little darker. Maybe Sonus Faber, Harbeth, Canton? I'm guessing it's shooting for neutral, and probably a cone-and-dome with no waveguide

A: B&W or alike

Given the few detailed replies in those few days, that was quite significant and positively surprising.
 