When evaluating research, we cannot just look at the results and stop there. The methodology also has to make sense. At the most basic level, possible confounders must be controlled for - otherwise you can't be sure that the results actually demonstrate what you think they're demonstrating. In this case, while the results match what everybody was hoping for, it doesn't follow that the methodology was valid to begin with. In fact, the burden runs the other way: if the methodology is suspect, you cannot accept the results, no matter how much you want to believe that the experiment "proved" your hypothesis.

That's what I thought coming into this thread, but it appears that somehow we not only have an ability to "hear through the room", but also to "hear through the recording equipment and room".
The data is simply too clear. 80% picked the speaker that measured the best (the Revel), 0% of people picked the speaker that measured the worst (Klipsch), and a few people picked the speakers that measured decent, but not great (B&W, Quad).
Seems really unlikely that it's simply a coincidence. But hopefully we can start gathering more data here soon. I'd like to start repeating this test with more speakers to see whether this was just a coincidence or not.
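To put the "unlikely to be a coincidence" intuition in rough numbers, here is a minimal sketch under a purely hypothetical assumption of 10 listeners (the post doesn't say how many actually voted), treating each pick as a random choice among the four speakers:

```python
from math import comb

# Hypothetical illustration only: assume a panel of 10 listeners,
# so 80% picking the Revel corresponds to 8 votes.
n_listeners = 10
revel_votes = 8
p_random = 1 / 4  # four speakers, so a blind random pick lands on any one 25% of the time

def binom_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent random trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance that 8 or more of the 10 listeners pick the same pre-specified speaker purely at random
p_at_least = sum(binom_pmf(k, n_listeners, p_random) for k in range(revel_votes, n_listeners + 1))
print(f"P(>= {revel_votes}/{n_listeners} pick one speaker by chance) ≈ {p_at_least:.5f}")  # ~0.0004
```

That only addresses chance agreement, though; it says nothing about whether the test measured what it claims to measure.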
In this case, the methodology, which essentially has a listener "evaluate" the frequency response of 4 loudspeakers but lets that listener choose their own playback speakers or headphones, each with its own frequency response deviations, is horribly flawed. I get that the majority of intelligent and thoughtful members of ASR do not evaluate experimental research in their "day jobs", but there are people here who do.
There are just so many possible confounders I don't even know where to start.
The potential for an interaction between the transfer curve of the original loudspeaker and that of the playback transducer is large. Suppose, for the sake of argument, that a single -3 dB BBC dip turns out to be a preferable and desired characteristic, and that the playback speaker itself has a -3 dB BBC dip. Then a source loudspeaker without a BBC dip will sound preferable, whereas a source loudspeaker with a -3 dB BBC dip will sound like it has a big midrange suckout through that same playback speaker, because the two dips stack into an exaggerated -6 dB dip. This is just one example.
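To illustrate the stacking, here is a minimal sketch (with a made-up smooth dip shape and an assumed ~2.5 kHz center, since no exact dip is specified) showing how deviations expressed in dB simply add when one speaker's response is heard through another's:

```python
import numpy as np

# Hypothetical illustration: when a recording of one speaker is replayed over another,
# the two responses cascade, so their deviations (in dB) add along the chain.

freqs = np.logspace(np.log10(20), np.log10(20000), 500)  # 20 Hz - 20 kHz

def bbc_dip(freqs, depth_db=-3.0, center_hz=2500.0, width_octaves=1.0):
    """A made-up smooth dip of depth_db centered at center_hz (deviation in dB vs frequency)."""
    octaves_from_center = np.log2(freqs / center_hz)
    return depth_db * np.exp(-(octaves_from_center / width_octaves) ** 2)

source_flat = np.zeros_like(freqs)   # source speaker with no dip
source_dipped = bbc_dip(freqs)       # source speaker with a -3 dB dip
playback = bbc_dip(freqs)            # playback speaker also has a -3 dB dip

# What the listener actually hears is the sum of the deviations in dB.
heard_flat_source = source_flat + playback      # net -3 dB dip (the supposedly "preferred" amount)
heard_dipped_source = source_dipped + playback  # net -6 dB dip: an exaggerated suckout

print(f"Deepest point, flat source through dipped playback:   {heard_flat_source.min():.1f} dB")
print(f"Deepest point, dipped source through dipped playback: {heard_dipped_source.min():.1f} dB")
```

The same additive logic applies to any response deviation anywhere in the chain, which is why the uncontrolled choice of playback transducer cannot be waved away.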
Secondly, we know nothing about the placement of the speakers in that room. If it was a magazine-conducted test (read: not scientific in rigor or protocol), and they were testing a long list of speakers, you can be sure that little attention was paid to placing each speaker properly in that room. The differences heard in perceived playback quality could easily reflect differences in room placement rather than the speakers themselves.
Thirdly, we know that the room itself can affect the perception of loudspeaker quality and that different rooms can change the relative ranking of a loudspeaker. This is Harman's own research, btw.
And I could go on and on. When considering whether these are small vs large methodological flaws, my opinion is that they are large.
I get that the need to validate one's own beliefs is strong. While it is certainly possible that speaker B would be preferred in an actual live blind listening test, the experiment here doesn't necessarily demonstrate that, given its deal-breakingly poor methodology.