
Catalogue of blind tests

ahofer

I thought this thread on Head-Fi was a valiant effort to put together blind tests that have been performed over the years. Given that there have been many more (I'm thinking Archimago's tests, among others), it would be fun to open source a complete list.

https://www.head-fi.org/threads/testing-audiophile-claims-and-myths.486598/

(incidentally, the old Stereo Review amp test from January 1987 is now available here - https://americanradiohistory.com/Archive-HiFI-Stereo/80s/HiFi-Stereo-Review-1987-01.pdf )

A thread is a bad way to do it, because new tests get buried in the responses, and it's hard to keep the tests grouped by subject matter (cables, amps, single-blind, double-blind). I was wondering if there would be much appetite to do this in a wiki or other open-sourced form. I'd be willing to start. Or has it been done? I'd like to know I'm seeing everything.

On another forum, it was asserted to me today that

It’s not even remotely true that all tests "fail to reject the null hypothesis."

His example (the Carver challenge) didn't support his assertion, nor does the Stereophile amp test, which ended up showing the participants were biased to hear a difference between... the same amp. (https://www.stereophile.com/features/113/index.html) He hasn't provided another. Are there any? Am I missing some great volume of blind tests that show listeners can distinguish between adequately powered amps/cables without inserts/resolutions above mp3?

If you know of other good, or more current, lists, please do drop them here.
 
The most outrageous was probably the old rec.audio.high-end stuff between high-end retailer Steve Zipser and Tom Nousaine. It was written up in an issue of Peter Aczel's Audio Critic. I've posted it before, but here it is again:

On Sunday afternoon, August 25th, Maki and I arrived at Zipser's house, which is also Sunshine Stereo. Maki brought his own control unit, a Yamaha AX-700 100-watt integrated amplifier for the challenge. In a straight 10-trial hard-wired comparison, Zipser was only able to identify correctly 3 times out of 10 whether the Yamaha unit or his pair of Pass Laboratories Aleph 1.2 monoblock 200-watt amplifiers was powering his Duntech Marquis speakers. A Pass Labs preamplifier, Zip's personal wiring, and a full Audio Alchemy CD playback system completed the playback chain. No device except the Yamaha integrated amplifier was ever placed in the system. Maki inserted one or the other amplifier into the system and covered them with a thin black cloth to hide identities. Zipser used his own playback material and had as long as he wanted to decide which unit was driving the speakers.

I had matched the playback levels of the amplifiers to within 0.1 dB at 1 kHz, using the Yamaha balance and volume controls. Playback levels were adjusted with the system preamplifier by Zipser. I also determined that the two devices had frequency response differences of 0.4 dB at 16 kHz, but both were perfectly flat from 20 Hz to 8 kHz. In addition to me, Zipser, and Maki, one of Zip's friends, his wife, and another person unknown to me were sometimes in the room during the test, but no one was disruptive and conditions were perfectly quiet.

As far as I was concerned, the test was over. However, Zipser complained that he had stayed out late the night before and this reduced his sensitivity. At dinner, purchased by Zipser, we offered to give him another chance on Monday morning before our flight back North. On Monday at 9 a.m., I installed an ABX comparator in the system, complete with baling-wire lead to the Yamaha. Zipser improved his score to 5 out of 10. However, my switchpad did develop a hang-up problem, meaning that occasionally one had to verify the amplifier in the circuit with a visual confirmation of an LED. Zipser has claimed he scored better prior to the problem, but in fact he only scored 4 out of 6 before any difficulties occurred.

His wife also conducted a 16-trial ABX comparison, using a 30-second phrase of a particular CD for all the trials. In this sequence I sat next to her at the main listening position and performed all the amplifier switching functions according to her verbal commands. She scored 9 out of 16 correct. Later another of Zip's friends scored 4 out of 10 correct. All listening was done with single listeners.

In sum, no matter what you may have heard elsewhere, audio store owner Steve Zipser was unable to tell reliably, based on sound alone, when his $14,000 pair of class A monoblock amplifiers was replaced by a ten-year old Japanese integrated amplifier in his personal reference system, in his own listening room, using program material selected personally by him as being especially revealing of differences. He failed the test under hardwired no-switching conditions, as well as with a high-resolution fast-comparison switching mode.
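For anyone who wants to check how far those scores are from rejecting the "just guessing" null hypothesis, here is a minimal sketch in plain Python (standard library only; the labels are simply my shorthand for the trials described above) that computes the one-tailed binomial probability of scoring at least that well by chance:

```python
# Minimal sketch (standard-library Python) of the binomial arithmetic behind
# "unable to tell reliably": the one-tailed probability of getting at least
# k correct out of n forced-choice trials by pure guessing (p = 0.5 per trial).
from math import comb

def guessing_p_value(k: int, n: int) -> float:
    """Chance of k or more correct answers out of n coin-flip guesses."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Scores reported in the write-up above: (label, correct, trials)
for label, k, n in [("Zipser, hard-wired", 3, 10),
                    ("Zipser, ABX", 5, 10),
                    ("Zipser's wife, ABX", 9, 16),
                    ("Friend, ABX", 4, 10)]:
    print(f"{label}: {k}/{n} correct, p = {guessing_p_value(k, n):.2f}")
```

All four of those probabilities land far above the conventional 0.05 cut-off, which is the statistical way of saying that none of the listeners demonstrated an ability to tell the amplifiers apart.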
 
I remember following the Zipser challenge in 'real time' on that audio board. Fun days!
 
I think doing an ABX reveals a deeper truth. It is very common for objectivists to say all amplifiers, CD players etc sound the same, but I think that is a simplification, as I can accept there may be differences. If you do a double-blind test such as ABX there are really two questions: first, can you discern a difference, and second, does any difference you discern actually matter? My own view is that if you can identify different components in a double-blind test, but have to risk brain overheat in doing so and suffer anxiety at the thought of it all, then yes, there may be differences, but those differences are irrelevant in the context of using audio equipment to listen to music. Hence, in a way, two seemingly contradictory things can be true simultaneously: there may very well be discernible differences between amplifiers, DACs, CD players etc, but if they are well designed and implemented, and have not been set up for a particular sound to set them apart from the mainstream (such as the euphonic distortion in some tube amps), those differences will be so minor that in reality they are not discernible when listening to music rather than listening to equipment.
 

Yeah, more or less the way I feel. It's not just about "is there a difference", particularly when there's lots of $ involved. I mean, if you compare a $500 component against one costing $5000 and you have to struggle to hear a difference, and really they both sound great in a subjective musicality sense, the fact that you might actually be able to identify some barely perceptible difference is almost a moot point. In the real world of audio listening, there are so many variables that have a real impact on the quality of the sound we hear that minuscule little differences between say a $50 DAC and a $300 one (if they exist at all) are just going to be buried under ambient noise, room acoustics, headphone positioning, barometric pressure (lol), etc etc...
 
Yes indeed, struggling to hear a difference is meaningless: the difference is either relatively easy to hear or it does not exist.

A couple of years ago there was an article (editorial, I *think*) in an LA journal about how the reviewer/tester moved a glass of water on the table and was able to measure a difference of several dB in a notch at some frequency in the upper mids... The lesson is that moving one's head, or a glass, a few cm is enough to change what we hear slightly, so the differences in the equipment are either swamped by environmental listening effects or they don't exist.
 
I agree with the sentiments above. But I believe it remains true that nearly all the blind tests of amps/cables/digital sources fail to reject the null hypothesis that there is no audible difference. It would be good to put together a definitive compilation.

I think a Wiki might be the best way to do it, so new tests can be edited into the appropriate place as time goes by. It could be sorted by equipment type and methodology, and click through to a separate wiki page for each test, with results, links, and comments/criticism.
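To make that concrete, here is a purely illustrative sketch of what one catalogue entry could look like if the list were kept as structured data behind such a wiki. The field names are my own invention rather than any agreed format, and the example values simply point back to the Stereo Review scan linked in the opening post:

```python
# Illustrative only: one possible record format for a blind-test catalogue,
# sortable by equipment type and methodology as suggested above.
from dataclasses import dataclass, field

@dataclass
class BlindTestEntry:
    title: str                  # e.g. "Stereo Review amplifier comparison"
    year: int
    equipment_type: str         # "amplifier", "cable", "DAC", ...
    methodology: str            # "ABX", "single-blind", "double-blind", ...
    result: str                 # brief summary of the outcome
    links: list[str] = field(default_factory=list)
    notes: str = ""             # criticism, caveats, follow-up discussion

# Placeholder entry; details would need to be verified against the source.
example = BlindTestEntry(
    title="Stereo Review amplifier comparison",
    year=1987,
    equipment_type="amplifier",
    methodology="blind listening test",
    result="see the linked scan for the published outcome",
    links=["https://americanradiohistory.com/Archive-HiFI-Stereo/80s/HiFi-Stereo-Review-1987-01.pdf"],
)
```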
 
[..] there may very well be discernible differences between amplifiers, DACs, CD players etc, but if they are well designed and implemented [..] those differences will be so minor that in reality they are not discernible when listening to music rather than listening to equipment.

Agreed.

What so often gets lost in the subjectivist excuses for poor results in blind tests is that they go from claiming "the differences are SO OBVIOUS you'd have to have cloth ears not to hear them!" to, upon failing blind tests, excuses like "but the differences are so subtle they go away if I'm at all stressed about trying to detect them!"

It's either one or the other. They can't have it both ways.
 
What would be great would be to catalogue the actual tests here, like the one above. Many are hard to find.

Yes, and all the issues of Stereo Review are now available online.
Many discussion-board software packages have a wiki feature; does this one?
 
On the one hand, interesting and a good reference... on the other hand "OWWW MY EYES!" o_O

Certainly not a scientific analysis... but why does there seem to be an inverse relationship between the intellectual quality of the content and the aesthetics of its presentation? I swear there are so many brilliant engineers and scientists out there with web pages that look like they were designed by a 12-year-old girl for her MySpace page in 2004.
 
[..] My own view is that if you can identify different components in a double-blind test, but have to risk brain overheat in doing so and suffer anxiety at the thought of it all, then yes, there may be differences, but those differences are irrelevant in the context of using audio equipment to listen to music.
I second that. If the differences between two components are said to be "night and day", I expect a zero failure rate, that is 10 of 10 correct decisions in any DBT. If just one decision is wrong, the differences are so minuscule that they must be insignificant in real use, which is listening to music.

It happened to me when I did a BT between two Arcam Black Box 3 DACs (one in its original state, one modified). In the first sighted listening I noticed big differences; in the BT I scored 8 of 10 correct, which at least points towards a real difference (the chance of guessing that well is only about 5.5%, just outside the usual 5% threshold). However it was very difficult to hear any difference at all, so I concluded that the difference is so small as to be insignificant for real use. I therefore did not modify the second DAC.
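The same binomial arithmetic as in the sketch further up the thread also gives the "how many out of how many" threshold directly. A rough sketch, assuming a one-tailed test at the usual 0.05 level, of the minimum number of correct answers needed before a score stops looking like lucky guessing:

```python
# Sketch (standard-library Python): smallest k such that the chance of getting
# k or more correct out of n trials by pure guessing drops below alpha.
from math import comb

def min_correct_for_significance(n: int, alpha: float = 0.05) -> int:
    for k in range(n + 1):
        tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
        if tail < alpha:
            return k
    return n + 1  # even a perfect score would not reach significance

for n in (10, 16, 20):
    print(f"{n} trials: need at least {min_correct_for_significance(n)} correct")
```

For ten trials that threshold is 9 correct, which is why 8 out of 10 sits right on the edge of significance while 10 out of 10 is well beyond it.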
 
I find that with speakers the differences do tend to be quite clear, and set-up in the room makes a clear difference too (though that's not to say speakers I don't prefer are bad, or that I don't enjoy listening to music through them). Ditto for headphones. When it comes to the other parts of the chain, I find you need to stumble over one of the statistical outliers to find a component that really does introduce an obvious difference.
 

That's my experience as well, even with speakers of very similar response and components. With everything else, it's more a case of non-musical interference (noise, etc.) being the only difference, if there is one - whether that's down to a poorly made or failing component, or (much more commonly) to a component being "abused". Even with noisy amps, most of the time I can only tell the difference by listening to the noise coming from the tweeter with no signal playing (from a couple of inches away). I certainly can't tell any difference between the various DACs, optical players, cables, etc. - even in a sighted comparison... though I could probably make myself think I could if I listened expecting one.

It seems the most common application (in audio) of a DBT is in attempting to prove to snake-oil victims... that they are indeed victims of deception (either self deception or marketing & pseudo-science). However, I think the belief that this will yield significant results is unrealistic in many cases. The majority of people investing tons of time and money into esoteric gear of dubious value - or any other belief for that matter - are committed to some extent. For them there could never be a "perfect enough" testing setup - because part of the belief system revolves around the intangible. For the other major demographic... there wasn't a belief in the significance of esoteric methods or products in the first place.

So, in reality, the majority of DBTs in audio are for objectivists confirming selections likely chosen based on measurements and specifications - and for those who are convinced that enough evidence will somehow change the position of the first group. I think that's a pretty small minority in the total population of audio consumers and music enthusiasts. However, that's just my feeling and not a scientific analysis - so it could be as flawed as your average Shunyata product whitepaper. ;)
 

While I agree with that to an extent, I think the importance of DBTs relates more to a middle-ground group - which I'd put myself in. There was a time when I sort of bought into things like burn-in and esoteric cables. But I didn't do so to an extreme extent and was always a little suspicious of the benefits I thought I was hearing, and encountering the results of various tests that have been done moved the dial towards the objectivist end of the spectrum for me. I think more (and better) information is always a good thing. For the group of people who aren't entirely convinced one way or the other, the more evidence pointing to the truth the better...
 