
Catalogue of blind tests

Do partially-sighted speaker face-offs tally well with blind test results? If they can be shown to tally well, then we can trust many comparisons - anyone can do a face-off (and many do...), while blind tests are too tricky for most to attempt. Example:


Audio Musings by Sean Olive

A blog about the science of sound recording and reproduction
seanolive.blogspot.com

To quote from that page (emphasis mine):

"In summary, the sighted and blind loudspeaker listening tests in this study produced significantly different sound quality ratings. The psychological biases in the sighted tests were sufficiently strong that listeners were largely unresponsive to real changes in sound quality caused by acoustical interactions between the loudspeaker, its position in the room, and the program material. In other words, if you want to obtain an accurate and reliable measure of how the audio product truly sounds, the listening test must be done blind. It’s time the audio industry grow up and acknowledge this fact, if it wants to retain the trust and respect of consumers. It may already be too late according to Stereophile magazine founder, Gordon Holt, who lamented in a recent interview:
 
 

The perceptibility of high-resolution versus CD-standard audio has been the subject of research and debate since the introduction of hi-res audio distribution formats twenty years ago. The author conducted a large survey to determine whether experienced listeners could differentiate between a diverse set of twenty native high-resolution PCM stereo recordings and down-conversions of the same masters at 44.1 kHz/16-bit fidelity. Participants were encouraged to audition the files using their own systems, which ranged from modest, headphone-based personal setups to audiophile-quality rooms costing in excess of $50,000 to professional studio environments. They were not allowed to use analytical tools or other non-listening means to assist in their observations. Over 400 responses were received from professional audio engineers, experienced audiophiles, casual music enthusiasts, and novices aged eleven to eighty-one years. The online survey submissions show that high-resolution audio was undetectable by a substantial majority of the respondents regardless of experience level, equipment cost, or process, with almost 25% choosing "No Choice." However, some evidence exists that specific genres and recordings produced moderately higher positives.
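For anyone curious what such a down-conversion involves, here is a minimal sketch in Python (file names and the 96 kHz/24-bit source format are placeholders; the abstract doesn't specify the survey's actual conversion chain):

```python
# Sketch of a hi-res -> 44.1 kHz/16-bit down-conversion, assuming a
# 96 kHz/24-bit source. Resample, then requantize with TPDF dither.
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

data, fs = sf.read("master_96k24.wav")             # hypothetical hi-res master
assert fs == 96000
y = resample_poly(data, up=147, down=320, axis=0)  # 96000 * 147/320 = 44100
tpdf = (np.random.rand(*y.shape) - np.random.rand(*y.shape)) / 2**15  # +/-1 LSB
y16 = np.clip(np.round((y + tpdf) * 2**15), -32768, 32767).astype(np.int16)
sf.write("master_441k16.wav", y16, 44100, subtype="PCM_16")
```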
 
How many flaws in this blind test?

Have to wait until they post the full video of the actual testing to tell... but I can definitely say the premise is flawed, at the least. The test seems to be simply whether you can tell his cable from another - yet the conclusion is that, since you can... his cable sounds better. As soon as you start saying you can easily hear differences in "soundstage" and clarity based on speaker cables (not to mention anything from Shunyata Research being heralded), I've already lost interest.

Would be interesting to see if they could find a panel of 5-6 people that picked his cable out as "sounding the best" with 80% accuracy - I'll wait for that before buying. LOL! :rolleyes:
 
Well, hearing a difference is progress, for cables. I didn’t watch the whole thing - any tests or measurements? I mean, if the cables have unusual resistance or capacitance, this would be the expected result.
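The resistance point is easy to quantify. A rough check (values assumed for illustration): a cable with unusually high series resistance forms a voltage divider with the speaker load, and that alone can produce an audible level difference that listeners will happily report as "clarity" or "soundstage."

```python
# Back-of-envelope: level drop from cable series resistance alone.
import math

r_cable = 0.5      # ohms, hypothetical high-resistance "exotic" cable
r_speaker = 4.0    # ohms, nominal speaker impedance
loss_db = 20 * math.log10(r_speaker / (r_speaker + r_cable))
print(f"Level drop vs. an ideal cable: {loss_db:.2f} dB")  # about -1.0 dB
```

A level difference around 1 dB is comfortably audible, so a "night and day" cable result can follow from nothing more exotic than resistance.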
 
Another paper by Tom Nousaine: Can You Trust Your Ears?
CONCLUSIONS:
There are suggestions for listening test design and evaluation that can be drawn from this data. Likewise, several important considerations for consumer protection in purchase surroundings can be seen.

Listening Evaluation Design:

1. Blind or double blind listening techniques are an absolute must for final evaluation confirmation.
Listeners are strongly disposed to report differences that do not exist when they expect to hear them. This bias has been demonstrated consistently through analysis of 15 years of published blind testing and our test data shows the effect to be real and probably persistent across the general population.

2. Level matching is also an absolute must in subjective evaluations. The bias to hear differences is equally divided among first and second alternatives until loudness level differences are introduced. At that point preferences swing strongly to the louder alternative especially so when that choice comes as the second alternative of two.

3. Analysis of test results employing common scoring formats (Is A the Same or Different from B? and Do You Prefer A, B, or Have No Preference?) must have an equal number of trials where A and B are the same and where they are different if the typical significance test using the >50% criterion rule is used, or misinterpretation of results is likely.

For example, in Martin Colloms' 1986 HFN/RR Amplifier Comparison[3], the experimenter concluded that when subjects responded correctly 63% of the time in 150 trials, this statistically showed they were reliably able to identify amplifiers under blind conditions. However, careful examination reveals Colloms based this conclusion on only those trials where the amplifiers were actually different.
We also find that during those trials where the amplifiers were the same, subjects correctly identified B as the same only 65% of the time. Thus Colloms should have adjusted his criterion score to include those trials where A and B were the same, as though the number of trials had been identical to when they were different. When this is done, his results reverse when subjected to the 50%-correct-at-95%-confidence criterion. Alternately, a comparison of the two correct identification rates (63% when they were different versus 65% when they were the same) leads one to the same conclusion: not significant.
This same criticism applies to the 1990 Stereophile CD Tweaks[4] report, where the experimenters employed an unequal number of trials where A and B were the same versus different and falsely concluded they had statistically significant results. Interestingly, data presented in that report includes several subanalyses that show a strong response bias to report differences when choices are identical.

4. The ABX Double Blind[5] method automatically guards against even inadvertently biased experiments caused by the problems outlined in item 3.

Consumer Purchase Advice:
Purchase decisions are usually made with a salesperson, product reviewer, or even a well-intentioned friend serving as "coach." The data (people are inclined to hear differences even if they don't exist; small loudness differences will be interpreted as quality differences; order magnifies loudness) suggest several shopping tips:

* During an A-B comparison, try to take control of the volume.
* Recognize that even if you control the level you may not be able to assure equal loudness.
* Extend A/B tests to A/B/A and be suspicious of tests where A cannot be repeated (e.g., disc treatments).
* Remember that extensive blind testing strongly suggests that many components (amplifiers, wires[6], and CD tweaks[7], for example) cannot be reliably identified under blind conditions.
* If you are ever “in doubt” during a listening evaluation, even for a moment, wait until tomorrow to make your decision.
* Your Coach hears things too!
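For anyone who wants to check the arithmetic in item 3, here is a minimal sketch (Python/SciPy) of the two calculations Nousaine describes. The excerpt gives 150 "different" trials at 63% correct and a 65% correct rate on "same" trials, but not the "same" trial count, so an equal count is assumed:

```python
# The two analyses from item 3, with assumed trial counts where the
# excerpt omits them.
import math
from scipy.stats import binomtest, norm

# Colloms' analysis: 63% correct over the 150 trials where the amps
# actually differed, tested against 50% chance. Alone, it looks
# highly significant.
k_diff, n_diff = 94, 150                 # 94/150 = 62.7% (~63%)
p_naive = binomtest(k_diff, n_diff, p=0.5, alternative="greater").pvalue
print(f"'Different' trials only: p = {p_naive:.4f}")

# Nousaine's alternative reading: compare the two correct-identification
# rates (63% on "different" trials vs. 65% on "same" trials) directly.
# A two-proportion z-test finds no significant difference between them.
p1, n1 = 0.63, 150
p2, n2 = 0.65, 150                       # "same" trial count assumed equal
p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
print(f"63% vs 65%: z = {z:.2f}, p = {2 * norm.sf(abs(z)):.2f}")  # ~0.72
```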
 
The most outrageous was probably the old rec.audio.high-end stuff between high-end retailer Steve Zipser and Tom Nousaine. It was written up in an issue of Peter Aczel's Audio Critic. I've posted it before, but here it is again:

On Sunday afternoon, August 25th, Maki and I arrived at Zipser's house, which is also Sunshine Stereo. Maki brought his own control unit, a Yamaha AX-700 100-watt integrated amplifier, for the challenge. In a straight 10-trial hard-wired comparison, Zipser was able to correctly identify only 3 times out of 10 whether the Yamaha unit or his pair of Pass Laboratories Aleph 1.2 monoblock 200-watt amplifiers was powering his Duntech Marquis speakers. A Pass Labs preamplifier, Zip's personal wiring, and a full Audio Alchemy CD playback system completed the playback chain. No device except the Yamaha integrated amplifier was ever placed in the system. Maki inserted one or the other amplifier into the system and covered them with a thin black cloth to hide their identities. Zipser used his own playback material and had as long as he wanted to decide which unit was driving the speakers.

I had matched the playback levels of the amplifiers to within 0.1 dB at 1 kHz, using the Yamaha balance and volume controls. Playback levels were adjusted with the system preamplifier by Zipser. I also determined that the two devices had frequency response differences of 0.4 dB at 16 kHz, but both were perfectly flat from 20 Hz to 8 kHz. In addition to me, Zipser, and Maki, one of Zip's friends, his wife, and another person unknown to me were sometimes in the room during the test, but no one was disruptive and conditions were perfectly quiet.
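The 0.1 dB matching figure is just a voltage-ratio measurement at the speaker terminals. A tiny worked example (voltages hypothetical, not from the article):

```python
# Level matching: measure each amp's output with the same 1 kHz tone
# and trim gain until the dB difference is inside the target window.
import math

v_amp_a = 2.83   # volts RMS, Pass monoblocks (hypothetical reading)
v_amp_b = 2.80   # volts RMS, Yamaha after trimming its volume control
diff_db = 20 * math.log10(v_amp_a / v_amp_b)
print(f"Level difference: {diff_db:.3f} dB")  # ~0.09 dB, within 0.1 dB
```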

As far as I was concerned, the test was over. However, Zipser complained that he had stayed out late the night before and this reduced his sensitivity. At dinner, purchased by Zipser, we offered to give him another chance on Monday morning before our flight back North. On Monday at 9 a.m., I installed an ABX comparator in the system, complete with baling-wire lead to the Yamaha. Zipser improved his score to 5 out of 10. However, my switchpad did develop a hang-up problem, meaning that occasionally one had to verify the amplifier in the circuit with a visual confirmation of an LED. Zipser has claimed he scored better prior to the problem, but in fact he only scored 4 out of 6 before any difficulties occurred.

His wife also conducted a 16-trial ABX comparison, using a 30-second phrase of a particular CD for all the trials. In this sequence I sat next to her at the main listening position and performed all the amplifier switching functions according to her verbal commands. She scored 9 out of 16 correct. Later another of Zip's friends scored 4 out of 10 correct. All listening was done with single listeners.

In sum, no matter what you may have heard elsewhere, audio store owner Steve Zipser was unable to tell reliably, based on sound alone, when his $14,000 pair of class A monoblock amplifiers was replaced by a ten-year old Japanese integrated amplifier in his personal reference system, in his own listening room, using program material selected personally by him as being especially revealing of differences. He failed the test under hardwired no-switching conditions, as well as with a high-resolution fast-comparison switching mode.
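For context, none of those scores comes anywhere near statistical significance; a quick binomial check (standard math, not from the article) makes the point:

```python
# How surprising are the reported scores if everyone was just guessing?
from scipy.stats import binomtest

for label, k, n in [("Zipser, hardwired", 3, 10),
                    ("Zipser, ABX", 5, 10),
                    ("Wife, ABX", 9, 16),
                    ("Friend, ABX", 4, 10)]:
    p = binomtest(k, n, p=0.5, alternative="greater").pvalue
    print(f"{label}: {k}/{n} correct, p = {p:.2f}")
# All p-values are far above 0.05: every score is consistent with chance.
```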
Steve Zipser and his wife visited with me in Colorado. After all the battles royale we had fought online, I was pleasantly surprised to find Steve and his wife were lovely people. Steve also said he didn't really doubt that DBTs yielded important results.

I was quite bummed when Steve passed shortly after that night.
 
I found this on AVSForum. They compared the $500,000 TechDas Air Force Zero turntable to a $78 (in 1961; ~$600 today) AR-XA. They compared vinyl rips, with methodology similar to Archimago's. The cartridges used were priced at $13,000 vs. $99, and the A/D converters at $2,500 vs. $80. The estimated total system costs were $720,056 vs. $561 (or $1,083 if you price the AR at $600). The TechDas weighs ~834 pounds (379 kg) without arm(s), the AR ~14 pounds (6.5 kg) with arm. The photo below is approximately to scale.


$500,000 vs. $78 Turntable Blind Test




Turntable A was not the TechDas...

The post there refers to Michael Fremer's less exacting blind tests with the TechDas. The first half of the comments relate to the test, and then it becomes more of an analog vs. digital argument. The person who set up the test and responds to comments does a nice job of trying to fight the insanity. :cool:
 

posted first here:



This is yet another good example of the tendency to hear a difference where there is none, which means you have to be careful with A/B tests where people "hear a difference," versus ABX tests, which provide more control for that bias.
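The reason ABX controls that bias: X is randomly reassigned every trial, so a listener who merely imagines differences converges on 50% correct. A toy simulation (all names hypothetical):

```python
# Minimal ABX simulation: with no real audible difference, scores
# cluster around chance no matter how strongly differences are "heard."
import random

def run_abx(n_trials, listener):
    """Count correct identifications of the hidden X."""
    correct = 0
    for _ in range(n_trials):
        x = random.choice(["A", "B"])  # hidden assignment for this trial
        correct += (listener(x) == x)  # listener returns its guess, "A" or "B"
    return correct

# A listener with no genuine discrimination ability guesses at random:
guesser = lambda x: random.choice(["A", "B"])
print(run_abx(16, guesser), "of 16 correct")  # typically around 8
```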
 

Audio Musings by Sean Olive

A blog about the science of sound recording and reproduction
seanolive.blogspot.com

To quote from that page (emphasis mine):

"In summary, the sighted and blind loudspeaker listening tests in this study produced significantly different sound quality ratings. The psychological biases in the sighted tests were sufficiently strong that listeners were largely unresponsive to real changes in sound quality caused by acoustical interactions between the loudspeaker, its position in the room, and the program material. In other words, if you want to obtain an accurate and reliable measure of how the audio product truly sounds, the listening test must be done blind. It’s time the audio industry grow up and acknowledge this fact, if it wants to retain the trust and respect of consumers. It may already be too late according to Stereophile magazine founder, Gordon Holt, who lamented in a recent interview:
Interesting conclusion. How exactly does the data he cites support it? Looks to me like 1. There’s a woefully small amount of data to draw any conclusions and 2. The results in rankings between the blind and sighted tests were mostly the same. I’m not saying blind protocols aren’t important. But I don’t see how his data supports the extreme conclusion that the listeners were largely unresponsive to real changes in sound quality. So out of four speakers first and second place were just coincidences?
 
Interesting conclusion. How exactly does the data he cites support it? Looks to me like 1. There’s a woefully small amount of data to draw any conclusions and 2. The results in rankings between the blind and sighted tests were mostly the same. I’m not saying blind protocols aren’t important. But I don’t see how his data supports the extreme conclusion that the listeners were largely unresponsive to real changes in sound quality. So out of four speakers first and second place were just coincidences?
The data he cites is a book and various AES papers that are behind a paywall. Did you read the book or download the papers?
 