This is an abbreviated synopsis of the paper Ten Years of ABX Testing, by David Clark: http://www.aes.org/e-lib/browse.cfm?elib=5549
ABX testing, as you may know, is a double-blind testing methodology aimed at determining whether someone can reliably tell A from B. Both A and B are presented to the listener together with "X", which is either A or B. Your job is to say which one it is. This is a "forced choice" test in that you must select one of the two answers (unlike a preference test, where you could assign a fidelity score). Given this, you could get some answers right by guessing at random. Statistical analysis is therefore performed to determine how likely it is that pure guessing would have produced the number you got right.
Back to the paper: I am focusing on its second part, since the topic has come up and there are no references to it online that I know of. The first part is a summary of ABX testing, recounting the tests that have been performed.
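To make the statistics concrete, here is a minimal sketch of the usual calculation, assuming each trial is an independent 50/50 guess when the listener hears no difference. The function name `abx_p_value` and the 12-of-16 score are my own illustration, not taken from the paper.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` of `trials` right by guessing.

    Assumes every trial is an independent coin flip (p = 0.5), which is
    the null hypothesis that the listener cannot tell A from B.
    """
    # Count all outcomes with `correct` or more right answers,
    # then divide by the total number of equally likely outcomes.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


# Hypothetical score, for illustration only: 12 right out of 16 trials.
p = abx_p_value(12, 16)
print(f"12 of 16 correct: p = {p:.4f}")  # prints p = 0.0384
```

A result is conventionally called significant when this probability falls below 0.05, so 12 of 16 would pass while 11 of 16 (p of about 0.105) would not.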
Clark sets out to show the efficacy of ABX using two separate tests:
TESTING THE DOUBLE-BLIND TEST
The sensitivity of the A/B/X test can be tested by comparing it to a long-term listening session with infrequent switching and low stress. Audio magazine encouraged the present author and Lawrence L. Greenhill to undertake such a comparison in 1984. Unfortunately, the results were never published. The experiment used a fixed detection task of identifying whether or not the audio was passed through a nonlinear circuit which generated 2.5% total harmonic distortion on a sine wave. The nonlinearity used (called "Grunge") generated a constant distortion, independent of sine wave amplitude or frequency over a wide range. High amounts of the effect produce an annoying "garbled" sound on complex program material. The circuit is described in reference [1].
Two groups of audiophiles were used as subjects. Lawrence Greenhill's Long Island based, The Audiophile Society (TAS) provided the high-end oriented "golden ears." David Clark's Southeastern Michigan Woofer and Tweeter Marching Society (SMWTMS) provided the "engineers." Two sets of tests were to be run with each group. The first test was a group double-blind test of 16 trials comparing the 2.5% distorted signal to a bypass. As it turned out, the TAS group refused to have the signal passed through the relays and connectors of the ABX Comparator. A manually-patched 16-trial pair-comparison test was used instead. They listened to a very expensive sound system which was familiar to most of them. The SMWTMS group used A/B/X testing and an unfamiliar sound system and room. They were given a one-hour familiarization period before the test began.
The second of the tests consisted of ten battery powered black boxes, five of which had the distortion circuit and five of which did not. The sealed boxes appeared identical and were built to golden ear standards with gold connectors, silver solder and buss-bar bypass wiring. Precautions were taken to prevent accidental or casual identification of the distortion by using the on/off switch or by letting the battery run down. The boxes were handed out in a double-blind manner to at least 16 members of each group with instructions to patch them into the tape loop of their home preamplifier for as long as they needed to decide whether the box was neutral or not. This was an attempt to duplicate the long-term listening evaluation favored by golden ears.
So, to summarize: a distortion-generating device was created and handed out to two groups, who had to tell it apart from a dummy device that passed the audio signal through unmodified. Testers were allowed to take the equipment home and evaluate it on their own audio systems. Testing was done both with quick switching (ABX, or manual pair comparison for the TAS group) and with long-term evaluation using the "take home" boxes.
This was the outcome:
The results were that the Long Island group [Audiophile/Take Home Group] was unable to identify the distortion in either of their tests. SMWTMS's listeners also failed the "take home" test scoring 11 correct out of 18 which fails to be significant at the 5% confidence level. However, using the A/B/X test, the SMWTMS not only proved audibility of the distortion within 45 minutes, but they went on to correctly identify a lower amount. The A/B/X test was proven to be more sensitive than long-term listening for this task.
So the audiophile group failed to identify the distortion despite the gross amount of it inserted in the loop and the extended evaluation time they had. The ABX believer group, using a system they did not know, managed in short order to tell the distorted audio path from the clean one. Not only that, they repeated the feat by detecting an even smaller amount of distortion inserted in the path.
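The "11 correct out of 18" figure from the quote can be checked with a short sketch, again assuming independent 50/50 guessing under the null hypothesis. The helper names `guess_tail` and `min_correct_for_significance` are hypothetical, introduced only for this illustration.

```python
from math import comb


def guess_tail(correct, trials):
    """Probability of at least `correct` right answers from pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def min_correct_for_significance(trials, alpha=0.05):
    """Smallest score whose guessing probability is at or below `alpha`.

    Returns None when even a perfect score cannot reach `alpha`
    (which happens with very few trials).
    """
    for c in range(trials + 1):
        if guess_tail(c, trials) <= alpha:
            return c
    return None


# The SMWTMS take-home result quoted above: 11 correct out of 18.
print(f"p(11 of 18) = {guess_tail(11, 18):.4f}")              # prints 0.2403
print("needed out of 18:", min_correct_for_significance(18))  # prints 13
```

In other words, a guesser would score 11 or better out of 18 roughly one time in four, and the take-home group would have needed 13 correct to clear the 5% threshold.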
All of this matches my personal experience 100%. The ABX test group had the benefit of training and quick switching. Both of these improve the ability to hear small differences. I have passed countless ABX tests of small differences that I would easily fail if you made the test "long term."
The science of our hearing system backs this completely. Our hearing system has short-term and long-term recall. Short-term recall is almost like a tape recorder, capturing everything from our ears. That is a huge amount of data, so the brain applies a massive, lossy filter to what is in short-term memory and commits only what is left over to long-term memory.
Short-term memory lasts only seconds and is then overwritten. As such, you need to hear both stimuli within that short span and analyze them before they fade away. Waiting longer means relying on long-term memory, which cannot retain fine details.
Training helps by making better use of short-term memory: you learn to ignore what doesn't matter.
Summary
All in all, the position of audio science on this matter is clear: fast AB switching is far more revealing than any long-term test. No evidence has ever been presented to show otherwise, nor has anyone offered a psychoacoustic explanation of why long-term listening would be more revealing.