Again, thanks for the comments and ideas; we will definitely factor those into the next test we do.
But while the problems that have been pointed out are valid, they don't change the fundamental problem of explaining how I ended up with exactly the same sequence of 1, 2, 3 six times in a row. I would love for a true statistician on this thread to do a better job, but let me take a quick stab at it.
Hypothesis: It is unlikely, if not impossible, to discern audible differences between amplifiers whose bench test results are beyond the thresholds of audibility. In other words, the standard suite of tests that we perform (frequency response, noise, IMD, etc.) fully characterizes the audio response, such that two amplifiers (or DACs) with similar results should be indistinguishable. (Of course, we assume the listening test is performed in a linear response region, i.e. not clipping.)
Here are the real-world limitations in the testing that have been pointed out, and more importantly, how each could SKEW the results in a particular direction. Random skew doesn't matter: with a sufficient number of samples, it will average out.
1. Inaccurate level matching due to the use of an acoustic reference instead of an electrical one. Random skew factor, as this would affect each amplifier measurement (1/36) equally.
2. Acoustic memory limitations. Random skew factor as this would affect each amplifier measurement (1/36) equally.
3. Preamp impedance issues: This could be a systematic error that skews the results in a particular direction, but it seems highly unlikely with modern DACs. The D90SE has an output impedance of 100 Ω, while the Benchmark's input impedance is 50 kΩ and the Eval1's is 10.2 kΩ. It isn't clear why this would matter.
4. Clipping: None of the amplifiers were clipping during the listening selections; levels were all moderate, since our interest was in hearing differences, not in how loud they could get. This doesn't seem like a plausible reason to invalidate the results.
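To illustrate the claim above that random (non-systematic) error averages out rather than favoring any one unit, here is a minimal, hypothetical simulation; it is not the actual test protocol, just a sketch in which each amplifier's per-trial score differs only by a random error term. Over many trials, each amplifier "wins" about a third of the time:

```python
import random

# Hypothetical illustration: if the only differences between the three
# amplifiers in a trial come from a random error term, no amplifier is
# systematically favored over many trials.
random.seed(0)
trials = 100_000
wins = [0, 0, 0]
for _ in range(trials):
    # Identical true scores plus independent random measurement error.
    scores = [random.gauss(0.0, 1.0) for _ in range(3)]
    wins[scores.index(max(scores))] += 1

print([round(w / trials, 3) for w in wins])  # each fraction is near 1/3
```

A systematic error, by contrast, would shift one amplifier's score in the same direction every trial, and these win fractions would no longer be equal.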
So what is the probability of RANDOMLY picking the same order in all 6 tests? It is 1 in 46,656. The results therefore strongly favor an audible difference that can't be explained by chance. Even if only 3 of the 6 tests were considered valid because of random errors in the method, the probability would still be 1 in 216. That still doesn't favor the explanation that the results are random and the differences therefore inaudible.
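The arithmetic above can be checked in a few lines. With three amplifiers there are 3! = 6 possible rankings per trial, so under pure guessing the odds against one specific ranking repeating in n independent trials are 6^n to 1:

```python
from math import factorial

# Three amplifiers can be ranked in 3! = 6 ways per listening trial.
ORDERS = factorial(3)  # 6

def odds_against(n_trials: int) -> int:
    """'1 in X' odds that one specific ranking occurs in every one of
    n independent trials, assuming the listener is purely guessing."""
    return ORDERS ** n_trials

print(odds_against(6))  # 46656, i.e. 1 in 46,656
print(odds_against(3))  # 216, i.e. 1 in 216
```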
Here is what I would strongly recommend: why don't a few others repeat the tests and see what you get? There is nothing like actually using the scientific method and testing a hypothesis, versus theorizing about it. Remember, the basis for this hypothesis is that bench testing can measure anything we can discern audibly. How well have we tested that hypothesis? If anything, with the specs of every new DAC and amplifier approaching the limits of test equipment, testing it should be easier and easier.