
Amplifier Bakeoff: Purifi Eval1, McIntosh MA252 & Benchmark AHB2

OP

PJ2000

Member
Joined
Aug 19, 2021
Messages
26
Likes
86
Again, thanks for the comments and ideas; we will definitely factor those into the next test.

But while the problems that have been pointed out are valid, that doesn't change the fundamental problem of explaining how I ended up with exactly the same sequence of 1,2,3 six times in a row. I would love for a true statistician on this thread to do a better job, but let me take a quick stab at it.

Hypothesis: It is unlikely, if not impossible, to discern audible differences between amplifiers whose bench-test results are beyond audibility thresholds. In other words, the standard suite of tests we perform (frequency response, noise, IMD, etc.) fully characterizes the audio response, such that two amplifiers (or DACs) with similar results should be indistinguishable. (Of course, we assume the listening test is performed in the linear response region, i.e. not clipping.)

Below are the real-world limitations in the testing that have been pointed out and, more importantly, how each could SKEW the results in a particular direction. Random skew doesn't matter, since with a sufficient number of samples it will average out.

1. Inaccurate level measurements due to the use of an acoustic reference instead of an electrical one. Random skew factor, as this would affect each amplifier measurement (1/36) equally.

2. Acoustic memory limitations. Random skew factor, as this would affect each amplifier measurement (1/36) equally.

3. Preamp impedance issues: this could be a systematic error that skews in a particular direction, but it seems highly unlikely in today's world of modern DACs. The D90SE has an output impedance of 100 ohms, while the Benchmark's input impedance is 50 kΩ and the Eval1's is 10.2 kΩ; it isn't clear why this would matter (see the quick calculation after this list).

4. Clipping: none of the amplifiers were clipping during the listening selections, as all levels were moderate; our interest was in hearing differences, not in how loud the amps could get. That doesn't seem to be a very plausible reason to invalidate the results.
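As a quick sanity check on point 3, here is the back-of-the-envelope loading calculation (a minimal sketch; the impedances are the spec figures quoted above, treated as constant across frequency):

Code:
# Level lost to the voltage divider formed by the DAC's output impedance
# and each amplifier's input impedance.
import math

z_out = 100.0  # D90SE output impedance, ohms
for name, z_in in [("Benchmark AHB2", 50_000.0), ("Purifi Eval1", 10_200.0)]:
    loss_db = 20 * math.log10(z_in / (z_in + z_out))
    print(f"{name}: {loss_db:+.3f} dB")

# Benchmark AHB2: -0.017 dB
# Purifi Eval1:   -0.085 dB
# The ~0.07 dB difference between the two is well below the ~0.1 dB
# level-matching tolerance usually recommended for blind tests.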

So what is the probability of RANDOMLY producing the same order in all 6 tests? For a prespecified order it is 1 in 46,656 (6^6); even if we only require that the six rankings agree with one another, it is still 1 in 7,776. In this case the results strongly favor an audible difference that can't be explained by chance. Even if errors in the method meant only 3 of the 6 tests were valid, the odds would still be 1 in 216. That still doesn't favor the explanation that the result is random and the differences therefore inaudible.
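For anyone who wants to check that arithmetic, a small script (my own sketch, nothing more):

Code:
# Chance of one listener producing the same ranking of 3 amplifiers
# in every one of n independent trials, if rankings were purely random.
from math import factorial

orders = factorial(3)  # 6 possible rankings per trial

for n in (3, 6):
    prespecified = orders ** n             # must match a ranking fixed in advance
    merely_consistent = orders ** (n - 1)  # trials 2..n must match trial 1
    print(f"n={n}: 1 in {prespecified:,} (prespecified), "
          f"1 in {merely_consistent:,} (merely consistent)")

# n=3: 1 in 216 (prespecified), 1 in 36 (merely consistent)
# n=6: 1 in 46,656 (prespecified), 1 in 7,776 (merely consistent)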

Here is what I would strongly recommend: why don't a few others repeat the tests and see what you get? There is nothing like actually using the scientific method and testing a hypothesis versus theorizing about it. Remember, the basis for this hypothesis is that bench testing can measure anything we can discern audibly. How well have we tested that hypothesis? If anything, with the specs of every new DAC and amplifier approaching the limits of test equipment, testing this hypothesis should be getting easier and easier.
 

tmtomh

Major Contributor
Forum Donor
Joined
Aug 14, 2018
Messages
2,634
Likes
7,483
No, we had to use it in its highest gain mode to get as close to parity as possible across the amplifiers.

Aside from its price, this is my one issue with the Benchmark. I know it's not an issue with an active preamp, and probably not an issue with a balanced (and therefore 4 V instead of 2 V) line-level source going directly into it. But I still feel that some of the Benchmark's measured advantage over Purifi and Hypex amps (not to mention other very good amps that meet the full 29 dB THX gain mark) comes from the reduced gain. We've seen with a variety of products Amir has tested that reduced gain is sometimes a way to squeeze another few dB of SINAD out of a design.
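As a toy illustration of that effect (the noise figure is an arbitrary assumption, not Benchmark's actual spec): with noise dominated by the input stage, output noise scales with gain, so at a fixed rated output the SNR, and with it SINAD, improves dB-for-dB as gain is reduced.

Code:
import math

e_n = 5e-6     # assumed input-referred noise, volts RMS (illustrative only)
v_out = 28.28  # rated output, volts RMS (100 W into 8 ohms)

for gain_db in (29.0, 23.0, 17.0):
    gain = 10 ** (gain_db / 20)
    snr_db = 20 * math.log10(v_out / (e_n * gain))  # noise referred to output
    print(f"gain {gain_db:4.1f} dB -> SNR {snr_db:5.1f} dB at rated power")

# gain 29.0 dB -> SNR 106.0 dB at rated power
# gain 23.0 dB -> SNR 112.0 dB at rated power
# gain 17.0 dB -> SNR 118.0 dB at rated power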
 

pogo

Major Contributor
Joined
Sep 4, 2020
Messages
1,239
Likes
382
What I noticed, and what distinguished the best (Purifi) from the worst (Benchmark) in my case, was the following: the Purifi had noticeably better clarity, detail and high end compared to the other two, especially the Benchmark. For me the differences were especially easy to hear with complex stringed instruments, like a strummed acoustic guitar or a cello, and with cymbals and triangles.
One explanation could be the damping factor.
The Purifi has a high DF, the McIntosh a moderate wideband DF, and the Benchmark a low DF that falls across this range.
According to soundstagenetwork measurements of amplifier damping factor, not bridged (ref. 8 ohms, 20 Hz – 6.5 kHz):
NAD C298 (Purifi) -> min. 1500
Benchmark AHB2 -> min. 65

A counter-test could be done with a T+A A200, which allows a switchable DF on its Purifi modules:
Link

'A high damping factor tends to produce a more clearly defined, very precise and analytical sound image, whereas a reduced damping factor produces a warmer and softer sound image.' <- extract from the manual
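To put those DF numbers into perspective, a rough sketch of the frequency-response error an amplifier's output impedance creates into a speaker whose impedance swings with frequency (the speaker impedance values below are invented, though typical for a multi-way box):

Code:
# Damping factor DF = Z_load / Z_out, so Z_out = 8 / DF (ref. 8 ohms).
# Z_out forms a divider with the speaker, turning the speaker's
# impedance curve into a frequency-response error.
import math

speaker_z = [3.5, 8.0, 30.0]  # assumed impedance dip, nominal, peak (ohms)

for name, df in [("NAD C298 (Purifi)", 1500), ("Benchmark AHB2", 65)]:
    z_out = 8.0 / df
    levels = [20 * math.log10(z / (z + z_out)) for z in speaker_z]
    ripple = max(levels) - min(levels)
    print(f"{name}: Z_out ~{z_out:.3f} ohm, response ripple ~{ripple:.2f} dB")

# NAD C298 (Purifi): Z_out ~0.005 ohm, response ripple ~0.01 dB
# Benchmark AHB2:    Z_out ~0.123 ohm, response ripple ~0.26 dB

Whether ~0.26 dB of impedance-tracking ripple is audible is of course exactly the open question, but it is more than an order of magnitude larger than the Purifi's figure.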
 

peng

Master Contributor
Forum Donor
Joined
May 12, 2019
Messages
5,614
Likes
5,167
Here is what I would strongly recommend: why don't a few others repeat the tests and see what you get? There is nothing like actually using the scientific method and testing a hypothesis versus theorizing about it. Remember, the basis for this hypothesis is that bench testing can measure anything we can discern audibly. How well have we tested that hypothesis? If anything, with the specs of every new DAC and amplifier approaching the limits of test equipment, testing this hypothesis should be getting easier and easier.
Sounds fair, but logically there is an issue if specs and measurements cannot do the job, because then we have to ask Bruno whether he designed his amp based on facts, data and theory, and then used listening tests to fine-tune his final design. If not, could it be that your Purifi amp's superiority in this case wasn't entirely by design, but also by luck or randomness?

I doubt most amp designers would do that kind of thing, except maybe a few. I can think of at least one who did not: Mr. Peter Walker. Denon/Marantz, on the other hand, claim they use their "Sound Masters", though they never said whether those masters had the authority to tell the engineers/designers to keep tweaking things such as resistor/capacitor values, bias, feedback, etc.

If the design was finalized based on listening-test feedback, then how should one pick, not knowing whether the designer(s) share one's taste? If in fact the best-sounding amp is the most transparent one, then you would have picked the AHB2. If not, which part of the available measurements was missed? And if we knew that, then every manufacturer should be able to design/build an amp as transparent as it possibly could be, enough that people with discerning hearing like you could not tell them apart.

Again, I am only talking about logic here, not saying your test is not valid.

So if all this is true, that specs and measurements cannot predict which amp would win such a test, then the end result for those of us shopping for the "best sounding" or most "transparent/accurate" power amp would be randomly determined. Again, back to logic: if your test is totally valid (I know you are saying that), then we all should go and build a Purifi amp like yours and save some money. Then the question is, how would we know whether we could save even more money by going with an NC502MP, or just another $600 amp?

Regardless, I really appreciate your efforts in this, but it would be great if you were willing to invest more time in another test, incorporating some of the suggested improvements. I would also suggest increasing the trials to 20, or at least doubling them to 12. You are right that 6 trials on its own is not much, but combined with the exact 1,2,3 order it does look compelling, so I believe the key to the puzzle is still the methodology, which likely needs to be improved.
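On the trial count, a quick sketch of how the odds of a lucky run shrink as trials are added, including runs where a miss or two is allowed (binomial tail; purely illustrative):

Code:
# If each trial is an independent 1-in-6 guess at the full ranking,
# how surprising is getting at least `hits` trials right out of n?
from math import comb

p = 1 / 6  # chance of guessing one trial's ranking correctly

def p_at_least(n, hits):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(hits, n + 1))

for n, hits in [(6, 6), (12, 11), (20, 18)]:
    print(f"{hits}/{n} correct: p = {p_at_least(n, hits):.2e}")

# 6/6 correct:   p = 2.14e-05
# 11/12 correct: p = 2.80e-08
# 18/20 correct: p = 1.33e-12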

Oh, about your question: Why don't a few others repeat the tests and see what you get?

I think my answer is, as I alluded to earlier in my point about the logic: it would be sad if, in order to pick the best-sounding amp, one had to run such a test, and one that would need to include a lot more than those 3 amps. We really should be able to pick based on specs and measurements alone; then it is just a matter of identifying all the specs and measurements needed to predict reliably.

Or, if accuracy matters less for sound quality than the perceived sound quality in such a listening comparison, then your pick of the Purifi would be valid only for you, because the choice would then rest not on accuracy but on "preference", which is subjective in nature.
 

BelgianJoey

Member
Joined
Jun 16, 2021
Messages
30
Likes
67
PJ2000 said:
But while the problems that have been pointed out are valid, that doesn't change the fundamental problem of explaining how I ended up with exactly the same sequence of 1,2,3 six times in a row. […] Here is what I would strongly recommend: why don't a few others repeat the tests and see what you get? […]
Maybe the acoustical calibration procedure had a non-random effect if the Benchmark was clipping or struggling to reach the 110 dB 1 kHz tone? Which none of you could detect because your ears were covered?
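For what it's worth, a rough power estimate for such a calibration tone (every number here is an assumption for illustration; substitute the actual sensitivity and distance):

Code:
# SPL = sensitivity + 10*log10(P) - 20*log10(d), one speaker, free field.
import math

sensitivity = 88.0  # assumed dB SPL @ 2.83 V / 1 m
distance = 3.0      # assumed listening distance, metres
target_spl = 110.0

watts = 10 ** ((target_spl - sensitivity + 20 * math.log10(distance)) / 10)
print(f"~{watts:.0f} W per speaker, anechoic")  # ~1426 W

# Room gain and a second speaker can knock 6-10 dB off this, but even
# ~140-360 W is past the AHB2's 100 W / 8 ohm stereo rating, so a briefly
# clipped calibration tone is at least plausible.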
 

pogo

Major Contributor
Joined
Sep 4, 2020
Messages
1,239
Likes
382
Perhaps the explanation is also much simpler: as it seems, the ranking matches the damping factors.
I have been able to reproduce this observation in my setup for 30 years, i.e. a low DF leads to exactly these sound impressions.
The Magicos are really excellent speakers, will definitely not be a bottleneck, and are a very good choice for such a test.
 

xyvyx

Member
Forum Donor
Joined
Jan 14, 2019
Messages
36
Likes
43
Perhaps the explanation is also much simpler: as it seems, the ranking matches the damping factors.
I have been able to reproduce this observation in my setup for 30 years, i.e. a low DF leads to exactly these sound impressions.
The Magicos are really excellent speakers, will definitely not be a bottleneck, and are a very good choice for such a test.
Yup... and last I heard, the amp testing we see here isn't done with a reactive load simulator. Even if he did use one, it's still just a simulator that doesn't necessarily represent the load of unusual speakers. I don't know where the likes of Magico and Magnepan fall on the "easy to drive" scale, but I'm pretty sure they sit at the hard end. A good damping factor generally means the amp is able to bend the cone to the will of the source signal, so if that was the relevant factor here, it makes sense to me!

From:
https://audiosciencereview.com/foru...power-spec-distortion-cut-off.2462/post-70728
Hey Amir, when ya gonna save up enough to buy the active load box? :)
I got sticker shock when I looked at how much Audio Precision charges for it a while back. :)
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,521
Likes
37,050
Yup... and last I heard, the amp testing we see here isn't done with a reactive load simulator. Even if he did use one, it's still just a simulator that doesn't necessarily represent the load of unusual speakers. I don't know where the likes of Magico and Magnepan fall on the "easy to drive" scale, but I'm pretty sure they sit at the hard end. A good damping factor generally means the amp is able to bend the cone to the will of the source signal, so if that was the relevant factor here, it makes sense to me!

From:
https://audiosciencereview.com/foru...power-spec-distortion-cut-off.2462/post-70728
Maggies require a bit of power, but aren't as hard to drive as some speakers with highly variable impedance. Maggies are almost a purely resistive load. Box speakers can have reactive impedance that is much more difficult to supply current into than is the case with Maggies.
 

peng

Master Contributor
Forum Donor
Joined
May 12, 2019
Messages
5,614
Likes
5,167
Maggies require a bit of power, but aren't as hard to drive as some speakers with highly variable impedance. Maggies are almost a purely resistive load. Box speakers can have reactive impedance that is much more difficult to supply current into than is the case with Maggies.

Highly reactive impedance is harder to drive in terms of heat dissipation in the output stage, but it doesn't sound right to me to say it is much more difficult to supply current. Current still follows V/|Z|, where |Z| is the absolute value (modulus) of the impedance. So for the same voltage, 4 ohms is 4 ohms whether it is resistance or impedance, and the current would be the same; only the phase angle differs, depending on the real and imaginary parts of the impedance. Sorry if I come across as splitting hairs, I don't mean to.
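A small numeric sketch of exactly this point, using an idealized class-B output stage with made-up ±40 V rails (no specific amplifier implied):

Code:
# Same |Z|, same current magnitude -- but with a phase angle, current
# flows while the output voltage is low, so the output devices heat up more.
import numpy as np

vs = 40.0             # assumed supply rail, volts
v_pk, z = 28.28, 4.0  # ~100 W into 4 ohms

t = np.linspace(0.0, 1.0, 200_001)  # one cycle, normalized time
v = v_pk * np.sin(2 * np.pi * t)    # output voltage

for deg in (0, 45):
    i = (v_pk / z) * np.sin(2 * np.pi * t - np.radians(deg))  # |I| unchanged
    p_load = np.mean(v * i)
    # upper device conducts when sourcing current, lower when sinking
    p_devices = np.mean(np.where(i > 0, (vs - v) * i, (v + vs) * -i))
    print(f"phase {deg:2d} deg: load {p_load:5.1f} W, "
          f"output-stage heat {p_devices:5.1f} W")

# phase  0 deg: load 100.0 W, output-stage heat  80.1 W
# phase 45 deg: load  70.7 W, output-stage heat 109.3 W

Same current magnitude and the same draw from the supply; only the split between power delivered to the load and heat left in the output devices changes.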
 

bravomail

Addicted to Fun and Learning
Joined
Oct 19, 2018
Messages
817
Likes
461
I must add it was loads of fun, and I would highly recommend comparing the Benchmark and the Purifi yourself: in theory there should be no real difference you should be able to detect, so see if you can.
It's a puzzling result. In my limited subjective sighted testing of Class AB vs. Class D power amps, Class D always came out on top. Maybe it just handles power and transitions better?
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,521
Likes
37,050
Highly reactive impedance is harder to drive in terms of heat dissipation in the output stage, but it doesn't sound right to me to say it is much more difficult to supply current. Current still follows V/|Z|, where |Z| is the absolute value (modulus) of the impedance. So for the same voltage, 4 ohms is 4 ohms whether it is resistance or impedance, and the current would be the same; only the phase angle differs, depending on the real and imaginary parts of the impedance. Sorry if I come across as splitting hairs, I don't mean to.
Yeah, that is how it works. And usually, if you know your load is reactive, you look for an amp that can output plenty of current to handle dips into the lower impedance ranges. What actually happens is that extra power has to be dissipated by the output stage.
 

pogo

Major Contributor
Joined
Sep 4, 2020
Messages
1,239
Likes
382
What actually happens is that extra power has to be dissipated by the output stage.
And that, by design, is less of a problem with class D amps.
I still have unanswered questions about the AHB2's temperature under normal (real) operating conditions in this area: Link

In my experience the AHB2 has a limited application range and would not work with my speakers because of its moderate DF.
 

peng

Master Contributor
Forum Donor
Joined
May 12, 2019
Messages
5,614
Likes
5,167
And that, by design, is less of a problem with class D amps.
I still have unanswered questions about the AHB2's temperature under normal (real) operating conditions in this area: Link

In my experience the AHB2 has a limited application range and would not work with my speakers because of its moderate DF.

Even for class AB amps, in theory, as long as the higher phase angles don't cause stability-related issues, one can deal with the heat by using some cheap and relatively quiet external fans. Stereophile (and Benchmark, IIRC) seem to have used some sort of current-equivalency method to account for the extra dissipation in the output devices at high phase angles; that, IMO, is misleading and might have created the misconception that higher phase angles mean higher current, which is not true.

About your questions in the linked post: I thought Benchmark had an article on their website that might have at least covered the topic. Have you read it already?
 

pogo

Major Contributor
Joined
Sep 4, 2020
Messages
1,239
Likes
382
About your questions in the linked post: I thought Benchmark had an article on their website that might have at least covered the topic. Have you read it already?
I am aware of this idealized approach, hence my questions: the idealization may not represent the whole reality, while the questions come a little closer to the truth ;)
 
OP

PJ2000

Member
Joined
Aug 19, 2021
Messages
26
Likes
86
Sounds fair, but logically there is an issue if specs and measurements cannot do the job, because then we have to ask Bruno whether he designed his amp based on facts, data and theory, and then used listening tests to fine-tune his final design. If not, could it be that your Purifi amp's superiority in this case wasn't entirely by design, but also by luck or randomness?

I don't think the implication is that the amp is "absolutely" superior, rather that I found it subjectively preferable. The more interesting question isn't favorability so much as whether amps with equally great (beyond the threshold of hearing) measurements can be differentiated at all. As for favorability being a product of luck versus design, the argument here is in some sense at the other end of the spectrum: we are in the "great" design space, where we wouldn't expect to hear differences, as opposed to comparing an amplifier with very poor specs against one with great specs, where the differences could be grossly obvious. As someone pointed out, all three of these amplifiers are in fact designed quite differently, and our assumption is that since the specs we measure are "great", that can't/shouldn't result in any audible difference.

I doubt most amp designers would do that kind of thing, except maybe a few. I can think of at least one who did not: Mr. Peter Walker. Denon/Marantz, on the other hand, claim they use their "Sound Masters", though they never said whether those masters had the authority to tell the engineers/designers to keep tweaking things such as resistor/capacitor values, bias, feedback, etc.

We do know for sure that different people "hear" differently, and that the majority of people are neither trained in nor care about differences in audio quality. Like most things, "listening more critically" is probably a skill that can be honed (like a chef's or sommelier's palate), but one that most people don't really care about, which is why they don't notice the difference between a good pair of headphones and the stock earbuds that come with their phone. That isn't to say they can't. If something is in fact "tuned" subjectively, all it means is that the sound is "colored" in a way that that person finds most pleasurable.

I would argue that the role of an amplifier designer is to create the equivalent of the "magic wire": something that amplifies and doesn't otherwise modify the signal in any way. Our approximation of that is the standard AP measurement set we see. We haven't actually validated that this is sufficient, since even the AP doesn't use actual "music", only a small set of tones. In this case our hypothesis is that if those tests cannot measure an appreciable difference (as in the comparison of the Benchmark and the Purifi), then we have achieved the "magic wire."

While I have seen a lot of opinions on how our testing methodology was flawed, so far no one has quantified the impact of those flaws.

If the design was finalized based on listening-test feedback, then how should one pick, not knowing whether the designer(s) share one's taste? If in fact the best-sounding amp is the most transparent one, then you would have picked the AHB2. If not, which part of the available measurements was missed? And if we knew that, then every manufacturer should be able to design/build an amp as transparent as it possibly could be, enough that people with discerning hearing like you could not tell them apart.

The "subjective" part of this test is in fact the order I chose, and to reiterate, I can't make a claim about that other than "it sounded better to me." I think you may be missing the point, which isn't whether I chose 3,2,1 or 1,2,3; rather it is that I consistently chose one order over the others, 100% of the time, and we haven't identified any "smoking guns" that would explain a directed rather than random result. To put this into perspective again: I am the "objectivist" who went into this thinking the results would be completely random. I am just as puzzled by the results as you are.

But the puzzlement doesn't change the data or the results.

Again, I am only talking about logic here, not saying your test is not valid.

So if all this is true, that specs and measurements cannot predict which amp would win such a test, then the end result for those of us shopping for the "best sounding" or most "transparent/accurate" power amp would be randomly determined. Again, back to logic: if your test is totally valid (I know you are saying that), then we all should go and build a Purifi amp like yours and save some money. Then the question is, how would we know whether we could save even more money by going with an NC502MP, or just another $600 amp?

I can't answer your question other than to say: test it out and see what you get. My only comment was that if you can't test them yourself and you had to choose one, I would choose the cheapest of the 3. I didn't test the NC502MP; if I had, and it had been #1 on my list, I would have suggested buying that instead, again assuming you had no other information and had to choose.

Regardless, I really appreciate your efforts in this, but it would be great if you were willing to invest more time in another test, incorporating some of the suggested improvements. I would also suggest increasing the trials to 20, or at least doubling them to 12. You are right that 6 trials on its own is not much, but combined with the exact 1,2,3 order it does look compelling, so I believe the key to the puzzle is still the methodology, which likely needs to be improved.

Oh, about your question: Why don't a few others repeat the tests and see what you get?

I think my answer is, as I alluded to earlier in my point about the logic: it would be sad if, in order to pick the best-sounding amp, one had to run such a test, and one that would need to include a lot more than those 3 amps. We really should be able to pick based on specs and measurements alone; then it is just a matter of identifying all the specs and measurements needed to predict reliably.

I fundamentally agree with you that it would be nice to pick based on measurements. Every piece of gear I have purchased or used in the last couple of years was chosen based on measurements, and specifically on ASR. So we are left with 4 possibilities that I can think of right now:

1. We screwed up the test majorly, in a non-random fashion, and ended up with a very non-random result.
2. We fabricated the data.
3. I am an extremely good guesser (1 in 46K).
4. Our measurements don't cover all the audible differences between real music and synthetic signals. (We didn't even use to do multitone distortion measurements, and even those are very simple compared to actual music; see the sketch after this list.)
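To make point 4 concrete, here is roughly what a multitone stimulus looks like (a minimal sketch; the tone count, spacing and the residual figure of merit are my arbitrary choices, not any lab's standard):

Code:
# Build a 32-tone, log-spaced multitone test signal. Distortion shows up
# as energy in bins where no tone was placed -- still a far sparser
# excitation than real music, which fills the whole spectrum continuously.
import numpy as np

fs, n = 48_000, 1 << 16  # sample rate, FFT length
freqs = np.logspace(np.log10(20), np.log10(20_000), 32)
bins = np.unique(np.round(freqs * n / fs).astype(int))  # bin-exact tones

rng = np.random.default_rng(1)
phases = rng.uniform(0, 2 * np.pi, bins.size)  # spread phases, tame the crest
k = np.arange(n)
x = sum(np.sin(2 * np.pi * b * k / n + p) for b, p in zip(bins, phases))
x /= np.abs(x).max()

y = x  # stand-in for the captured DUT output (loop back for a dry run)
spec = np.abs(np.fft.rfft(y)) ** 2
mask = np.zeros(spec.size, dtype=bool)
mask[bins] = True
residual_db = 10 * np.log10(spec[~mask][1:].sum() / spec[mask].sum())  # skip DC
print(f"residual vs. tones: {residual_db:.1f} dB")  # loopback: vanishingly small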

The way the scientific method works, we need many properly conducted trials to see whether our results are reproducible. So again, we ask that others repeat similarly designed tests to validate or invalidate the primary hypothesis about the audibility of differences.

Or, if accuracy matters less for sound quality than the perceived sound quality in such a listening comparison, then your pick of the Purifi would be valid only for you, because the choice would then rest not on accuracy but on "preference", which is subjective in nature.

Agreed.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,383
Likes
24,749
Location
Alfred, NY
PJ2000 said:
So we are left with 4 possibilities that I can think of right now: 1. We screwed up the test majorly, in a non-random fashion, and ended up with a very non-random result. 2. We fabricated the data. 3. I am an extremely good guesser (1 in 46K). 4. Our measurements don't cover all the audible differences between real music and synthetic signals. […]
Or maybe your controls were poor. Others have pointed out, as an example, that the level matching was not done accurately. But there could be other issues as well (timing, extraneous noises from switching, inadvertent cueing by the person supposedly behind you...). I'd look there before reaching for statistics or unlikely hypotheses about source-impedance differences (which in this case would cause response variations in the second or third decimal place of a dB).
 
OP

PJ2000

Member
Joined
Aug 19, 2021
Messages
26
Likes
86
Or maybe your controls were poor. Others have pointed out, as an example, that the level matching was not done accurately. But there could be other issues as well (timing, extraneous noises from switching, inadvertent cueing by the person supposedly behind you...). I'd look there before reaching for statistics or unlikely hypotheses about source-impedance differences (which in this case would cause response variations in the second or third decimal place of a dB).

Sure, but poor controls should have led to the expected outcome, i.e. a random distribution. Please read the original description regarding items like extraneous noise: we actually had a very loud movie playing while switching, to mask any potential switching noises. And since none of us could see behind us, it is unlikely the person behind us could have cued anything.
 