Amplifier Bakeoff: Purifi Eval1, McIntosh MA252 & Benchmark AHB2

PJ2000 · Dec 4, 2021

SIY said:
Nope, exactly the opposite.

OK, How so? I would love a more in-depth explanation as I don't understand how that could be the case.

JRS · Dec 4, 2021

Maybe an analogy--there are two types of scenarios. In the first, the conditions were not constant during the test (obviously a very bad situation), say with testing 3 sets of skis: one pair in the morning, one mid-day and one in the afternoon after a cold night leaving an icy crust. No telling what those results mean!

But even if they were all done in the icy morning and one set had sharper edges or was stiffer, then it would test best in the morning and perhaps worst in the afternoon, so it only tells us how it performed under one of many possible conditions--maybe a 6 ohm purely resistive load, as opposed to, say, a ported system with a highly inductive woofer and a big electrostatic panel, so that impedance is all over the map. In fairness you did use 2 different speakers which are both representative of the types that would likely be used in combination with these amps. Kudos for that. Ideally, it should have been 4 different speakers, from say a very easy peasy load to something like the big Apogees that could eat amps for lunch and dinner. Of course, time is a factor, but all these subjective gurus' reviews should be using at least 4 different speakers IMO.

So back to amps: if one amp was stressed into the load, then to hit 110dB might have required relatively more output--in other words the amp was compressing during playback, a higher input voltage was required to compensate for the compression, and hence it was louder at the "calibrated" playback levels for the tests. So yeah, you never should have normalized the volume setting at such a high output, because it took things out of bounds as to "normal" conditions. The same phenomenon comes up all the time in beer making competitions: unless the temperatures are the same and optimized for the beer style, results are meaningless. Get it cold enough and even American mainstream pilsners might hold up in a test of lagers. Why? Because all the nuances in aroma, mouthfeel, maltiness, and so forth are killed.

So the other flaw is that you had a human switching stuff, and I still don't visualize it--were the amps behind the couch and he was just changing wires? If so, why so long between tests? I'm assuming they were terminated connectors. Seems like 20 seconds at most. Puzzled by all the futzing around is all. You know, I am sure, that the proper way to do this is with high quality relays and hard gain matching, or at least bang bang banana plug switches.

So anyhow I just jumped in here and need to re-read the thread. I believe you. FWIW, my friends and I did the same sorts of tests and the tubed preamp was preferred--measured FR was the same, and the output impedance was trivial compared to the input impedance of the amps. What killed me is that every one of my friends wanted the solid state to win, or at least be sixes. That was a real eye opener. So I'm not discounting the results--if we take them at face value, you're now that guy who has to audition everything for hours while us tin ears can just say I like the looks and features of that unit--all 3 test well, I want that one. Case closed.

Bottom line: lack of controls will always favor one outcome over another. It may not be obvious, or otherwise you would control it. That's why getting good scientific data requires ten pairs of eyeballs, esp. when it is in the squishy area of perception.

pogo · Dec 4, 2021

PJ2000 said:
As someone pointed out, all three of these amplifiers are in fact designed quite differently, and our assumption is that since the specs that we measure are 'great' that can't/shouldn't result in any audible difference.

Regarding the measured values of the damping factors, I can not agree here, because from my own experience, it is precisely at this point that the differences in sound are audible on potent loudspeakers, see also here the description of a manufacturer: Link

In addition, the DFs have been determined under idealized conditions and do not reflect the behavior under normal operating conditions.
That is why I switched to the Purifis and was not disappointed. Here is the grip available that my speakers need and a 'safety buffer' through the high DF is certainly still given under any normal (real) operating condition.

SIY · Dec 4, 2021

pogo said:
Regarding the measured values of the damping factors, I can not agree here, because from my own experience, it is precisely at this point that the differences in sound are audible on potent loudspeakers, see also here the description of a manufacturer: Link

Can you describe the procedures and controls used in your determination? This is a HIGHLY extraordinary claim, as a few seconds with a pencil and back of envelope will make clear.

pogo · Dec 4, 2021

SIY said:
Can you describe the procedures and controls used in your determination? This is a HIGHLY extraordinary claim, as a few seconds with a pencil and back of envelope will make clear.

No procedures and controls are required in addition. Not even a level matching is necessary, because the difference is so clear to hear. With a higher DF, I hear more details in my setup, for example, and the room modes in the lower frequency range are less excited, meaning I need less DSP correction.
I think the new T+A A200 amplifier with its switchable DF offers a very good opportunity to confirm these impressions. However, this will also be very dependent on the speakers used, because a lower DF can also compensate for deficits of loudspeakers, but that has less to do with an accurate reproduction of the signal.

Blumlein 88 · Dec 4, 2021

pogo said:
No procedures and controls are required in addition. Not even a level matching is necessary, because the difference is so clear to hear. With a higher DF, I hear more details in my setup, for example, and the room modes in the lower frequency range are less excited, meaning I need less DSP correction.
I think the new T+A A200 amplifier offers a very good opportunity to confirm these impressions. However, this will also be very dependent on the speakers used, because a lower DF can also compensate for deficits of loudspeakers, but that has less to do with an accurate reproduction of the signal.

Someone needs to be held back for remedial education.

Willem · Dec 4, 2021

PJ2000 said:
Sure, but poor controls should have led to the expected case, i.e. random distribution.

No. If you have not equalized the levels accurately enough, the amplifiers are not playing at the same level, and the louder one will systematically rather than randomly be preferred.
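
Willem's point can be checked with a quick calculation. The sketch below converts two voltage readings taken at the speaker terminals into a level difference in dB. The function names, the example voltages, and the 0.1 dB tolerance are illustrative assumptions, not figures from this test.

```python
import math


def level_difference_db(v_a, v_b):
    """Level of amp A relative to amp B in dB, from RMS voltages
    measured across the same load with the same test signal."""
    return 20.0 * math.log10(v_a / v_b)


def is_level_matched(v_a, v_b, tolerance_db=0.1):
    """True if the two levels differ by no more than tolerance_db.
    The 0.1 dB default is an assumed target, not a value from the thread."""
    return abs(level_difference_db(v_a, v_b)) <= tolerance_db


if __name__ == "__main__":
    # Hypothetical readings, in volts RMS, for two amplifiers.
    for v_a, v_b in [(2.84, 2.83), (2.90, 2.83)]:
        diff = level_difference_db(v_a, v_b)
        print(f"{v_a} V vs {v_b} V: {diff:+.3f} dB, "
              f"matched={is_level_matched(v_a, v_b)}")
```

A difference of a few hundredths of a volt is enough to exceed a tight tolerance, which is why matching by ear or by SPL meter at the listening seat is usually considered too coarse for this kind of comparison.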

pogo · Dec 4, 2021

Willem said:
No. If you have not equalized the levels accurately enough, the amplifiers are not playing at the same level, and the louder one will systematically rather than randomly be preferred.

This can be true if the differences are not too great. But if even with a louder amp details are not cleanly worked out or even hidden, the quieter one can be preferred here.

SIY · Dec 4, 2021

pogo said:
No procedures and controls are required in addition. Not even a level matching is necessary, because the difference is so clear to hear.

Bullshit.

peng · Dec 4, 2021

pogo said:
I am aware of this idealized approach, hence my questions, which may not represent the whole reality, but come a little closer to the truth

I read that, and thought the effect of temperature is an excellent point. Practically speaking though, when the amp heats up enough to have a significant effect on D.F., won't you have to consider the effects on the speaker voice coil and crossover as well, as their temperature would go up too if the speaker is driven to a high enough level for a long enough time? I know the effects will not likely be in exact proportion, but I think they would have at least some counteracting effect. Good thing I don't have to worry, because my amps typically output on average less than 0.2 to maybe 0.5 W, with peaks of maybe 50 W (would be rare) at the most, so nothing should be cooked to the point where I have to consider temperature affecting DF enough.
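
A rough sketch of the comparison peng describes, assuming a copper voice coil (temperature coefficient about 0.00393 per °C near 20 °C) and hypothetical values for the coil resistance, coil temperature, and damping factor. None of these numbers come from the amplifiers or speakers in this thread, and the model ignores the crossover and any thermal drift in the amplifier itself.

```python
COPPER_TEMP_COEFF = 0.00393  # per degree C, referenced to 20 C


def hot_resistance(r_cold_ohms, temp_c, ref_temp_c=20.0,
                   alpha=COPPER_TEMP_COEFF):
    """Linear approximation of conductor resistance at temp_c."""
    return r_cold_ohms * (1.0 + alpha * (temp_c - ref_temp_c))


def amp_output_impedance(damping_factor, nominal_load_ohms=8.0):
    """Output impedance implied by a damping factor quoted
    against a nominal load (8 ohms assumed here)."""
    return nominal_load_ohms / damping_factor


if __name__ == "__main__":
    r_cold = 6.0  # assumed voice coil DC resistance at 20 C, in ohms
    for temp in (20.0, 60.0, 100.0):  # assumed coil temperatures
        rise = hot_resistance(r_cold, temp) - r_cold
        print(f"coil at {temp:5.1f} C: resistance rise {rise:.3f} ohm")
    # Assumed damping factor of 100, for scale.
    print(f"amp output impedance at DF=100: "
          f"{amp_output_impedance(100):.3f} ohm")
```

Under these assumptions the change in voice coil resistance with heating is much larger than the amplifier's entire output impedance, which is the sense in which the speaker-side effect would have to be considered alongside any drift in DF.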

Goodman · Dec 4, 2021

PJ2000 said:
I bought the parts and assembled it myself. Purifi modules from Purifi, Ghent Case from Ghent and Hypex PS from Hypex.

What was the total cost of the amp assembled by you, and what would it cost fully assembled?

pogo · Dec 4, 2021

peng said:
won't you have to consider the effects on the speaker voice coil and crossover as well, as their temperature would go up too if the speaker is driven to a high enough level for a long enough time? I know the effects will not likely be in exact proportion, but I think they would have at least some counteracting effect.

These effects certainly contribute to it, but with well-dimensioned speakers they rather make the deficits of the amplifier audible. And Magico also seems to place a lot of attention on this area, see for example the built-in Mundorf MResist Ultra for the highest temperature stability. Perhaps it is also due to my chassis and crossover components that this effect is more noticeable in my setup. For example, my crossovers should not be a major bottleneck:

x-over satellites

x-over passive stereo subwoofer

Willem · Dec 4, 2021

For me it boils down to the question why postulate super-human hearing acuity to justify a completely unexpected observation using a problematic methodology? For the paradigmatic test, see here: https://linearaudio.nl/sites/linearaudio.net/files/Valves versus Transistors DCD.pdf Interestingly even the valve amplifier in the test could not be identified, although admittedly it was a well designed one that measures decently (but no more than that by modern standards).

pogo · Dec 4, 2021

I don't think it's because of super-human hearing (at my age I certainly don't have that anymore), but the setup used plays a very big role. And it is not really an unexpected observation, because well-known manufacturers are talking about it and they should know best.

peng · Dec 4, 2021

As someone pointed out, all three of these amplifiers are in fact designed quite differently, and our assumption is that since the specs that we measure are 'great' that can't/shouldn't result in any audible difference.

Yes, I believe that is the case. The design of the necessary hardware/software can be very different, but all must have been engineered based on facts and data, and likely numerous measurements done in different stages of the design and build. Just like different design/build by different countries managed to land on Mars. As often cited, amps, including the newer class D amps are not considered "rocket science" anyway. We shouldn't need to go by ears to tell us if audible difference could be there in a tightly controlled A/B/X, and I know you are not suggesting that at all. Regardless, I would love to see more of such tests (like yours) done by more people and use more audience such as what Harman had done in the past, for statistical reasons if nothing else.

I have no doubt you heard the difference (based on consistency in your score), I am just interested in knowing what the real reasons may be. At the moment I tend to think it has to do mainly with level matching, based on what has been cited by others so far. Someone mentioned D.F. too, that could be if somehow the net output impedance were affected enough by the various connections necessitated for the tests to result in FR differences that your obviously very good and discerning ability might have allowed you to pick up on when others might not have. So if there were differences in levels and D.F. resulted in the test process, it could explain the consistency you experienced in your successful picks.

PJ2000 said:
I would argue that the role of an amplifier designer is to create the equivalent of the 'magic wire' that amplifies and doesn't modify the signal in any way. Our approximation of that measurement is the standard AP set that we see. We haven't actually validated that that is true since even the AP doesn't use actual 'music' but a small set of tones. In this case our hypothesis is that if on those tests we cannot measure an appreciable difference (like the comparison of the Benchmark and Purifi) then we have achieved the 'magic wire.'

While I have seen a lot of opinions on how our testing methodology was flawed etc, no one so far has provided any numbers as to what the impacts of those flaws are.

Again, yes I believe that is the case, and there is no argument from me on your first sentence. I thought Peter Walker made an interesting point about having designed his amp without being guided by any listening test. In my mind, he must have been a really good engineer to be so confident that he wouldn't need to "tweak" based on listening tests.

PJ2000 said:
The 'subjective' part of this test is in fact the order that I chose and to reiterate I can't make a claim about that other than 'it sounded better to me.' I think you may be missing the point which isn't whether I chose 3,2,1 or 1,2,3 rather it is that I consistently chose one vs. the other, 100% of the time and we haven't identified any 'smoking guns' that would explain a directed vs. random result. To put this again into perspective, I am the 'objectivist' who went into this thinking that the results would be completely random, I am just as puzzled by the results as you are.

But the puzzlement doesn't change the data or the results.

I absolutely did not miss your point, and it was in fact the consistency part that made me believe for sure you heard the difference. I wonder why you would think I missed that point when I thought I was so clear that there are at least two parts, "audible difference" and "preference", and one does not automatically equal the other. Sorry if for some reason I failed to make that clear. Clearly I need to improve my writing skills.
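
The consistency argument can be put in numbers with a one-sided binomial tail. This is a minimal sketch: the trial count of 6 is taken from the "6 songs" mentioned in the thread, while the per-trial guessing probabilities (1/2 for a two-way choice, 1/6 for guessing one ordering of three amplifiers) are assumptions about how the trials were scored.

```python
from math import comb


def chance_probability(n_trials, n_hits, p_guess):
    """Probability of getting at least n_hits out of n_trials
    when every trial is an independent guess with success
    probability p_guess."""
    return sum(
        comb(n_trials, k) * p_guess**k * (1.0 - p_guess)**(n_trials - k)
        for k in range(n_hits, n_trials + 1)
    )


if __name__ == "__main__":
    n = 6  # trials, assuming one per song
    for label, p in [("two-way choice", 1 / 2),
                     ("one ordering of three", 1 / 6)]:
        prob = chance_probability(n, n, p)
        print(f"{label}: P(all {n} consistent by guessing) = {prob:.6f}")
```

A small probability here only says the result is unlikely to be random guessing. It does not separate an audible difference in the amplifiers from a systematic cue such as a level mismatch, which is the distinction the other posters are pressing on.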

PJ2000 said:
I can't answer your question other than to say, test it out and see what you get. My only comment was that if you can't test them out yourself and you had to choose one, I would choose the cheapest of the 3. I didn't test the NC502MP and if I had and it had been #1 on my list, I would have suggested to buy that, again if you had no other information and had to choose.

I have been trying, and so far there is no way I could tell a difference in the so called "sound quality" between my cheap Hypex and Purifi amps (and also my class A/AB Parasound amp). I am sure I have hearing loss in the high frequencies, but I do consider myself well trained to hear differences that I know for sure other non hifi enthusiasts with much better hearing could not. I could even hear the minor difference between my DACs, and we all know many don't believe in audible differences between external DACs and the typical AVR's internal ICs such as the AK4458. All such tests were always sighted, except I did do it blind with just one other person once, so I also consider all my tests not valid, and totally irrelevant except for me only. I will never post anything such as "I just added a Purifi amp to my AVR, or upgraded my whatever and immediately heard more details, wider soundstage etc.", because I would be concerned that someone may read it and take it as fact. That's how a lot of hearsay started about how one brand is warm and good for music while another is crisp, impactful and better for movies, based on sighted tests. In my opinion, anyone who makes such claims (whether about hearing a difference for the better or worse subjectively) should state the caveats and test conditions, and that's exactly what you have done. The fact that you have taken the time to do your tests with the stated details, in my opinion, sets a great example for others.

Lastly, I really appreciate your detailed response to my post.

caught gesture · Dec 4, 2021

PJ2000 said:
Sure, but poor controls should have led to the expected case, i.e. random distribution. Please read the original descriptions on items like extraneous noise. We actually had a very loud movie playing while switching to mask any potential switching noises. Since none of us could see behind us, unlikely the person behind us could have cued anything.

Ideally the person doing the switching should also have no idea as to what is being changed. Bias is a slippery thing. Non-verbal cues are picked up and influence results. You went to a lot of effort, but you need to do a bit more to get a truly randomised trial and a definitive result. Well done for having a go though.

peng · Dec 4, 2021

caught gesture said:
Ideally the person doing the switching should also have no idea as to what is being changed.

Agreed, and I thought that's part of the definition of "DBT".

pogo · Dec 4, 2021

peng said:
Someone mentioned D.F. too, that could be if somehow the net output impedance were affected enough by the various connections necessitated for the tests to result in FR differences

On a real setup, this measurement method could help and takes into account not only the frequency axis but also the time axis:
Link

This method could also show a different damping factor well in the swing-out behavior, since the amplitude can be reduced differently over time depending on the DF.

SIY · Dec 4, 2021

Back of envelope before trying to make a big deal of thousandths of a dB.
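
One way to run that back-of-envelope check is sketched below. It treats the amplifier output impedance and the speaker as a simple resistive voltage divider (phase is ignored) and also reports how much the output impedance adds to the voice coil resistance in the damping loop. The damping factors, the 4 to 30 ohm impedance swing, and the 6 ohm voice coil are hypothetical values chosen for illustration, not measurements of the amplifiers or speakers in this thread.

```python
import math


def output_impedance(damping_factor, nominal_load_ohms=8.0):
    """Output impedance implied by a damping factor quoted
    against a nominal load (8 ohms assumed)."""
    return nominal_load_ohms / damping_factor


def level_at_load_db(r_out, z_load):
    """Voltage divider loss at the speaker terminals, in dB."""
    return 20.0 * math.log10(z_load / (z_load + r_out))


def response_ripple_db(damping_factor, z_min, z_max):
    """Level spread between the speaker's impedance minimum
    and maximum caused by the amplifier's output impedance."""
    r_out = output_impedance(damping_factor)
    return level_at_load_db(r_out, z_max) - level_at_load_db(r_out, z_min)


def series_resistance_ratio(damping_factor, voice_coil_ohms):
    """Total series resistance in the damping loop relative to
    the voice coil alone (cable resistance ignored)."""
    r_out = output_impedance(damping_factor)
    return (voice_coil_ohms + r_out) / voice_coil_ohms


if __name__ == "__main__":
    for df in (50, 100, 1000):
        ripple = response_ripple_db(df, z_min=4.0, z_max=30.0)
        ratio = series_resistance_ratio(df, voice_coil_ohms=6.0)
        print(f"DF {df:5d}: ripple {ripple:.4f} dB, "
              f"loop resistance x{ratio:.4f}")
```

Because the voice coil resistance sits in series with the amplifier, it dominates the loop, so under these assumptions going from a moderate to a very high damping factor changes the total by only a small fraction.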

PJ2000 · Dec 4, 2021

Willem said:
No. If you have not equalized the levels accurately enough, the amplifiers are not playing at the same level, and the louder one will systematically rather than randomly be preferred.

Yeah, but we recalibrated each amplifier on each sample, so any level error would be a random variation from sample to sample. Your point would be valid if we had only calibrated each amplifier once and then listened to the 6 songs.
