"23 dB" is a completely accurate published representation of something that measures 22.9 dB, if we take the measurement to a decimal place beyond what is published. I agree that they should be indistinguishable under a tightly controlled double blind test.
Also, the 4B3 has 23 and 29 dB gain settings, whereas the AHB2 has 9.2, 17, and 23 dB gain settings. I assume the sighted test was made with the 23 dB setting on both amps to get the same SPL.
I wonder how accurate those gains are, though: a fraction of a dB hotter on the 4B3 and a fraction of a dB lower on the AHB2 will make the 4B3 sound better.
Assuming the gain is set by 0.5%-accurate resistors in both amps and that the gains are by design exactly 23 dB, the error in voltage gain is at most 0.5%. So worst case, there is a 0.5% difference between the two amps' voltage gains. As the SPL is driven by power and not voltage, and power goes as the square of voltage, the difference in power/SPL is about 1%, or 0.04 dB SPL.
Actually, Stereophile's AHB2 measurements show the 23 dB gain measured at 22.9 dB, so the accuracy of the gain is clearly worse than 0.5% (more like 1%). Assuming 1%, the worst-case difference in dB SPL becomes 0.09 dB. Still small, but maybe already perceptible as a sound quality difference.
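For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal Python sketch. It only assumes the standard 20·log10 relation between a voltage ratio and a level in dB; the function and variable names are mine, made up for illustration, and are not from either manufacturer or from Stereophile.

```python
import math

def voltage_ratio_to_db(ratio):
    """Level change in dB for a given voltage ratio (20*log10)."""
    return 20.0 * math.log10(ratio)

def db_to_voltage_ratio(db):
    """Voltage ratio corresponding to a level change in dB."""
    return 10.0 ** (db / 20.0)

# A 0.5% voltage gain mismatch, expressed in dB (about 0.04 dB).
half_percent_db = voltage_ratio_to_db(1.005)

# A 1% voltage gain mismatch, expressed in dB (about 0.09 dB).
one_percent_db = voltage_ratio_to_db(1.01)

# The 0.1 dB gap between the nominal 23 dB and the measured 22.9 dB,
# expressed as a percentage of voltage gain (a bit over 1%).
shortfall_pct = (db_to_voltage_ratio(23.0 - 22.9) - 1.0) * 100.0

print(f"0.5% voltage mismatch: {half_percent_db:.3f} dB")
print(f"1.0% voltage mismatch: {one_percent_db:.3f} dB")
print(f"0.1 dB gap as voltage error: {shortfall_pct:.2f} %")
```

The same 0.5% voltage figure can equally be read as roughly 1% in power, since 10·log10 of the squared ratio gives the identical dB value.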
Edit: added some back-of-the-envelope calculations
Edit 2: added Stereophile measurements of gains
But published gain has never been an acceptable means of matching output during a comparison, simply because of that routine rounding that takes place. The only accurate way is to use an AC voltmeter measuring a test tone at, say, 1 kHz.
Without a voltmeter, one can get pretty close by playing one amp into the right speaker and the other amp into the left speaker, pumping a test tone through both, and matching levels so that the phantom stereo image is exactly between the two speakers from the listening position. This isn't as precise as a voltmeter, of course, but it should at least balance the outputs to within the threshold of perception.
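To make the voltmeter approach concrete, here is a rough Python sketch of the arithmetic involved. The readings are hypothetical values I made up for illustration, and the 0.1 dB tolerance is my own assumption about what counts as "matched", not a figure from this thread; both readings are assumed to be RMS volts taken at the speaker terminals with the same test tone and the same load.

```python
import math

def level_mismatch_db(v_a, v_b):
    """Level difference in dB between two RMS voltage readings."""
    return 20.0 * math.log10(v_a / v_b)

def is_matched(v_a, v_b, tolerance_db=0.1):
    """True if the two readings are within tolerance_db of each other.

    The default tolerance is an assumed target, not a standard.
    """
    return abs(level_mismatch_db(v_a, v_b)) <= tolerance_db

# Hypothetical voltmeter readings (volts RMS) for the two amps.
v_amp_a = 2.830
v_amp_b = 2.800

mismatch = level_mismatch_db(v_amp_a, v_amp_b)
print(f"Mismatch: {mismatch:+.3f} dB, matched: {is_matched(v_amp_a, v_amp_b)}")
```

If the mismatch is outside the chosen tolerance, trim the level of one amp and measure again until it is not.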
Both amps should be driven in their linear range, too. For these two amps, exceeding that threshold would probably drive humans out of the house, but it bears repeating given that lots of amps that attract favorable comment are far lower in power.
Then, the selection of the amps needs to be unknown to the listener, which isn't that easy to do with amps without some specific switching apparatus that is blindly controlled. But for humans to be able to claim that they hear a difference, that's what it takes. If the listener knows which amp is under test, the evaluation will not and cannot be based solely on what they hear, and they should not trust their perceptions. The same comparison next week might go the other way for reasons exogenous to the amps themselves.
The placebo effect can be real, but it is not repeatable because uncontrollable expectation bias sets in immediately. For me, I expect one amp to sound better, but then I discount that expectation consciously, meaning that now I have two types of expectation bias fighting each other with no way to arbitrate their effect. We cannot transcend those biases. It does not mean we end up with something that is no good, it means that the reason we had for thinking something was better isn't trustworthy or repeatable, and six months from now we might go the other way. (Leading to such earnestly held fantasies as, "I again listened to an AHB and compared it with my Bryston, and, wow!, Benchmark apparently made some changes and this time it blew me away!")
Personally, spending money on the Next Great Amp seems to me an unproductive use of funds. (I'm assuming the current amp has sufficient power to be in its linear range even when driving our speakers during peaks, and that's perhaps a grander assumption than many realize.) I'd think there would be more potential gains with better speakers or, for those who still listen to analog media, better source transducers.
(As much as I admire Benchmark as a company--and I happily own a couple of their products--I run into the same mental issue when considering the AHB as an alternative to my far less expensive Buckeye amp.)
Rick "a reminder" Denney