
Do all amplifiers sound the same? Level matched listening test

Can you hear a difference and which amp do you prefer?

  • I can hear a difference
    Votes: 11 (29.7%)
  • I cannot hear a difference
    Votes: 25 (67.6%)
  • I prefer amp X music sample
    Votes: 3 (8.1%)
  • I prefer amp Y music sample
    Votes: 1 (2.7%)

  • Total voters: 37
  • Poll closed.
If an amp is sized tightly for the task, it is really easy to max it out on every beat.
And what if it's proportional?
If it is sized adequately for every situation it will meet down the road, with thermals and everything else in check, I suppose the difference might not be as significant.

But at the end of the day, why? It's clearly a compromise, and a big one. There are amps that would pass this with flying colors.
What I would call a good amp, apart from being reliable and having decent gain and power, is one whose behavior is invariant under any condition, be it power, load, frequency, etc.
I don't disagree, really. If the demand is absolute transparency, I wouldn't get that amp, nor would I be happy if I were making that amp. But it is good to be reminded of the scale of the differences. Below is a link to 2 files. It is an open E on a bass. Everything is the same except one is 2 dB quieter than the other. I think in a quick A/B the difference is "obvious", but once the sound leaves echoic memory, I wouldn't be able to tell a difference.

1 note 2dB difference
2 notes 2 dB difference in upper note
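
For a sense of scale, here is a minimal sketch of what a 2 dB level difference means in amplitude and power terms, and how two test tones like the ones described could be generated. This is not how the linked files were made; the 41.2 Hz pitch (a common figure for open E on a 4-string bass), the 0.5 peak level, the sample rate and the function names are all assumptions for illustration.

```python
import math

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level change in dB to a voltage/amplitude ratio."""
    return 10 ** (db / 20)

def db_to_power_ratio(db: float) -> float:
    """Convert a level change in dB to a power ratio."""
    return 10 ** (db / 10)

def sine(freq_hz: float, seconds: float, peak: float, sample_rate: int = 48000):
    """Return a list of samples for a sine tone with the given peak amplitude."""
    n = int(seconds * sample_rate)
    return [peak * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

OPEN_E_HZ = 41.2  # assumed pitch of open E on a bass, not taken from the files

# Two tones that are identical except that one is 2 dB quieter.
loud = sine(OPEN_E_HZ, 0.5, 0.5)
quiet = sine(OPEN_E_HZ, 0.5, 0.5 * db_to_amplitude_ratio(-2.0))

print(round(db_to_amplitude_ratio(-2.0), 4))  # 0.7943
print(round(db_to_power_ratio(-2.0), 4))      # 0.631
```

So "2 dB quieter" is roughly 79% of the amplitude, or about 63% of the power.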
 
I don't disagree, really. If the demand is absolute transparency, I wouldn't get that amp, nor would I be happy if I were making that amp. But it is good to be reminded of the scale of the differences. Below is a link to 2 files. It is an open E on a bass. Everything is the same except one is 2 dB quieter than the other. I think in a quick A/B the difference is "obvious", but once the sound leaves echoic memory, I wouldn't be able to tell a difference.

Link
A tip: have you seen or tested this? :)

 
I don't disagree, really. If the demand is absolute transparency, I wouldn't get that amp, nor would I be happy if I were making that amp. But it is good to be reminded of the scale of the differences. Below is a link to 2 files. It is an open E on a bass. Everything is the same except one is 2 dB quieter than the other. I think in a quick A/B the difference is "obvious", but once the sound leaves echoic memory, I wouldn't be able to tell a difference.

Link
I think of it in terms of balance.
Try two strings instead of one, with the second up high (not on a bass guitar, obviously), and listen to their sum, their balance.

The lean one sounds thinner; sometimes that is what separates enjoyment (the "full" one) from headache (the "thin" one).
 
A tip: have you seen or tested this? :)

I hadn't, and I can't even hear it. With my 2 dB different files, it turns out it wasn't as obvious as I thought. I got 6 of 16. So even with echoic memory I can't discriminate 2 dB at 40 Hz on my nearfield speakers.
 
Why did you omit this? It makes it an invalid test. Clipping and recovery are nonlinear effects that can differ between amps and sound different. I wouldn't be surprised if one amp clipped more (higher % distortion 1% of the time).
Clipping Level (2017_01_26 19_22_36 UTC).png
What do you mean, "omit"? I gave you the link for all the clippings. The question was posed as whether there were *any* positive control tests that showed differences, and I responded that there was. A published one.

As I noted in the article though, I don't know how they determine the 1% in the age of analog scopes and dynamic music. So I don't consider this very trustworthy but for transparency, I included it in my write-up.

FYI, many months after Arny told me about that article, he mentioned that the origin of this test was an ad hoc, sighted comparison they were making between these amps. They thought they heard something from one amp, so they decided to perform the double blind test, and managed to identify something that distinguished the two amps.

Regardless, I don't know what your beef is as @pma has said the same for his test:
The amps were just at the boundary of occasional clipping for very very short time. This is normally inaudible. We can discuss it after the poll is closed.
So the test I posted is an even closer analogy to what is being discussed. To the extent you think clipping should be audible, you should assume something is not right in this test either, at least with respect to the broad conclusions @pma is drawing.

Keep in mind that despite the clipping statement, the positive identification failed in one out of three tests:

index.php


This goes to the point I made earlier on how critical it is to find the right, revealing material. This is especially important when the person conducting the test is inclined toward one outcome over another.
 
I think of it in terms of balance.
Try two strings instead of one, with the second up high (not on a bass guitar, obviously), and listen to their sum, their balance.

The lean one sounds thinner; sometimes that is what separates enjoyment (the "full" one) from headache (the "thin" one).
Do you mean play a chord with one string 2 dB louder? Or two separate notes, one 2 dB louder than the other? As far as bass goes, why not? We were talking about full power at 40 Hz, which is about open E on a bass.
 
Do you mean play a chord with one string 2 dB louder? Or two separate notes, one 2 dB louder than the other? As far as bass goes, why not? We were talking about full power at 40 Hz, which is about open E on a bass.
Yes, do it any way you like; the sum is the interesting part of it.
I would use both parts of the balance of a pink noise-like spectrum.

(And contrary to intuition, and to what many say around the forums, power needs are not SO much bigger down low: the spectrum is additive, and the content up higher can use as much power as the low end.)

Example:

1756765454782.png1756765474734.png

That's about the sweet spot of pink noise (PN); it's evident that the mid-highs must not be neglected either.
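
As a rough check on the point that power is not concentrated only in the lows, here is a small sketch assuming ideal pink noise, i.e. a power spectral density proportional to 1/f, band-limited to 20 Hz to 20 kHz. Under that assumption the power in a band is proportional to ln(f_hi / f_lo), so every decade (or octave) carries the same power. The band edges and the function name are illustrative choices, not taken from the charts above, and real music only approximates this spectrum.

```python
import math

def pink_band_fraction(f_lo: float, f_hi: float,
                       f_min: float = 20.0, f_max: float = 20000.0) -> float:
    """Fraction of total power that ideal 1/f (pink) noise, band-limited to
    [f_min, f_max], carries between f_lo and f_hi.

    Integrating a 1/f power density gives ln(f_hi / f_lo), so the fraction is
    a ratio of logarithms.
    """
    return math.log(f_hi / f_lo) / math.log(f_max / f_min)

# Three decades of the audio band (assumed split points).
for lo, hi in [(20, 200), (200, 2000), (2000, 20000)]:
    print(f"{lo}-{hi} Hz: {pink_band_fraction(lo, hi):.3f}")
```

Each of the three decades comes out at one third of the total, so with a pink-like spectrum the region below 200 Hz accounts for about a third of the power, not most of it.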
 
Careful about this version of ABX test. It is designed to make you fail such comparisons more than succeed.

The change came about some 10 years ago when there were a number of blind audio challenges being posted online. I passed many of them with the then-current ABX plugin, shocking many people who thought these challenges were impossible to pass. I explained during the arguments that ensued how I would find the critical segment in a clip by trial and error. Once there, I would then run the full test and pass it. This was enabled by the previous version of the plugin providing the information as to whether you passed or failed in every trial. This way I could identify the segment and see if I could pass it in the next trial. If I failed, then I would go to find another segment.

Me passing these "impossible tests" bothered folks, and they championed the above change to remove that running count (you only see the results at the end of the test). They also forced a minimum number of trials. These two changes mean it takes a long time and a ton of effort to find the critical, audible segment. So much so that no reasonable person has any prayer of detecting small impairments that come and go.

Above is on top of an already lousy foundation. The plug-in will NOT let you change its size. As such, the slider that lets you select segments is small and very coarse and you can't just type a number of the start/stop timecode. This makes creating a segment around a note or two extremely hard. Yet, this is what is required to find time-variable distortions.

The test as is then, is only good for gross differences that are audible across the full clip.

While some conveniently forget, our goal as people seeking maximum fidelity should be to give people the maximum chance to find audible differences. The last thing we want to do is put our head in the sand by making it hard for people to detect the audible impairments. Yet that is what we have with this being one of only two plugins available for such tests (the other is on the Mac).

So be careful when you see people posting such challenges. They are not looking for a fair evaluation but to win a point. Their conclusion may be right, but you wouldn't know it from the barriers put in front of you.
 
I hadn't, and I can't even hear it. With my 2 dB different files, it turns out it wasn't as obvious as I thought. I got 6 of 16. So even with echoic memory I can't discriminate 2 dB at 40 Hz on my nearfield speakers.
Did you test it? I don't think it was that hard:
Screenshot_2025-09-02_002548.jpg Screenshot_2025-09-02_004330.jpg

If I can do it (I never get it right on other similar blind tests), then you should be able to do it too. :)
(Up 1 dB compared to down 1 dB and flat was the easiest to spot.)
 
Did you test it? I don't think it was that hard:
View attachment 473655 View attachment 473657

If I can do it (I never get it right on other similar blind tests), then you should be able to do it too. :)
(Up 1 dB compared to down 1 dB and flat was the easiest to spot.)
Nope. I'm bad at it. 63% correct over speakers. 70% correct over headphones. 1 dB up is easy for me. But I am guessing between flat and 1 dB down.

What's funny, is I can pass it if enough ups are randomly selected early. But if it starts off with downs or flats, I fail. I do improve with training. I'll give it a rest for a week, then see if I need to retrain.
 
Nope. I'm bad at it. 63% correct over speakers. 70% correct over headphones. 1 dB up is easy for me. But I am guessing between flat and 1 dB down.

What's funny, is I can pass it if enough ups are randomly selected early. But if it starts off with downs or flats, I fail.
You can "cheat" by alternately clicking on "Files being tested" and on "?" multiple times, fast back and forth, before giving an answer. It tunes in the ears and teaches them to distinguish between down 1 dB and flat, which I find is the hardest to distinguish.
It actually isn't difficult at all when you do it that way. But as I said, it can almost be seen as cheating then. Or maybe practice makes perfect?:)

____
By the way. Thanks PMA for creating this Do all amplifiers sound the same? Level matched listening test. :)
 
Careful about this version of ABX test. It is designed to make you fail such comparisons more than succeed.

The change came about some 10 years ago when there were a number of blind audio challenges being posted online. I passed many of them with the then-current ABX plugin, shocking many people who thought these challenges were impossible to pass. I explained during the arguments that ensued how I would find the critical segment in a clip by trial and error. Once there, I would then run the full test and pass it. This was enabled by the previous version of the plugin providing the information as to whether you passed or failed in every trial. This way I could identify the segment and see if I could pass it in the next trial. If I failed, then I would go to find another segment.

Me passing these "impossible tests" bothered folks, and they championed the above change to remove that running count (you only see the results at the end of the test). They also forced a minimum number of trials. These two changes mean it takes a long time and a ton of effort to find the critical, audible segment. So much so that no reasonable person has any prayer of detecting small impairments that come and go.

Above is on top of an already lousy foundation. The plug-in will NOT let you change its size. As such, the slider that lets you select segments is small and very coarse and you can't just type a number of the start/stop timecode. This makes creating a segment around a note or two extremely hard. Yet, this is what is required to find time-variable distortions.

The test as is then, is only good for gross differences that are audible across the full clip.

While some conveniently forget, our goal as people seeking maximum fidelity should be to give people the maximum chance to find audible differences. The last thing we want to do is put our head in the sand by making it hard for people to detect the audible impairments. Yet that is what we have with this being one of only two plugins available for such tests (the other is on the Mac).

So be careful when you see people posting such challenges. They are not looking for a fair evaluation but to win a point. Their conclusion may be right, but you wouldn't know it from the barriers put in front of you.
Thanks. This makes sense, and has been brewing in the back of my head for some time.
I often do comparisons in foobar's 'training mode' since it allows you to see the running count, for the same reasons you mention. Finding the tell is already so difficult, and near impossible otherwise. I feel better now that I realize it isn't just me struggling. :D
I've also resorted to slicing out critical subsections. I usually pass on that; it's just too difficult.
I agree with your caution on this.
 
I explained during the arguments that ensued, how I would find the critical segment in a clip by trial and error. Once there, I would then run the full test and pass it. This was enabled by the previous version of the plugin providing the information as to whether you passed or failed in every trial. This way I could identify the segment and see if i could pass it in the next trial. If I failed, then I would go to find another segment.
Isn't that what "Training mode" is for?

foobar2000_01.png

foobar2000_02.png


The plug-in will NOT let you change its size. As such, the slider that lets you select segments is small and very coarse and you can't just type a number of the start/stop timecode. This makes creating a segment around a note or two extremely hard. Yet, this is what is required to find time-variable distortions.
There's a "..." button next to the slider, it opens the window where you can enter the exact time. You only have to keep in mind that changing start/end position also adds about 40ms fade-in / fade-out.

foobar2000_03.png
 
You can "cheat" by alternately clicking on "Files being tested" and on "?" multiple times, fast back and forth, before giving an answer. It tunes in the ears and teaches them to distinguish between down 1 dB and flat, which I find is the hardest to distinguish.
It actually isn't difficult at all when you do it that way. But as I said, it can almost be seen as cheating then. Or maybe practice makes perfect?:)

____
By the way. Thanks PMA for creating this Do all amplifiers sound the same? Level matched listening test. :)
When I started doing that, I was missing fewer between down and flat. But still missing. Eventually, I’m sure I could get it down.

But to get back to the power at 40 Hz being 2 dB less than at 1 kHz: for that to be significant when listening to music (as opposed to designing reproduction gear or testing the limits of hearing), you would have to be able to ABX two full songs with that relationship scaled into them AND do it with 30-second pauses between switches.
 

Why do you think they should all sound much the same? If you went to a dozen concerts and listened to the same piece performed by well-respected orchestras, would you expect them to all sound the same? Of course not.

Of course yes, absolutely!
Doing the same thing over and over again and expecting different result is a definition of idiocy.

The fact that there are differences only proves that something is wrong. What's wrong? The performers are wrong. How wrong are they? It depends: a wrong note can be played, or the wrong tempo, and a note can also be played with the wrong dynamics (forte is a range; there are tolerances for that). How do we know that a certain note should be played with a certain dynamic? Well, those are pre-determined things: there's a composition on paper! You know, the stimulus. Program material. How do you know you're listening to the right piece of music if you can't read music, if you don't know what to expect? Just asking.
The fact that you can't read music (and graphs!) doesn't make you an expert, quite the contrary...

More demagogic crap.

Don't ask me - I'm not an amplifier designer.
You don't say...

Don't tell me that an amp designer doesn't ask people to listen to and comment on its sound before they release it to the general public. Beta testing, etc.

If you can't tell a great sounding amp from a good sounding one, get your hearing fixed, rather than look at measurement to offer the answer.
Of course, let's ask junkies what they think about the new drugs we're pushing on them.
Mid-heavy program material may sound "ok", "nice" and "full" and other crap on a speaker with a smiley-face contour, but only mid-heavy program material, and only on a smiley-face speaker. Or amp, in that sense. Without looking at the measurements, how can you tell what's wrong?
P.S.
Don't answer, it's rhetorical.
 
The test level is set such that 0dBFS of digital data corresponds to THD = 0.9% for both amplifiers under test (that means close to clipping). The music sample has maximum digital amplitude of -0.1 dBFS and DR = 12.

Time points (in the music sample) that are near to clipping of the DUT amplifiers

Please let me return to post #1, that should always be reviewed in case of doubts about the method used. Below I am posting the time domain visualization of the music sample used for testing.

AX_originaldata.png

Click on the image to see the full resolution. It can be seen that at times 10.239 s and 20.941 s the music data are about 0.1 dB below full scale. This sends the DUT amplifiers close to clipping, with THD just below 1%. At these points there might be some audible(?) difference. However, the peaks are very narrow.
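
For anyone who wants to locate such near-full-scale time points in a file themselves, here is a minimal sketch of the idea. It is not the tool used for the plot above: it runs on a synthetic stand-in signal rather than the actual music sample, assumes samples are already normalized so that full scale is 1.0, and the -0.2 dBFS threshold and function names are arbitrary choices for illustration.

```python
import math

def dbfs(sample: float) -> float:
    """Level of one normalized sample (full scale = 1.0) in dBFS."""
    return 20 * math.log10(abs(sample)) if sample != 0 else float("-inf")

def near_full_scale(samples, sample_rate: int, threshold_dbfs: float = -0.2):
    """Return (time_in_seconds, level_dbfs) for every sample at or above the threshold."""
    hits = []
    for i, s in enumerate(samples):
        level = dbfs(s)
        if level >= threshold_dbfs:
            hits.append((i / sample_rate, level))
    return hits

# Synthetic stand-in: a quiet signal with two isolated peaks at -0.1 dBFS.
rate = 1000
samples = [0.25] * 3000
samples[1200] = 10 ** (-0.1 / 20)      # positive peak, about 0.9886 of full scale
samples[2500] = -(10 ** (-0.1 / 20))   # negative peak of the same magnitude

for t, level in near_full_scale(samples, rate):
    print(f"{t:.3f} s  {level:.2f} dBFS")
```

With real audio you would first load the file into normalized floats; the loop itself stays the same.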

When the recorded samples (from the amplifier outputs) are compared to the original digital file in DeltaWave, we get very different linearity:
orig_linearity.png orig-linearity.png
roughly 16 bits vs. 25 bits, reflecting the previously posted differences in THD and THD+N between the amplifiers. However, the PK Metric, which is the measure of the possible audibility of the differences, is the same for both amplifiers when compared to the original file. Thus, we are getting poor results in the DBT ABX tests (me included), as expected.

However, the PK Metric finds different points with the "highest possibility of audibility" of differences between the X and Y samples:

1756796026100.png


Final note: thanks to all my colleagues here at ASR who are patient enough to argue with some nonsensical posts. I am not one of them ;). I will stick with preparing data and posting facts :).
 
What's the max power with pink noise?
You would need to measure distortion to know what maximum power is. How would you measure distortion with pink noise?
 
In my test, the power with stepped sine is measured here. Both amps, as explained in post #1, were driven with an input amplitude (different in each case, due to the different gain and power of the two amps) chosen to keep THD below 1% at the peaks (there were only 2 such peaks, both very short). As burst power is considerably higher than continuous sine power, it may well be supposed that the distortion at those peaks was even lower. I am dealing with burst power in another thread.
 
I listened a few times before running the Foobar ABX test and was convinced I could hear a very, very slight difference. I felt amp_Y had a slightly different presentation of vocal edges and decays. To be clear, they sound so very similar that I wasn't sure, so I ran the Foobar ABX test. I deliberately chose to use the entire samples, including the long introduction, rather than shortening the test to the vocal bit. In the end I got 10/16, so possibly a subtle audible difference, but not statistically significant. BTW, I'm not going to say what I think the "tell" is. I used TRUTHEAR x Crinacle Zero BLUE, no EQ, and the room is very quiet.

foo_abx 2.2.1 report
foobar2000 v2.24.6
2025-09-02 10:54:26

File A: PMA - Amplifier Comparison - 1.01 - amp_X .wav
SHA1: ef3b0eb1670f5332234d52e12c17eb0d7f2dd80a
File B: PMA - Amplifier Comparison - 1.02 - amp_Y.wav
SHA1: 609806867a3232fbb56fe54e795dddddc6c71280

Output:
Default : Speakers (SABAJ USB AUDIO) [exclusive], 32-bit
Crossfading: NO

10:54:26 : Test started.
10:56:08 : Test restarted.
10:56:08 : 01/01
10:57:25 : Test restarted.
10:57:25 : 01/02
10:58:40 : Test restarted.
10:58:40 : 02/03
10:59:58 : Test restarted.
10:59:58 : 02/04
11:01:21 : Test restarted.
11:01:21 : 02/05
11:03:45 : Test restarted.
11:03:45 : 03/06
11:05:01 : Test restarted.
11:05:01 : 04/07
11:06:13 : Test restarted.
11:06:13 : 05/08
11:07:27 : Test restarted.
11:07:27 : 06/09
11:08:38 : Test restarted.
11:08:38 : 06/10
11:09:51 : Test restarted.
11:09:51 : 07/11
11:11:08 : Test restarted.
11:11:08 : 07/12
11:12:21 : Test restarted.
11:12:21 : 08/13
11:13:42 : Test restarted.
11:13:42 : 08/14
11:14:55 : Test restarted.
11:14:55 : 09/15
11:16:07 : Test restarted.
11:16:07 : 10/16
11:16:07 : Test finished.

----------
Total: 10/16
p-value: 0.2272 (22.72%)

-- signature --
9b2f21540c8387c65ba63c4371f1ac3a9b86ac79
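
For reference, the p-value in a report like this can be reproduced with a one-sided binomial test: the chance of getting at least that many trials right by pure guessing (50% per trial). The sketch below assumes that this is what the plugin reports; the function name is made up for illustration.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    when every answer is a coin flip (one-sided binomial tail)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(abx_p_value(10, 16), 4))  # 0.2272, matches the report above
print(round(abx_p_value(12, 16), 4))  # 0.0384, first score under 5% for 16 trials
print(round(abx_p_value(6, 16), 4))   # 0.8949, the 6 of 16 mentioned earlier in the thread
```

So with 16 trials you need 12 or more correct to get under the usual 5% threshold; 10/16 is well within what guessing produces.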
 
You would need to measure distortion to know what maximum power is. How would you measure distortion with pink noise?
Indeed you can measure TD+N with WN (edit: I used PN for this one, I'll add a WN one as well) using FSFA measurement.

It roughly looks like this, a quick, dirty one:

FSFA.PNG


Chart.PNG



...and a WN one (blue) for comparison:

WN.PNG


We have to note that filters matter greatly, and so do levels.

You can also measure with music but results can be strange.
 