What to do about the ABX test?

lashto · Dec 1, 2022

JSmith said:
ASR is very much a breath of fresh air compared to many other audio forums and I think most of us prefer things kept that way.

An example... a forum I would often frequent many years ago had a resident "golden ears" that members would actually send their amps (often expensive too) to for a listening "test", after which time the member would post a few pages of impressions and comments about the amp for the member who sent it and the forum in general. They were only listened to... with comparison comments, yet they were not compared AB let alone blind. In fact the amps he would "compare" the current amp to would not even be in his possession anymore. To me it was a completely useless exercise... and strangely many of the "tested" amps would end up in the FS section not long after, at high prices for second-hand gear, with a referral to the "test" post as evidence of it being a fine amp. In fact many members actually relied on this list to make purchasing decisions... and the forum then had sponsors. To make it worse, the person was actually very pleasant and really believed in what they were doing after years of ranking amps this way, thus due to this "experience" felt he was in a position to conduct such single "tests".

Now if anyone asked are any of these amps AB compared at least, or any kind of actual testing done on them, or pointing out the folly of same... their posts would be removed and a note sent asking the member to refrain from posting such things as it was against the spirit of the forum, regardless of seniority or join date and regardless of time consuming contributions to the forum itself. ASR is basically the bizarro world opposite of forums like that... a decent place where those attitudes are in the minority based on factual information and real published data.

JSmith

most audio forums are just like you described, especially oldschool ones like audioasylum. Biggest/best 2 examples would be headfi & audiophilestyle, where it's forbidden to even mention tests/DBTs in the 'normal' threads.
And then you have a few in the ~middle like sbaf and headphones.com (or at least they try/pretend to be). And quite a lot of 'cacophonies' like reddit/discord/facebook groups.

The only forum I know to be quite close to ASR is hydrogenaudio. They are even more vocal/strict about things like "DBT or it did not happen". And it's ~deserted nowadays.
They do not have someone like Amir to feed an inhuman amount of info/tests. And those strict, hyper-skepticism policies drove people away. When you only accept DBTed posts, you only have one per month...

RichB · Dec 1, 2022

GaryH said:
Yep, I've pointed out the invalidity of these tests before.

I was referring to the jangling keys discussions that have occurred across many threads as an example where an HD Audio file was distinguishable from the CD quality version. Agreed, raising the volume in the video does not prove that HD Audio makes sense. I think we already knew there was a noise floor.

Reconstruction filters implementation may have an audible effect. I suppose a SBT could be performed for that though.
I see no reason for any implementation that does not have a linear fast filter, as this is technically the most correct implementation.

- Rich

krabapple · Dec 1, 2022

RichB said:
Not that Wikipedia is a trusted source, but:

ABX test - Wikipedia

en.wikipedia.org

- Rich

Um, right. Now explain how any of that contradicts or negates what I wrote.

krabapple · Dec 1, 2022

GaryH said:
Yep, I've pointed out the invalidity of these tests before.

Blumlein 88 said:
Okay, this wasn't jangling keys as someone else mentioned in this case. It was a music file in Archimago's hi-res listening test. And yes, he did use IEMs and found levels fading under -70 dBFS and pumped up the volume. I do think that is cheating. Amir even says without doing this he could hear no difference. I don't see this as any different than dumping a file in a sound editor and looking for differences. It has zero to do with any normal listening. Amir doesn't consider 16 bits transparent due to this, while I'd say for normal listening it is. One could say 24 bit encoding is transparent to levels beyond the ability of any hardware and 16 bits isn't, but if 16 bits is good enough that humans can never tell then what have we gained?

Just after this he shows hearing Ethan's generational test. I don't know what the difference is, but Amir said except for one track with a tail accidentally left in he couldn't hear the 8th generation files I have posted. Of course I didn't leave any dead space in files to prevent someone doing the find quiet portions and amp up the noise to hear differences thing.

What this shows is the use of ABX listening test with technical knowledge of how things work one can discern extremely small differences using ears only. Other methods of blind testing work beyond just the ABX method.

Indeed, and that's pretty much the objection I've posted every time he's posted such 'see, there *is* an audible difference' exercises. And we're talking all the way back to Hydrogenaudio days where Amir and Arny Krueger went at it frequently.

Such forensic 'listening' is 'valid' in the sense that SINAD differences with no likely bearing on audibility are 'valid'. Which is to say, pretty to know...but totally irrelevant to the sorts of 'veils were lifted' and 'of course A sounded better than B' claims routinely presented by audiophiles re: lossy vs lossless audio and hi res vs CD.

Amir is of a very different opinion.

krabapple · Dec 1, 2022

lashto said:
The only forum I know to be quite close to ASR is hydrogenaudio. They are even more vocal/strict about things like "DBT or it did not happen". And it's ~deserted nowadays.
They do not have someone like Amir to feed an inhuman amount of info/tests. And those strict, hyper-skepticism policies drove people away. When you only accept DBTed posts, you only have one per month...

You exaggerate. Go there now and tell me how many new posts went up today.

What *is* true is that the majority of posts these days are foobar 2000 related. Nothing wrong with that, as it's a free and highly configurable audio player, and popular among a certain demographic. HA hosts the f2k discussion forum.

HA started out as a forum devoted mainly to improving lossy codecs, not a general audio forum. Inviting sighted anecdotes would have been useless for that purpose.

And of course, plenty of posts there, even before it became so F2K centric, didn't 'require' DBTs -- because they weren't about a poster's claims of audio difference. A lot of posts were and are about how to make stuff work. Or how stuff worked. Including human hearing as well as gear as well as software.

I don't think this forum benefits from the endless rehashes of physically and psychoacoustically dubious claims any more than HA would have.

RichB · Dec 1, 2022

krabapple said:
Um, right. Now explain how any of that contradicts or negates what I wrote.

OK, I found the context. The ABX is a methodology. Listen to A, listen to B, then listen to X that is A or B.
If strictly adhered to, that requires 3 per test.

From what I understand, Harman tests are blinded comparisons between speaker A and speaker B.
The listener selects tracks and switches as they like, having been trained they give their impressions of A/B.
There is no X where a speaker is presented that may be A or B.
This is a different methodology but also a blind test.
ABX in this type of test adds no value.

I have done single blinded tests at home, there is A and B and no X in the methodology.
In each listening session A or B may be either component and the user selects how long to listen and when to switch.
At the end, provides a preference if any and a description of the differences.

- Rich

lashto · Dec 1, 2022

krabapple said:
You exaggerate.

of course I do, that is how an "alarm signal" works

Blumlein 88 · Dec 1, 2022

RichB said:
OK, I found the context. The ABX is a methodology. Listen to A, listen to B, then listen to X that is A or B.
If strictly adhered to, that requires 3 per test.

From what I understand, Harman tests are blinded comparisons between speaker A and speaker B.
The listener selects tracks and switches as they like, having been trained they give their impressions of A/B.
There is no X where a speaker is presented that may be A or B.
This is a different methodology but also a blind test.
ABX in this type of test adds no value.

I have done single blinded tests at home, there is A and B and no X in the methodology.
In each listening session A or B may be either component and the user selects how long to listen and when to switch.
At the end, provides a preference if any and a description of the differences.

- Rich

Should we discuss 2AFC, triangle and duo-trio testing next?

RichB · Dec 1, 2022

Blumlein 88 said:
Should we discuss 2AFC, triangle and duo-trio testing next?

You can start without me

- Rich

antcollinet · Dec 1, 2022

RichB said:
OK, I found the context. The ABX is a methodology. Listen to A, listen to B, then listen to X that is A or B.
If strictly adhered to, that requires 3 per test.

From what I understand, Harman tests are blinded comparisons between speaker A and speaker B.
The listener selects tracks and switches as they like, having been trained they give their impressions of A/B.
There is no X where a speaker is presented that may be A or B.
This is a different methodology but also a blind test.
ABX in this type of test adds no value.

I have done single blinded tests at home, there is A and B and no X in the methodology.
In each listening session A or B may be either component and the user selects how long to listen and when to switch.
At the end, provides a preference if any and a description of the differences.

- Rich

The point of ABX is to detect if there even is an audible difference to be preferred, or described. It is not intended to determine preference.

If you can't decide reliably if x is A when it is, or B when it is, then it demonstrates that you are unable to detect the difference between A and B - any difference if it exists is below the level of audibility for the listener. Or conversely it demonstrates that you can.

The listener can still decide how long to listen to a/b/x, and can switch back and forwards between them as much as, and for as long as they like.

Once you know that there is an audible difference, you can go on to test for preference or description of the difference.

lashto · Dec 1, 2022

Blumlein 88 said:
Should we discuss 2AFC, triangle and duo-trio testing next?

RichB said:
You can start without me
- Rich

And I'll generously take care of the bar & cocktails

krabapple · Dec 1, 2022

RichB said:
OK, I found the context. The ABX is a methodology. Listen to A, listen to B, then listen to X that is A or B.
If strictly adhered to, that requires 3 per test.

You quoted Wikipedia's correct statement:
"An ABX test is a method of comparing two choices of sensory stimuli to identify detectable differences between them. A subject is presented with two known samples (sample A, the first reference, and sample B, the second reference) followed by one unknown sample X that is randomly selected from either A or B."

That was in response to me writing (emphasis added here):
It doesn't really add a third. The X is either A or B.

I.e....clearly I know that an X is involved.
ABX 'requires' 3 'samples', but there are only 2 potentially distinct stimuli.

RichB said:
From what I understand, Harman tests are blinded comparisons between speaker A and speaker B.
The listener selects tracks and switches as they like, having been trained they give their impressions of A/B.
There is no X where a speaker is presented that may be A or B.
This is a different methodology but also a blind test.
ABX in this type of test adds no value.

Uh huh. Which is quite aligned with what I wrote: Harman's blind speaker tests don't employ an ABX protocol. ABX is inappropriate for tests of preference.

RichB said:
I have done single blinded tests at home, there is A and B and no X in the methodology.
In each listening session A or B may be either component and the user selects how long to listen and when to switch.
At the end, provides a preference if any and a description of the differences.

- Rich

Uh huh.

I have no idea what you think you are disagreeing with, in any of this. Wikipedia does not disagree with me; you don't seem to, either.

Let's stop there.

RichB · Dec 1, 2022

antcollinet said:
The point of ABX is to detect if there even is an audible difference to be preferred, or described. It is not intended to determine preference.

If you can't decide reliably if x is A when it is, or B when it is, then it demonstrates that you are unable to detect the difference between A and B - any difference if it exists is below the level of audibility for the listener. Or conversely it demonstrates that you can.

The listener can still decide how long to listen to a/b/x, and can switch back and forwards between them as much as, and for as long as they like.

Once you know that there is an audible difference, you can go on to test for preference or description of the difference.

Understood. I am addressing this from the original post

Blumlein 88 said:
Seems increased commentary in recent weeks about ABX tests. Much of it stemming from people who come to ASR to set us straight about trusting our ears. I do agree with some who have said that calls for ABX or it didn’t happen have become almost like a club to beat people over the head with, and nearly cultish in how some new posters have the call rain down upon them. Not that I haven’t been guilty of it myself.

Some comments by @restorer-john have caused me to think about this situation. We stand little chance of convincing, or engaging in meaningful discussion with people with this approach. Like restorer-john I think there is a lot more talk of it than participation in or use of ABX listening tests among most posters. For most audiophiles it is impractical for most situations.

Some who don’t like ABX tests complain they are stressful. Only if you feel challenged by it or think you’ll suffer loss of face. After you have done it a couple or three times it isn’t stressful. It is major league TEDIOUS and BORING. Most of us do them with Foobar ABX or similar software. That isn’t very useful for amps and not at all for speakers.

So what is a next best alternative? What is a friendlier way to get the point across? How do regular ASR members pick their gear?

Blind tests are the best most discriminating method. I find I can detect with 100% reliability some very small differences when using two segments of 5 seconds or less and rapid switching. OTOH, some of those I score 50/50 if segments are 15 or 30 seconds long. I have found anything I only hear using the very short segments which both can fit inside my Echoic memory are so small they have zero relevance to normal music listening. So on one hand if you cannot hear something using short rapid switching listening tests it is a pretty sure bet you cannot hear it. On the other if the difference isn’t large enough to hear with 30 second segments it isn’t big enough to matter for music listening.

I believe the #1 thing to emphasize with any comparative listening is you must match levels precisely. Set a comfortable listening level and measure voltage of test tones at speaker terminals so each component matches within 1%. You cannot do any useful listening comparisons without this step. This one thing even in sighted listening can cause people to experience the disappearance or large reduction in differences they thought they were hearing.

The #2 thing to make clear is that fairly small deviations in frequency response are audible. So checking that might eliminate any need to go further for differences you hear. There are some simple ways to test this.

So what other things can we do or that some of you do that is useful? What is a more effective way to engage people who don’t understand things about what can and cannot be heard without chiming in over and over “hey, do an ABX test or it didn’t happen”?

This thread discussed the validity of components evaluation using fast switching when level matched.
If the listener is unable to find a meaningful difference or misidentifies the source when unblinded, that is the result.

I have used the miniDSP dual output device to compare amps because there is a single source with DSP level matching to 0.1 dB.
This works well for A/B comparison that is not ABX.
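
For anyone checking level matching with a voltmeter, here is a minimal Python sketch of the arithmetic. It assumes both components are measured as RMS voltages with the same test tone at the speaker terminals; the function names and the default tolerance are illustrative only. It shows that the 1% voltage match suggested in the original post corresponds to a bit under 0.1 dB.

```python
import math


def level_difference_db(v_a: float, v_b: float) -> float:
    """Level difference in dB between two RMS voltages measured with the same test tone."""
    if v_a <= 0 or v_b <= 0:
        raise ValueError("voltages must be positive")
    return 20.0 * math.log10(v_a / v_b)


def is_level_matched(v_a: float, v_b: float, tolerance_db: float = 0.1) -> bool:
    """True if the two readings are within tolerance_db of each other (0.1 dB is an assumed default)."""
    return abs(level_difference_db(v_a, v_b)) <= tolerance_db


if __name__ == "__main__":
    # A 1% voltage mismatch expressed in dB.
    print(f"1% mismatch = {level_difference_db(1.01, 1.00):.3f} dB")
    # Hypothetical meter readings for two amps playing the same tone.
    print("matched:", is_level_matched(2.83, 2.80))
```

A 2% mismatch already falls outside a 0.1 dB window, which is why eyeballing a volume knob is not enough.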

There is a value to such tests to select gear, IMO, of course.

- Rich

xnor · Dec 2, 2022

RichB said:
I have done single blinded tests at home, there is A and B and no X in the methodology.
In each listening session A or B may be either component and the user selects how long to listen and when to switch.
At the end, provides a preference if any and a description of the differences.

- Rich

Sure there is an X. There's also a Y. What you describe is a test with X and Y.
Those get randomly assigned either A or B at the start of each round.
ABX tools may also offer a Y. Y is simply the other option. In those tools, you can also choose only to listen to X and Y.

The point of these tools/protocols should not be to make you suffer or to force you into something (which is a common invalid excuse that golden ear audiophiles make), but to help you detect even the tiniest of differences in a way that eliminates the possibility of you tricking yourself.

Whether you make your choice because you hear a difference, have a preference or experience a tingling sensation in your pinky does not matter.
The point is that, statistically, if your choices align with random chance then we're inclined to say that you cannot hear a difference.
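
To put a number on "align with random chance", one common approach is a one-sided binomial test against pure guessing. The sketch below is only an illustration: the function name is made up, and any cutoff applied to the result (0.05 is a frequent convention) is a choice, not something an ABX tool dictates.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by guessing (p = 0.5 per trial)."""
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    # Sum the upper tail of the binomial distribution for a fair coin.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for correct in (8, 12, 16):
        print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

A small p-value says guessing is an unlikely explanation for the score; a large one says the result looks like chance, for that listener on that setup.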

RichB · Dec 2, 2022

From the OP:

Blumlein 88 said:
So what other things can we do or that some of you do that is useful? What is a more effective way to engage people who don’t understand things about what can and cannot be heard without chiming in over and over “hey, do an ABX test or it didn’t happen”?

xnor said:
Sure there is an X. There's also a Y. What you describe is a test with X and Y.
Those get randomly assigned either A or B at the start of each round.
ABX tools may also offer a Y. Y is simply the other option. In those tools, you can also choose only to listen to X and Y.

The point of these tools/protocols should not be to make you suffer or to force you into something (which is a common invalid excuse that golden ear audiophiles make), but to help you detect even the tiniest of differences in a way that eliminates the possibility of you tricking yourself.

Whether you make your choice because you hear a difference, have a preference or experience a tingling sensation in your pinky does not matter.
The point is that, statistically, if your choices align with random chance then we're inclined to say that you cannot hear a difference.

Level matched A/B or Y/X comparisons can be a useful tool for people at home to try to evaluate and compare components.
Level matching using a voltage is better than using SPL in room.

SBTs (Single Blind Tests) are a good method at home since these component tests often require manual intervention.
Encouragement is to suggest and support rigor, discouragement is “hey, do an ABX test or it didn’t happen”.

Let's say a user buys a set of $2,000 speaker cables. Speaker cables can absolutely change the response of a speaker,
especially ones with dubious designs. A rigorous experiment is to set up an SBT using two amp channels from a split source.
Another person could switch the speaker cables out of view (SBT) upon request, such that the listener does not know which cables are attached.

Possible outcomes could be:

- The listener is unable to reliably determine the cables in use
- The listener reliably determined the cables and picks the cheaper cable
- The listener reliably determined the cables and picks the more expensive cable
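
The helper's side of such a session can be sketched in a few lines of Python. This is an assumed workflow, not a standard tool: the helper draws a random cable order in advance, keeps it hidden, and the listener's calls are compared against it afterwards. The function names are hypothetical.

```python
import random


def make_schedule(trials: int, seed=None) -> list:
    """Independent random choice of cable 'A' or 'B' for each trial, for the helper's eyes only."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(trials)]


def score(schedule: list, answers: list) -> int:
    """Number of trials where the listener named the cable that was actually connected."""
    if len(schedule) != len(answers):
        raise ValueError("schedule and answers must be the same length")
    return sum(actual == guess for actual, guess in zip(schedule, answers))


if __name__ == "__main__":
    schedule = make_schedule(10)
    # Hypothetical listener who always answers 'A'.
    answers = ["A"] * 10
    print(f"{score(schedule, answers)} of {len(schedule)} correct")
```

Drawing each trial independently means guessing lands near half correct on average, so a score close to that corresponds to the first outcome in the list above.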

This gets interesting if the product in question is still within the return window.

The user may decide that they don't really sound much different but loves how they look on their new cable lifters.

I encourage such experiments (with rigor) and reading impressions. It is clear that these are listening sessions. They are not proof that one thing is superior to another on an objective basis.

I think many of us have been on the upgrade train, searching for that last bit of (affordable) performance, so have a visceral response to impression posts that threaten our wallets (again), but this is not always productive nor helpful.

- Rich

lashto · Dec 2, 2022

Blumlein 88 said:
That is what this thread is about. Different strategies. It is not an anti-ABX thread or one questioning its veracity. It is about different approaches that might result in more useful engagement by those who doubt the measurement and blind testing approach.

Here's a more concrete suggestion..

ABX/DBT with two different devices is by far the most complex & hard. We cannot really expect/request an 'amateur' to do one. Or to do it right.

It is however easier to compare recordings with some ABX plugin. @amirm can use one latest-and-greatest ADC and record the sound of all tested devices. And members will try the ABX fun.
One of those super revealing recordings could be used: Arny's keys-jingle or audience applause or...
Still not exactly easy but should be much easier/simpler. And I'm putting a lot more work in other people's laps

Otherwise, I only have the old "keep an open mind" .. wish that one was not so over-used and thoroughly-abused.

xnor · Dec 2, 2022

RichB said:
Level matching using a voltage is better than using a SPL in room.

I've also seen people fall into the trap of comparing amps with different output impedances and matching the unloaded output voltages.

Imo, in such situations it's more practical to just measure the output impedances, look at the headphone/speaker impedance curves and do a quick calculation of the max frequency response deviations. Because why bother with the physical setup when you can tell straight from the numbers that amp B will result in a, let's say, 3 dB deviation?

If one still thinks that amp B has other sonic "qualities" and wants to test that, then I'd create an EQ curve to eliminate the FR differences.
So instead of matching loaded output voltages at a single frequency you match across all audible frequencies.
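
The "quick calculation" is a voltage divider between the amp's output impedance and the load. Below is a rough Python sketch under a simplifying assumption: both impedances are treated as plain resistive magnitudes, which ignores phase, so it is a ballpark estimate rather than an exact prediction. The function names and the example values are hypothetical.

```python
import math


def loaded_level_db(z_out: float, z_load: float) -> float:
    """Level at the load relative to the unloaded output, in dB (resistive approximation)."""
    if z_out < 0 or z_load <= 0:
        raise ValueError("z_out must be >= 0 and z_load must be > 0")
    return 20.0 * math.log10(z_load / (z_load + z_out))


def max_fr_deviation_db(z_out: float, z_min: float, z_max: float) -> float:
    """Spread in dB between the level at the load's impedance peak and at its impedance dip."""
    return loaded_level_db(z_out, z_max) - loaded_level_db(z_out, z_min)


if __name__ == "__main__":
    # Hypothetical headphone whose impedance swings between 20 and 60 ohms.
    for z_out in (0.1, 10.0):
        dev = max_fr_deviation_db(z_out, 20.0, 60.0)
        print(f"Zout = {z_out:5.1f} ohm -> max FR deviation about {dev:.2f} dB")
```

Inverting that deviation across frequency gives the shape of the EQ curve needed to take the FR difference out of the comparison.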

RichB said:
Let's say a user buys a set of $2,000 speaker cables. Speaker cables can absolutely change the response of a speaker,
especially ones with dubious designs. A rigorous experiment is to set up an SBT using two amp channels from a split source.

Here I'd again look at the specs first and run the numbers. You need to measure the voltage drop/resistance for level matching anyway.
A cable can be modeled as a passive filter and its effects on FR can be calculated.
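
As a sketch of what running the numbers can look like, here is a lumped model in Python: series resistance and inductance, with the cable capacitance in parallel with the load. The model and every component value below are assumptions picked for illustration, not measurements of any real cable, and the load is simplified to a pure resistance.

```python
import math


def cable_response_db(freq_hz: float, r_ohm: float, l_henry: float,
                      c_farad: float, z_load_ohm: float) -> float:
    """Level at the load, in dB, through a series R + L cable with shunt C across the load."""
    w = 2.0 * math.pi * freq_hz
    z_series = complex(r_ohm, w * l_henry)            # cable resistance and inductance
    y_shunt = 1.0 / z_load_ohm + 1j * w * c_farad      # load in parallel with cable capacitance
    z_parallel = 1.0 / y_shunt
    transfer = z_parallel / (z_series + z_parallel)    # voltage divider
    return 20.0 * math.log10(abs(transfer))


if __name__ == "__main__":
    # Hypothetical cable: 0.05 ohm, 2 uH, 300 pF, driving a resistive 8 ohm load.
    cable = dict(r_ohm=0.05, l_henry=2e-6, c_farad=300e-12, z_load_ohm=8.0)
    for f in (20.0, 1000.0, 20000.0):
        print(f"{f:7.0f} Hz: {cable_response_db(f, **cable):+.3f} dB")
```

With these assumed values the loss stays well under 0.1 dB across the audio band. Swapping in a manufacturer's published R, L and C figures, and a real speaker impedance, shows whether a particular cable behaves differently.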

Good manufacturers will provide the specs, so you don't even need the equipment to measure them yourself.
The thing is this: once you understand any of this then you also understand that such cables are a scam / waste of money.

RichB said:
I encourage such experiments (with rigor) and reading impressions. It is clear that these are listening sessions. They are not proof that one thing is superior to another on an objective basis.

While I agree that people should do blind tests I think that such tests are mainly useful for the person doing the test, because it tests their setup and hearing limits.
Objective measurement data is more universally useful.

RichB · Dec 2, 2022

xnor said:
I've also seen people fall into the trap of comparing amps with different output impedances and matching the unloaded output voltages.

Imo, in such situations it's more practical to just measure the output impedances, look at the headphone/speaker impedance curves and do a quick calculation of the max frequency response deviations. Because why bother with the physical setup when you can tell straight from the numbers that amp B will result in a, let's say, 3 dB deviation?

If one still thinks that amp B has other sonic "qualities" and wants to test that, then I'd create an EQ curve to eliminate the FR differences.
So instead of matching loaded output voltages at a single frequency you match across all audible frequencies.

I've always matched amp levels using 200Hz, 500Hz, 1000Hz, and 2000Hz when driving speakers but settled on 1kHz.
That will be included in future posts.

Certainly, output impedance will provide some insight into maximum deviations but that of course will vary based on the actual load.
A well designed A/B comparison can help the user determine difference detection and preference.
People who love their tube preamp may not like a Benchmark SS LA4.
Still their observations.

For SS amps, output impedance is not sufficient to characterize an amplifier driving loads.
Here are two amp measurements into load with frequency variation based on the output impedance.

AHB2: https://www.stereophile.com/content/benchmark-media-systems-ahb2-power-amplifier-measurements

The output impedance, including a 6' speaker cable, was a low 0.09 ohm at 20Hz and 1kHz, rising slightly to 0.22 ohm at 20kHz. As a result, the modification of the Benchmark amplifier's frequency response due to the interaction between this impedance and that of our standard simulated loudspeaker was just ±0.1dB (fig.1, gray trace).

Parasound: https://www.stereophile.com/content/parasound-halo-21-power-amplifier-measurements

The Parasound's output impedance was a very low 0.077 ohm at 20Hz and 1kHz, increasing slightly to 0.1 ohm at 20kHz. (These figures include the series impedance of 6' of loudspeaker cable.) The modulation of the amplifier's frequency response, due to the Ohm's law interaction between this source impedance and the impedance of our standard simulated loudspeaker, was therefore minuscule, at ±0.1dB (fig.1, gray trace).

The AHB2 has the higher output impedance but maintains a 0.2 dB reduction (from 8 ohms), dropping to 0.25 dB at 10kHz.
The A21+, with the lower output impedance, maintains a 0.5 dB reduction (from 8 ohms) up to 10kHz.

Is a 0.25 versus 0.5 dB difference noticeable? Perhaps; I don't know. I also don't know if this test represents the worst case for an amp driving reactive loads.
Are there measurements for distortion driving reactive loads representative of all speakers for 20Hz to 20kHz?
Personally, I look at all the measurements I can find. I know they may not be the whole story, but they lend a great deal of confidence.

- Rich

dlaloum · Dec 3, 2022

xnor said:
I've also seen people fall into the trap of comparing amps with different output impedances and matching the unloaded output voltages.

Imo, in such situations it's more practical to just measure the output impedances, look at the headphone/speaker impedance curves and do a quick calculation of the max frequency response deviations. Because why bother with the physical setup when you can tell straight from the numbers that amp B will result in a, let's say, 3 dB deviation?

If one still thinks that amp B has other sonic "qualities" and wants to test that, then I'd create an EQ curve to eliminate the FR differences.
So instead of matching loaded output voltages at a single frequency you match across all audible frequencies.

Here I'd again look at the specs first and run the numbers. You need to measure the voltage drop/resistance for level matching anyway.
A cable can be modeled as a passive filter and its effects on FR can be calculated.

Good manufacturers will provide the specs, so you don't even need the equipment to measure them yourself.
The thing is this: once you understand any of this then you also understand that such cables are a scam / waste of money.

In a world moving ever more towards computerised measurement and RoomEQ, a slight FR variation between two amps or speaker cables becomes irrelevant, as the Room/Speaker EQ is capable of compensating for these...

With connections prior to the stage where automated EQ can be applied, it may well be more relevant - but even then only applicable to analogue.... so in today's world, we are pretty much down to turntables, or vintage sources such as radio / cassette....

It has become (mostly) irrelevant - the standard connection methods are all becoming digital... HDMI, SPDIF, Ethernet
The move to digital connection to speakers (active speakers) - is well on its way too.

We can discuss the relative merits of different breeds of horses and types of cart, but we are usually driving cars...

kongwee · Dec 3, 2022

All the ABX tests in AES papers are just for reading. Not being able to hear a difference is not a big deal at all. It is OK if you can't differentiate an 8361 from an M9.
