What to do about the ABX test?

Blumlein 88 · Nov 28, 2022

There seems to have been increased commentary in recent weeks about ABX tests, much of it stemming from people who come to ASR to set us straight about trusting our ears. I do agree with some who have said that calls for “ABX or it didn’t happen” have become almost like a club to beat people over the head with, and nearly cultish in how the call rains down upon some new posters. Not that I haven’t been guilty of it myself.

Some comments by @restorer-john have caused me to think about this situation. With this approach we stand little chance of convincing people, or of engaging them in meaningful discussion. Like restorer-john, I think there is a lot more talk about ABX listening tests than actual participation in or use of them among most posters. For most audiophiles they are impractical in most situations.

Some who don’t like ABX tests complain they are stressful. Only if you feel challenged by it or think you’ll suffer loss of face. After you have done it a couple or three times it isn’t stressful. It is major league TEDIOUS and BORING. Most of us do them with Foobar ABX or similar software. That isn’t very useful for amps and not at all for speakers.

So what is a next best alternative? What is a friendlier way to get the point across? How do regular ASR members pick their gear?

Blind tests are the best, most discriminating method. I find I can detect with 100% reliability some very small differences when using two segments of 5 seconds or less and rapid switching. OTOH, on some of those I score 50/50 if the segments are 15 or 30 seconds long. I have found that anything I can hear only with very short segments, both of which fit inside my echoic memory, is so small that it has zero relevance to normal music listening. So on one hand, if you cannot hear something using short, rapid-switching listening tests, it is a pretty sure bet you cannot hear it. On the other, if the difference isn’t large enough to hear with 30 second segments, it isn’t big enough to matter for music listening.

I believe the #1 thing to emphasize with any comparative listening is you must match levels precisely. Set a comfortable listening level and measure voltage of test tones at speaker terminals so each component matches within 1%. You cannot do any useful listening comparisons without this step. This one thing even in sighted listening can cause people to experience the disappearance or large reduction in differences they thought they were hearing.
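
To put a number on "within 1%", here is a minimal Python sketch that converts two voltage readings into a level difference in dB. The function names and the meter readings are made up for illustration, and it assumes both readings are RMS voltages of the same test tone taken at the speaker terminals.

```python
import math

def level_difference_db(v_a: float, v_b: float) -> float:
    """Level of component B relative to component A, in dB, from RMS voltages."""
    if v_a <= 0 or v_b <= 0:
        raise ValueError("voltages must be positive")
    return 20.0 * math.log10(v_b / v_a)

def is_level_matched(v_a: float, v_b: float, tolerance: float = 0.01) -> bool:
    """True if B is within `tolerance` (as a fraction, 0.01 = 1%) of A."""
    return abs(v_b - v_a) / v_a <= tolerance

# Hypothetical meter readings in volts (not measurements from this thread).
v_a, v_b = 2.000, 2.015
print(f"difference: {level_difference_db(v_a, v_b):+.3f} dB")
print("matched within 1%:", is_level_matched(v_a, v_b))
```

A 1% voltage difference works out to a bit under 0.1 dB, which is why people often quote "match to 0.1 dB" for the same requirement.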

The #2 thing to make clear is that fairly small deviations in frequency response are audible. So checking that might eliminate any need to go further for differences you hear. There are some simple ways to test this.

So what other things can we do or that some of you do that is useful? What is a more effective way to engage people who don’t understand things about what can and cannot be heard without chiming in over and over “hey, do an ABX test or it didn’t happen”?

pma · Nov 28, 2022

Blumlein 88 said:
So what other things can we do or that some of you do that is useful? What is a more effective way to engage people who don’t understand things about what can and cannot be heard without chiming in over and over “hey, do an ABX test or it didn’t happen”?

IMO it is close to impossible for a beginner to properly prepare and perform an ABX test with e.g. amplifiers. Level matching, ground loops, the possibility of shorting the output, you name it. Asking laymen to do such a test leads to imperfections of some kind, and then the test is useless.

Years ago, when I was interested in listeners’ opinions on the sound of different topologies, I organized listening sessions with 1 - 3 listeners and A/B tests, perfectly level matched, with the DUTs put into identical “black boxes”; the content of the boxes was neither specified nor disclosed. The listeners were asked to say (or write notes, if there was more than one of them) what they preferred and to write down distinctive sound attributes. This approach seemed to give quite consistent results, and the participants were quite interested in taking part in such tests, especially if it was disclosed later what had been tested. By contrast, listeners were seldom willing to take part in a variation on the Foobar ABX test.

charleski · Nov 28, 2022

Blumlein 88 said:
I find I can detect with 100% reliability some very small differences when using two segments of 5 seconds or less and rapid switching.

Rapid switching is indeed essential to pick up differences at the feature level, where you’re comparing the new input against decaying traces in a short-term buffer. But perception has a hierarchical structure and it’s possible to encode higher outputs of the chain in a more robust fashion. Of course the higher up the chain you go, the more the output is a result of interaction with your idiosyncratic perceptual models and the farther it gets from the raw sensory input.

So rapid switching is the optimal way to distinguish actual differences in the raw sensory data, but subjective (and sighted) reviewers are all attempting to discriminate on the basis of higher-level perceptual constructs. They then complain that ABX tests are unnatural as they involve a way of listening that’s very different. This is a fair complaint, as we don’t listen to the same segment of music repeatedly (generally, unless you’re a big fan of dance music).

But I think it’s important to note that it’s perfectly possible to perform an ABX test without rapid switching. As long as levels are matched and the test is properly blinded you’re free to take as long as you want. The only point of rapid switching is to make it easier to detect feature-level differences, but the ABX is still perfectly valid if you want to spend half an hour (or as long as you want) on each candidate before making your choice.

amirm · Nov 28, 2022

The reason for application of controlled testing determines the protocol. Poster says he changed cables and sound improved a ton. I tell them to do the same 10 times blind and come back with the result. He doesn't have to do anything different than what he did sighted. Most outrageous claims fall in this category where levels don't change.
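
For anyone wondering how to read the result of "10 times blind", here is a rough Python sketch of the usual binomial reasoning. It assumes every trial is an independent two-way choice, so a pure guesser is right half the time; the function name is only illustrative.

```python
from math import comb

def guess_probability(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by coin-flip guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# How likely is each score out of 10 if the listener hears no difference at all?
for score in (7, 8, 9, 10):
    print(f"{score}/10 or better by guessing: {guess_probability(score, 10):.4f}")
```

Under these assumptions 9 out of 10 is hard to explain by luck, while 7 out of 10 happens by guessing roughly one time in six.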

amirm · Nov 28, 2022

Adding on, AB testing is what should be asked for, not ABX, unless we are dealing with files. ABX testing of hardware requires a dedicated ABX switcher, which folks don't normally have. ABX testing can be reduced to "AX" testing, which is how I do ABX testing anyway.

restorer-john · Nov 28, 2022

I'd suggest that of all the actual ABX tests actually done by ASR members, the overwhelming majority would be on digital files. Those digital files are, of course trivial to analyse prior to performing a foobar style ABX. So, they go in with knowledge before the test and likely are already keyed into what to listen for to obtain a set of results worth posting.

I think there's one ASR member who purchased a Van Alstine ABX comparator as far as I know. One.

So, nobody on ASR (correct me if I'm wrong) is doing even real-time level matched A-B comparisons of multiple (at least 2) amplifiers, be they headphone or speaker amplifiers. Headphones require output switching to the cans themselves, as do speakers.

JSmith · Nov 28, 2022

I think there needs to be some separation in this discussion between picking gear and comparing gear.

Test/measurements results, aesthetics, functionality options and user feedback on build would be appropriate for selecting gear. However comparing gear, and making declarations based upon same, would be where an unsighted AB comparison may be sought if the person making the claim wanted to be able to further explore the results.

JSmith

Blumlein 88 · Nov 28, 2022

restorer-john said:
I'd suggest that of all the actual ABX tests actually done by ASR members, the overwhelming majority would be on digital files. Those digital files are, of course trivial to analyse prior to performing a foobar style ABX. So, they go in with knowledge before the test and likely are already keyed into what to listen for to obtain a set of results worth posting.

I think there's one ASR member who purchased a Van Alstine ABX comparator as far as I know. One.

So, nobody on ASR (correct me if I'm wrong) is doing even real-time level matched A-B comparisons of multiple (at least 2) amplifiers, be they headphone or speaker amplifiers. Headphones require output switching to the cans themselves, as do speakers.

I have done series amplifier testing. Had it arranged so I could include an amp in front of the amp connected to speakers or switch to straight wire bypass. DUT in or out of circuit with a simple line level switchbox. There are details to making that work, but it is quite doable. Not that most people have that setup ready to use. Would be even easier to do with headphone series amp testing.

Sokel · Nov 28, 2022

There's another matter that is sometimes overlooked.
There is a percentage of the population that doesn't do well under tests. Any tests.
It's not about a rebellious attitude or something like that, it's about the stress of the test.
Teachers, professors, even driving instructors, etc. know that well and are able to identify it.
So, sometimes demanding a test can be stressful.

(only for consideration)

Blumlein 88 · Nov 28, 2022

JSmith said:
I think there needs to be some separation in this discussion between picking gear and comparing gear.

Test/measurements results, aesthetics, functionality options and user feedback on build would be appropriate for selecting gear. However comparing gear, and making declarations based upon same, would be where an unsighted AB comparison may be sought if the person making the claim wanted to be able to further explore the results.

JSmith

Maybe, but some people use comparative listening to pick gear. Doing so with very faulty methodology.

restorer-john · Nov 28, 2022

Sokel said:
There's another matter that is sometimes overlooked.
There is a percentage of the population that doesn't do well under tests. Any tests.
It's not about a rebellious attitude or something like that, it's about the stress of the test.
Teachers, professors, even driving instructors, etc. know that well and are able to identify it.
So, sometimes demanding a test can be stressful.

(only for consideration)

This is true. The entire calling for (demanding) ABX tests as some sort of validation that a poster/member has something useful to contribute is simply boorish behaviour in my opinion. It doesn't set the scene for friendly, robust or even respectful discussion. New members will go to ground and keep quiet which is not what a healthy 'community' is about.

I've done too many ASR ABX tests (in threads) just for fun, but they quickly become no fun whatsoever and rather pointless. My efforts of several minutes of intense listening and concentration- for what? Somebody's amusement and then embarrassment when it becomes obvious the 'carefully matched' tracks are far from that, or the channels have differences in levels/balance which give away the source file. i.e. waste of my time and others.

By the time you level match, even out channel balance, account for switching transients, mask residual noise in one device to equal another, filter out hum and even-out the frequency response, what do you have? Nothing useful. You are not comparing one device to another to determine if you can reliably tell them apart. It's not remotely representative of the real world or real world comparisons.

pma · Nov 28, 2022

Yeah, @restorer-john . A small DC offset like 20mV at the output of a power amplifier is perfectly audible during fast switching. An experienced listener gets his abilities to distinguish between the amps based not on “sound differences”, but on accompanying attributes like this one. As we have armchair designers, we have armchair testers. People who never did the real job.

Shadrach · Nov 28, 2022

ABX testing is not the correct test to discover a preference.
ABX testing should be used to establish whether the listener, in the case of sound reproduction, can hear a difference between one unit and another.
The test tells one nothing about whether one unit is better than another because better is a subjective judgement.
The measurements should tell one whether a unit is above or below a standard.

amirm · Nov 28, 2022

restorer-john said:
I'd suggest that of all the actual ABX tests actually done by ASR members, the overwhelming majority would be on digital files. Those digital files are, of course trivial to analyse prior to performing a foobar style ABX. So, they go in with knowledge before the test and likely are already keyed into what to listen for to obtain a set of results worth posting.

I have posted the largest number of ABX tests here and I don't do anything of the kind. Nor would your scheme work, because foobar ABX randomizes what you are listening to, so prior knowledge doesn't help you. Why don't you try replicating the tests I have passed and see how far you get with cheating?

solderdude · Nov 28, 2022

Have done AB tests in the past, also with speaker amps and cables. I learned a lot from them.
Now and then when someone posts files, and it interests me, I have a listen.
It is mostly about audibility thresholds and with music this can go in all directions between very measurable but inaudible all the way up to audible.
Besides they are 'demanding' and take a lot of attempts to become statistically valid. They are generally hard to do when the differences are really small.
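
As a rough illustration of why the attempts add up, this Python sketch finds the smallest score that guessing alone would reach less than 5% of the time for a given number of trials. The 5% cut-off and the 50/50 guessing model are assumptions on my part, and `min_correct` is just an illustrative name.

```python
from math import comb
from typing import Optional

def min_correct(trials: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest score whose chance under pure guessing is at most `alpha`.

    Returns None when even a perfect score is not rare enough.
    """
    total = 2 ** trials
    tail = 0  # number of outcomes with at least k correct answers
    for k in range(trials, -1, -1):
        tail += comb(trials, k)
        if tail / total > alpha:
            # Scoring k is too easy to reach by luck, so k + 1 is the threshold.
            return k + 1 if k < trials else None
    return 0  # only reached when alpha >= 1

for n in (4, 5, 10, 16, 20):
    print(f"{n} trials: need {min_correct(n)} correct")
```

With only 4 trials no score is convincing at this cut-off, and even with 16 or 20 trials a listener still has to be right about three times out of four.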

One should realize that taking an AB(X) test is only for the one taking it. Not admissible as evidence.
All one can do is to do such a test when one really wants to find out. And then comes the point of how to do it properly. This requires knowledge. I mean even a relay click can give away what is playing.

So all 'we' can do is to post (not demand, but rather suggest) to do this properly and explain why and how. Those that prefer to trust their ears are not going to do that anyway. They clearly heard it, no blind test needed....

amirm · Nov 28, 2022

restorer-john said:
I've done too many ASR ABX tests (in threads) just for fun, but they quickly become no fun whatsoever and rather pointless. My efforts of several minutes of intense listening and concentration- for what? Somebody's amusement and then embarrassment when it becomes obvious the 'carefully matched' tracks are far from that, or the channels have differences in levels/balance which give away the source file. i.e. waste of my time and others.

Not at all my experience. I have run my tests after being challenged that the difference is inaudible. Once you show that it is audible, the landscape of the discussion changes completely, so not a waste of time at all. In almost all of these cases, the files were offered by others who were quite sure that the difference was inaudible. This was proven by the vast majority of people failing to pass them, because they are level matched and the differences require skill and knowledge to find. See example here of a very difficult test to pass:

amirm · Nov 28, 2022

restorer-john said:
By the time you level match, even out channel balance, account for switching transients, mask residual noise in one device to equal another, filter out hum and even-out the frequency response, what do you have? Nothing useful. You are not comparing one device to another to determine if you can reliably tell them apart. It's not remotely representative of the real world or real world comparisons.

I have no idea what you are talking about. Here is an example of ABX testing that was done by the very people who popularized such tests:

Double Blind tests did show amplifiers to sound different (www.audiosciencereview.com)

How is this not "comparing one device to another to determine if you can tell them apart?"

Blumlein 88 · Nov 28, 2022

restorer-john said:
This is true. The entire calling for (demanding) ABX tests as some sort of validation that a poster/member has something useful to contribute is simply boorish behaviour in my opinion. It doesn't set the scene for friendly, robust or even respectful discussion. New members will go to ground and keep quiet which is not what a healthy 'community' is about.

I've done too many ASR ABX tests (in threads) just for fun, but they quickly become no fun whatsoever and rather pointless. My efforts of several minutes of intense listening and concentration- for what? Somebody's amusement and then embarrassment when it becomes obvious the 'carefully matched' tracks are far from that, or the channels have differences in levels/balance which give away the source file. i.e. waste of my time and others.

By the time you level match, even out channel balance, account for switching transients, mask residual noise in one device to equal another, filter out hum and even-out the frequency response, what do you have? Nothing useful. You are not comparing one device to another to determine if you can reliably tell them apart. It's not remotely representative of the real world or real world comparisons.

I have found quite a few posted files to be poorly done, where level matching was by ear or by SPL meter from speakers. Or other issues, like background noises that shouldn't be there, etc. And yes, you feel like you wasted your time. Not all are like that, however.

Currently I will run files thru Deltawave before bothering to listen to them in such instances. Mainly to check relative gain, and FR. If those are off, I don't bother.

voodooless · Nov 28, 2022

The whole ABX thing isn't really about ABX testing at all. There would be no need to suggest such a test if people would actually accept established science. So really, I think you should be trying to solve a different problem.

Blumlein 88 said:
I believe the #1 thing to emphasize with any comparative listening is you must match levels precisely. Set a comfortable listening level and measure voltage of test tones at speaker terminals so each component matches within 1%. You cannot do any useful listening comparisons without this step. This one thing even in sighted listening can cause people to experience the disappearance or large reduction in differences they thought they were hearing.

The #2 thing to make clear is that fairly small deviations in frequency response are audible. So checking that might eliminate any need to go further for differences you hear. There are some simple ways to test this.

These things always come up, to no effect. People hear what they want to hear, they read what they want to read. If you really want to change their minds, you must come up with different strategies.

Blumlein 88 · Nov 28, 2022

voodooless said:
The whole ABX thing isn't really about ABX testing at all. There would be no need to suggest such a test if people would actually accept established science. So really, I think you should be trying to solve a different problem.

These things always come up, to no effect. People hear what they want to hear, they read what they want to read. If you really want to change their minds, you must come up with different strategies.

That is what this thread is about. Different strategies. It is not an anti-ABX thread or one questioning its veracity. It is about different approaches that might result in more useful engagement by those who doubt the measurement and blind testing approach.
