What to do about the ABX test?

Blumlein 88

Grand Contributor · Forum Donor · Joined: Feb 23, 2016 · Messages: 20,523 · Likes: 37,056

There seems to have been increased commentary about ABX tests in recent weeks, much of it from people who come to ASR to set us straight about trusting our ears. I do agree with those who have said that “ABX or it didn’t happen” has become almost a club to beat people over the head with, and nearly cultish in how it rains down on some new posters. Not that I haven’t been guilty of it myself.

Some comments by @restorer-john have caused me to think about this situation. With this approach we stand little chance of convincing people, or even of engaging them in meaningful discussion. Like restorer-john, I think most posters talk about ABX listening tests far more than they take part in or use them. For most audiophiles, ABX testing is impractical in most situations.

Some who don’t like ABX tests complain they are stressful. They are only stressful if you feel challenged by them or think you’ll lose face. After you have done a couple or three, they aren’t stressful. They are major league TEDIOUS and BORING. Most of us do them with Foobar ABX or similar software, which isn’t very useful for amps and not at all for speakers.

So what is a next best alternative? What is a friendlier way to get the point across? How do regular ASR members pick their gear?

Blind tests are the best, most discriminating method. I find I can detect with 100% reliability some very small differences when using two segments of 5 seconds or less and rapid switching. On the other hand, some of those same differences I score only at chance (50/50) if the segments are 15 or 30 seconds long. I have found that anything I can hear only with the very short segments, short enough that both fit inside my echoic memory, is so small it has zero relevance to normal music listening. So, on one hand, if you cannot hear something using short, rapidly switched listening tests, it is a pretty sure bet you cannot hear it. On the other, if the difference isn’t large enough to hear with 30-second segments, it isn’t big enough to matter for music listening.

I believe the #1 thing to emphasize with any comparative listening is that you must match levels precisely. Set a comfortable listening level and measure the voltage of test tones at the speaker terminals so each component matches within 1%. You cannot do any useful listening comparisons without this step. This one step alone, even in sighted listening, can make the differences people thought they were hearing disappear or shrink considerably.
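Because level in dB is 20·log10 of the voltage ratio, a 1% voltage match works out to roughly 0.086 dB. A minimal Python sketch of that check, assuming RMS voltages read with a meter while the same test tone plays through each component; the function names and readings are hypothetical:

```python
import math


def level_difference_db(v_a: float, v_b: float) -> float:
    """Level difference in dB between two RMS voltages read at the speaker terminals."""
    return 20.0 * math.log10(v_a / v_b)


def levels_matched(v_a: float, v_b: float, tolerance: float = 0.01) -> bool:
    """True if the two voltages agree within `tolerance` (default 1%), relative to the larger one."""
    return abs(v_a - v_b) / max(v_a, v_b) <= tolerance


# Hypothetical readings (volts RMS) with the same test tone through each amp.
amp_a, amp_b = 2.83, 2.80
print(f"difference: {level_difference_db(amp_a, amp_b):.3f} dB")
print("matched within 1%:", levels_matched(amp_a, amp_b))  # False: just over 1% apart
```

Measuring relative to the larger reading keeps the check symmetric, so it doesn't matter which amp is called A.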

The #2 thing to make clear is that fairly small deviations in frequency response are audible. So checking the frequency response might account for the differences you hear without any need to go further. There are some simple ways to test this.

So what other things can we do, or that some of you do, that are useful? What is a more effective way to engage people who don’t understand what can and cannot be heard, without chiming in over and over “hey, do an ABX test or it didn’t happen”?
 

pma

Major Contributor · Joined: Feb 23, 2019 · Messages: 4,591 · Likes: 10,727 · Location: Prague

Blumlein 88 said:
> So what other things can we do, or that some of you do, that are useful? What is a more effective way to engage people who don’t understand what can and cannot be heard, without chiming in over and over “hey, do an ABX test or it didn’t happen”?

IMO it is close to impossible for a beginner to prepare and perform an ABX test of, e.g., amplifiers properly. Level matching, ground loops, the possibility of shorting the output, you name it. Asking laymen for such a test leads to imperfections that make the test useless.

Years ago, when I was interested in listeners’ opinions on the sound of different topologies, I organized A/B listening sessions with 1 to 3 listeners, perfectly level matched, with the DUTs put into identical “black boxes” whose contents were neither specified nor disclosed. The listeners were asked to say (or write down, if there was more than one) which they preferred and to note distinctive sound attributes. This approach seemed to give quite consistent results, and the participants were quite keen to take part in such tests, especially when it was disclosed afterwards what had been tested. By contrast, listeners were seldom willing to take part in a variation of the Foobar ABX test.
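The post doesn’t say how the DUTs were assigned to the boxes. One simple way to keep the contents undisclosed until afterwards is to shuffle the assignment and hold the key back until the session is over. A minimal Python sketch; the box labels, DUT names and function names are all hypothetical:

```python
import random


def assign_boxes(duts, seed=None):
    """Randomly map each device under test (DUT) to an anonymous box label.

    Returns the key {box_label: dut}. Only the organizer keeps it until the
    session is over; listeners see nothing but the box labels.
    """
    rng = random.Random(seed)
    shuffled = list(duts)
    rng.shuffle(shuffled)
    labels = [f"Box {chr(ord('A') + i)}" for i in range(len(shuffled))]
    return dict(zip(labels, shuffled))


def reveal(key, notes):
    """After the session, re-label each listener's notes with the DUT actually in that box."""
    return {key[box]: note for box, note in notes.items()}


# Hypothetical DUT names and listener notes, for illustration only.
key = assign_boxes(["topology 1", "topology 2"])
notes = {"Box A": "preferred; wider stage", "Box B": "slightly dry"}
print(reveal(key, notes))
```

Keeping the notes keyed by box label until the reveal means nothing the listeners write can be influenced by knowing what was inside.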
 

charleski

Major Contributor · Joined: Dec 15, 2019 · Messages: 1,098 · Likes: 2,239 · Location: Manchester UK

Blumlein 88 said:
> I find I can detect with 100% reliability some very small differences when using two segments of 5 seconds or less and rapid switching.

Rapid switching is indeed essential to pick up differences at the feature level, where you’re comparing the new input against decaying traces in a short-term buffer. But perception has a hierarchical structure, and the outputs of higher stages in the chain can be encoded in a more robust fashion. Of course, the higher up the chain you go, the more the output is a result of interaction with your idiosyncratic perceptual models and the farther it gets from the raw sensory input.

So rapid switching is the optimal way to distinguish actual differences in the raw sensory data, but subjective (and sighted) reviewers are all attempting to discriminate on the basis of higher-level perceptual constructs. They then complain that ABX tests are unnatural because they involve a very different way of listening. This is a fair complaint, as we don’t generally listen to the same segment of music repeatedly (unless you’re a big fan of dance music).

But I think it’s important to note that it’s perfectly possible to perform an ABX test without rapid switching. As long as levels are matched and the test is properly blinded you’re free to take as long as you want. The only point of rapid switching is to make it easier to detect feature-level differences, but the ABX is still perfectly valid if you want to spend half an hour (or as long as you want) on each candidate before making your choice.
 

amirm

Founder/Admin · Staff Member · CFO (Chief Fun Officer) · Joined: Feb 13, 2016 · Messages: 44,368 · Likes: 234,388 · Location: Seattle Area

The reason for applying controlled testing determines the protocol. A poster says he changed cables and the sound improved a ton. I tell him to do the same thing 10 times blind and come back with the result. He doesn't have to do anything different from what he did sighted. Most outrageous claims fall into this category, where levels don't change.
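Interpreting the result of those 10 blind trials comes down to how likely the score is from guessing alone. A minimal sketch, assuming a two-choice test where a guess is right half the time and using a one-sided binomial test (a conventional choice, not something the post specifies); the function name is hypothetical:

```python
from math import comb


def p_value_guessing(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure guessing.

    One-sided binomial tail with a 50% chance of being right on each trial.
    """
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


for score in (10, 9, 8, 7):
    print(f"{score}/10 correct: p = {p_value_guessing(score, 10):.4f}")
```

With 10 trials, 9 correct (p ≈ 0.011) clears the conventional 0.05 threshold, while 8 correct (p ≈ 0.055) narrowly does not.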
 

amirm

Adding on: AB testing is what should be asked for, not ABX, unless we are dealing with files. ABX testing of hardware requires a dedicated ABX switcher, which folks don't normally have. ABX testing can be reduced to "AX" testing, which is how I do ABX testing anyway.
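The post doesn’t spell out the "AX" protocol. Assuming it means listening to a known reference A, then to an unknown X that is randomly either A or B, and answering "A" (same) or "B" (different) on each trial, a helper or script could prepare and score the trials along the lines of this hypothetical sketch:

```python
import random


def make_ax_trials(n_trials: int, seed=None) -> list:
    """Hidden identity of X for each trial: 'A' or 'B', chosen independently at random.

    Whoever swaps the hardware follows this list; the listener never sees it.
    """
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]


def score(hidden: list, answers: list) -> int:
    """Number of trials where the listener correctly identified X."""
    if len(hidden) != len(answers):
        raise ValueError("need exactly one answer per trial")
    return sum(h == a for h, a in zip(hidden, answers))


hidden = make_ax_trials(10)
answers = ["A", "B", "B", "A", "A", "B", "A", "B", "B", "A"]  # hypothetical listener answers
print(f"{score(hidden, answers)}/10 correct")
```

Drawing each X independently, rather than forcing five of each, means the listener can't deduce later answers by counting earlier ones.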
 

restorer-john

Grand Contributor · Joined: Mar 1, 2018 · Messages: 12,579 · Likes: 38,280 · Location: Gold Coast, Queensland, Australia

I'd suggest that of all the ABX tests actually done by ASR members, the overwhelming majority would be on digital files. Those digital files are, of course, trivial to analyse prior to performing a foobar-style ABX. So people go in with knowledge before the test and are likely already keyed into what to listen for to obtain a set of results worth posting.

As far as I know, there's one ASR member who purchased a Van Alstine ABX comparator. One.

So, nobody on ASR (correct me if I'm wrong) is doing even real-time, level-matched A-B comparisons of multiple (at least 2) amplifiers, be they headphone or speaker amplifiers. Headphones require switching the output to the cans themselves, as do speakers.
 

JSmith

Master Contributor · Joined: Feb 8, 2021 · Messages: 5,153 · Likes: 13,214 · Location: Algol Perseus

I think there needs to be some separation in this discussion between picking gear and comparing gear.

Test/measurement results, aesthetics, functionality options and user feedback on build would be appropriate for selecting gear. However, comparing gear, and making declarations based upon same, would be where an unsighted AB comparison may be sought if the person making the claim wanted to be able to further explore the results.


JSmith
 
Blumlein 88

restorer-john said:
> I'd suggest that of all the ABX tests actually done by ASR members, the overwhelming majority would be on digital files. Those digital files are, of course, trivial to analyse prior to performing a foobar-style ABX. So people go in with knowledge before the test and are likely already keyed into what to listen for to obtain a set of results worth posting.
>
> As far as I know, there's one ASR member who purchased a Van Alstine ABX comparator. One.
>
> So, nobody on ASR (correct me if I'm wrong) is doing even real-time, level-matched A-B comparisons of multiple (at least 2) amplifiers, be they headphone or speaker amplifiers. Headphones require switching the output to the cans themselves, as do speakers.

I have done series amplifier testing. I had it arranged so I could insert an amp in front of the amp connected to the speakers or switch to a straight-wire bypass: the DUT in or out of circuit with a simple line-level switchbox. There are details to making that work, but it is quite doable. Not that most people have that setup ready to use. It would be even easier to do series testing with headphone amps.
 

Sokel

Master Contributor · Joined: Sep 8, 2021 · Messages: 5,840 · Likes: 5,775

There's another matter that is sometimes overlooked.
A percentage of the population doesn't do well under tests. Any tests.
It's not about a rebellious attitude or something like that; it's about the stress of the test.
Teachers, professors, even driving instructors, etc. know that well and are able to identify it.
So, sometimes demanding a test can be stressful.

(only for consideration)
 
Blumlein 88

JSmith said:
> I think there needs to be some separation in this discussion between picking gear and comparing gear.
>
> Test/measurement results, aesthetics, functionality options and user feedback on build would be appropriate for selecting gear. However, comparing gear, and making declarations based upon same, would be where an unsighted AB comparison may be sought if the person making the claim wanted to be able to further explore the results.
>
> JSmith

Maybe, but some people do use comparative listening to pick gear, and with very faulty methodology.
 

restorer-john

Sokel said:
> There's another matter that is sometimes overlooked.
> A percentage of the population doesn't do well under tests. Any tests.
> It's not about a rebellious attitude or something like that; it's about the stress of the test.
> Teachers, professors, even driving instructors, etc. know that well and are able to identify it.
> So, sometimes demanding a test can be stressful.
>
> (only for consideration)

This is true. The whole practice of calling for (demanding) ABX tests as some sort of validation that a poster/member has something useful to contribute is simply boorish behaviour in my opinion. It doesn't set the scene for friendly, robust or even respectful discussion. New members will go to ground and keep quiet, which is not what a healthy 'community' is about.

I've done too many ASR ABX tests (in threads) just for fun, but they quickly become no fun whatsoever and rather pointless. Several minutes of intense listening and concentration, for what? Somebody's amusement, and then embarrassment when it becomes obvious the 'carefully matched' tracks are far from that, or the channels have level/balance differences that give away the source file. I.e., a waste of my time and others'.

By the time you level match, even out channel balance, account for switching transients, mask residual noise in one device to equal the other, filter out hum and even out the frequency response, what do you have? Nothing useful. You are no longer comparing one device to another to determine if you can reliably tell them apart. It's not remotely representative of the real world or of real-world comparisons.
 

pma

Yeah, @restorer-john. A small DC offset, like 20 mV at the output of a power amplifier, is perfectly audible during fast switching. An experienced listener can then distinguish between the amps based not on “sound differences” but on accompanying attributes like this one. Just as we have armchair designers, we have armchair testers: people who never did the real job.
 

Shadrach

Addicted to Fun and Learning · Joined: Feb 24, 2019 · Messages: 662 · Likes: 947

ABX testing is not the correct test to discover a preference.
ABX testing should be used to establish whether the listener, in the case of sound reproduction, can hear a difference between one unit and another.
The test tells one nothing about whether one unit is better than another because better is a subjective judgement.
The measurements should tell one whether a unit is above or below a standard.
 

amirm

restorer-john said:
> I'd suggest that of all the ABX tests actually done by ASR members, the overwhelming majority would be on digital files. Those digital files are, of course, trivial to analyse prior to performing a foobar-style ABX. So people go in with knowledge before the test and are likely already keyed into what to listen for to obtain a set of results worth posting.

I have posted the largest number of ABX tests here, and I don't do anything of the kind. Nor would your scheme work, because foobar ABX randomizes what you are listening to, so prior knowledge doesn't help you. Why don't you try replicating the tests I have passed and see how far you get with cheating?
 

solderdude

Grand Contributor · Joined: Jul 21, 2018 · Messages: 15,891 · Likes: 35,912 · Location: The Neitherlands

I have done AB tests in the past, including with speaker amps and cables. I learned a lot from them.
Now and then, when someone posts files and it interests me, I have a listen.
It is mostly about audibility thresholds, and with music these can go in all directions, from very measurable but inaudible all the way up to audible.
Besides, they are 'demanding' and take a lot of trials to become statistically valid. They are generally hard to do when the differences are really small.
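To put a number on "a lot of trials": the sketch below finds the fewest trials at which a listener who genuinely hears the difference some fraction of the time would pass with reasonable probability. The 0.05 significance level and 80% power are conventional choices assumed here, not figures from the thread, and the function names are hypothetical.

```python
from math import comb


def tail(n: int, k: int, p: float) -> float:
    """P(score >= k) when each of n trials is correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_score(n: int, alpha: float = 0.05) -> int:
    """Smallest score out of n that guessing (p = 0.5) reaches with probability <= alpha.

    Returns n + 1 when even a perfect score is not significant.
    """
    count = 0
    for k in range(n, -1, -1):
        count += comb(n, k)
        if count / 2**n > alpha:
            return k + 1
    return 0


def trials_needed(hit_rate: float, alpha: float = 0.05, power: float = 0.8,
                  max_n: int = 500) -> int:
    """Fewest trials at which a listener who is right `hit_rate` of the time
    reaches the critical score with probability >= power."""
    for n in range(1, max_n + 1):
        if tail(n, critical_score(n, alpha), hit_rate) >= power:
            return n
    raise ValueError(f"power not reached within {max_n} trials")


for rate in (0.9, 0.7, 0.6):
    print(f"right {rate:.0%} of the time: about {trials_needed(rate)} trials")
```

The required count climbs steeply as the true hit rate approaches 50%, which is why really small differences take so many trials.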

One should realize that an AB(X) test result is only meaningful to the person taking it. It is not admissible as evidence.
All one can do is run such a test when one really wants to find out. Then comes the question of how to do it properly, which requires knowledge. I mean, even a relay click can give away what is playing.

So all 'we' can do is suggest (not demand) doing this properly, and explain why and how. Those who prefer to trust their ears are not going to do it anyway. They clearly heard it, no blind test needed....
 

amirm

restorer-john said:
> I've done too many ASR ABX tests (in threads) just for fun, but they quickly become no fun whatsoever and rather pointless. Several minutes of intense listening and concentration, for what? Somebody's amusement, and then embarrassment when it becomes obvious the 'carefully matched' tracks are far from that, or the channels have level/balance differences that give away the source file. I.e., a waste of my time and others'.

Not at all my experience. I have run my tests after being challenged that the difference is inaudible. Once you show that it is audible, the landscape of the discussion changes completely, so it is not a waste of time at all. In almost all of these cases, the files were offered by others who were quite sure that the difference was inaudible. The difficulty was shown by the vast majority of people failing to pass them, because the files are level matched and the differences require skill and knowledge to find. See this example of a very difficult test to pass:

 

amirm

restorer-john said:
> By the time you level match, even out channel balance, account for switching transients, mask residual noise in one device to equal the other, filter out hum and even out the frequency response, what do you have? Nothing useful. You are no longer comparing one device to another to determine if you can reliably tell them apart. It's not remotely representative of the real world or of real-world comparisons.

I have no idea what you are talking about. Here is an example of ABX testing that was done by the very people who popularized such tests:


[Image: example of ABX testing]

How is this not “comparing one device to another to determine if you can tell them apart”?
 
Blumlein 88

restorer-john said:
> This is true. The whole practice of calling for (demanding) ABX tests as some sort of validation that a poster/member has something useful to contribute is simply boorish behaviour in my opinion. It doesn't set the scene for friendly, robust or even respectful discussion. New members will go to ground and keep quiet, which is not what a healthy 'community' is about.
>
> I've done too many ASR ABX tests (in threads) just for fun, but they quickly become no fun whatsoever and rather pointless. Several minutes of intense listening and concentration, for what? Somebody's amusement, and then embarrassment when it becomes obvious the 'carefully matched' tracks are far from that, or the channels have level/balance differences that give away the source file. I.e., a waste of my time and others'.
>
> By the time you level match, even out channel balance, account for switching transients, mask residual noise in one device to equal the other, filter out hum and even out the frequency response, what do you have? Nothing useful. You are no longer comparing one device to another to determine if you can reliably tell them apart. It's not remotely representative of the real world or of real-world comparisons.

I have found quite a few posted files to be poorly done, with level matching done by ear or with an SPL meter at the speakers, or with other issues like background noises that shouldn't be there, etc. And yes, you feel like you wasted your time. Not all are like that, however.

These days I run such files through DeltaWave before bothering to listen to them, mainly to check relative gain and frequency response (FR). If those are off, I don't bother.
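DeltaWave does considerably more than this (time alignment, drift correction, frequency response comparison), but the basic gain pre-check can be sketched in a few lines. This is a simplified illustration, not DeltaWave's algorithm: it assumes both captures are already time-aligned, mono, at the same sample rate and loaded as lists of floats, and the function names are hypothetical.

```python
import math


def best_gain(a, b):
    """Least-squares gain g that makes g * b as close as possible to a."""
    return sum(x * y for x, y in zip(a, b)) / sum(y * y for y in b)


def relative_gain_db(a, b):
    """How much louder a is than b, in dB (polarity ignored)."""
    return 20.0 * math.log10(abs(best_gain(a, b)))


def null_residual_db(a, b):
    """Energy left after gain-matching b to a, relative to a's energy, in dB.

    More negative means a deeper null, i.e. the two captures differ less.
    """
    g = best_gain(a, b)
    residual = sum((x - g * y) ** 2 for x, y in zip(a, b))
    if residual == 0.0:
        return float("-inf")
    return 10.0 * math.log10(residual / sum(x * x for x in a))


# Synthetic example: b is a at half the level plus a small 60 Hz tone.
a = [math.sin(2 * math.pi * 1000 * i / 48000) for i in range(4800)]
b = [0.5 * x + 0.001 * math.sin(2 * math.pi * 60 * i / 48000) for i, x in enumerate(a)]
print(f"relative gain: {relative_gain_db(a, b):.2f} dB")
print(f"null depth:    {null_residual_db(a, b):.1f} dB")
```

A gain far from 0 dB means the files were not level matched, and a shallow null means something other than level differs, both reasons to skip the listening.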
 

voodooless

Grand Contributor · Forum Donor · Joined: Jun 16, 2020 · Messages: 10,226 · Likes: 17,805 · Location: Netherlands

Netherlands
The whole ABX thing isn't really about ABX testing at all. There would be no need to suggest such a test if people would actually accept established science. So really, I think you should be trying to solve a different problem.
Blumlein 88 said:
> I believe the #1 thing to emphasize with any comparative listening is that you must match levels precisely. Set a comfortable listening level and measure the voltage of test tones at the speaker terminals so each component matches within 1%. You cannot do any useful listening comparisons without this step. This one step alone, even in sighted listening, can make the differences people thought they were hearing disappear or shrink considerably.
>
> The #2 thing to make clear is that fairly small deviations in frequency response are audible. So checking the frequency response might account for the differences you hear without any need to go further. There are some simple ways to test this.

These things always come up, to no effect. People hear what they want to hear, they read what they want to read. If you really want to change their minds, you must come up with different strategies.
 
Blumlein 88

voodooless said:
> The whole ABX thing isn't really about ABX testing at all. There would be no need to suggest such a test if people would actually accept established science. So really, I think you should be trying to solve a different problem.
>
> These things always come up, to no effect. People hear what they want to hear, they read what they want to read. If you really want to change their minds, you must come up with different strategies.

That is what this thread is about: different strategies. It is not an anti-ABX thread or one questioning its validity. It is about different approaches that might result in more useful engagement with those who doubt the measurement and blind testing approach.
 