What to do about the ABX test?

krabapple · Nov 28, 2022

Blumlein 88 said:
Seems increased commentary in recent weeks about ABX tests. Much of it stemming from people who come to ASR to set us straight about trusting our ears. I do agree with some who have said that calls for ABX or it didn’t happen have become almost like a club to beat people over the head with, and nearly cultish in how some new posters have the call rain down upon them. Not that I haven’t been guilty of it myself.

Some comments by @restorer-john have caused me to think about this situation. We stand little chance of convincing, or engaging in meaningful discussion with people with this approach. Like restorer-john I think there is a lot more talk of it than participation in or use of ABX listening tests among most posters. For most audiophiles it is impractical for most situations.

Some who don’t like ABX tests complain they are stressful. Only if you feel challenged by it or think you’ll suffer loss of face. After you have done it a couple or three times it isn’t stressful. It is major league TEDIOUS and BORING. Most of us do them with Foobar ABX or similar software. That isn’t very useful for amps and not at all for speakers.

So what is a next best alternative? What is a friendlier way to get the point across? How do regular ASR members pick their gear?

We don't need an 'alternative'. All anyone needs is to understand and acknowledge the *fact* that their 'sighted' claims of audible difference are subject to cognitive bias. Which means their claims should be accordingly tempered, qualified, or supported with excellent proof.

It's a matter of language, in other words.

Sgt. Ear Ache · Nov 28, 2022

In most cases we're talking about distinctions that could best be described as "infinitesimal." Differences between cables or DACs or (competent) amps...even if we allow that there might possibly be some difference (even if only measurable), the chances of audibility are minute. So, my first suggestion to anyone who thinks they are hearing readily identifiable differences (because that's what is often claimed, right? "I just hooked up my new DAC and OMG it's like night and day!") is to go take a few online ABX tests that are easy to find and see if, for instance, they can readily tell 320 kbps MP3s apart from FLAC files. Because the differences between those are orders of magnitude larger than the differences between almost any two DACs or between cables or whatnot. For myself, that's the sort of thing that clearly convinces me that I'm not, under any sort of normal listening circumstances, distinguishing differences between any two things that measure as close to the same as DACs and amps and cables do. Also, a simple hearing test can go a long way to enlightening anyone who thinks they are hearing ultrasonics...lol. I can't hear anything above 15 kHz so I don't waste much of my time worrying about a 1 dB roll-off from 19 kHz up. I mean it's not like the claims go like this - "under extremely careful listening circumstances I could hear subtle differences between this and that." Instead it's people hooking up gear in their living room and, based on what they remember their old gear sounding like, deciding they can hear remarkable new dimensions in clarity and soundstage and blah blah blah. There's such a degree of ridiculousness to the claims that they basically fall apart if one is open enough to apply even the simplest logic to them...

krabapple · Nov 28, 2022

Blumlein 88 said:
Blind tests are the best, most discriminating method. I find I can detect with 100% reliability some very small differences when using two segments of 5 seconds or less and rapid switching. OTOH, some of those I score 50/50 if segments are 15 or 30 seconds long. I have found that anything I only hear using the very short segments, which can both fit inside my echoic memory, is so small it has zero relevance to normal music listening. So on one hand if you cannot hear something using short rapid switching listening tests it is a pretty sure bet you cannot hear it. On the other if the difference isn’t large enough to hear with 30 second segments it isn’t big enough to matter for music listening.

And I think this is a very, very important consideration. Far too often debates about blind audio tests bog down in 'best case' results...results that pretty much can only be obtained using optimized protocols that are FAR in excess of the sensitivity available during typical listening. (Or, in one infamous case, require meta-analysis to 'extract' from decades of published data).

Yet it is 'typical listening' (not to mention 'sighted') that audiophiles usually base their claims on. Aka 'real world' listening.

So, if a rigorous test, using careful level matching and instant switching and short, maximally revealing snippets of sound, and possibly after some 'training' on same*, does prove that it is possible to do better than p=0.05 telling A from B... so effing what? Does it mean we should believe you, Joe Audiophile, are hearing it in your La-Z-Boy playing Diana Krall? Nope. Don't kid yourself.

(*I am very much thinking of Amir's tests of high-bitrate mp3s vs lossless here, in case you are wondering)
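For anyone wondering what "better than p=0.05" works out to in practice, here is a minimal sketch, assuming each ABX trial is an independent 50/50 guess under the null hypothesis. The function names `abx_p_value` and `min_correct_for` are made up for illustration and are not part of Foobar ABX or any other tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided chance of getting at least `correct` right out of
    `trials` by pure guessing (assumed 50% chance per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the upper tail of the binomial distribution with p = 0.5.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


def min_correct_for(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose guessing probability is below `alpha`."""
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) < alpha:
            return correct
    raise ValueError("too few trials to ever get below alpha")
```

Under those assumptions, `min_correct_for(16)` gives the pass mark for a 16-trial run and `abx_p_value(12, 16)` gives the guessing probability for a 12/16 score. It says nothing about whether the difference is audible from the couch, which is the point being made above.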

krabapple · Nov 28, 2022

amirm said:
I have no idea what you are talking about. Here is an example of ABX testing that was done by the very people who popularized such tests:

Double Blind tests *did* show amplifiers to sound different

In many online debates, position taken by some that when amplifiers are used that have flat frequency response and low distortion that no double blind tests have shown them to sound different. Well, I managed to dig up a 32 year old test that says otherwise. What is fascinating is that one of...

www.audiosciencereview.com

How is this not "comparing one device to another to determine if you can tell them apart?"

Amps can sound different...e.g., when one or more of them is clipping. No one claims otherwise. I'm invoking the ghost of Arny Krueger to chide you for not noting that here (you noted it in the original ASR post).

krabapple · Nov 28, 2022

Shadrach said:
ABX testing is not the correct test to discover a preference.

Blind conditions are certainly crucial if you want to pin the basis of the preference on the audio alone.

It's why blind testing is of course used in Toole/Olive speaker preference research. Speakers do sound different, but preference can still be biased by non-audio factors.

krabapple · Nov 28, 2022

restorer-john said:
Exactly. I made this point only a few days ago. It means absolutely nothing to anyone else, except the person taking the test. It's not the equivalent of some "preference curve"...

Well, no, if the test is actually well-controlled, and the reporting source is not lying, it provides evidence that the difference under test can be heard. That it is 'real'.

What it doesn't mean, by itself, is that you will hear it.

Sgt. Ear Ache · Nov 28, 2022

A blind test on one individual isn't some sort of "preference curve." But a whole bunch of blind tests on a whole bunch of people can certainly be used to establish a preference curve. Much like blind testing food (Pepsi vs Coke for instance) can establish a taste preference. You can do a thousand tests and then say 66% preferred Pepsi. Such a test would obviously be invalidated if you could see the brand of each as you tasted them. In the same way, you can conduct a whole series of blind tests on speakers, record the results of those tests, and then examine the speakers to determine what audible characteristics the "most-preferred" models share...from which you might be able to say something like "66% of people seem to prefer such and such a tonality."

DonH56 · Nov 28, 2022

pma said:
It used to be a practical solution. Since 2002 till about 2015, we had numerous meetings of audio fans here usually 2-3 days at some hotel, we hired a conference room for listening sessions. However, with the expansion of social networks, this activity ceased, unfortunately. Now we have what we have. Social networks open for everyone, with all its pros and cons.

Professionals versus con-artists?

Good things versus bad things?

Works either way...

ABX is hard to do well when needing to switch HW and not just files in Foobar or whatever. I have now and then wished I had kept my random AB tester built in college with relays and SSI/MSI chips on a breadboard. Push a button to select a random output, and the logic inside kept track of the order for a few (forgotten how many, maybe 16) trials. I had to manually look at the switcher to get the order, then collate the lists (on paper) from the listeners. These days a bright puppy would probably use a Raspberry Pi or something to drive the relays and keep track, maybe give each listener a couple of buttons to choose A or B for each trial and take most of the manual recording out of it.
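As a rough idea of what that bookkeeping could look like in software, here is a minimal sketch with no real hardware attached. The `ABSwitcher` class and its `select_output` callback are hypothetical; on a Raspberry Pi the callback would be whatever function drives your relays, which is not shown here.

```python
import random
from typing import Callable, Dict, List, Optional


class ABSwitcher:
    """Picks a random source per trial and keeps the order hidden
    from listeners until scoring."""

    def __init__(self, trials: int = 16,
                 select_output: Optional[Callable[[str], None]] = None,
                 rng: Optional[random.Random] = None) -> None:
        self.trials = trials
        # Hypothetical relay driver; defaults to doing nothing.
        self.select_output = select_output or (lambda source: None)
        self.rng = rng or random.Random()
        self.order: List[str] = []               # hidden A/B sequence
        self.answers: Dict[str, List[str]] = {}  # listener -> guesses

    def next_trial(self) -> int:
        """Switch to a random source and return the trial number."""
        if len(self.order) >= self.trials:
            raise RuntimeError("all trials already run")
        source = self.rng.choice("AB")
        self.order.append(source)
        self.select_output(source)
        return len(self.order)

    def record(self, listener: str, choice: str) -> None:
        """Store one listener's guess for the current trial."""
        if choice not in ("A", "B"):
            raise ValueError("choice must be 'A' or 'B'")
        votes = self.answers.setdefault(listener, [])
        if len(votes) >= len(self.order):
            raise RuntimeError("listener already answered this trial")
        votes.append(choice)

    def scores(self) -> Dict[str, int]:
        """Number of correct guesses per listener."""
        return {name: sum(v == o for v, o in zip(votes, self.order))
                for name, votes in self.answers.items()}
```

Each listener's two buttons would simply call `record(name, "A")` or `record(name, "B")` after every `next_trial()`, which takes the paper lists and manual collation out of the loop.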

DonR · Nov 28, 2022

The difference is always "night and day". Even the simplest of AB tests should be able to prove or disprove that. If the differences become "subtle" then there are likely none.

kemmler3D · Nov 28, 2022

A few thoughts on this.

First, what is the goal (explicit or implicit) when a new person shows up, and posts "I heard a clear difference between ______" - and we say to go do an ABX test?

1. Discourage / chase off a possible troll?
2. Enlighten someone who is mistaken about what they heard?
3. Feel superior to the n00b?

I think #1 is best left to mods and #3 is an unfortunate tendency but not a legitimate reason for response. The only proper goal here is #2.

What we're trying to do is bring people around to the science-based point of view - working from most likely explanations for what they experienced, to least.

If I tell you "Hey, I just heard X" and you tell me "Pfft, doubt it, ABX?" enlightenment is not a likely outcome. I think a better approach would be to send them to a noob-friendly guide to common mistakes in listening tests / a guide to proper listening test methodology. More flies with honey than vinegar, etc.

Anyone who has actually conducted a test of any kind is probably open to a scientific mindset. They're already trying harder than 99% of the people out there. But I don't think we can get them there by frustrating and discouraging them in the first post. So I do agree that the "ABX or it didn't happen" approach isn't a good one.

Yes, that message does have to get across eventually. But you will not successfully get it across before you have helped them clear a few more hurdles of understanding.

Rednaxela · Nov 28, 2022

Blumlein 88 said:
So what other things can we do or that some of you do that is useful? What is a more effective way to engage people who don’t understand things about what can and cannot be heard without chiming in over and over “hey, do an ABX test or it didn’t happen”?

Maybe we should simply formulate what the S in ASR means to us. A manifesto of sorts to refer to instead of asking for the dreaded ABX tests.

It’s not like we try and silence everybody about everything all the time in the name of science - it is a very specific kind of claim combined with a certain argumentation pattern that triggers the ABX or else response. These things are and have to be challenged here for very specific reasons.

Perhaps we can try and put these reasons into words? Might be an interesting exercise.

kemmler3D · Nov 28, 2022

Rednaxela said:
Perhaps we can try and put these reasons into words? Might be an interesting exercise.

Agreed, a friendly, informative introductory post, a "stock reply", seems like it would be helpful. It's not uncommon on different forums (Reddit has this a lot) for new posters to get hit with an auto-reply that covers the FAQs and common issues with first posts.

voodooless · Nov 28, 2022

Rednaxela said:
Perhaps we can try and put these reasons into words? Might be an interesting exercise.

How horrible it would be for unsuspecting members to be shown a standard response to their very specific question. You can probably train a chatbot to do this.

But seriously, a list of standard responses for ever-recurring questions would be great. An ASR FAQ of sorts:

Q: I hear the difference between speaker cable A and B, now what?
A: Don’t worry, this is a completely normal and human thing…. Blablablabla… etc…

Obviously we can eternally bicker over the exact content and phrasing, but at least we’ll be annoying each other, and not new members

DonR · Nov 28, 2022

voodooless said:
How horrible it would be for unsuspecting members to be shown a standard response to their very specific question. You can probably train a chatbot to do this.

But seriously, a list of standard responses for ever-recurring questions would be great. An ASR FAQ of sorts:

Q: I hear the difference between speaker cable A and B, now what?
A: Don’t worry, this is a completely normal and human thing…. Blablablabla… etc…

Obviously we can eternally bicker over the exact content and phrasing, but at least we’ll be annoying each other, and not new members

I think an FAQ would be a great idea. @amirm has made lots of teaching videos that would be useful there.

kemmler3D · Nov 28, 2022

Q) Some jerk keeps telling me to do an ABX test. What is an ABX test and why is it such a big deal?

A) You are being asked about ABX because you're probably trolling, please GTFO N00B!

OK, great start!

But in all seriousness, if there isn't a good new-member-oriented FAQ it could go a long way. Perhaps we could start drafting it collaboratively in Google Docs or similar.

DonH56 · Nov 28, 2022

IME/IMO the problem is that without actually doing the AB(X) testing a listener is unlikely to be convinced a difference he (she, whatever) heard is not really there. A default answer about the fallibility of hearing and links to perceptual studies would be good to have but likely the listener will be unconvinced even if he bothers to read it. Decades ago I was absolutely certain about what I heard and convinced it would be readily discerned in a blind AB test. I was wrong, and it was a humbling introduction into what I thought I heard, versus what was actually there. But "I know what I heard" leads to cementing bias into place and "unhearing" it is virtually impossible, again IME.

That said there are plenty of things that can be heard and pass an AB(X) difference test, including the way an amplifier interacts with a speaker and so forth. Even then the things mentioned earlier can muck up a test... For example, if one amplifier has a higher noise floor, it may be easy to pick that out, even if the actual musical signals are identical between two (or more) amplifiers. That is a problem I was never able to fully resolve way back when I was running tests.

FWIWFM - Don

ahofer · Nov 28, 2022

Sancus said:
I am NOT saying there are NO DIFFERENCES in a literal academic sense. I am saying the correct advice to 99.99% of the people looking for recommendations is "get a basic level of performance from your electronics and then STOP READING ABOUT IT. Instead, focus on speakers/room treatment/room correction/bass management, any of which is far more important."

Agree.

The difference between well-measuring speakers is even pretty small, a difference 99.9% could easily live with. Part of this hobby is splitting hairs, though, even hairs inside the brain, apparently.

Shadrach · Nov 28, 2022

solderdude said:
Suppose a difference can be found in the upper treble (so young ears only) or requires very good headphones/speakers and/or training, and 10 people take 'the test' and 9 fail where there is 1 who was trained, had the right gear and could reliably detect the difference. Does that one test prove audibility or are the other 9 truly showing no audibility?

It depends on the test. The case you put forward is not a comparison between two products and not a fair test. Everyone would need to listen to the same system.
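One thing worth keeping in mind with a 1-in-10 result is how often a lone "pass" would turn up by luck alone. A minimal sketch, assuming every listener is purely guessing and independent of the others; the trial count and pass mark are illustrative numbers, not anything from the scenario above, and the function names are made up.

```python
from math import comb


def false_pass_rate(trials: int, pass_mark: int) -> float:
    """Chance that a pure guesser scores at least `pass_mark`
    out of `trials` (assumed 50% chance per trial)."""
    tail = sum(comb(trials, k) for k in range(pass_mark, trials + 1))
    return tail / 2 ** trials


def chance_of_any_fluke(listeners: int, trials: int, pass_mark: int) -> float:
    """Chance that at least one of `listeners` independent guessers passes."""
    per_listener = false_pass_rate(trials, pass_mark)
    return 1.0 - (1.0 - per_listener) ** listeners
```

With 10 listeners, 16 trials each and a pass mark of 12 (all assumed), `chance_of_any_fluke(10, 16, 12)` comes out at roughly one in three. So a single pass out of ten is weak evidence on its own; retesting that one listener is what separates a trained ear from a lucky run.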

kemmler3D · Nov 28, 2022

DonH56 said:
a listener is unlikely to be convinced a difference he (she, whatever) heard is not really there.

This is true. However, IMO in most cases perceived differences aren't actually pure imagination, they're usually because of imperfect level matching. Telling someone "yes, you heard something real, but it's not what you thought it was" is far easier to accept than "nope, just your brain playing tricks on you".

Getting someone up to speed on Fletcher-Munson and loudness effects is pretty doable via a FAQ or something, but is a pain in the ass to do every time someone pipes up with "massive differences between DACs" or whatever.
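A FAQ entry on level matching could include something as small as this. It is a minimal sketch, assuming you have measured the RMS output voltage of both devices with the same test tone into the same load; the 0.1 dB default tolerance is a commonly quoted target and is an assumption here, not a figure from this thread.

```python
import math


def level_difference_db(v_a: float, v_b: float) -> float:
    """Level of device A relative to device B in dB,
    from RMS voltages measured with the same test tone."""
    if v_a <= 0 or v_b <= 0:
        raise ValueError("voltages must be positive")
    return 20.0 * math.log10(v_a / v_b)


def is_level_matched(v_a: float, v_b: float,
                     tolerance_db: float = 0.1) -> bool:
    """True if the two levels are within the assumed tolerance."""
    return abs(level_difference_db(v_a, v_b)) <= tolerance_db
```

For example, 2.00 V against 2.10 V is a bit over 0.4 dB apart, which `is_level_matched` rejects at the default tolerance, while 2.00 V against 2.02 V passes.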

ahofer · Nov 28, 2022

solderdude said:
Suppose a difference can be found in the upper treble (so young ears only) or requires very good headphones/speakers and/or training, and 10 people take 'the test' and 9 fail where there is 1 who was trained, had the right gear and could reliably detect the difference. Does that one test prove audibility or are the other 9 truly showing no audibility?

This is sort of the case with digital resolution trials, which you can do all over the internet. Just a few trained listeners can tell the difference between Red Book and higher resolutions. Not that many can even tell higher-bitrate MP3 from Red Book. Those that can, however, can do it reliably.

It comes back to what we were discussing in my last post (right advice for 99.9%). The difference may be "audible" but so small as to wonder why we care.
