
Study on blind testing: Is ABX worse than other protocols?

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,939
Location
Oslo, Norway
Howdy folks, thought I should share an interesting study I came across, which was published just now:

Listening tests in room acoustics: Comparison of overall difference protocols regarding operational power


It's behind an academic paywall, but if anybody wants a pdf copy do send me a message and I'll send it to you (academic publishing is nothing but evil capitalism!). Or search for it using sci cough hub cough.

There have been some discussions on blind testing here before, including a mammoth thread some years ago. Does blind testing make us "blind" to real differences which are there, and can be perceived in normal relaxed listening, but which are difficult to perceive under blind testing protocols? That's the subjectivist claim, but there is little systematic evidence to back it up. This study, though, is probably the most thorough attempt I've seen at comparing different protocols for blind testing, examining their limitations, and asking whether some protocols perform better than others. I'll try to sum it up as best I can.

They used 134 test subjects, and compared various ways of doing it:

- The common ABX method: the listener is asked to say whether X is identical to A or B, with X usually played last (or sometimes in the middle)
- Same/different approach: the listener simply says whether A and B are identical or not
- CR-DTF: a complicated name, but essentially quite similar to ABX, except that "X" is always played first and the listener then decides whether A or B matches it - a kind of "XAB" test
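For concreteness, here's a minimal Python sketch (my own illustration, not from the paper) of how a single trial is structured under each of the three protocols. The function names and the dict fields are just mine; note that chance performance is 1/2 in all three cases:

```python
import random

def abx_trial():
    """ABX: A and B are known; X is a hidden copy of one of them,
    typically played last. The listener says whether X = A or X = B."""
    x = random.choice(["A", "B"])
    return {"order": ["A", "B", "X"], "hidden_x": x, "chance": 0.5}

def same_different_trial():
    """Same/different: two presentations; the listener says whether
    they were identical."""
    same = random.choice([True, False])
    pair = ["A", "A"] if same else ["A", "B"]
    return {"order": pair, "same": same, "chance": 0.5}

def xab_trial():
    """CR-DTF ('XAB'): the reference X is played first, then A and B;
    the listener picks which of A and B matched X."""
    x = random.choice(["A", "B"])
    return {"order": ["X", "A", "B"], "hidden_x": x, "chance": 0.5}
```

The interesting part is that even though all three have the same guessing rate, the memory load differs: in XAB the reference is heard first and each candidate is compared against a fresh memory trace, which may be why it scores better.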

It's a bit complicated, but here are the protocols they tried out:
[Attachment: blind protocols.PNG - table of the protocols tested]


The test compared recordings of room acoustics over headphones (Sennheiser HD 650): could the listeners perceive the acoustic conditions as different?

So... drumroll... what did they find?
The ABX protocol, which is a common way of doing blind listening in audio, was actually the worst protocol for discerning differences. Same/different testing did better. But the best results were with the CR-DTF test, i.e. the "XAB" test.

I'm posting the table with the results as well (higher score is better):

[Attachment: blind protocols2.PNG - results table, scores per protocol]


I found this very interesting, as ABX has long been the most common method of blind testing in audio. I was surprised that the same/different test did not score highest - that's what I would have assumed initially. The highest-scoring method was the CR-DTF, or XAB, test!

I'm not sure whether it's possible today to do blind testing in foobar2000 using the CR-DTF or XAB format?

Anyways, maybe this can be of interest to some of you. If any of you smart guys on the forum who actually know something about blind testing procedures (unlike me) read the whole paper, I would be interested in hearing what you think of the study.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,499
Likes
25,314
Location
Alfred, NY
I’ll try to sci hub it. In organoleptic testing that I ran, triangle was the method of choice. With most ABX testing, the user can control test order, repetition, and length. Ditto triangle.
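For anyone unfamiliar with the triangle test mentioned above: three samples are presented, two identical and one odd, and the subject picks the odd one out, so chance performance is 1/3 rather than the 1/2 of ABX or same/different. A toy sketch (my own, purely illustrative):

```python
import random

def triangle_trial():
    """Triangle test: two identical samples plus one odd sample are
    presented in random order; the subject must pick the odd one.
    Chance performance is 1/3 (vs. 1/2 for ABX or same/different).
    Returns the presentation order and the identity of the odd sample."""
    odd = random.choice(["A", "B"])
    common = "B" if odd == "A" else "A"
    order = [common, common, odd]
    random.shuffle(order)
    return order, odd
```

The lower guessing rate is one reason the triangle test is popular in sensory (food/drink) work: fewer trials are needed to reach significance.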
 
OP
oivavoi

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,939
Location
Oslo, Norway
I’ll try to sci hub it. In organoleptic testing that I ran, triangle was the method of choice. With most ABX testing, the user can control test order, repetition, and length. Ditto triangle.

Cool, looking forward to hearing your thoughts, if you get the time to read it.
 

ahofer

Master Contributor
Forum Donor
Joined
Jun 3, 2019
Messages
5,037
Likes
9,108
Location
New York City
Interesting paper. But in this hobby, *any* controlled testing is often considered too large an ask. We mustn’t let the perfect blind test be the enemy of all controlled testing.
 

Soniclife

Major Contributor
Forum Donor
Joined
Apr 13, 2017
Messages
4,510
Likes
5,437
Location
UK
With most ABX testing, the user can control test order, repetition, and length. Ditto triangle.
I find this critical for getting good results; a regimented presentation is not good. If they excluded this from their testing, you have to question why they did.
 

DVDdoug

Major Contributor
Joined
May 27, 2021
Messages
3,023
Likes
3,977
Does blind testing make us "blind" to real differences which are there, and can be perceived with normal relaxed listening, but which are difficult to perceive with blind testing protocols?

This is a common "audiophile excuse" for "failing" the test. IMO, making a listening test blind NEVER makes it worse!
The ABX protocol, which is a common way of doing blind listening in audio, was actually the worst protocol for discerning differences.

That's hard to believe. But an ABX test is simply to determine IF you can reliably hear a difference. It doesn't tell you which is better or what the differences are.

I didn't read the paper but it says "room acoustics" so I'd assume there IS an audible difference so ABX may not be appropriate.

Same/different approach: Just to say whether A and B are identical or not
An A/B test is pretty useless when A & B are actually identical, unless you are trying to fool the listener. Usually we are comparing two different devices or two different file formats, and we want to know if there is an audible difference. The "X" helps to remove any bias or placebo effect, etc., to get a statistically useful result.
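To put a number on "reliably hear a difference": an ABX session is usually scored with a one-sided binomial test against guessing (p = 0.5). A quick sketch (my own, not from the paper):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided binomial p-value for an ABX session: the probability of
    getting at least `correct` answers right out of `trials` by pure
    guessing (p = 0.5). A small p-value suggests the listener genuinely
    hears a difference rather than guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 12 correct out of 16 trials:
print(round(abx_p_value(12, 16), 4))  # -> 0.0384
```

So 12/16 just clears the conventional 0.05 threshold; the paper's "operational power" question is essentially how many trials each protocol needs to reach that kind of significance for a real difference.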
 
OP
oivavoi

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,939
Location
Oslo, Norway
Interesting paper. But in this hobby, *any* controlled testing is often considered too large an ask. We mustn’t let the perfect blind test be the enemy of all controlled testing.

Very much agreed
 
OP
oivavoi

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,939
Location
Oslo, Norway
I find this critical to get good results, a regimented presentation is not good. If they have excluded this from their testing you have to question why they did.

I think their main "context" is academic blind tests in scientific studies of audio and acoustics - they tested the protocols they think are most common in such studies. So not so much how we audio guys on forums do it... :) They have a fairly thorough discussion of the literature, both on audio and on sensory testing in other disciplines (food, for example), so I don't think they've left anything obvious out on purpose. But I may be wrong, I'm not an expert on blind testing at all.
 

Soniclife

Major Contributor
Forum Donor
Joined
Apr 13, 2017
Messages
4,510
Likes
5,437
Location
UK
I think their main "context" is academic blind tests in scientific studies of audio and acoustics - they tested the protocols they think are most common in such studies. So not so much how we audio guys on forums do it... :) They have a fairly thorough discussion of the literature, both on audio and on sensory testing in other disciplines (food, for example), so I don't think they've left anything obvious out on purpose. But I may be wrong, I'm not an expert on blind testing at all.
If their focus is to find the best quick test, then it makes sense; if it's to find the most sensitive test, I don't see how it's good.
 