
Relevance of Blind Testing

BaaM

Member
Joined
Nov 7, 2019
Messages
58
Likes
96

raistlin65

Major Contributor
Forum Donor
Joined
Nov 13, 2019
Messages
2,279
Likes
3,421
Location
Grand Rapids, MI
I came across a description of a test that seemed to be designed to validate the "long-term evaluation is better" hypothesis:


This is from an otherwise good article on audio testing by Stuart Yaniger: https://linearaudio.net/sites/linearaudio.net/files/LA Vol 2 Yaniger(1).pdf

I just took a quick look. It does seem like a good article, and he's clearly behind DBT and volume leveling. And I like that he is trying to get at enjoyment of a source. We need more of that kind of qualitative research.

But I don't care for the "long-term evaluation is better" hypothesis. Long-term DBT AB testing and ABX testing can both be useful and serve different purposes.

It seems like a bit of a strawman argument to me to claim that somehow the ABX protocol is not as good as longer AB testing. It was never designed for making subjective evaluations between two sources that sound different. So it's a given. Not something that needs hypothesizing. Water is wet, right?

One thing in favor of the ABX testing protocol is that audio memory is fleeting, and this is the reason for the X part of the test. Very subtle differences in sound might be difficult to remember in longer tests, so an AB test might come back with a false positive when both devices should be audibly transparent.
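
To make the ABX structure concrete, here is a minimal simulation sketch. It assumes X is assigned to A or B at random on every trial, and `p_detect` is a purely hypothetical probability that the listener genuinely hears which one is playing; otherwise they guess. A listener who hears no difference should land near 50% correct.

```python
import random

def run_abx_session(n_trials, p_detect, seed=None):
    """Simulate n_trials ABX trials and return the number of correct answers.

    p_detect is a hypothetical probability that the listener truly
    identifies X; the rest of the time they guess at random.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        x_is = rng.choice("AB")           # hidden random assignment of X
        if rng.random() < p_detect:
            guess = x_is                  # genuine detection
        else:
            guess = rng.choice("AB")      # coin-flip guess
        correct += guess == x_is          # True counts as 1
    return correct
```

For example, `run_abx_session(16, 0.0)` models a listener for whom both devices are transparent: the score hovers around 8 of 16, and any single run can stray above or below that by chance.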

I believe the notion that long-listening AB testing is better generally comes from subjectivists seeking to discredit ABX testing altogether. Of course, many of them are not properly volume leveling and/or are doing sighted testing, but still call it "long listening." So it's also an attempt to muddy the waters.
 

BDWoody

Chief Cat Herder
Moderator
Forum Donor
Joined
Jan 9, 2019
Messages
7,083
Likes
23,552
Location
Mid-Atlantic, USA. (Maryland)

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,712
Likes
10,408
Location
North-East
I just took a quick look. It does seem like a good article, and he's clearly behind DBT and volume leveling. And I like that he is trying to get at enjoyment of a source. We need more of that kind of qualitative research.

But I don't care for the "long-term evaluation is better" hypothesis. Long-term DBT AB testing and ABX testing can both be useful and serve different purposes.

It seems like a bit of a strawman argument to me to claim that somehow the ABX protocol is not as good as longer AB testing. It was never designed for making subjective evaluations between two sources that sound different. So it's a given. Not something that needs hypothesizing. Water is wet, right?

One thing in favor of the ABX testing protocol is that audio memory is fleeting, and this is the reason for the X part of the test. Very subtle differences in sound might be difficult to remember in longer tests, so an AB test might come back with a false positive when both devices should be audibly transparent.

I believe the notion that long-listening AB testing is better generally comes from subjectivists seeking to discredit ABX testing altogether. Of course, many of them are not properly volume leveling and/or are doing sighted testing, but still call it "long listening." So it's also an attempt to muddy the waters.

If you've participated in any discussions with subjectivists on the value of blind testing, the claim is invariably made that blind tests, including ABX, are not as sensitive as long-term evaluations. Common claims are made about fatigue, unnatural fast switching, the pressure of taking a test, the need to "absorb subconsciously" the differences over time, etc. I'm sure most, if not all, such claims come from the audiophile belief that they can hear minute differences and can recall them years later for the purpose of an arbitrary A/B comparison ;)

And while such claims are obviously false, there are not many studies demonstrating this, at least none that I could find. Stuart's test is one such (not a study) and there have been a few others. Stuart didn't prove anything with his test, but I thought the methodology was an interesting approach to test these particular subjectivist claims.
 

PierreV

Major Contributor
Forum Donor
Joined
Nov 6, 2018
Messages
1,449
Likes
4,818
Stuart didn't prove anything with his test, but I thought the methodology was an interesting approach to test these particular subjectivist claims.

Yes, interesting. I have two systems side by side, roughly equivalent in terms of budget, objective and subjective reviews, and in-house FR measurements. Yet I find that I favor one around 70% of the time (and no, it is not the more convenient one to turn on).
 

BaaM

Member
Joined
Nov 7, 2019
Messages
58
Likes
96
Stuart's test is one such (not a study) and there have been a few others
So could you please link them?
The more tests that include this time parameter, the more rational an opinion we can form about it.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,197
Likes
3,768
Recently, I watched a video of John Atkinson's RMAF 2018 presentation "50 years being an audiophile," and he told one interesting story among other things. He said that he attended a DBT of amplifiers in the 70s, if I remember correctly, where they basically concluded that all amps sound the same. But after a while of using a Quad amp regularly, he found that he didn't enjoy listening to music anymore. So he got himself a new system, reignited his passion for music listening, and deduced that DBT has some flaws.

He's told that story often.

Its conclusion is flawed, not DBTs. All he gave evidence for was that, for him, 'aural' satisfaction depends on more than the sound. He could have tested this by redoing the DBT *after* he had convinced himself that the Quad amp didn't 'sound good'.

He didn't, because ...why? Well there's the fact that science wasn't really the focus -- and certainly not an arbiter of sound quality -- for Stereophile. Science doesn't sell audio jewelry magazines.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,197
Likes
3,768
Indeed.
The placebo effect is very real, yes, and powerful.
A sugar pill can actually cure several illnesses if the subject believes it will work, so it is entirely likely that a piece of hifi will sound better to somebody who expects it to do so.

No, not really.

What it can do is ameliorate effects of the illness that have a large subjective component, i.e., 'feelings' -- pain, stress, fatigue, nausea.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,511
Likes
25,351
Location
Alfred, NY
“Long term is better” is a valid hypothesis in that it’s specific and falsifiable, there’s just no evidence yet to back it up. My one attempt obviously didn’t give a significant result. Personally, I think it’s incorrect (which is why I’ve put no more energy into it), but at the very least, I showed a method that can be used by someone who wants to prove it true.
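
For readers wondering what "significant" means for a forced-choice listening test: a common approach (assumed here, not necessarily the exact analysis used in the test above) is a one-sided binomial test against 50% guessing. The 0.05 threshold below is the usual convention, not something stated in the thread.

```python
from math import comb

def abx_p_value(correct, trials, p_chance=0.5):
    """One-sided binomial p-value: the probability of scoring `correct`
    or more out of `trials` by pure guessing."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

def min_correct_for_significance(trials, alpha=0.05):
    """Smallest score whose p-value is at or below alpha, or None if
    no score can reach it with this few trials."""
    for k in range(trials + 1):
        if abx_p_value(k, trials) <= alpha:
            return k
    return None
```

With 16 trials, 12 correct gives p ≈ 0.038, while 11 correct gives p ≈ 0.105. With only 4 trials, even a perfect score (p = 0.0625) cannot reach 0.05, which is one reason short runs rarely settle anything.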
 

BaaM

Member
Joined
Nov 7, 2019
Messages
58
Likes
96
“Long term is better” is a valid hypothesis in that it’s specific and falsifiable, there’s just no evidence yet to back it up. My one attempt obviously didn’t give a significant result. Personally, I think it’s incorrect (which is why I’ve put no more energy into it), but at the very least, I showed a method that can be used by someone who wants to prove it true.
Thank you for experimenting. I hope that your experiment will be reproduced by others in the future.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,197
Likes
3,768
This is why things like e.g. training and fast switching times are important. If they’re to have the best chance of success, subjects should be given a chance to (try to) learn how to identify differences under sighted conditions prior to testing, and should be able to switch at will and near-instantaneously between stimuli during the test.


Except, in the real world, like say this forum, versus the lab, 'subjects' are audio hobbyists, and they *already claim to hear a difference* between A and B.

"Training" is not necessary in this case. They claim to have self-trained. So we are simply testing someone's status quo claim...and their method of determining audio truth.

I would even argue that quick switching isn't necessary, unless that's how the 'subject' has been doing their sighted A/B.

Let the 'subject' repeat their own "A/B" comparison as they normally do...but double blind (and randomize and level match) it. After all, 'even their wives can hear it', so it should be easy.
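
The "randomize and level match" part can be sketched as follows. This assumes levels are compared from RMS voltages measured at the speaker terminals with the same test signal, and uses a 0.1 dB tolerance, which is a commonly quoted rule of thumb rather than a figure from this thread. The function names are illustrative only.

```python
import math
import random

def level_difference_db(v_rms_a, v_rms_b):
    """Level of A relative to B in dB, from RMS voltages measured with
    the same test signal at the same point (e.g. speaker terminals)."""
    return 20 * math.log10(v_rms_a / v_rms_b)

def is_level_matched(v_rms_a, v_rms_b, tolerance_db=0.1):
    # 0.1 dB is an assumed, commonly quoted tolerance; adjust as needed
    return abs(level_difference_db(v_rms_a, v_rms_b)) <= tolerance_db

def blind_schedule(n_trials, seed=None):
    """Random A/B presentation order. For a double-blind test it should be
    held by someone who is neither the listener nor anyone in the room."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]
```

For instance, 2.83 V versus 2.80 V works out to roughly 0.09 dB and passes, while 1.0 V versus 0.9 V is nearly 1 dB apart, easily enough to be heard as a "quality" difference.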

(In the real world, this doesn't happen either. In every instance I've seen, when golden ears agree to be DBT'd, they demand all sorts of conditions -- which they think will favor them -- to be met, beyond their normal A/B listening practice.)
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,197
Likes
3,768
I do not deny the psychoacoustic biases to which we are all subject, but I do not think that blind testing is systematically revealing. For example, I have failed quite a few blind tests on whiskeys, which are undoubtedly very different from each other. It's an experiment worth doing, and a lot of fun, but I don't draw any definite conclusions from it.
If you're interested, I read an interesting paper on slow listening that was shared here: https://www.audiosciencereview.com/forum/index.php?attachments/aes20547-pdf.42956/


BaaM has determined that DBTs aren't 'systematically revealing' based on his whiskey tests. Alert the media -- or in this case, the fields of sensory testing and psychoacoustics.

I don't care if your listening is fast or slow. If the comparison isn't done blind, it's open to huge error.
 

Wes

Major Contributor
Forum Donor
Joined
Dec 5, 2019
Messages
3,843
Likes
3,790
best thing is to either use a separate amp and hide it

or use an integrated amp that has a real purty faceplate you can look at while listening

nice big VU meters also help with the perception of SQ
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,408
Except, in the real world, like say this forum, versus the lab, 'subjects' are audio hobbyists, and they *already claim to hear a difference* between A and B.

"Training" is not necessary in this case. They claim to have self-trained. So we are simply testing someone's status quo claim...and their method of determining audio truth.

I would even argue that quick switching isn't necessary, unless that's how the 'subject' has been doing their sighted A/B.

Let the 'subject' repeat their own "A/B" comparison as they normally do...but double blind (and randomize and level match) it. After all, 'even their wives can hear it', so it should be easy.

(In the real world, this doesn't happen either. In every instance I've seen, when golden ears agree to be DBT'd, they demand all sorts of conditions -- which they think will favor them -- to be met, beyond their normal A/B listening practice.)

Well, it depends whether the purpose of the test is to (a) disprove the opinion of some random person on the internet specifically or (b) tend to disprove the contention that there is an audible difference between the components in general.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,197
Likes
3,768
Given how many reviewers notice immediate and obvious differences, and then claim you need extended listening to be able to discern differences in 'fatigue', etc., the evidence is being danced all around; it's just not favorable to those who rely on obfuscation.

Exactly. The reality of the *hobby* is that we are bombarded by subjective reports of instantaneous 'big difference'...or maybe 'after I let it burn in overnight'. Any subsequent demand for 'long-term' evaluation as part of a blind protocol is simply shameless goalpost-shifting.
 

BaaM

Member
Joined
Nov 7, 2019
Messages
58
Likes
96
BaaM has determined that DBTs aren't 'systematically revealing' based on his whiskey test. Alert the media -- or in this case, the fields of sensory testing and psychoacoustics.

I don't care if your listening is fast or slow. If the comparison isn't done blind, it's open to huge error.
If you could stop trolling, I'm sure you would have a better day!
If you are surrounded by certainties and can't stand skepticism, then you have no business on a scientific forum. On the other hand, I can recommend some sects in which you would enjoy yourself.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,197
Likes
3,768
Well, it depends whether the purpose of the test is to (a) disprove the opinion of some random person on the internet specifically or (b) tend to disprove the contention that there is an audible difference between the components in general.

Absolutely. One is audio 'in the trenches'. The other is research.

What I deplore is the constant convenient appeals to one or the other, by subjectivists, as it suits them.

(Fact is, even if high-quality research showed A can be distinguished audibly from B under controlled, optimized conditions, *you* -- meaning, a random person on the internet -- might not be able to do it.)
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,408
Absolutely. One is audio 'in the trenches'. The other is research.

What I deplore is the constant convenient appeals to one or the other, by subjectivists, as it suits them.

(Fact is, even if high-quality research showed A can be distinguished audibly from B under controlled, optimized conditions, *you* -- meaning, a random person on the internet -- might not be able to do it.)

100% agree.
 

PierreV

Major Contributor
Forum Donor
Joined
Nov 6, 2018
Messages
1,449
Likes
4,818
No, not really.

What it can do is ameliorate effects of the illness that have a large subjective component, i.e., 'feelings' -- pain, stress, fatigue, nausea.

It's a bit wider than that, and there are objective measurements of the phenomenon in terms of neurotransmitter release, endogenous opioid release, fMRI, etc. That being said, it doesn't cure cancer, coronavirus pneumonia, broken femurs and the like...

Going back to audio, is there a pleasure difference if dopamine is released by a wallet-emptying faceplate or by an abstract S/N ratio? Not sure.

In both cases, it seems to be addictive (as expected from a dopamine release): what's the difference between a guy who feels the urge to "upgrade" from a 50 kg amplifier with a nice story to a 100 kg amplifier with an even nicer story and a guy who feels the urge to abandon a 110 dB DAC for a 120 dB one?

The objective economic difference is indisputable, though.
 