
Double Blind Testing FAQ Development

solderdude

Major Contributor
Joined
Jul 21, 2018
Messages
8,702
Likes
18,322
Location
The Neverlands
#41
But is that important when the only objective is to determine whether, under equal conditions, devices can be told apart?

I see this more as a personal 'audible limits' test with specific test signals. I would say that learning your own audibility thresholds is very important, so one can experience what is and is not audible to them.
For this there are plenty of audibility tests online.
Maybe it should be defined as one of the prerequisites before one starts testing, and/or one should find audibility tests to learn what to listen for when trying to detect HD, roll-off, phase issues, or level differences, or which frequency band has what influence, at which Q, at which + or - level.

Is the idea to also describe audibility levels, how to test them, and where to find certain tests?

That test will only reveal a single aspect of the entire suite of things that can create an audible difference, and would that single aspect matter to the individual who wants to know if there are audible differences?

Is it going to be a tutorial for dummies? For experienced folks? For both?
 

Spkrdctr

Active Member
Joined
Apr 22, 2021
Messages
143
Likes
115
#42
Maximization of Detection:

"It is important that the listener be given every chance to find audible differences that exist between devices. As such, prior training to become familiar with the test material/protocol is highly encouraged. As is feedback during the testing as to whether a detectable difference has been found."
I agree with most of this stuff, but this one I can't let go. First, I can tell you the test needs to be "can you hear ANY difference?" If that is the case, why would there need to be feedback during the test? Level matching can probably be within 0.3 dB; the huge issue is not matching to 0.1 dB or all the extreme scientific stuff. Most tests are ruined by "tells", and most of the time these tells are subconscious. It comes down to having no visible or auditory noise at all during the test and switching. The tells are so slight that people do not know they are happening, and then you get an 8/10 pick. Heck, even a 7/10 pick would make me re-examine the testing protocol. The beauty of a DBT is that neither the person being tested nor the person running the test knows which unit is being tested; it is harder to have tells if you don't know what is being tested. 99% of the time the people involved want to talk to someone between listening tests, and a good test is a lonely test. So, I'm against any feedback during the test.
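As an aside, the 8/10 and 7/10 figures above can be put on a statistical footing with a one-sided binomial calculation (a stdlib-only sketch; `p_at_least` is just an illustrative name):

```python
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """One-sided probability of k or more correct picks out of n trials
    if the listener is purely guessing (chance = p per trial)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of 8/10 or better by pure guessing: about 5.5%
print(round(p_at_least(8, 10), 4))   # 0.0547
# Chance of 7/10 or better: about 17.2% -- well short of significance
print(round(p_at_least(7, 10), 4))   # 0.1719
```

So an 8/10 run sits right around the conventional 5% threshold, and 7/10 is quite likely under pure guessing, which is why either result warrants a second look at the protocol (or more trials).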
 

pozz

Data Ordinator
Forum Donor
Editor
Joined
May 21, 2019
Messages
3,426
Likes
5,386
#43
I agree with most of this stuff, but this one I can't let go. First, I can tell you the test needs to be "can you hear ANY difference?" If that is the case, why would there need to be feedback during the test? Level matching can probably be within 0.3 dB; the huge issue is not matching to 0.1 dB or all the extreme scientific stuff. Most tests are ruined by "tells", and most of the time these tells are subconscious. It comes down to having no visible or auditory noise at all during the test and switching. The tells are so slight that people do not know they are happening, and then you get an 8/10 pick. Heck, even a 7/10 pick would make me re-examine the testing protocol. The beauty of a DBT is that neither the person being tested nor the person running the test knows which unit is being tested; it is harder to have tells if you don't know what is being tested. 99% of the time the people involved want to talk to someone between listening tests, and a good test is a lonely test. So, I'm against any feedback during the test.
The feedback is for training before the test. The listener has to be familiar with the errors or differences they are meant to detect.

Your comments about relaxed level matching are incorrect. Broadband low level differences are detectable, and lead to a lot of talk about differences in the subjective listening experience when comparing devices.
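To put the 0.3 dB vs. 0.1 dB discussion in perspective, a broadband level offset is easy to quantify with an RMS comparison (a stdlib-only sketch; `level_difference_db` is an illustrative name, not an existing library function):

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """Broadband level difference between two signals, in dB."""
    return 20 * math.log10(rms(a) / rms(b))

# A gain error of only 3.5% already blows past the 0.1 dB target:
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
louder = [s * 1.035 for s in tone]
print(round(level_difference_db(louder, tone), 3))  # 0.299
```

In other words, a 0.3 dB mismatch corresponds to only a few percent of gain error, small enough to slip past casual setup but large enough to be the thing the listener actually detects.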
 

Spkrdctr

Active Member
Joined
Apr 22, 2021
Messages
143
Likes
115
#44
Ok. Some things required for a valid ABX or ABC/hr test. (The "same/different" test is no better, btw, in practice.)

1) Both signals must be at the same level, to within 0.1 dB
2) Both signals must be precisely time aligned. 1 sample at 44100 is barely close enough.
3) The "blinding" must be either double (experimenter and subject) or computer-administered with carefully written programming so that you don't get cues from the computer.
4) You must be able to loop and switch at will, with minimal (10 millisecond) delay.
5) Switching must be clickless and continuous. For digital signals, windowing must be used. For analog signals, it's harder, but you have to find a way that is clickless. Looping must avoid clicks and crunches at the loop point. Again windowing.
6) Both negative (A and B the same) controls, and positive (A and B SHOULD be detectable) controls are absolutely required.
7) There is no time limit.
8) 10 trials is about it for fatigue without a rest period of at least as long as the 10 trials took.
9) Training, first with easy signals, then with harder signals, then with the trial signals, is necessary. During training, feedback must be supplied (right/wrong)
10) It is often useful to use the full ABC/hr paradigm, where you rate the "different" signal on the ITU difference scale.

Yes, this is a pain in the behind.
JJ, I wonder if #4 is needed when the test is to see whether someone who states they can EASILY hear a difference in "whatever" product can tell if there is any difference at all. Also #9, for the same type of test: why train if the supposed difference is huge, where people are sure that anyone, even a half-deaf person, can hear it? For example, "I can tell my Pass Labs amp instantly versus that $1000 Denon receiver any day, any time." I say just set the other parameters you specified as close as you can and let them have at it. If 1, 2, 3, 7, and 8 are met for the type of test I'm talking about, would it be terribly wrong? I ask because I have never seen anyone pass the kind of test I'm describing, and it is a lot easier for the average guy to run. I would like your opinion: since I have never seen anyone pass such a test even when it was not set up as strictly as you suggest, would it do any harm?

I look at a failure under the less stringent testing protocol as an even bigger failure, since the listener was looking for ANY difference at all.
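For what it's worth, the computer-administered blinding from point 3 of that list can be sketched like this (a minimal illustration; `run_abx` and the `listener` callback are placeholders for the real playback-and-respond step, not any existing tool):

```python
import random

def run_abx(n_trials, listener, rng=None):
    """Minimal computer-administered ABX loop: for each trial, X is
    randomly assigned to A or B, and neither subject nor experimenter
    sees the assignment until scoring -- no human to leak tells."""
    rng = rng or random.Random()
    correct = 0
    for _ in range(n_trials):
        x_is_a = rng.random() < 0.5   # hidden assignment of X
        answer = listener()           # stands in for playing X and
                                      # collecting the "A"/"B" response
        if answer == ("A" if x_is_a else "B"):
            correct += 1
    return correct

# A subject who cannot hear a difference scores at chance:
print(run_abx(1000, lambda: "A", random.Random(0)))  # near 500
```

The point of the sketch is only that the assignment of X lives inside the program, so the "lonely test" condition above is satisfied by construction.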
 

Spkrdctr

Active Member
Joined
Apr 22, 2021
Messages
143
Likes
115
#45
The feedback is for training before the test. The listener has to be familiar with the errors or differences they are meant to detect.

Your comments about relaxed level matching are incorrect. Broadband low level differences are detectable, and lead to a lot of talk about differences in the subjective listening experience when comparing devices.
Ahhh, I think I have it now. You guys are talking about testing to see if anyone can hear a certain supposedly audible effect. I always think in much bigger terms, as I haven't seen anyone pass even a simple "is there any difference at all between them?" type of test. Also, this is when listening to music, not a 2 to 5 second clip repeated over and over; I see that type of test as having no bearing on home audio. If you are not playing music because you already know it will swamp what you are looking for, then who cares? I mean, we can advance the technology, but it is already below our hearing threshold if no one can pass a music test. Of course they can pick whatever music they want. The tests you guys want are probably good for a manufacturer to do for their technology progression, but the homeowner couldn't care less if it is not audible in a realistic scenario. I was thinking of tests to debunk snake oil, which is the most common reason I can think of for doing a test. Shut those liars down. So now that I have grasped what was obvious to everyone else, I will slink away and let you guys further the cause.

I'm slinking now..........
 

SIY

Major Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
5,988
Likes
12,682
Location
Phoenix, AZ
#46
JJ, I wonder if #4 is needed when the test is to see whether someone who states they can EASILY hear a difference in "whatever" product can tell if there is any difference at all. Also #9, for the same type of test: why train if the supposed difference is huge, where people are sure that anyone, even a half-deaf person, can hear it? For example, "I can tell my Pass Labs amp instantly versus that $1000 Denon receiver any day, any time." I say just set the other parameters you specified as close as you can and let them have at it. If 1, 2, 3, 7, and 8 are met for the type of test I'm talking about, would it be terribly wrong? I ask because I have never seen anyone pass the kind of test I'm describing, and it is a lot easier for the average guy to run. I would like your opinion: since I have never seen anyone pass such a test even when it was not set up as strictly as you suggest, would it do any harm?

I look at a failure under the less stringent testing protocol as an even bigger failure, since the listener was looking for ANY difference at all.
Well, this is back to my point about defining the question to be answered. Can I hear the difference between A and B? Can a specific person hear the difference between A and B? Can average listeners hear the difference between A and B? Can anyone hear the difference between A and B?

These sorts of questions are quite distinct, as are the experimental designs to answer them. Unfortunately, they’re often conflated.
 

pozz

Data Ordinator
Forum Donor
Editor
Joined
May 21, 2019
Messages
3,426
Likes
5,386
#48
I was thinking of tests to debunk snake oil, which is the most common reason I can think of for doing a test.
If you relax the protocols, the test will be less valid and the results questionable. The easiest way to deal with random claims about device differences is to create the right sort of setup, where 1) the listener has the best chance to hear differences and 2) the results are clearest.
 

preload

Addicted to Fun and Learning
Forum Donor
Joined
May 19, 2020
Messages
680
Likes
833
Location
California
#50
Hi all, in a now-closed thread (shut down for other reasons), I suggested that it might be nice to develop a community FAQ on standards for double blind testing of components. That is, a guide that lays out some basic principles (e.g., same signal path EXCEPT for the two components being compared, both rater and "switcher" blind to which component is currently active, etc.) as a FAQ that we can point to. It might include numbers of trials, appropriate statistical tests, etc. I thought I would start this thread to garner suggestions. One hopes (maybe not?) that we can converge on a set of mutually agreed conditions that can then be synthesized into a FAQ. Amir did point me to the "bible of blind testing", ITU BS.1116: https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1116-1-199710-S!!PDF-E.pdf However, it is pretty cumbersome and runs 20+ pages. I am hoping we could synthesize the critical principles into a page or two.

So open to principles to consider including in a FAQ.
1) Who is the intended audience?
2) What is the goal of this FAQ?

I'm not sure how this can be written without first establishing #1 and #2.
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
1,318
Likes
2,745
Location
Berlin, Germany
#51
5) Switching must be clickless and continuous. For digital signals, windowing must be used. For analog signals, it's harder, but you have to find a way that is clickless. Looping must avoid clicks and crunches at the loop point. Again windowing.
Do you mean an actual crossfade for switching, or a fade-out followed by a fade-in, which would not be "continuous"?
Crossfade will produce tells when we have a polarity flip or different phase response between DUTs.
It's sort of a dilemma, because I've found that crossfade is the best method to avoid distraction from the auditory stream being interrupted (basically a "reset") by a gap.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
1,053
Likes
2,014
Location
My dining room.
#52
Do you mean an actual crossfade for switching, or a fade-out followed by a fade-in, which would not be "continuous"?
I mean an actual crossfade that "adds to 1" is best.

Going to silence, or adding a click disturbs short-term loudness memory, which is the most sensitive situation from the physiological side.

Crossfade will produce tells when we have a polarity flip or different phase response between DUTs.
Oh that will stand out with a crossfade, for sure. Everything has to be the same polarity, obviously.
It's sort of a dilemma, because I've found that crossfade is the best method to avoid distraction from the auditory stream being interrupted (basically a "reset") by a gap.
Yes. I'd say "loudness memory disrupted" but that isn't far off your description.
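The "adds to 1" crossfade described above can be sketched as follows (an amplitude-complementary fade over the overlap region; assumes both sources are already sample-aligned and polarity-matched, per the posts above):

```python
def crossfade(tail_a, head_b):
    """Amplitude-complementary ('adds to 1') crossfade over an overlap
    region: the two weights sum to exactly 1 at every sample, so two
    identical, polarity-matched inputs pass through unchanged -- no
    level dip and no click to disturb short-term loudness memory."""
    n = len(tail_a)
    assert len(head_b) == n
    return [((n - i) * a + i * b) / n
            for i, (a, b) in enumerate(zip(tail_a, head_b))]

# Identical inputs are untouched -- the fade itself is inaudible:
x = [0.5, -0.25, 1.0, 0.0]
print(crossfade(x, x))  # [0.5, -0.25, 1.0, 0.0]
```

A raised-cosine window with the same sum-to-1 property works just as well; the key constraint, per the post above, is that the weights are complementary in amplitude, not in power. This is also exactly why a polarity flip between DUTs produces a tell: the weighted sum partially cancels during the fade.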
 

solderdude

Major Contributor
Joined
Jul 21, 2018
Messages
8,702
Likes
18,322
Location
The Neverlands
#53
This does complicate things when comparing DACs. It would have to be done with a fade-out and a fade-in with a short 'gap' between them?
The delay between different DACs would give them away.
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
1,318
Likes
2,745
Location
Berlin, Germany
#54
As for the crossfade, maybe one could use a dual crossfade, DUT A --> pink noise at an adequate level based on the RMS level present at that moment --> DUT B, to avoid the "loudness memory disruption"? Complicates things, of course.

For real-time comparison of DACs, one could use 4-channel playback feeding both DACs (which must be synchronized, of course) simultaneously, so that the crossfade (as well as precision level match and polarity) can be applied to the input signals, and the analog output signals simply mixed together.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
1,053
Likes
2,014
Location
My dining room.
#55
This does complicate things when comparing DACs. It would have to be done with a fade-out and a fade-in with a short 'gap' between them?
The delay between different DACs would give them away.
You ***MUST*** back out the delay, precisely, and make sure both DACs have the same clock. If not, you (*&(*& well will hear "differences".

No clicks, no dropouts, for most sensitive test. Yes, this is hard.
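One common way to back out a fixed delay between two captures is a cross-correlation peak search, sketched here (a brute-force illustration with a hypothetical `estimate_delay` helper; it assumes the offset is a whole number of samples, and fractional offsets would need interpolation on top of this):

```python
def estimate_delay(ref, dut, max_lag):
    """Return the integer lag (in samples) that best aligns dut to ref,
    found as the peak of the cross-correlation over +/- max_lag."""
    def corr(lag):
        # Sum of products over the overlapping region at this lag.
        return sum(ref[i] * dut[i + lag] for i in range(len(ref))
                   if 0 <= i + lag < len(dut))
    return max(range(-max_lag, max_lag + 1), key=corr)

# A copy of the reference shifted by 3 samples is detected as lag 3:
ref = [0.0, 0.2, 0.9, -0.4, 0.1, -0.7, 0.3, 0.0]
dut = [0.0, 0.0, 0.0] + ref[:-3]   # same signal, 3 samples late
print(estimate_delay(ref, dut, 5))  # 3
```

The estimated lag can then be removed before the comparison so that the switchover itself carries no timing tell. With two free-running DAC clocks the lag drifts over time, which is why the post above also insists on a common clock.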
 
Joined
Nov 29, 2020
Messages
438
Likes
379
#56
The use of a positive control is not necessarily warranted or needed in audio, as we are not often testing absolutes; we are testing "claims". Claims are much different. A claim already defines a test condition, i.e. "it is obvious in any half-decent system", or "it is obvious in my system". We are not often testing absolutes, i.e. that this cable versus that cable will never make a difference in any system. We are determining, say, whether a cable makes a difference in a given reference system, as per the claim. Given that the claim revolves around performance in a reference system, the control is already inherent in the experiment definition, and hence no positive control to test for sensitivity is necessary.
 
Joined
Nov 29, 2020
Messages
438
Likes
379
#57
You have several options. A level difference of 0.2 dB or so should show up. Small frequency shaping, added noise floor, there are lots of things you could do.

Of course they are not the same as a fantasy do-nothing, but there's nothing to be done about that. Show that the test has sensitivity.
Nope, can't do that. This does not work. The whole point of having a positive control in an audio test is to determine whether the system is such that it will reveal the difference between two items. If you don't know what the difference specifically is, then how can you ever test for masking? Hint: you cannot, so it is usually pointless to try. That is why audio tests either are very "scientific" and test for one very tightly controlled parameter, for which you can run a positive control to check for system masking, or are very specific to a given test system and condition, in which case there is no need for a positive control, as the whole point of the test IS testing whether the variance of the item being tested exceeds the masking capability of the rest of the system.
 
Joined
Nov 29, 2020
Messages
438
Likes
379
#58
You ***MUST*** back out the delay, precisely, and make sure both DACs have the same clock. If not, you (*&(*& well will hear "differences".

No clicks, no dropouts, for most sensitive test. Yes, this is hard.
What is critical is the transition between the two, but that can also be a signal-silence-signal transition, which can usually be done much more cleanly than an instant switchover.
 
Joined
Nov 29, 2020
Messages
438
Likes
379
#59
The feedback is for training before the test. The listener has to be familiar with the errors or differences they are meant to detect.

Your comments about relaxed level matching are incorrect. Broadband low level differences are detectable, and lead to a lot of talk about differences in the subjective listening experience when comparing devices.
Wrong. How can you be familiar with the error when you don't know what it is, or whether it even exists? I can train people to detect readily known errors, i.e. the "sound" of MP3, frequency anomalies, etc.; I can essentially give them a toolset. However, I don't always know that there is a difference, or even what it is. I can let them listen to the two items, but familiarity implies a difference exists when it may not. That obviously does not work.
 
Joined
Nov 29, 2020
Messages
438
Likes
379
#60
If you relax the protocols the test will be less valid and the results questionable. Easiest way to deal with the random claims about device differences is to create the right sort of setup where 1) the listener has the best chance to hear differences and 2) the results are clearest.

Nope. All you have to do is recreate the conditions under which the specific claim was made and introduce a blind test protocol that removes every variable except the one specific to the claim. No more, no less. Their statement defines much of the test; one just needs to wrap it in a blind protocol with variable control (i.e. level matching). No need to complicate a ham sandwich.
 