
Relevance of Blind Testing

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,511
Likes
25,350
Location
Alfred, NY
I would argue that these are not independent events as long as the test setup hasn’t changed. Would you consider a one minute break between trial groups something that would make this two events? Five minutes? An hour?
 

TSB

Active Member
Joined
Oct 13, 2020
Messages
189
Likes
294
Location
NL
Great example. It intuitively explains why 10 of 14 is easier to get by luck than 5 of 7 twice. If you pass 5 of 7 twice, your aggregate score can't be lower than 10 of 14. But you can score 10 of 14 without passing 5 of 7 twice (you could get 4 of 7 then 6 of 7, or 3 of 7 then 7 of 7).
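A small Python sketch, not something posted in the thread, that makes this concrete under the null hypothesis (every guess a fair coin flip) and the 5-of-7 pass mark discussed here. It counts the 2^14 equally likely 14-guess sequences by how the correct answers split between the two 7-trial halves; `split_weights` is a hypothetical helper name.

```python
from math import comb

def split_weights(total_min, n_half=7):
    """Map each (first-half score, second-half score) with a combined score of at least
    total_min to the number of equally likely guess sequences producing it."""
    return {
        (a, b): comb(n_half, a) * comb(n_half, b)
        for a in range(n_half + 1)
        for b in range(n_half + 1)
        if a + b >= total_min
    }

splits = split_weights(10)
all_10_plus = sum(splits.values())  # every route to 10 or more of 14
both_pass = sum(w for (a, b), w in splits.items() if a >= 5 and b >= 5)  # 5+ in each half

print(all_10_plus, both_pass, 2 ** 14)
print(f"P(>=10 of 14)      = {all_10_plus / 2 ** 14:.4f}")
print(f"P(>=5 of 7, twice) = {both_pass / 2 ** 14:.4f}")
```

The first count (1471 of 16384, about 9.0%) includes splits such as (4, 6) and (3, 7); requiring 5 of 7 in each half leaves 841 of 16384, about 5.1%.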

Yet this suggests that, when computing the overall confidence of a series of tests, we should use Bayes' rule, because we know the subject passed each of the shorter tests. He didn't fail some and then make it up by doing better on others.
If you're interested in the probability of passing 3 independent tests of 7 guesses each, you should be multiplying together P(X >= 5), not P(X == 5).
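To see the difference between the two products, here is a minimal Python sketch assuming the 5-of-7 pass mark and fair-coin guessing; `p_exactly` and `p_at_least` are illustrative names. Exact fractions avoid rounding in the comparison.

```python
from fractions import Fraction
from math import comb

def p_exactly(k, n):
    """P(exactly k correct out of n) when every guess is a fair coin flip."""
    return Fraction(comb(n, k), 2 ** n)

def p_at_least(k, n):
    """P(k or more correct out of n) under pure guessing (binomial upper tail)."""
    return sum(p_exactly(i, n) for i in range(k, n + 1))

tail = p_at_least(5, 7)   # 29/128, about 22.7%
point = p_exactly(5, 7)   # 21/128, about 16.4%

print(f"three passes (>=5 of 7 each): {float(tail ** 3):.4%}")
print(f"exactly 5 of 7, three times:  {float(point ** 3):.4%}")
```

Multiplying the point probabilities counts only the single pattern 5, 5, 5 and misses passes such as 6, 5, 7, so it understates the chance of passing all three tests by luck.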
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,487
Likes
4,113
Location
Pacific Northwest
I would argue that these are not independent events as long as the test setup hasn’t changed. Would you consider a one minute break between trial groups something that would make this two events? Five minutes? An hour?
I'm not so sure... When flipping a coin, must you wait in between flips or change the setup in order for the flips to be independent events? Of course not.
I think the concept of independence is critical to this discussion. But I don't think independence relies on a time interval or changing the test setup.
 

TSB

Active Member
Joined
Oct 13, 2020
Messages
189
Likes
294
Location
NL
I would argue that these are not independent events as long as the test setup hasn’t changed. Would you consider a one minute break between trial groups something that would make this two events? Five minutes? An hour?
If there is no audible difference, each guess is a coin flip. That's what we are calculating here: "how likely is this result by chance, assuming the null hypothesis (no audible difference) is true" .

So yes, they are independent events.
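Under that null hypothesis the number of correct answers is binomial with p = 0.5, so "chance of doing at least this well by guessing" is an upper-tail sum. A minimal Python sketch (the function name is hypothetical) that reproduces the 22.7% and 3.9% figures quoted in this thread, assuming independent trials:

```python
from math import comb

def chance_by_guessing(correct, trials):
    """P(score >= correct) if every trial is an independent 50/50 guess (the null hypothesis)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

for correct, trials in [(5, 7), (10, 14), (15, 21)]:
    p = chance_by_guessing(correct, trials)
    print(f"{correct:>2} of {trials}: {p:.1%} by guessing, {1 - p:.1%} 'confidence'")
```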
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,511
Likes
25,350
Location
Alfred, NY
If there is no audible difference, each guess is a coin flip. That's what we are calculating here: "how likely is this result by chance, assuming the null hypothesis (no audible difference) is true" .

So yes, they are independent events.
In that sense, yes. In the sense of the original question (essentially, are these two separate experiments), no.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,771
Likes
37,636
Question for the stats folks: consider someone who ABX tests and gets 5 of 7. This is 77.3% confidence (22.7% chance to do that well by guessing). He does this in 3 separate/independent tests, on different days. We could aggregate this as 3*5 = 15 of 3*7 = 21 and compute 15 of 21 is 96.1% confidence (3.9% chance to do that well by guessing).

But we could tackle this differently: There are 3 different tests, each independent and having 23% chance to pass by guessing. If you pass all three, the probability you're guessing should be .23 * .23 * .23 = 1.2%. This would be 98.8% confident.

These numbers are different so they can't both be right. Which is correct?
Coming back to this one. Just ran a simple minded spreadsheet.

1140 sets of 7 trials, or equivalently 380 sets of 21.
Did this twice.

First time 15 of the sets of 21 had 15 of 21 correct. 3.95%
Second time 17 of the sets of 21 had 15 of 21 correct. 4.47%

First time, only 1 of the 15 sets was three consecutive 5 of 7 results: 0.26%. This is the only result a bit outside of the predicted range.

The second time, 5 of the 17 sets were three consecutive 5 of 7 results, and two of those were back to back, meaning 6 consecutive 5 of 7 results: 1.32%.

So as I've said, both predictions are correct, because they aren't the same prediction. 15 of 21 is one prediction, and 15 of 21 made up of three 5 of 7 results is more specific and less common. There is really no disagreement.
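A rough Python re-creation of this kind of spreadsheet run, assuming every answer is an independent coin flip and counting a "pass" on a 7-trial block as 5 or more correct. The spreadsheet's exact rules aren't given, so `simulate` and its counting criteria are assumptions. A 380-set run bounces around just as the two spreadsheet runs did; a larger run settles near the exact rates of about 3.9% and 1.2%.

```python
import random

def simulate(n_sets, seed=None):
    """Run n_sets of three 7-trial blocks with every answer a coin flip.
    Returns (sets with >= 15 of 21, sets with >= 5 of 7 in all three blocks)."""
    rng = random.Random(seed)
    total_pass = block_pass = 0
    for _ in range(n_sets):
        # Correct answers in each of three 7-trial blocks; each answer right with probability 0.5
        blocks = [sum(rng.random() < 0.5 for _ in range(7)) for _ in range(3)]
        if sum(blocks) >= 15:
            total_pass += 1
        if all(b >= 5 for b in blocks):  # assumption: a block "pass" means 5 or more of 7
            block_pass += 1
    return total_pass, block_pass

# 380 sets mirrors the post; the larger run shows the long-run rates
for n in (380, 100_000):
    total_pass, block_pass = simulate(n, seed=1)
    print(f"{n:>7} sets: >=15/21 in {total_pass / n:.2%}, three 5+/7 blocks in {block_pass / n:.2%}")
```

Because three blocks of 5 or more always add up to at least 15, the second count can never exceed the first, which is the subset relationship described in the post.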
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,408
I'm not so sure... When flipping a coin, must you wait in between flips or change the setup in order for the flips to be independent events? Of course not.
I think the concept of independence is critical to this discussion. But I don't think independence relies on a time interval or changing the test setup.

Flipping coins is a good example.

After 21 flips, the probability that >=15 flips are tails is not the same as (indeed it’s higher than) the probability of flipping >=5/7 tails three consecutive times.

You could think of it this way: In the latter case, we’ve imposed an additional restriction on the distribution (loosely speaking, a restriction as to the order of outcomes).

This additional restriction lowers the probability of success.
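One way to see the order restriction exactly, without simulation, is to group all 2^21 flip sequences by the number of tails in each block of 7. A Python sketch under that framing (variable names are illustrative):

```python
from itertools import product
from math import comb, prod

BLOCK = 7
ways = [comb(BLOCK, k) for k in range(BLOCK + 1)]  # 7-flip sequences with exactly k tails

at_least_15 = 0    # 21-flip sequences with 15 or more tails in total
every_block_5 = 0  # ...that also have 5 or more tails in each block of 7
for tails in product(range(BLOCK + 1), repeat=3):  # tails counted in blocks 1, 2, 3
    n_seq = prod(ways[t] for t in tails)
    if sum(tails) >= 15:
        at_least_15 += n_seq
        if min(tails) >= 5:
            every_block_5 += n_seq

print(at_least_15, every_block_5, 2 ** 21)
```

Of the 2,097,152 sequences, 82,160 have at least 15 tails (about 3.9%), but only 24,389 (about 1.2%) also have at least 5 tails in every block.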
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,487
Likes
4,113
Location
Pacific Northwest
... So as I've said, both predictions are correct, because they aren't the same prediction. 15 of 21 is one prediction, and 15 of 21 made up of three 5 of 7 results is more specific and less common. There is really no disagreement.

... This additional restriction lowers the probability of success.
Right. But this restriction represents our knowledge of the actual test results. Suppose we know that somebody scored 5 of 7 on 2 separate tests. Then using 10 of 14 includes other outcomes (like 4 and 6, 3 and 7) that we know did not happen.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,771
Likes
37,636
Right. But this restriction represents our knowledge of the actual test results. Suppose we know that somebody scored 5 of 7 on 2 separate tests. Then using 10 of 14 includes other outcomes (like 4 and 6, 3 and 7) that we know did not happen.
Yes so knowing the results, 5 of 7 on two separate tests, is a subset of all the possible ways to get 10 of 14.
 

TSB

Active Member
Joined
Oct 13, 2020
Messages
189
Likes
294
Location
NL
Right. But this restriction represents our knowledge of the actual test results. Suppose we know that somebody scored 5 of 7 on 2 separate tests. Then using 10 of 14 includes other outcomes (like 4 and 6, 3 and 7) that we know did not happen.
I just used a random number generator to get the number 681354325677. The chance of getting this exact number is 10^-12 and we know nothing else happened, so I have witnessed a miracle. :)
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,408
Right. But this restriction represents our knowledge of the actual test results. Suppose we know that somebody scored 5 of 7 on 2 separate tests. Then using 10 of 14 includes other outcomes (like 4 and 6, 3 and 7) that we know did not happen.

Yes, but I don’t see the inconsistency there. If you begin a 21-consecutive-trial test, and after the first 7 trials you have 5 correct responses, then your chances of getting >=15 correct responses from the total 21 trials have improved vs what they were at the beginning of the test.
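The improvement can be quantified: after 5 correct in the first 7, reaching 15 of 21 only needs 10 of the remaining 14. A minimal Python sketch, assuming fair-coin guessing throughout (`p_at_least` is an illustrative name):

```python
from fractions import Fraction
from math import comb

def p_at_least(k, n):
    """P(k or more correct in n fair-coin guesses)."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)

target, total = 15, 21
before = p_at_least(target, total)                # at the start of the 21-trial test
after_5_of_7 = p_at_least(target - 5, total - 7)  # need 10 or more of the remaining 14

print(f"before any trials:  {float(before):.1%}")
print(f"after 5 of first 7: {float(after_5_of_7):.1%}")
```

This gives roughly 3.9% before the test and roughly 9.0% after a 5-of-7 start.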

(This has been a really enjoyable little thought experiment btw:))
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,408
@MRC01 actually, I think I've worked out why we're at cross purposes.

You wrote:
There are 3 different tests, each independent and having 23% chance to pass by guessing. If you pass all three, the probability you're guessing should be .23 * .23 * .23 = 1.2%.

Note the words I've highlighted in bold. The probability that you passed each of the three tests by guessing is not the same thing (qualitatively or quantitatively) as the probability that you got 15 out of 21 in total and were guessing.

EDIT: realise the diagram had an error in it!
 

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
4,250
Likes
11,556
Location
Land O’ Lakes, FL
@MZKM actually, I think I've worked out why we're at cross purposes.

You wrote:


Note the words I've highlighted in bold. The probability that you passed each of the three tests by guessing is not the same thing (qualitatively or quantitatively) as the probability that you got 15 out of 21 in total and were guessing.

Hopefully this Venn diagram explains the subtle differences fairly well:

[Attachment: Venn diagram]

EDIT: realise the shape and scale are a bit weird, but was working it out in MS paint as I went along, lol...
Wrong user.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,487
Likes
4,113
Location
Pacific Northwest
@MRC01 actually, I think I've worked out why we're at cross purposes. ... ...
Yep, of course it's not. They're clearly 2 different things. One is more specific than the other, and thus lower probability / higher confidence.
The question I'm struggling with is: which of these 2 approaches represents the correct probabilities?
I can think of rational explanations for each, yet I can also think of conundrums that each leads to.
 

ahofer

Master Contributor
Forum Donor
Joined
Jun 3, 2019
Messages
5,045
Likes
9,152
Location
New York City
No, not really.

What it can do is ameliorate effects of the illness that have a large subjective component, i.e., 'feelings' -- pain, stress, fatigue, nausea.
It can also make you sick.
[Attached image]
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,487
Likes
4,113
Location
Pacific Northwest
Yeah, I read that study recently. They called it the "nocebo" effect, opposite of "placebo" effect. What you believe affects how the drug interacts with you (or at least how you perceive that interaction). This can work in your favor (if you think it will help) or against you (if you think it will make you sick).
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,408
Yep, of course it's not. They're clearly 2 different things. One is more specific than the other, and thus lower probability / higher confidence.
The question I'm struggling with is: which of these 2 approaches represents the correct probabilities?
I can think of rational explanations for each, yet I can also think of conundrums that each leads to.

Ok, realised the diagram had an error in it. I'm gonna think about this again and get back to you... Was obviously a bit ambitious to try to include all possible outcomes in the diagram, lol.

But I still just don't see the conundrum. The probability of getting >=15/21 while guessing is not and should not be the same as the probability of getting 5/7 three consecutive times while guessing.

Maybe think about it like this?

The probability of "passing" three consecutive 7-trial tests (getting >=5/7) while guessing is an intersection (hence multiplicative) which comprises:

P[getting >=5/7 in the first test while guessing]
AND P[getting >=5/7 in the second test while guessing]
AND P[getting >=5/7 in the third test while guessing]

~= 0.23 * 0.23 * 0.23
~= 0.012

The probability of passing a 21-trial test (i.e. getting >=15/21) is a union (hence additive) which comprises:

P[passing three consecutive 7-trial tests while guessing]
OR P[failing at least one 7-trial test but nevertheless giving >=15 correct responses in total while guessing]

~= 0.012 + (0.039 - 0.012)
~= 0.039

Or alternatively, passing 3 consecutive 7-trial tests while guessing is a subset of giving >=15 correct responses out of 21 while guessing.
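The intersection/union breakdown can be checked with exact fractions. The sketch below (Python, an illustration rather than anything from the post, assuming fair-coin guessing) splits the 15-of-21 event into the two disjoint pieces listed above, then asserts that the first piece is the product of three independent passes and that the two pieces add up to the whole:

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

def p_at_least(k, n):
    """P(k or more correct in n fair-coin guesses), as an exact fraction."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)

ways = [comb(7, s) for s in range(8)]  # 7-trial sequences with exactly s correct
p_all_pass = Fraction(0)     # >= 5 of 7 in all three tests
p_fail_but_15 = Fraction(0)  # at least one test below 5 of 7, yet >= 15 of 21 overall
for scores in product(range(8), repeat=3):
    p = Fraction(prod(ways[s] for s in scores), 2 ** 21)
    if min(scores) >= 5:
        p_all_pass += p
    elif sum(scores) >= 15:
        p_fail_but_15 += p

assert p_all_pass == p_at_least(5, 7) ** 3               # intersection of independent tests: multiply
assert p_all_pass + p_fail_but_15 == p_at_least(15, 21)  # disjoint union: add
print(f"{float(p_all_pass):.4f} + {float(p_fail_but_15):.4f} = {float(p_all_pass + p_fail_but_15):.4f}")
```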

Let me make one last argument. At the moment, we've arbitrarily broken this 21-trial test into 3 x 7-trial tests. Why not keep going, and break it down into 21 x 1-trial tests? What are the consequences of that according to the logic you're applying to the 3 x 7-trial tests?

If you agree with me that treating any n-trial test as n separate tests is the wrong approach, how can you justify in this case choosing 7 as the relevant number of trials in each sub-test? It seems completely arbitrary to me. And then impossible to avoid falling into a problem similar to Zeno's Paradox...
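A quick way to see the arbitrariness is to carve the same 21 trials into equal blocks of different sizes and require the same pass rate in each block. The threshold rule below (round 15/21 of the block size up) is a hypothetical choice made only for this illustration, as is the helper name `p_pass_block`:

```python
from math import comb

def p_pass_block(threshold, size):
    """P(threshold or more correct out of `size` fair-coin guesses)."""
    return sum(comb(size, k) for k in range(threshold, size + 1)) / 2 ** size

TOTAL = 21
for size in (1, 3, 7, 21):               # block sizes that divide 21 evenly
    threshold = -(-15 * size // TOTAL)   # hypothetical rule: ceil(15/21 * size), integer math
    n_blocks = TOTAL // size
    p_all = p_pass_block(threshold, size) ** n_blocks
    print(f"{n_blocks:>2} x {size:>2}-trial tests, pass at {threshold}/{size}: "
          f"P(pass all by guessing) = {p_all:.3g}")
```

With blocks of 1 or 3 trials, the rule demands a perfect score in every block, so "passing every sub-test" collapses to 21 of 21 (about 4.8e-7), while one 21-trial block gives about 0.039.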
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,771
Likes
37,636
Yep, of course it's not. They're clearly 2 different things. One is more specific than the other, and thus lower probability / higher confidence.
The question I'm struggling with is: which of these 2 approaches represents the correct probabilities?
I can think of rational explanations for each, yet I can also think of conundrums that each leads to.
Alright I think we are in agreement now. But you still have questions.

Let us try this thought experiment.

I tell you I am running some DBT, and so as not to fatigue listeners they'll only do 7 trials. Then after a rest period they'll do 7 more and after another break 7 more. Each subject will do 21 trials total.

I then ask you to predict how many would score 15 of 21 by random chance alone. Let us say I am doing DBTs for USB cables. You'll give me your answer, which was what... 3.9%. You'll be about right.

Now if I ask you ahead of time how many you think will get 15 of 21 by scoring 5 of 7 on all three sets of trials, your answer isn't 3.9%. It will be lower, and it will come close to being correct when the testing is done. I don't see any conflict there.

And btw, if my second question is how many score 15 of 21 by scoring 4 of 7, 5 of 7 and 6 of 7 you'll also give a lower estimate than 3.9%. It will also turn out to be close to the real results if we run lots of trials.

Does this make sense?
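Those predictions can be written down before any testing. A minimal Python sketch, assuming every answer is a coin flip and three sets of 7 trials; the 4, 5, 6 case counts all six orderings, which is one reading of the question, and `p_block_scores` is an illustrative name:

```python
from itertools import permutations
from math import comb, prod

TOTAL = 2 ** 21  # equally likely 21-guess sequences under the null hypothesis

def p_block_scores(scores):
    """P(the three 7-trial sets score exactly `scores`, in that order) by guessing."""
    return prod(comb(7, s) for s in scores) / TOTAL

p_15_plus = sum(comb(21, k) for k in range(15, 22)) / TOTAL
p_5_5_5 = p_block_scores((5, 5, 5))
p_4_5_6_any_order = sum(p_block_scores(order) for order in set(permutations((4, 5, 6))))

print(f"15+ of 21 in total:      {p_15_plus:.2%}")
print(f"5, 5, 5 across the sets: {p_5_5_5:.2%}")
print(f"4, 5, 6 in any order:    {p_4_5_6_any_order:.2%}")
```

Both specific patterns (about 0.44% and 1.47%) come out below the 3.9% for 15 or more of 21 overall.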
 