Relevance of Blind Testing

SIY · Nov 17, 2020

I would argue that these are not independent events as long as the test setup hasn’t changed. Would you consider a one minute break between trial groups something that would make this two events? Five minutes? An hour?

TSB · Nov 17, 2020

MRC01 said:
Great example. It intuitively explains why 10 of 14 is easier to get through luck, than 5 of 7 twice. If you pass 5 of 7 twice, your aggregate score can't be lower than 10 of 14. But you can score 10 of 14 without passing 5 of 7 twice (you could get 4 of 7 then 6 of 7, or 3 of 7 then 7 of 7).

Yet this suggests that when computing the overall confidence of a series of tests, we should use Bayes' rule, because we know the subject passed each of the shorter tests. He didn't fail some then make it up by doing better on others.

If you're interested in the probability of passing 3 independent tests of 7 guesses each, you should be multiplying together P(X >= 5), not P(X == 5).
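A minimal sketch of the 10-of-14 example quoted above, assuming every guess is a fair coin flip and that "passing" a block means at least 5 of 7 correct. The helper name pmf is illustrative only.

```python
from fractions import Fraction
from math import comb

def pmf(k, n):
    """P(exactly k correct out of n) when every guess is a fair coin flip."""
    return Fraction(comb(n, k), 2 ** n)

# Every (first block, second block) score pair with at least 10 of 14 correct in total.
pairs = [(a, b) for a in range(8) for b in range(8) if a + b >= 10]

# Probability of reaching at least 10 of 14 by any route.
p_total = sum(pmf(a, 7) * pmf(b, 7) for a, b in pairs)

# Probability of the routes in which both blocks individually reach 5 of 7.
p_both = sum(pmf(a, 7) * pmf(b, 7) for a, b in pairs if a >= 5 and b >= 5)

print(f"P(>=10 of 14)      = {float(p_total):.4f}")
print(f"P(>=5 of 7, twice) = {float(p_both):.4f}")
```

The gap between the two numbers comes entirely from pairs such as (4, 6) and (3, 7), which reach 10 of 14 without passing both blocks.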

MRC01 · Nov 17, 2020

SIY said:
I would argue that these are not independent events as long as the test setup hasn’t changed. Would you consider a one minute break between trial groups something that would make this two events? Five minutes? An hour?

I'm not so sure... When flipping a coin, must you wait in between flips or change the setup in order for the flips to be independent events? Of course not.
I think the concept of independence is critical to this discussion. But I don't think independence relies on a time interval or changing the test setup.

TSB · Nov 17, 2020

SIY said:
I would argue that these are not independent events as long as the test setup hasn’t changed. Would you consider a one minute break between trial groups something that would make this two events? Five minutes? An hour?

If there is no audible difference, each guess is a coin flip. That's what we are calculating here: "how likely is this result by chance, assuming the null hypothesis (no audible difference) is true".

So yes, they are independent events.

SIY · Nov 17, 2020

Timon VDB said:
If there is no audible difference, each guess is a coin flip. That's what we are calculating here: "how likely is this result by chance, assuming the null hypothesis (no audible difference) is true".

So yes, they are independent events.

In that sense, yes. In the sense of the original question (essentially, are these two separate experiments), no.

Blumlein 88 · Nov 17, 2020

MRC01 said:
Question for the stats folks: consider someone who ABX tests and gets 5 of 7. This is 77.3% confidence (22.7% chance to do that well by guessing). He does this in 3 separate/independent tests, on different days. We could aggregate this as 3*5 = 15 of 3*7 = 21 and compute 15 of 21 is 96.1% confidence (3.9% chance to do that well by guessing).

But we could tackle this differently: There are 3 different tests, each independent and having 23% chance to pass by guessing. If you pass all three, the probability you're guessing should be .23 * .23 * .23 = 1.2%. This would be 98.8% confident.

These numbers are different so they can't both be right. Which is correct?

Coming back to this one. Just ran a simple minded spreadsheet.

1140 trials of 7 or 380 trials of 21.
Did this twice.

First time 15 of the sets of 21 had 15 of 21 correct. 3.95%
Second time 17 of the sets of 21 had 15 of 21 correct. 4.47%

First time only 1 of the 15 sets was 3 consecutive 5 of 7 results. 0.26%. This is the only result a bit outside of the predicted range.

The second time 5 of 17 sets were 3 consecutive 5 of 7, and of those two were together meaning 6 consecutive 5 of 7 results. 1.32%

So as I've said, both predictions are correct, because they aren't the same prediction. 15 of 21 is one prediction, and 15 of 21 made up of three 5 of 7 results is more specific and less common. There is really no disagreement.
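For anyone who wants to repeat this without a spreadsheet, here is a rough Python sketch of the same kind of run. It is not the spreadsheet used above: it assumes fair-coin guesses, counts "15 of 21" as at least 15 correct, and counts "5 of 7" as at least 5 correct in each block. The function name and seed are arbitrary.

```python
import random

def simulate(num_sets=380, seed=0):
    """Simulate num_sets runs of 21 coin-flip guesses, split into three blocks of 7."""
    rng = random.Random(seed)
    hit_15 = 0      # runs with at least 15 of 21 correct
    hit_blocks = 0  # runs where every block of 7 has at least 5 correct
    for _ in range(num_sets):
        # Score of each of the three 7-trial blocks in this run.
        blocks = [sum(rng.random() < 0.5 for _ in range(7)) for _ in range(3)]
        if sum(blocks) >= 15:
            hit_15 += 1
        if all(score >= 5 for score in blocks):
            hit_blocks += 1
    return hit_15, hit_blocks

hit_15, hit_blocks = simulate()
print(f"{hit_15} of 380 runs reached >=15 of 21")
print(f"{hit_blocks} of 380 runs reached >=5 of 7 in all three blocks")
```

With only 380 runs the counts bounce around a lot from seed to seed, which fits the spread between the two spreadsheet runs. Raising num_sets should pull both fractions toward their exact values.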

andreasmaaan · Nov 17, 2020

MRC01 said:
I'm not so sure... When flipping a coin, must you wait in between flips or change the setup in order for the flips to be independent events? Of course not.
I think the concept of independence is critical to this discussion. But I don't think independence relies on a time interval or changing the test setup.

Flipping coins is a good example.

After 21 flips, the probability that >=15 flips are tails is not the same as (indeed it’s higher than) the probability of flipping >=5/7 tails three consecutive times.

You could think of it this way: In the latter case, we’ve imposed an additional restriction on the distribution (loosely speaking, a restriction as to the order of outcomes).

This additional restriction lowers the probability of success.

MRC01 · Nov 17, 2020

Blumlein 88 said:
... So as I've said, both predictions are correct, because they aren't the same prediction. 15 of 21 is one prediction, and 15 of 21 made up of three 5 of 7 results is more specific and less common. There is really no disagreement.

andreasmaaan said:
... This additional restriction lowers the probability of success.

Right. But this restriction represents our knowledge of the actual test results. Suppose we know that somebody scored 5 of 7 on 2 separate tests. Then using 10 of 14 includes other outcomes (like 4 and 6, 3 and 7) that we know did not happen.

Blumlein 88 · Nov 17, 2020

MRC01 said:
Right. But this restriction represents our knowledge of the actual test results. Suppose we know that somebody scored 5 of 7 on 2 separate tests. Then using 10 of 14 includes other outcomes (like 4 and 6, 3 and 7) that we know did not happen.

Yes so knowing the results, 5 of 7 on two separate tests, is a subset of all the possible ways to get 10 of 14.

TSB · Nov 17, 2020

MRC01 said:
Right. But this restriction represents our knowledge of the actual test results. Suppose we know that somebody scored 5 of 7 on 2 separate tests. Then using 10 of 14 includes other outcomes (like 4 and 6, 3 and 7) that we know did not happen.

I just used a random number generator to get the number 681354325677. The chance of getting this exact number is 10^-12 and we know nothing else happened, so I have witnessed a miracle.

andreasmaaan · Nov 17, 2020

MRC01 said:
Right. But this restriction represents our knowledge of the actual test results. Suppose we know that somebody scored 5 of 7 on 2 separate tests. Then using 10 of 14 includes other outcomes (like 4 and 6, 3 and 7) that we know did not happen.

Yes, but I don’t see the inconsistency there. If you begin a 21-consecutive-trial test, and after the first 7 trials you have 5 correct responses, then your chances of getting >=15 correct responses from the total 21 trials have improved vs what they were at the beginning of the test.
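A small sketch of that update, assuming fair-coin guessing throughout. The helper name tail is illustrative.

```python
from fractions import Fraction
from math import comb

def tail(k, n):
    """P(at least k correct out of n) under fair-coin guessing."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)

# Before any trials: chance of reaching at least 15 of 21 by guessing.
p_start = tail(15, 21)

# After scoring 5 of the first 7: only 10 more are needed from the remaining 14.
p_after_5_of_7 = tail(10, 14)

print(f"at the start:        {float(p_start):.4f}")
print(f"after 5 of 7 so far: {float(p_after_5_of_7):.4f}")
```

The second figure is conditional on what has already happened in the first seven trials, which is why it is larger than the first.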

(This has been a really enjoyable little thought experiment btw.)

MRC01 · Nov 17, 2020

Yes, great discussion. I see both sides and can compute it either way. Just trying to convince myself which applies.

andreasmaaan · Nov 17, 2020

@MRC01 actually, I think I've worked out why we're at cross purposes.

You wrote:

There are 3 different tests, each independent and having 23% chance to pass by guessing. If you pass all three, the probability you're guessing should be .23 * .23 * .23 = 1.2%.

Note the words I've highlighted in bold. The probability that you passed each of the three tests by guessing is not the same thing (qualitatively or quantitatively) as the probability that you got 15 out of 21 in total and were guessing.

EDIT: realise the diagram had an error in it!

MZKM · Nov 17, 2020

andreasmaaan said:
@MZKM actually, I think I've worked out why we're at cross purposes.

You wrote:

Note the words I've highlighted in bold. The probability that you passed each of the three tests by guessing is not the same thing (qualitatively or quantitatively) as the probability that you got 15 out of 21 in total and were guessing.

Hopefully this Venn diagram explains the subtle differences fairly well:

View attachment 94158

EDIT: realise the shape and scale are a bit weird, but was working it out in MS paint as I went along, lol...

Wrong user.

andreasmaaan · Nov 17, 2020

MZKM said:
Wrong user.

Oops, thanks. Edited

MRC01 · Nov 17, 2020

andreasmaaan said:
@MRC01 actually, I think I've worked out why we're at cross purposes. ... ...

Yep, of course it's not. They're clearly 2 different things. One is more specific than the other, and thus lower probability / higher confidence.
The question I'm struggling with is: which of these 2 approaches represents the correct probabilities?
I can think of rational explanations for each, yet I can also think of conundrums that each leads to.

ahofer · Nov 17, 2020

krabapple said:
No, not really.

What it can do is ameliorate effects of the illness that have a large subjective component, i.e., 'feelings' -- pain, stress, fatigue, nausea.

It can also make you sick.

MRC01 · Nov 17, 2020

Yeah, I read that study recently. They called it the "nocebo" effect, opposite of "placebo" effect. What you believe affects how the drug interacts with you (or at least how you perceive that interaction). This can work in your favor (if you think it will help) or against you (if you think it will make you sick).

andreasmaaan · Nov 17, 2020

MRC01 said:
Yep, of course it's not. They're clearly 2 different things. One is more specific than the other, and thus lower probability / higher confidence.
The question I'm struggling with is: which of these 2 approaches represents the correct probabilities?
I can think of rational explanations for each, yet I can also think of conundrums that each leads to.

Ok, realised the diagram had an error in it. I'm gonna think about this again and get back to you... Was obviously a bit ambitious to try to include all possible outcomes in the diagram, lol.

But I still just don't see the conundrum. The probability of getting >=15/21 while guessing is not and should not be the same as the probability of getting 5/7 three consecutive times while guessing.

Maybe think about it like this?

The probability of "passing" three consecutive 7-trial tests (getting >=5/7) while guessing is an intersection (hence multiplicative) which comprises:

P[getting >=5/7 in the first test while guessing]
AND P[getting >=5/7 in the second test while guessing]
AND P[getting >=5/7 in the third test while guessing]

~= 0.23 * 0.23 * 0.23
~= 0.012

The probability of passing a 21-trial test (i.e. getting >=15/21) is a union (hence additive) which comprises:

(a) P[passing three consecutive 7-trial tests while guessing]
OR (b) P[failing at least one 7-trial test but nevertheless giving at least 15 correct responses in total while guessing]

~= 0.012 + (0.039 - 0.012)
~= 0.039

Or alternatively, passing 3 consecutive 7-trial tests while guessing is a subset of giving >=15 correct responses out of 21 while guessing.
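The subset relationship can be checked with exact binomial arithmetic. This is only a sketch, assuming fair-coin guessing and a pass mark of at least 5 of 7; the variable names are made up for the example.

```python
from fractions import Fraction
from math import comb

def tail(k, n):
    """P(at least k correct out of n) under fair-coin guessing."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)

# Intersection: pass the first AND the second AND the third 7-trial test.
p_pass_all_three = tail(5, 7) ** 3

# Union: every way of reaching at least 15 of 21 in total.
p_at_least_15 = tail(15, 21)

# What is left over: at least 15 of 21 while failing at least one 7-trial test.
p_other_routes = p_at_least_15 - p_pass_all_three

for label, p in [("pass all three 7-trial tests", p_pass_all_three),
                 ("other routes to >=15 of 21", p_other_routes),
                 (">=15 of 21 overall", p_at_least_15)]:
    print(f"{label:30s} {float(p):.4f}")
```

Passing all three blocks always gives at least 15 correct, so the first probability can never exceed the last.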

Let me make one last argument. At the moment, we've arbitrarily broken this 21-trial test into 3 x 7-trial tests. Why not keep going, and break it down into 21 x 1-trial tests? What are the consequences of that according to the logic you're applying to the 3 x 7-trial tests?

If you agree with me that treating any n-trial test as n separate tests is the wrong approach, how can you justify in this case choosing 7 as the relevant number of trials in each sub-test? It seems completely arbitrary to me. And then impossible to avoid falling into a problem similar to Zeno's Paradox...
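To make the arbitrariness concrete, here is a sketch that applies the multiply-the-sub-tests logic to one and the same answer sheet with different block sizes. The answer sheet is invented purely for illustration, and multiplying per-block tail probabilities is the procedure being questioned here, not a recommended method.

```python
from math import comb

def tail(k, n):
    """P(at least k correct out of n) under fair-coin guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical answer sheet (1 = correct): three blocks of 7 with 5 correct each.
answers = [1, 1, 1, 1, 1, 0, 0] * 3

def multiplied_probability(answers, block_size):
    """Multiply P(at least the observed score) over consecutive blocks."""
    product = 1.0
    for start in range(0, len(answers), block_size):
        block = answers[start:start + block_size]
        product *= tail(sum(block), len(block))
    return product

# Same 21 answers, sliced four different ways.
for size in (1, 3, 7, 21):
    print(f"block size {size:2d}: {multiplied_probability(answers, size):.6f}")
```

The 21 answers never change, yet the multiplied figure shrinks as the blocks get smaller, so the result depends on where the lines between "tests" happen to be drawn.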

Blumlein 88 · Nov 17, 2020

MRC01 said:
Yep, of course it's not. They're clearly 2 different things. One is more specific than the other, and thus lower probability / higher confidence.
The question I'm struggling with is: which of these 2 approaches represents the correct probabilities?
I can think of rational explanations for each, yet I can also think of conundrums that each leads to.

Alright I think we are in agreement now. But you still have questions.

Let us try this thought experiment.

I tell you I am running some DBT, and so as not to fatigue listeners they'll only do 7 trials. Then after a rest period they'll do 7 more and after another break 7 more. Each subject will do 21 trials total.

I then ask you to predict how many would score 15 of 21 by random chance alone. Let us say I am doing DBTs for USB cables. You'll give me your answer, which was what... 3.9%. You'll be about right.

Now if I ask you ahead of time, how many do you think will get 15 of 21 by scoring 5 of 7 in all three sets of trials? Your answer isn't 3.9%; it will be lower, and it will come close to being correct when the testing is done. I don't see any conflict there.

And btw, if my second question is how many score 15 of 21 by scoring 4 of 7, 5 of 7 and 6 of 7 you'll also give a lower estimate than 3.9%. It will also turn out to be close to the real results if we run lots of trials.
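A quick sketch of those narrower predictions, assuming fair-coin guessing, reading "5 of 7" as exactly 5 correct, and reading "4 of 7, 5 of 7 and 6 of 7" as that exact order. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def pmf(k, n=7):
    """P(exactly k correct out of n) under fair-coin guessing."""
    return Fraction(comb(n, k), 2 ** n)

def pattern(*scores):
    """P(consecutive blocks of 7 score exactly these values, in this order)."""
    p = Fraction(1)
    for score in scores:
        p *= pmf(score)
    return p

p_555 = pattern(5, 5, 5)
p_456 = pattern(4, 5, 6)

# The broad prediction: at least 15 of 21 by any route.
p_15_or_more = Fraction(sum(comb(21, i) for i in range(15, 22)), 2 ** 21)

print(f"5, 5, 5 exactly:    {float(p_555):.4f}")
print(f"4, 5, 6 in order:   {float(p_456):.4f}")
print(f">=15 of 21 overall: {float(p_15_or_more):.4f}")
```

Each specific pattern is one slice of the broad prediction, so each comes out well below it.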

Does this make sense?
