Limitations of blind testing procedures

Jakob1863 · Aug 1, 2017

BE718 said:
Honestly Jakob, I just don't have the energy to read through your pages of nebulous waffling to try and figure out what point you are actually making... if any at all.

Which is surprising, as I gave you a precise (and, I'd say, concise) answer:
"it is as easy to get incorrect results under "blind" conditions as it is in sighted listening tests"
And that line was too "nebulous" for your liking? Wow!

And you specifically asked for evidence, and yours truly cited 5 references in which the data confirming my assertion was given, even with short summaries so you wouldn't have to do the hard reading work.
And still you couldn't figure it out?

The only thing I have been able to draw so far is that you are trying to discredit blind testing by using the rationale that "anything is possible" without any evidence that something specific is happening.

I'm at a loss as to why you misinterpreted this so deeply. I mean, there is an ongoing discussion in this thread in which amirm tries to discredit a controlled "blind" listening test we did. But let's try again with a thought experiment:
If I wrote about the various dangers of driving cars, would that be "trying to discredit car driving", or would it rather be encouraging people to drive carefully and to take appropriate measures to make driving safer?

BE718 said:
Jakob, please answer Amirs question concisely.

Said the guy who answered my questions (about the details of your "golden ear went deaf" experiences) with concise ... silence?

amirm · Aug 1, 2017

So let me summarize the situation as the back and forth with Jakob is sure getting quite boring.

It is entirely possible to conduct bad blind tests. All tests need to be examined to see if they are of high quality or not. It is abundantly easy, for example, to get negative results when differences become small. There can also be "tells" in the makeup of the test that allow positive outcomes nevertheless.

In audiophile discussions though, we are dealing with high-order bits. Namely, differences that cannot be explained on the basis of objective data and psychoacoustics research are routinely touted to be night and day, present for all to hear. In that sense, the tester is declaring that a) the differences are very large and b) he has hearing that can easily distinguish them. Taking those as pre-conditions, many of the concerns such as the ones I listed above go away. We can blind the experiment and see if the tester gets the same outcome. If he does not, it proves that his conclusion of one device being better than the other is wrong. This doesn't give us scientific data to run with, however. All we can say is that he was wrong in his conclusions, and for 99% of discussions on the web, such a conclusion would be quite healthy and useful.

Importantly, as I always say, the right conclusion is supported by multiple references. Blind testing is one. The other is understanding of the system and what it does relative to psychoacoustics. And third is objective measurements. Put all of these together and if all the arrows point the same direction, then you have a high confidence conclusion. One that is millions of times better than any sighted test.

Jakob1863 · Aug 1, 2017

amirm said:
You miss the part that destroys your arguments. Take this part of the abstract prior to getting into cultural differences:<snip>

As I think I provided evidence that confirmed my statements about the failure rate in "same/different" tests, which of course (as stated quite often in other posts) corroborates my argument (it is as easy to get incorrect results in "blind" listening tests as it is in sighted listening tests), I'd be interested to know which argument you're referring to and, furthermore, where I posted it.

Btw, talking about "destroyed arguments", have you realized by now that you misunderstood Prof. Dranove's criticism of Meyer/Moran and why it didn't support your argument?
Have you found another external reference to support your argument, and if you haven't, would you admit that your argument was wrong?

This is what we routinely face in high-end audio. Equipment that sounds/measures the same is perceived to be different, with preference shown for one piece of gear over the other. You know, like your amplifier test. This is why I said you can't jump to preference when you know a priori that the outcome should be no difference. By using a control in there, you can first be sure that a difference is perceived, then proceed to quantify preference.<snip>

It's just that we did a directional paired comparison. Sure, every participant could have guessed randomly. Please remember what I explained last time: we analyze the observed data (the results of the listening trials) under the premise that the null hypothesis is true. And our null hypothesis was, as you might know, "random guessing".

That's a discrimination task or, as you'd call it, a test for difference.

Jakob1863 · Aug 1, 2017

amirm said:
There you go. In other words, if you had loaned me the amp, I could have just randomly voted for one of them being better without even listening to them. Any assertion then that it provided extra strength for one amp or the other being better would be false then.

This is why a proper test would have called for multiple trials, so as to make that test statistically strong on its own.

I know it's hard to remember, but that is why, overall, 5 listeners did one trial each on the two preamplifiers.
And, not to forget, for a correct result they had to choose the same preamplifier that I did when doing a controlled "blind" listening test with these two preamplifiers. (In case you've forgotten, I also did a five-trial test on these two preamplifiers.)

Edit: Last sentence added
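To put rough numbers on the "one trial each from 5 listeners" design, here is a minimal sketch of the one-sided exact binomial calculation under the null hypothesis of random guessing. It assumes the five trials are independent and that a guessing listener matches the reference choice with probability 0.5. The thread does not state how many of the five listeners actually matched, so the table covers every possible count; the function and variable names are made up for illustration.

```python
from math import comb


def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: 5 independent trials, chance of a match under guessing = 0.5.
n_trials = 5
tail_probs = {k: p_at_least(k, n_trials) for k in range(n_trials + 1)}

for k, prob in tail_probs.items():
    print(f"{k}/{n_trials} or more matching: p = {prob:.4f}")
```

Under these assumptions only a 5-out-of-5 result (p = 1/32, about 0.031) would fall below the conventional 0.05 level; 4 out of 5 or better has p = 0.1875.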

Jakob1863 · Aug 1, 2017

amirm said:
There was one and only one variable in that scenario: whether the listener knew the identity of the sound or not. The fact that this variable completely changed the outcome does not allow you to say anything about the reliability of case #1?

Please try to forget for a moment all the bias that seems to block thinking.

In a hypothesis test, if you can't reject the null hypothesis (because of your results), that only means that you couldn't reject the null hypothesis. It does _not_ mean that the null hypothesis was confirmed. (An external reference for this is any good introductory textbook on hypothesis testing.)

That only changes (although it is arguably a matter of philosophy) if it was the most perfectly controlled (even blind) listening test imaginable, or at least a quite good approximation of that.
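The point about a non-rejected null hypothesis can be illustrated with a small power calculation. This is only a sketch under assumed numbers that do not come from any test discussed in this thread: a hypothetical listener who really does hear a difference and is right 70% of the time, 10 trials, and a one-sided exact binomial test at the 0.05 level. All names are illustrative.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n: int, alpha: float = 0.05):
    """Smallest number of correct answers that rejects 'random guessing'."""
    for k in range(n + 1):
        if binom_tail(k, n, 0.5) <= alpha:
            return k
    return None  # with this few trials no result can reach significance


def power(n: int, p_true: float, alpha: float = 0.05) -> float:
    """Probability of rejecting the null when the true hit rate is p_true."""
    k = critical_count(n, alpha)
    return 0.0 if k is None else binom_tail(k, n, p_true)


# Hypothetical scenario: 10 trials, listener truly correct 70% of the time.
n = 10
p_true = 0.7
k_crit = critical_count(n)
test_power = power(n, p_true)
miss_rate = 1 - test_power

print(f"need at least {k_crit}/{n} correct to reject guessing")
print(f"power = {test_power:.3f}, chance of a false negative = {miss_rate:.3f}")
```

In this assumed scenario the listener needs 9 of 10 correct, and fails to reach that in roughly 85% of such tests despite hearing a real difference. A "could not reject" outcome therefore says little unless the test had enough trials to detect the effect in the first place.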

Did you give any information besides the mere "it was a blind test"?
NO, so it was a "nothing-burger", and everyone who knows a bit about sensory tests would give the same answer as I did (provided they don't suffer from those biases in the way you apparently do).

Is that really so hard to get??

Jakob1863 · Aug 1, 2017

amirm said:
So when you link to those papers, it is best to point out that as much as they may be talking about potential biases existing in blind tests, they outright and with strong conviction damn any sighted tests. That is what is at stake here. Your tendency for using the reference for one small purpose while ignoring the larger one seems totally illogical or else, is evidence of not having read those references. Not sure which one is worse.

And now you are making things up again.
I didn't ignore anything, and you are, as so often in the past, posting your fantasies as fact, and I'm sure that is worse.

"That is what is at stake here", really? Get real, man.
Please look at the title of this thread: "LIMITATIONS of BLIND TESTING PROCEDURES"

amirm · Aug 1, 2017

Jakob1863 said:
Please try to forget for a moment all the bias that seems to block thinking.

I can't get you to do that so not sure why you give me that advice.

In a hypothesis test if you can´t reject the null hypothesis (because of your results) that only means that you couldn´t reject the null hypothesis. It does _not_ mean that the null hypothesis was confirmed.

You are confused, I am afraid. I gave you two data points, not one. One data point was a sighted test. The other was blind. I am asking you to judge the totality of those outcomes. You seem to be incapable of or unwilling to look at half of that data. Plenty of conclusions can be reached in that scenario, which would easily be confirmed with additional tests and all the controls you want to put in there.

More specifically, let's take the post I quoted earlier where the person heard night and day differences in USB cables. I am 100% confident that I can test you and him and have both of you fail that blind test no matter how perfect you make the blind tests.

Is that really so hard to get??

You speak from no experience. I have participated in and tested countless people both blind and sighted. In that context the scenario that I presented to you had easy answers yet you don't want to go there because you have something to sell. And that is not sound audio science. That is for sure.

amirm · Aug 1, 2017

Jakob1863 said:
Which is surprising, as I gave you a precise (and, I'd say, concise) answer:
"it is as easy to get incorrect results under "blind" conditions as it is in sighted listening tests"

You say that on the basis of what experience? Did you get incorrect results using your own amplifier test?

amirm · Aug 1, 2017

Jakob1863 said:
And now you are making things up again.
I didn't ignore anything, and you are, as so often in the past, posting your fantasies as fact, and I'm sure that is worse.

"That is what is at stake here", really? Get real, man.
Please look at the title of this thread: "LIMITATIONS of BLIND TESTING PROCEDURES"

I have pointed out limitations in your blind tests and it is not like you welcomed that.

But sure, point to a peer reviewed published blind test and tell us what you disagree with.

Otherwise what you are trying to do is called FUD: Fear, Uncertainty and Doubt. You throw stones at a category of testing and call it done. That is not the path to being convincing. You need to demonstrate competence in your understanding and conclusions with examples as noted above.

Jakob1863 · Aug 1, 2017

amirm said:
I can't get you to do that so not sure why you give me that advice.

An admirer of cheap comebacks?

You are confused, I am afraid. I gave you two data points, not one. One data point was a sighted test. The other was blind. I am asking you to judge the totality of those outcomes. You seem to be incapable of or unwilling to look at half of that data. Plenty of conclusions can be reached in that scenario, which would easily be confirmed with additional tests and all the controls you want to put in there.

If you confuse what you wrote with data, then that might explain our difficulties.
And no, you did not give any information about the sighted listening, i.e. about the person who did the listening.

Plenty of conclusions? Funny, as you did ask "You would draw what conclusion from this data? Is box A better sounding than B or not?"
So, if "plenty" means "two", I'd agree.

More specifically, let's take the post I quoted earlier where the person heard night and day differences in USB cables. I am 100% confident that I can test you and him and have both of you fail that blind test no matter how perfect you make the blind tests.

Might be true (with respect to both your confidence and reality).

You speak from no experience. I have participated in and tested countless people both blind and sighted. In that context the scenario that I presented to you had easy answers yet you don't want to go there because you have something to sell. And that is not sound audio science. That is for sure.

And there again is amirm as the master of mystery and imagination.
You really like to make up a story, don't you?

amirm · Aug 1, 2017

Jakob1863 said:
And no, you did not give any information about the sighted listening, i.e. about the person who did the listening.

The person was invariant in both tests. Why would you want to know about him?

Jakob1863 · Aug 1, 2017

amirm said:
I have pointed out limitations in your blind tests and it is not like you welcomed that.

Your recollection is incorrect. I didn't welcome your faulty descriptions (remember that you said what we did was the same as sending three coins to three different people), and I surely didn't like it when you provided misguided statistical arguments without external backup.
I did agree with other points, as you might find confirmed by rereading those posts, admitting that it could have happened, but also pointing out that you were following the "everything is possible" approach.

But sure, point to a peer reviewed published blind test and tell us what you disagree with.

Which is a different topic, but step by step: how about you answer my question about your misleading attempt with Prof. Dranove's letter?

Otherwise what you are trying to do is called FUD: Fear, Uncertainty and Doubt.
You throw stones at a category of testing and call it done. That is not the path to being convincing. You need to demonstrate competence in your understanding and conclusions with examples as noted above.

The line "you throw stones at a category of testing and call it done" especially interests me. Maybe I've thrown stones at a category (remember, this thread is about "Limitations of blind testing procedures"), but where did I "call it done"? And please be precise with your answer and cite my sentence(s) correctly. I'd be really surprised if you found one.

Asking for demonstrations of competence is fair. Just as a reminder, I have already delivered a bit by explaining (albeit briefly) the foundations of null hypothesis testing to you (which you had hopefully only forgotten), and by pointing out where you provided erroneous statistical arguments and used incorrect statements as arguments (remember your "basic rule of statistics...combining results of nonidentical experiments prohibited"?).

And maybe, just as a proposal, could we stop with this sort of communication style and try a more constructive way of discussing?
I mean, I like joking, teasing and even sarcasm, but this is maybe a bit too much, hm?

Thomas savage · Aug 1, 2017

I'm bored, nothing new is coming to light.

Unless there is significant progress, has anyone got something new to say here? I'm inclined to shut the thread.

Going round and round in circles to the point of having to summarise things (the same things) every few pages is, well, unfortunate.

It's detracting from the points being made on both sides, so it is hurting the argument and is thus counterproductive.

Thomas savage · Aug 2, 2017

Thread closed, there's been no terrible crime but this subject has reached saturation point.
