You asked me for a reference on why results of different tests cannot be combined and I provided a Journal of AES one. For your counter you tell me about some forum discussion???
Are you really bringing out eristics at this point in our discussion?
I mean, I pointed out where you misunderstood Prof. Dranove's argument, explained why his argument matters with respect to the basic properties of statistical analysis, and you dismiss it all by saying that all I did was "for your counter you tell me about some forum discussion???"
No, that wasn't all I did.
Nothing in your posts invalidates what I said which I might add, continues to be your words rather than quoted references.
As I just referred to the basic principles of null hypothesis significance tests, I wasn't aware that you'd need an external reference; I had assumed you would remember after reading my post.
But anyway, please tell me which statements you want confirmed by external references:
-) it is a matter of null hypothesis significance testing (NHST)
-) the statistical analysis of any experiment is based on a statistical model and the corresponding test statistic
-) Meyer/Moran used the binomial distribution for their analysis
-) the exact binomial test assumes _independence_ of the samples
-) the participants in the Meyer/Moran experiment did not do just one trial per person, but mostly more than five, sometimes even 10 trials per person
-) it presents a problem if the assumptions of the statistical model (used for the analysis of the observed data) are violated right from the beginning
I'm more than happy to supply the references that you'll request.
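To make the points above concrete, here is a minimal sketch in Python. The pooled score of 260/500 is an illustrative number, not the actual Meyer/Moran count; the simulation parameters (50 subjects, 10 trials each, Beta-distributed per-subject ability) are likewise my assumptions. It shows an exact binomial test, and then why correlated trials from the same subject break that test's variance assumption.

```python
import math
import random

def exact_binomial_p(hits, n, p=0.5):
    """One-sided exact binomial test: P(X >= hits) under H0: hit rate = p."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(hits, n + 1))

# Illustrative pooled result (NOT the real Meyer/Moran counts): 260 correct of 500.
print(exact_binomial_p(260, 500))  # roughly 0.2, not significant

# The exact test treats all 500 trials as independent. If instead each subject
# contributes 10 correlated trials (per-subject ability varies), the pooled hit
# count is overdispersed relative to the binomial model:
random.seed(1)
subjects, trials, reps = 50, 10, 2000
totals = []
for _ in range(reps):
    total = 0
    for _ in range(subjects):
        p_i = random.betavariate(2, 2)  # subject-specific hit probability, mean 0.5
        total += sum(random.random() < p_i for _ in range(trials))
    totals.append(total)

mean = sum(totals) / reps
emp_var = sum((t - mean) ** 2 for t in totals) / (reps - 1)
binom_var = subjects * trials * 0.25  # variance the binomial model assumes
print(emp_var > binom_var)  # True: the real variance is larger, so the test statistic is wrong
```

The larger-than-binomial variance is exactly why violating the independence assumption makes the reported test statistic incorrect: the test understates the true uncertainty.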
Before commenting on Dranove's post that you've mentioned now, let's go back to your scan of Prof. Dranove's letter to the editor of the JAES, where he wrote (emphasis mine):
"On a related note, it appears that Meyer and Moran treat all 500 listening tests as
independent observations for the purpose of statistical testing, when in fact
all tests by a
given subject have
correlated results."
Prof. Dranove clearly pointed to the fact that the assumption of _independent_ samples was violated, and he therefore concluded:
"
This means that their test statistics are incorrect"
That was the point he objected to; I hope that you now realize that your description:
" Meyer and Moran set up multiple testing facilities where listeners could come and do the testing on their own. Professor Dranove objected to that practice calling the test statistics "incorrect."
was incorrect.
Anyway, here is professor Dranove's first post in that forum:
<snip>
In a nutshell, he is saying you can't combine the results of tests of people who potentially can tell the difference in a test, with a mass of people who cannot. .....
No, in a nutshell he just says that (despite the problematic violation of the independence assumption) you can combine the results to assess the group overall, but you can't conclude from these overall results that _nobody_ could/can hear a difference.
I already mentioned recently the problem of using groupwise results to draw conclusions about single individuals in the group without knowing their individual results.
It can generate what is known as Simpson's paradox. It is like if I take 100 people to diagnose an engine problem and I put one real mechanic in there. The mechanic finds the problem but the rest cannot but in the combined statistic, the data would overwhelmingly show the problem cannot be found.
So, that is another (well-known) problem that Prof. Dranove addressed. And of course Simpson's paradox can occur, but as you might know if you look into the literature, nobody concludes that one cannot combine the results of different independent experiments - which was what you originally stated - only that you have to combine the results correctly.
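The mechanic analogy can be put into numbers. A hypothetical sketch in Python (the counts are invented for illustration): one genuinely discriminating listener among 99 guessers is invisible in the pooled score, yet clearly significant on his individual score.

```python
import math

def p_at_least(hits, n):
    """P(X >= hits) for X ~ Binomial(n, 0.5), computed exactly with integers."""
    return sum(math.comb(n, k) for k in range(hits, n + 1)) / 2**n

# Hypothetical group: 99 guessers score 5/10 each, one "mechanic" scores 10/10.
pooled_hits = 99 * 5 + 10
pooled_trials = 100 * 10

print(p_at_least(pooled_hits, pooled_trials))  # pooled group: far from significant
print(p_at_least(10, 10))                      # the one individual: 1/1024, clearly significant
```

So a null result for the pooled group cannot rule out that a single individual in the group can hear a difference - which is exactly the group-versus-individual point made above.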
So contrary to your post, he did not just complain about lack of independence.
Ah, eristics again? This time it would be a classic strawman, as I did not write that Prof. Dranove "did just complain about lack of independence".
Quite the contrary, I wrote (emphasis now added):
"According to Prof. Dranoves comments on the sa-cd.net forum, mentioned by Fitzcarraldo215, Meyer/Moran did try external expertise by mathematician and apparently he confirmed
Dranove´s concerns, which included additional critical points like the surprisingly small number of trials with >= 7 hits in 10 trials. (exspected number was 9, observed number was 3)"
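The expected-versus-observed figure can be reproduced with a quick calculation. A sketch: the assumption that roughly 52 subjects completed 10 trials each is mine, chosen so that the expected value comes out near the 9 quoted above; it is not taken from the paper.

```python
import math

# Chance probability that a pure guesser scores >= 7 hits in 10 trials (p = 0.5 per trial)
p_ge7 = sum(math.comb(10, k) for k in range(7, 11)) / 2**10
print(p_ge7)  # 176/1024, about 0.172

# Assumed number of subjects who completed 10 trials (my assumption, not from the paper)
n_subjects = 52
expected = n_subjects * p_ge7
print(expected)  # about 8.9, i.e. roughly 9 subjects expected to reach >= 7/10 by chance alone
```

Observing only 3 such subjects where about 9 are expected by pure chance would itself be a notable deviation from the stated model, which is presumably why it was flagged as an additional critical point.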
If you read the original paper (have you?) you see the issue as I discussed, i.e. multiple venues and setups were used. Some of those setups may have been less revealing than others. If so, you cannot combine their results with others. That again, is like what professor Dranove is giving an example of.
Prof. Dranove only pointed out that one has to be careful when combining results and drawing conclusions; surely nothing I ever expressed disbelief in. It just does not confirm what you stated in your post back then.
In your amplifier test where the listener hooked up the amp after being shipped to them, could similarly experience other faults that would result in outcomes that are unique in that test configuration, different than what others tested.
As said before, surely anything can happen. But as far as I understand (maybe I was mistaken), you and others, like BE718, strongly objected to the "everything is possible" approach.
That all of this ends up favouring exactly the same preamplifier, and on top of that exactly the preamplifier that I preferred, is not likely; but, as said before, even an unlikely event can happen.
But please, then stick to that argument instead of pointing to imagined basic rules that prohibit combining test results from different experiments.
I hope at this point it is obvious that Prof. Dranove's letter and post do not support your assertion; in fact, quite the contrary.
A proper test would have gathered everyone in one place and tested with identical setup which prior to the test, was confirmed objectively that all is as it should be. No level differences. No miswiring. No "tells."
First of all, that would not have been a "proper" test, but a totally different test.
As said before, the rationale behind our test concept was not to create an artificial experimental environment, but instead to let the participants (who didn't know they were taking part in a controlled listening experiment) do what they usually do when comparing two different "boxes", in exactly the way they usually do it.
I clearly pointed out in the description that the experimental control isn't as strict as in a laboratory situation.
Etc. That you throw out this rigor with a smile is sure sign that you are shopping for results. If an outcome of a test is not to your liking, you write a mountain of criticism. But if it is yours (were these tests of your amps?) all of a sudden all that is require is a sticky note on an amp to make a test valid? I don't think so.
The next attempt at eristics: "...that you are shopping for results" is just an insult, as is the personal remark that follows.
I'm fine with your "I don't think so", but, as said before, please stick to it and refrain from using misguided/sloppy/faulty statistical reasoning.