Jakob1863
Addicted to Fun and Learning
Good comment. The interesting and challenging question, I think, is in the areas where there are known objective differences in sound or performance, but which nevertheless often get negative results in blind tests. And tests about preferences. For example lossy vs lossless, high-bitrate lossy vs lower-bitrate lossy, good amps vs bad amps, etc. Or many reflections vs few reflections, and so on. How useful are blind tests here? Should we be guided by statistical means in preference testing, and/or buy gear which doesn't measure well but which often can't be distinguished from gear that measures well in ABX tests? That's the thing I've been wondering about.
Btw: do you have any links to studies on reflections with opposing conclusions?
Controlled listening tests (I still don't like the label "blind test", as it relates to only one bias mechanism) are just a tool that helps in sorting things out. Independent and dependent variables are defined, and in a perfectly controlled test the result would show the true relationship between them. In reality there is no perfectly controlled test (and one of the golden rules says "the more rigorous the control, the less the practical importance of the results"), so some confounders influence the results, making it harder and harder to ensure that the independent variable is the only true input.
At that point another golden rule applies: "block out what you can and randomize what you cannot block out".
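To make that rule a bit more concrete, here is a minimal Python sketch of a blocked and randomized trial schedule. It is only an illustration under my own assumptions: each listener is treated as a block (so listener-to-listener differences are "blocked out"), and the presentation order inside each block is shuffled (so order effects are "randomized"). The function name, the parameters and the condition labels are hypothetical, not taken from any particular test protocol.

```python
import random


def blocked_random_schedule(listeners, conditions, trials_per_condition, seed=None):
    """Return {listener: [condition, ...]} with a shuffled order per listener.

    Blocking: every listener hears every condition equally often, so
    differences between listeners cannot be confused with differences
    between conditions.
    Randomizing: the order inside each block is shuffled, so order and
    fatigue effects are spread across conditions instead of favouring one.
    """
    rng = random.Random(seed)  # a seeded generator makes the schedule reproducible
    schedule = {}
    for listener in listeners:
        # Balanced list of trials for this block (listener).
        trials = [c for c in conditions for _ in range(trials_per_condition)]
        rng.shuffle(trials)
        schedule[listener] = trials
    return schedule
```

Calling `blocked_random_schedule(["L1", "L2"], ["A", "B"], 4, seed=1)` would give each listener eight trials, four per condition, in an independently shuffled order. Anything that cannot be balanced like this (time of day, mood, and so on) is left to the randomization.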
But before any consideration of protocols starts, one has to decide which hypothesis/question should be tested and whether any generalization of the results should be possible. It helps a lot to state the hypothesis as clearly as possible. IMO that point has quite often been underestimated in audio tests (those concerned with overall differences; psychophysical tests traditionally dealt with only one parameter).
So, as a tool, controlled listening tests can help in preference decisions, but there is no magic in them, which means it must be ensured that the test itself does not have an impact on the result. That is sometimes a bit difficult to achieve, as it depends on the EUT and the listener, so an extended training time under the specific test conditions might be needed.
This matters especially if overall preference is tested. At that point I beg to differ with Blumlein88's description, as the "is saltier" question is called a directional test and usually tests for a difference, not a preference, while OTOH a preference test is quite often non-directional, because preference is based on a multidimensional perception.
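As a rough illustration of why the directional/non-directional distinction matters for the analysis, here is a small Python sketch that computes exact binomial p-values for a paired comparison. The assumption (mine, for the sake of the example) is that a directional question is analysed one-sided and a non-directional preference question two-sided, with a chance probability of 0.5 per trial. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k 'hits' in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_one_sided(k, n, p=0.5):
    """Directional test: probability of k or more hits under pure guessing."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def p_two_sided(k, n, p=0.5):
    """Non-directional test: total probability of all outcomes that are
    no more likely than the observed one (either direction counts)."""
    observed = binom_pmf(k, n, p)
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)
```

For 12 choices of the same item out of 16 trials, `p_one_sided(12, 16)` is about 0.038, while `p_two_sided(12, 16)` is about 0.077. So the same raw result passes the usual 0.05 criterion as a directional difference test but not as a non-directional preference test, which is one reason the hypothesis should be put into words before the test is run.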