Therefore it seems to be good advice to combine quantitative methods (i.e. the conventional listening test approach) with qualitative methods, which might reveal effects that are only assessed with difficulty within the usual tests.
There have been at least several attempts over the years to use a combination of qualitative and quantitative methods. I'll cite/link some later or tomorrow...
Nyberg wrote a thesis about this:
Dan Nyberg, An Investigation of Qualitative Research Methodology for Perceptual Audio Evaluation:
https://www.diva-portal.org/smash/get/diva2:990443/FULLTEXT01.pdf
Thanks! I think a paper by Nyberg actually was the one I remembered reading!
Definitely. When I am challenged, I am the best at finding and hearing small differences. If not, it is hard to get motivated to do the same.

This is interesting to me. For myself, I find my mood and general state of mind have an enormous effect on my perception of sound quality.
To me, that says that there are no differences, or so small as to be negligible. Maybe it's due to my age, or being jaded by getting on for 50 years in audio engineering, but I can no longer get excited by small trivial differences in audio. So when I do listening tests, and I still do them as needed, if the difference isn't obvious, if I have to listen carefully going back and forth to decide on something, it's not worth the effort. As far as I'm concerned, then, there IS no difference. Or at least, not one worth bothering about.

Interesting read. It is a complex area that is not researched well. When testing small differences, it is very common for people to give up and vote randomly just to get through it all.
I've not been a fan of ABX testing. I much prefer AA, AB, BB, BA testing where the choice is same/different. I've found that to be very good for finding out whether a difference exists at all. If no difference exists, then any discussion of which is better becomes meaningless. If a difference exists, which can very easily be found from a statistical analysis of the responses, then one can start looking for what sort of differences, and expressing preferences.
Far too many tests, I think, try to decide which is better before even deciding whether they're different at all.
S.
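As a rough illustration of the kind of "statistical analysis of the responses" mentioned above, here is a minimal sketch, with made-up counts, of the traditional way such a same/different test is scored: count the correctly judged AA/AB/BB/BA trials and test that count against the 50% guessing rate. The numbers and the choice of a simple binomial test are my own assumptions for illustration, not taken from any particular study.

```python
# Minimal sketch of scoring a same/different (AA/AB/BB/BA) test.
# All counts are hypothetical; a "correct" response is "different" on AB/BA
# trials and "same" on AA/BB trials.
from scipy.stats import binomtest

n_trials = 40    # total presentations (hypothetical)
n_correct = 29   # correct same/different judgements (hypothetical)

# Chance performance is 50% when the four pair types are presented equally often.
result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
print(f"{n_correct}/{n_trials} correct, p = {result.pvalue:.4f}")
```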
The interesting part is that a "same/different" test seems to be even more challenging for participants than an ABX test, not subjectively but with respect to the false response proportion, which in the "same trials" (i.e. AA and BB) is usually astonishingly high, typically around 70-80%.
Although it seems plausible to listen first for a difference and then for a possible preference (and that is usually the order used in industrial consumer tests), in practice most listeners seem to instantly "look" for a preference when comparing two devices.
That is, in my experience, the reason that most people seem to do better in A/B comparisons than in pure difference tests, although they usually still need some time for accommodation (avoiding the somewhat misleading term "training").
It certainly indicates that for the population at large. I do the testing to counter the "it can't be audible" claims. Folks say that without understanding the technology or ever bothering to test their assumptions. So I do the test to show that when I say something can be audible, it indeed can be. And that there is a line we don't want to cross and still say the system is transparent.
But yes, seeing how few people (if any) go and replicate my results, and having tested so many others in controlled testing, my successes don't indicate that even audiophiles at large can hear these differences.
Why do you say some people do better in A/B comparisons than pure difference tests?
My limited experience is that people feel better and believe they do better. When even a modicum of control is in place, that turns out not to be the case.
I think a big confounding issue is people wanting to go straight for preference when they can't or don't demonstrate they can hear a difference.
Because that is what the data suggests.
Having participants in a test feel comfortable and confident basically isn't a bad thing. But with respect to "modicum of control", I'm talking about results of controlled sensory experiments.
As said before, people not used to doing controlled tests still need some accommodation time, even if the test protocol fits their usual/normal routine better.
I've cited some results from comparisons of ABX to other protocols (like A/B), but the paired comparison used as a "same/different" test is a more difficult one. I've already cited a cross-cultural study comparing the results when using this "same/different" protocol. The proportion of false responses in the "same trials" differs a bit between countries (as far as I remember, one of the Asian countries showed the highest "miss rate"), but it is surprisingly large in all countries and is robust across different product categories. The proportion of false responses is usually somewhere between 70-80% when evaluating the same stimulus, in listening tests, in food tests and even in tests of cigarettes (with respect to certain features).
This represents a problem for the traditional statistical analysis, but it offers the possibility of a more modern analysis in which the results of the "same trials" establish a so-called identicality norm to which the results of the "different trials" can be compared. But up to now that has not trickled down to the normal test routines.
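To make that idea concrete, here is a small sketch with invented counts of how such an analysis could look: the "different" response rate observed on the same trials serves as the baseline (the identicality norm), and the rate on the different trials is tested against it with a two-proportion test. The specific test and all numbers are my own choices for illustration, not a prescription from the cited literature.

```python
# Sketch: using the "same trials" as an identicality norm (hypothetical counts).
# Instead of testing against 50% chance, test whether "different" responses are
# more frequent on AB/BA trials than on AA/BB trials.
from statsmodels.stats.proportion import proportions_ztest

n_diff_trials, said_diff_on_diff = 40, 34   # AB/BA trials (hypothetical)
n_same_trials, said_diff_on_same = 40, 30   # AA/BB trials: ~75% false responses

stat, p_value = proportions_ztest(
    count=[said_diff_on_diff, said_diff_on_same],
    nobs=[n_diff_trials, n_same_trials],
    alternative="larger",   # hits should exceed the false-response baseline
)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```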
In fact, as outlined above, the big confounding issue is the presentation of "same trials" in tests. As said before, in my experience people instantly evaluate whether they prefer something when comparing "things"; they apparently don't care so much about a processing order in which they should first find a difference and then a preference.
If an established preference exists, it means that a difference must exist; the converse need not be true.
Of course there exist different models of humans' internal evaluation/judgement processes, and which one better approximates a specific situation varies. Researchers are often amazed by the differences between model predictions and real-world results.
What about triangle tests? Or duo-trio tests?
I personally like the triangle tests.
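For completeness, a quick sketch of how a triangle test is usually scored: each trial presents three stimuli, two identical and one odd, so the guessing rate is 1/3, and the number of correct identifications of the odd sample is tested against that. The counts below are invented for illustration.

```python
# Sketch of scoring a triangle test (hypothetical counts).
# Three stimuli per trial, one of them the odd sample, so chance is 1/3.
from scipy.stats import binomtest

n_trials = 30    # number of triangle presentations (hypothetical)
n_correct = 16   # times the odd sample was correctly identified (hypothetical)

result = binomtest(n_correct, n_trials, p=1/3, alternative="greater")
print(f"{n_correct}/{n_trials} correct, p = {result.pvalue:.4f}")
```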
Not sure I understand the difference. If you listen to only A and X in ABX test and answer same/different as "X is A"/"X is B", isn't that the same as your AA, AB, ...?
It actually is, because you can take the ABX test that way even if B is presented.

Could be, but then it isn't an ABX test anymore.