Just uploaded a video on the basics of performing controlled tests in audio. It was motivated by saving myself text in having to write all of this down when telling someone how to do these tests right. And as a counter to a few online personalities to keep saying they do "blind" tests yet when...
Seems increased commentary in recent weeks about ABX tests. Much of it stemming from people who come to ASR to set us straight about trusting our ears. I do agree with some who have said that calls for ABX or it didn’t happen have become almost like a club to beat people over the head with, and...
Ipse Dixit.
I will not particularly criticise Amir, but I'm looking forward to the day when he finally fixes the ground-loop/pin-1 problem plaguing his AP2 measurements, something really basic.
I generally use the methods proposed and endorsed by the VDT (Verein Deutscher Tonmeister) for subjective testing.
And I deliberately use a protocol where the listening panel (I am way too biased to just trust myself) knows only that we are testing a specific difference in hardware; the nature and specifics of that difference remain unknown to the panel until AFTER the test is completed.
Plus, they are made aware that I might run a Bavarian fire drill (which I sometimes did), that is, deliberately introduce a clearly audible difference (e.g. 1 dB louder), just to keep everyone on their toes.
This eliminates any bias, because if you do not know what the difference is (and there may be either no difference or a clearly audible one), you must drop your expectations and actually listen.
I always include two identical (unmodified) units and two units carrying the identical modification. Apart from the change being evaluated, all four are otherwise identical, and the AP2 confirms they measure sufficiently alike to at least pass the minimum requirements set by the ABX crowd.
Listening itself is single blind; that is, listeners are faced with four outwardly identical boxes, identified by symbols to avoid numerical or alphabetical bias.
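The line-up described above (two stock units, two sharing the same modification, symbol labels, and the occasional deliberate level offset) can be sketched roughly as follows. The names, probabilities, and structure are my own illustration of the idea, not the author's actual tooling:

```python
# Hypothetical sketch of the described blind line-up: four outwardly identical
# boxes -- two unmodified, two with the same modification -- each labelled with
# a random symbol so listeners cannot infer anything from the identifiers.
import random

SYMBOLS = ["square", "circle", "triangle", "star"]

def make_lineup(catch_trial_prob=0.2, seed=None):
    """Return a secret mapping symbol -> unit, revealed only after the test."""
    rng = random.Random(seed)
    units = [
        {"unit": "A1", "modified": False, "gain_db": 0.0},
        {"unit": "A2", "modified": False, "gain_db": 0.0},
        {"unit": "B1", "modified": True,  "gain_db": 0.0},
        {"unit": "B2", "modified": True,  "gain_db": 0.0},
    ]
    # Occasionally introduce a deliberately audible difference (e.g. +1 dB)
    # on one unit, so the panel knows a "fire drill" is always possible.
    if rng.random() < catch_trial_prob:
        rng.choice(units)["gain_db"] = 1.0
    rng.shuffle(units)
    symbols = SYMBOLS[:]
    rng.shuffle(symbols)
    return {sym: unit for sym, unit in zip(symbols, units)}
```

Keeping the symbol-to-unit key secret until after the test (and reshuffling it, e.g. during the lunch break) is what keeps the presentation blind even though the listening itself looks sighted.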
The listening itself mostly resembles the sighted listening common among audiophiles, to avoid adding stress, though we commonly have staff swap out the units and do the legwork. The point being: I want to know what listeners actually hear, not what they expect.
In one test we lacked enough same-colour units, so two were one colour and two another; naturally, the differences were set across the colours.
I found that for most of our experienced listeners, colour preference overrode listening! Rerunning the same test with all colours the same a few weeks later showed preferences that, with good statistical confidence, were based on the other physical differences (not colour).
This was actually one of the tests that prompted me to research preference (in general) more deeply, to switch from alphabetic IDs to symbols, and to take many more precautions to really "blind" the panel to the differences, going as far as covering serial numbers, swapping symbols during the lunch break, etc.
Listeners are asked to give preference scores, not to determine same/different: simply rate the unit marked with a square for how much you like it, then the unit with a different symbol, and so on.
The statistical analysis looks at how likely the individual and overall preference ratings were to be random, and whether an individual's preference is likely to relate to the actual physical differences.
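One simple way to ask "could these ratings be random?" is a permutation test on the preference scores. This is my assumption of the kind of analysis meant, not the author's actual method; the scores and function name are illustrative:

```python
# Minimal permutation-test sketch: are preference scores for the modified
# units distinguishable from those for the unmodified units, or could the
# observed difference in mean score have arisen by chance?
import random
from statistics import mean

def permutation_p_value(scores_mod, scores_unmod, n_perm=10_000, seed=0):
    """Estimate a two-sided p-value for the observed mean-score difference."""
    rng = random.Random(seed)
    observed = abs(mean(scores_mod) - mean(scores_unmod))
    pooled = scores_mod + scores_unmod
    k = len(scores_mod)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel scores at random
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# e.g. 1-10 preference ratings pooled over listeners, per unit type;
# a small p-value suggests the preference tracks the physical difference.
p = permutation_p_value([8, 7, 9, 8, 7, 8], [5, 6, 5, 4, 6, 5])
```

The same machinery applied per listener would separate individuals whose preferences track the hardware from those rating at chance.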
Ultimately my objective with my tests was never to prove the existence or absence of a difference, but to determine what kind of "sound" was found appealing by listeners, in order to make products that appealed to a market segment we wanted to have sales in.
And yes, gender, age and cultural background actually have marked correlations as well, though my sample size for that was rather more limited.
As said, it's a subject that is well off topic for this thread and again, possibly for this venue. I greatly appreciate the work Sean Olive & Todd Welti are doing at Harman regarding frequency response preferences.
But sadly, the same approach has not been extended even to perceptual coding to improve it. About the only real example I am aware of is MP3, where "JJ" (who was influential on my approach) applied a similar system, which I guess is why MP3 has passed the test of time. It is not transparent, but where it fails it does so sounding "good", so much so that some listeners prefer 128k VBR MP3 to the CD source, as sounding better.
Anyway, I will leave it here.
Thor