Oh okay. I thought there was some CSV file with the data for the plots somewhere that I am missing.
You mean a sample of 40 listeners is woefully small?
Is there any statistics-based reason to think the sample size is too low, or is it your personal conviction that it is too small?
I am not an expert in statistics, but when samples are not interrelated, 20-30 samples should give you enough data to calculate the mean, the standard deviation, and control limits. In any case, I think two PhDs would have the necessary statistics knowledge between them to ensure they have enough data points before they publish, don't you think?
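To make that concrete, here is a rough Python sketch of the calculation I have in mind. The ratings are made up purely for illustration and are not data from the paper. The name `summarize` is mine, and treating control limits as mean ± 3 standard deviations is an assumption based on the usual convention, not something the authors state.

```python
import math
import statistics


def summarize(ratings):
    """Return basic summary statistics for a list of independent ratings."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean shrinks as n grows
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "sem": sem,
        # Assumed convention: control limits at mean +/- 3 standard deviations.
        "lcl": mean - 3 * sd,
        "ucl": mean + 3 * sd,
    }


# Hypothetical ratings from 40 listeners (invented numbers, illustration only).
ratings = [5, 6, 7, 6, 5, 7, 6, 6, 5, 7] * 4
result = summarize(ratings)
print(result)
```

The point is only that the standard error falls with the square root of the sample size, so whether 40 listeners is "enough" depends on how much the ratings scatter, not on the number 40 by itself.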
Or maybe you think 4 speakers are too few? What would change if there were 20?
As I understand it, the conclusion is that nobody is immune to the biases created by sighted listening, regardless of their "expertise" and training. Whether you think you are immune to biases is also irrelevant. Maybe it was an extreme conclusion in 1994, but today, 20 years later, I'd say it is common sense.
The way I understand the graph is this: when people can see the speakers they are listening to, they rate them a full rating point higher or lower than they would if they could not see them. The order indeed does not change much, except that S is the worst-rated speaker sighted and T is the worst unsighted, yet the ratings themselves change significantly. People rated D 2 rating points better than S when they could see the speakers, but rated the two more or less the same when they could not.
View attachment 327752