@Blumlein 88 and @amirm may have been a little careless with some word choices, but the thrust of what they are saying is that "relative" choices are generally preserved between groups. "Relative" means the rank is preserved, but not the absolute value or absolute difference. The P - I distinction deserves special consideration.

One can see from the graph the significant result:

P,I >* B >** M

* for 15/16 individual groups, and the combined group (p<0.0001)

** for all individual groups, and the combined group (p<0.0001)

Even though individual scores and differences between scores vary, the rank is well preserved.
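For anyone who wants to run the same kind of check on their own numbers, here is a rough Python sketch of what "rank preserved, absolute values not" looks like. The group names and ratings are made up for illustration and are not the values from the paper; `coarse_order_holds` and `ranking` are hypothetical helper names. It only compares means and does no significance testing.

```python
# Hypothetical mean preference ratings per listener group.
# These numbers are invented for illustration, NOT taken from the paper.
group_means = {
    "group_a": {"P": 6.8, "I": 6.5, "B": 5.1, "M": 3.2},
    "group_b": {"P": 7.9, "I": 8.1, "B": 6.4, "M": 4.0},
    "group_c": {"P": 5.9, "I": 5.6, "B": 4.7, "M": 2.1},
}


def coarse_order_holds(means):
    """Return True if both P and I score above B, and B scores above M."""
    return min(means["P"], means["I"]) > means["B"] > means["M"]


def ranking(means):
    """Return the speaker labels sorted from highest to lowest mean rating."""
    return sorted(means, key=means.get, reverse=True)


# Does the coarse order P,I > B > M hold within each group?
preserved = {name: coarse_order_holds(m) for name, m in group_means.items()}

# Full ranking per group; P and I may swap while the coarse order still holds.
rankings = {name: ranking(m) for name, m in group_means.items()}

for name in group_means:
    print(name, rankings[name], "coarse order preserved:", preserved[name])
```

In this made-up example the absolute scores differ a lot between groups and P and I swap places in one group, yet the coarse order holds everywhere.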

The difference between P and I is less clear. Since the tabular data equivalent to the graph is not in the paper, one has to eyeball it. But it is clear that for 9/16 groups there **is not** a statistically significant difference. For 4/16 groups there **is** clearly a statistically significant difference, with 3 preferring P and 1 I. For 3/16 the significance is unclear from the graph (since I don't trust pixel measurements). When all the groups are combined the mean difference between P and I is 0.336 (from the paper), with a significance of p=0.0214. Some will argue that is significant, but from my experience with this type of data, since the p=0.05 cutoff is arbitrary, other ways of viewing the data are relevant, and I would prefer further tests to draw a conclusion about P vs. I.

Cheers, SAM

I agree with you.

My interpretation of the data presented here is as follows:

1. In absolute terms, people generally have different preferences in speakers.

2. The relative scoring of each speaker stays roughly the same no matter which group is selected.

3. All groups rated the speakers in the same order most of the time, but not always. <--- This means that people, generally speaking, have the same preference, but not always.

4. Trained listeners were pickier, so their ratings are generally lower than those of the other groups.