Sure, you could probably add this test first, to disqualify those who can't differentiate the speakers; no problem with that, as long as it isn't used to conclude that the speakers are indistinguishable, which is the shortcut some would be tempted to take, since it's a flawed test anyway and doesn't give us data that concludes anything. It would only show that bias exists, and I already know that. It wouldn't be a "simpler" test at all, because at the very least the speakers would need to be placed in the same position if the goal is to have listeners mistake one for the other. And if I'm correct, in ABX the speed at which you switch matters, no? If you have to redo the setup each time, you don't remember how the other one sounded. So I'm not even sure bias could be concluded: you can't say listeners can't differentiate two speakers if they last heard the other one two minutes ago. So it's a more complicated test that doesn't give us any additional conclusion.
Yes, I completely understand the fun of this test. Should it make the front page? If you read your comment and really understand what "bias" is, you will see why, without a control, there is a lot of opportunity to create an exercise in expectation and confirmation bias. Entire multimillion-dollar tests are often called into question due to this sort of leaning toward a result.
The fact that the speakers correlate with the Harman score may mean nothing without an independent variable/control and some proof that the listeners can actually pass a placebo test. Interestingly, in some ways the speakers do not correlate with the Harman score, depending on how you look at it (such as the OSD speaker).
Anyway, I'm not trying to poo-poo fun-making. I do fun testing all the time; however, all my comparison testing is rooted in subjective decisions due to my testing limitations, and I am suggesting here that this test is more subjective than objective. The two are ends of the same stick, and even the best tests must have subjective aspects in order to exist, so having subjectivity influence the test is normal. I do think this test carries quite a bit more subjective weight than is ideal, and much of that could be addressed. "Tricking" the listener is one of the most common methods: literally every single pharmaceutical trial uses a placebo. Maybe the word "trick" triggers you, but that trick is really important. You could also use a reference instead of, or in addition to, the test speakers, such as a fulcrum speaker (which is why ABX is a good model, though it gets harder with more than two speakers; think three-body problem).
Close, but not really. That would mean there is no significant data to confirm the superiority of one speaker over another; it says nothing about whether listeners can hear a difference. Only ABX can do that.

> Statistical analysis of this experiment in one easily digestible graph:
> You can compare two speakers by checking for overlap between the red arrows.
> Technical details: the dots show the estimated marginal means obtained from the model I detailed earlier in this thread, score by speaker, adjusted for song (fixed effect) and listener (random effect). Arrows show Bonferroni-adjusted intervals.
> - If the arrows overlap: no significant difference between the two speakers.
> - If they don't overlap: statistically significant difference between the two speakers.
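For anyone curious, the overlap rule described in the quoted post can be sketched numerically. This is a minimal illustration with made-up estimated marginal means and standard errors (not the thread's actual results), assuming the intervals are built from a normal critical value at a Bonferroni-adjusted alpha:

```python
from statistics import NormalDist
from itertools import combinations

# Hypothetical estimated marginal means and standard errors per speaker
# (illustrative numbers only, NOT the experiment's actual estimates).
emm = {"KEF": 6.1, "Revel": 6.4, "JBL": 5.8, "OSD": 5.0}
se = {"KEF": 0.2, "Revel": 0.2, "JBL": 0.2, "OSD": 0.2}

pairs = list(combinations(emm, 2))          # 6 pairwise comparisons
alpha = 0.05 / len(pairs)                   # Bonferroni-adjusted alpha
z = NormalDist().inv_cdf(1 - alpha / 2)     # wider critical value (~2.64)

intervals = {s: (emm[s] - z * se[s], emm[s] + z * se[s]) for s in emm}

def overlap(a: str, b: str) -> bool:
    """True if the adjusted intervals of speakers a and b overlap."""
    return intervals[a][0] <= intervals[b][1] and intervals[b][0] <= intervals[a][1]

for a, b in pairs:
    verdict = "no significant difference" if overlap(a, b) else "significant"
    print(f"{a} vs {b}: {verdict}")
```

With these invented numbers, only OSD vs KEF and OSD vs Revel come out significant; every other pair of intervals overlaps.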
Awesome! I would just suggest rephrasing it as "no significant difference between the ratings given by the listeners", since, if I understand correctly, the listeners were asked to rate the sound, not to say whether they could hear a difference.
It actually does show that significantly different scores were assigned to certain speakers, after controlling for song and listener. I did not test for superiority, nor for preference.
Good point; I've clarified the wording.
I know the first implies the second, but the second does not need to imply the first (two speakers can sound very different, yet a listener may like both equally and give them the same score).
It's just a minimal semantic detail, but the difference is significant.
Not sure what I'd call that, other than significantly "different scores". But subjective superiority, or preference, is exactly what is being scored!
As an example, take the estimated marginal mean score for the OSD and compare it with the estimated marginal mean score for the Revel. Conditional on this model, an "average" listener (i.e. someone similar to the listeners in this panel) would assign the Revel a score about 1.4 points higher than the score they assign the OSD, if the same song were being played. That difference is statistically significant, even after correcting for multiple testing.
I did not suggest otherwise; I was debating the term "difference in speakers", which was ambiguous.
I'll let the statisticians speak too, and they can correct me if I'm wrong (I'm an engineer who has only done a bit of this), but I can already tell you that you can't get those numbers: the sample is not large enough for them to be significant.

> Looks like we have some skilled statisticians here. Would be nice to have some intuitive measures, based on these results, given the same setup, given that we have heard none of the speakers before, etc.
> For example, for each of the speakers, what's the chance that I would rate that speaker as the best of the four? In other words, what % of people would choose the KEF, Revel, JBL, or OSD respectively, based on their own listening alone?
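For what it's worth, that "chance of being rated best" question can be approximated by simulation, but only if you are willing to assume a model. A rough Monte Carlo sketch, with invented means and an assumed noise level (nothing here comes from the actual experiment):

```python
import random

random.seed(1)

# Hypothetical estimated marginal means per speaker (NOT the thread's numbers).
emm = {"KEF": 6.1, "Revel": 6.4, "JBL": 5.8, "OSD": 5.0}
resid_sd = 1.0  # assumed within-listener score noise

wins = {s: 0 for s in emm}
n_sim = 20000
for _ in range(n_sim):
    # Draw one noisy score per speaker for a simulated listener.
    # A shared per-listener random effect would cancel out of the
    # ranking, so it is omitted here.
    scores = {s: mu + random.gauss(0, resid_sd) for s, mu in emm.items()}
    wins[max(scores, key=scores.get)] += 1

share = {s: wins[s] / n_sim for s in emm}
print(share)
```

Under these made-up assumptions the Revel would most often come out "best" and the OSD rarely would, but the shares depend heavily on the assumed noise, which is exactly why such numbers shouldn't be read off a small panel.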
Not my world.

> You forget the overriding "subjectivity test": at the end of the day, after weeks of research and listening tests, we end up with a speaker shortlist we are so proud of, only to hear our significant other say "they're all ugly, can you find something prettier?" Best to start with a list of pretty speakers, let the real decision-maker select the top 3, and then apply our research and listening skills to those.
ABX is a test where X is either A or B, and you check whether listeners can accurately identify which one X matches. Speed is a factor, but it's really not an issue if switching is slow; in fact, you can give the listener control over both the switching speed and the volume during the test. ABX is not AB.
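ABX results are typically scored against the guessing baseline with a simple binomial test: if the listener truly can't tell A from B, each trial is a 50/50 guess. A minimal sketch (the 12-of-16 figures are just example numbers):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided p-value: probability of getting at least `correct`
    hits out of `trials` if the listener were purely guessing (p=0.5)
    which of A/B matches X."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Example: 12 correct identifications out of 16 trials.
p = abx_p_value(12, 16)
print(f"p = {p:.4f}")  # ≈ 0.038, so guessing is an unlikely explanation
```

At 12/16 the p-value falls just under the usual 0.05 threshold, while 8/16 (exactly chance) is entirely consistent with guessing.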
Did the listeners know what the selection of speakers under test was beforehand?