Master Preference Ratings for Loudspeakers

flipflop · May 30, 2020

You could put bluetooth speakers into a 'Wireless' category with the color purple.

MZKM · May 30, 2020

flipflop said:
You could put bluetooth speakers into a 'Wireless' category with the color purple.

That could be if Amir ever measured a portable Bluetooth speaker like those JBL Charge ones.

For things like the KEF LS50W, don‘t know if it should be like “Active (Digital)”, “All-in-One”, etc.

flipflop · Jun 1, 2020

Since you can now calculate the post-EQ scores, would you be interested in doing it for every single speaker if I provided you with the same type of EQ corrections that were posted in the Vantoo T0 thread?

MZKM · Jun 1, 2020

flipflop said:
Since you can now calculate the post-EQ scores, would you be interested in doing it for every single speaker if I provided you with the same type of EQ corrections that were posted in the Vantoo T0 thread?

Nah man, that’s too much work (all the data points are different and different points per octave and thus requires manually matching up the points which are close enough and deleting the excess).

edechamps · Jun 14, 2020

I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear. For reference, @bobbooo pointed out some time ago that:

bobbooo said:
The only way to properly estimate whether two particular speakers would be 'equally' preferred would be to see how many standard deviations (represented by the Greek letter σ, sigma) their scores are from each other. But you would still need to specify a confidence interval, so if the score difference is:

≤ 0.8 (1σ) => 68% confidence higher score preferred
≤ 1.6 (2σ) => 95% confidence higher score preferred
≤ 2 (2.5σ) => 98.8% confidence higher score preferred
≤ 2.4 (3σ) => 99.7% confidence higher score preferred
≤ 4 (5σ) => 99.9999% confidence higher score preferred

My goal here is to double-check these results.

I looked at the actual-versus-predicated ratings data that @Sancus digitized. I did a maximum likelihood estimation of the residuals against a normal distribution and the results indicate a good fit with a standard deviation of 0.889 - a bit higher than the 0.8 Olive claims in his paper. A straight STDEV() on @Sancus's data itself returns an even higher number: 0.895 (due to outliers I presume). Maybe these differences are down to random error when @Sancus digitized the figure, or maybe Olive used a different fitting process, I don't know. (Cynics will undoubtedly posit that Olive deliberately rounded down to make the number look better…) Still, 0.8 is close enough I guess, and that's what I'll use in the rest of this post.

Next, I'm trying to understand this "confidence" thing. What @bobbooo described is the classic 68–95–99.7 rule, but the thing is, I'm not convinced it's been applied correctly in this case. Indeed, the calculation describes the confidence that the actual score falls under 1σ (or 2σ, or 3σ, etc.) of the predicted score. But that's not what we care about, do we? When we compare, say, speaker A with a predicted score of 5.0 and speaker B with a score of 5.8, we are trying to estimate "how likely is it that the average listener will give B a higher score than A", not "how likely is it that the average listener will give speaker A a score between 4.2 and 5.8 (or, equivalently, speaker B a score between 5.0 and 6.6)". My understanding is that the 68% confidence rule can be used to compute the latter, but I'm really not sure about the former.

I think maybe the correct question to ask is "given two normal distributions with standard deviation σ=0.8, one with a mean of 5.0 and the other a mean of 5.8, what is the probability that a number picked from the latter distribution will be higher than a number picked from the former distribution".

Okay, let's formalize this. Let X be a random variable representing the actual rating of the lower predicted rating speaker. Let Y be a random variable representing the actual rating of the higher predicted rating speaker. We want to know the probability that Y>X, or, in other words, that Y-X (let's call that Z) is above zero. Fortunately the sum of two normally distributed random variables is a simple case, and we deduce that Z is also normal with a mean of mean(Y)-mean(X) and identical standard deviation (σ=0.8). In our example, the mean of Z is 5.8 - 5.0 = 0.8. According to an online calculator, P(Z>0) is… 84%.

Thus we conclude that, when a speaker has a predicted rating that is 0.8 higher than another speaker, the probability that a typical listener will prefer the former speaker is 84%, not 68% as @bobbooo calculated. Using the online calculator I just linked we can compute the probability for a bunch of other score differences:

Score equal (mean(Z)=0) → 50% chance to be preferred (which is obvious of course, but that serves as nice self-validation)
Score higher by 0.25 → 62% chance to be preferred
Score higher by 0.5 → 73% chance to be preferred
Score higher by 1.0 → 89% chance to be preferred
Score higher by 1.5 → 97% chance to be preferred
Score higher by 2.0 → 99% chance to be preferred
Score higher by 3.0 → 99.99% to be preferred

This leads me to conclude that, statistically, small score differences are more significant than we first assumed. (Though once we go above a ~1.5 score difference @bobbooo's original numbers seem to roughly converge with mine.)

Disclaimer: I am not a statistician (far from it), so it's quite possible my reasoning is wrong. Please do shout if you think something seems off in my reasoning.

richard12511 · Jun 14, 2020

edechamps said:
I think maybe the correct question to ask is "given two normal distributions with standard deviation σ=0.8, one with a mean of 5.0 and the other a mean of 5.8, what is the probability that a number picked from the latter distribution will be higher than a number picked from the former distribution".

This is the interpretation that makes the most sense in my mind.

richard12511 · Jun 14, 2020

edechamps said:
I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear. For reference, @bobbooo pointed out some time ago that:

My goal here is to double-check these results.

I looked at the actual-versus-predicated ratings data that @Sancus digitized. I did a maximum likelihood estimation of the residuals against a normal distribution and the results indicate a good fit with a standard deviation of 0.889 - a bit higher than the 0.8 Olive claims in his paper. A straight STDEV() on @Sancus's data itself returns an even higher number: 0.895 (due to outliers I presume). Maybe these differences are down to random error when @Sancus digitized the figure, or maybe Olive used a different fitting process, I don't know. (Cynics will undoubtedly posit that Olive deliberately rounded down to make the number look better…) Still, 0.8 is close enough I guess, and that's what I'll use in the rest of this post.

Next, I'm trying to understand this "confidence" thing. What @bobbooo described is the classic 68–95–99.7 rule, but the thing is, I'm not convinced it's been applied correctly in this case. Indeed, the calculation describes the confidence that the actual score falls under 1σ (or 2σ, or 3σ, etc.) of the predicted score. But that's not what we care about, do we? When we compare, say, speaker A with a predicted score of 5.0 and speaker B with a score of 5.8, we are trying to estimate "how likely is it that the average listener will give B a higher score than A", not "how likely is it that the average listener will give speaker A a score between 4.2 and 5.8 (or, equivalently, speaker B a score between 5.0 and 6.6)". My understanding is that the 68% confidence rule can be used to compute the latter, but I'm really not sure about the former.

I think maybe the correct question to ask is "given two normal distributions with standard deviation σ=0.8, one with a mean of 5.0 and the other a mean of 5.8, what is the probability that a number picked from the latter distribution will be higher than a number picked from the former distribution".

Okay, let's formalize this. Let X be a random variable representing the actual rating of the lower predicted rating speaker. Let Y be a random variable representing the actual rating of the higher predicted rating speaker. We want to know the probability that Y>X, or, in other words, that Y-X (let's call that Z) is above zero. Fortunately the sum of two normally distributed random variables is a simple case, and we deduce that Z is also normal with a mean of mean(Y)-mean(X) and identical standard deviation (σ=0.8). In our example, the mean of Z is 5.8 - 5.0 = 0.8. According to an online calculator, P(Z>0) is… 84%.

Thus we conclude that, when a speaker has a predicted rating that is 0.8 higher than another speaker, the probability that a typical listener will prefer the former speaker is 84%, not 68% as @bobbooo calculated. Using the online calculator I just linked we can compute the probability for a bunch of other score differences:

Score equal (mean(Z)=0) → 50% chance to be preferred (which is obvious of course, but that serves as nice self-validation)
Score higher by 0.25 → 62% chance to be preferred
Score higher by 0.5 → 73% chance to be preferred
Score higher by 1.0 → 89% chance to be preferred
Score higher by 1.5 → 97% chance to be preferred
Score higher by 2.0 → 99% chance to be preferred
Score higher by 3.0 → 99.99% to be preferred

This leads me to conclude that, statistically, small score differences are more significant than we first assumed. (Though once we go above a ~1.5 score difference @bobbooo's original numbers seem to roughly converge with mine.)

Disclaimer: I am not a statistician (far from it), so it's quite possible my reasoning is wrong. Please do shout if you think something seems off in my reasoning.

This is something I've been wanting to understand. Thank you for this very clear and easy to follow explanation.

Sancus · Jun 14, 2020

edechamps said:
I looked at the actual-versus-predicated ratings data that @Sancus digitized. I did a maximum likelihood estimation of the residuals against a normal distribution and the results indicate a good fit with a standard deviation of 0.889 - a bit higher than the 0.8 Olive claims in his paper. A straight STDEV() on @Sancus's data itself returns an even higher number: 0.895 (due to outliers I presume). Maybe these differences are down to random error when @Sancus digitized the figure, or maybe Olive used a different fitting process, I don't know. (Cynics will undoubtedly posit that Olive deliberately rounded down to make the number look better…) Still, 0.8 is close enough I guess, and that's what I'll use in the rest of this post.

Yeah I dunno what's up with that graph vs the description in the paper. The paper also says 70 points, and yet there's 75 on the graph. The points in the image ARE rather large AND overlap, so I tried to use the centers as much as possible, but that does introduce some error. Not enough to drive it up as much as you say, I don't think. If Olive was actually responding to queries about this paper I'd ask him about these discrepancies, for sure. Sadly...

amirm · Jun 14, 2020

napilopez said:
@amirm , in case you were curious.

I am most curious! I kept wanting to sign up for consumer reports online to get the listing but forget.

I purchased that ebay inifinity. It had a $60 shipping cost which hurt. But we need to know so it will be here around June 23rd.

napilopez · Jun 14, 2020

amirm said:
I am most curious! I kept wanting to sign up for consumer reports online to get the listing but forget.

I purchased that ebay inifinity. It had a $60 shipping cost which hurt. But we need to know so it will be here around June 23rd.

Looking forward to seeing that review! I've been wanting to buy an old infinity because all of the IL series seem to have such great spins.

napilopez · Jun 15, 2020

edechamps said:
I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear. For reference, @bobbooo pointed out some time ago that:

My goal here is to double-check these results.

I looked at the actual-versus-predicated ratings data that @Sancus digitized. I did a maximum likelihood estimation of the residuals against a normal distribution and the results indicate a good fit with a standard deviation of 0.889 - a bit higher than the 0.8 Olive claims in his paper. A straight STDEV() on @Sancus's data itself returns an even higher number: 0.895 (due to outliers I presume). Maybe these differences are down to random error when @Sancus digitized the figure, or maybe Olive used a different fitting process, I don't know. (Cynics will undoubtedly posit that Olive deliberately rounded down to make the number look better…) Still, 0.8 is close enough I guess, and that's what I'll use in the rest of this post.

Next, I'm trying to understand this "confidence" thing. What @bobbooo described is the classic 68–95–99.7 rule, but the thing is, I'm not convinced it's been applied correctly in this case. Indeed, the calculation describes the confidence that the actual score falls under 1σ (or 2σ, or 3σ, etc.) of the predicted score. But that's not what we care about, do we? When we compare, say, speaker A with a predicted score of 5.0 and speaker B with a score of 5.8, we are trying to estimate "how likely is it that the average listener will give B a higher score than A", not "how likely is it that the average listener will give speaker A a score between 4.2 and 5.8 (or, equivalently, speaker B a score between 5.0 and 6.6)". My understanding is that the 68% confidence rule can be used to compute the latter, but I'm really not sure about the former.

I think maybe the correct question to ask is "given two normal distributions with standard deviation σ=0.8, one with a mean of 5.0 and the other a mean of 5.8, what is the probability that a number picked from the latter distribution will be higher than a number picked from the former distribution".

Okay, let's formalize this. Let X be a random variable representing the actual rating of the lower predicted rating speaker. Let Y be a random variable representing the actual rating of the higher predicted rating speaker. We want to know the probability that Y>X, or, in other words, that Y-X (let's call that Z) is above zero. Fortunately the sum of two normally distributed random variables is a simple case, and we deduce that Z is also normal with a mean of mean(Y)-mean(X) and identical standard deviation (σ=0.8). In our example, the mean of Z is 5.8 - 5.0 = 0.8. According to an online calculator, P(Z>0) is… 84%.

Thus we conclude that, when a speaker has a predicted rating that is 0.8 higher than another speaker, the probability that a typical listener will prefer the former speaker is 84%, not 68% as @bobbooo calculated. Using the online calculator I just linked we can compute the probability for a bunch of other score differences:

Score equal (mean(Z)=0) → 50% chance to be preferred (which is obvious of course, but that serves as nice self-validation)
Score higher by 0.25 → 62% chance to be preferred
Score higher by 0.5 → 73% chance to be preferred
Score higher by 1.0 → 89% chance to be preferred
Score higher by 1.5 → 97% chance to be preferred
Score higher by 2.0 → 99% chance to be preferred
Score higher by 3.0 → 99.99% to be preferred

This leads me to conclude that, statistically, small score differences are more significant than we first assumed. (Though once we go above a ~1.5 score difference @bobbooo's original numbers seem to roughly converge with mine.)

Disclaimer: I am not a statistician (far from it), so it's quite possible my reasoning is wrong. Please do shout if you think something seems off in my reasoning.

Thanks for this @edechamps. I do not know nearly enough to make a contribution here, but your reasoning makes sense to me. Unfortunately, I wish it didn't because those preference estimations seem too strong for my liking.

Davedaring · Jun 17, 2020

Off topic- where do we ask to update the review index? Last speaker review was May 12....

MZKM · Jun 17, 2020

Davedaring said:
Off topic- where do we ask to update the review index? Last speaker review was May 12....

Ask Pozz.

RickSanchez · Jun 18, 2020

Davedaring said:
Off topic- where do we ask to update the review index? Last speaker review was May 12....

The Speaker Review index is now current up to the Focal Aria 906's. Sorry for the delay.

Davedaring · Jun 18, 2020

RickSanchez said:
The Speaker Review index is now current up to the Focal Aria 906's. Sorry for the delay.

Thank you!

edechamps · Jun 21, 2020

edechamps said:
I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear.

And here we go:

The box (±0.5) indicates the 50% prediction interval of the rating. In other words: there is a 50% chance that the average listener will give this speaker a rating that falls within the box.
The lines (whiskers) (±0.9) are similar to boxes but with a 75% interval.

A mathematically equivalent, and perhaps more useful, way of looking at the above plot is the following:

If boxes barely overlap (predicted score differs by 1.0), there is a 91% chance that the average listener will prefer the higher-rated speaker.
If lines (whiskers) barely overlap (predicted score differs by 1.8), the probability is 99%.

I think this chart makes it quite clear that many speakers find themselves in the same "category", meaning that the model can't really tell which one would be preferred.

From there we can use the same principles to generate a matrix chart to compare every possible pair of speakers. I might have gone a bit overboard on that one…

Both charts are live over at Loudspeaker Explorer, in the Preference Ratings section at the very bottom.

Robbo99999 · Jun 23, 2020

edechamps said:
And here we go:

View attachment 70050

The box (±0.5) indicates the 50% prediction interval of the rating. In other words: there is a 50% chance that the average listener will give this speaker a rating that falls within the box.

The lines (whiskers) (±0.9) are similar to boxes but with a 75% interval.

A mathematically equivalent, and perhaps more useful, way of looking at the above plot is the following:

If boxes barely overlap (predicted score differs by 1.0), there is a 91% chance that the average listener will prefer the higher-rated speaker.

If lines (whiskers) barely overlap (predicted score differs by 1.8), the probability is 99%.

I think this chart makes it quite clear that many speakers find themselves in the same "category", meaning that the model can't really tell which one would be preferred.

From there we can use the same principles to generate a matrix chart to compare every possible pair of speakers. I might have gone a bit overboard on that one…

View attachment 70051

Both charts are live over at Loudspeaker Explorer, in the Preference Ratings section at the very bottom.

I think that matrix chart is the most elucidating, it's a little intimidating on first glance but I realised after about 1 min of looking at it how it worked & what it was showing, and I've not really been following this thread either. That's quite a powerful chart.

MediumRare · Jun 24, 2020

napilopez said:
I'm missing a lot of data for the 3020i as it was measured when I performed a lot fewer measurements overall(not even complete front hemisphere vertical data), luckily the front hemisphere has the biggest impact on the shape of the curves, so using what I have and faking the rest, I get a: 3.9/6.8/7.

Edit: I did the concept 20 as I was curious because it basically looks like a better 3020i, and indeed it got a 4/7/7.2

I just sent a pair of 3020i to Amir for testing.

napilopez · Jun 24, 2020

MediumRare said:
I just sent a pair of 3020i to Amir for testing.

Nice! Looking forward to it.

richard12511 · Jun 24, 2020

edechamps said:
And here we go:

View attachment 70050

The box (±0.5) indicates the 50% prediction interval of the rating. In other words: there is a 50% chance that the average listener will give this speaker a rating that falls within the box.

The lines (whiskers) (±0.9) are similar to boxes but with a 75% interval.

A mathematically equivalent, and perhaps more useful, way of looking at the above plot is the following:

If boxes barely overlap (predicted score differs by 1.0), there is a 91% chance that the average listener will prefer the higher-rated speaker.

If lines (whiskers) barely overlap (predicted score differs by 1.8), the probability is 99%.

I think this chart makes it quite clear that many speakers find themselves in the same "category", meaning that the model can't really tell which one would be preferred.

From there we can use the same principles to generate a matrix chart to compare every possible pair of speakers. I might have gone a bit overboard on that one…

View attachment 70051

Both charts are live over at Loudspeaker Explorer, in the Preference Ratings section at the very bottom.

That's awesome. Definitely not overboard.

Master Preference Ratings for Loudspeakers

Addicted to Fun and Learning

Major Contributor

Addicted to Fun and Learning

Major Contributor

Addicted to Fun and Learning

Major Contributor

Major Contributor

Major Contributor

Founder/Admin

Major Contributor

Major Contributor

Member

Major Contributor

Major Contributor

Member

Addicted to Fun and Learning

Master Contributor

Major Contributor

Major Contributor

Major Contributor

Similar threads