
Master Preference Ratings for Loudspeakers

OP

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
4,244
Likes
11,487
Location
Land O’ Lakes, FL
You could put Bluetooth speakers into a 'Wireless' category with the color purple.
That could happen if Amir ever measures a portable Bluetooth speaker, like those JBL Charge ones.

For things like the KEF LS50W, I don't know if it should be something like “Active (Digital)”, “All-in-One”, etc.
 

flipflop

Addicted to Fun and Learning
Joined
Feb 22, 2018
Messages
927
Likes
1,240
Since you can now calculate the post-EQ scores, would you be interested in doing it for every single speaker if I provided you with the same type of EQ corrections that were posted in the Vantoo T0 thread?
 
OP

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
4,244
Likes
11,487
Location
Land O’ Lakes, FL
Since you can now calculate the post-EQ scores, would you be interested in doing it for every single speaker if I provided you with the same type of EQ corrections that were posted in the Vantoo T0 thread?
Nah man, that's too much work (the data sets all have different frequency points, and different numbers of points per octave, so it requires manually matching up the points that are close enough and deleting the excess).
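
(For what it's worth, that point-matching could in principle be automated by resampling one curve onto the other's frequency grid; a rough Python sketch, with made-up file names and column layout:)

```python
import numpy as np

# Hypothetical inputs: two-column CSVs of (frequency in Hz, level in dB)
meas = np.loadtxt("speaker_on_axis.csv", delimiter=",", skiprows=1)
eq = np.loadtxt("eq_correction.csv", delimiter=",", skiprows=1)

# Resample the EQ curve onto the measurement's frequency grid; interpolating
# in log-frequency suits data whose points are spaced per octave.
eq_on_grid = np.interp(np.log10(meas[:, 0]), np.log10(eq[:, 0]), eq[:, 1])

post_eq = meas[:, 1] + eq_on_grid  # post-EQ response on the measurement's own points
```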
 

edechamps

Addicted to Fun and Learning
Forum Donor
Joined
Nov 21, 2018
Messages
910
Likes
3,620
Location
London, United Kingdom
I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear. For reference, @bobbooo pointed out some time ago that:

The only way to properly estimate whether two particular speakers would be 'equally' preferred would be to see how many standard deviations (represented by the Greek letter σ, sigma) their scores are from each other. But you would still need to specify a confidence interval, so if the score difference is:

≤ 0.8 (1σ) => 68% confidence higher score preferred
≤ 1.6 (2σ) => 95% confidence higher score preferred
≤ 2 (2.5σ) => 98.8% confidence higher score preferred
≤ 2.4 (3σ) => 99.7% confidence higher score preferred
≤ 4 (5σ) => 99.9999% confidence higher score preferred

My goal here is to double-check these results.

I looked at the actual-versus-predicted ratings data that @Sancus digitized. I did a maximum likelihood estimation of the residuals against a normal distribution, and the results indicate a good fit with a standard deviation of 0.889, a bit higher than the 0.8 Olive claims in his paper. A straight STDEV() on @Sancus's data returns an even higher number: 0.895 (due to outliers, I presume). Maybe these differences are down to random error when @Sancus digitized the figure, or maybe Olive used a different fitting process; I don't know. (Cynics will undoubtedly posit that Olive deliberately rounded down to make the number look better…) Still, 0.8 is close enough, I guess, and that's what I'll use in the rest of this post.
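
For the curious, the fit is straightforward to reproduce in Python (the file name and column layout here are made up):

```python
import numpy as np
from scipy import stats

# Hypothetical layout: CSV of (predicted, actual) ratings digitized from Olive's figure
data = np.loadtxt("olive_ratings_digitized.csv", delimiter=",", skiprows=1)
residuals = data[:, 1] - data[:, 0]        # actual minus predicted

mu, sigma_mle = stats.norm.fit(residuals)  # maximum-likelihood normal fit
sigma_sample = residuals.std(ddof=1)       # what a spreadsheet STDEV() computes

# Note: the MLE scale divides by n, STDEV() by n-1, so with ~75 points
# 0.895 * sqrt(74/75) ≈ 0.889; that alone could explain the gap, no outliers needed.
print(f"MLE σ = {sigma_mle:.3f}, sample σ = {sigma_sample:.3f}")
```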

Next, I'm trying to understand this "confidence" thing. What @bobbooo described is the classic 68–95–99.7 rule, but the thing is, I'm not convinced it's been applied correctly in this case. Indeed, the calculation describes the confidence that the actual score falls within 1σ (or 2σ, or 3σ, etc.) of the predicted score. But that's not what we care about, is it? When we compare, say, speaker A with a predicted score of 5.0 and speaker B with a predicted score of 5.8, we are trying to estimate "how likely is it that the average listener will give B a higher score than A", not "how likely is it that the average listener will give speaker A a score between 4.2 and 5.8 (or, equivalently, speaker B a score between 5.0 and 6.6)". My understanding is that the 68% confidence rule can be used to compute the latter, but I'm really not sure about the former.

I think maybe the correct question to ask is "given two normal distributions with standard deviation σ=0.8, one with a mean of 5.0 and the other a mean of 5.8, what is the probability that a number picked from the latter distribution will be higher than a number picked from the former distribution".

Okay, let's formalize this. Let X be a random variable representing the actual rating of the speaker with the lower predicted rating. Let Y be a random variable representing the actual rating of the speaker with the higher predicted rating. We want to know the probability that Y>X, or, in other words, that Y−X (let's call that Z) is above zero. Fortunately, the difference of two normally distributed random variables is a simple case, and we deduce that Z is also normal, with a mean of mean(Y)−mean(X) and the same standard deviation (σ=0.8). In our example, the mean of Z is 5.8 − 5.0 = 0.8. According to an online calculator, P(Z>0) is… 84%.
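
If you'd rather not trust an online calculator, the same number falls out of a couple of lines of Python. (Note that keeping σ = 0.8 for Z is this post's assumption; if the two speakers' prediction errors were independent, Z would instead have σ·√2 ≈ 1.13.)

```python
from scipy.stats import norm

sigma = 0.8         # kept at 0.8 for Z, per the assumption above
mean_z = 5.8 - 5.0  # mean of Z = Y − X

p = norm.sf(0, loc=mean_z, scale=sigma)  # P(Z > 0)
print(f"P(Z > 0) = {p:.0%}")             # → 84%
```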

Thus we conclude that, when a speaker has a predicted rating that is 0.8 higher than another speaker's, the probability that a typical listener will prefer the former speaker is 84%, not 68% as @bobbooo calculated. Using the online calculator I just linked, we can compute the probability for a bunch of other score differences:

Score equal (mean(Z)=0) → 50% chance to be preferred (which is obvious of course, but that serves as nice self-validation)
Score higher by 0.25 → 62% chance to be preferred
Score higher by 0.5 → 73% chance to be preferred
Score higher by 1.0 → 89% chance to be preferred
Score higher by 1.5 → 97% chance to be preferred
Score higher by 2.0 → 99% chance to be preferred
Score higher by 3.0 → 99.99% chance to be preferred
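
The whole list above is one short loop away (same σ = 0.8 assumption for the difference):

```python
from scipy.stats import norm

sigma = 0.8
for diff in (0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 3.0):
    p = norm.sf(0, loc=diff, scale=sigma)  # chance the higher-rated speaker wins
    print(f"Score higher by {diff:.2f} → {p:.2%} chance to be preferred")
```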

This leads me to conclude that, statistically, small score differences are more significant than we first assumed. (Though once we go above a ~1.5 score difference @bobbooo's original numbers seem to roughly converge with mine.)

Disclaimer: I am not a statistician (far from it), so it's quite possible my reasoning is wrong. Please do shout if anything seems off.
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,335
Likes
6,700
I think maybe the correct question to ask is "given two normal distributions with standard deviation σ=0.8, one with a mean of 5.0 and the other a mean of 5.8, what is the probability that a number picked from the latter distribution will be higher than a number picked from the former distribution".

This is the interpretation that makes the most sense in my mind.
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,335
Likes
6,700
I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear. […]

This is something I've been wanting to understand. Thank you for this very clear and easy-to-follow explanation.
 

Sancus

Major Contributor
Forum Donor
Joined
Nov 30, 2018
Messages
2,923
Likes
7,616
Location
Canada
I looked at the actual-versus-predicted ratings data that @Sancus digitized. I did a maximum likelihood estimation of the residuals against a normal distribution, and the results indicate a good fit with a standard deviation of 0.889, a bit higher than the 0.8 Olive claims in his paper. […]

Yeah, I dunno what's up with that graph versus the description in the paper. The paper also says 70 points, and yet there are 75 on the graph. The points in the image ARE rather large AND overlap, so I tried to use the centers as much as possible, but that does introduce some error. Not enough to drive it up as much as you say, I don't think. If Olive were actually responding to queries about this paper, I'd ask him about these discrepancies, for sure. Sadly...
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,376
Likes
234,555
Location
Seattle Area
@amirm, in case you were curious.
I am most curious! I kept wanting to sign up for Consumer Reports online to get the listing, but I keep forgetting.

I purchased that eBay Infinity. It had a $60 shipping cost, which hurt. But we need to know, so it will be here around June 23rd.
 

napilopez

Major Contributor
Forum Donor
Joined
Oct 17, 2018
Messages
2,111
Likes
8,448
Location
NYC
I am most curious! I kept wanting to sign up for Consumer Reports online to get the listing, but I keep forgetting.

I purchased that eBay Infinity. It had a $60 shipping cost, which hurt. But we need to know, so it will be here around June 23rd.

Looking forward to seeing that review! I've been wanting to buy an old Infinity because all of the IL series seem to have such great spins.
 

napilopez

Major Contributor
Forum Donor
Joined
Oct 17, 2018
Messages
2,111
Likes
8,448
Location
NYC
I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear. […]

Thanks for this @edechamps. I do not know nearly enough to make a contribution here, but your reasoning makes sense to me. Honestly, I wish it didn't, because those preference estimates seem too strong for my liking.
 

Davedaring

Member
Joined
Apr 4, 2020
Messages
17
Likes
5
Off topic: where do we ask to have the review index updated? The last speaker review was May 12…
 

edechamps

Addicted to Fun and Learning
Forum Donor
Joined
Nov 21, 2018
Messages
910
Likes
3,620
Location
London, United Kingdom
I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear.

And here we go:

[Image: ratings.png, the Preference Ratings box plot with prediction intervals]


  • The box (±0.5) indicates the 50% prediction interval of the rating. In other words: there is a 50% chance that the average listener will give this speaker a rating that falls within the box.
  • The lines (whiskers) (±0.9) are similar to boxes but with a 75% interval.
A mathematically equivalent, and perhaps more useful, way of looking at the above plot is the following:
  • If boxes barely overlap (predicted score differs by 1.0), there is a 91% chance that the average listener will prefer the higher-rated speaker.
  • If lines (whiskers) barely overlap (predicted score differs by 1.8), the probability is 99%.
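
(Both half-widths follow directly from σ = 0.8; a quick sanity check in Python:)

```python
from scipy.stats import norm

sigma = 0.8
box = norm.ppf(0.75, scale=sigma)       # 50% interval half-width ≈ 0.54, drawn as ±0.5
whisker = norm.ppf(0.875, scale=sigma)  # 75% interval half-width ≈ 0.92, drawn as ±0.9
print(f"box: ±{box:.2f}, whiskers: ±{whisker:.2f}")
```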

I think this chart makes it quite clear that many speakers find themselves in the same "category", meaning that the model can't really tell which one would be preferred.

From there we can use the same principles to generate a matrix chart to compare every possible pair of speakers. I might have gone a bit overboard on that one…
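
(The matrix is just the pairwise version of the earlier computation; a minimal sketch with made-up speaker scores and the same σ assumption:)

```python
import numpy as np
from scipy.stats import norm

sigma = 0.8
scores = np.array([5.0, 5.8, 6.3])  # hypothetical predicted ratings

# matrix[i, j] = probability the average listener prefers speaker i over speaker j,
# again treating the rating difference as Normal(scores[i] - scores[j], sigma)
diff = scores[:, None] - scores[None, :]
matrix = norm.sf(0, loc=diff, scale=sigma)
print(matrix.round(2))
```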

[Image: matrix.png, the pairwise speaker comparison matrix]


Both charts are live over at Loudspeaker Explorer, in the Preference Ratings section at the very bottom.
 

Robbo99999

Master Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
6,878
Likes
6,674
Location
UK
And here we go: […] Both charts are live over at Loudspeaker Explorer, in the Preference Ratings section at the very bottom.
I think that matrix chart is the most elucidating. It's a little intimidating at first glance, but after about a minute of looking at it I realised how it worked and what it was showing, and I've not really been following this thread either. That's quite a powerful chart.
 

MediumRare

Major Contributor
Forum Donor
Joined
Sep 17, 2019
Messages
1,949
Likes
2,275
Location
Chicago
I'm missing a lot of data for the 3020i, as it was measured when I performed a lot fewer measurements overall (not even complete front-hemisphere vertical data). Luckily, the front hemisphere has the biggest impact on the shape of the curves, so using what I have and faking the rest, I get 3.9/6.8/7.

Edit: I did the Concept 20, as I was curious because it basically looks like a better 3020i, and indeed it got 4/7/7.2.
I just sent a pair of 3020i to Amir for testing. :p
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,335
Likes
6,700
And here we go: […] Both charts are live over at Loudspeaker Explorer, in the Preference Ratings section at the very bottom.

That's awesome. Definitely not overboard.
 