That could be, if Amir ever measured a portable Bluetooth speaker like those JBL Charge ones.
You could put Bluetooth speakers into a 'Wireless' category with the color purple.
Nah man, that’s too much work (the data points are all different, with different numbers of points per octave, so it requires manually matching up the points that are close enough and deleting the excess).
Since you can now calculate the post-EQ scores, would you be interested in doing it for every single speaker if I provided you with the same type of EQ corrections that were posted in the Vantoo T0 thread?
The only way to properly estimate whether two particular speakers would be 'equally' preferred would be to see how many standard deviations (represented by the Greek letter σ, sigma) their scores are from each other. But you would still need to specify a confidence interval, so if the score difference is:
≤ 0.8 (1σ) => 68% confidence higher score preferred
≤ 1.6 (2σ) => 95% confidence higher score preferred
≤ 2 (2.5σ) => 98.8% confidence higher score preferred
≤ 2.4 (3σ) => 99.7% confidence higher score preferred
≤ 4 (5σ) => 99.9999% confidence higher score preferred
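These confidence figures are just the standard normal coverage probabilities. As a quick sketch, they can be reproduced with Python's stdlib `statistics.NormalDist`:

```python
from statistics import NormalDist

# Coverage probability of a +/- k*sigma band around the mean:
# P(|X - mu| <= k*sigma) = Phi(k) - Phi(-k), independent of mu and sigma.
std_normal = NormalDist()  # standard normal: mu = 0, sigma = 1

for k in (1, 2, 2.5, 3, 5):
    coverage = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"{k} sigma -> {coverage:.4%}")
```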
I am currently in the process of investigating the notion of "confidence intervals", with the vague goal of plotting Preference Ratings as a box plot to make the uncertainty clear. For reference, @bobbooo pointed out some time ago that:
My goal here is to double-check these results.
I looked at the actual-versus-predicted ratings data that @Sancus digitized. I did a maximum likelihood estimation of the residuals against a normal distribution, and the results indicate a good fit with a standard deviation of 0.889 - a bit higher than the 0.8 Olive claims in his paper. A straight STDEV() on @Sancus's data returns an even higher number: 0.895 (due to outliers, I presume). Maybe these differences are down to random error when @Sancus digitized the figure, or maybe Olive used a different fitting process; I don't know. (Cynics will undoubtedly posit that Olive deliberately rounded down to make the number look better…) Still, 0.8 is close enough I guess, and that's what I'll use in the rest of this post.
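As an aside, part of the 0.889-vs-0.895 gap may simply come from the estimator rather than from outliers: the maximum likelihood estimate of a normal's standard deviation divides by n, while STDEV() divides by n-1, so the MLE is always slightly smaller on the same data. A minimal sketch (the residuals here are made up for illustration, not @Sancus's actual data):

```python
import math
import statistics

# Hypothetical residuals (actual minus predicted ratings); the real
# analysis would use the data digitized from Olive's figure instead.
residuals = [0.3, -0.9, 1.2, -0.4, 0.7, -1.1, 0.2, 0.5, -0.6, 0.1]

n = len(residuals)
mean = statistics.fmean(residuals)

# MLE of sigma for a normal distribution: divide by n.
sigma_mle = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)

# Spreadsheet-style STDEV(): sample standard deviation, divide by n - 1.
sigma_sample = statistics.stdev(residuals)

print(sigma_mle, sigma_sample)  # sigma_mle is always the smaller of the two
```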
Next, I'm trying to understand this "confidence" thing. What @bobbooo described is the classic 68–95–99.7 rule, but I'm not convinced it's been applied correctly here. The calculation describes the confidence that the actual score falls within 1σ (or 2σ, or 3σ, etc.) of the predicted score. But that's not what we care about, is it? When we compare, say, speaker A with a predicted score of 5.0 and speaker B with a predicted score of 5.8, we are trying to estimate "how likely is it that the average listener will give B a higher score than A", not "how likely is it that the average listener will give speaker A a score between 4.2 and 5.8 (or, equivalently, speaker B a score between 5.0 and 6.6)". My understanding is that the 68% rule can be used to compute the latter, but I'm really not sure about the former.
I think maybe the correct question to ask is "given two normal distributions with standard deviation σ=0.8, one with a mean of 5.0 and the other a mean of 5.8, what is the probability that a number picked from the latter distribution will be higher than a number picked from the former distribution".
Okay, let's formalize this. Let X be a random variable representing the actual rating of the speaker with the lower predicted rating, and Y the actual rating of the speaker with the higher predicted rating. We want to know the probability that Y > X, or, in other words, that Y - X (let's call that Z) is above zero. Fortunately the difference of two independent normally distributed random variables is a simple case: Z is also normal, with mean(Z) = mean(Y) - mean(X), and its variance is the sum of the two variances, so σ(Z) = √(0.8² + 0.8²) = 0.8·√2 ≈ 1.13. (Note that the standard deviation of the difference is not 0.8: the two errors add in quadrature.) In our example, mean(Z) = 5.8 - 5.0 = 0.8, and P(Z > 0) = Φ(0.8 / 1.13) ≈ 76%.
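This is easy to check without an online calculator, using Python's stdlib `statistics.NormalDist` (note the standard deviation of the difference of two independent normals is σ·√2 ≈ 1.13, not 0.8):

```python
from statistics import NormalDist

sigma = 0.8                 # model error per speaker (from Olive's paper)
mean_z = 5.8 - 5.0          # difference in predicted ratings
sigma_z = sigma * 2 ** 0.5  # std of the difference of two independent normals

z = NormalDist(mu=mean_z, sigma=sigma_z)
p_prefer = 1 - z.cdf(0)     # P(Z > 0)
print(f"{p_prefer:.1%}")    # -> 76.0%
```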
Thus we conclude that, when a speaker has a predicted rating that is 0.8 higher than another speaker's, the probability that a typical listener will prefer the former speaker is about 76%, not 68% as @bobbooo calculated. Using the same approach we can compute the probability for a bunch of other score differences:
Score equal (mean(Z)=0) → 50% chance to be preferred (which is obvious of course, but it serves as a nice sanity check)
Score higher by 0.25 → 59% chance to be preferred
Score higher by 0.5 → 67% chance to be preferred
Score higher by 1.0 → 81% chance to be preferred
Score higher by 1.5 → 91% chance to be preferred
Score higher by 2.0 → 96% chance to be preferred
Score higher by 3.0 → 99.6% chance to be preferred
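The whole table can be generated in one go under the same assumptions (independent normal errors with σ=0.8, so the difference has standard deviation 0.8·√2):

```python
from statistics import NormalDist

sigma_z = 0.8 * 2 ** 0.5  # std of the rating difference between two speakers

for diff in (0, 0.25, 0.5, 1.0, 1.5, 2.0, 3.0):
    p = 1 - NormalDist(mu=diff, sigma=sigma_z).cdf(0)
    print(f"score higher by {diff} -> {p:.1%} chance to be preferred")
```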
This leads me to conclude that, statistically, small score differences are somewhat more significant than @bobbooo's numbers suggested: 76% rather than 68% at a 0.8 difference. (Though once we go above a ~1.5 score difference, @bobbooo's original numbers and mine roughly converge.)
Disclaimer: I am not a statistician (far from it), so it's quite possible my reasoning is wrong. Please do shout if something seems off.
I am most curious! I kept wanting to sign up for Consumer Reports online to get the listing but forgot. @amirm, in case you were curious.
I purchased that eBay Infinity. It had a $60 shipping cost, which hurt. But we need to know, so it will be here around June 23rd.
Off topic: where do we ask to get the review index updated? The last speaker review listed is from May 12...
Thank you! The Speaker Review index is now current up to the Focal Aria 906. Sorry for the delay.
I think that matrix chart is the most elucidating; it's a little intimidating at first glance, but after about a minute of looking at it I realised how it worked and what it was showing, and I've not really been following this thread either. That's quite a powerful chart.
And here we go:
View attachment 70050
A mathematically equivalent, and perhaps more useful, way of looking at the above plot is the following:
- The box (±0.5) indicates the 50% prediction interval of the rating. In other words: there is a 50% chance that the average listener will give this speaker a rating that falls within the box.
- The lines (whiskers) (±0.9) are similar to boxes but with a 75% interval.
- If boxes barely overlap (predicted scores differ by ~1.1), there is an 83% chance that the average listener will prefer the higher-rated speaker.
- If lines (whiskers) barely overlap (predicted scores differ by ~1.8), the probability is about 95%.
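The ±0.5 and ±0.9 half-widths fall straight out of the inverse normal CDF; a quick sketch, again assuming a prediction error of σ=0.8:

```python
from statistics import NormalDist

sigma = 0.8  # standard deviation of the rating prediction error
std_normal = NormalDist()

# Half-width of a central interval with coverage probability p:
# z = Phi^-1((1 + p) / 2), then scale by sigma.
for p in (0.50, 0.75):
    half_width = std_normal.inv_cdf((1 + p) / 2) * sigma
    print(f"{p:.0%} prediction interval: +/- {half_width:.2f}")
```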
I think this chart makes it quite clear that many speakers find themselves in the same "category", meaning that the model can't really tell which one would be preferred.
From there we can use the same principles to generate a matrix chart to compare every possible pair of speakers. I might have gone a bit overboard on that one…
View attachment 70051
Both charts are live over at Loudspeaker Explorer, in the Preference Ratings section at the very bottom.
I just sent a pair of 3020i to Amir for testing.
I'm missing a lot of data for the 3020i, as it was measured back when I performed far fewer measurements overall (not even complete front-hemisphere vertical data). Luckily the front hemisphere has the biggest impact on the shape of the curves, so using what I have and faking the rest, I get 3.9/6.8/7.
Edit: I also did the Concept 20, as I was curious because it basically looks like a better 3020i, and indeed it got 4/7/7.2.