Random musings on The Preference Ratings of speaker reviews

echopraxia · Sep 23, 2020

Blumlein 88 said:
Yes, but if I add a sub to the 305 it can play loud enough, and does it really become as satisfying as the little Genelec alone? Does the sub really turn a mid-4 into a high 6? I think it might but only with plenty of caveats and comparing more apples rather than apples and oranges.

This is always going to seem to be a paradox when you compare apples to oranges (sub score vs no sub score), because the reality is speaker preference is a multi-dimensional quantity (e.g. bass extension, bass quality, mids quality, treble quality, spatial quality -- or something like that) being boiled down into a one-dimensional quantity here for the sake of generating a single final score.

There are many good reasons to do this: it allows items to be straightforwardly sorted (which is very convenient), it allows one winner out of any two speakers to be chosen objectively, etc. But it will always cause confusion about ranking when we poke into specific details that betray the fact that the true space being modeled here is in fact multidimensional. Humans definitely can perceive different qualities of a speaker separately (even if not separated perfectly crisply or without error). To illustrate this, consider the fact that we sometimes (1) struggle to choose a favorite between two speakers (e.g. when each has different strengths and weaknesses), even though (2) those two speakers are very audibly different from one another. Together, these two points form an informal proof that the perception going on here is at least fractionally more than 1-dimensional.

The solution to that confusion is to realize that when we project multidimensional points into 1D space, two resulting values being similar or the same does not necessarily mean that the corresponding original N-dimensional points are the same.

To put it into speaker preference terms, it means this: If two speakers score 5.0 exactly, it does not mean both those speakers will sound the same. They each could have completely different strengths and weaknesses, which when weighted and combined result in the same overall preference score. And if that preference score is good enough, maybe you'll even find yourself unable to choose a winning speaker; imagine a situation where one speaker has incredible treble but annoying bass issues, and another has incredible bass but annoying treble issues. Which speaker do you choose? Maybe they're in some sense "equal" overall, even though each speaker is fundamentally different from the other in this example.
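To make the projection point concrete, here is a minimal sketch. The attribute names, weights, and per-attribute scores are all made up for illustration; this is not the actual preference formula, just a weighted sum that collapses several dimensions into one number.

```python
import math

# Hypothetical weights for illustration only (NOT the real preference model).
WEIGHTS = {"bass": 0.3, "mids": 0.4, "treble": 0.3}

def overall(attrs):
    """Project a multi-dimensional rating onto a single weighted score."""
    return sum(WEIGHTS[name] * attrs[name] for name in WEIGHTS)

def distance(a, b):
    """Euclidean distance between two speakers in attribute space."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in WEIGHTS))

# Two made-up speakers with opposite strengths and weaknesses.
speaker_a = {"bass": 2.0, "mids": 5.0, "treble": 8.0}  # great treble, weak bass
speaker_b = {"bass": 8.0, "mids": 5.0, "treble": 2.0}  # great bass, weak treble

score_a = overall(speaker_a)
score_b = overall(speaker_b)
gap = distance(speaker_a, speaker_b)

print(f"A: {score_a:.2f}  B: {score_b:.2f}  distance apart: {gap:.2f}")
```

Both speakers land on the same single score even though they sit far apart in the underlying attribute space, which is the "equal overall, fundamentally different" situation described above.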

MZKM · Sep 23, 2020

Blumlein 88 said:
Yes, the 20 Hz assumption seems to make more sense.

I don't disagree. However, then the max theoretical score is ~9.4 instead of 10 (even though @edechamps pointed out even mine won't realistically get a 10 as you can't simultaneously get perfect scores for both NBD_PIR & SM_PIR).

So, since it's not out of 10, it is a bit hard to quickly conceptualize how close to perfect a rating is.
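For anyone wanting to check the ~9.4 figure, here is a rough sketch. The coefficients are the ones commonly cited for Olive's model and are an assumption on my part (they are not spelled out in this thread); the function name and the "ideal" inputs are hypothetical.

```python
import math

def preference_score(nbd_on, nbd_pir, lf_ext_hz, sm_pir):
    """Predicted preference rating.

    Coefficients as commonly cited for Olive's model (assumed here,
    not taken from this thread). lf_ext_hz is the -6 dB bass extension.
    """
    lfx = math.log10(lf_ext_hz)
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir

# Idealized speaker: no narrow-band deviations, perfect smoothness,
# bass extension fixed at the 20 Hz assumption discussed above.
ideal_20hz = preference_score(0.0, 0.0, 20.0, 1.0)

# How much one extra octave of bass extension is worth (40 Hz -> 20 Hz).
octave_gain = ideal_20hz - preference_score(0.0, 0.0, 40.0, 1.0)

print(f"max score with 20 Hz extension: {ideal_20hz:.2f}")
print(f"gain per octave of extension:   {octave_gain:.2f}")
```

Under these assumptions the ceiling comes out at about 9.4, and each octave of added bass extension is worth roughly 1.3 points, which goes some way to explaining the size of the "w/ sub" jumps.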

What I could do is turn these ratings into %. I currently do this for my own purposes, but I don't do it on the score, I do it on each score component; I am pretty sure it keeps the rankings the same.
For instance, using the Dynaudio LYD 5
SCORE: 5.7
SCORE w/ sub: 8.3

% Score: 82%
% Score w/ sub: 90%

Screen Shot 2020-09-23 at 8.10.07 AM.png

Blumlein 88 · Sep 23, 2020

MZKM said:
I don't disagree. However, then the max theoretical score is ~9.4 instead of 10 (even though @edechamps pointed out even mine won't realistically get a 10 as you can't simultaneously get perfect scores for both NBD_PIR & SM_PIR).

So, since it's not out of 10, it is a bit hard to quickly conceptualize how close to perfect a rating is.

What I could do is turn these ratings into %. I currently do this for my own purposes, but I don't do it on the score, I do it on each score component; I am pretty sure it keeps the rankings the same.
For instance, using the Dynaudio LYD 5
SCORE: 5.7
SCORE w/ sub: 8.3

% Score: 82%
% Score w/ sub: 90%

View attachment 84434

Might as well leave it like it is for scale of 10. Echopraxia has explained simply how a single number isn't good enough to scale a multi-dimensional issue. It also explains why it would work better or seem to at the higher end of the scale. There is less variability to get that high score and therefore more similarity even in the various things that go into the rating. And even then yes a pair of speakers scoring an 8 probably still aren't going to sound the same. I suppose in theory if the output capability is the same with low distortion a pair of speakers scoring a perfect 10 should in fact sound the same as they'd have the same output sound field.

waynel · Sep 23, 2020

MZKM said:
I don't disagree. However, then the max theoretical score is ~9.4 instead of 10 (even though @edechamps pointed out even mine won't realistically get a 10 as you can't simultaneously get perfect scores for both NBD_PIR & SM_PIR).

So, since it's not out of 10, it is a bit hard to quickly conceptualize how close to perfect a rating is.

What I could do is turn these ratings into %. I currently do this for my own purposes, but I don't do it on the score, I do it on each score component; I am pretty sure it keeps the rankings the same.
For instance, using the Dynaudio LYD 5
SCORE: 5.7
SCORE w/ sub: 8.3

% Score: 82%
% Score w/ sub: 90%

View attachment 84434

Out of curiosity, what would the spinorama look like for a speaker with a preference score of 10? Would there be a unique solution, or could there be a range of spinoramas? How wide would the dispersion be?

edechamps · Sep 23, 2020

waynel said:
Out of curiosity, what would the spinorama look like for a speaker with a preference score of 10? Would there be a unique solution, or could there be a range of spinoramas? How wide would the dispersion be?

You can't get to 10, but you can get to 9.9999… by having a 15 Hz -6 dB point, a perfectly flat, horizontal on-axis response, and a perfectly flat PIR with infinitesimal slope. (It is unlikely that this has any meaning in reality. The model tends to break down when pushed to extremes because it wasn't trained on extremes.)

There are multiple solutions because the model doesn't care about the sign of the PIR slope. (Which doesn't make sense either.)
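A small sketch of why the sign does not matter, assuming SM is computed as the r² of a straight-line fit of level against log frequency (my reading of the definition; the curve below is synthetic, not a real measurement):

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. r² of a simple linear fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Synthetic "PIR": 140 points at 0.05-octave spacing (about 7 octaves).
octaves = [i * 0.05 for i in range(140)]
ripple = [0.5 if i % 2 == 0 else -0.5 for i in range(140)]  # +/-0.5 dB wiggle

# Downward-tilted curve (-1 dB/octave) and its mirror image (upward tilt).
falling = [-1.0 * o + r for o, r in zip(octaves, ripple)]
rising = [-level for level in falling]

sm_falling = r_squared(octaves, falling)
sm_rising = r_squared(octaves, rising)

print(f"SM falling: {sm_falling:.4f}  SM rising: {sm_rising:.4f}")
```

Since r² squares the correlation, a curve and its mirror image get the same SM, so a response that rises with frequency is scored the same as one that falls.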

waynel · Sep 23, 2020

edechamps said:
You can't get to 10, but you can get to 9.9999… by having a 15 Hz -6 dB point, a perfectly flat, horizontal on-axis response, and a perfectly flat PIR with infinitesimal slope. (It is unlikely that this has any meaning in reality. The model tends to break down when pushed to extremes because it wasn't trained on extremes.)

There are multiple solutions because the model doesn't care about the sign of the PIR slope. (Which doesn't make sense either.)

Does the model give higher preference scores to wider dispersion vs narrow? Does it say anything about a preferred dispersion width?

Thanks
Wayne

Webninja · Sep 23, 2020

Good thread, and to me the big takeaway is that “with sub” is a serious sub, as I was imagining a decent 12” would cover it.

What subs in the market would qualify for the “with sub”? Because that would help with the money side of the equation when comparing with and without sub.

edechamps · Sep 23, 2020

waynel said:
Does the model give higher preference scores to wider dispersion vs narrow? Does it say anything about a preferred dispersion width?

It's complicated. The model both penalizes (NBD_PIR) and rewards (SM_PIR) higher PIR slopes (PIR slope can be thought of as a proxy for dispersion width), and the interaction between the two is hard to reason about. This unfortunate state of affairs is likely accidental and might be a result of @Sean Olive misusing r² for his definition of SM.
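Here is a rough sketch of the two opposing effects. It uses simplified stand-ins for the real metrics: NBD is taken as the mean absolute deviation from each 1/2-octave band's own average, and SM as the r² of a straight-line fit. The curves are synthetic, and the NBD is evaluated on the bare tilt (no ripple) purely to isolate what the slope alone contributes.

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. r² of a simple linear fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def nbd(levels, points_per_band):
    """Simplified NBD: per-band mean absolute deviation, averaged over bands."""
    band_devs = []
    for start in range(0, len(levels) - points_per_band + 1, points_per_band):
        band = levels[start:start + points_per_band]
        band_mean = sum(band) / len(band)
        band_devs.append(sum(abs(v - band_mean) for v in band) / len(band))
    return sum(band_devs) / len(band_devs)

STEP = 0.05  # octaves per point, so 10 points span one 1/2-octave band
octaves = [i * STEP for i in range(140)]
ripple = [0.5 if i % 2 == 0 else -0.5 for i in range(140)]  # fixed +/-0.5 dB

results = []
for slope in (-0.5, -1.0, -2.0):  # dB per octave, increasingly steep
    tilt = [slope * o for o in octaves]
    curve = [t + r for t, r in zip(tilt, ripple)]
    results.append((slope, nbd(tilt, 10), r_squared(octaves, curve)))

for slope, nbd_tilt, sm in results:
    print(f"slope {slope:+.1f} dB/oct  NBD from tilt {nbd_tilt:.3f}  SM {sm:.3f}")
```

With the ripple held fixed, a steeper tilt raises both numbers: the tilt itself shows up as deviation inside every band (worse NBD), while the same tilt makes the straight line explain more of the variance (better SM).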

MZKM · Sep 23, 2020

waynel said:
Does the model give higher preference scores to wider dispersion vs narrow? Does it say anything about a preferred dispersion width?

Thanks
Wayne

There is a target slope for the curves, but that was obtained by averaging the better scorers. Olive even states in the paper that the target slope changes with the speaker's dispersion (a 2-way likely will have a steeper PIR slope than a 4-way). However, the SM score component does favor narrow directivity, and if a speaker has ultra wide dispersion, it scores very low. That said, when talking about ultra wide dispersion speakers, I think I recall reading that Toole/Olive stated they don't sound as good as a normal wide dispersion speaker, so it's not bad to penalize them; but this means it also rewards narrow directivity speakers, like some of the KEF models.

Blumlein 88 · Sep 23, 2020

edechamps said:
It's complicated. The model both penalizes (NBD_PIR) and rewards (SM_PIR) higher PIR slopes (PIR slope can be thought of as a proxy for dispersion width), and the interaction between the two is hard to reason about. This unfortunate state of affairs is likely accidental and might be a result of @Sean Olive misusing r² for his definition of SM.

It struck me that using 1/2 octaves across the whole bandwidth was probably not optimal. It may have worked fine for the data available. Given the way the ERB changes with frequency, it seems that a version of the formula that incorporated ERB when measuring deviations from the desired response would be better and more predictive. Without that data and some additional testing, however, that is just a guess on my part.

mhardy6647 · Sep 23, 2020

restorer-john said:
A 4" or 5" wooger isn't going to get the job done, let's be honest...

Oy, that takes me back!

oldsysop · Sep 23, 2020

restorer-john said:
By the time you are using a 15" in a three way, you'll just look at those toy speakers and giggle...

+1

MZKM · Sep 23, 2020

Webninja said:
What subs in the market would qualify for the “with sub”?

A Rythmik L12 (<$600) is -6dB @ 12Hz.

And for those without EQ/DSP, it has a built-in single band PEQ adjustment to tackle a room mode, which is nice.

restorer-john · Sep 23, 2020

mhardy6647 said:
Oy, that takes me back!

Yep, back in the day on AK.

Blumlein 88 · Sep 23, 2020

MZKM said:
A Rythmik L12 (<$600) is -6dB @ 12Hz.

And for those without EQ/DSP, it has a built-in single band PEQ adjustment to tackle a room mode, which is nice.

The LV12 for the same money might have a touch more extension and does have a flatter and more extended upper end. Useful to 300 Hz. Also with PEQ for one band. The LV12 would be the choice for an LRS. It is a larger cabinet and front ported.

I need a few. Then I have to decide whether to PEQ my room at its 17 Hz mode, or 34 Hz or 51 Hz.
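The 17/34/51 Hz series looks like the harmonics of a single axial mode. As a quick sketch, assuming that is what they are and taking the speed of sound as roughly 343 m/s, the usual f = n·c/(2L) relation gives the room dimension involved (function and variable names are just for illustration):

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature (assumption)

def axial_modes(length_m, count=3):
    """First few axial mode frequencies (Hz) along one room dimension."""
    return [n * SPEED_OF_SOUND / (2.0 * length_m) for n in range(1, count + 1)]

# Room dimension implied by a 17 Hz fundamental.
implied_length = SPEED_OF_SOUND / (2.0 * 17.0)
modes = axial_modes(implied_length)

print(f"implied dimension: {implied_length:.1f} m")
print("modes (Hz):", [round(f) for f in modes])
```

Under those assumptions the fundamental corresponds to a dimension of about 10 m, with the higher modes falling at integer multiples of it.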

HooStat · Sep 24, 2020

MZKM said:
I don't disagree. However, then the max theoretical score is ~9.4 instead of 10 (even though @edechamps pointed out even mine won't realistically get a 10 as you can't simultaneously get perfect scores for both NBD_PIR & SM_PIR).

The model under-predicts for the best speakers. The model is not forced through 0,0 or 10,10, and it isn't allowed any curvature. So, as the predicted scores get higher, it is more likely that the actual preference score (if it could be known) is even higher than the predicted one.

MZKM · Sep 24, 2020

Blumlein 88 said:
The LV12 for the same money might have a touch more extension and does have a flatter and more extended upper end. Useful to 300 Hz. Also with PEQ for one band. The LV12 would be the choice for an LRS. It is a larger cabinet and front ported.

I need a few. Then I have to decide whether to PEQ my room at its 17 Hz mode, or 34 Hz or 51 Hz.

I doubt you actually want to cross over that high.

With EQ, you could get the response to be decent (bring down 300Hz-400Hz) and cross over at 50Hz.

The dip in the lower/mid treble can be in part attributed to the measurement being near-field (Amir showed the simulations when <3m away, and it also has a dip), but it would still likely need to be raised a bit.

However, this will reduce max SPL, which already isn’t great.

Sancus · Sep 24, 2020

MZKM said:
I doubt you actually want to cross over that high.

With EQ, you could get the response to be decent (bring down 300Hz-400Hz) and cross over at 50Hz.

The dip in the lower/mid treble can be in part attributed to the measurement being near-field (Amir showed the simulations when <3m away, and it also has a dip), but it would still likely need to be raised a bit.

However, this will reduce max SPL, which already isn’t great.

Isn't that measurement on the wrong tweeter axis anyways? He said he did the distortion measurements before fixing it.

MZKM · Sep 24, 2020

Sancus said:
Isn't that measurement on the wrong tweeter axis anyways? He said he did the distortion measurements before fixing it.

Right, but the main difference with that was a reduction in treble. So, the EQ needed in the treble is less than what is shown in that photo, but some EQ would still be recommended.

richard12511 · Sep 24, 2020

Blumlein 88 said:
While looking at the Preference ratings that @MZKM keeps up to date for us I noticed a few things that stuck in my mind.
https://sites.google.com/view/speakerdata/preference-ratings-graphs

DISCLAIMER: yes I know we have discussed the foibles of this rating formula and that it can't be a one stop quality number for choosing speakers. Also all my ratings in this post are your regular anecdotal sighted listening long term ownership ratings of a purely subjective nature.

Since I have a couple JBL LSR305's I noticed the rating for them with a sub is better than any stand alone speaker tested thus far except for the top spot Genelec 8341a. So does this really seem likely? I know why the sub addition boosts the preference rating as the low end counts for 30% of preference as perceived by listeners. Still that is quite a boost. It means for less than $700 I could eclipse all stand alone stereo pairs with one exception of those tested by ASR at this time. Or maybe I need a sub for each channel in which case it goes to $1000. I'm assuming use of the matching JBL LSR310 here. Maybe it requires a more capable sub to get full benefit. Also there are some choices not much more expensive than the 305s which would get you a rating with sub above the Genelecs.

I have some Revel F12's in a video system. They are clearly much preferred for music over the LSR305s. If the 305s warrant an acceptable rating of 65 on a scale of 100 then the F12 should get a 75% rating. I have an LSR310 sub. When paired with the 305s they in some ways begin to approach the F12s, but in other important ways never do. Maybe they get 72% rating. I rather doubt the F12 spin-o-rama would get a preference rating from the formula equal or nearly so of the Genelec 8341a's.

Also I can, and have, paired the F12s with the LSR310 sub. It elevates the result of those too. But not by as much. Maybe half the gain. Still would I be getting near top of the heap results this way? I don't quite think so. Yet the preference formula would have us believing maybe it could.

And what to make of the recent LRS Maggie with a rating of -.25........?????? I'm one who thinks Amir's listening description is about right and his measurements surely are. Yet a negative .25 seems really down there. What would a 6x9 car speaker in a properly sized box or baffle score? Just for kicks maybe those of you up to date in the car world could suggest a good 6x9 car speaker we could screw to a trapezoidal shaped 1/2 inch plywood open baffle and see if it beats an LRS. Oh, and an LRS jumps to 5.01 with a sub? I expect a big jump with a sub on that speaker. But that seems like something is off in the low end ratings of the formula.

I am not sure what I hope to come from this thread. The Preference formula seems to work okay at 4 and above to me. Below that I'm doubting it considerably.

The CEA2034 spin graphs seem quite good at pointing out a good speaker from a poor one. Yet sometimes a great looking spin graph gets a lower Preference score than I expect. Something just isn't right about that preference formula though.

Adding subs definitely makes a huge difference, so the huge jump "w/sub" really doesn't surprise me. I don't have the 8341, and I don't have the 305p at home (stuck at my closed office), but I do have the 308p here. IMO, the 308p + 4 subs sounds better than any of my other (better) speakers on their own.

People often talk of being "blown away" by the sound of their new speakers they just purchased. "Night and day difference" from what they had before. The new speaker is in a "different league". Maybe it's just a difference in how I interpret those phrases, but I've never really had that experience when hearing a new set of speakers, with 2 exceptions. For me, with new speaker purchases, it's always been more like "ok yeah I guess that's slightly better", but nothing extraordinary. Maybe 5-10% better at best. As I mentioned, though, there are 2 exceptions. The first exception where I really heard what I would consider to be a "night and day difference" was when I upgraded from my TV's internal speakers to my first hifi system, which was an Infinity Beta 5.1 system. I really was "blown away" by the improvement in sound quality from that upgrade. The other exception was when I added 4 18" subs to my main system. Sometimes I like to engage "Pure Direct" just to remind myself how much better the system sounds with subs. So yeah adding well integrated subs is a huge improvement (imo). I'm not surprised it can catapult lesser speakers over much better speakers, and my experience leads me to believe it's correct to do so.
