
Master Preference Ratings for Loudspeakers

napilopez

Major Contributor
Joined
Oct 17, 2018
Messages
1,331
Likes
4,149
Location
NYC
Would it be easier for you if I gave the text files to you at 24 PPO? I already did so in my own tests. You can also do that very easily in REW from the export option.
 
OP

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
2,425
Likes
5,173
Location
Land O’ Lakes, Florida
Thread Starter #402
Would it be easier for you if I gave the text files to you at 24 PPO? I already did so in my own tests. You can also do that very easily in REW from the export option.
I’m on my work computer so I don’t have REW. Have you seen what difference 1/24-octave smoothing makes to the score?
 
OP

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
2,425
Likes
5,173
Location
Land O’ Lakes, Florida
Thread Starter #404
I've attached 24 ppo
Thanks, I just compared it on speaker 2, and it went from a 5.85 to a 5.81. As for why: it's likely because there are more measurements in between the samples, so the average deviation will decrease.
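For anyone curious why denser data shifts the number: NBD, as I understand it from the Olive paper, averages the absolute deviation of the response from each half-octave band's own mean, so more points per band nudges the per-band averages slightly. A rough sketch (the band edges and details here are my assumptions, not the paper's exact procedure):

```python
def nbd(freqs, spl_db, f_lo=100.0, f_hi=12000.0):
    """Narrow-band deviation, roughly per the Olive paper: the mean absolute
    deviation of the response from its own average within each half-octave
    band, averaged over all bands from f_lo to f_hi. Sketch only; the exact
    band edges and weighting may differ from the paper."""
    band_devs = []
    f = f_lo
    while f < f_hi:
        f_next = f * 2 ** 0.5  # half-octave step
        band = [s for fr, s in zip(freqs, spl_db) if f <= fr < f_next]
        if band:
            mean = sum(band) / len(band)
            band_devs.append(sum(abs(s - mean) for s in band) / len(band))
        f = f_next
    return sum(band_devs) / len(band_devs)
```

With 24 PPO there are roughly twelve points per half-octave band instead of a handful, so each band's mean and deviation settle a bit differently, which is consistent with the small 5.85 → 5.81 shift.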
 

napilopez

Major Contributor
Joined
Oct 17, 2018
Messages
1,331
Likes
4,149
Location
NYC
AHA! I've got the list of Paper 1 speakers. Journalist skills coming in handy.

Here is the article from the August 2001 edition of Consumer Reports used to select the Paper 1 speakers, along with brief descriptions of each (article scans attached).

To save you the time reading: by matching the 23 bookshelf speakers above to Table 1 in the paper, we can isolate the ones used in the study.

Here they are as ranked by Consumer Reports. In brackets are their 'Accuracy' score, based on CR's flat-sound-power model, and CR's "Bass Handling" score. In the article, CR says the overall ranking is mostly based on the accuracy score, with some weight given to "the ability to play bass notes very loud without distortion."
  1. Pioneer S-DF3-K [89/Excellent]
  2. Bose 301 Series IV [89/Very Good]
  3. Cambridge Soundworks Model Six [88/Very Good]
  4. BIC America Venturi DV62si [90/Good]
  5. Infinity Entra One [87/Excellent]
  6. JBL Northridge Series N28 [85/Excellent]
  7. Polk Audio RT15i [88/Good]
  8. Yamaha NS-A638 [83/Excellent]
  9. Bose 141 [89/Very Good]
  10. JBL Studio Series S26 [82/Very Good]
  11. Infinity Interlude IL10 [76/Excellent]
  12. Klipsch Synergy SB-3 Monitor [76/Excellent]
  13. KLH 911B [79/Good]
And here is how they actually ranked in Olive's double-blind tests (photos attached). In brackets is each speaker's Consumer Reports ranking for quick reference.
  1. Infinity Interlude IL10 [11]
  2. JBL Studio Series S26 [10]
  3. Infinity Entra One [5]
  4. Pioneer S-DF3-K [1]
  5. JBL Northridge Series N28 [6]
  6. Klipsch Synergy SB-3 Monitor [12]
  7. Polk Audio RT15i [7]
  8. Cambridge Soundworks Model Six [3]
  9. BIC America Venturi DV62si [4]
  10. Bose 301 Series IV [2]
  11. Yamaha NS-A638 [8]
  12. Bose 141 [9]
  13. KLH 911B [13]
    (Poor KLH can't catch a break)
So, as we already knew, they are mostly 2-way bookshelves. And unsurprisingly, most of the top performers are Harman speakers. But the one consistent thing among top performers is the use of what appear to be purposeful waveguides.
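For what it's worth, the disagreement between the two rankings can be quantified with a Spearman rank correlation. A quick sketch using the (blind rank, CR rank) pairs from the lists above, taking the N28's CR rank as 6 (its position in the CR list):

```python
# Spearman rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), with d = rank difference
pairs = [(1, 11), (2, 10), (3, 5), (4, 1), (5, 6), (6, 12), (7, 7),
         (8, 3), (9, 4), (10, 2), (11, 8), (12, 9), (13, 13)]
n = len(pairs)
d2 = sum((a - b) ** 2 for a, b in pairs)
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(round(rho, 2))  # 0.05
```

A coefficient that close to zero means the CR ranking carried essentially no information about blind-test preference.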

@amirm , in case you were curious.

P.S. I see a pair of Infinity IL10 for $79 a pair on eBay. Just sayin'
 

andreasmaaan

Major Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
4,802
Likes
4,868
I'm starting to wonder whether we should have two models: the original Olive model, for those who only trust the original double-blind peer-reviewed paper, don't like wild guesses, and/or are comparing speakers that are known to be similar to those in the Olive test (i.e. non-coaxial monopoles). And a separate "experimental" model that could be built from scratch based on what we know about perception of spinoramas (e.g. the importance of the DI curve, the relative unimportance of overall tilt), that might make more sense to use for non-standard speakers and would come with a fat warning that it is not directly backed by double-blind testing data and should therefore be taken with a huge grain of salt. The experimental model could be calibrated against the Olive model by aligning the scores for "standard" speakers. But even then, if we are asked to quantify how accurate that experimental model would be, our best answer would be ¯\_(ツ)_/¯
I think this is a great idea.

However, purporting to give an experimental "preference" model would be overstepping IMHO, given that we couldn't possibly derive it from valid preference data.

I would suggest instead that we simply give speakers a rating for each of:
  • ON deviation from flat
  • LFX
  • PIR (or ER) deviation from line of best fit (defined mathematically such that it is independent of slope, in contrast to the Olive paper).
Optionally, additional sub-ratings could be given for HER and VER.

A nonlinear distortion rating could also be added, although IMHO this would not be possible on the basis of the limited distortion measurements Amir currently performs.

Finally, if we wanted to be a bit more experimental about it (which I think we should be), we could apply an equal-loudness based weighting to each metric.

In other words, a deviation from flat in the ON at say 15kHz would be penalised less harshly than a deviation at say 3kHz. And so forth...

This weighting could be derived from existing data from psychoacoustic research.
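As a sketch of what such a weighting might look like, with an entirely made-up placeholder curve standing in for real equal-loudness data (the function names and weight values here are illustrative assumptions, not from any psychoacoustic dataset):

```python
def weighted_deviation(freqs, dev_db, weight):
    """Average absolute deviation, weighted by an audibility function."""
    num = sum(weight(f) * abs(d) for f, d in zip(freqs, dev_db))
    den = sum(weight(f) for f in freqs)
    return num / den

def rough_weight(f):
    # Placeholder curve: most sensitive around 2-5 kHz, tapering toward the
    # extremes. Real values would come from psychoacoustic research.
    if 2000 <= f <= 5000:
        return 1.0
    if f < 2000:
        return 0.5 + 0.5 * (f / 2000)
    return max(0.3, 1.0 - (f - 5000) / 20000)

# A 1 dB error at 3 kHz carries twice the weight of one at 15 kHz here.
print(rough_weight(3000), rough_weight(15000))  # 1.0 0.5
```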
 
OP

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
2,425
Likes
5,173
Location
Land O’ Lakes, Florida
Thread Starter #408
I think this is a great idea.

However, purporting to give an experimental "preference" model would be overstepping IMHO, given that we couldn't possibly derive it from valid preference data.

I would suggest instead that we simply give speakers a rating for each of:
  • ON deviation from flat
  • LFX
  • PIR (or ER) deviation from line of best fit (defined mathematically such that it is independent of slope, in contrast to the Olive paper).
Optionally, additional sub-ratings could be given for HER and VER.

A nonlinear distortion rating could also be added, although IMHO this would not be possible on the basis of the limited distortion measurements Amir currently performs.

Finally, if we wanted to be a bit more experimental about it (which I think we should be), we could apply equal-loudness based weightings to each metric.

In other words, a deviation from flat in the ON at say 15kHz would be penalised less harshly than a deviation at say 3kHz. And so forth...

This weighting could be derived from existing data from psychoacoustic research.
I was thinking about experimenting to see how just using the % weighting for each parameter compares to the formula weighting. The score range may not be the same, but I would assume the ranking would stay the same; if so, I could then experiment with normalizing the PIR, running the NBD score on just that, and having it take the combined % that the original NBD_PIR & SM_PIR take up.
 

andreasmaaan

Major Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
4,802
Likes
4,868
I was thinking about experimenting to see how just using the % weighting for each parameter compares to the formula weighting. The score range may not be the same, but I would assume the ranking would stay the same; if so, I could then experiment with normalizing the PIR, running the NBD score on just that, and having it take the combined % that the original NBD_PIR & SM_PIR take up.
Normalising the PIR and running the NBD score on it sounds like a very elegant solution to me :)

I'm not sure I understood the first part about using the % weighting for each parameter as opposed to the formula weighting?
 
OP

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
2,425
Likes
5,173
Location
Land O’ Lakes, Florida
Thread Starter #410
Normalising the PIR and running the NBD score on it sounds like a very elegant solution to me :)

I'm not sure I understood the first part about using the % weighting for each parameter as opposed to the formula weighting?
The formula is this:
Preference Rating = 12.69 − 2.49 × NBD_ON − 2.99 × NBD_PIR − 4.31 × LFX + 2.32 × SM_PIR


the weighting is this (each coefficient's share of the 12.11 total):
  • NBD_ON: 2.49/12.11 ≈ 20.6%
  • NBD_PIR: 2.99/12.11 ≈ 24.7%
  • LFX: 4.31/12.11 ≈ 35.6%
  • SM_PIR: 2.32/12.11 ≈ 19.2%
You can’t just multiply the scores by these weights, though (unless it’s a perfect score), as the numerical values wouldn’t match, but I would assume the ranking would stay the same.

To give an example: you can curve a set of data; the numerical values will change, but the order won’t.
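The curving analogy can be demonstrated in a few lines; any strictly increasing rescaling of the final scores leaves the ordering intact (a toy sketch, with made-up scores apart from the two quoted upthread):

```python
scores = [5.85, 5.81, 4.20, 7.00]           # final preference scores
curved = [0.8 * s + 1.3 for s in scores]    # arbitrary strictly increasing map

def order(xs):
    """Indices of xs sorted by value, i.e. the ranking."""
    return sorted(range(len(xs)), key=xs.__getitem__)

print(order(scores) == order(curved))  # True
```

Note this only holds for a monotone transform of the final score; reweighting the individual components can in principle reorder speakers, which is exactly what would be worth checking.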
 

andreasmaaan

Major Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
4,802
Likes
4,868
The formula is this:
Preference Rating = 12.69 − 2.49 × NBD_ON − 2.99 × NBD_PIR − 4.31 × LFX + 2.32 × SM_PIR

the weighting is this (each coefficient's share of the 12.11 total):
  • NBD_ON: 2.49/12.11 ≈ 20.6%
  • NBD_PIR: 2.99/12.11 ≈ 24.7%
  • LFX: 4.31/12.11 ≈ 35.6%
  • SM_PIR: 2.32/12.11 ≈ 19.2%

You can’t just multiply the scores by these weights, though (unless it’s a perfect score), as the numerical values wouldn’t match, but I would assume the ranking would stay the same.

To give an example: you can curve a set of data; the numerical values will change, but the order won’t.
You know, this is something I'd never understood in the paper. If you add up the 4 coefficients in the formula, you get 12.11, not 12.69. I can't work out why this would be the case?

Anyway, if it were up to me, I wouldn't be trying to tweak the Olive model. I think it is what it is, and without the raw data, we can't know whether tweaks made to it would increase or decrease correlation with listener preference.

Instead, I'd simply build a new model that doesn't purport to be a preference model, basically along the lines I suggested in post #407. An "ASR Performance Rating", if you will ;)
 
OP

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
2,425
Likes
5,173
Location
Land O’ Lakes, Florida
Thread Starter #412
You know, this is something I'd never understood in the paper. If you add up the 4 coefficients in the formula, you get 12.11, not 12.69. I can't work out why this would be the case?

Anyway, if it were up to me, I wouldn't be trying to tweak the Olive model. I think it is what it is, and without the raw data, we can't know whether tweaks made to it would increase or decrease correlation with listener preference.

Instead, I'd simply build a new model that doesn't purport to be a preference model, basically along the lines I suggested in post #407.
The best score (to get a 10) for LFX isn’t 1.0; that’s why.
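Plugging the quoted coefficients into the perfect-score case (NBD_ON = NBD_PIR = 0, SM_PIR = 1) and solving for the LFX that yields exactly 10 bears this out; a quick sanity check, assuming LFX is the log10 of the −6 dB extension frequency as in the paper:

```python
CONST, LFX_W, SM_PIR_W = 12.69, 4.31, 2.32

# 12.69 - 4.31 * LFX + 2.32 = 10  =>  LFX = (12.69 + 2.32 - 10) / 4.31
lfx_best = (CONST + SM_PIR_W - 10) / LFX_W
print(round(lfx_best, 2))        # 1.16 (log10 Hz), not 1.0
print(round(10 ** lfx_best, 1))  # ~14.5 Hz -6 dB extension point
```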
 

edechamps

Addicted to Fun and Learning
Forum Donor
Joined
Nov 21, 2018
Messages
677
Likes
2,730
Location
London, United Kingdom
The process isn't perfect, but I took care to make sure the graphs were as close as I could get them with no smoothing.
How did you do it? @pierre has a technique to digitize spinorama charts, did you use that?
 

QMuse

Major Contributor
Joined
Feb 20, 2020
Messages
3,124
Likes
2,518
I think this is a great idea.

However, purporting to give an experimental "preference" model would be overstepping IMHO, given that we couldn't possibly derive it from valid preference data.

I would suggest instead that we simply give speakers a rating for each of:
  • ON deviation from flat
  • LFX
  • PIR (or ER) deviation from line of best fit (defined mathematically such that it is independent of slope, in contrast to the Olive paper).
IMHO there are two parameters of the PIR which are important: one is the average (squared?) deviation from the best-fit line, and the other is the slope of the best-fit line, since a slope that is too shallow or too steep would impact the overall tonal balance.

P.S. What is "ON"?
 

andreasmaaan

Major Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
4,802
Likes
4,868
IMHO there are two parameters of the PIR which are important: one is the average (squared?) deviation from the best-fit line, and the other is the slope of the best-fit line, since a slope that is too shallow or too steep would impact the overall tonal balance.
I agree. However, it's not established that any particular slope is superior (or likely to be most preferred). This is illustrated by the two Olive studies, where preferred slope proved to be dependent on the average slope of each sample.

Deviation, OTOH, is better established to correlate with listener preference.

Therefore, I think that deviation, but not slope, should be a rated parameter.
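Such a slope-independent deviation metric is easy to define: fit a least-squares line to the PIR in dB versus log-frequency and score the RMS of the residuals, so a tilted-but-smooth curve rates the same as a flat-but-smooth one. A sketch (the band limits and any weighting are open choices, not from the paper):

```python
import math

def slope_independent_deviation(freqs_hz, spl_db):
    """RMS deviation of the response from its own least-squares line,
    fitted in dB vs log10(frequency), so the result ignores overall tilt."""
    x = [math.log10(f) for f in freqs_hz]
    n = len(x)
    mx, my = sum(x) / n, sum(spl_db) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, spl_db)) / \
            sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, spl_db)]
    return math.sqrt(sum(r * r for r in resid) / n)
```

A perfectly straight downward tilt scores zero; only the wiggle around the tilt is penalised.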

P.S. What is "ON"?
On-axis. I'm just borrowing the terminology from the Olive paper.
 

QMuse

Major Contributor
Joined
Feb 20, 2020
Messages
3,124
Likes
2,518
I agree. However, it's not established that any particular slope is superior (or likely to be most preferred). This is illustrated by the two Olive studies, where preferred slope proved to be dependent on the average slope of each sample.

Deviation, OTOH, is better established to correlate with listener preference.

Therefore, I think that deviation, but not slope, should be a rated parameter.
If we can agree that a PIR with a slope of 5 deg would intuitively sound "bright" and one with a slope of 45 deg would sound "dark", we can also agree that this indicates there is an optimal-sounding region of slopes somewhere in between. :)

For example, that is how I adjusted the PIR of the Sony speaker: I didn't just make it smoother, I also increased the slope a little to avoid a too-bright tonal balance.

On-axis. I'm just borrowing the terminology from the Olive paper.
Aha. In that case my vote would instead go in favor of using LW measured within +/-15 deg or similar.
 
OP

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
2,425
Likes
5,173
Location
Land O’ Lakes, Florida
Thread Starter #418
If we can agree that a PIR with a slope of 5 deg would intuitively sound "bright" and one with a slope of 45 deg would sound "dark", we can also agree that this indicates there is an optimal-sounding region of slopes somewhere in between. :)

For example, that is how I adjusted the PIR of the Sony speaker: I didn't just make it smoother, I also increased the slope a little to avoid a too-bright tonal balance.



Aha. In that case my vote would instead go in favor of using LW measured within +/-15 deg or similar.
Keep in mind that EQing to obtain a slope is not the same as an inherent slope, which comes from the directivity of the loudspeaker and for which no real consensus has been reached on what is ideal.

The listening window includes ±10° vertical, which, if my quick calculation is correct, is ±17 in at 8 ft away; that is too large an arc in my opinion, unless you are looking for speakers for a home theater with tiered seating. I also don’t fully like using the LW to account for no toe-in: even if a 30° difference is pretty accurate, the LW uses a 60° horizontal arc, so for toe-in purposes it really should be something like
=AVERAGE(0°, AVERAGE(±10°), AVERAGE(±20°), AVERAGE(±30°))
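The ±17 in figure checks out with simple trigonometry at an 8 ft (96 in) listening distance; a quick sketch:

```python
import math

# Half-height of the listening window's +/-10° vertical span at 8 ft (96 in)
distance_in = 8 * 12
half_spread_in = distance_in * math.tan(math.radians(10))
print(round(half_spread_in, 1))  # ~16.9 in above/below the reference axis
```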
 

Blumlein 88

Major Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
9,860
Likes
13,317

QMuse

Major Contributor
Joined
Feb 20, 2020
Messages
3,124
Likes
2,518
Keep in mind that EQing to obtain a slope is not the same as inherent slope, which is achieved based on the directivity of the loudspeaker, and of which no real consensus has been reached as to what is ideal.
Sure, it definitely isn't. When doing EQ I'm eyeballing where the natural slope would be if I removed the non-linearities. I guess I'm trying to say that the more non-linearities there are in the 200 Hz–20 kHz region, the greater the chances that the least-squares line misses the natural slope of the speaker.
 