
The Performance score for Speaker reviews is problematic.

abdo123

Master Contributor
Forum Donor
Joined
Nov 15, 2020
Messages
7,446
Likes
7,955
Location
Brussels, Belgium
I understand that Amir doesn't generate these, but I had to make a thread about this since the scores are on the Speaker review index.

The score seems to reflect the frequency response (FR) of the on-axis, anechoic measurements (???) instead of the predicted in-room FR curve. This is problematic in many ways, because the highest-scoring speakers in the index now have perfect anechoic measurements but not the flattest predicted in-room response.

Buying a speaker is a matter of choosing which compromises in the sound you can accept, and no speaker can produce 'the perfect sound' in a review (unlike electronics reviews, where you can tell from the data whether something will sound right or not). All of this makes it extremely difficult for people to make an informed decision about buying one of the speakers reviewed on this site.

To summarize, we have a score that doesn't really reflect real performance, plus the ambiguous nature of speaker reviews, and together they make it really difficult for readers to make a decent purchase decision.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
Did you read the underlying research or is this more of a general opinion?
 

Sancus

Major Contributor
Forum Donor
Joined
Nov 30, 2018
Messages
2,926
Likes
7,643
Location
Canada
The score isn't precise, but I don't agree at all that it's problematic or misleading. In fact, it's been shown that it correlates pretty well with the listening tests and Amir's judgement, just not perfectly.

The reason it continues to be used is that any alternative scoring would not be based on ANY research, and thus would be worse, and getting rid of it entirely would make a database of hundreds of speakers pretty much unsortable.

You can use it to identify the top 5 or 10 speakers that have been reviewed in your budget/other criteria, and not only is that all it needs to be used for, but it is in fact very good for that purpose.

In any case we've had a zillion threads about this topic and I don't think it's going anywhere so not much point to going over all the arguments again.
 

Jdunk54nl

Addicted to Fun and Learning
Joined
Aug 5, 2020
Messages
969
Likes
1,049
Location
Arizona
abdo123 said: "more of a general opinion, since no one here lives in an acoustically treated underground bunker (hopefully?)"

If you had read and understood the research, you most likely would not have made this comment. Once you understand the research, you will know better how to interpret the score and what it is really telling you. The same goes for all of Amir’s data: understand what it means and don’t rely on Amir. He uses the research to form his informed opinions, but you should know what it means as well.

Ultimately, though, yes, the score has some issues. But if you understand the research, those issues don’t matter.
 

dfuller

Major Contributor
Joined
Apr 26, 2020
Messages
3,406
Likes
5,255
PIR (as I understand it) models far-field listening rather than nearfield listening, which has a different ratio of direct to reflected sound.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
abdo123 said: "more of a general opinion, since no one here lives in an acoustically treated underground bunker (hopefully?)"
There are sure to be a number of replies addressing each point of your initial post, but if you get acquainted with the reasoning behind the score I think your opinion might change. It is based on research aiming for the best response, when all is said and done, at home.
 

abdo123

Master Contributor
Forum Donor
Joined
Nov 15, 2020
Messages
7,446
Likes
7,955
Location
Brussels, Belgium
That's a very good point, but now my confusion is even bigger. What should people focus on in the speaker review? The predicted in-room response or the anechoic measurements?

I was just so shocked: I opened the reviews expecting the 'best speaker measured on this forum' and found flaws here and flaws there, to the point where you can't really tell whether a speaker is recommended because 'okay, nothing is fatally flawed here' or because it is superior to the competition. The distinction between 'Okay' and 'Great' is so much clearer in electronics reviews.
 

Sancus

Major Contributor
Forum Donor
Joined
Nov 30, 2018
Messages
2,926
Likes
7,643
Location
Canada
abdo123 said: "That's a very good point, but now my confusion is even bigger. What should people focus on in the speaker review? The predicted in-room response or the anechoic measurements?"

Well, one leads to the other: good anechoic measurements give you a good PIR. The PIR is a generated, hypothetical model of a room response at far-field listening distance, while the measurements are raw data. The preference score does in fact include PIR as a variable; an explanation of it (math heavy) is here. I believe the Klippel PIR and the formula PIR are calculated slightly differently, though.

We don't really have a good, comprehensive explanation post on how to read the reviews (unfortunately, but it would be a ton of work to make one that is both comprehensible and not too long). Amir's first 305p review explains most of the basic graphs pretty well.

I generally look at the PIR as a shortcut to "does this speaker have any major tonality issues?" and that's it. It's not particularly clear that a good PIR always means good sound, as the PIR is a single line graph, whereas we hear reflections and direct sound as different aspects. Hypothetically a speaker can do things that make the on axis sound or off axis sound worse, to compensate for the other one, resulting in a better PIR, but whether or not that would result in better sound depends on many factors.

In a perfect world with unlimited budget you want something with perfectly flat on-axis response and perfectly smooth off-axis response in all 3 dimensions, with extremely low distortion at unlimited playback levels. Of course, that doesn't exist, so there are always trade-offs. There aren't really clear answers to how wide or narrow the beam width should be, how important vertical directivity is, or how audible distortion is and how much of it is acceptable. A lot of that is a matter of opinion.

However, you can't go too far wrong picking among the higher-scoring speakers (+/- 1 pt) within your budget, generally speaking. The KH310 and the Genelec 8341A, for example, are some of the best speakers ever reviewed. They both sound fantastic, and picking between them would be more a matter of preference and application than simply "well, looking at these reviews I can 100% predict which would sound better in your room." The same goes for the Kef R3 and the Revel M105/M106, or anything else in the same ballpark.
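
The shortlisting approach described above can be sketched in a few lines of Python. This is only an illustration: the speaker names, prices, and scores are made up, and `shortlist` is a hypothetical helper, not anything the review index actually provides.

```python
# Hypothetical entries: names, prices, and scores are invented for illustration.
speakers = [
    {"name": "Speaker A", "price": 400, "score": 6.1},
    {"name": "Speaker B", "price": 900, "score": 6.4},
    {"name": "Speaker C", "price": 1200, "score": 5.2},
    {"name": "Speaker D", "price": 2500, "score": 6.9},
    {"name": "Speaker E", "price": 700, "score": 4.0},
]


def shortlist(speakers, budget, tolerance=1.0):
    """Return in-budget speakers scoring within `tolerance` of the best in-budget score."""
    affordable = [s for s in speakers if s["price"] <= budget]
    if not affordable:
        return []
    best = max(s["score"] for s in affordable)
    close = [s for s in affordable if best - s["score"] <= tolerance]
    # Highest score first; anything on this list is a candidate, not a verdict.
    return sorted(close, key=lambda s: s["score"], reverse=True)


picks = shortlist(speakers, budget=1500)
names = [s["name"] for s in picks]
```

Everything that survives the filter is treated as "in the same ballpark"; choosing among those is left to preference, application, and the full review graphs.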

The score and the review graphs help you narrow things down but that's all they can do. In the end your decision is up to you. :)
 

napilopez

Major Contributor
Forum Donor
Joined
Oct 17, 2018
Messages
2,146
Likes
8,716
Location
NYC
abdo123 said: "Thank you for all your responses, it seems obvious that I have a lot to learn!"

To expand on what some of the others said -- though really, buy Toole's book, it's a gold mine -- and try to summarize the key tenets of the research, here's what you need to know:

The most thorough research we have about the best-sounding speakers in double-blind tests suggests the top performers will have a flattish anechoic on-axis response and smooth directivity behavior.

Good directivity behavior will typically lead to a 'good' predicted in-room response, which should generally tilt downward by about 8-10 dB from 20 Hz to 20 kHz and be free of major deviations.
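
As a rough back-of-the-envelope check (assuming the tilt is spread evenly on a logarithmic frequency axis, which real curves only approximate), the 8-10 dB figure works out to a bit under 1 dB per octave:

```latex
\text{octaves} = \log_2\!\left(\frac{20\,000\ \text{Hz}}{20\ \text{Hz}}\right) = \log_2(1000) \approx 9.97
\qquad
\text{slope} \approx \frac{8\ \text{to}\ 10\ \text{dB}}{9.97\ \text{octaves}} \approx 0.8\ \text{to}\ 1.0\ \text{dB/octave}
```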

But it's crucial to understand that for most speaker designs, if a speaker has a flat on-axis response and good directivity, then a tilted, smooth predicted in-room response results automatically. A speaker should almost never measure perfectly flat (non-tilted) in room.

The predicted in-room response is derived mostly from the off-axis sound of the speaker. It is calculated by applying various weightings to different angles of a speaker's directivity. This is because certain angles are more likely to be reflected early, and therefore have a larger perceptual impact.
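
To make the weighting idea concrete, here is a minimal Python sketch. The 12% / 44% / 44% split between listening window, early reflections, and sound power is the weighting commonly cited for the CTA-2034 estimated in-room response; it is not stated in this thread, so treat it as an assumption. Averaging on squared pressure rather than directly on dB values is also an assumption of this sketch, and the three-bin curves are invented numbers.

```python
import math

# Commonly cited CTA-2034 weights (assumption: not given in the thread).
WEIGHTS = {
    "listening_window": 0.12,
    "early_reflections": 0.44,
    "sound_power": 0.44,
}


def predicted_in_room(curves):
    """Combine three curves (dB SPL per frequency bin) into one PIR estimate.

    The weighted average is taken on squared pressure, then converted
    back to dB (an assumption of this sketch).
    """
    n = len(curves["listening_window"])
    pir = []
    for i in range(n):
        energy = sum(w * 10 ** (curves[name][i] / 10) for name, w in WEIGHTS.items())
        pir.append(10 * math.log10(energy))
    return pir


# Hypothetical three-bin example (low, mid, high frequency):
# the off-axis-dominated curves fall as frequency rises.
example = {
    "listening_window": [85.0, 85.0, 84.5],
    "early_reflections": [84.0, 82.5, 80.0],
    "sound_power": [83.0, 80.0, 76.0],
}
pir = predicted_in_room(example)
```

Because 88% of the weight sits on the two off-axis-dominated curves, the result falls with frequency even though the listening window stays nearly flat, which is the "automatic" tilt described above.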

The preference score then takes both the on-axis and PIR into account. It is the result of a couple of seminal papers in the early 2000s that evaluated dozens of speakers among even more listeners. The ones that performed best had a flat on-axis and smooth, tilted PIR. The score is far from perfect, especially for the individual listener, but it appears to be a useful predictor of preference over a wide range of potential listeners.
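
For reference, the regression model usually quoted from Olive's 2004 paper has the form below. It is reproduced here from memory of that paper rather than from anything in this thread, so check the original for the exact variable definitions and frequency ranges.

```latex
\text{Pref} = 12.69 - 2.49\,\mathrm{NBD_{ON}} - 2.99\,\mathrm{NBD_{PIR}} - 4.31\,\mathrm{LFX} + 2.32\,\mathrm{SM_{PIR}}
```

Here $\mathrm{NBD_{ON}}$ and $\mathrm{NBD_{PIR}}$ are narrow-band deviations of the on-axis and predicted in-room curves, $\mathrm{LFX}$ is a (logarithmic) measure of low-frequency extension, and $\mathrm{SM_{PIR}}$ is the smoothness of the predicted in-room curve. Two of the four variables are computed from the PIR, which is consistent with the point that the score leans heavily on it.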

Both on-axis and PIR are important because in a typical living room setup we are hearing a substantial mix of the direct sound -- the first sound to hit our ears -- and the reflected sound (especially the earliest, single-bounce reflections). Both have a large impact on our perception of tonality.

The direct sound is generally perceptually dominant as it's the first sound to hit our ears. But our ears allow a small window of time in which reflections can actually contribute to the perceived tonality of a speaker. These earliest and loudest reflections essentially get perceptually "summed" to the direct sound. A speaker with bad directivity will have bad reflections, and bad reflections will adversely affect the perceived tonality.

I see it kind of like food with a weird aftertaste. Even if the direct sound/initial bite is good, the reflections/aftertaste can ruin it if not controlled.

There's a lot more nuance to it than the above paragraphs, such as the difference between horizontal and vertical reflections, the fact that nearfield speakers in a treated room are (somewhat) less affected by bad directivity, and the fact that all of the above is somewhat moot if a speaker can't get loud enough for your tastes or doesn't have enough bass for your music. But I hope that clears some of it up.
 

BDWoody

Chief Cat Herder
Moderator
Forum Donor
Joined
Jan 9, 2019
Messages
7,079
Likes
23,522
Location
Mid-Atlantic, USA. (Maryland)
abdo123 said: "That's a very good point, but now my confusion is even bigger. What should people focus on in the speaker review? The predicted in-room response or the anechoic measurements?"

The danger is trying to boil it down like that.

It's like when people only use SINAD to rate DACs. It isn't that simple.

People should focus on understanding what it all means.

The more you understand, the less important that score, or the panther, is.

For those who really aren't interested in learning much and just want someone to tell them if it's ok, the ratings and panthers are a great start, from someone that has nothing to gain from that sale.

However, there is so much more than that buried in all those numbers and graphs, so taking the time to learn what they mean will be a lot more useful to you in the long run.
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,336
Likes
6,705
I think you have a few misunderstandings.

The score is actually based heavily on the predicted in-room response. Some even say that's its biggest problem.

Also, you don't want a flat in-room response. Research shows that most of us prefer a flat anechoic response, not a flat in-room response. A flat in-room response is usually indicative of a speaker that is too bright, and most would not prefer it.
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,336
Likes
6,705
abdo123 said: "Thank you for all your responses, it seems obvious that I have a lot to learn!"

Indeed, but it's great that you recognize that so early :D. As others have already said, reading Floyd Toole's book will give you a great head start when it comes to selecting future loudspeakers you're likely to prefer.

If you don't want to start by buying a book, I would recommend reading this thread. It's a different forum, but still a great read. That's where I started, and that thread convinced me to buy and read Toole's book :). You'll also get to read dissenting opinions in that thread, as well as Toole's direct responses to those.
 

youngho

Senior Member
Joined
Apr 21, 2019
Messages
487
Likes
800
Can anyone point me to a good discussion regarding the limitations of the speaker prediction model if the correlation coefficient is 0.86? What might account for or contribute to the remaining 0.14 (I know that's not really a thing)?

For example, obvious candidates include:

Distortion
Diffraction (I hope I'm not misattributing Kevin Voecks's words when I recall that he assigned some degree of credit to these first two for the Salon 2's superior performance in listening tests)
Directivity (overall amplitude, also relatively constant versus smoothly increasing with frequency)
Individual listener preference ("Listeners with more experience in critical listening prefer less bass and treble than listeners with less experience" from Olive at https://www.listeninc.com/wp/media/Perception_and_-Measurement_of_Headphones_Sean_Olive.pdf; also "the target variations at both ends of the spectrum are substantial, with untrained listeners simply choosing “more of everything”" from Toole at https://www.aes.org/e-lib/online/browse.cfm?elib=17042; also "Ando et al. (2000) found that musicians judge reflections to be about seven times greater than ordinary listeners, meaning that they derive a satisfying amount of spaciousness from reflections at a much lower sound level than ordinary folk" from his book)
Possibly even the listener's task at hand (Toole from above link: "Is this a consequence of the different experimental methods: the different listener tasks? In one, listeners adjusted the bass and/or treble balance in a single loudspeaker model; in the other they rated spectral balances and other attributes in randomized comparisons of different products. It is a subtle but important difference awaiting an explanation" and from his book with comments like "Perhaps the listening circumstances allowed professionals to shift between listening modes—recreational and working (in which they would typically be in a dominant direct sound field).")
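
On the "remaining 0.14": if 0.86 is a Pearson correlation between predicted and measured preference ratings (an assumption here), the usual way to express what is left over is through the squared correlation, not through $1 - r$:

```latex
r = 0.86 \quad\Rightarrow\quad r^2 = 0.86^2 \approx 0.74, \qquad 1 - r^2 \approx 0.26
```

Under that reading, the model accounts for roughly 74% of the variance in the ratings it was fitted to, and about 26% is left for factors like the candidates listed above.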

Thanks!
 

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,939
Location
Oslo, Norway
I agree, that's a great thread. But I do think that it's appropriate to treat some of the claims with caution. IMO Kevin Voecks from Harman makes some claims which are too strong, for example that "there is a universal definition of what sounds good" (referring to these articles: http://www.aes.org/e-lib/browse.cfm?elib=12794 and https://secure.aes.org/forum/pubs/conventions/?elib=12847 )

There are also several studies which in fact show pronounced cross-cultural differences in preference for how sound should be reproduced. Some examples:
"A cross-cultural comparison of preferred spectral balances for headphone-reproduced music"
https://www.jstage.jst.go.jp/article/ast/38/5/38_E1714/_article/-char/ja/

"Do we hear differently? Comparing spatial hearing between East-Asian and North-American listeners"
https://www.jstage.jst.go.jp/article/ast/41/1/41_E19222/_article/-char/ja/

"A cross-cultural comparison of salient perceptual characteristics of height channels for a virtual auditory environment"
https://link.springer.com/article/10.1007/s10055-015-0269-1


There are also a couple of other studies which suggest that young people who have gotten used to low-quality YouTube audio start to develop a preference for that kind of audio quality (I saved them in my library a while back, but couldn't find them now with a quick search. Will try to relocate them).

I do think that these preference studies can provide useful information about what the average listener who is accustomed to a certain kind of music and a certain kind of loudspeaker will prefer, though!
 