
I cannot trust the Harman speaker preference score

Do you value the Harman quality score?

  • 100% yes

  • It is a good metric that helps, but that's all

  • No, I don't

  • I don't have a decision



Newman

Major Contributor
Joined
Jan 6, 2017
Messages
3,520
Likes
4,358
Lemme get this straight.
Because of my comment about tone controls I have an anti-researcher bias?
No. My post was directed to sarumbear, and I needed to include your post in part, so that readers understood what he was saying when he wrote “this is why” at the start of his post. I wasn’t taking issue with you.

I thought it was obvious who I was talking to, since the first line of my post reflected the last line of sarumbear’s post. But since it is not obvious, I have gone back and edited my post for clarity.

Thanks for bringing this up and sorry for the confusion. Cheers
 

Sancus

Major Contributor
Forum Donor
Joined
Nov 30, 2018
Messages
2,926
Likes
7,640
Location
Canada
The problem is that it is not perceptually based and doesn't take into account masking.

While it's indicative of a problem with the speaker, it's not very good at predicting audibility or effect on sound quality. A speaker with higher THD can sound better than one with lower THD, as has been demonstrated by researchers like Alex Voishvillo.

This is a problem that we've struggled with before in various threads, finding a metric that has better correlation with audibility but that isn't too difficult to calculate or visualize.

Are there any newer metrics that you'd suggest as a solution to that problem? I'm aware of e.g. the DS Metric and some others.
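For context on what plain THD actually computes: it is just the ratio of summed harmonic energy to the fundamental, with no frequency weighting or masking model at all. A minimal sketch (hypothetical helper names, assuming a single sine stimulus at f0 and an FFT of the measured output):

```python
import numpy as np

def thd_percent(spectrum, freqs, f0, n_harmonics=10):
    """Classic THD: summed harmonic energy relative to the fundamental.

    Every harmonic counts equally, regardless of whether it would be
    masked by the fundamental or fall outside the audible band, which
    is exactly the limitation discussed above.
    """
    def bin_amplitude(f):
        # magnitude of the FFT bin closest to frequency f
        return np.abs(spectrum[np.argmin(np.abs(freqs - f))])

    fundamental = bin_amplitude(f0)
    harmonics = [bin_amplitude(k * f0)
                 for k in range(2, n_harmonics + 2) if k * f0 <= freqs[-1]]
    return 100.0 * np.sqrt(np.sum(np.square(harmonics))) / fundamental

# e.g. spectrum = np.fft.rfft(recording); freqs = np.fft.rfftfreq(len(recording), 1 / fs)
```

Any perceptually grounded alternative has to add level- and frequency-dependent weighting on top of something like this.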
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
Joined
Jul 31, 2019
Messages
334
Likes
3,065
If this was the contention at the start of this thread, it would have been a very brief set of mutual agreements - nobody is advocating using the predicted preference score as the sole assessment element of speakers. However, what it actually started with was...

...which is incorrect. The preference rating has a meaningful correlation with speaker preference. It's not sufficiently predictive to be the sole parameter we assess. It is, however, also not meaningless.
Curious -- what's with the center rear surround? I was under the impression that a speaker placed directly behind the listener could trigger front/back reversal and basically sound like it was in front.



The value of the score to me has always been the fact that I don't have time to read hundreds of spinoramas. So if I'm looking for a speaker according to some criteria (budget, size, whatever) then the score helps narrow the most likely choices. It doesn't relieve you from the requirement to actually make a choice yourself, or to read spinoramas of closely ranked speakers, or to consider other factors like SPL output. People who compare individual speakers by scores to the decimal point are missing the point entirely. And yes, some people do this on the forum.

But it does save a lot of time, and will save even more when the review list is a thousand speakers, which it will be some day.
It is a reduced set of speakers that we have set up in our Hyperion Lab, where we measure listeners' HRTFs. The room is used to conduct spatial and sound quality assessments of immersive technologies like up-mixers and different music formats, and to compare virtual sources produced by 3D headphones, soundbars, etc. to the actual sources (the loudspeakers in the room). The system provides a ground truth.

Since front-back confusions are common in 3D binaural rendering systems, it's important to have a real source at 0 and 180 degrees. Finally, there are some immersive formats, including MPEG-H 22.2, that include a rear speaker (both middle and upper layers). Usually there are very few front-back reversals in the loudspeaker systems because there are different objects in the speakers, different room reflections, and the listener can move their head to make the localization less ambiguous. The reversal issue is mostly with headphones where no head-tracking is used or individualized HRTFs are not available.


[Image: multichannel speaker layout diagram]
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Controlling variables in experiments is important to make valid conclusions between the measured effects and their causation. Designing tests and measurements to provide sensitive, discriminating and reproducible results is also important. Mono tests provide that. Stereo tests do not.

I understand that, but Harman's research seems to have ditched those variables which it could not (easily) control regardless of their value.

We found that speakers that score well in mono generally score well in stereo, and that there is little to gain from doing routine tests in stereo, and very much to lose. It wasn't a question of taking the "easy way" out. It was a question of making the best decision from a purely scientific rationale based on the data we had. You have a different opinion. I can live with that.

But since people listen to speakers in stereo, one can also interpret the data in the inverse manner: speakers that score well in stereo may not score well in mono (for whatever reason). Since Quad ESL63s are highly regarded when listened to as intended – in stereo – and are reasonably accurate speakers, it is at least plausible that mono listening is introducing a form of bias (my guess is wider directivity).

I am well aware that the bass will change as you move a speaker in a room. No need to send me measurements. I have made thousands myself. So tell me, how would you design a controlled listening test to compare four loudspeakers based on these measurements? What criteria would you use, and wouldn't those criteria be a bias, since you are making assumptions about what the designer intended? How would you account for biases in the blind tests from the listener being provided localization cues (the speakers are in different locations)? If speaker A has better bass in position X, but this produces a different pattern of reflections that puts it at a disadvantage to speaker B in position Y, is it still a valid test? Have you thought about this?

I know that you've made thousands yourself. I just wanted you to know that I am not just making things up.

I have thought about it and I understand how difficult it would be to set up such a test.
But if the testing is not done in stereo then it's best not to do it at all.
 

phoenixdogfan

Major Contributor
Forum Donor
Joined
Nov 6, 2018
Messages
3,333
Likes
5,230
Location
Nashville
It is a reduced set of speakers that we have set up in our Hyperion Lab, where we measure listeners' HRTFs. The room is used to conduct spatial and sound quality assessments of immersive technologies like up-mixers and different music formats, and to compare virtual sources produced by 3D headphones, soundbars, etc. to the actual sources (the loudspeakers in the room). The system provides a ground truth.

Since front-back confusions are common in 3D binaural rendering systems, it's important to have a real source at 0 and 180 degrees. Finally, there are some immersive formats, including MPEG-H 22.2, that include a rear speaker (both middle and upper layers). Usually there are very few front-back reversals in the loudspeaker systems because there are different objects in the speakers, different room reflections, and the listener can move their head to make the localization less ambiguous. The reversal issue is mostly with headphones where no head-tracking is used or individualized HRTFs are not available.


[Image: multichannel speaker layout diagram]
I made a 12.1.10 layout on my Smyth A16 Realizer using my LS 50 Metas for the HRTF measurements. I set the layout up with a center rear channel, and it presents no front-back reversal issues whatsoever.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
The score is somewhat valuable.

What is the value exactly?
It tells you about preference in mono, but in what way is that useful, or worse still, an accurate metric of how speakers would be preferred in stereo?
 

Mad_Economist

Addicted to Fun and Learning
Audio Company
Joined
Nov 29, 2017
Messages
543
Likes
1,618
What is the value exactly?
It tells you about preference in mono, but in what way is that useful, or worse still, an accurate metric of how speakers would be preferred in stereo?
Mono and stereo preference ratings are correlated, which you seem to be willfully ignoring in this thread. I am honestly dubious that you are engaging in good faith.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
How would you account for positional biases in the blind tests from the fact that the listener can now identify the different speakers purely based on different localization cues? And now, by virtue of the different positions, you have also created a different set of reflection patterns that may enhance or degrade the perceived sound quality of the loudspeaker. How would you deal with that?

It seems to me that your proposed methodology would have so many uncontrolled experimental biases it would never get past the first stage of scientific peer review -- but that's just my opinion.

So instead of dealing with all those (perhaps insurmountable) issues, you chose to avoid them and set up a test that pretends they don't exist in real life.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Mono and stereo preference ratings are correlated, which you seem to be willfully ignoring in this thread. I am honestly dubious that you are engaging in good faith.

Nice try. Except they aren't:

[Image: graph of mono vs. stereo loudspeaker preference ratings]
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
Joined
Jul 31, 2019
Messages
334
Likes
3,065
Curious -- what's with the center rear surround? I was under the impression that a speaker placed directly behind the listener could trigger front/back reversal and basically sound like it was in front.



The value of the score to me has always been the fact that I don't have time to read hundreds of spinoramas. So if I'm looking for a speaker according to some criteria (budget, size, whatever) then the score helps narrow the most likely choices. It doesn't relieve you from the requirement to actually make a choice yourself, or to read spinoramas of closely ranked speakers, or to consider other factors like SPL output. People who compare individual speakers by scores to the decimal point are missing the point entirely. And yes, some people do this on the forum.

But it does save a lot of time, and will save even more when the review list is a thousand speakers, which it will be some day.
Exactly. The statistical confidence of the predictions does not allow comparisons finer than 0.5 to 1 point, so it is moot to go down to one decimal place.

I wish I had the time, money and support to improve the model, but there is little interest outside this thread.
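For anyone who hasn't seen where the number comes from: the published model (Olive, 2004) is a linear combination of four metrics derived from the spinorama. A minimal sketch with the published coefficients; computing the four inputs from the spinorama curves is assumed to happen elsewhere:

```python
def olive_preference_score(nbd_on, nbd_pir, lfx, sm_pir):
    """Predicted preference rating from the 2004 regression model.

    nbd_on  -- narrow-band deviation of the on-axis response
    nbd_pir -- narrow-band deviation of the predicted in-room response
    lfx     -- log10 of the low-frequency extension in Hz
    sm_pir  -- smoothness (r^2) of the predicted in-room response
    """
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir
```

With a prediction uncertainty of roughly 0.5 to 1 point, differences in the first decimal place carry no information, which is the point being made above.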
 

Mad_Economist

Addicted to Fun and Learning
Audio Company
Joined
Nov 29, 2017
Messages
543
Likes
1,618
Nice try. Except they aren't:

[Image: graph of mono vs. stereo loudspeaker preference ratings]
Except they are. The hierarchy of preference is the same in both conditions - the Quad speakers, being dipolar, are an interesting case where the preference score typically falls short, but they remain least preferred in both monophonic and stereophonic listening.
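One way to make the "same hierarchy" claim concrete is a rank correlation: if the ordering of the speakers is preserved between mono and stereo sessions, Spearman's rho is high even when the absolute ratings shift. A quick sketch with made-up ratings for illustration (not Harman data):

```python
from scipy.stats import spearmanr

# Hypothetical mean preference ratings for four speakers (illustrative only)
mono_ratings = [6.8, 5.9, 5.1, 3.2]
stereo_ratings = [6.2, 5.8, 5.4, 4.0]

rho, p = spearmanr(mono_ratings, stereo_ratings)
print(f"Spearman rho = {rho:.2f}")  # 1.00 here: the rank order is identical
```

In this illustrative case the absolute numbers compress in stereo, but the ranking, which is what the score is used for, stays put.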
 

Mad_Economist

Addicted to Fun and Learning
Audio Company
Joined
Nov 29, 2017
Messages
543
Likes
1,618
So instead of dealing with all those (perhaps insurmountable) issues, you chose to avoid them and set up a test that pretends they don't exist in real life.
Your complaints, again, seem to amount to "this is imperfect". We all agree: it is imperfect. All models are imperfect, that is why they are models.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Except they are. The hierarchy of preference is the same in both conditions - the Quad speakers, being dipolar, are an interesting case where the preference score typically falls short, but they remain least preferred in both monophonic and stereophonic listening.

I am honestly dubious that you are engaging in good faith.
 

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Your complaints, again, seem to amount to "this is imperfect". We all agree: it is imperfect. All models are imperfect, that is why they are models.

The methodology is unfit for purpose and the test is based on an assumption I disagree with.
 

Mad_Economist

Addicted to Fun and Learning
Audio Company
Joined
Nov 29, 2017
Messages
543
Likes
1,618
I am honestly dubious that you are engaging in good faith.
Your sarcasm does little to dispel my impressions here.
The methodology is unfit for purpose and the test is based on an assumption I disagree with.
What level of predictive capability would you require to verify that a model is fit for purpose? Or do you work backwards from the number of variables included to determine validity, regardless of the actual correlation of the model's outputs with results?
 

phoenixdogfan

Major Contributor
Forum Donor
Joined
Nov 6, 2018
Messages
3,333
Likes
5,230
Location
Nashville
I've always seen the preference score as a starting point. It seems heavily weighted toward smooth directivity and flat anechoic FR. Those two items are essential to finding great speakers, but they don't speak to the speaker's power handling or distortion characteristics--both of which are important as well. As a result I see the score as an initial filter, but hardly the last word.

A completely accurate unitary measurement would have to weigh all the factors (directivity, FR, power handling, distortion), create a score for each, and then combine them into a weighted average that exactly matched the utility curve of the prospective user. Since it's doubtful every user has the same utility curve, a universally valid unitary measure is probably impossible.
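To put the utility-curve point in concrete terms, a single number only works if everyone shares the same weights, which they don't. A toy sketch (the attribute scores and weights below are made up for illustration):

```python
def weighted_score(sub_scores, weights):
    """Combine per-attribute scores using listener-specific weights."""
    total = sum(weights.values())
    return sum(sub_scores[k] * weights[k] for k in sub_scores) / total

# One hypothetical speaker, scored 0-10 on four attributes
speaker = {"directivity": 8.0, "flatness": 9.0, "max_spl": 5.0, "distortion": 7.0}

# Two hypothetical listeners with different utility curves
party_listener = {"directivity": 1.0, "flatness": 1.0, "max_spl": 3.0, "distortion": 2.0}
nearfield_listener = {"directivity": 2.0, "flatness": 3.0, "max_spl": 0.5, "distortion": 1.0}

print(round(weighted_score(speaker, party_listener), 2))      # 6.57: SPL limits hurt it
print(round(weighted_score(speaker, nearfield_listener), 2))  # 8.08: SPL barely matters
```

Same speaker, two defensible scores; that is why a universally valid unitary measure is unlikely.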
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
Joined
Jul 31, 2019
Messages
334
Likes
3,065
So instead of dealing with all those (perhaps insurmountable) issues, you chose to avoid them and set up a test that pretends they don't exist in real life.
You are the one who proposed the new speaker test methodologies, not me. They are your issues, not mine. You solve them.

Seems like you are taking the easy way out by complaining about our test methods but coming up with no good scientific alternatives.
 

ROOSKIE

Major Contributor
Joined
Feb 27, 2020
Messages
1,936
Likes
3,525
Location
Minneapolis
This is a problem that we've struggled with before in various threads, finding a metric that has better correlation with audibility but that isn't too difficult to calculate or visualize.

Are there any newer metrics that you'd suggest as a solution to that problem? I'm aware of e.g. the DS Metric and some others.

I think we have to listen, which is why I really want @amirm and Erin to keep up the subjective end on speaker reviews.

I used to go to a family oriented Sunday morning DJ curated dance party.

At some point one speaker blew a single woofer (out of a dozen or so speakers in the studio).
I noticed it right away (I was fairly near it), and yet not one single other person (out of maybe 50-75) seemed to notice until I pointed it out.
Even after I pointed it out, some had trouble hearing it or believing me. I did not find it that subtle, and doubt many here would either.
Who knows why it was so well masked for so many; electronic music is crazy stuff, though.

Anyway, just saying: distortion is a personal thing.

I have had to listen to each speaker. I do like to rock out regularly, and some speakers just can and others can't (with many in the middle, 'cept the middle is no place for me), so I have had to identify personally, through listening, which sets will handle the louder times well and do so in a way that truly pleases me. It needs to sound better and better when loud, not just able to get there on a meter, and the dynamics must flow freely without any sense of compression. Some of EAC's measurements, I think, allude to these parameters at least a bit, though I don't know; listening is the thing here.
I don't have golden ears, but I do think I am fairly sensitive to the onset of stress in a speaker, so I go with my assessment.

If one doesn't listen loud, this question seems almost irrelevant.
Very few speakers have stress and distortion issues at medium and lower volumes, and Mr. Olive stated here again that at 80 dB averages the tests were not really affected by these issues.
My fairly regular 90 dB average listening sessions are affected, though, and anyone who goes louder still (I sometimes do some real rocking) should definitely test in person in their space.
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
Joined
Jul 31, 2019
Messages
334
Likes
3,065
This is a problem that we've struggled with before in various threads, finding a metric that has better correlation with audibility but that isn't too difficult to calculate or visualize.

Are there any newer metrics that you'd suggest as a solution to that problem? I'm aware of e.g. the DS Metric and some others.
Thanks for the reference. I wasn't aware of this thesis (2007), but the metric produces high correlations (r = 0.95) with listening test data compared to THD metrics. Other metrics are from Earl Geddes (GedLee) and Moore et al., and there may be others.

We did some listening tests on headphones using different distortion metrics and found that applying Non-Coherence from SoundCheck correlated best with the listening test results.
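For readers unfamiliar with the measurement: non-coherence is, loosely, the fraction of the output that is not linearly related to the stimulus, so distortion and noise of any order are lumped together. A rough sketch of the idea using magnitude-squared coherence (an approximation for illustration, not Listen's actual SoundCheck algorithm):

```python
from scipy.signal import coherence

def non_coherence(stimulus, response, fs, nperseg=4096):
    """Estimate the non-coherent (distortion + noise) fraction vs. frequency.

    Coherence near 1 means the response is linearly predictable from the
    stimulus; 1 - coherence is the remainder that is not.
    """
    freqs, coh = coherence(stimulus, response, fs=fs, nperseg=nperseg)
    return freqs, 1.0 - coh
```

The appeal is that it does not rely on counting individual harmonics, so it can be applied to music-like stimuli as well as sine sweeps.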
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
Joined
Jul 31, 2019
Messages
334
Likes
3,065
Nie try. Except they aren't:

[Image: graph of mono vs. stereo loudspeaker preference ratings]
You seem to be focused on one graph and one data point, and that is your only argument. Not a compelling argument.

Read the entire paper again, or have a conversation with Floyd Toole, as I did yesterday. The data between mono and stereo are highly correlated. The exception is one speaker whose spatial ratings varied significantly in stereo depending on the music program. No confidence intervals or statistical tests were done. Are they statistically significant differences? In the case of the stereo test, I would guess no.
 