The frailty of Sighted Listening Tests

patate91 · Aug 14, 2020

@whazzup I recommend that you have a look at thé Erin's web site, here's a good exemple

https://www.erinsaudiocorner.com/loudspeakers/micca_mb42xiii/

whazzup · Aug 14, 2020

patate91 said:
@whazzup I recommend that you have a look at thé Erin's web site, here's a good exemple

https://www.erinsaudiocorner.com/loudspeakers/micca_mb42xiii/

And the point that you want to make is...? You like Erin better than Amir because he writes more words...? You'll donate money to Erin and not to Amir?

patate91 · Aug 14, 2020

whazzup said:
And the point that you want to make is...?

That's a good exemple of a sighted listening test done with a will to minimise biais (reviewer and reader).

preload · Aug 14, 2020

Rusty Shackleford said:
I think @bobbooo did a thorough job of addressing these issues. However, you’re adding the conclusion that if trained listeners can’t perform as well sighted as blind then the training is of no value and, further, sighted listening is of no value. That’s a straw man. No one is arguing that. This isn’t a binary choice between “perfect” and “worthless.”

Exactly the issue I had. I actually replied that I thought it was a strawman as well, but I edited it out. Thank you @Rusty Shackleford for calling it out.

There are two completely different issues here. Let's not obfuscate them.
#1 is whether being trained makes you less biased or more accurate when providing sighted listening impressions.
#2 is whether it's possible for sighted listening impressions of loudspeakers to provide useful information (albeit not as accurate/reliable as a blinded test)

#1 is still up for debate, from my perspective.
#2 has already been addressed by myself and others, including an evaluation of published evidence, and confirmed by Sean Olive.

whazzup · Aug 14, 2020

patate91 said:
That's a good exemple of a sighted listening test done with a will to minimise biais (reviewer and reader).

How did Erin minimise the bias for sighted listening? Because he spends more time talking about his subjective / professional opinions versus the objective data, and Amir didn't?

whazzup · Aug 14, 2020

preload said:
Exactly the issue I had. I actually replied that I thought it was a strawman as well, but I edited it out. Thank you @Rusty Shackleford for calling it out.

There are two completely different issues here. Let's not obfuscate them.
#1 is whether being trained makes you less biased or more accurate when providing sighted listening impressions.
#2 is whether it's possible for sighted listening impressions of loudspeakers to provide useful information (albeit not as accurate/reliable as a blinded test)

#1 is still up for debate, from my perspective.
#2 has already been addressed by myself and others, including an evaluation of published evidence, and confirmed by Sean Olive.

Great~ no issue with being corrected.

So most if not all, are in agreement that 'it's possible for sighted listening impressions of loudspeakers to provide useful information (albeit not as accurate/reliable as a blinded test)'.

Now the process of training a listener and his/her daily work is the one under question?

And is there a consensus on whether the 'experienced listeners' in cited studies are or are not representive of the abilities of a 'critical listener'? Because we can also run rings around this if everyone has a different view of who they represent.

patate91 · Aug 14, 2020

whazzup said:
How did Erin minimise the bias for sighted listening? Because he spends more time talking about his subjective / professional opinions versus the objective data, and Amir didn't?

I think you'll have to read studies about biais : what they are, what they do, etc. There's also a lot of vulgarisation on YouTube (quality varies a lot), at least in french.

whazzup · Aug 14, 2020

patate91 said:
I think you'll have to read studies about biais : what they are, what they do, etc. There's also a lot of vulgarisation on YouTube (quality varies a lot), at least in french.

Aww...now you're being vague....you said it's a good example but cannot point out what's good about it?

Nowhere in the review he mentioned he did blind testing. So the only difference is he listened to it before doing the measurements? And that is good enough for you? So we're only talking about the bias of knowing the measurements before the listening tests? That is the difference between Amir and Erin?

aarons915 · Aug 14, 2020

preload said:
Exactly the issue I had. I actually replied that I thought it was a strawman as well, but I edited it out. Thank you @Rusty Shackleford for calling it out.

There are two completely different issues here. Let's not obfuscate them.
#1 is whether being trained makes you less biased or more accurate when providing sighted listening impressions.
#2 is whether it's possible for sighted listening impressions of loudspeakers to provide useful information (albeit not as accurate/reliable as a blinded test)

#1 is still up for debate, from my perspective.
#2 has already been addressed by myself and others, including an evaluation of published evidence, and confirmed by Sean Olive.

I think most agree with #2 and sighted impressions can be valuable but as far as #1, the Harman training makes no claim that it reduces sighted bias, the study they conducted shows that bias affected the experienced listeners more than the inexperienced listeners. Finally, Harman to this day still conducts blind testing with their trained listeners, there would be no need to do that if they felt the trained listeners wouldn't be biased in any way.

Dr. Olive's posts made all of this pretty clear in my opinion so since blind testing isn't feasible at this time, the main takeaway of the thread is that the CTA-2034 measurements are king followed by the subjective impressions.

patate91 · Aug 14, 2020

whazzup said:
Aww...now you're being vague....you said it's a good example but cannot point out what's good about it?

Nowhere in the review he mentioned he did blind testing. So the only difference is he listened to it before doing the measurements? And that is good enough for you? So we're only talking about the bias of knowing the measurements before the listening tests? That is the difference between Amir and Erin?

I'm not vague, it seems that your knowledge on the subject are limited.

On Erin's site the visual is different too. Again I think your knowledge are limited. The subject is very interesting and it changed a lot of things in my life

whazzup · Aug 14, 2020

patate91 said:
I'm not vague, it seems that your knowledge on the subject are limited.

On Erin's site the visual is different too. Again I think your knowledge are limited. The subject is very interesting and it changed a lot of things in my life

Nah, I read through some of your posts on the SVS thread again. It appears you're just unhappy Amir didn't do more work to relate his subjective opinions to the data and Erin did. So it's really not about the validity of 'sighted testing'. You just want people to hand you the knowledge.

Sure, I have limited knowledge in a lot of areas. That's why I ask simple, straightforward questions to get straightforward answers.

patate91 · Aug 14, 2020

whazzup said:
Nah, I read through some of your posts on the SVS thread again. It appears you're just unhappy Amir didn't do more work to relate his subjective opinions to the data and Erin did. So it's really not about the validity of 'sighted testing'. You just want people to hand you the knowledge.

Sure, I have limited knowledge in a lot of areas. That's why I ask simple, straightforward questions to get straightforward answers.

I and other people have anwser to your questions more than once. You don't accept them or you need to learn a little bit more on the subject in order to accept them.

Rusty Shackleford · Aug 14, 2020

whazzup said:
So what is YOUR interpretation of the role of a critical listener and how much weight do you place on their sighted evaluations?

Sure, you can disregard my questions 2a/2b, but still, would you care to address questions 1 and 3? Your interpretation of the study and Olive's words, of course.

As I’ve said before, I’m agnostic on many of these issue. All I’m looking for is clarity and consistency.

To be clear, I think training absolutely has value. I think people (if not all people) can be trained to be more discerning listeners and, as the studies show, that this means they will be more consistent and articulate in their feedback.

Do I think that training will allow a sighted listener to hear two speakers that measure very similarly weeks apart and render an accurate better/worse judgment? I’m still not sure.

The one point I’ve been trying to make, and where this debate seems to have run aground, is that whatever we say about critical trained listeners (or whatever term we want to use) cannot apply only to Amir. We now have Harman’s definition of a trained listener. In addition to How to Listen, we also have Sound Gym and countless other apps. Moreover, many audio-related college degrees include critical listening courses. (See Jason Corey’s excellent work.) Not to mention all the many audio engineers and others who make their living based on their listening skills.

However, in this thread and elsewhere (believe it or not, he participated in a thread on this very Olive article elsewhere years ago and made many of the same points), Amir has said that, whatever definition of critical/trained listener we come up with, research based on those definitions don’t apply to him, because his skills are superlative. Given that, there’s no research that will settle this debate.

Thomas savage · Aug 14, 2020

Rusty Shackleford said:
Amir has said that, whatever definition of critical/trained listener we come up with, research based on those definitions don’t apply to him, because his skills are superlative. Given that, there’s no research that will settle this debate.

Not my understanding of what Amirm has said repeatedly in this thread .

But hayho.

Rusty Shackleford · Aug 14, 2020

Thomas savage said:
Not my understanding of what Amirm has said repeatedly in this thread .

But hayho.

Amir wrote: “At the end of the day, you can't know the limits or effectiveness of what I do. So best to take a back seat and not try to pontificate based on studies that I keep telling you does not read on this situation.”

If he wants to clarify that Harman trained listeners are his equals, then that will clarify things and negate what I said. Otherwise, if we “can’t know the limits or effectiveness” of his skills, it seems it’s pointless to bring research to bear on this question.

krabapple · Aug 14, 2020

amirm said:
Whether you do a test blind or sighted, there is no guarantee of correctness. Every test has a margin of error. Turn a sub on and off in your room. Do you need a double blind test to trust what it does in your room? No. A blind test would generate the same as sighted test.

Sightest tests have higher error rate than blind tests, all else being equal. But they are also extremely fast and low effort. For this reason, the industry uses them as quick tests and then perform occasional double blind test as a backstop. No different than a lab test at a doctor's office which is quick but has lower accuracy than one sent to an external lab.

Again, look at Olive research:

Ranking of speakers G, D and T did not change in sighted versus blind. Only speaker S changed.

Best engineering is about optimization and getting 90% right for 10% of the effort/cost. We are not purists here with infinite budget and time to double blind tests speakers.

If we could show sighted tests to be mostly wrong as they are in electronics, then sure, we would not attempt it. But reality is that difference between speakers is quite large and is able to overpower listener bias when said listener a) has no stake in the outcome and b) has better thresholds of detections of impairments.

As I have said, we do this in the industry all the time where the outcome really matters. Jobs and company reputations are at stake. Yet we do it because the risk/reward is appropriate.

Apart from this being a test of four, and only four, loudspeakers, simple 'ranking' is a crude measure. Alternately, a takeaway from that graph is that blinding the test reduced the strength of preference for the first two and dislike for the third, enough that the four actually become remarkably similar to each other, preference-wise. In fact the error bars for A ,B,C,D all appear to overlap once the test is blinded.

A point *against* sighted testing.

You have said that there are perhaps at best a few hundred 'trained listeners' . I wanted to know when and how we are to know who is a 'trained listener' and what degree a faith we are expected to put in their sighted evaluations.

You are now adding the claim that the (undisputed) 'quite large' differences between speakers 'overpowers' cognitive bias when the listener 1) has 'no stake' (which strikes me as problematic: one element of cognitive bias is that it is not necessarily conscious; that the listener *feels* they are stake-free is not sufficient) and 2) is good at detecting impairments -- presumably either a native talent , or a product of....listener training?

I don't think clarity has been achieved here. And I don't give a tinker's damn for the excuse that the 'industry' takes shortcuts because it lacks infinite money etc...I want to know what the value to consumers is, of any given person'ts 'sighted' report of loudspeaker quality. If there are in fact only 'a few hundred if that' trained listeners in the world, the safe presumption is that a random person's sighted report, such as encountered endlessly in audiophile forums and publications, and of course loudspeaker marketing, is worthless.

whazzup · Aug 14, 2020

aarons915 said:
...as far as #1, the Harman training makes no claim that it reduces sighted bias, the study they conducted shows that bias affected the experienced listeners more than the inexperienced listeners. Finally, Harman to this day still conducts blind testing with their trained listeners, there would be no need to do that if they felt the trained listeners wouldn't be biased in any way.

Dr. Olive's posts made all of this pretty clear in my opinion so since blind testing isn't feasible at this time, the main takeaway of the thread is that the CTA-2034 measurements are king followed by the subjective impressions.

Unfortunately, it is unclear whether experienced listeners (in study) = critical listeners.
Although Olive has also mentioned that it is incorrect to generalize the results of this study to all sighted / blind testing and experienced / untrained listeners. And this is also a point that Amir has pointed out.

It's true the theoretical part of harman's training (that's online) makes no claim of reducing sighted bias. But again, no one disputes that blind testing is less biased, and Harman is still using blind testing. The issue is how close sighted testing can get to blind testing, especially when done by critical listeners.

Sean Olive said:
We define a trained listener as someone who has normal audiometric hearing and has passed Level 8 or higher in our How to Listen Training software. We also look at their performance in actual tests in terms of how discriminating and consistent they are in rating products.

Sean Olive said:
And yes, achieving level 10 in How to Listen doesn't guarantee you will be good at evaluating loudspeakers, which is why we monitor your performance in this task.. But my experience indicates that the training combined with actual experience in tests tests to correlate with performance in tests.

Just the software training is insufficient. Active supervision is required to decide the real capability of the listener in training. Unfortunately we have no clue how this is determined.

Sean Olive said:
That is a difficult question to answer. How can you test the reliability of a sighted tesr? How do you test reliability other than repeating the evaluation ? As we've shown even trained listeners are biased by price, brand, design, etc. I would certainly want to see measurements that reinforce what the listener is reporting.

There's an inherent difficulty in validating sighted testing. So it falls on measurements as evidence. Amir has mentioned multiple examples where he did sighted fault finding and his team subsequently validated that with measurements.

Erin has been doing sighted testing too with objective measurements.

So......Can we trust both of them? Or none of them?

And 4 questions to ask Olive (if he's willing to descend on this mess again) would be:

1. How far away are the 'experienced listeners' (in the study) from actual 'critical listeners' (professionals who deal with audio evaluation / fault finding)?
2. How do they decide the young Jedi / listener is now ready? By consistently finding faults in speakers? Sighted / Blind / Both?
3. Does the training in its entirety (~8 months as mentioned in article) reduce sighted bias, or have any specific portions that seek to reduce sighted bias?
4. In practice, how much of the training / tests conducted are done sighted, and how much blind?

krabapple · Aug 14, 2020

MattHooper said:
Now, as applied to the subjective review crowd in general, I can see a protest "We are fine provisionally accepting sighted speaker impressions from Amir (a "trained listener") but forget about the stereophile/absolute sound, untrained audiophiles and all the other riff-raff. That stuff is useless."

I find it so. Except as anthropology.

Except I don't find it to be useless. When I encounter or audition a new speaker I often look up what reviewers and other audiophiles are saying about it. I don't care much about reading someone's emotional reaction "I was swept away...blah, blah..." I pay attention to whether the person is characterizing the sound, and how well they do it. When I see a consensus happening on the general character of a speaker, I most often find it to be "accurate" to my own impressions of that speaker. (Sometimes this is when I've heard the speaker after being intrigued by the descriptions/reviews I'm reading, but often enough I'm looking for these subjective impressions after I first heard the speaker and formed my own impression). When I find a reviewer who seems to be pretty accurate in this way and/or whose tastes I have divined over time, I can find their subjective take on a speaker to be somewhat informative.

Maybe you 'often find' that because you tend to remember when it happens, more than when it does not. That too, is a form of bias.

Really, there are excellent, excellent reasons to require blind protocols.

Sean Olive, a Harman trained listener, sometimes uses sighted tests...but is well aware of the limitations and biases involved. One presumes that when he makes a claim from a sighted trial, he acknowledge those caveats. I would like to see how he does that. They could be a model for Amir's sighted reports here, for example.

whazzup · Aug 14, 2020

krabapple said:
Apart from this being a test of four, and only four, loudspeakers, simple 'ranking' is a crude measure. Alternately, a takeaway from that graph is that blinding the test reduced the strength of preference for the first two and dislike for the third, enough that the four actually become remarkably similar to each other, preference-wise. In fact the error bars for A ,B,C,D all appear to overlap once the test is blinded.

A point *against* sighted testing.

You have said that there are perhaps at best a few hundred 'trained listeners' . I wanted to know when and how we are to know who is a 'trained listener' and what degree a faith we are expected to put in their sighted evaluations.

You are now adding the claim that the (undisputed) 'quite large' differences between speakers 'overpowers' cognitive bias when the listener 1) has 'no stake' (which strikes me as problematic: one element of cognitive bias is that it is not necessarily conscious; that the listener *feels* they are stake-free is not sufficient) and 2) is good at detecting impairments -- presumably either a native talent , or a product of....listener training?

I don't think clarity has been achieved here. And I don't give a tinker's damn for the excuse that the 'industry' takes shortcuts because it lacks infinite money etc...I want to know what the value to consumers is, of any given person'ts 'sighted' report of loudspeaker quality. If there are in fact only 'a few hundred if that' trained listeners in the world, the safe presumption is that a random person's sighted report, such as encountered endlessly in audiophile forums and publications, and of course loudspeaker marketing, is worthless.

Maybe it's easier to start with answering:
Do you consider Amir and Erin's reviews as 'random person's worthless sighted reports'? If not, is it because they have objective data?

And if they did not have the measurements, you'll discount them as worthless too, is that correct?

amirm · Aug 14, 2020

krabapple said:
You have said that there are perhaps at best a few hundred 'trained listeners' . I wanted to know when and how we are to know who is a 'trained listener' and what degree a faith we are expected to put in their sighted evaluations.

First, you don't have to put faith in me. You all have to decide what you put your faith in. For me, Sean said it best:

Sean Olive said:
In ranking objective/subjective measurements in terms of how reliable and trustworthy test they are I would say:
#1 A well-controlled double-blind listening test.
#2 Meaningful Objective Measurements that Predict #1
#3 A sighted listening test.

Shortly after I started to measure and listen to speakers, I realized the truth in the above list. That no matter how good I thought spinorama measurements were, it could not fully predict my preference. Simple things like how loud could play is not in there. Nor would that have been something that is tested in research. I put lower scores for exceptionally measuring Neumann KH80 DSP for that reason. Ditto for a Genelec. Both of these were sighted observations. Are you going to sit there and say my listening tests that caused me to hear static out of the Genelec when I turned it up is not valid and should have been performed blind?

In other areas such as distortion, you are now in my wheelhouse. I have shown my skills in this area many times including in challenges from likes of you and late Arny Krueger. That is my resume. There is no such proof for the listeners in Harman studies. Trained in that context meant tonality of speakers, not ability to hear very small impairments.

You have this incredulity about testing electronics sighted that while I understand and agree with, is being misapplied to speakers. Take this comment from Arny who as you know, advocated blind tests more than anybody:

Ironically I think he is wrong about the above. Individual acoustic products make such small difference that placebo plays a strong role in people thinking they are doing good. But we digress.

Bottom line: our measurements are powerful predictor of listener preference. They are unfortunately not quite perfect so I listen and provide my brief feedback. This does not at all shake the tree of blind tests being good and necessary. It simply completes a picture that needs completing. That's all.

The frailty of Sighted Listening Tests

Active Member

Addicted to Fun and Learning

Active Member

Major Contributor

Addicted to Fun and Learning

Addicted to Fun and Learning

Active Member

Addicted to Fun and Learning

Addicted to Fun and Learning

Active Member

Addicted to Fun and Learning

Active Member

Active Member

Grand Contributor

Active Member

Major Contributor

Addicted to Fun and Learning

Major Contributor

Addicted to Fun and Learning

Founder/Admin

Similar threads