The frailty of Sighted Listening Tests

Coach_Kaarlo · Aug 14, 2020

bobbooo said:
I think there’s been some repeated misunderstandings / misrepresentations of each side of the debate by the other on here. These are the two opposing hypotheses/claims under debate:

Experienced/trained listeners are no less susceptible to sighted bias than average

Experienced/trained listeners are less susceptible to sighted bias than average

Drs Toole and Olive’s study has been cited in support of claim 1. Below are the results from the paper that can be used to compare experienced listeners' preference ratings to the average for experienced and inexperienced listeners as a whole, for blind and sighted tests. Note: the speaker ratings are likely naturally compressed due to listeners' contraction bias (not using extreme ends of the scale), which is common in subjective evaluations. Rescaling the rating axis from 4/5 to 8/9 is simply done to make the data more readable, and visually correct for this contraction bias, so there's no conspiracy there.

Average for experienced and inexperienced listeners (same data as the first graph in Sean Olive’s blog often reproduced on here, but in a different format):
View attachment 77959

Experienced listeners (the more pertinent graph to this discussion, which I don't think has been discussed yet):
View attachment 77960

So for experienced and inexperienced listeners as a whole shown in the first graph, on average only the preference order of speakers S and T changed places between sighted and blind listening, but for experienced listeners only, the preference order completely changed - when listened sighted it was: D, G, T, S, whereas the preference order during blind listening was: S, D, T, G. The difference in score between the blind and sighted ratings given for the same speaker by the experienced listeners is also larger on average than this difference for all listeners. This suggests the experienced listeners were at least equally (if not more) affected by sighted bias than the average of all types of listeners.

The study also compared how sensitive the listeners were to changing acoustic variables in sighted and blind listening, in this case two speaker positions, 1 and 2.

Average for experienced and inexperienced listeners:
View attachment 77962

Experienced listeners:
View attachment 77963

Both graphs show speaker location had a strong influence on preference when blind, yet little effect when sighted, again showing that experienced listeners are just as affected by sighted bias as all listeners are, which in this case deafens ('blinds' ) them to actual acoustic changes caused by speaker positioning, which they recognised fine during actual blind listening. All these results support claim 1 above.

Now, from what I can tell, the two main objections to the study seem to be:

(a) The listeners’ bias is not representative of and much greater than Amir’s possible bias, due to them being Harman employees and three of the four speakers being Harman brands
(b) The study's definition of experienced/trained listener is too inclusive

Starting with objection (a), I think @preload made some great points here. Simply investing a large amount of money in, owning and very much liking a brand’s products and design philosophy can in itself foster a subconscious brand loyalty and so cognitive bias. Sure this would likely not be as much as the bias the Harman employees had for their own speakers, but there are all the other possible biases @Sean Olive mentioned that are still on the table and common possibilities to all sighted listening tests. Even if objection (a) is valid, and an extreme position is maintained that the only valid results are for the non-Harman speaker ‘T’, the last graph above showing experienced listener ratings does show a significant change in rating given for this speaker in ‘position 2’ between sighted and blind, shifting it from being ranked third sighted, to last blind, which notably runs counter to any possible bias against speaker T due to it being a rival brand, suggesting the remaining biases, that are common to all sighted listening tests, play a relatively large role. The graph above for all listeners shows the same shift in ranking of speaker T in position 2 from sighted to blind, and a similar change in rating (again less than for experienced listeners), echoing the results from the first two graphs of this post, again showing the experienced listeners were at least equally if not more affected by sighted bias, even when listening to a speaker they had no vested interest in.

So what about objection (b)? Here's how experienced/inexperienced listeners are defined in the study (my emphasis):

The bolded parts imply an experienced listener is one who has had at least critical listening experience and controlled listening test experience. This doesn't sound too inclusive to me. And even if it is, and doesn't meet the requirements of a 'highly experienced/trained' listener (whatever they are), it makes sense that this experience is a continuum of ability, which would mean at worst the study is suggestive evidence that even highly experienced listeners are also no less susceptible to sighted bias than others (claim 1). What scientific research is there in evidence of the opposing claim 2 at the beginning of this post, that experienced listeners are less susceptible to sighted bias? If there is none, then claim 1 is on stronger ground. If you take the extreme (and I'd say irrational) view that this study contains zero evidence for claim 1, then the two claims are on equal footing, and you should remain agnostic. The fact remains that claim 2 is a claim of exception however, that goes against not only this study, but cognitive science as well - I'm not aware of any scientific studies showing sighted biases can be noticeably reduced through knowledge of them and training. In fact, this would be a prime example of the (ridiculously named, but very real) G.I. Joe fallacy. When it comes to cognitive biases, knowing really isn't half the battle - in fact it's not even close:

It should be noted that as Sean said here, Harman now have a more exacting definition of a trained listener – passing level 8 or higher in their How to Listen software, with normal audiometric hearing, and showing good discrimination and consistency in their sound ratings. I believe Amir has said he reached level 5/6 (still much better than audio dealers who only passed level 3), and I presume ‘normal’ hearing precludes people with notable presbycusis that can start to become significant in terms of sound judgement variability after around age 50 (as Floyd Toole has humbly described with reference to his own hearing and I mentioned in this post). 'Normal hearing' would obviously also preclude those with notable NIHL which could occur due to such activities as, ahem, routinely listening to headphones at ‘earlobe resonating' volumes . Of course, Amir has specific training in identifying small lossy digital compression artefacts (I believe primarily via IEMs/headphones, speakers being notoriously harder to hear sound imperfections with), but the relevance of this specific skill to discerning differences in speakers’ acoustic attributes at normal listening volumes and distances, and to what extent if any this skill could balance out the high stipulations for a Harman trained listener above is debateable.

But the bigger picture here is that sighted bias is just the tip of the iceberg in terms of the nuisance variables needed to be controlled for listening tests to be useful in drawing reliable conclusions. Some of these have been controlled for here, but there are major exceptions in addition to standard sighted bias: measurement bias (from seeing the spinorama before listening), no level-matching, and no instantaneous A/B switching (instead mostly comparing speakers over days, weeks and months, relying on long-term auditory memory which is notoriously unreliable). And this isn’t even considering the fact that this is a single listener whose perceptions are not as generalizable as a collection of listeners, or any of the other methodological controls put in place in a scientifically controlled double-blind study Sean mentioned here. The gulf between those studies and the listening tests here really is huge.

Please note: this post is in no way either an attack on Amir, or a demand (or even a request) to change his listening methodology (this would obviously be impractical for one person and especially during a pandemic, and he's doing all of this for free so I would never demand anything). I don’t think anyone else is taking these positions either, and of course we are all incredibly grateful for the frankly mind-boggling amount of work he’s put in to this project. However, it has been claimed that the subjective impressions are ‘data’, from which conclusions can be drawn about the accuracy and validity of Sean Olive’s speaker preference rating formula. If this is the case, this necessitates the same analysis and scrutiny of the ‘measuring instrument’ and method of data collection as has been exacted on the Klippel NFS data. If this is objected to or ignored, then it's simply inconsistent and unscientific to maintain the subjective judgements are data, and not informal impressions (which is what they seemed to start out as, and personally I was fine with). I am also not saying the impressions have zero utility either - they can definitely point in interesting directions for fully controlled listening tests to investigate further. But any claims made by anyone that conclusions can be drawn about the validity of the preference formula from these impressions are not really tenable, as partially-controlled, sighted, single-listener tests are simply incongruent with the well-controlled, double-blind tests by hundreds of listeners the formula is based on.

This!

IMHO Well done on summarising the relevant and ignoring the irrelevant in one post.

Blumlein 88 · Aug 14, 2020

bobbooo said:
I think there’s been some repeated misunderstandings / misrepresentations of each side of the debate by the other on here. These are the two opposing hypotheses/claims under debate:

Experienced/trained listeners are no less susceptible to sighted bias than average

Experienced/trained listeners are less susceptible to sighted bias than average

Drs Toole and Olive’s study has been cited in support of claim 1. Below are the results from the paper that can be used to compare experienced listeners' preference ratings to the average for experienced and inexperienced listeners as a whole, for blind and sighted tests. Note: the speaker ratings are likely naturally compressed due to listeners' contraction bias (not using extreme ends of the scale), which is common in subjective evaluations. Rescaling the rating axis from 4/5 to 8/9 is simply done to make the data more readable, and visually correct for this contraction bias, so there's no conspiracy there.

Average for experienced and inexperienced listeners (same data as the first graph in Sean Olive’s blog often reproduced on here, but in a different format):
View attachment 77959

Experienced listeners (the more pertinent graph to this discussion, which I don't think has been discussed yet):
View attachment 77960

So for experienced and inexperienced listeners as a whole shown in the first graph, on average only the preference order of speakers S and T changed places between sighted and blind listening, but for experienced listeners only, the preference order completely changed - when listened sighted it was: D, G, T, S, whereas the preference order during blind listening was: S, D, T, G. The difference in score between the blind and sighted ratings given for the same speaker by the experienced listeners is also larger on average than this difference for all listeners. This suggests the experienced listeners were at least equally (if not more) affected by sighted bias than the average of all types of listeners.

The study also compared how sensitive the listeners were to changing acoustic variables in sighted and blind listening, in this case two speaker positions, 1 and 2.

Average for experienced and inexperienced listeners:
View attachment 77962

Experienced listeners:
View attachment 77963

Both graphs show speaker location had a strong influence on preference when blind, yet little effect when sighted, again showing that experienced listeners are just as affected by sighted bias as all listeners are, which in this case deafens ('blinds' ) them to actual acoustic changes caused by speaker positioning, which they recognised fine during actual blind listening. All these results support claim 1 above.

Now, from what I can tell, the two main objections to the study seem to be:

(a) The listeners’ bias is not representative of and much greater than Amir’s possible bias, due to them being Harman employees and three of the four speakers being Harman brands
(b) The study's definition of experienced/trained listener is too inclusive

Starting with objection (a), I think @preload made some great points here. Simply investing a large amount of money in, owning and very much liking a brand’s products and design philosophy can in itself foster a subconscious brand loyalty and so cognitive bias. Sure this would likely not be as much as the bias the Harman employees had for their own speakers, but there are all the other possible biases @Sean Olive mentioned that are still on the table and common possibilities to all sighted listening tests. Even if objection (a) is valid, and an extreme position is maintained that the only valid results are for the non-Harman speaker ‘T’, the last graph above showing experienced listener ratings does show a significant change in rating given for this speaker in ‘position 2’ between sighted and blind, shifting it from being ranked third sighted, to last blind, which notably runs counter to any possible bias against speaker T due to it being a rival brand, suggesting the remaining biases, that are common to all sighted listening tests, play a relatively large role. The graph above for all listeners shows the same shift in ranking of speaker T in position 2 from sighted to blind, and a similar change in rating (again less than for experienced listeners), echoing the results from the first two graphs of this post, again showing the experienced listeners were at least equally if not more affected by sighted bias, even when listening to a speaker they had no vested interest in.

So what about objection (b)? Here's how experienced/inexperienced listeners are defined in the study (my emphasis):

The bolded parts imply an experienced listener is one who has had at least critical listening experience and controlled listening test experience. This doesn't sound too inclusive to me. And even if it is, and doesn't meet the requirements of a 'highly experienced/trained' listener (whatever they are), it makes sense that this experience is a continuum of ability, which would mean at worst the study is suggestive evidence that even highly experienced listeners are also no less susceptible to sighted bias than others (claim 1). What scientific research is there in evidence of the opposing claim 2 at the beginning of this post, that experienced listeners are less susceptible to sighted bias? If there is none, then claim 1 is on stronger ground. If you take the extreme (and I'd say irrational) view that this study contains zero evidence for claim 1, then the two claims are on equal footing, and you should remain agnostic. The fact remains that claim 2 is a claim of exception however, that goes against not only this study, but cognitive science as well - I'm not aware of any scientific studies showing sighted biases can be noticeably reduced through knowledge of them and training. In fact, this would be a prime example of the (ridiculously named, but very real) G.I. Joe fallacy. When it comes to cognitive biases, knowing really isn't half the battle - in fact it's not even close:

It should be noted that as Sean said here, Harman now have a more exacting definition of a trained listener – passing level 8 or higher in their How to Listen software, with normal audiometric hearing, and showing good discrimination and consistency in their sound ratings. I believe Amir has said he reached level 5/6 (still much better than audio dealers who only passed level 3), and I presume ‘normal’ hearing precludes people with notable presbycusis that can start to become significant in terms of sound judgement variability after around age 50 (as Floyd Toole has humbly described with reference to his own hearing and I mentioned in this post). 'Normal hearing' would obviously also preclude those with notable NIHL which could occur due to such activities as, ahem, routinely listening to headphones at ‘earlobe resonating' volumes . Of course, Amir has specific training in identifying small lossy digital compression artefacts (I believe primarily via IEMs/headphones, speakers being notoriously harder to hear sound imperfections with), but the relevance of this specific skill to discerning differences in speakers’ acoustic attributes at normal listening volumes and distances, and to what extent if any this skill could balance out the high stipulations for a Harman trained listener above is debateable.

But the bigger picture here is that sighted bias is just the tip of the iceberg in terms of the nuisance variables needed to be controlled for listening tests to be useful in drawing reliable conclusions. Some of these have been controlled for here, but there are major exceptions in addition to standard sighted bias: measurement bias (from seeing the spinorama before listening), no level-matching, and no instantaneous A/B switching (instead mostly comparing speakers over days, weeks and months, relying on long-term auditory memory which is notoriously unreliable). And this isn’t even considering the fact that this is a single listener whose perceptions are not as generalizable as a collection of listeners, or any of the other methodological controls put in place in a scientifically controlled double-blind study Sean mentioned here. The gulf between those studies and the listening tests here really is huge.

Please note: this post is in no way either an attack on Amir, or a demand (or even a request) to change his listening methodology (this would obviously be impractical for one person and especially during a pandemic, and he's doing all of this for free so I would never demand anything). I don’t think anyone else is taking these positions either, and of course we are all incredibly grateful for the frankly mind-boggling amount of work he’s put in to this project. However, it has been claimed that the subjective impressions are ‘data’, from which conclusions can be drawn about the accuracy and validity of Sean Olive’s speaker preference rating formula. If this is the case, this necessitates the same analysis and scrutiny of the ‘measuring instrument’ and method of data collection as has been exacted on the Klippel NFS data. If this is objected to or ignored, then it's simply inconsistent and unscientific to maintain the subjective judgements are data, and not informal impressions (which is what they seemed to start out as, and personally I was fine with). I am also not saying the impressions have zero utility either - they can definitely point in interesting directions for fully controlled listening tests to investigate further. But any claims made by anyone that conclusions can be drawn about the validity of the preference formula from these impressions are not really tenable, as partially-controlled, sighted, single-listener tests are simply incongruent with the well-controlled, double-blind tests by hundreds of listeners the formula is based on.

I think this is a good post. You put a lot of effort into it.

I hate to say it, but it can be partly torpedoed with one thing. Experienced listeners in the data you used "Does Note Equal" trained listeners.

Does not make lots of difference to the principles of most of your ideas in the post, but it dilutes the strength upon which you can lean on the test results done by Harman. And that, as often is the case, is the biggest issue, a lack of good data directly on the subject.

I know some of it has been discussed in this now sprawling thread, but maybe it would be a good time to start a new thread to see if we can reach a consensus on the most practical and effective sighted evaluation guidelines for speakers. I believe the same guidelines would be the best for both inexperienced, experienced and trained listeners. I would think trained listeners will do the best in the end. We'll still lack the data about just how corruptible even trained listeners are to sighted bias.

To me the two big deals are level matching as well as it can be done with speakers (you can ever fully do this) and instantaneous switching with a second speaker. Preferably a highly rated reference speaker, but at least a 2nd speaker for a baseline reference. These often can pierce the veil of sighted bias with DACs or amps or gear which is much closer in performance than speakers are. I see no reason they aren't valuable as methods for comparing speakers.

bobbooo · Aug 14, 2020

Blumlein 88 said:
I think this is a good post. You put a lot of effort into it.

I hate to say it, but it can be partly torpedoed with one thing. Experienced listeners in the data you used "Does Note Equal" trained listeners.

I addressed that (maybe not fully) in the post. There's crossover between the two. I believe Amir has said most of his training has come from critical listening experience, which is word for word one of the stipulations for an 'experienced listener' from the study (and implicitly given primacy over the other stipulation of controlled listening test experience). So there's likely a continuum from experienced listener (= experienced critical and controlled test listener from the study's definition) to 'trained listener' (the latter not been given a concrete definition). What does have a concrete definition is the more recently delineated 'Harman trained listener', which specifically requires passing level 8 or above on How to Listen (among other things).

I go on to say that even if you reject this for some reason (I can't see a good one), there is still no acoustic scientific research supporting claim 2, so you should remain agnostic. But claim 2 does go against the cognitive science literature, whereas claim 1 is consistent with it, which tips the balance to the latter, even in this extreme view case.

And as I said, sighted bias is just one of the many nuisance variables not controlled for, which is the bigger picture here.

Racheski · Aug 14, 2020

I think you have some good points in this post, but overall your conclusion does not follow from your premise, and is invalid.

bobbooo said:
Experienced/trained listeners are no less susceptible to sighted bias than average

Experienced/trained listeners are less susceptible to sighted bias than average

bobbooo said:
"...any claims made by anyone that conclusions can be drawn about the validity of the preference formula from these impressions are not really tenable, as partially-controlled, sighted, single-listener tests are simply incongruent with the well-controlled, double-blind tests by hundreds of listeners the formula is based on."

amirm · Aug 14, 2020

Blumlein 88 said:
I hate to say it, but it can be partly torpedoed with one thing. Experienced listeners in the data you used "Does Note Equal" trained listeners.

I must have made this point fifty times. I even add the word "critical" to trained listeners yet folks keep mixing the two. It is like saying anyone who drives a car is "experienced" even though such experience brings with it no specific, expert qualification.

Blumlein 88 · Aug 14, 2020

amirm said:
I must have made this point fifty times. I even add the word "critical" to trained listeners yet folks keep mixing the two. It is like saying anyone who drives a car is "experienced" even though such experience brings with it no specific, expert qualification.

Nah! I think you've made the point at least 100 times.

amirm · Aug 14, 2020

bobbooo said:
I addressed that (maybe not fully) in the post. There's crossover between the two. I believe Amir has said most of his training has come from critical listening experience, which is one of the stipulations for an 'experienced listener' from the study (and implicitly given primacy over the other stipulation of controlled listening test experience). So there's likely a continuum from experienced listener to 'trained listener' (the latter not been given a concrete definition). What does have a concrete definition is the more recently delineated 'Harman trained listener', which specifically requires passing level 8 or above on How to Listen (among other things).

I go on to say that even if you reject this, there is still no acoustic scientific research supporting claim 2, so you should remain agnostic. But claim 2 does go against the cognitive scientific literature.

No. Really no. You don't have an intuitive sense for what I am talking about. I will give you two examples.

There was an organization called SDMI made up of all major music labels and tech companies like my employer, Microsoft. They put out a call for proposal for a watermarking system. The marking system has to embed its bits in the stream yet be inaudible. Warner Music provided a dozen 24-bit, 96 kHz studio masters as test content. Microsoft research wanted to propose a solution and claimed that they had achieved transparency. Due to my reputation in the company as a critical listener, and partnering with that team across many technologies, they came to me and said they wanted me to test to see if I could tell the mark was completely inaudible. They said none of their people could detect the mark. They gave me the files which were 3 to 4 minute marks. Did not tell me which file was the master, and which was marked.

It was a daunting ask as only a few bits plus error correction are inserted across millions of bits that make up the music. Not knowing which was the original and which was the modified file made the challenge even more complicated as there was no reference. This was the mother of all blind tests.

I sat there at first thinking it was an impossible task. But knowing how the algorithm worked, I zoomed into possible areas where it may cause audibility issues. After a bit, I managed to find a few milliseconds, a note or two that sounded different. I called the manager of Microsoft research team and told him I had found an audible difference. He said that was impossible. I gave him a demo and exact point in the track where the problem was. He goes back and they quickly find the issue, confirming that what I had found was indeed, objectively, a problem.

Another time we wanted to see if people could tell perceptually lossless audio from its original. This is where you use a lossy codec but you tell its perceptual model to make no compromise in fidelity. This typically yields 2:1 compression so a CD file goes from 1.4 mbit/sec to 700 kbps. And this was with our advanced codec which at far lower bit rates would get transparency. We sent out the blind test to large group of audiophiles at Microsoft. I usually took such tests but I was too busy. One day my codec/signal processing team manager comes to my office and says they are frustrated as the audiophiles were not able to tell much and he wanted me to take a listen. I told him I was too busy. But he insisted. So I said hang in there, give me the files. As he stood by my doorway, I listen and instantly find an audible problem in AB testing. He goes and investigates and indeed there was a bug causing that quality difference.

This, is what being a critical listener is about. Ability to focus so strongly on impairments that you find them with well above average ability. You will not get to this point by taking a few blind tests. That is "experience" in testing. It does not make a critical listener. It took me many months for my mind and ears just click into this mode. And then years of continuous testing.

Since then, I have taken and passed many online critical double blind tests. All required the specific skills I developed years ago. Granted, with age comes loss of high frequencies. And grumpiness in spending too much time on things.

And to be fair, I can also give you stories of being totally wrong in my testing at Microsoft. So error exists but it is not quantified in the few studies out there that didn't include people like me. All I know is that I and my trained critical listeners at Microsoft were highly effective in sighted nd testing.

At the end of the day, you can't know the limits or effectiveness of what I do. So best to take a back seat and not try to pontificate based on studies that I keep telling you does not read on this situation.

Thomas savage · Aug 14, 2020

Blumlein 88 said:
I think this is a good post. You put a lot of effort into it.

I hate to say it, but it can be partly torpedoed with one thing. Experienced listeners in the data you used "Does Note Equal" trained listeners.

Does not make lots of difference to the principles of most of your ideas in the post, but it dilutes the strength upon which you can lean on the test results done by Harman. And that, as often is the case, is the biggest issue, a lack of good data directly on the subject.

I know some of it has been discussed in this now sprawling thread, but maybe it would be a good time to start a new thread to see if we can reach a consensus on the most practical and effective sighted evaluation guidelines for speakers. I believe the same guidelines would be the best for both inexperienced, experienced and trained listeners. I would think trained listeners will do the best in the end. We'll still lack the data about just how corruptible even trained listeners are to sighted bias.

To me the two big deals are level matching as well as it can be done with speakers (you can ever fully do this) and instantaneous switching with a second speaker. Preferably a highly rated reference speaker, but at least a 2nd speaker for a baseline reference. These often can pierce the veil of sighted bias with DACs or amps or gear which is much closer in performance than speakers are. I see no reason they aren't valuable as methods for comparing speakers.

https://www.audiosciencereview.com/...ted-evaluation-guidelines-for-speakers.15310/

Inner Space · Aug 14, 2020

valerianf said:
Well cars are tested and rated by professionals without any way of doing a blind test!
What is important:
1) The tester abilities.
2) The tester equipment.
3) The tester independence.
That is all what is required.

You're absolutely right. But 3) is hugely complicated. Deep down it might be insoluble. Therefore around and around we go.

Thomas savage · Aug 14, 2020

The crux of some of the issues folks have here imo boils down to their reliance and value of objective measurements and how they feel when Amirm presents his subjective appraisal alongside in his speaker reviews while seemingly assigning the same weight to it. I say ' seemingly ' as just by dint of the fact they are there under our banner makes it seem like they are ' objective ' .

Of course we should just apply caution to this part of the review and I think we all do but somehow there's been a lot of misunderstanding miscommunication and a issue has been created where really there isn't one.

Sighted listening is flawed, no matter how trained you are it's not 100% reliable. However in the case of speakers it can have value ( more to some , less to others ) but we need to minimise the drawbacks and maximise this ' value ' . We also need to put in a qualification to clearly express the subjective part of the review is not presented with the same weight as the measurements.

We can discuss this here https://www.audiosciencereview.com/...ted-evaluation-guidelines-for-speakers.15310/

Rusty Shackleford · Aug 14, 2020

bobbooo said:
I think there’s been some repeated misunderstandings / misrepresentations of each side of the debate by the other on here. These are the two opposing hypotheses/claims under debate:

Experienced/trained listeners are no less susceptible to sighted bias than average

Experienced/trained listeners are less susceptible to sighted bias than average

Drs Toole and Olive’s study has been cited in support of claim 1. Below are the results from the paper that can be used to compare experienced listeners' preference ratings to the average for experienced and inexperienced listeners as a whole, for blind and sighted tests. Note: the speaker ratings are likely naturally compressed due to listeners' contraction bias (not using extreme ends of the scale), which is common in subjective evaluations. Rescaling the rating axis from 4/5 to 8/9 is simply done to make the data more readable, and visually correct for this contraction bias, so there's no conspiracy there.

Average for experienced and inexperienced listeners (same data as the first graph in Sean Olive’s blog often reproduced on here, but in a different format):
View attachment 77959

Experienced listeners (the more pertinent graph to this discussion, which I don't think has been discussed yet):
View attachment 77960

So for experienced and inexperienced listeners as a whole shown in the first graph, on average only the preference order of speakers S and T changed places between sighted and blind listening, but for experienced listeners only, the preference order completely changed - when listened sighted it was: D, G, T, S, whereas the preference order during blind listening was: S, D, T, G. The difference in score between the blind and sighted ratings given for the same speaker by the experienced listeners is also larger on average than this difference for all listeners. This suggests the experienced listeners were at least equally (if not more) affected by sighted bias than the average of all types of listeners.

The study also compared how sensitive the listeners were to changing acoustic variables in sighted and blind listening, in this case two speaker positions, 1 and 2.

Average for experienced and inexperienced listeners:
View attachment 77962

Experienced listeners:
View attachment 77963

Both graphs show speaker location had a strong influence on preference when blind, yet little effect when sighted, again showing that experienced listeners are just as affected by sighted bias as all listeners are, which in this case deafens ('blinds' ) them to actual acoustic changes caused by speaker positioning, which they recognised fine during actual blind listening. All these results support claim 1 above.

Now, from what I can tell, the two main objections to the study seem to be:

(a) The listeners’ bias is not representative of and much greater than Amir’s possible bias, due to them being Harman employees and three of the four speakers being Harman brands
(b) The study's definition of experienced/trained listener is too inclusive

Starting with objection (a), I think @preload made some great points here. Simply investing a large amount of money in, owning and very much liking a brand’s products and design philosophy can in itself foster a subconscious brand loyalty and so cognitive bias. Sure this would likely not be as much as the bias the Harman employees had for their own speakers, but there are all the other possible biases @Sean Olive mentioned that are still on the table and common possibilities to all sighted listening tests. Even if objection (a) is valid, and an extreme position is maintained that the only valid results are for the non-Harman speaker ‘T’, the last graph above showing experienced listener ratings does show a significant change in rating given for this speaker in ‘position 2’ between sighted and blind, shifting it from being ranked third sighted, to last blind, which notably runs counter to any possible bias against speaker T due to it being a rival brand, suggesting the remaining biases, that are common to all sighted listening tests, play a relatively large role. The graph above for all listeners shows the same shift in ranking of speaker T in position 2 from sighted to blind, and a similar change in rating (again less than for experienced listeners), echoing the results from the first two graphs of this post, again showing the experienced listeners were at least equally if not more affected by sighted bias, even when listening to a speaker they had no vested interest in.

So what about objection (b)? Here's how experienced/inexperienced listeners are defined in the study (my emphasis):

The bolded parts imply an experienced listener is one who has had at least critical listening experience and controlled listening test experience. This doesn't sound too inclusive to me. And even if it is, and doesn't meet the requirements of a 'highly experienced/trained' listener (whatever they are), it makes sense that this experience is a continuum of ability, which would mean at worst the study is suggestive evidence that even highly experienced listeners are also no less susceptible to sighted bias than others (claim 1). What scientific research is there in evidence of the opposing claim 2 at the beginning of this post, that experienced listeners are less susceptible to sighted bias? If there is none, then claim 1 is on stronger ground. If you take the extreme (and I'd say irrational) view that this study contains zero evidence for claim 1, then the two claims are on equal footing, and you should remain agnostic. The fact remains that claim 2 is a claim of exception however, that goes against not only this study, but cognitive science as well - I'm not aware of any scientific studies showing sighted biases can be noticeably reduced through knowledge of them and training. In fact, this would be a prime example of the (ridiculously named, but very real) G.I. Joe fallacy. When it comes to cognitive biases, knowing really isn't half the battle - in fact it's not even close:

It should be noted that as Sean said here, Harman now have a more exacting definition of a trained listener – passing level 8 or higher in their How to Listen software, with normal audiometric hearing, and showing good discrimination and consistency in their sound ratings. I believe Amir has said he reached level 5/6 (still much better than audio dealers who only passed level 3), and I presume ‘normal’ hearing precludes people with notable presbycusis that can start to become significant in terms of sound judgement variability after around age 50 (as Floyd Toole has humbly described with reference to his own hearing and I mentioned in this post). 'Normal hearing' would obviously also preclude those with notable NIHL which could occur due to such activities as, ahem, routinely listening to headphones at ‘earlobe resonating' volumes . Of course, Amir has specific training in identifying small lossy digital compression artefacts (I believe primarily via IEMs/headphones, speakers being notoriously harder to hear sound imperfections with), but the relevance of this specific skill to discerning differences in speakers’ acoustic attributes at normal listening volumes and distances, and to what extent if any this skill could balance out the high stipulations for a Harman trained listener above is debateable.

But the bigger picture here is that sighted bias is just the tip of the iceberg in terms of the nuisance variables needed to be controlled for listening tests to be useful in drawing reliable conclusions. Some of these have been controlled for here, but there are major exceptions in addition to standard sighted bias: measurement bias (from seeing the spinorama before listening), no level-matching, and no instantaneous A/B switching (instead mostly comparing speakers over days, weeks and months, relying on long-term auditory memory which is notoriously unreliable). And this isn’t even considering the fact that this is a single listener whose perceptions are not as generalizable as a collection of listeners, or any of the other methodological controls put in place in a scientifically controlled double-blind study Sean mentioned here. The gulf between those studies and the listening tests here really is huge.

Please note: this post is in no way either an attack on Amir, or a demand (or even a request) to change his listening methodology (this would obviously be impractical for one person and especially during a pandemic, and he's doing all of this for free so I would never demand anything). I don’t think anyone else is taking these positions either, and of course we are all incredibly grateful for the frankly mind-boggling amount of work he’s put in to this project. However, it has been claimed that the subjective impressions are ‘data’, from which conclusions can be drawn about the accuracy and validity of Sean Olive’s speaker preference rating formula. If this is the case, this necessitates the same analysis and scrutiny of the ‘measuring instrument’ and method of data collection as has been exacted on the Klippel NFS data. If this is objected to or ignored, then it's simply inconsistent and unscientific to maintain the subjective judgements are data, and not informal impressions (which is what they seemed to start out as, and personally I was fine with). I am also not saying the impressions have zero utility either - they can definitely point in interesting directions for fully controlled listening tests to investigate further. But any claims made by anyone that conclusions can be drawn about the validity of the preference formula from these impressions are not really tenable, as partially-controlled, sighted, single-listener tests are simply incongruent with the well-controlled, double-blind tests by hundreds of listeners the formula is based on.

This is an excellent, thorough overview of the issues. However, it’s clear that Amir is going to continue to insist that other humans “can't know the limits or effectiveness of what [he can] do” (i.e. that even Harman cannot train listeners as good as him, performance on How to Listen or other tests be damned). He believes he is categorically excluded from any and all studies, since his skills are wholly unparalleled. So essentially we are in an amorphous “golden ears” debate in which the golden ears belong to one person (and one person alone) who cannot be questioned. Given that, I don’t know if there’s any point in any of us continuing to engage in this dialogue. But your post is a wonderful contribution nonetheless, and I don’t think you deserved his condescending reply.

preload · Aug 14, 2020

Racheski said:
I think you have some good points in this post, but overall your conclusion does not follow from your premise, and is invalid.

Agree. It starts off arguing that being a trained listener does not confer an accuracy advantage when providing sighted listening impressions. But it ends with the claim that no sighted listening evaluation is valid. I want to respond but it's just too long.

whazzup · Aug 14, 2020

Rusty Shackleford said:
This is an excellent, thorough overview of the issues. However, it’s clear that Amir is going to continue to insist that other humans “can't know the limits or effectiveness of what [he can] do” (i.e. that even Harman cannot train listeners as good as him, performance on How to Listen or other tests be damned). He believes he is categorically excluded from any and all studies, since his skills are wholly unparalleled. So essentially we are in an amorphous “golden ears” debate in which the golden ears belong to one person (and one person alone) who cannot be questioned. Given that, I don’t know if there’s any point in any of us continuing to engage in this dialogue. But your post is a wonderful contribution nonetheless, and I don’t think you deserved his condescending reply.

I do not think that's what Amir is saying in his posts. Is it possible your bias is causing you to read everything he writes in a negative way?

As mentioned by others:
1. Is an 'Experienced listener' in the quoted study the same as a 'critical listener working in Harman' (defined as a person specifically trained and depended upon on the task of speaker/audio-related evaluation and fault finding?

2a. Significance of question 1: If experienced listener (in the study) IS a critical listener (level 8 listening and so on) and found to be as worthless as an untrained listener, does it mean that Harman can act on the research and dismantle its training and employment of critical listeners, thereby saving hundreds of thousands of dollars (super vague estimate)? So you're saying a critical listener is a job that is not needed?

2b. If experienced listener (in the study) IS NOT a critical listener, should the conclusion then be that the quoted study is not relevant to the discussion when applied to a 'critical listener'?

Sean Olive said:
I think my opinion stated here is much the same opinion I held when I wrote the blog posting. I do think sighted tests have their role in audio, and some are more useful than others when the right controls are in place.

A trained listener can provide useful data about the spectral/spatial/dynamic/distortion attributes of the product that an untrained listener would have difficulty providing.

3. And do you disagree with Olive's comment then? (*removed his last sentence as it's not relevant)

valerianf · Aug 14, 2020

Buying choice is always irrational.
But if Amir in one of his speaker test says that the listening session was negative, I will consider to go myself listening to the speaker.
Negative note coming from a specialist needs to be considered seriously.

Rusty Shackleford · Aug 14, 2020

whazzup said:
I do not think that's what Amir is saying in his posts. Is it possible your bias is causing you to read everything he writes in a negative way?

As mentioned by others:
1. Is an 'Experienced listener' in the quoted study the same as a 'critical listener working in Harman' (defined as a person specifically trained and depended upon on the task of speaker/audio-related evaluation and fault finding?

2a. Significance of question 1: If experienced listener (in the study) IS a critical listener (level 8 listening and so on) and found to be as worthless as an untrained listener, does it mean that Harman can act on the research and dismantle its training and employment of critical listeners, thereby saving hundreds of thousands of dollars (super vague estimate)? So you're saying a critical listener is a job that is not needed?

2b. If experienced listener (in the study) IS NOT a critical listener, should the conclusion then be that the quoted study is not relevant to the discussion when applied to a 'critical listener'?

3. And do you disagree with Olive's comment then? (*removed his last sentence as it's not relevant)

I think @bobbooo did a thorough job of addressing these issues. However, you’re adding the conclusion that if trained listeners can’t perform as well sighted as blind then the training is of no value and, further, sighted listening is of no value. That’s a straw man. No one is arguing that. This isn’t a binary choice between “perfect” and “worthless.”

whazzup · Aug 14, 2020

Rusty Shackleford said:
I think @bobbooo did a thorough job of addressing these issues. However, you’re adding the conclusion that if trained listeners can’t perform as well sighted as blind then the training is of no value and, further, sighted listening is of no value. That’s a straw man. No one is arguing that. This isn’t a binary choice between “perfect” and “worthless.”

So what is YOUR interpretation of the role of a critical listener and how much weight do you place on their sighted evaluations?

Sure, you can disregard my questions 2a/2b, but still, would you care to address questions 1 and 3? Your interpretation of the study and Olive's words, of course.

solderdude · Aug 14, 2020

In a nutshell:
All studies clearly show sighted tests differ from non-sighted for all listeners. As S. Olive said each have their place.
Sight (or knowing what is playing) is a form of bias that can be taken away but is not always easy. It changes the way we evaluate.
Aside from sight/knowing there are other factors that also determine the outcome of any test (sighted and blind)
Speakers/headphones are not equal to electronics when it comes to testing with regard to actual FR differences. removing sight/knowledge and creating a level playing field are easy to do. Adding sight/knowing what is playing is not improving accuracy.
Trained listeners can be more useful for debugging/improving things than untrained ones (not knowing what to listen for.
Untrained listeners have a preference and can be used for finding out what preferences are as a target for products that are well liked (commercial purposes)
Opinions are like a****s, everybody has one.
You either value someones subjective opinion or you don't.
Most people trust their own subjective findings over other opinions (or science). It's what they hear and is truth to them.
All good for home listening. When you are happy with your setup even if it deviates from what research says then I see no reason not to enjoy that. Even when others disagree.

whazzup · Aug 14, 2020

whazzup said:
I do not think that's what Amir is saying in his posts. Is it possible your bias is causing you to read everything he writes in a negative way?

As mentioned by others:
1. Is an 'Experienced listener' in the quoted study the same as a 'critical listener working in Harman' (defined as a person specifically trained and depended upon on the task of speaker/audio-related evaluation and fault finding?

2a. Significance of question 1: If experienced listener (in the study) IS a critical listener (level 8 listening and so on) and found to be as worthless as an untrained listener, does it mean that Harman can act on the research and dismantle its training and employment of critical listeners, thereby saving hundreds of thousands of dollars (super vague estimate)? So you're saying a critical listener is a job that is not needed?

~~2b. If experienced listener (in the study) IS NOT a critical listener, should the conclusion then be that the quoted study is not relevant to the discussion when applied to a 'critical listener'?~~

3. And do you disagree with Olive's comment then? (*removed his last sentence as it's not relevant)

@bobbooo @Coach_Kaarlo Same question as I had asked Rusty. Feel free to answer the questions 1 and 3 from your point of view, while ignoring question 2 (or answer it if you want to). Trying to understand if there's actually any common ground.

Coach_Kaarlo · Aug 14, 2020

whazzup said:
@bobbooo @Coach_Kaarlo Same question as I had asked Rusty. Feel free to answer the questions 1 and 3 from your point of view, while ignoring question 2 (or answer it if you want to). Trying to understand if there's actually any common ground.

Hi, thanks for the question.

Question 1. My (cheeky) response would be to suggest reading https://en.wikipedia.org/wiki/Bias and see how many are observable in Amirm's recent response.

My opinion is that bias is ALWAYS present (to give Amrim a break - not his fault), and that removing the sighted portion of evaluation consistently produces a more accurate result - ALWAYS. Out of interest, if I understand the latest neuroscience, removing visual input (eyes closed) also allows for greater acuity in other senses - particularly hearing. Oliver Sacks also talks about this I think?

As far as experienced versus critical - I am of the opinion this is largely irrelevant when compared with the differences between sighted and blind evaluation. Perhaps to make your point stronger I could better define the criteria being listened for, and maybe this would serve to differentiate between experienced and critical. For example critical = out of phase versus experienced = tonally bright? Maybe this is what you are thinking?

I guess it surprises me that we even have to have a discussion regarding sighted v blinded testing in 2020. And for that matter what the definition of a measurement or a scientific method is....

A measurement or test is valid when it can be repeated and produce the same result. Otherwise it is merely a professional opinion and must be weighed against our own personal framework (bias) of evaluation / criteria - regardless of any qualifications that say experienced or critical or whatever.

Question 3. I agree with Dr Sean Olive's entire comment, and the intent, meaning, and tone obtained when read in whole.

whazzup · Aug 14, 2020

Coach_Kaarlo said:
My opinion is that bias is ALWAYS present (to give Amrim a break - not his fault), and that removing the sighted portion of evaluation consistently produces a more accurate result - ALWAYS.

As far as experienced versus critical - I am of the opinion this is largely irrelevant when compared with the differences between sighted and blind evaluation. Perhaps to make your point stronger I could better define the criteria being listened for, and maybe this would serve to differentiate between experienced and critical. For example critical = out of phase versus experienced = tonally bright? Maybe this is what you are thinking?

Thanks for answering!
The point of Question 1 is simply whether experienced listener / critical listener is one and the same. To me it is not, to you it is?

Yes, let's assume bias is always present. And blind testing trumps sighted testing in accuracy, no one doubts that too.
Under the 2 assumptions above, can a 'critical listener' still dispense his work duties *(with sighted testing, just so it's clear. Blind testing is used when required)?

Corporations think they can, hence they're still being trained, paid and relied upon.
But in your opinion, they (or their professional opinions when doing sighted testing) cannot be depended upon because everyone has bias? So they HAVE to do blind testing for their professional opinions to have any weight at all?
*edited to hopefully make the question clearer

Coach_Kaarlo said:
A measurement or test is valid when it can be repeated and produce the same result. Otherwise it is merely a professional opinion and must be weighed against our own personal framework (bias) of evaluation / criteria - regardless of any qualifications that say experienced or critical or whatever.

No arguments on that. It is up to the individual to read the data and listen to opinions and eventually make his own decisions.
So Amir gives a professional opinion, it's up to the individual to listen, or not. No arguments too I hope?

Coach_Kaarlo said:
Question 3. I agree with Dr Sean Olive's entire comment, and the intent, meaning, and tone obtained when read in whole.

If you're referring to the sentences I omitted for brevity, sure.

The frailty of Sighted Listening Tests

Active Member

Grand Contributor

Major Contributor

Major Contributor

Founder/Admin

Grand Contributor

Founder/Admin

Grand Contributor

Major Contributor

Grand Contributor

Active Member

Major Contributor

Addicted to Fun and Learning

Addicted to Fun and Learning

Active Member

Addicted to Fun and Learning

Grand Contributor

Addicted to Fun and Learning

Active Member

Addicted to Fun and Learning

Similar threads