
The Frailty of Sighted Listening Tests

preload

Major Contributor
Forum Donor
I swear I am like a speaker magnet: the more I get rid of, the more show up. Wish it were women....

Perhaps this will help:

[attached image]
 

Deleted member 17820

Guest
With this hypothetical scenario, is your point that sighted tests are superior to blinded tests in determining the overall enjoyment from a speaker? If so, I would agree that the answer is yes, in some situations. For instance, if a loudspeaker is so ugly that your spouse makes you cover it up with a bedsheet every time you listen to it, then yeah, that's going to affect your enjoyment. However, the question we're trying to answer with blinded tests is not overall enjoyment. The question is how preferred the loudspeaker is based JUST on its sound quality and nothing else (to the extent possible). Two different questions, two different methods to answer them.

Good question. I guess this is my point: I have 3 systems in my house.

1. Home theater
2. Desktop
3. 2 channel "reference"/vinyl

For my home theater system I want "black boxes" that sound great and will not distract at all from movies/TV.
For my 2 channel system I want speakers that visually stand out and make some kind of statement!
For my desktop I wish I could have a switch to go from "black boxes that sound good" to bright neon red, like those Sony party speakers with all the flashing RGB. hahaha

This is why I never understood mixing HT with audiophile looks. But that's just me :)

PS, as I said, "blind" preference is ONLY valid relative to how the speaker is going to be used, IMHO.
 

Deleted member 17820

Guest
Sometimes I think that may be the audiophile curse: you KNOW that your equipment isn't as good as the reviewers'/friends'/dealers'/etc., and you use your cognitive biases to find problems that aren't there, like power cords, bad electrical supply, cheap interconnects and speaker cables, or theoretical problems like the amp using too much feedback or 'stair-step' digital conversion.

I realize that in most cases these 'problems' don't exist or are easily fixed with better-gauge speaker cables or better-shielded interconnects, but it does tend to lead one down the rabbit hole of upgrading and expecting an improvement, so one hears it, until you read a review that says this cable/amp/power conditioner/..... is better and gives a blacker background to the music. I've rambled a bit OT, but the natural skeptic/scientist in me always doubted the truth of the golden-eared audiophile, though not enough to ignore them.

I have many years as a live sound A1, and I can tell you that for me, there is a switch I need to throw: during soundcheck and the first few songs, ALL I am doing is listening for problems, trying to fix them, and shaping the sounds. THEN I have to flip a switch and go from listening critically to enjoying the sound I just created. The problem I have with some audiophile perspectives is they never want to enter the enjoyment phase and just want to stay in the "critical" listening phase.
 

krabapple

Major Contributor
Forum Donor
That's absurd. No one has taken that position. I don't even do listening tests on the bulk of electronics.

The issue is for speakers.

Whatever. Restrict 'audio gear' to 'loudspeakers', then. 'Audio gear' versus 'loudspeakers' was not at all the point of what I wrote.

It has been proposed that sighted evaluation (which IME is not acceptable protocol for research, for well-known reasons), can yield useful data so long as *trained listeners* are doing the evaluating.

Which begs the question of what constitutes trustworthy 'training'. Is there a consensus on it? So that we, confronted with a claim about audio from someone, can evaluate whether they are a 'trained listener'?

That was the point of what I wrote.

NB, data from trained listeners in Harman tests are still gathered under...blind conditions. For publication, at least.
 

krabapple

Major Contributor
Forum Donor
I just found ASR a while ago and have seen a war of sorts online; I have even been told on this forum to "go away" and been accused of being a subjectivist. HAHAHA
I wish people could just see that everything is a data point: blind listening, sighted listening, mono, stereo, all just data points. More data points may help you, or they may not.

Do not confuse 'anecdote' with 'data'. It often is just noise, and generally there's no way to be sure that it isn't...unless it's tested further under controlled conditions.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
It has been proposed that sighted evaluation (which IME is not acceptable protocol for research, for well-known reasons), can yield useful data so long as *trained listeners* are doing the evaluating.
Whether you do a test blind or sighted, there is no guarantee of correctness. Every test has a margin of error. Turn a sub on and off in your room. Do you need a double-blind test to trust what it does in your room? No. A blind test would generate the same result as a sighted test.

Sighted tests have a higher error rate than blind tests, all else being equal. But they are also extremely fast and low effort. For this reason, the industry uses them as quick tests and performs occasional double-blind tests as a backstop. No different than a lab test at a doctor's office, which is quick but has lower accuracy than one sent to an external lab.

Again, look at Olive research:

[Chart: Blind vs. Sighted Mean Loudspeaker Ratings]


Ranking of speakers G, D and T did not change in sighted versus blind. Only speaker S changed.

Best engineering is about optimization and getting 90% right for 10% of the effort/cost. We are not purists here with infinite budget and time to double-blind test speakers.

If we could show sighted tests to be mostly wrong, as they are in electronics, then sure, we would not attempt them. But the reality is that the difference between speakers is quite large and is able to overpower listener bias when said listener a) has no stake in the outcome and b) has better thresholds of detection for impairments.

As I have said, we do this in the industry all the time where the outcome really matters. Jobs and company reputations are at stake. Yet we do it because the risk/reward is appropriate.
 
OP

patate91

Active Member
What the industry does and what it should do is the reason why Dr. Olive wrote the article and did the experiments about this.

"In other words, if you want to obtain an accurate and reliable measure of how the audio product truly sounds, the listening test must be done blind. It’s time the audio industry grow up and acknowledge this fact, if it wants to retain the trust and respect of consumers. "
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
What the industry does and what it should do is the reason why Dr. Olive wrote the article and did the experiments about this.
No, he did the study because the people who sold, marketed, and designed audio had no use for controlled testing of any kind, nor for the job he was going to do at the company. It was personal for him to demonstrate that these people were not qualified to make critical decisions about the fidelity of their speakers relative to the competition.

"In other words, if you want to obtain an accurate and reliable measure of how the audio product truly sounds, the listening test must be done blind. It’s time the audio industry grow up and acknowledge this fact, if it wants to retain the trust and respect of consumers. "
That's totally consistent with what I explained. Everyday testing and evaluation is performed sighted. Then at the end of the process you perform double blind tests.

Companies don't audit their books every day. Software developers don't run full release testing every day. When you go to your doctor for a cold, you don't get an MRI and CAT scan to rule out brain cancer. You do what is expeditious and necessary at the time.

Again, Sean has said he runs sighted tests:

[Screenshot: Sean Olive's post saying he runs sighted tests]


The case is clear and closed. Don't rehash it over and over again with no additional insight, first hand experience, or appreciation for what has been explained already.
 

Lbstyling

Addicted to Fun and Learning
Forum Donor
Klippel is not science. It is a set of measurements. Those measurements are very difficult to interpret as "buy/don't buy" against countless other speakers with similar-looking measurements. A 1 dB peak at 600 Hz is not the same as a 1 dB peak at 1.5 kHz, yet the score may be identical. We need to bridge that gap so people can purchase speakers without listening to them, which is the norm today.

If we had a scoring system we could all stand behind so much that if it said speaker A is better than B, that would be the "truth," then sure, I would not need to do listening tests. But we are not there. A scoring system is like a compass that shows you north. It is not a turn-by-turn navigation system for driving in the city.

Also, when I first started to do measurements, people kept asking me what I recommend. I refused to say. We had a bunch of debate threads about it. Eventually I got tired of answering those questions in private and in public and added the recommendations. That has proven to be hugely popular and rarely controversial. Today, I cannot give such recommendations without listening to a speaker. So as much work and aggravation as it has turned out to be, I listen and provide this as a factor in my recommendation.

And no, not all "human beings" are the same. Which one of you has been exposed to nearly 80 speakers in the last 7 months where you could compare and correlate measurements to what you could hear? The answer is none. In other words, I am not situated like any of you. There are many things that apply to you that don't apply to me, and vice versa. We rely on the informed opinion of experts in real life all the time. Not sure why it is such a big deal to follow the same in audio.

I would be fascinated to read a review by yourself of an otherwise excellently measuring, smooth-FR speaker with narrow dispersion. (The head-in-a-vice type lol:D) Alas, they are few and far between!
 

Wes

Major Contributor
Forum Donor
I would be fascinated to read a review by yourself of an otherwise excellently measuring, smooth-FR speaker with narrow dispersion.

or narrow desperation - it might add to the drama in the music
 

MattHooper

Master Contributor
Forum Donor
This has actually been a very interesting and often informative thread, especially hashing out various members' takes on sighted listening (I'm sticking with speakers here).

Most interesting to me has been the general support for the (limited) use of sighted testing. I've been making that case for a while, mostly because there has been such an emphasis on blind testing/measurements on the site, and because some members have held a strict line even on speaker testing. There has been an especially (generally speaking) harsh attitude towards subjective reviewing: "Subjective reviews are useless to me - unless done blind. Just give me the measurements!"

And, again, I have tried to make a case for the limited usefulness of sighted listening (speakers) and even, to a degree, subjective reviews.

Ranking of speakers G, D and T did not change in sighted versus blind. Only speaker S changed.

Yes, that's one of the issues I've brought up before. And I noticed it again just recently when I revisited Floyd Toole's talk in which he puts up slides for sighted vs. blinded speaker ratings. The actual ranking of the speakers remained pretty constant. They tended to vary by degree, though.

Now, as applied to the subjective-review crowd in general, I can see a protest: "We are fine provisionally accepting sighted speaker impressions from Amir (a 'trained listener'), but forget about Stereophile/The Absolute Sound, untrained audiophiles and all the other riff-raff. That stuff is useless."

Except I don't find it to be useless. When I encounter or audition a new speaker I often look up what reviewers and other audiophiles are saying about it. I don't care much about reading someone's emotional reaction "I was swept away...blah, blah..." I pay attention to whether the person is characterizing the sound, and how well they do it. When I see a consensus happening on the general character of a speaker, I most often find it to be "accurate" to my own impressions of that speaker. (Sometimes this is when I've heard the speaker after being intrigued by the descriptions/reviews I'm reading, but often enough I'm looking for these subjective impressions after I first heard the speaker and formed my own impression). When I find a reviewer who seems to be pretty accurate in this way and/or whose tastes I have divined over time, I can find their subjective take on a speaker to be somewhat informative.

Another reason I've seen for dismissing the subjective reviewers is "Look, this guy thinks he hears differences in cables and tweaks, that automatically shows me he is not a trustworthy listener, so I put no stock in what he says about speakers either!"

Now, I can see how someone who wants as much certainty as possible in a review may use that logic to just avoid the audio press's subjective reviews. However, I don't necessarily go down that road myself. Why?

Because the fact that someone can be fooled into hearing imaginary differences does not entail that he cannot detect and describe real sonic differences.
If that logic were wrong, then every single person here, and every human on earth, could not rely on ANYTHING he/she hears as being accurate or informative at all. In fact, we could not rely AT ALL on our perception using any sense. This is because we know that any human being is susceptible to some level and some forms of bias. That's why even researchers themselves blind themselves in a trial (or double-blind a trial).
This is why, for instance, even trained-listener Amir has done so many blind tests (codecs etc.). When looking for possibly very subtle differences that you don't yet know even exist, just being a human being and knowing anything about science, you are going to control for your bias effects.

But clearly our senses, including our hearing, are reasonably reliable for identifying *real* and distinct sonic differences. The same reviewer who may be fooled into perceiving a difference between AC cables will surely score perfectly in a blind test distinguishing his mother's speaking voice from his father's.

Similarly, it does not follow from the fact that a reviewer fell prey to some sighted-bias effect in hearing differences between cables that this reviewer has not correctly identified real sonic differences that exist between speakers, and aptly described those differences.

If Amir were doing a listening test to discern possible sonic differences between audio cables or AC cables, I'd expect him to use blind listening controls, because for anyone the very act of trying to discern a difference can make us perceive a difference. But here we are saying we can have some level of trust in Amir's subjective speaker impressions, even THOUGH Amir would be susceptible to bias effects testing cables without controls, because....speakers DO sound different, and often different enough to reliably detect and characterize. For this same reason, I don't completely dismiss the findings of subjective reviewers who didn't use controls for cables, but who describe the sound of different speakers.

(Which, again, is not carte blanche that every subjective review is accurate...only the reason I can still find use in the subjective reviewing community, and even the audiophile community, for sighted speaker reviews. Even when they are not Amir :)).
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
In the Harman blind speaker tests, do they allow you to go back and change the ratings you gave to speakers after you hear subsequent speakers?
Yes. A typical test consists of 8 trials where the order of speakers and program are randomized. In each trial, the listener can switch among the different speakers as many times as they like, changing their scores as they see fit. Once they are satisfied with their scores, comments and any other scales we include (spectral balance, distortion, etc) they hit a button (DONE) and move on to the next trial. One of the eight trials is a repeat so we get a measure of how consistent each listener rates the speakers for each program.
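For illustration only (this is not Harman's actual software, and the speaker/program names are made up), the randomized schedule described above — eight trials with the presentation order shuffled and one hidden repeat trial for checking listener consistency — could be sketched in Python like this:

```python
import random

def make_schedule(speakers, programs, n_trials=8, seed=None):
    """Build a randomized listening-test schedule.

    n_trials - 1 unique trials are generated, then one of them is
    duplicated and re-inserted at a random position, so listener
    consistency can be measured by comparing the two repeats.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials - 1):
        trials.append({
            "program": rng.choice(programs),
            # presentation order of the speakers is shuffled each trial
            "speakers": rng.sample(speakers, len(speakers)),
        })
    repeat = dict(rng.choice(trials))  # the hidden repeat trial
    trials.insert(rng.randrange(n_trials), repeat)
    return trials

schedule = make_schedule(["A", "B", "C", "D"], ["jazz", "orchestral"], seed=7)
```

Because the repeat is inserted at a random position, the listener can't tell which trial is the consistency check.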
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
Yes it also means that untrained listeners have the same "authority" about preferences.

Let's not forget that those preference things are generalisations and are very useful for companies that want to make a profit. I guess it should not prevent people from trying different ways of enjoying the audio experience.


What audio companies do NOT want to make a profit?? Only the ones that won't be in business for very long :)

Doing rigorous blind testing of products costs a lot of time and money in R&D. Many audio companies would argue these tests cost too much money and reduce profit.

Also, if we only cared about profit, would we publish our research so that other companies can use it freely to improve their products? Would we help create loudspeaker measurement methods and standards that are used on this site to measure our loudspeakers and competitors' so that consumers have better data to make informed purchase decisions?

Certainly we believe science & research are necessary to stay competitive over the long haul, but part of what we give back to the industry is not just motivated by profit, but to help raise the standards and bar of the industry. In the long run, that is good for the industry but also good for the consumer.
 

richard12511

Major Contributor
Forum Donor
Yes. A typical test consists of 8 trials where the order of speakers and program are randomized. In each trial, the listener can switch among the different speakers as many times as they like, changing their scores as they see fit. Once they are satisfied with their scores, comments and any other scales we include (spectral balance, distortion, etc) they hit a button (DONE) and move on to the next trial. One of the eight trials is a repeat so we get a measure of how consistent each listener rates the speakers for each program.

Thanks for the response. For the very first blind shootout I organized for myself and friends, we recorded both a preference and a score, but we've just done preference rankings since then, due to how much we all struggled with scoring the first time. I want to give scoring another try for the next one we do. I feel like it's something that will get easier with practice and hearing more speakers. I really like the idea of repeating the test to judge listener consistency, I'm definitely gonna try that next time.
 
OP

patate91

Active Member
What audio companies do NOT want to make a profit?? Only the ones that won't be in business for very long :)

I'm glad to see that you've decided to engage with us.

I'm in business too, so I understand that profit is important.

I think that's what science is all about.
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
Thanks for the response. For the very first blind shootout I organized for myself and friends, we recorded both a preference and a score, but we've just done preference rankings since then, due to how much we all struggled with scoring the first time. I want to give scoring another try for the next one we do. I feel like it's something that will get easier with practice and hearing more speakers. I really like the idea of repeating the test to judge listener consistency, I'm definitely gonna try that next time.

The benefit of scoring versus ranking is that you get a magnitude of the preference. We define a strong preference as a spread of 2 points or more, a moderate preference as about a 1-point spread, and a 0.5-point spread as a slight preference. If the Preference Scale is calibrated (using common low/medium/high anchors helps), you can compare speakers measured across different tests. Using anchors is how we were able to test 30+ headphone models across a number of tests and minimize context effects.
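Read as minimum spreads, those thresholds are easy to encode. A hypothetical Python sketch (the function name and the "no clear preference" bucket below 0.5 points are illustrative additions, not Harman's definitions):

```python
def preference_strength(mean_score_a, mean_score_b):
    """Classify the preference between two speakers from the spread
    in their mean preference ratings, using the thresholds above."""
    spread = abs(mean_score_a - mean_score_b)
    if spread >= 2.0:
        return "strong"
    if spread >= 1.0:
        return "moderate"
    if spread >= 0.5:
        return "slight"
    return "no clear preference"

preference_strength(7.4, 5.1)  # spread of 2.3 points -> "strong"
```

The point of calibrating with anchors is that the same numeric spread means the same thing across tests, so a classification like this stays comparable.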
 

Tom C

Major Contributor
What audio companies do NOT want to make a profit?? Only the ones that won't be in business for very long :)
Your gracious contribution of your time and thoughts is greatly appreciated.
Might I ask, given the present emphasis on headphone evaluations, do you think significant time and effort will be spent by the industry in the near future on further speaker research?
 
OP

patate91

Active Member
What audio companies do NOT want to make a profit?? Only the ones that won't be in business for very long :)

Do you have recommendations for people like me who would want to be more rigorous when testing new gear?

Like Amir, I don't have the budget or the installation to perform "real" double-blind tests.

Also, there were a couple of questions regarding the definition of an experienced/trained listener.
 

Sean Olive

Senior Member
Audio Luminary
Technical Expert
Your gracious contribution of your time and thoughts is greatly appreciated.
Might I ask, given the present emphasis on headphone evaluations, do you think significant time and effort will be spent by the industry in the near future on further speaker research?
I'm not sure it will in the short term. The focus right now seems to be immersive audio and how to improve its performance and deliver it to more people over headphones, loudspeakers and in the car. Besides movies, music, and gaming, there are applications in VR and AR.
 