
The frailty of Sighted Listening Tests

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,554
Likes
1,701
Location
California
Rusty/Dale, I agree with others that these are fair discussion points. I know a lot of us have similar questions, and I think it would be nice to take another step closer to a consensus understanding of them. To do that, I think we would need to establish a little more clarity and precision in the questions being proposed. For instance:

1) This forum has been very critical of subjective reviewing, especially with sighted listening, and even more especially with sighted comparisons done after a long time lag.

When you say "this forum has been critical of ____," what exactly is this referring to? Is it Amir being critical? Is it SOME people? Is it MOST people? Is it the forum software itself?
I THINK what you mean to say is: "SOME people on this forum have been critical of _____." Because I certainly am part of this forum, and I don't think I've been particularly critical of subjective reviewing, and I've observed others who take that position as well.

Given that, I think it’s fair to know what the accepted standards are now. Is it that it’s acceptable if the listener is trained? If so, what’s the training?

When you ask for "accepted standards," what does that term mean to you? Would these be written standards determined and posted by Amir? Would they be consensus standards formed by a group of independent experts who reviewed and summarized what we know from the literature? And how would you know they were accepted? Accepted on ASR? Would that be determined by a ratification vote?

What I THINK you want is for Amir to establish what HIS standard is regarding the validity of subjective listening impressions and to be consistent with it.

Do we think blind tests or, at least, level-matched listening with both speakers being present at the same time, playing the same music, is necessary to come to a critical comparison, or not?

When you say "necessary," what does that mean? "Necessary" for you, specifically, to accept the results? "Necessary" for the results to be usable at all, based on published evidence? "Necessary" to know, without a shadow of a doubt, that the results are true?

I think a BETTER question might be: How reliable and valid are the results of a listening test when performed under specific conditions (e.g. blinded/sighted, trained/untrained, influenced by $$$, level matched, specific music tracks, etc.)?

I do think that, if they’re going to trump the data, the subjective reviews should be longer, level-matched, and done against a consistent “good” reference speaker. But that’s just my view.

I think that's a great question. In other words, are qualitative listening tests that report subjective impressions of the sound (under specific conditions) more reliable than the typical ASR member's interpretation of a set of spin graphs (or the calculated preference score, either one)?
 

Dale Gribble

Member
Joined
Aug 7, 2020
Messages
8
Likes
11
Refrain from putting so much faith into the preference ratings/scores. They are not perfect by any means and they can easily be ignored -- I ignore them myself.

*Nearly through reading this thread, I noticed another member pointed this out as well.*

This has actually been mentioned about the preference rating across multiple threads. I believe it was also mentioned that Harman themselves do not use this rating in their own speaker design/test process.

I fully agree. However, it doesn’t resolve the nub of the issue. As Richard noted above, even looking at the graphs, the two speakers are remarkably similar, down to their flaws. But I’m in full agreement that reducing any piece of equipment to one number is a mistake.
 

Dale Gribble

Member
Joined
Aug 7, 2020
Messages
8
Likes
11
Rusty/Dale, I agree with others that these are fair discussion points. I know a lot of us have similar questions, and I think it would be nice to take another step closer to a consensus understanding of them. To do that, I think we would need to establish a little more clarity and precision in the questions being proposed. For instance:



When you say "this forum has been critical of ____," what exactly is this referring to? Is it Amir being critical? Is it SOME people? Is it MOST people? Is it the forum software itself?
I THINK what you mean to say is: "SOME people on this forum have been critical of _____." Because I certainly am part of this forum, and I don't think I've been particularly critical of subjective reviewing, and I've observed others who take that position as well.



When you ask for "accepted standards," what does that term mean to you? Would these be written standards determined and posted by Amir? Would they be consensus standards formed by a group of independent experts who reviewed and summarized what we know from the literature? And how would you know they were accepted? Accepted on ASR? Would that be determined by a ratification vote?

What I THINK you want is for Amir to establish what HIS standard is regarding the validity of subjective listening impressions and to be consistent with it.



When you say "necessary," what does that mean? "Necessary" for you, specifically, to accept the results? "Necessary" for the results to be usable at all, based on published evidence? "Necessary" to know, without a shadow of a doubt, that the results are true?

I think a BETTER question might be: How reliable and valid are the results of a listening test when performed under specific conditions (e.g. blinded/sighted, trained/untrained, influenced by $$$, level matched, specific music tracks, etc.)?



I think that's a great question. In other words, are qualitative listening tests that report subjective impressions of the sound (under specific conditions) more reliable than the typical ASR member's interpretation of a set of spin graphs (or the calculated preference score, either one)?

I think your last question is on point.

To the previous questions, I meant both “the accepted culture and expectations on the site” and Amir’s personal views. Obviously, I think those are related. I think it’s fair to say that if a member posts “I think x sounds better than y,” it’s inevitable that that member will be met with “Was it level matched?” and “Was it blind?” responses, and if the answers are “no,” the observation will be dismissed by most, if not all, members.

The “anything other than blind, level-matched listening is invalid” view, which I took to be the generally accepted view here, was a clear guideline, even if I didn’t necessarily agree. The turn to speakers has exposed the problems with that guideline and the need for subjective evaluations. However, I don’t think the standards for subjective evaluations have really been established. Insofar as they’re used by Amir for his reviews, they’re important, because they will impact purchasing decisions, businesses, etc.

What exactly are you trying to resolve? The point being made is that there isn't a resolution as to why this happens. There are a lot of hypotheses, of course.

What is the acceptable procedure for being able to say “The measurements of x and y are similar, but to me x clearly sounds different/better/worse than y”? It can’t be “only Amir has this ability.” If it’s training, let’s specify it. If it’s level matching and having both pieces of gear on hand at the same time, let’s make that clear. Etc.
 

HooStat

Addicted to Fun and Learning
Joined
May 11, 2020
Messages
856
Likes
933
Location
Calabasas, CA
I think your last question is on point.
What is the acceptable procedure for being able to say “The measurements of x and y are similar, but to me x clearly sounds different/better/worse than y”? It can’t be “only Amir has this ability.” If it’s training, let’s specify it. If it’s level matching and having both pieces of gear on hand at the same time, let’s make that clear. Etc.

Good luck with that. But after you get that SOP drafted, let's get an independent body involved to monitor Amir's compliance with whatever SOP gets developed.
 

LDKTA

Active Member
Joined
Jun 8, 2019
Messages
181
Likes
230
I fully agree. However, it doesn’t resolve the nub of the issue. As Richard noted above, even looking at the graphs, the two speakers are remarkably similar, down to their flaws. But I’m in full agreement that reducing any piece of equipment to one number is a mistake.

Noted. "The nub of the issue" is not as black and white as you make it out to be. "Only a Sith deals in absolutes."

While both loudspeakers may look remarkably similar on paper, even down to their flaws (and I must agree that they do look fairly similar), they are NOT even close to being identical. It makes perfect sense as to why they'd sound different -- and why one would reliably be able to tell the difference between the two via sighted listening, blind listening, etc. It is a very complex issue. Sighted listening will ALWAYS be valid. Preference is inviolate. Amir's opinions can be ignored just as the preference ratings/scores can be ignored. Is there value in Amir's opinion? That should only matter to you.
 
OP

patate91

Active Member
Joined
Apr 14, 2019
Messages
253
Likes
137
Noted. "The nub of the issue" is not as black and white as you make it out to be. "Only a Sith deals in absolutes."

While both loudspeakers may look remarkably similar on paper, even down to their flaws (and I must agree that they do look fairly similar), they are NOT even close to being identical. It makes perfect sense as to why they'd sound different -- and why one would reliably be able to tell the difference between the two via sighted listening, blind listening, etc. It is a very complex issue. Sighted listening will ALWAYS be valid. Preference is inviolate. Amir's opinions can be ignored just as the preference ratings/scores can be ignored. Is there value in Amir's opinion? That should only matter to you.

There are ways to do a sighted listening test that make it more serious, like not looking at the data first. Amir is an expert with training, so it should not be an issue.
 

Archsam

Senior Member
Joined
Apr 8, 2020
Messages
326
Likes
513
Location
London, UK
It pains me to say this, but perhaps part of the problem is our beloved pink panther.

It is the first image we see when we log on to a new review, and we all understand the meaning of each panther being used i.e. headless vs. cookie jar vs. golfing panther.

This is giving the viewer a first impression of the positive / negative 'rating' of the item being reviewed before anyone has a chance to read the full data set / objective analysis. I can see that, for casual readers, this symbol becomes the summation of the review. It may even introduce among regular viewers a bias in our mind before we even start reading the review?

I think this is how Amir's subjective review (the panther score) can, rightly or wrongly, be perceived as carrying more emphasis than is perhaps intended?

Never underestimate the power of the visual icon.
 

LDKTA

Active Member
Joined
Jun 8, 2019
Messages
181
Likes
230
There are ways to do a sighted listening test that make it more serious, like not looking at the data first. Amir is an expert with training, so it should not be an issue.

His sighted listening tests are only as serious as you make them out to be.
 

Racheski

Major Contributor
Forum Donor
Joined
Apr 20, 2020
Messages
1,116
Likes
1,699
Location
Chicago
I honestly have no idea what this discussion thread is about. It reads like a series of loosely connected debates around a set of very vague assertions. Perhaps some of these topics can be separated out into separate threads. But that's unlikely to happen because the next post in this thread will be a reply to one of the aforementioned loosely connected debates and that will continue for another couple of posts before someone changes the debate topic again. And so on...
It's about the frailty of sighted listening while gardening, debated under the theoretical framework established by Olive that drinking wine during experiments with Harman employees will bias trained listeners. Clear as mud for me.
 

Killingbeans

Major Contributor
Joined
Oct 23, 2018
Messages
4,089
Likes
7,547
Location
Bjerringbro, Denmark.
I think it’s fair to say that if a member posts “I think x sounds better than y,” it’s inevitable that that member will be met with “Was it level matched?” and “Was it blind?” responses, and if the answers are “no,” the observation will be dismissed by most, if not all, members.

I don't think anybody on this forum has any problems with “personally, I like the sound of x better than the sound of y”. What triggers the call for level matching and blind tests is when people say “I know without a doubt that x sounds better than y”. One is a preference, the other is an extraordinary claim. And those require extraordinary evidence.

Sure, there's no need to be a d**k about it if it can be avoided, but if I have to choose between being a bit rude and throwing critical thinking out the window, it would be downright degrading for me to pick the latter.

The turn to speakers has exposed the problems with that guideline and the need for subjective evaluations. However, I don’t think the standards for subjective evaluations have really been established.

Isn't the whole point of a subjective evaluation, that it has no standards? o_O
 
OP

patate91

Active Member
Joined
Apr 14, 2019
Messages
253
Likes
137
His sighted listening tests are only as serious as you make them out to be.

Very seriously: he's a trained listener, less subject to biases than the average person, and, as a former manager, he has a lot of experience with how things work. He knows a lot of people in the audio industry. He has measured and listened to countless audio products, including more than 60 or 70 speakers in the past two months. Etc.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,524
Likes
37,057
One change I wish Amir would make is to listen prior to measuring. I'll take what I can get from him, but I do think that would be a better approach. The problem with that is he gives what can be some very useful info to potential buyers when he does some EQ for the measured shortcomings and reports they helped it sound lots better or on the contrary it still isn't a good sounding speaker to recommend. So there are pluses and minuses to both approaches.
 

thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,755
Likes
16,207
One change I wish Amir would make is to listen prior to measuring. I'll take what I can get from him, but I do think that would be a better approach. The problem with that is he gives what can be some very useful info to potential buyers when he does some EQ for the measured shortcomings and reports they helped it sound lots better or on the contrary it still isn't a good sounding speaker to recommend. So there are pluses and minuses to both approaches.
The advantages can be combined by listening before and after measurements, but of course testing will take more time. Maybe there should be a poll on what most ASR readers would prefer: more device reviews, or more detailed reviews (more time spent per device)?
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,317
Location
Albany Western Australia
One change I wish Amir would make is to listen prior to measuring. I'll take what I can get from him, but I do think that would be a better approach. The problem with that is he gives what can be some very useful info to potential buyers when he does some EQ for the measured shortcomings and reports they helped it sound lots better or on the contrary it still isn't a good sounding speaker to recommend. So there are pluses and minuses to both approaches.
+1
 

Wes

Major Contributor
Forum Donor
Joined
Dec 5, 2019
Messages
3,843
Likes
3,788
I'll add to my fictional situation.

I decide to compare it with one of my preferred wines.

I pour a glass of each, taste them, and rate them.

How robust is my evaluation?

potato wines?
 

whazzup

Addicted to Fun and Learning
Joined
Feb 19, 2020
Messages
575
Likes
486
I think your last question is on point.

To the previous questions, I meant both “the accepted culture and expectations on the site” and Amir’s personal views. Obviously, I think those are related. I think it’s fair to say that if a member posts “I think x sounds better than y,” it’s inevitable that that member will be met with “Was it level matched?” and “Was it blind?” responses, and if the answers are “no,” the observation will be dismissed by most, if not all, members.

The “anything other than blind, level-matched listening is invalid” view, which I took to be the generally accepted view here, was a clear guideline, even if I didn’t necessarily agree. The turn to speakers has exposed the problems with that guideline and the need for subjective evaluations. However, I don’t think the standards for subjective evaluations have really been established. Insofar as they’re used by Amir for his reviews, they’re important, because they will impact purchasing decisions, businesses, etc.

This sounds pretty easy. Amir just needs to preface every review with a disclaimer that his subjective opinions and panthers are his personal opinions and he is not liable for any decisions made based on reading them. Now if ASR were to become a 'standard' like THX, with ASR stickers and all, that will be a totally different discussion.


Very seriously: he's a trained listener, less subject to biases than the average person, and, as a former manager, he has a lot of experience with how things work. He knows a lot of people in the audio industry. He has measured and listened to countless audio products, including more than 60 or 70 speakers in the past two months. Etc.

Go write to Harman and tell them not to produce any more trained listeners and to please fire them so that you can be appeased.
 

Rusty Shackleford

Active Member
Joined
May 16, 2018
Messages
255
Likes
550
I don't think anybody on this forum has any problems with “personally, I like the sound of x better than the sound of y”. What triggers the call for level matching and blind tests is when people say “I know without a doubt that x sounds better than y”. One is a preference, the other is an extraordinary claim. And those require extraordinary evidence.

Sure, there's no need to be a d**k about it if it can be avoided, but if I have to choose between being a bit rude and throwing critical thinking out the window, it would be downright degrading for me to pick the latter.

Isn't the whole point of a subjective evaluation, that it has no standards? o_O

This sounds pretty easy. Amir just needs to preface every review with a disclaimer that his subjective opinions and panthers are his personal opinions and he is not liable for any decisions made based on reading them. Now if ASR were to become a 'standard' like THX, with ASR stickers and all, that will be a totally different discussion.

Not to belabor this, but isn't the whole point of this debate that Amir was claiming that his subjective impressions were correct because he's a "trained listener"? That's how we got into what counts as training, whether trained listeners are actually more valid or simply more reliable, etc. If he were simply saying "I like this better, but those are just my personal tastes," I agree it wouldn't be an issue.

I also don't think subjective reviews have no standards, insofar as that means "best practices." Certainly, things like level matching, having the equipment compared in possession at the same time, real-time switching, using consistent music, specifying what you are or aren't hearing in clear and precise language, etc. can make a subjective review more or less informative.
 

MattHooper

Master Contributor
Forum Donor
Joined
Jan 27, 2019
Messages
7,200
Likes
11,816
One change I wish Amir would make is to listen prior to measuring.

Let us note: this is the approach already used by Stereophile for decades. In almost all reviews, the reviewer gives the subjective report and the measurements by JA are made afterwards.
 