
The frailty of Sighted Listening Tests

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,335
Likes
6,700
I was quoting the blog post:

[attached image: the quoted passage from the blog post]

The preference scale is very small, so 1-2 units is a pretty large effect. The question is, "Will two speakers that measure as similarly as the Revel and SVS likely vary by 2 units?" Saying "yes" is a pretty strong claim, given the scale. Going from "recommended" to "not recommended" is an even larger jump. It may be true! But I think more than sighted listening might be necessary to demonstrate it.

Personally, I trust Amir's ears with the SVS review. He's not the first one to report them sounding harsh. They have a bit of a reputation in that regard. In fact, I was surprised when I opened the review and saw the spin. My first reaction upon seeing the spin was something like "damn, those aren't nearly as bright as I thought they would be. Maybe all those people on AVS are wrong about these sounding bright". But then, we got to the subjective impressions, and Amir's impressions were exactly in line (minus the boosted bass part) with what most people say about them.

Also, I personally don't see a good enough reason why Amir would be biased into not liking these. If anything, I think the bias would be more likely to benefit the speaker, as he saw the measurements before he listened.

You do bring up an interesting point, though. It's a point that I've been thinking about since that SVS review dropped, and that is just how remarkably close the SVS and M106 measure, and yet how different they sound. They have VERY similar on/off-axis responses (with deviations in very similar ranges) and very similar directivity indexes (with very similar magnitude errors at the same frequencies). Overlaying the responses on top of each other makes this very clear. In fact, I'd go so far as to say that they're almost close enough that you could call them two different samples of the same speaker. The difference in measurements between the SVS Ultra and Revel M106 is really no bigger than the difference between the M106 and M105. Maybe this is more relevant to the SVS thread, but I would really like to figure out what in the measurements is causing such a drastic difference.
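In case it helps, here is a rough sketch of the kind of overlay/deviation check I have in mind. It assumes the two curves have been exported as simple frequency/SPL CSV files; the file names and format are placeholders of my own, not any particular review's data.

```python
# Rough sketch: overlay two measured frequency responses and quantify how far apart they are.
# Assumes two-column CSV files (frequency in Hz, SPL in dB); the file names are hypothetical.
import numpy as np
import matplotlib.pyplot as plt

def load_response(path):
    data = np.loadtxt(path, delimiter=",")
    return data[:, 0], data[:, 1]  # frequency, SPL

f1, spl1 = load_response("speaker_a_on_axis.csv")
f2, spl2 = load_response("speaker_b_on_axis.csv")

# Put both curves on a common log-spaced grid (300 Hz to 10 kHz).
grid = np.geomspace(300, 10_000, 200)
a = np.interp(grid, f1, spl1)
b = np.interp(grid, f2, spl2)

# Remove the overall level difference so only the response shapes are compared.
a -= a.mean()
b -= b.mean()

diff = a - b
print(f"Mean absolute difference: {np.mean(np.abs(diff)):.2f} dB")
print(f"Max absolute difference:  {np.max(np.abs(diff)):.2f} dB")

plt.semilogx(grid, a, label="Speaker A (normalized)")
plt.semilogx(grid, b, label="Speaker B (normalized)")
plt.semilogx(grid, diff, "--", label="Difference")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Relative SPL (dB)")
plt.legend()
plt.show()
```

Numbers like these at least tell you where in the band two speakers actually diverge, rather than eyeballing overlaid plots.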

I also see no reason why Amir would be biased towards Revel speakers he has no plans on owning. What incentive does he have?

Maybe Revel is doing something with most of their designs that doesn't show up well in measurements, and maybe Amir prefers that something. Looking closely at the SVS Ultra vs. the Revel M106, I noticed that the M106 (and most Revels measured so far) has what looks to be intentionally boosted bass, specifically around 100Hz. The SVS Ultra, by comparison, has more neutral bass. It could be that the boosted bass is softening the highs.

Generalizing a bit more, maybe Revel has a current "house sound" that isn't perfectly in agreement with the older science that Toole and Olive did, and maybe Amir prefers that house sound? Given that his reference speaker is the current flagship Revel, it could be that he's judging speakers based on how close they get to that sound, rather than how close they get to something more neutral (Genelec, for example). I guess you could call that a Revel bias, but I wouldn't. I'd say it's more like a Revel "preference". Bias to me means something outside of sound reproduction that influences the listening result, and wouldn't hold up under blind conditions. In this case, the Revel "preference" would hold up under blind conditions.

I really enjoy the subjective impressions now, and they've changed my view on the whole objective vs. subjective debate. I still lean way more towards the objective side, but they've forced me to give more credence to the typical subjective view that you can't tell if a speaker will sound good or bad just by looking at the spinorama. You really can have a speaker with terrible measurements that sounds good (Canon S-50), and you really can have a speaker with excellent measurements that sounds bad (SVS Ultra).

Anyway, not trying to attack or offend anyone, but just giving some thoughts.
 
OP

patate91

Active Member
Joined
Apr 14, 2019
Messages
253
Likes
137
I have read them and I can't find such a claim; that is why I am asking. Instead of assuming I haven't read the reviews, it would be helpful if you could point out this specific claim rather than dismissing my question entirely.

Here's an example
 

Attachments: [screenshot of the post being cited as the example]

Kvalsvoll

Addicted to Fun and Learning
Audio Company
Joined
Apr 25, 2019
Messages
878
Likes
1,643
Location
Norway
I am going to listen to speakers. I am going to try to correlate measurements with what I hear.

And I certainly hope you will continue to do so.

This listening cannot be done blind. Yes, there will be inaccuracy, there may be bias, you may have a bad day. Still, sighted listening will provide very useful information, and from what I have seen, you carefully describe how you did the listening and also mention its limitations.

Also, the text is condensed and right to the point, in concise technical language. Combined with the measurements, it provides useful information not only about the speaker being tested, but also about how, and which, measurements correlate with what we hear. We see that this is complicated.

Speakers are fundamentally different from electronics. This may not be so obvious if you do not have the technical background to understand why. Blind testing DACs and amplifiers is possible, but also not necessary. Blind testing every speaker is not possible, but it would be desirable. So we go for the next best thing - sighted testing, which in this case - same listener, similar or same conditions - improves consistency.

Your listening has value, and your approach to the testing of speakers with measurements will hopefully bring on more knowledge and awareness.
 

Racheski

Major Contributor
Forum Donor
Joined
Apr 20, 2020
Messages
1,116
Likes
1,699
Location
Chicago
Here's an example
How can you infer that "Amir claims that his sighted listening can detect things the measurements cannot" from this post? This post isn't even about that - he is responding to the notion that good measurements will bias his listening impressions.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,374
Likes
234,477
Location
Seattle Area
More about Experienced listeners

"Experience is one of those variables among listeners that is very difficult to quantify. For example, musicians are experienced listeners but,
is experience in focusing on musical attributes equivalent to that of focusing on timbral and spatial attributes?
Some evidence suggests that it is not. Gabrielsson found that musicians who were not also audiophiles,were not especially good judges of sound quality[4]. The famous pianist Glenn Gould
came to appreciate the insights of non musicians[5]. Our own tests have confirmed this. So, listeners with different backgrounds could be expected to have differing abilities or preferences in subjective evaluations. This is a nenormously broadtopic, but we thought that it would be interesting to take a first step towards understanding the importance of this variable."
Musicians indeed have no special ability as it relates to fidelity. They hear things from their vantage point which has little to do with how music is recorded and mixed.

By the way, you all are experienced in a field but don't realize it. Ask a non-audiophile what they think of your fancy system and they will likely shrug their shoulders. To them music is what matters, not what this system or another does. When I drag my wife into comparisons, she mostly says it is hard to tell any difference.

Audiophiles, on the other hand, can tell good recordings from bad. They can tell a boomy bass from a balanced one. They can tell if a system is too bright versus not. All of this is because you have become experienced in detecting linear differences as a whole.

That skill, though, is not broad. Audiophiles, for example, are poor at detecting non-linearities. Even Harman-trained listeners may be, as their training is around timbre, not non-linear distortions. Learning to detect small non-linear impairments requires months of training, whereas linear tonal variations are much faster to learn.

So the topic is broad. And any dismissal of experience mattering turns right back to haunt you, invalidating your role as an audiophile. After all, if you have no judgement of value, how do you become an audiophile? Can you not tell good recordings from bad? Were you born this way? Or did you learn to appreciate it over time?
 

Newman

Major Contributor
Joined
Jan 6, 2017
Messages
3,452
Likes
4,213
....Audiophiles, on the other hand, can tell good recordings from bad. They can tell a boomy bass from a balanced one. They can tell if a system is too bright versus not. All of this is because you have become experienced in detecting linear differences as a whole.

....And any dismissal of experience mattering turns right back to haunt you, invalidating your role as an audiophile. After all, if you have no judgement of value, how do you become an audiophile? Can you not tell good recordings from bad? Were you born this way? Or did you learn to appreciate it over time?
Michael Fremer, for example, boasts how he cannot listen to any digital source without becoming physically ill.

And this is because of his vast experience with many systems, making him experienced in detecting certain things?
 

MattHooper

Master Contributor
Forum Donor
Joined
Jan 27, 2019
Messages
7,200
Likes
11,816
Not to contradict your central point, but...

Musicians indeed have no special ability as it relates to fidelity. They hear things from their vantage point which has little to do with how music is recorded and mixed.

By the way, you all are experienced in a field but don't realize it. Ask a non-audiophile what they think of your fancy system and they will likely shrug their shoulders. To them music is what matters, not what this system or another does. When I drag my wife into comparisons, she mostly says it is hard to tell any difference.

My wife is easily the least enthusiastic person ever to have sat in front of my systems. She can acknowledge it sounds good, but truly does not care.
(Though she was brought to tears once, hearing one of her favorite artists through a really stunningly "live"-sounding system I used to have. But other than that, I've long since given up having her sit and listen, because it's otherwise pointless.)

But as for "normal" people not hearing a difference, my experience is completely the opposite. Over the years a great many non-audiophiles have listened to my systems and every single one was clearly blown away, had their world somewhat turned upside down not realizing just how good music could sound through a stereo system. And often enough, for the "least audiophile" among them they started by saying "well, I don't really listen for this kind of thing and my ears probably aren't as good as yours so I don't know that I'd hear a difference anyway." But they always are blown away...like almost speechless. My mom gave all those disclaimers and thought it would be completely lost on her (she never stops moving, so even being asked to sit on a sofa to actually just listen to music was not in her tool kit). And yet, after a beautiful piece of classical music played, she just sat there with slightly watery eyes, mouth open, looking for words. Like "wow...I never...that was really amazing and beautiful." She instantly "got it."

But when it comes to wives...yeah...most audiophiles I know say their wife doesn't give a damn and they get the eye-roll if they even ask "hon, could you just have a listen to this for a moment?" :)
 

oldsysop

Senior Member
Joined
Sep 5, 2019
Messages
383
Likes
657
MQA2 ? :D


------------
This thread is very interesting; I read it carefully.
My opinion is that what the AP555 and Klippel measure is replicable, while what Amir hears with his training and knowledge is not.
So, posting subjective opinions gives snake oil sellers, golden ears, etc. the opportunity to question ASR. :(
 

MattHooper

Master Contributor
Forum Donor
Joined
Jan 27, 2019
Messages
7,200
Likes
11,816
MQA2 ? :D


------------
This thread is very interesting; I read it carefully.
My opinion is that what the AP555 and Klippel measure is replicable, while what Amir hears with his training and knowledge is not.
So, posting subjective opinions gives snake oil sellers, golden ears, etc. the opportunity to question ASR. :(

Why wouldn't Amir's sighted evaluations be, in principle, replicable? In the case of measurements, you could replicate the process and do the same measurements yourself. In the case of the sighted impressions, you could replicate the same process and listen to the speaker yourself.
 

Rusty Shackleford

Active Member
Joined
May 16, 2018
Messages
255
Likes
550
Every study has a motivation. The motivation for this test was clear: typical sales, marketing and speaker designers being so proud of their hearing ability that they had no use for the likes of Sean Olive or controlled testing. You know, standard subjectivists. They have golden ears because they say they do, with no verification.

So Sean set out to demonstrate that they could be wrong. Here are the selection criteria for listeners:

[attached image: the listener selection criteria from the paper]

100% in compliance with the goal of the research: to find out whether that bias impacts listening test results. It did. Interestingly, it was not a home run, as it did not change the ranking of the speakers. But it did show the preference levels changing somewhat.

So what you state above is simply incorrect. You have to read the research and understand what it is trying to do. This is a conference paper, not a peer-reviewed one, so you can't expect the exacting standards used in J. AES papers. People are free to express opinions, and Sean provided some generalities.

Bottom line is that three Harman speakers were used and judged by 40 Harman employees. That puts it way, way outside any normal study of bias in speaker preference among the general public with no relationship to the products being shown.

Importantly, nothing in the paper reads on what we are talking about. No one had critical listening skills. What they call "experienced" just means industry experience or taking a few tests. That is not who we are talking about.

If the test wanted to include true trained listeners, it would have had a test to find such people. Nothing like that is in the paper. People were taken at face value that they are "experienced."

Here is a fun conjecture on my part: I am confident Sean would not want to perform such a study on their small pool of critical listeners used in later studies and show them that what they do outside of blind tests is worthless! That would be friendly fire of the worst kind.

What I said is not “incorrect.” I am quoting the stated conclusions of the study. You are asserting that there’s some subtextual, “real” meaning of the study, which only you can divine, and which runs counter to its author’s conclusions in the study itself and in a blog post summarizing the study. It’s nonsense.

Now you are slandering the study (and thereby its author), saying it's not up to "exacting standards," even though you've previously cited it as authoritative.

Once again, if you have other evidence suggesting that trained listeners can reliably discern small differences between speakers with sighted listening, please present it. Please also outline the training necessary to reach that level.
 

Rusty Shackleford

Active Member
Joined
May 16, 2018
Messages
255
Likes
550
Rusty, if you read Kevin Voecks' comments on listening tests for the Salon 2, I think you might agree that he seems to feel that some things are better determined by listening, than by measurements. Similarly, Floyd Toole has commented in his book, "A recent listening test proved its worth when it revealed that a loudspeaker having excellent looking spinorama data (Section 5.3), which normally is sufficient to describe sound quality, was not rated highly as expected. The problem was found to be intermodulation distortion, an extremely rare event, associated with the ways sounds from a woofer and tweeter combined in a concentric arrangement--so constant vigilance and listening are essential." FLOYD TOOLE. AND KEVIN VOECKS. FROM HARMAN.

Furthermore, you are setting up a straw man argument about your perception of "the accepted wisdom on this site."



I highly advise that you take the radical step of simply not reading that section of his posts.



I strongly suggest that you consider not following this forum further, if you are so bothered, or even better, listen to both very similarly measuring speakers and offer your own opinion, making sure that you include some proof of validity.

Young-Ho

As I have said before, if this site wants to move to the position that there are things discernible in even sighted listening that measurements cannot divine, that’s fine. But we need to have the specifics laid out clearly. It cannot simply be that Amir and Amir alone has this amorphous ability.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,374
Likes
234,477
Location
Seattle Area
So, posting subjective opinions gives snake sellers, golden ears, etc. the opportunity to question ASR. :(
What? Someone else on YouTube reviewed the same SVS speaker and gave it high praise. When someone linked to my review, the reviewer and others immediately claimed we only measure and don't listen to music:

[attached screenshots of the YouTube comments]


If I didn't listen, we would walk right into these subjectivist arguments.

As I have mentioned, for things that show no measured performance difference, we are not going to do anything but blind testing.

You all need to get used to two different domains of products we test: those that by definition sound different from each other (speakers), and everything else. One size does not fit all.

Maybe you can show me a study that says listening to speakers is no longer required as measurements are completely descriptive.
 

Rusty Shackleford

Active Member
Joined
May 16, 2018
Messages
255
Likes
550
There is no such "wisdom." Fact remains that even small differences in measurements can change the nature of a speaker's sound rather strongly. Anyone who thinks measurements should be all we do is mistaken about the science and certainly shares no wisdom with me.

I have no ability to tell the sound of two similarly "good measuring" speakers by just looking at the graphs. Similar looking measurements may translate into different perception. Take these two speakers starting with JBL M2:

[attached image: JBL M2 spinorama]


And here is Revel Salon 2:

[attached image: Revel Ultima2 Salon2 spinorama (re-measured in 2017)]


The Salon 2 was tested in an informal blind test of audiophiles on AVS: https://www.audiosciencereview.com/...ootout-between-jbl-m2-and-revel-salon-2.1844/

The result was that the Salon 2 was preferred. This is so even though its response is less perfect than the JBL M2's.

What would you like to happen in our forum? That we only measure and pick the M2 as being better than the Salon 2? Clearly this runs afoul of the above blind test, whose thread included the likes of Dr. Toole and Kevin Voecks from Harman.

So I listen. It is a double check on the measurements. That the speaker you bought doesn't pass my listening test is an issue you have to get over. Your personal bias and distaste for what I do are not our issue. Learn to deal with them by not reading the reviews, or the whole forum for that matter, as was just nicely noted. Heaven knows if I'm going to listen to someone, it won't be an angry mob running with talking points from research they have not read or understood.

As you just said: blind test. That is not what is in question here, is it?
 

Rusty Shackleford

Active Member
Joined
May 16, 2018
Messages
255
Likes
550
Ah, thanks for clarifying that. Definitely would need to go with the "you would be well advised to test blind" found in the original academic paper then (not the internet blog post).



To be clear, the effect of "sight" bias varied between 0 and 2 on the preference scale. For some loudspeakers and conditions, there was 0 bias caused by being unblinded. And the average across tests was more like 1. Moving from, say, a 5 to a 6 is a "noticeable improvement," but it's not THAT big of a deal on the scale. But here's the thing - the 0-2 unit "bias effect" was created under extreme levels of bias (i.e. if you like your job here, you know what the answer is). So when generalizing the results over to Amir, where the bias is less extreme, I would expect the influence of bias to, in turn, be LESS than 1, on average. Personally, I think that's really good!

But I get the point you're trying to make. You're saying that for you, even the possibility of a 1-2 unit difference is not acceptable when company sales can be affected. I don't necessarily agree, but I hear you. And I'm saying that listening impressions are still valid and usable even when they're not blinded - they're just not quite as authoritative or reliable.

I was thinking of the subjective scale in terms of the quantitative preference rating, too. The scales are roughly the same, and a difference of 1-2 units is the difference between the ELAC AS-61 and the Revel M105. So I think that much subjective variation would be considered unacceptable by most observers.
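For anyone not familiar with it, the quantitative rating I have in mind is the commonly cited Olive preference model. The coefficients below are the ones usually quoted from his papers; treat the exact numbers as my recollection rather than gospel:

$$ \text{Pref} \approx 12.69 - 2.49\,\text{NBD}_{\text{ON}} - 2.99\,\text{NBD}_{\text{PIR}} - 4.31\,\text{LFX} + 2.32\,\text{SM}_{\text{PIR}} $$

where NBD_ON and NBD_PIR are the narrow-band deviations of the on-axis and predicted in-room responses, LFX is the (log-scaled) low-frequency extension, and SM_PIR is the smoothness of the predicted in-room response. The ELAC AS-61 vs. Revel M105 gap mentioned above gives a feel for what 1-2 units means on that scale.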
 

oldsysop

Senior Member
Joined
Sep 5, 2019
Messages
383
Likes
657
Maybe you can show me a study that says listening to speakers is no longer required as measurements are completely descriptive
So the reliability of the AP is far superior to what the Klippel offers us with speakers.
The AP shows us the whole truth, while the Klippel shows only a part, and the rest is filled in by ear?
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,374
Likes
234,477
Location
Seattle Area
I also trust Amir's ears, the problem is that I don't have Amir's ears.
They are available for rent. They cost only $1/day, but I can't afford to have them damaged. So you also have to add insurance against damage, which costs $1000/day.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,374
Likes
234,477
Location
Seattle Area
So the reliability of the AP is far superior to what the Klippel offers us with speakers.
That's for certain. One is using an electromechanical microphone for sound acquisition, and the other, an incredibly sensitive wired connection. I can measure a SINAD of 120 dB+ with the AP and a DAC. I can't measure half that with a microphone and analyzer (either hardware) and a speaker.
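To put those numbers in perspective, the conversion from SINAD to THD+N is just standard dB arithmetic:

$$ \text{THD+N (as a ratio)} = 10^{-\text{SINAD}/20} \quad\Rightarrow\quad 120\ \text{dB} \approx 0.0001\%, \qquad 60\ \text{dB} \approx 0.1\% $$

So the gap between 120 dB electrically and the roughly 60 dB achievable acoustically is about three orders of magnitude.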

The AP shows us the whole truth, while the Klippel shows only a part, and the rest is filled in by ear?
A speaker generates a 3-D sound field which depends on the location of the speaker, the size of the room, the listener position, the furnishings in the room, etc. The spinorama produces a set of averaged frequency responses which have good predictive power, but they are not absolute. Nor are they fully descriptive.
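To illustrate what "averaged" means here, below is a minimal sketch of the RMS (pressure) averaging used to build a curve like the listening window. The angle set and averaging method follow my reading of the CTA-2034 convention and are assumptions for illustration, not a description of the Klippel software:

```python
# Minimal sketch: RMS (pressure) average of several angular measurements into a single
# "listening window" style curve. Assumes all curves share one frequency grid; the
# nine-angle set (on-axis, +/-10 deg vertical, +/-10/20/30 deg horizontal) is my
# reading of CTA-2034, stated here as an assumption.
import numpy as np

def rms_average(curves_db):
    """curves_db: array-like of SPL curves in dB, all on the same frequency grid."""
    pressures = 10.0 ** (np.asarray(curves_db) / 20.0)    # dB -> linear pressure
    avg_pressure = np.sqrt(np.mean(pressures ** 2, axis=0))
    return 20.0 * np.log10(avg_pressure)                  # back to dB

# Example with synthetic data: nine measurement angles, 200 frequency points each.
rng = np.random.default_rng(0)
curves = 85.0 + rng.normal(0.0, 1.0, size=(9, 200))
listening_window = rms_average(curves)
print(listening_window[:5])
```

The averaging necessarily throws away the angle-by-angle detail, which is exactly why such curves are predictive rather than a literal description of what you hear at one seat in one room.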

In contrast, the measurements of a DAC are static: location-, room-, and listener-independent. There are complexities in interpreting those results, but nothing on the scale of speakers.

The devices we test are all in the audio chain but have vastly different characteristics. We are fortunate that the measurements with the Klippel are as meaningful as they are. We would be in real hot water if all we measured, for example, was the on-axis response.
 

stunta

Major Contributor
Forum Donor
Joined
Jan 1, 2018
Messages
1,155
Likes
1,399
Location
Boston, MA
They are available for rent. They cost only $1/day, but I can't afford to have them damaged. So you also have to add insurance against damage, which costs $1000/day.
Not sure I want your ears. Mine are already too expensive.
 