Can You Trust Your Ears? By Tom Nousaine


SoundAndMotion

Active Member
Joined
Mar 23, 2016
Messages
144
Likes
111
Location
Germany
Can a test that involves asking people what they perceive while listening to music, ever be declared to be valid, reliable and objective? Only in the fevered imaginations of audiophiles.
[snip]...

In listening tests, are people responding to novelty, fashion, or a fundamental absolute biological truth? You may get perfectly repeatable, "reliable" results in 2004 (and 2005 and 2006) but because you are not making actual objective measurements and you are relying on asking people their impressions of the aesthetics of art, you cannot guess whether the results will be repeatable in 2007.
You realize "perceive" and "perception" have basically 2 groups of definitions, right? One relates to feelings, opinions, impressions, etc, while the other relates to processing of sensory information for subsequent use by the brain (actions and memory).
Without taking that into account, you dismiss all of perceptual neuroscience as "fad measurement", which it is not. As with all science, if the experimental design is careful, the results are valid, reliable and objective. And as with all science, people (i.e. scientists) may still jump to invalid conclusions from those results.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
I don't think science is in the business of delivering "correct" results. It can deliver scientific results, but defining that can be tricky.

Can a test that involves asking people what they perceive while listening to music, ever be declared to be valid, reliable and objective? Only in the fevered imaginations of audiophiles.

It is no different from using science to determine the best shop lighting for viewing clothes in - for example. The full scientific method can be adopted, with randomised trials. A/B/X testing can be used to determine whether cheaper LEDs are indistinguishable from halogen. People can be asked for their preferences between 4000K and 6000K etc. etc. But if you'd run the trial in 1973, you'd have got different results compared to 2013. Preferences would change from month to month. People in 1973 would probably never have even seen a 6000K lamp so probably couldn't even 'comprehend' what they were seeing; dyes were made differently those days; and earth tones were 'in' that season.

In listening tests, are people responding to novelty, fashion, or a fundamental absolute biological truth? You may get perfectly repeatable, "reliable" results in 2004 (and 2005 and 2006) but because you are not making actual objective measurements and you are relying on asking people their impressions of the aesthetics of art, you cannot guess whether the results will be repeatable in 2007.

Good/interesting points.
Wrt "scientific", I'd say it's the other way round; AFAIR the definition says something is scientific if the scientific tools/methods are used, which of course is somewhat recursive. The methods/tools are evolving, which means the definition is a matter of convention and, up to a certain degree, subjective.

The intention (or business, as you said) is IMO nevertheless to get correct results, but correctness usually can't be guaranteed.

And of course it depends on the hypothesis/questions being examined; results carry a degree of probability but are not necessarily "truth/proof", and one should always be careful about generalizing any results.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
You realize "perceive" and "perception" have basically 2 groups of definitions, right? One relates to feelings, opinions, impressions, etc, while the other relates to processing of sensory information for subsequent use by the brain (actions and memory).
If you ask someone for their perception of music in a listening test, which of those two are you measuring?
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
Amir, you're into photography, right?
If I asked you: "what is the correct shutter speed?" Wouldn't you answer: "it depends." (...on lighting conditions, subject motion, aperture, film/sensor speed/sensitivity...)
Switching time and sample time in a listening test depend on what you are testing and how you are testing. Your blanket statement doesn't work, just as "I've taken tons of photos and 1/500 sec or shorter works best. Anything longer and all my daylight race photos would be all blurry" doesn't work for all photos.
Taking a picture is about creating art. The job of audio reproduction is to preserve art. They are different things.

In photography, people will take a picture at 1/500 sec, examine each and every pixel, and judge lens and camera sensor quality. From the point of view of examining a test chart, any shutter speed that doesn't cause blur would be fine (or a locked mirror).

Now take that same 1/500 sec shot, put it in a video of fast-moving action, and ask people to tell the difference between two lenses of identical focal length (i.e. "look"), and they won't be able to ascertain the above differences. Indeed, lossy compression of video creates tons of poor-quality frames, but because they change rapidly, we can't tell that is happening. Of course, we can freeze the video and then examine the fidelity -- something we can't do with audio.

Analogies aside, let's look at the task at hand. Let's say I want to tell the difference between 320 kbps lossy compression and the original. If we had a tool to objectively analyze the lossy file, we would see that the fidelity is all over the place. Say I have pure, absolute silence: in that case I don't need even a fraction of the bandwidth that CD provides to represent it. On the other hand, if I have tons of transients, throwing away 75% of the bits will cause large degradations (objectively).

Lossy compression works on a frame-by-frame basis, with frames measured in milliseconds. A problem, i.e. a distorted transient, may be there in one of those frames but not in the others. By capturing that transient and listening to it over and over again, you take advantage of much higher-fidelity short-term memory to hear the difference between it and the alternative.
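To make that concrete: a minimal sketch (my illustration, not amirm's actual tooling; file names are hypothetical) of how one might locate the frames where a lossy encode deviates most from the original -- i.e. the snippets worth looping:

```python
# Minimal sketch: rank lossy-codec frames by how far they deviate from the
# original. Assumes both files are already decoded to WAV at the same sample
# rate and are time-aligned (real codecs add an encoder delay you would have
# to trim first). File names are hypothetical.
import numpy as np
from scipy.io import wavfile

rate, orig = wavfile.read("original.wav")
_, lossy = wavfile.read("decoded_320kbps.wav")

def to_mono(x):
    x = x.astype(np.float64)
    return x.mean(axis=1) if x.ndim > 1 else x

orig, lossy = to_mono(orig), to_mono(lossy)
frame = int(0.020 * rate)  # ~20 ms, roughly codec-frame sized
n = (min(len(orig), len(lossy)) // frame) * frame
resid = (orig[:n] - lossy[:n]).reshape(-1, frame)

# Per-frame RMS of the residual: the peaks usually sit on transients,
# exactly the short snippets worth looping in a listening test.
rms = np.sqrt((resid ** 2).mean(axis=1))
for i in np.argsort(rms)[-5:][::-1]:
    print(f"{i * frame / rate:7.3f} s  residual RMS = {rms[i]:.1f}")
```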

In a larger context, "masking" is the enemy of hearing distortions. A lot of sins can be buried in the power of the music itself. So the trick to hearing those artifacts is to find components -- which may be awfully short -- that have sufficient silence before/after, or maybe just a single note, where the distortion becomes audible. A pluck of an acoustic guitar is my favorite example here.

Stepping back, we have to consider why we use listening tests instead of measurements. If I have equipment that truncates everything above 10 kHz, then an instrument telling us that is sufficient to know there is a problem. If, however, a transient is distorted ever so slightly, our classic measurements don't reveal that -- not easily, anyway. Hence the reason we don't use instrumentation for detecting lossy compression artifacts: we use humans, and the weapon of choice there is short switching time.
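The truncation case really is instrument territory. As a minimal sketch (hypothetical file name), a few lines suffice to flag a 10 kHz cutoff without any listener:

```python
# Minimal sketch: check whether a captured device output still has energy
# above 10 kHz, relative to a 1-2 kHz midband reference.
import numpy as np
from scipy.io import wavfile

rate, x = wavfile.read("device_output.wav")
x = x.astype(np.float64)
if x.ndim > 1:
    x = x.mean(axis=1)  # fold to mono for a single spectrum

spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)

ref = spec[(freqs >= 1_000) & (freqs < 2_000)].mean()
hf = spec[freqs >= 10_000].mean()
print(f"level above 10 kHz: {20 * np.log10(hf / ref):.1f} dB re 1-2 kHz")
```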

Indeed, the fastest way to "lie" about the fidelity of lossy compression is to not give listeners that ability and simply ask them whether two files are different. You will get far more votes of "it is the same as the CD" than if you provided that critical snippet and the ability to loop it.

I say all this from training and from doing such work, and from having seen others in the industry practice exactly the same thing.

Take Harman speaker testing. Their switching time is about 4 seconds. I have to tell you, even though speakers have large sonic differences, 4 seconds was excruciatingly long. During that pause you keep trying to refresh in your memory what you heard, and you watch it fade away before the next speaker plays. As such, for hearing small differences like distortion, 4 seconds is completely unacceptable. It is OK, though, for hearing large differences in the timbre/overall sound of speakers, which is what Harman uses such tests for.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
On the reliability of blind tests: of course there can be errors, and agendas to drive specific results. The problem is that doing it sighted magnifies those problems a million times. So we take an imperfect test over what is totally junk.

What can give us a lot more confidence in blind tests is combining them with engineering and science/measurements. If all the arrows point in the same direction, then the combined set of data provides highly reliable results.
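As a minimal sketch of what that confidence means in numbers: in a blind ABX run the null hypothesis is that the listener is guessing, and the binomial tail gives the probability of the observed score arising by luck:

```python
# Minimal sketch: p-value of an ABX result under the guessing hypothesis.
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial tail: P(at least `correct` right by pure guessing)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(12, 16))  # ~0.038: 12/16 is unlikely to be luck
print(abx_p_value(9, 16))   # ~0.40: 9/16 is entirely consistent with guessing
```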

The problem with the advocates of sighted testing is that they also want to throw out the other data points and, with that, rely completely on faulty data. Then they compound the issue by advocating those outcomes as true in discussions and forums. It goes way beyond the point of reasonableness.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
<snip> So we take an imperfect test over what is totally junk.

The "is totally junk" part is an interesting assertion. Is there actual data to back it up?
I question it because most people I know use "sighted listening" to get an impression that is then examined further with "blind" controlled listening tests. If "sighted" were "totally junk", that procedure wouldn't make any sense.
Usually "sighted" listening tests aren't used beyond a certain level, as one cannot show the internal validity of sighted listening (without incorporating other/additional methods).

What can give us a lot more confidence in blind tests is combining them with engineering and science/measurements. If all the arrows point in the same direction, then the combined set of data provides highly reliable results.

If the "blind" controlled listening tests are themselves shown to be valid, reliable and objective, that creates confirming evidence.

The problem with the advocates of sighted testing is that they also want to throw out the other data points and, with that, rely completely on faulty data. Then they compound the issue by advocating those outcomes as true in discussions and forums. It goes way beyond the point of reasonableness.

There surely exists a contingent that wants to avoid any "blind" controlled listening test, but dismissing every result from "sighted" listening just because it does not comply with measurement results or theory goes much too far, as the "blind" property removes only one bias mechanism; all the others are still at work. Given that our hearing is a nonlinear system, the mechanistic approach (i.e. if we remove one bias mechanism, results _must_ be better than before) happens to fail.....
 

Jinjuku

Major Contributor
Forum Donor
Joined
Feb 28, 2016
Messages
1,278
Likes
1,180
The "is totally junk" part is an interesting assertion. Is there actual data to back it up?

Here is what I know. I've put two expensive Ethernet cables on the bench and captured the output through my ADC. Initially, with the samples labeled according to the cable under test, participants were able to describe the benefits of the expensive cabling (WireWorld 12-foot CAT 8 at $330, and Nordost 3-foot at $699).

When the samples were blinded, it all fell apart for the people who had previously made those claims.
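For the bench side, a minimal null-test sketch (file names are hypothetical; assumes mono WAVs at the same sample rate) that time-aligns, gain-matches and subtracts the two captures to see whether any difference exists above the noise floor:

```python
# Minimal sketch of a null test on two ADC captures of the same source.
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

_, a = wavfile.read("capture_cable_A.wav")
_, b = wavfile.read("capture_cable_B.wav")
a, b = a.astype(np.float64), b.astype(np.float64)

# Time-align via cross-correlation (the captures never start in sync).
lag = int(np.argmax(correlate(a, b, mode="full"))) - (len(b) - 1)
a, b = (a[lag:], b) if lag >= 0 else (a, b[-lag:])
n = min(len(a), len(b))
a, b = a[:n], b[:n]

# Least-squares gain match, then the depth of the residual ("null") in dB.
g = np.dot(a, b) / np.dot(b, b)
resid = a - g * b
null_db = 10 * np.log10(np.mean(resid ** 2) / np.mean(a ** 2))
print(f"null depth: {null_db:.1f} dB")
# A null at/below the ADC noise floor means the cables delivered the
# same signal; claimed differences then have nowhere to live.
```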
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
The "is totally junk" part is an interesting assertion. Is there actual data to back it up?
Sure. How much time do you have? :)

I present myself as the data point. I have taken countless listening tests, both sighted and blind -- tests where we objectively know the answer. I can tell you story after story where, despite all my training, I still found "huge" differences between files that were in fact identical.

A recent example was using Dirac EQ. I did a full-bandwidth correction and thought that the high-frequency correction was a degradation. So I created another profile that stopped at 200 Hz. I tested that, and the high-frequency problems I heard were gone. But my happiness ended when I looked at the Dirac control panel and realized I was still running the full-correction profile!!! In other words, nothing had changed, yet my expectations read a good bit of change into it.

Perhaps more remarkable is that the same thing happens in the evaluation of speakers, even though there we are talking about large differences in sound. The look of speakers, price, even color and finish greatly impact our impressions. A controlled experiment on this is from Sean Olive: http://seanolive.blogspot.com/2009/04/dishonesty-of-sighted-audio-product.html

[Chart from the linked Olive post: mean loudspeaker preference ratings, sighted vs. blind]


Look at the reversal of the total score for speaker "S", and at how the overall scores shift between loudspeakers, sighted versus blind.
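As a minimal sketch of how one could quantify such a reversal (the numbers below are made up for illustration, NOT Olive's data), compare the sighted and blind rankings of the same speakers:

```python
# Minimal sketch: rank agreement between sighted and blind ratings.
# All numbers are hypothetical placeholders, not Olive's published data.
from scipy import stats

sighted = [6.3, 5.6, 5.2, 4.4]  # mean rating per speaker, sighted
blind   = [4.8, 5.9, 5.7, 4.9]  # same speakers, same panel, blind

rho, p = stats.spearmanr(sighted, blind)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
# rho near +1 would mean sight changed nothing; a low or negative rho is
# the kind of rank reversal reported for speaker "S".
```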

I remember being at Fry's back in the 1990s, when home theaters were a big deal, and seeing a "home theater" demonstration. I walked in on a formal presentation: wall-mounted, floor-standing-size speakers all around. A movie clip was played, and afterward the presenter removed the grilles to expose the tiny Bose speakers behind them. The optical illusion made a big difference in perception, and Bose knew it: no one would take those tiny cube speakers seriously without such an illusion.

I can tell you story after story.

Really, I would love for sighted testing to work, as it is so much easier to do. But it just doesn't.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
There surely exists a contingent that wants to avoid any "blind" controlled listening test, but dismissing every result from "sighted" listening just because it does not comply with measurement results or theory goes much too far, as the "blind" property removes only one bias mechanism; all the others are still at work. Given that our hearing is a nonlinear system, the mechanistic approach (i.e. if we remove one bias mechanism, results _must_ be better than before) happens to fail.....
Here is the deal, and there is no getting around it: we can make arguments, as you are making, in favor of the validity of sighted tests. The problem is that, in reality, I find audiophiles as a group to be horrible at hearing the distortions they talk about all the time. That is, they can't hear the distortions even when the distortions are objectively there. The vast majority, for example, will fail tests of lossy compression against the original.

So while, like you, I used to hold a moderate position here, I came to think that position was not deserved. There is no way their observations of differences in sighted testing can be taken seriously when higher-level evaluation says the audible differences, even if they were there, would be very hard to hear.

Taken as a whole, then, sighted observations by audiophiles are just not trustworthy.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
That is evidence that "sighted listening" can go wrong (slightly to totally), but it does not justify the term "totally junk". In a similar way, I could cite plenty of examples where a "blind" controlled test was wrong (slightly to totally), but I'm sure we agree that would not justify calling those "totally junk" either.

The "totally junk" term implies that nobody is able to learn to minimize the impact of bias mechanisms (up to a certain degree), even when aware of them. I'd question that implication, because we know that under "blind" conditions a plethora of biases is still at work.
Despite that fact, we also know that humans can reach quite astonishing sensitivity under these "blind" conditions, which means they must be able to control the impact of bias mechanisms.

That would mean asserting that knowledge of the device under test is the one and only bias mechanism that is uncontrollable while all the others are controllable, which I'd consider highly unlikely.

I always wonder in these discussions why "audiophiles" are suddenly thrown in, as we were talking about listening tests and humans in general. It wouldn't help either to bring in pseudo-objectivism as an argument for why we can't trust "blind" tests .... ;)
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
Here is the deal, and there is no getting around it: we can make arguments, as you are making, in favor of the validity of sighted tests. <snip>

Taken as a whole, then, sighted observations by audiophiles are just not trustworthy.

It is not so much about the validity of "sighted" listening; it is about opposing the statement of invalidity (i.e. "totally junk") for _every_ "sighted" listening.
As said before, one cannot show the validity of a "sighted" result (without doing "blind" controlled listening tests, or using another method of qualitative testing, which nevertheless generally involves some sort of "blinding"), but asserting "invalidity" is something different and isn't justified without further examination either.

P.S. If we want to take everyday life into consideration (as in forum discussions), we have to notice that "sighted" listening is often welcomed, if not applauded, provided the results happen to fit the various beliefs of non-audibility (contrary to all the "totally junk" ascriptions) .....
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
Taking a picture is about creating art. The job of audio reproduction is to preserve art. They are different things. <snip>

Stepping back, we have to consider why we use listening tests instead of measurements. <snip> Hence the reason we don't use instrumentation for detecting lossy compression artifacts: we use humans, and the weapon of choice there is short switching time. <snip>
Lossy encoding is a special case and definitely an area where listening tests are mandatory - although we might query whether it is necessary or desirable to use actual music. Instead we might determine the characteristics of hearing at a more elemental level and then deduce whether the signal would mask the bandwidth reduction. This would be closer to actual science.

But lossy encoding is/has been only a temporary glitch in audio progress - it need not concern the true audiophile. Objective measurements rather than listening tests can encompass the vast majority of what we need.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
It is not so much about the validity of "sighted" listening; it is about opposing the statement of invalidity (i.e. "totally junk") for _every_ "sighted" listening.
Let me qualify that then :). It produces junk data for anything that audio science/engineering knows cannot make a difference.

Examples:
Thumb tacks from Synergistic Research
Ethernet cables
Almost any audio cable sans extremes.
Cable lifters
Shelving for anything other than turntables
USB cleaners of any sort
Powerline filters
Customizing Windows to remove background tasks, etc.

Every one of these has tons of people who say they make night-and-day differences. So we have that data point. I will put up $1,000 of my own money for anyone who can demonstrate that any of the above differences can be heard once we remove the identity of what is being tested.

If any such differences can be shown, they will make great news in scientific and engineering circles. So in addition to my money, a huge amount of fame awaits anyone who demonstrates the validity of the sighted observations.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
But lossy encoding is/has been only a temporary glitch in audio progress - it need not concern the true audiophile. Objective measurements rather than listening tests can encompass the vast majority of what we need.
Objective measurements require interpretation, and they need to be applicable to the case at hand. We don't always have an easy measurement to make in that regard.

This is why I always say that I prefer a controlled listening test to anything. If that doesn't exist, then we can resort to other tools.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
Let me qualify that then :). It produces junk data for anything that audio science/engineering knows cannot make a difference.

Examples:
Thumb tacks from Synergistic Research
Ethernet cables
Almost any audio cable sans extremes.
Cable lifters
Shelving for anything other than turntables
USB cleaners of any sort
Powerline filters
Customizing Windows to remove background tasks, etc.

Which is IMO surprising, as audio science/engineering cannot _know_ that no item on this list makes a difference. It can provide some valuable insights and offer hypotheses against it (or, in other words, it might create some priors, if one wants to take the Bayesian approach, which IMHO we all do, although to varying degrees).
I can't emphasize enough the importance of carefully worded assertions and hypotheses at this point.
I think we can agree that none of the items on this list _should_ make a difference, but we can't take that for granted, as we are combining gear made by humans, and humans tend not to be perfect.

So we have that data point. I will put up $1,000 of my own money for anyone who can demonstrate that any of the above differences can be heard once we remove the identity of what is being tested.

And you'll put your money in even though we already know that the money itself quite likely has an impact (i.e. represents a bias)?
So you have to believe that some biases can be controlled. :) You carefully avoided addressing that point..... ;)

If any such differences can be shown, they will make great news in scientific and engineering circles. So in addition to my money, a huge amount of fame awaits anyone who demonstrates the validity of the sighted observations.

My hypothesis would instead be that the reasons are already known, but we failed to address them properly before, so not much fame is to be gained.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
Which is IMO surprising, as audio science/engineering cannot _know_ that no item on this list makes a difference. It can provide some valuable insights and offer hypotheses against it (or, in other words, it might create some priors, if one wants to take the Bayesian approach, which IMHO we all do, although to varying degrees).
We don't operate from a point of zero knowledge in audio, or in any other science. What you say we don't know is in fact the consensus view of literally thousands of engineers and researchers in this field. This is not an unknown field where all outcomes have a reasonable chance of being true. There are immutable laws and rules that govern audio, and it is not reasonable to throw them out when evaluating the likelihood of outcomes.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
And you'll put your money in even though we already know that the money itself quite likely has an impact (i.e. represents a bias)?
An impact on what? I put money forward because I know the likelihood of losing it is zero. I know that based on the full consensus of the entire engineering and scientific community of audio. Per my other post, we don't get to throw that out in assessing someone's notions of improving audio. We don't live in a bubble where no rules apply to audio. That my doctor thinks I can't cure cancer by taking megadoses of vitamins, as someone may advocate online, doesn't make him biased. It is a sign of experience, and of guarding against charlatans dipping their hands into our wallets and pulling out fresh $1,000 bills.
 
OP
amirm


Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,381
Location
Seattle Area
So you have to believe that some biases can be controlled. :) You carefully avoided addressing that point..... ;)
You mean in sighted testing in high-end audio as performed by audiophiles? Where is the data to back that up, to borrow your line? ;) :)
 

Don Hills

Addicted to Fun and Learning
Joined
Mar 1, 2016
Messages
708
Likes
464
Location
Wellington, New Zealand
... This sort of quantitative tests are useful for certain differences for the detection of multidimensional differences a qualitative test might be more suitable.

Er... what?
 