
Can You Trust Your Ears? By Tom Nousaine

Status
Not open for further replies.

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,386
Location
Seattle Area
I went looking for the original text of this often-talked-about article by Tom Nousaine but could not find it. I eventually found an optical scan with a fair number of errors. I fixed most of them and thought I would post it here for future reference before any trace of it disappears from the Internet.

There is a lot of wisdom in the article that is as current today as it was in 1997 when he wrote this for Stereo Review. Rest in peace Tom.


Can You Trust Your Ears?
Perhaps, if you use your head, too
By Tom Nousaine

"Relax, listen carefully over the long term, and, above all,
trust your ears." So goes the most often repeated advice
from friends, reviewers, and salespeople about the right
way to evaluate sound quality. An often quoted corollary is
that you may have to "learn" to hear certain differences by
listening over extended periods. It all sounds so logical,
too. We use our ears constantly, and they serve us well —
at least they saved our ancestors from being eaten
by tigers. Isn't it natural to rely on them to help
us pick out audio equipment?

Unfortunately, our ears can't always compensate for some of the error mechanisms that are present during typical open listening sessions in living rooms and audio salons. By "open" I mean without control mechanisms to prevent listener bias from influencing the results. I call open listening sessions "plug-and-play" because you just disconnect the old amplifier (or whatever), plug in a new one, and let 'er rip. That's what you usually do when you get a new piece of gear in your living room, and, with a few clever twists, it's the standard operating procedure on the sales floor.

Listener bias can be sorted into three primary categories: sensory, psychological, and social. The bias mechanisms might remain hidden in the plug-and-play environment, but they are always there, and they ensure that the listening evaluations will be contaminated. The results will be partially or wholly based on factors other than the sound being produced.

Sensory Bias

Humans sense the environment in a differential fashion, and we are most sensitive to any stimulus when first exposed to it. For example, we only "hear" or notice a fan when it is turned on or off. That is not to say training is never important, but for sound-evaluation purposes differences seem more dramatic on first exposure. Furthermore, with continued exposure we quickly adapt or equalize ourselves to any stimulus. We stop hearing a fan after a short while, and it turns into sonic wallpaper. We can even have different sensory responses to the same stimulus. You feel cold both when you first jump into a pool and again when you get out, even though the air temperature hasn't changed at all.

These simple examples show how our differentially based sensory system can respond in different ways to the same stimulus and how those responses can vary over time. Try a little experiment to highlight this point. Repeat a 30-second musical passage several times and notice how you hear different things each time. We can hear differences with the same music because we are constantly scanning for differential information and will often sense a change just to help us avoid that one fatal error with the tiger. Sound is simply another physical stimulus, and our ability to assess the actual sound being produced is what we are interested in evaluating here. That long-term listening is a good idea when choosing audio equipment is starting to look like a myth.

Any plug-and-play comparison, with no mechanism for side-by-side comparisons, will be inadequate. We are most sensitive to sound differences in a side-by-side comparison. Think of your ability to evaluate shades of white. With paint chips side by side, even subtle differences are apparent and easy to detect. But separate the comparisons by time and distance, and your sensitivity to the differences decreases. Have you ever tried, for example, to pick out the paint chip that matches the color of your living-room ceiling while you're at the paint store? It's nearly impossible.

The best way to evaluate sound is also through direct side-by-side comparisons. It is the only way that allows us to notice and identify subtle differences. But there are other aspects specific to the interpretation of sound that can get us in trouble. For example, we tend to interpret small changes in volume as changes in sound quality. I conducted an experiment several years ago where thirty-one subjects were asked to listen to ten sets of musical passages, with each set containing two 30-second samples.

In half of the sets, both samples were played at precisely identical volumes. In the other half, there was a 1-dB difference in level between them. Although people had a strong tendency to "prefer" the louder alternative (especially when it came as the second of two), not one of the subjects reported volume or level as a discriminating factor. All comments on how the sound changed were couched in quality terms such as "cleaner" or "more harsh" even though volume was the only thing that had changed.
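To put the 1-dB figure in perspective, here is a minimal sketch of the standard decibel conversions. The function names are made up for illustration and are not from the article; the formulas are the usual 20·log10 (amplitude) and 10·log10 (power) definitions.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Voltage (amplitude) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Power ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    # A 1 dB mismatch is roughly a 12% change in voltage ...
    print(f"{db_to_amplitude_ratio(1.0):.3f}")  # 1.122
    # ... or roughly a 26% change in power.
    print(f"{db_to_power_ratio(1.0):.3f}")      # 1.259
```

So a mismatch that nobody in the experiment described as "louder" still corresponds to about 12 percent more voltage at the speaker terminals.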

Psychological Bias

Our psychological biases make us susceptible to cognitive errors in judgment. We are, for example, programmed to make choices and are perfectly happy choosing a favorite from identical alternatives. In the experiment described above, the subjects said that they "preferred" one of two identical sound clips more than 75 percent of the time, even when there was a "No Preference" box to check on the score sheet.

We also tend to make decisions very early in the game, with only a tiny portion of the possible evidence accumulated. Some researchers estimate that we make most of our decisions with about 5 percent of the available evidence. Most of the time that works just fine; you can't go too far wrong choosing coffee filters this way. But people routinely make such important decisions as buying a car the same way. There is evidence that most of the information people gather about cars they buy is gathered after the purchase, to justify a decision already made. We humans are imperfect in this respect, too, willing to discount or ignore overwhelming contradictory evidence to avoid admitting to ourselves that we made a wrong decision. So once you have decided to "hear" the marvelous effects of placing a brick on top of your power amplifier or painting green stripes around your CDs, you will be psychologically hard-pressed to "unhear" them even when confronted with evidence that the sound didn't change.

Social Bias

So far we have concentrated on characteristics that can influence our judgment in private. The plot really thickens in group settings, where our social biases allow group dynamics, and not sound, to shape what we "hear."

The best way to illustrate this is with an anecdote. I attended a press conference unveiling a new speaker a couple years ago. When my seatmate leaned over to tell me what he was hearing and to ask my opinion, I just made up something that sounded plausible. Sure enough, after the next sequence we reconvened, and he intimated that he had "heard" what I had told him I heard. He seemed a trifle miffed, though, when I refused to acknowledge what he had heard or to negotiate an agreement. You can try this one at home too. Just be careful about whom you use as a subject!

The scenario that I just described is played out over and over again in audio demonstration rooms and living rooms across the country. When the host fires up the system with his hot new amplifier, guests initially report hearing different things, but after a few replays the group negotiates a consensus about how the new amplifier "sounds." What we have with this kind of open interaction is an exercise in group dynamics, not an exercise in sound evaluation.

The potential for error here is large, especially when someone present has a special status. If Neil Young were in the crowd, you can bet that many would defer to his judgment. I know I would. But the authority figure doesn't even need to be present. A salesperson will gladly prime the pump by telling you in advance what "most people" and favored reviewers hear when they listen to this amplifier. He will also skillfully negotiate differences: "Well, maybe you didn't perceive the better rhythm and pace, but surely you heard the improved liquidity in the midrange." It happens all the time.

Now let's check out the more crassly commercial aspects of group behavior and audio evaluation. First of all is the hidden assumption that the product being demonstrated actually does sound different from some other product. In an audio salon "no difference" is not an acceptable answer. You can argue over what the differences are, but never question whether differences actually exist. Woe be unto the audiophile who can't hear differences, even inaudible ones.

Another often hidden factor is the agenda of the host. Whether he's a salesperson or a good friend, he wants your concurrence that something sounds good. The salesman wants you to buy the product; that's his job. He may be your friend, too, but he will tend to confuse his sales commission with sound quality. If he didn't he wouldn't be a good salesman. Likewise, your buddy wants your approval of his investment or his latest tweak. If he really wanted your true opinion, he would give you a private score sheet and let you write down comments that he could evaluate later. Your approval and confirmation are being solicited. The high sound quality of the equipment being demonstrated is a given.

How It Plays Out

Now that we know what to watch for, let's see how all these bias factors play out in the sales routine. Here's what happened to me several years ago when I dropped into a suite at the Consumer Electronics Show to audition a certain loudspeaker. I was in the wrong suite, but the product demonstrator there suggested that his cable conditioner, costing several hundred dollars, would improve the sound more than a change in speakers. He then offered to let me "audition" the effects of this device by comparing a conditioned interconnect cable with an unconditioned one. Always the skeptic, I allowed him to demonstrate the cables, but I asked him to select music where the differences would be as large as possible right from the start.

So we had a "blind" demo where the demonstrator hooked up one cable and then another seemingly identical one and played a minute or two of a CD using each of them. There were two other people present during the demo. Afterwards the host asked expectantly which cable we "preferred." The other two people were split. One "liked" the first, the other the second. I just said they sounded the same. And they did. The host responded that he would repeat the demo with "better material." What? Hadn't I asked for the "killer demo" on the first run?

The demo ran for two more trials. The other listeners didn't "like" the same cable until the last trial. After they acknowledged that they both preferred the conditioned cable on the third try, the demo was over.

Let's look again at the routine used for the presentation to spot where bias was introduced:
1) The host carefully primed the pump by telling us what we were going to hear in the demo.
2) The scoring was heavily prejudiced by the "which one did you like" format; there was no easy way to say they were the same.
3) The host always started the demo with the volume control at full off and turned it up slowly as he began each music segment. He turned it down when he swapped cables and between trials, so there was no way to ensure that levels were closely matched. Although it wasn't blatant, the second sequence was always just a little louder than the first.
4) The 15 to 20 seconds between each comparison and the 2 minutes between the segments were way too long for us to have good sensitivity to any differences.
5) The conditioned cable was always presented second.
6) Listeners were allowed to chat before deciding their preferences.
7) When "wrong" answers were given, the process was simply repeated until the "right" answers were obtained. Past results were then ignored, and the demo was brought to an end.
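Two of the flaws listed above (the conditioned cable always coming second, and no easy way to answer "same") can be designed out by fixing a randomized schedule before anyone listens. The following is only a sketch under stated assumptions: the labels "A" and "B", the function name `make_schedule`, and the equal four-way split are all illustrative choices, not anything described in the article. It does nothing about level matching or switching time, which need hardware.

```python
import random
from typing import List, Optional, Tuple


def make_schedule(n_trials: int, seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Return n_trials (first, second) presentation pairs in random order.

    Half the trials are real comparisons with the order balanced
    (A first as often as B first). The other half are identical pairs
    (A/A or B/B) that serve as "no difference" controls.
    """
    if n_trials <= 0 or n_trials % 4 != 0:
        raise ValueError("n_trials must be a positive multiple of 4")
    quarter = n_trials // 4
    pairs = (
        [("A", "B")] * quarter
        + [("B", "A")] * quarter
        + [("A", "A")] * quarter
        + [("B", "B")] * quarter
    )
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    rng.shuffle(pairs)
    return pairs


if __name__ == "__main__":
    # Write the schedule down before the session and keep every answer.
    for number, (first, second) in enumerate(make_schedule(8, seed=1), start=1):
        print(f"trial {number}: play {first}, then {second}")
```

Listeners would score each trial privately as "first", "second", or "same", and every trial would count toward the result, including the ones with the "wrong" answer.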

There were no records kept, and no scientific controls of any kind were applied. Yet the other two listeners (one of them a professional audio reviewer who should have known better) declared their amazement that such a device could "change" the sound of a cable, conveniently forgetting that neither of them had agreed on what was what during the first two trials where the most revealing material was used. Plug-and-play at its best! No "sound" was being evaluated here. The answers were known in advance, and the routine was guaranteed to leave many listeners thinking they had heard differences when there were none. It is tempting to think, "Maybe they did hear differences." But if so, why didn't they hear them in the first trial? Why not with the best material?

When you are shopping for audio gear you will experience this routine, in one form or another, again and again. It can't happen to you? If you are really honest, you know that it can because it has in the past. Watch carefully for the clues, and you will see it played out over and over, even at your own house when you have friends over to hear your new Gizmotron.

Unfortunately, you can't overcome bias just with willpower and good intentions even when you are aware of it. There is a common notion that if you hear something you didn't expect to hear, then you have become an experienced listener who is able to tune out bias. Well, it doesn't work that way even with bias that you know about.

Optical illusions like the Müller-Lyer lines shown here give us insight into sensory bias. The center lines are exactly the same length. But even after you measure them yourself, you won't be able to "see" the lines as being equal no matter how hard you squint. You cannot just tune out audio level mismatches, either. You cannot avoid the differential nature of human hearing, which is constantly scanning for changes. The moral is that humans cannot just tune out bias. Some cognitive errors are built in. Furthermore, much of our prejudice is buried in the subconscious, safely out of the reach of willpower. So where does that leave us?

Once we understand how bias works, plug-and-play auditions at an audio salon will never carry the same level of mystery. It is relatively easy to produce a situation where people can be induced into hearing differences between sonically identical products. We also know that a fair listening comparison is very difficult to arrange. Level matching and other procedures needed for bias-free evaluations are not easy tasks even for experienced testers. Knowledge of our innate error mechanisms will go a long way toward keeping the quest for perfect sound headed down the right road.

No one has ever produced a scientifically controlled listening test showing that well-designed amplifiers (flat response, no clipping), preamplifiers, integrated circuits, and speaker wires (16-gauge and bigger) have the slightest effect on the sound being produced. Special capacitors, absolute polarity, dots, clamps, green pens, bricks, and assorted other things also won't change the sound from a stereo or home-theater system, although people can be made to think so. Why? Listener bias can make people hear unverifiable "differences" in sound.


STEREO REVIEW AUGUST 1997 55
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,522
Likes
37,050
Tom Nousaine had a personal web site with many worthwhile articles on it. When I pointed someone to it at some point I found out he had passed and that site was no more. I wish somehow we could recover that entire site and have it somewhere.

You have probably read the one about the black boxes that added, I believe, 3% distortion to everything. Half of the boxes distorted and half did not. They were sent home with lots of audiophiles, who could audition however they wished for as long as they wished before declaring whether they had a clean or dirty box. They couldn't tell the difference. One of the big points is that long-term auditioning is not better than quick switching. Every one of those same people detected the 3% with near 100% reliability when using quick switching. With some experience they could get it down to 1% THD at least.
 
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,386
Location
Seattle Area
That was actually the Clark/Greenhill test. It is documented in David L. Clark's AES paper, "Ten Years of A/B/X Testing."



Tom's test was similar but was done using two CD-Rs, one with and the other without added distortion. In the take-home version, subjectivists could not detect the difference. He then set up an ABX test, and one of those people got it immediately. I have to see if I can find the original text of that article.

I have done tons of tests myself and my accuracy goes to hell as soon as switchover time gets long (I am talking seconds here). Long-term testing would remove much of my ability to find any small impairments. So beyond the above, I can attest to how much better short-term testing is. This is backed by the very short capacity of our short-term memory for such recall. Long-term memory is very sparse and has little ability to help us here.
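For anyone wondering how an ABX score is judged against guessing, here is a minimal sketch of the usual one-sided binomial calculation. The function name `abx_p_value` is made up for this post, and the 16-trial example is illustrative only, not data from any of the tests mentioned above.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` by pure guessing (each trial is a 50/50 coin flip)."""
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    print(f"{abx_p_value(12, 16):.4f}")  # 0.0384: unlikely to be luck
    print(f"{abx_p_value(8, 16):.4f}")   # 0.5982: consistent with guessing
```

A small value means the listener's score would be hard to reach by coin-flipping; it says nothing about how large or important the audible difference is.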
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
Yes, you can trust your ears. When I went to the most recent audio show, in Sydney, nearly every system was riddled with obvious distortion artifacts - there is no point in comparing two badly prepared and cooked meals, inferior A vs. inferior B is no contest. There were only a handful of systems that got the basic reproduction done well enough to take further notice, and the best of the bunch was indeed of a top order.

Trying to assess the intrinsic quality of a component in this situation is something like deciding which of two cars covered in mud has the better build quality and finish. First, "clean up" the contestants, and then take a closer look - doing this intelligently will yield the right information.
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,198
Likes
16,981
Location
Riverview FL
Trying to assess the intrinsic quality of a component in this situation is something like deciding which of two cars...

...is better. Say, at an Auto Show.
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
...is better. Say, at an Auto Show.
Which is when they're covered in mud? The point I was making is that if the object on show is not presented in a reasonable light, whether a car or an audio component, then someone has not paid enough attention to organising the exhibit. A car will be polished to the nines so people can assess how it would present in the driveway. No more than that. In the audio game, last time I checked, the concept was that one puts all the bits in position and switches it on. If the sound is a bit dodgy in a show situation, then I say something is 'wrong' with the gear. Have we not progressed far enough, in however many decades since hifi started, to at least be able to do that simple thing?
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
I agree with every word he wrote, but at the same time the man-made world is created 99% 'sighted'. This tells me that
(a) neutral science and engineering is responsible for a lot of the stuff that works regardless of the level of competence of the people who create it and what is going on in their heads
(b) in the 'creative' world, the emperor's new clothes is just another form of art
(c) whether or not 'differences' are real or all in the mind doesn't really matter - except to the customers' impoverished families
(d) you can imagine differences that don't exist, but you can also educate yourself to get things in proportion.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
For an article from the 1970s, when controlled testing in audio was a new idea (as it was, for example, in medicine at that time as well), it would have been very important, but in 1997 it relies, imo, a bit too much on a certain belief.
Many of the errors he described cannot be attributed to "sighted listening" but to dishonesty on the experimenter's part; in other words, they could occur equally in "blind tests".

So it concentrated merely on the various bias mechanisms at work in humans (during sighted listening) while mostly omitting to mention that most of these bias mechanisms are still at work in controlled listening tests (controlled means including the "blind" property).
Not to mention that the bias mechanisms at work in the experimenters might still have a strong impact on the results of any controlled test. As stated before, looking at the results of well-documented "blind" listening tests, it is obvious that correct results aren't assured per se. Or in other words, it is as easy to get incorrect results with controlled listening tests as it is with "sighted tests".

Anything else would be, imo, quite surprising, as so many bias mechanisms are still at work.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,522
Likes
37,050
<snip> Or in other words, it is as easy to get incorrect results with controlled listening tests as it is with "sighted tests". <snip>

Maybe the important thing to remember is that it is possible to get accurate results in controlled blind tests that are impossible to get in sighted listening tests.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
Maybe the important thing to remember is that it is possible to get accurate results in controlled blind tests that are impossible to get in sighted listening tests.

Which is quite difficult to answer; the usual wisdom in sensory evaluation says to ask an expert, and if he thinks the differences are really, really small, then "blind" tests are required.
Which leads to the next question: how do you know who is an expert? :)

Getting correct results isn't an easy task; that's why I am always surprised that so many people still believe in the "magic" of "blind", although a plethora of literature provides evidence to the contrary.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
<snip>

I have done tons of tests myself and my accuracy goes to hell as soon as switchover time gets long (I am talking seconds here). Long-term testing would remove much of my ability to find any small impairments. So beyond the above, I can attest to how much better short-term testing is. This is backed by the very short capacity of our short-term memory for such recall. Long-term memory is very sparse and has little ability to help us here.

The important part of the message was imo:
"The A/B/X was proven to be more sensitive than long-term listening for this task."

Although the "proven" part is debatable, it is well known that different test protocols produce different results and that it is important to find the right one for the task.

Krabapple is technically right if he thinks any ABX with raised levels and using small excerpts to find a detectable difference _under_ _these_ _conditions_ does not tell so much about the difference under "normal" conditions.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,522
Likes
37,050
The important part of the message was imo:
"The A/B/X was proven to be more sensitive than long-term listening for this task."

Although the "proven" part is debatable, it is well known that different test protocols produce different results and that it is important to find the right one for the task.

Krabapple is technically right if he thinks any ABX with raised levels and using small excerpts to find a detectable difference _under_ _these_ _conditions_ does not tell so much about the difference under "normal" conditions.

If you get to a point that a difference is not detectable at elevated levels and small excerpts, it tells you the difference will be undetectable under sighted conditions and at equal or lower levels or with longer auditioning.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
If you get to a point that a difference is not detectable at elevated levels and small excerpts, it tells you the difference will be undetectable under sighted conditions and at equal or lower levels or with longer auditioning.
Either that or it means that you maybe didn't see the wood for the trees. :)

This sort of quantitative test is useful for certain differences; for the detection of multidimensional differences, a qualitative test might be more suitable.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
Getting correct results isn't an easy task

Surely, results are just results. If you know in advance or afterwards what the "correct" results are, you don't/didn't need to do the experiment. Even worse, if you think you know what "correct" is, you are doing pseudoscience in order to confirm your beliefs.
 

SoundAndMotion

Active Member
Joined
Mar 23, 2016
Messages
144
Likes
111
Location
Germany
I have done tons of tests myself and my accuracy goes to hell as soon as switchover time gets long (I am talking seconds here). Long-term testing would remove much of my ability to find any small impairments. So beyond the above, I can attest to how much better short-term testing is. This is backed by the very short capacity of our short-term memory for such recall. Long-term memory is very sparse and has little ability to help us here.
Amir, you're into photography, right?
If I asked you: "what is the correct shutter speed?" Wouldn't you answer: "it depends." (...on lighting conditions, subject motion, aperture, film/sensor speed/sensitivity...)
Switching time and sample time in a listening test depend on what you are testing and how you are testing. Your blanket statement doesn't work, just as "I've taken tons of photos and 1/500 sec or shorter works best. Anything longer and all my daylight race photos would be all blurry" doesn't work for all photos.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
Surely, results are just results. If you know in advance or afterwards what the "correct results" are, you don't/didn't need to do the experiment. Even worse, if you think you know what "correct" is, you are doing pseudoscience in order to confirm your beliefs.
Of course, a test is more likely to deliver correct results if it is valid, reliable, and objective (the main quality criteria, leaving others aside); the more biased an experimenter, the higher the risk of doing something wrong.

Nevertheless in evaluating the quality of tests it is quite often possible to know what the "correct result" is (see for example usage of negative controls and positive controls). Omitting these means literally "flying blind", but is surprisingly common in the audio field.
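To make the idea of controls concrete, here is a minimal sketch, assuming an ABX-style session scored against guessing. Everything in it is illustrative: the function names, the 0.05 threshold, and the example scores are assumptions for this post, not values from any published test.

```python
from math import comb


def guessing_tail(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` of `trials` by coin-flipping."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def check_controls(pos_correct: int, pos_trials: int,
                   neg_correct: int, neg_trials: int,
                   alpha: float = 0.05) -> dict:
    """Sanity-check a listening test setup with two control conditions.

    Positive control: a difference known to be audible. The listener
    should score clearly above chance, otherwise the setup is too
    insensitive to say anything about smaller differences.

    Negative control: two identical stimuli. The listener should NOT
    score above chance, otherwise something other than the sound
    (switch noise, level offset, cues from the operator) is leaking.
    """
    return {
        "positive_control_ok": guessing_tail(pos_correct, pos_trials) < alpha,
        "negative_control_ok": guessing_tail(neg_correct, neg_trials) >= alpha,
    }


if __name__ == "__main__":
    # Hypothetical session: 15/16 on the known-audible pair, 9/16 on the identical pair.
    print(check_controls(15, 16, 9, 16))
```

If either control fails, the results for the device actually under test are hard to interpret in either direction.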
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
Of course, a test delivers correct results if it is valid, reliable and objective (main quality criteria, leaving others aside) the more biased an experimenter the higher the risk of doing something wrong.
I don't think science is in the business of delivering "correct" results. It can deliver scientific results, but defining that can be tricky.

Can a test that involves asking people what they perceive while listening to music, ever be declared to be valid, reliable and objective? Only in the fevered imaginations of audiophiles.

It is no different from using science to determine the best shop lighting for viewing clothes in - for example. The full scientific method can be adopted, with randomised trials. A/B/X testing can be used to determine whether cheaper LEDs are indistinguishable from halogen. People can be asked for their preferences between 4000K and 6000K etc. etc. But if you'd run the trial in 1973, you'd have got different results compared to 2013. Preferences would change from month to month. People in 1973 would probably never have even seen a 6000K lamp so probably couldn't even 'comprehend' what they were seeing; dyes were made differently those days; and earth tones were 'in' that season.

In listening tests, are people responding to novelty, fashion, or a fundamental absolute biological truth? You may get perfectly repeatable, "reliable" results in 2004 (and 2005 and 2006) but because you are not making actual objective measurements and you are relying on asking people their impressions of the aesthetics of art, you cannot guess whether the results will be repeatable in 2007.
 