
Blind test - objectivists with tin hearing?

garbulky

Major Contributor
Joined
Feb 14, 2018
Messages
1,510
Likes
827
Me too, and I agree. Sadly, these kinds of recordings are the exception. There is something special about the 3-D image with pinpoint spatial positioning, and sense of the room space they are playing in, that a simple 2-mic recording captures. Most modern recordings are individually close-miced and mixed. If they do this well it makes a higher resolution recording, capturing finer more subtle details. But that extra detail can be artificial: it's the kind of thing I hear in the group during rehearsal right next to the other musicians, but you can't hear even from the front row of the audience. Also these close-miced recordings don't capture the spatial aspects of the experience.
I actually prefer a dead listening room for audio. At first that sounds ironic, but the reason is I want my room to be acoustically invisible so I hear the space captured in the recording, not the false sense of ambience and reflection created by my room superimposed on the recording.
For me, listening to a stereo-miked recording (or similar) is more relaxing and natural. I actually record this way, and I'm surprised at how easy it is to get an impressive soundstage and full-bodied sound using stereo microphones.
 

garbulky

Major Contributor
Joined
Feb 14, 2018
Messages
1,510
Likes
827
if someone hears a difference "sighted" and that difference disappears under blind conditions, it's very likely not to be a false negative
Sighted and blind listening tend to be done in different ways. Sighted listening can be done over a long period of time in the same circumstances in which one uses the gear. Blind listening tends not to lend itself to those circumstances.

I think one thing people gloss over is that when you level match and do a test blind without even knowing what component one is listening to, it makes it harder. Of course it's harder, it's blind. That way you don't get swayed by "biases." But it's also harder. Ever listened to level matched anything? It's not easy to hear differences even in ones where there are differences. I wonder what studies have been done to see if and what the masking effects are simply due to level matched or blind circumstances. AFAIK it's just assumed to not exist or be a problem.
But the fact is that when people listen to gear in real life, they don't listen level matched. They don't listen to the same sample for five seconds at a time and repeat it. They also know what gear is being played and aren't being tested. Those are all significant differences in the context in which we perceive audio that are being ignored in level-matched DBTs.

Sight also helps place audio in context. Listen to a recording. Then listen to it with the video of the music performance. You'll notice your brain will perceive it differently. With the video context, the sound feels less flat and the room ambience is perceived better. Sight matters in audio listening.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,422
Likes
4,029
Location
Pacific Northwest
That's not "authority". It's an assertion.
My assertion is the simple definition of the word. Test sensitivity is defined as "recall": what % of true positives does the test detect? In a blind audio test this is impossible to measure: recall is unknown because we can't identify false negatives. It would be a leap of faith to assume any test is perfectly sensitive: that is, it has 100% recall. There is no evidence to support this belief. And there is some evidence (which I've explained in prior posts) to suggest that audio tests have less than perfect sensitivity - that is, recall is less than 100%.
Thus it's reasonable to assume that the thresholds of perception are lower than the thresholds measured in tests. How much lower, nobody knows because we can't detect false negatives.
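To make the recall point concrete, here is a minimal Python sketch. The counts are hypothetical, not data from any test, and the function name `recall` is only for illustration. It shows that the same number of detected differences is consistent with very different sensitivities, depending on a false-negative count that a blind test cannot observe.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Sensitivity (recall) = TP / (TP + FN)."""
    real_differences = true_positives + false_negatives
    if real_differences == 0:
        raise ValueError("recall is undefined when no real differences exist")
    return true_positives / real_differences


# Hypothetical: a test detected 30 real differences. How many did it miss?
# In a blind audio test that number is unknown, so recall is unknown too.
detected = 30
for missed in (0, 10, 30):
    print(f"missed={missed:2d}  recall={recall(detected, missed):.2f}")
```

Only the `missed=0` row corresponds to a perfectly sensitive test, and nothing in the test results themselves tells us which row we are in.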
 

rwortman

Addicted to Fun and Learning
Forum Donor
Joined
Jan 29, 2019
Messages
740
Likes
683
I don't regard failing to hear a difference in one test and then hearing it after training as a false negative. It is a valid result that shows the value of training. The untrained person couldn't tell the difference; the same person with training could. A false negative would be if someone heard a difference and lied about it. We have no way of independently testing an ear without testing the brain attached to it and all the programming therein. So the first test was valid. Then you updated the software and got a different result. In my opinion there is nothing false about either result.
 

rwortman

Addicted to Fun and Learning
Forum Donor
Joined
Jan 29, 2019
Messages
740
Likes
683
[QUOTE="garbulky, post: 149115, member: 986"]
Sight also helps place audio in context. Listen to a recording. Then listen to it with the video of the music performance. You'll notice your brain will perceive it differently. With the video context, the sound feels less flat and the room ambience is perceived better. Sight matters in audio listening.[/QUOTE]

For this to be valid you have to listen to the video disk or file with the screen turned off. I have sometimes found that the sound guy mixing the video just did a better job.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,521
Likes
37,050
My assertion is the simple definition of the word. Test sensitivity is defined as "recall": what % of true positives does the test detect? In a blind audio test this is impossible to measure: recall is unknown because we can't identify false negatives. It would be a leap of faith to assume any test is perfectly sensitive: that is, it has 100% recall. There is no evidence to support this belief. And there is some evidence (which I've explained in prior posts) to suggest that audio tests have less than perfect sensitivity - that is, recall is less than 100%.
Thus it's reasonable to assume that the thresholds of perception are lower than the thresholds measured in tests. How much lower, nobody knows because we can't detect false negatives.
Under what natural or other conditions would this lower threshold be perceived?

It is quite possible the perception in echoic memory is as good as it gets. The effect with longer switching times might still apply.

Let us assume your echoic memory in total is 10 seconds, and that it is as good as it can be, operating at the very best threshold possible.

With instant switching you get 5 seconds and another 5 seconds of the exact thing all level matched to perceive a difference.
With one second switching you get 4.5 seconds, a slightly disruptive silent glitch, and 4.5 more seconds to perceive a difference.
With two second switching you get 4 seconds, 2 slightly disruptive silent seconds, and 4 more seconds to perceive a comparative difference.

It wouldn't be surprising if instant switching gives a slightly lower threshold for perceiving a difference, even though our best threshold of hearing is in use in every case.
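The arithmetic in the three cases above can be sketched in a few lines of Python. This is only a toy model of the assumption stated in the post (a fixed 10-second memory window split evenly around the switching gap); the function name `usable_window` is made up for the example.

```python
def usable_window(memory_s: float, gap_s: float) -> float:
    """Seconds of each sample that fit in memory on either side of the gap."""
    if gap_s >= memory_s:
        return 0.0  # the gap alone uses up the whole memory window
    return (memory_s - gap_s) / 2


# Assumed 10 s echoic memory, as in the post above.
for gap in (0.0, 1.0, 2.0):
    print(f"gap={gap:.0f} s  ->  {usable_window(10.0, gap):.1f} s per sample")
```

On this model, every second of switching delay costs half a second of comparison material on each side, before counting any disruption from the silence itself.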
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,422
Likes
4,029
Location
Pacific Northwest
... A false negative would be if someone heard a difference and lied about it. We have no way of independently testing an ear without testing the brain attached to it and all the programming therein. So the first test was valid. Then you updated the software and got a different result. In my opinion there is nothing false about either result.
I agree. To highlight the distinction I'll give 2 examples. They both start with a person failing to pass the test the first time, then passing it the second time after training.

Case 1: he perceived the difference but failed to articulate it in the test because he lacked sufficient concentration or memory skills to articulate the difference he perceived. Training improved his concentration and memory skills, so he passed.
Case 2: he did not perceive the difference because he lacked sufficient perception skills to recognize what his ears were telling him. Training improved his perception, so he passed.

Case 1 is a false negative: he perceived the difference but the test did not reflect this.
Case 2 is a true negative: he did not perceive the difference and the test correctly reflected this.
Problem is, training improves all these areas together and they are virtually impossible to tease apart because it's all happening inside our minds.

It is absolutely true that instant switching reduces the threshold and improves test sensitivity. This is well known and measured. But even with instant switching, the test relies not only on perception, but also on memory and articulation of perception. This suggests that the test has less than perfect sensitivity, or (put differently) less than 100% recall.

PS: another simpler way to get a false negative: you have a set of trials that are better than random guessing, but short of the test's target precision. This test counts as negative. You can't prove it's a false negative, but the odds are that it was, since the results were better than random guessing. Over time you encounter many of these which gives statistical confidence that at least some false negatives are happening.
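As a rough illustration of the PS above, the sketch below computes the one-sided binomial probability of scoring at least a given number of correct answers by pure guessing. The 16-trial length and the 5% criterion are assumptions chosen for the example, not figures from this thread.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """Probability of getting `correct` or more right out of `trials` by guessing."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


TRIALS = 16   # assumed session length
ALPHA = 0.05  # assumed pass criterion

for correct in (10, 11, 12):
    p = p_value_at_least(correct, TRIALS)
    verdict = "pass" if p < ALPHA else "counts as negative"
    print(f"{correct}/{TRIALS}: p = {p:.3f} -> {verdict}")
```

Under these assumptions 12 of 16 is the lowest score that passes. A score of 11 of 16 is above the chance expectation of 8, yet it is still scored as a negative.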
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,422
Likes
4,029
Location
Pacific Northwest
...
Let us assume your echoic memory in total is 10 seconds, and that it is as good as it can be, operating at the very best threshold possible.
...
By "echoic memory" do you mean perfect memory, like photographic memory applied to audio? I get your point in principle, but the concept of echoic memory seems implausible given the evidence that switching delays less than 1 second reduce test sensitivity. So if echoic memory exists at all (maybe it does, who knows?), it must be shorter than this.

Of course, how close anyone's memory comes to the echoic ideal varies from person to person. Training definitely improves it. But even the most sensitive, well-trained listeners with the lowest thresholds still show reduced sensitivity in tests with even brief switching delays. So their audio memory falls short of the echoic ideal.
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,399
People who express a preference between A and B, but can't differentiate them in a DBT might be victims of expectation bias or other psychological factors, or they might also be hearing a real difference that is lower than the test threshold but higher than the acuity threshold.

I’m almost 100% with you, but this is to me where the logic breaks down.

You’ve already acknowledged that long-term auditory memory is less reliable than short-term auditory memory.

If one’s preference is based on a “real difference”, then this difference must derive from experiences stored in memory: one must be able to compare two things in order to have a preference between them, and one must hold two things in memory in order to compare them.

In other words, if one holds a preference based on long-term memory, but the perceived difference on which that preference is based is not discernible even using short-term memory, we can rule out that the difference recalled in long-term memory is a real difference.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,422
Likes
4,029
Location
Pacific Northwest
...
If one’s preference is based on a “real difference”, then this difference must derive from experiences stored in memory: one must be able to compare two things in order to have a preference between them, and one must hold two things in memory in order to compare them.
...
Yeah, this is where things get a bit slippery or grey. Perception and awareness are multi-layered. We perceive things that affect us at different levels of awareness; we cannot always articulate them. Part of training improves perception (the mental part, not the physical part), and part of it improves the ability to articulate perception. Example: you could briefly show somebody a photo of a bunch of kids playing in a field, then take it away so he must rely on memory to describe it. He feels there was something ominous about that photo, but he can't recall what it was. As far as he remembers, it was just a bunch of kids playing. What's ominous about that? It turns out the photo had a lion hiding in the grass watching the children, but he saw it so briefly that he registered it at a lower level of awareness that he can't articulate. If you showed him the same photo with the lion photoshopped out, that ominous feeling would be gone but the photo would look the same; he couldn't identify anything different. He perceived something real that he could not articulate, so the test returned a false negative.

The above is a bad analogy and I'll probably regret posting it (flame suit on!), but it's not as bad as it sounds. Years ago I participated in ABX tests of the threshold of audibility of various filters related to MP3 encoding. The source "A" was a high quality close-miced recording of castanets. Very clean and snappy with lots of HF energy. "B" had various parametric filters applied at different magnitudes, frequencies and widths: -3 dB @ 12 kHz Q=2, -6 dB @ 18 kHz Q=1, etc. We started with easy ones so they were obvious. As we incrementally made it harder (higher frequencies, lower magnitudes, bigger Q), the differences got progressively harder to discern. I reached a point where I wasn't sure exactly what the differences were, or even if they were actually there. Near these thresholds, I could not explain why I picked "X is A" or "X is B" in any given trial, but went on gut instinct. Almost but not quite like guessing. As the test results dropped toward random, there was a range where the results were below target, but better than random guessing.

At the threshold of audibility we lose the ability to articulate the differences we're hearing before we lose the ability to perceive them; that is, articulation fails at a higher level than perception does. That's because articulating them requires a step beyond perception alone. And blind tests require both perception and articulation. Training improves both perception and articulation, and shrinks this difference. But it is a big leap of faith to assume training eliminates it entirely. Indeed, that would be equivalent to claiming that blind tests have perfect sensitivity, or 100% recall.

BTW, after all this discussion it's worth repeating: I am a proponent of blind testing. It's a great tool. Audio engineering is better for it. I've even written ABX software. I'm only pointing out that it doesn't have perfect sensitivity. It's a "controlled precision, unknown recall" test.
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,399
And blind tests require both perception and articulation.

I don’t see where the articulation part comes in necessarily. Take your own example from personal experience: you couldn’t articulate what the difference was, yet you were still able to guess correctly at a rate higher than random chance would suggest.


BTW, after all this discussion it's worth repeating: I am a proponent of blind testing. It's a great tool. Audio engineering is better for it. I've even written ABX software. I'm only pointing out that it doesn't have perfect sensitivity. It's a "controlled precision, unknown recall" test.

Yes and this point I agree with. You’d have to be quite blinkered not to see it. But blind testing (when optimally executed) has better sensitivity than any other test, and better sensitivity than the type of long-term memory required to form an uncontrolled preference.

Example: you could briefly show somebody a photo of a bunch of kids playing in a field, then take it away so he must rely on memory to describe it. He feels there was something ominous about that photo, but he can't recall what it was. As far as he remembers, it was just a bunch of kids playing.

Is this a real test? It contradicts my (admittedly limited) knowledge of the psychology around “subliminal” imagery.

Putting that aside though, we have to keep in mind that there is a distinction between psychoacoustics and psychology. While the latter might be influenced by feelings, the former can’t be. You may have certain feelings about a piece of audio equipment, and this may influence your preferences (psychology). Yet you may not be able to discern a difference between it and another piece of equipment you don’t prefer under controlled conditions (psychoacoustics). The point of blind testing is to eliminate or at least neutralise the psychology aspect and isolate the psychoacoustic aspect. Of course I know you know this :) But I think it’s relevant to your attempt to make an analogy with a hypothetical (or real?) psychological test.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,383
Likes
24,749
Location
Alfred, NY
And the question is still studiously avoided.:D Procrustes nods.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,384
Location
Seattle Area
Sighted and blind listening tend to be done in different ways. Sighted listening can be done over a long period of time in the same circumstances in which one uses the gear. Blind listening tends not to lend itself to those circumstances.
Of course it does. Just have a loved one switch one for the other, use towels and such to cover what is connected and you can test as long as you want. Science says you are doing yourself a disservice doing that but if you want to do it, the option is all yours.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,422
Likes
4,029
Location
Pacific Northwest
I don’t see where the articulation part comes in necessarily. Take your own example from personal experience: you couldn’t articulate what the difference was, yet you were still able to guess correctly at a rate higher than random chance would suggest. ...
I added to the confusion by using the word "articulation" improperly. By "articulation" I mean the steps required to perform a blind test that go beyond direct perception: primarily the mental analysis/comparison of a perception to the recent memory of a perception. Of course, in a blind test one need only discriminate, he need not describe or articulate the differences he perceives. But that discrimination takes a step beyond direct perception because it relies on memory and comparison.

Keep in mind: those trials near the limits of test sensitivity whose results were less than target, count as negative tests, even though they were better than random guessing. That is one of the doors through which false negatives pass. Of course, we can only call them false negatives probabilistically. Yet evidence suggests they are real... but only suggests -- we can't be sure. If we move the target door to count them as "real", it lowers the test precision/confidence. There is a tradeoff between precision & recall, neither is ever 100%.
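The tradeoff described above can be sketched numerically. This is an illustrative model only: the 16-trial session and the listener who genuinely gets 70% of trials right are assumptions for the example, and `tradeoff` is a made-up helper name. Moving the pass mark changes how often pure guessing passes (which governs the test's confidence) and how often a listener with a real but imperfect ability fails (a false negative).

```python
from math import comb


def tail(correct: int, trials: int, p: float) -> float:
    """Probability of at least `correct` hits in `trials` with hit rate `p`."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def tradeoff(pass_mark: int, trials: int = 16, p_listener: float = 0.7):
    """Return (chance that a guesser passes, chance that a real listener fails)."""
    false_positive_rate = tail(pass_mark, trials, 0.5)
    miss_rate = 1.0 - tail(pass_mark, trials, p_listener)
    return false_positive_rate, miss_rate


for mark in (10, 12, 14):
    fp, miss = tradeoff(mark)
    print(f"pass mark {mark}/16: guesser passes {fp:.3f}, real listener fails {miss:.3f}")
```

Raising the pass mark makes lucky guessing rarer but makes it more likely that a listener who really hears something is scored as a negative; lowering it does the reverse.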

... blind testing (when optimally executed) has better sensitivity than any other test, and better sensitivity than the type of long-term memory required to form an uncontrolled preference. ...
Yeah, I don't know of any better tools. Like Churchill said of democracy, blind testing is the worst, except for all the others. This doesn't disparage it; it's just a pithy way to point out that it's imperfect. It's a useful tool, but to wield it effectively we must understand its limitations as well as its strengths.

...
Is this a real test? It contradicts my (admittedly limited) knowledge of the psychology around “subliminal” imagery. ...
No, just my attempt to give an analogy to the distinction between perception and the reliable comparison of perceptions through memory, outside the context of audio in case that different context helps see it in a fresh light. The downside of all such analogies is they risk confusing the point.
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,317
Location
Albany Western Australia
Sighted and blind listening tend to be done in different ways. Sighted listening can be done over a long period of time in the same circumstances in which one uses the gear. Blind listening tends not to lend itself to those circumstances.

I think one thing people gloss over is that when you level match and do a test blind without even knowing what component one is listening to, it makes it harder. Of course it's harder, it's blind. That way you don't get swayed by "biases." But it's also harder. Ever listened to level matched anything? It's not easy to hear differences even in ones where there are differences. I wonder what studies have been done to see if and what the masking effects are simply due to level matched or blind circumstances. AFAIK it's just assumed to not exist or be a problem.
But the fact is that when people listen to gear in real life, they don't listen level matched. They don't listen to the same sample for five seconds at a time and repeat it. They also know what gear is being played and aren't being tested. Those are all significant differences in the context in which we perceive audio that are being ignored in level-matched DBTs.

Sight also helps place audio in context. Listen to a recording. Then listen to it with the video of the music performance. You'll notice your brain will perceive it differently. With the video context, the sound feels less flat and the room ambience is perceived better. Sight matters in audio listening.

The Fletcher-Munson curves demonstrate that your hearing changes with volume. By listening at different volumes you immediately invalidate any comparison. It is literally the oldest trick in the book for hifi sales. Play one louder. It will win.

As Amir pointed out you can listen as long as you like blind.

The only thing that gets masked by blind listening is your biases.

Video just distracts attention from the audio. It also steers where you perceive sounds to be coming from.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,422
Likes
4,029
Location
Pacific Northwest
... The Fletcher-Munson curves demonstrate that your hearing changes with volume. By listening at different volumes you immediately invalidate any comparison. It is literally the oldest trick in the book for hifi sales. Play one louder. It will win.
...
On a related note, these tonal differences can also be used to detect dynamic compression. When you dynamically compress a recording, due to Fletcher-Munson it's like turning up the bass and treble tone controls on the quiet parts. Live acoustic music at pp sounds more tonally muted than at ff. When that doesn't happen in a recording, it is unnatural. At first it can sound more "detailed", but the detail is artificial and it impairs the emotional impact of the music.
 

garbulky

Major Contributor
Joined
Feb 14, 2018
Messages
1,510
Likes
827
Of course it does. Just have a loved one switch one for the other, use towels and such to cover what is connected and you can test as long as you want. Science says you are doing yourself a disservice doing that but if you want to do it, the option is all yours.
Except we know what we are listening to and it's not level matched and we normally don't do back to back comparisons in the way we normally use our gear. So it's not the same thing. Also, good luck convincing my wife to put towels or whatever! :D
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,384
Location
Seattle Area
Except we know what we are listening to and it's not level matched and we normally don't do back to back comparisons in the way we normally use our gear. So it's not the same thing.
You don't have to level match. Or follow any protocol other than doing the testing blind. Have someone switch out one product for the other a dozen times once a day and you both keep score of which unit is what. Then on the 12th day compare notes and see how it matches your sighted evaluation.
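For anyone who wants to score such a log, here is a minimal Python sketch. The two 12-entry logs are invented for illustration, and the scoring assumes the helper picks the unit at random each day so that a pure guesser would be right half the time; neither detail comes from the post above.

```python
from math import comb


def score_blind_log(actual, guessed):
    """Count matching days and the chance of doing at least that well by guessing."""
    if len(actual) != len(guessed):
        raise ValueError("both logs must cover the same number of days")
    days = len(actual)
    hits = sum(a == g for a, g in zip(actual, guessed))
    p_by_chance = sum(comb(days, k) for k in range(hits, days + 1)) / 2**days
    return hits, p_by_chance


# Hypothetical logs: which unit was really connected vs. what the listener wrote down.
helper_key = list("ABBABAABABBA")
listener_guesses = list("ABBABABBABAA")

hits, p = score_blind_log(helper_key, listener_guesses)
print(f"{hits}/{len(helper_key)} days correct, p = {p:.4f} if purely guessing")
```

With these made-up logs the listener matches on 10 of 12 days, which guessing alone would achieve about 2% of the time.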
 