
Blind test - objectivists with tin hearing?

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
And how would a listener have access to this additional acuity except in such testing? ... You posit since it all relies on some memory someone could be hearing something that is below the testing to detect which leads to a false negative. But under what circumstances could this be accessible to the perception of a listener?
...
Direct perception is immediate; the listener accesses his acuity instantaneously as he listens to music! How exactly he can remember this and compare that memory with another perception is a different skill. Testing relies on this skill. If this weren't true, people would not improve their listening acuity with training. But they do! Training can't make your listening acuity any better from a biological perspective; your ears don't change. It improves your brain's ability to interpret, organize, understand and remember what you're hearing. And we know that hearing differences means comparing what we directly perceive with the memory of a prior perception. We also know that sub-second delays measurably impair that memory, even with the most skilled, sensitive trained listeners.

From this it follows that inherent acuity thresholds must be lower than thresholds measured in testing. I don't know why this seems so controversial. Indeed, the purpose of training is to shrink the difference between these thresholds, so researchers implicitly acknowledge this difference. All it says is that blind audio tests are imperfect; like all tests, they have sensitivity limitations, which means actual acuity exceeds tested acuity. We don't know by how much, because we can't detect false negatives.
 

sergeauckland

Major Contributor
Forum Donor
Joined
Mar 16, 2016
Messages
3,440
Likes
9,100
Location
Suffolk UK
Ah, but even instantaneous switching doesn't enable you to listen to A and B simultaneously. You're always listening to one and comparing what you hear to your recent memory of the other.

Put differently: you can hear A, and you can hear B, but you can't hear the difference between A and B. The difference is not something you directly perceive, but is created in your mind by comparing what you perceive to a memory of another perception.

One could say that switching time is so short as to be insignificant. But we know that you must listen to each for at least a second or two (probably longer) just to hear it properly, so in the comparison you're relying on audio memory of something several seconds old. Yet we know even a fraction of a second impairs audio memory to a measurable amount (as observed in the correlation of switch delay to test sensitivity).

Thus, we can plausibly assume actual hearing acuity thresholds are lower than test sensitivity thresholds. How much lower, we can't measure.
That's not how I do it. I listen for a change at the switching point. If I can't hear any change, then the two are identical. If I hear a change, then they're not. That's the beauty of AA AB BA BB testing: you're only listening for a change. It does require instantaneous switching and very close level matching, but it's exquisitely sensitive to tiny differences.

S
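A minimal sketch of how an AA/AB/BA/BB session like the one described above could be randomized and scored. The function names, the trial count, and the hit/false-alarm scoring are assumptions for illustration; the post does not specify how the trials were ordered or tallied.

```python
import random

# The four possible presentations: the first letter plays before the switch,
# the second letter plays after it.
PAIRS = ("AA", "AB", "BA", "BB")


def make_trials(n_trials, seed=None):
    """Return a random sequence of pair presentations (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.choice(PAIRS) for _ in range(n_trials)]


def has_change(pair):
    """True when the source actually changes at the switching point."""
    return pair[0] != pair[1]


def score(trials, responses):
    """Tally the listener's answers.

    responses[i] is True if the listener reported hearing a change on trial i.
    """
    hits = sum(1 for t, r in zip(trials, responses) if has_change(t) and r)
    false_alarms = sum(1 for t, r in zip(trials, responses) if not has_change(t) and r)
    change_trials = sum(1 for t in trials if has_change(t))
    return {
        "hits": hits,
        "change_trials": change_trials,
        "false_alarms": false_alarms,
        "same_trials": len(trials) - change_trials,
    }


if __name__ == "__main__":
    trials = make_trials(16, seed=1)
    # A listener who always answers correctly, for demonstration only.
    perfect = [has_change(t) for t in trials]
    print(score(trials, perfect))
```

Including the AA and BB trials is what lets the tally separate real detections from a habit of always reporting a change.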
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,522
Likes
37,050
You can do tests not involving memory for some things. Especially on basic acuity.

Have someone listening for a tone, and press a button when they hear something. You can see how long it takes them to respond or if they ever respond. You can check frequency response this way (pretty much how people's hearing is tested). You can check how loud something must be before you hear it. You can have noise and ask people to respond when they hear a certain word (I've taken part in that sort of testing), and there are some others.

You could then do the switching against a reference testing to see if the thresholds differ. I don't know of that sort of comparison being done, but maybe it has.
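As a rough illustration of the "how loud must it be before you hear it" test, here is a sketch of a simple ascending procedure with a simulated listener. Everything here is an assumption for illustration: the starting level, the step size, and the deterministic listener model are not taken from any real audiometry protocol.

```python
def make_listener(true_threshold_db):
    """Hypothetical listener who presses the button iff the tone is at or above threshold."""
    return lambda level_db: level_db >= true_threshold_db


def ascending_threshold(hears, start_db=-10.0, step_db=5.0, max_db=100.0):
    """Raise the tone level step by step until the listener responds.

    Returns the first level that got a response, or None if the listener
    never responded up to max_db.
    """
    level = start_db
    while level <= max_db:
        if hears(level):
            return level
        level += step_db
    return None


if __name__ == "__main__":
    listener = make_listener(22.0)
    print(ascending_threshold(listener))               # coarse 5 dB steps
    print(ascending_threshold(listener, step_db=1.0))  # finer 1 dB steps
```

With this idealized listener, the measured value can land at or above the true threshold but never below it, and a finer step size brings the two closer together.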
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,385
Location
Seattle Area
This seems to confuse test sensitivity with listener sensitivity. Imagine 2 people X and Y with equal hearing acuity.
Why? As I said, we deploy trained listeners and in doing so, we are testing with people who have far better acuity. What they can detect takes care of false negatives for general public.

Let's remember that this system in practice works exceptionally well. Audio codecs like AAC, etc. were all developed using this method and results have proven that the levels of distortion are well below that of general public and even audiophiles. Trained listeners can hear such artifacts still so no false negative there.

Similar work by NRC and Harman shows speaker preference is quite durable. Expert listeners can identify much smaller levels of frequency response variation than general public. I think Dr. Olive measured their effectiveness relative to general public at 8:1 (?).

So yes, it is abundantly easy to get false negatives by picking the wrong content, wrong setup, or wrong group of testers. We did that once at Microsoft, showing that 64 kbps WMA sounded as good as the CD! This was a test performed by an independent testing lab, at a cost of $25,000. They followed ITU-R BS.1116, except they didn't know what content to use. So they resorted to classical music, which was easy to compress.

Convert the test to expert listeners with "codec killer" sample music and even the max bitrate of 320 kbps shows problems. For example, I passed a double-blind test of WMA Pro in perceptually lossless (variable bit rate) form at nearly 700 kbps and could tell it apart from the source. None of the audiophiles we tested could, however. And I could only do it in some of the clips.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,385
Location
Seattle Area
Ah, but even instantaneous switching doesn't enable you to listen to A and B simultaneously. You're always listening to one and comparing what you hear to your recent memory of the other.
Again, true in a general sense, but it can be mitigated by making a loop of a note and repeating it. I can then synchronize my switching to hear the same note in each case. Hit A/B fast enough and you will almost have both in mind at once.

Again, shades of gray....
 

jsrtheta

Addicted to Fun and Learning
Joined
May 20, 2018
Messages
936
Likes
991
Location
Colorado
Whatever threshold of audibility we measure in blind tests, is not the threshold of inherent hearing acuity; it is the threshold of test sensitivity.

What is your authority for this?
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,383
Likes
24,749
Location
Alfred, NY
Once again, the error being made is lumping together every sort of test for every sort of question and trying to generalize across all of them. My pet term is the Procrustean Fallacy.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
What is your authority for this?
It's true by definition. Every test has a sensitivity threshold. The threshold of inherent acuity is by definition equal to or lower than this threshold. That is, in a properly conducted test having appropriate confidence levels, you can't measure greater acuity than you have -- but you might measure less.

We know the test requires skills beyond inherent listening acuity, because otherwise training listeners would not be effective. Training doesn't increase inherent acuity (it doesn't give you better ears), it only increases the interpretation and recognition skills measured by the test. This tells us these thresholds can't be equal. Training reduces the difference between these thresholds. Put differently, if the thresholds were the same, we wouldn't train listeners because it would not create any improvement in results. Yet it does.

It is a leap of faith to assume that training, which reduces the difference between these thresholds, reduces it to zero. There is no evidence to support that, and strong evidence suggesting the contrary. That is, we know that small (less than 1 second) time delays measurably reduce test sensitivity, even in highly trained and sensitive listeners.

I agree with Amir that these are shades of grey. But the grey does exist -- how wide or narrow a range of grey is anyone's guess.
 

rwortman

Addicted to Fun and Learning
Forum Donor
Joined
Jan 29, 2019
Messages
740
Likes
683
I don't understand what people are saying when they talk of "false negatives" in this discussion. Either the people under test could reliably hear a difference or they couldn't. The test is for whether the people being tested can detect the difference, not whether some difference exists. There is almost no such thing as identical objects, let alone audio devices. Two devices of the same make and model, made on the same day, are likely to be subtly different. What the audio engineer or tech-leaning audio enthusiast wants to know is: are these differences audible, under what conditions, and do these differences make the music less enjoyable to listen to? After that, is this difference worth the price that is being charged for it, to me? If twenty people produce random results on an audibility test, that doesn't mean no one in the world can hear a difference, and if someone in the world can, it's not a false negative. Those people, that day, using that test method, couldn't reliably hear a difference. Would a different test method produce different results? Maybe, but that doesn't invalidate the test. There is still information contained in the results of that test.

I dabble a bit in live sound. I can hear gross frequency response anomalies and tell about where on the spectrum the problem lies. I have a book on ear training that came with a disk and a bunch of training exercises, one of which is how to tell an MP3 from the 16-bit PCM source. I have had it for two years and still haven't done it. I really don't care enough, I guess. If I have to be trained how to hear the difference, then how much can it matter? It's not like the singer on stage is suddenly going to switch her voice to an MP3-encoded version, so that skill just isn't a big deal to me. (I can hear low bit rate easily but am not that good at 320 kbps. In close proximity to my monitors, I think I got 7 out of ten with a test I took. On some recordings the stereo image just falls flat; on other recordings I don't think I could reliably tell. I have seen a demo where if you high pass filter the signal above a certain threshold, the masking drops away and you can easily hear it.) I rip all my digital stuff to FLAC because sometimes I can hear a difference and because, as has been said, disk space is cheap, so why not.

As far as the original anecdote, sure, bias works both ways. That's why the test needs to be as blind as possible. Undisclosed component A vs B in that group would probably produce a different result than power cord A versus power cord B. That said, in my system, sitting in the sweet spot, a test tone with one speaker out of phase sounds like it is coming from behind my left ear instead of center between the speakers. Not sure I have enough bias in me to miss that.
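For a sense of what "7 out of ten" means statistically, here is a small sketch that computes how often pure guessing would score at least that well. It assumes each trial was a two-way forced choice with a 50% chance of guessing right, which the post does not actually state.

```python
from math import comb


def p_at_least(k, n, p=0.5):
    """Probability of k or more correct answers in n independent trials,
    when each trial is answered correctly with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Chance of scoring 7 or better out of 10 by guessing alone.
    print(p_at_least(7, 10))  # 176/1024, about 0.17
    # Chance of scoring 9 or better out of 10 by guessing alone.
    print(p_at_least(9, 10))  # 11/1024, about 0.011
```

Under that assumption, a guesser reaches 7 of 10 roughly one time in six, so that score alone falls short of the usual 5% criterion, while 9 of 10 would meet it.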
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,385
Location
Seattle Area
Training doesn't increase inherent acuity (it doesn't give you better ears), it only increases the interpretation and recognition skills measured by the test.
It does actually. For example, masking thresholds can become lower for trained listeners. Likewise, musicians are able to hear room reflections substantially better than general public (from Dr. Toole's book):

[Attached image: 1549772921042.png]


I know as I went through training, my detection threshold kept getting better and better.
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,317
Location
Albany Western Australia
And how would a listener have access to this additional acuity except in such testing? There are things detectable without a reference, but the threshold is higher. Smaller differences detectable with a reference to switch to. The faster the switching the better the results, and the lower the threshold. You posit since it all relies on some memory someone could be hearing something that is below the testing to detect which leads to a false negative. But under what circumstances could this be accessible to the perception of a listener?

This is one of the benefits of rapid switching in blind testing. When we've gotten this close to thresholds they are already well past what someone could pick up on in casual listening without a reference. Or polluted sighted long term listening. That is already a margin of safety vs the normal use of the audio equipment for listening to music.
And this is the point: if you can't detect a difference with rapid switching, how will you possibly detect any difference under normal listening, if audio memory is the big problem? You would never be able to tell the difference between two amps in normal listening if you were unable to under rapid switching, so it all becomes moot.
 

garbulky

Major Contributor
Joined
Feb 14, 2018
Messages
1,510
Likes
827
It does actually. For example, masking thresholds can become lower for trained listeners. Likewise, musicians are able to hear room reflections substantially better than general public (from Dr. Toole's book):

View attachment 21644

I know as I went through training, my detection threshold kept getting better and better.
I've played instruments all my life (though not all that well). Interesting you've said this. My favorite type of music is acoustic music captured in stereo live without being close miced. I guess I want that room ambience and reflection. For me that's part of the sound of acoustic instruments.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
It does actually. For example, masking thresholds can become lower for trained listeners. Likewise, musicians are able to hear room reflections substantially better than general public (from Dr. Toole's book):
...
I know as I went through training, my detection threshold kept getting better and better.
Exactly, this is my point!
Training improves your brain, not your ears. It teaches one to be a more critical listener. But it cannot change your ears or the raw data they send to your brain. That would require surgery or biological enhancement. Thus training cannot improve inherent acuity. But it clearly does improve test performance. The fact that people improve their performance with training proves that inherent acuity is more sensitive than test acuity. Training narrows that difference.

Thus we know that difference exists. And it's what makes false negatives possible.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
I've played instruments all my life (though not all that well). Interesting you've said this. My favorite type of music is acoustic music captured in stereo live without being close miced. I guess I want that room ambience and reflection. For me that's part of the sound of acoustic instruments.
Me too, and I agree. Sadly, these kinds of recordings are the exception. There is something special about the 3-D image with pinpoint spatial positioning, and sense of the room space they are playing in, that a simple 2-mic recording captures. Most modern recordings are individually close-miced and mixed. If they do this well it makes a higher resolution recording, capturing finer more subtle details. But that extra detail can be artificial: it's the kind of thing I hear in the group during rehearsal right next to the other musicians, but you can't hear even from the front row of the audience. Also these close-miced recordings don't capture the spatial aspects of the experience.
I actually prefer a dead listening room for audio. At first that sounds ironic, but the reason is I want my room to be acoustically invisible so I hear the space captured in the recording, not the false sense of ambience and reflection created by my room superimposed on the recording.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,522
Likes
37,050
Me too, and I agree. Sadly, these kinds of recordings are the exception. There is something special about the 3-D image with pinpoint spatial positioning, and sense of the room space they are playing in, that a simple 2-mic recording captures. Most modern recordings are individually close-miced and mixed. If they do this well it makes a higher resolution recording, capturing finer more subtle details. But that extra detail can be artificial: it's the kind of thing I hear in the group during rehearsal right next to the other musicians, but you can't hear even from the front row of the audience. Also these close-miced recordings don't capture the spatial aspects of the experience.
I actually prefer a dead listening room for audio. At first that sounds ironic, but the reason is I want my room to be acoustically invisible so I hear the space captured in the recording, not the false sense of ambience and reflection created by my room superimposed on the recording.
https://www.audiosciencereview.com/...2-0-master-file-giveaway-for-asr-members.695/

Try some of these. Mario will let you download some for free. They are pure unmolested two microphone recordings.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,424
Likes
4,030
Location
Pacific Northwest
I don't understand what people are saying when they talk of "false negatives" in this discussion. Either the people under test could reliably hear a difference or they couldn't. The test is for whether the people being tested can detect the difference, not whether some difference exists. ... If twenty people produce random results on an audibility test that doesn't mean no one in the world can hear a difference and if someone in the world can, it's not a false negative. Those people, that day, using that test method, couldn't reliably hear a difference. ...
Here's what I mean by false negative. Test performance relies on 2 things: inherent hearing acuity, and the skills needed to perform the test: critical mental analysis of what the ears are telling us, short-term memory, concentration and focus, among other things.

It often happens that someone takes a test and fails: he cannot detect a difference. Then he gets additional training, takes the same test again and passes: he can now detect a difference. When this happens, I call the first test a false negative. The reason is simple: training doesn't change inherent hearing acuity. That person's ears have not changed. He was hearing the difference all along, but he was not able to listen to it, or to articulate this difference in the test, for whatever reason. Could be lack of concentration or focus, could be not knowing what to listen for. Whatever the reasons, they were purely mental -- by definition, since his ears haven't changed. Because he passed the second time we know in hindsight that the difference was real, and he was capable of hearing it, yet he failed the test.

This next part is where I think the grey area starts, and things get interesting and a bit philosophical. Training has 2 aspects. First, you learn to hone your perception, and become aware of things you were hearing all along yet didn't know it (at least not consciously). Second, you learn other skills needed for test performance: how to improve concentration and focus, short-term memory and comparing what you're hearing with that memory. One might argue that the first is more fundamental than the second; it refers to improving perception, or awareness of what you're hearing, while the latter refers to test-taking skills that may be less directly relevant to musical enjoyment. To the extent that training improves the first, one might be justified in saying that training improved one's perception, even if it didn't improve his hearing. From that perspective, the first test wasn't a false negative; it correctly identified that person's limited perception. But to the extent that training improves the second, this perspective is harder to justify. In other words, it's possible in the first test that the person was actually perceiving the difference but due to limited ability for focus, concentration or memory, could not articulate this in the test. These two aspects of training seem virtually impossible to differentiate, which leads to the grey area.

In short, the notion of false negatives should be nothing objectionable or novel. It's a standard aspect of all kinds of testing.
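The purely statistical side of false negatives (a Type II error) can be sketched with a short calculation. It is related to, but not the same as, the memory and training argument above: it only asks how often a listener who genuinely hears a difference on most trials would still miss the pass mark. The 16-trial length, the 5% criterion, and the 70% true hit rate are assumptions chosen for illustration.

```python
from math import comb


def tail(k, n, p):
    """Probability of k or more correct in n trials at per-trial success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pass_mark(n, alpha=0.05):
    """Smallest score that guessing (p = 0.5) reaches with probability <= alpha.

    Returns None when the test is too short for any score to qualify.
    """
    for k in range(n + 1):
        if tail(k, n, 0.5) <= alpha:
            return k
    return None


def false_negative_rate(n, true_p, alpha=0.05):
    """Chance that a listener with real per-trial success rate true_p fails the test."""
    k = pass_mark(n, alpha)
    if k is None:
        return 1.0  # nobody can pass a test this short
    return 1.0 - tail(k, n, true_p)


if __name__ == "__main__":
    print(pass_mark(16))                  # score needed out of 16 trials
    print(false_negative_rate(16, 0.7))   # listener who is right 70% of the time
    print(false_negative_rate(16, 0.9))   # listener who is right 90% of the time
```

Under these assumptions the pass mark is 12 of 16, and a listener who is right 70% of the time fails about half of such tests, which is one reason a single null result says little about small differences.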
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,385
Location
Seattle Area
Exactly, this is my point!
Training improves your brain, not your ears. It teaches one to be a more critical listener. But it cannot change your ears or the raw data they send to your brain.
What? We are never interested in just the ear. We are interested in the ear+brain combination as that is how we perceive music and all sounds. If that combination becomes better than general public, then it by definition is a more exacting test.
 

JJB70

Major Contributor
Forum Donor
Joined
Aug 17, 2018
Messages
2,905
Likes
6,148
Location
Singapore
To me it seems logical that the most important question in assessing audio is whether any difference can be detected. If a difference can be identified, then we can have the argument over what people prefer and everything else, but if there is no audible difference then all the other questions strike me as being pointless.
 