
The deaf leading the blind? A piece by Henning Møller (B&K)

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,483
Likes
25,237
Location
Alfred, NY
If you can hear it without peeking, you can hear it. If you can’t hear it without peeking, you can’t hear it. It’s really pretty simple.

If you don’t implement basic controls, it’s not science, it’s make-believe.
 
OP
tuga

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
I'm not saying having references and experience can't help, I'm just saying that it's not possible to extract what you hear from what you know, what you feel, and what you imperfectly recollect.

Isn't this just a basic, well-established fact of psychology?

To argue this by analogy: If you were offered a drug for some given medical condition, and that drug had been tested and found to have no positive effect under rigorous test conditions, but to have had a beneficial effect under sighted conditions only, would you accept this drug simply because the participants in the sighted trials had used the drug and others like it for a long time and believed they had developed an ability to discern its (purported) effects?

I would rather not take drugs from strangers if you don't mind. But my wife is PI in clinical trials, should I ask her for a few anecdote cases?

I understand the concept of bias. I understand the reasoning behind short snippets in AB comparisons.
I don't think that short-snippet AB comparisons are effective at determining anything but crude differences mostly/entirely? concerning tonal balance, which is why the results are often that everything sounds the same.
You are saying that it is not possible to assess those other aspects of performance (shortcomings) in any way or form because the mind plays tricks and there's no way this can be done unsighted.
I have no way of proving you wrong.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,483
Likes
25,237
Location
Alfred, NY
I understand the concept of bias. I understand the reasoning behind short snippets in AB comparisons.
I don't think that short-snippet AB comparisons are effective at determining anything but crude differences mostly/entirely? concerning tonal balance, which is why the results are often that everything sounds the same.

Unfortunately, this is absolutely not true, and if you had even a minor familiarity with the literature, you'd know how untrue it is.

Nice strawman about "short snippets" you snuck in there, BTW.
 

Soniclife

Major Contributor
Forum Donor
Joined
Apr 13, 2017
Messages
4,508
Likes
5,436
Location
UK
To argue this by analogy: If you were offered a drug for some given medical condition, and that drug had been tested and found to have no positive effect under rigorous test conditions, but to have had a beneficial effect under sighted conditions only, would you accept this drug simply because the participants in the sighted trials had used the drug and others like it for a long time and believed they had developed an ability to discern its (purported) effects?
If you have tried all the sensible alternative treatments without success, you take the placebo; it's such a strong effect that, for symptom relief, it can survive logic and knowledge and still work well.
 
OP
tuga

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Unfortunately, this is absolutely not true, and if you had even a minor familiarity with the literature, you'd know how untrue it is.

Nice strawman about "short snippets" you snuck in there, BTW.

I forgot to sneak in the "over a single speaker"... That's even nicer.
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,403
I would rather not take drugs from strangers if you don't mind. But my wife is PI in clinical trials, should I ask her for a few anecdote cases?

I understand the concept of bias. I understand the reasoning behind short snippets in AB comparisons.
I don't think that short-snippet AB comparisons are effective at determining anything but crude differences mostly/entirely? concerning tonal balance, which is why the results are often that everything sounds the same.
You are saying that it is not possible to assess those other aspects of performance (shortcomings) in any way or form because the mind plays tricks and there's no way this can be done unsighted.
I have no way of proving you wrong.

I'd be interested to hear what your wife's opinion is on whether there are any circumstances in which sighted tests can be more reliable than double-blind ones?

Also, as @SIY said, this "short snippet" thing is a red herring. Like I said a few posts back, the thing that needs to be short in order to give subjects the best chance of hearing a difference is the switching time, not the duration of the stimulus.
 
OP
tuga

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
I'd be interested to hear what your wife's opinion is on whether there are any circumstances in which sighted tests can be more reliable than double-blind ones?

When the patient has temporary blindness and takes medication. Surely.

I don't think that sighted tests can be more reliable than double-blind ones, quite the contrary. I never wrote that either.
I have nothing against blind tests, only against AB tests.
Sometimes double-blind is not practical, sometimes an AB comparison is not fit for purpose.
Should we not test at all?

I keep returning to Klippel for some reason:

Subjective evaluation is required to assess the audibility and the impact on perceived sound quality.

Some distortions which are audible might still be acceptable or even desirable in some applications.

Systematic listening tests, nonlinear auralization, and objective assessment based on a perceptual model are useful tools to assess regular distortion.

What does he mean by subjective evaluation?
Is it subjective double-blind?
Is it AB comparing?

Does anyone have his email?
 
OP
tuga

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Also, as @SIY said, this "short snippet" thing is a red herring. Like I said a few posts back, the thing that needs to be short in order to give subjects the best chance of hearing a difference is the switching time, not the duration of the stimulus.

So you listen to the whole of Götterdämmerung then you switch very quickly and then you listen to it again?
You switch back and forth between A and/or B and/or X every few seconds?

Maybe short-snippet is not what it used to be...
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,403
So you listen to the whole of Götterdämmerung then you switch very quickly and then you listen to it again?
You switch back and forth between A and/or B and/or X every few seconds?

Maybe short-snippet is not what it used to be...

Generally, the participant can listen to the stimulus for as long as they like, making a choice based on their own perception of whatever duration allows them to make the most reliable assessment.

Having said that, I'm not aware of any study in which it was found that long-duration listening gave subjects any better chance of making a correct choice, whereas I'm aware of numerous studies where the inverse was found to be true - not to mention a large body of research giving a neurological basis for why this would be.

I don't think that sighted tests can be more reliable than double-blind ones, quite the contrary. I never wrote that either.
I have nothing against blind tests, only against AB tests.
Sometimes double-blind is not practical, sometimes an AB comparison is not fit for purpose.
Should we not test at all?

Ok.

Firstly, if I understood you correctly, you were arguing that long-term sighted listening tests are more reliable than short-duration blind listening tests.

I'm saying that's a red herring. There's nothing about either blind testing methods, or ABX testing methods, that necessarily relates in any way to stimulus duration.

Secondly, aren't the kinds of sighted tests you're arguing for nothing other than AB comparison tests, i.e. in which you compare your system with a new piece of gear in it, sighted, to your reference system without it? That would seem to me to be the definition of an AB test.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,483
Likes
25,237
Location
Alfred, NY
So you listen to the whole of Götterdämmerung then you switch very quickly and then you listen to it again?

I don't use Wagner.

Is strawman the only argument style you have?
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,596
Likes
239,647
Location
Seattle Area
I don't think that short-snippet AB comparisons are effective at determining anything but crude differences mostly/entirely?
Your thinking is quite wrong. I have passed countless listening tests of smallest audible differences in double blind, controlled tests that most audiophiles won't even touch let alone be able to pass:

===========
foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/08/02 13:52:46

File A: C:\Users\Amir\Music\Archimago\24-bit Audio Test (Hi-Res 24-96, FLAC, 2014)\01 - Sample A - Bozza - La Voie Triomphale.flac
File B: C:\Users\Amir\Music\Archimago\24-bit Audio Test (Hi-Res 24-96, FLAC, 2014)\02 - Sample B - Bozza - La Voie Triomphale.flac

13:52:46 : Test started.
13:54:02 : 01/01 50.0%
13:54:11 : 01/02 75.0%
13:54:57 : 02/03 50.0%
13:55:08 : 03/04 31.3%
13:55:15 : 04/05 18.8%
13:55:24 : 05/06 10.9%
13:55:32 : 06/07 6.3%
13:55:38 : 07/08 3.5%
13:55:48 : 08/09 2.0%
13:56:02 : 09/10 1.1%
13:56:08 : 10/11 0.6%
13:56:28 : 11/12 0.3%
13:56:37 : 12/13 0.2%
13:56:49 : 13/14 0.1%
13:56:58 : 14/15 0.0%
13:57:05 : Test finished.

----------
Total: 14/15 (0.0%)

The file names seem similar but one is lowered in resolution (bit depth?) and then moved back to 24-bits with countermeasures added to make mechanical analysis difficult.

It was one of the easier tests to pass compared to others because I knew what to listen for. I read some sarcastic and rude remarks from Archimago on him thinking it is impossible for me to have passed this test. I suggest he undertake some training on becoming a critical listener and where such differences may be audible.

These are from the AIX challenge that was posted on another forum a few years ago.
=============

foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/07/11 06:18:47

File A: C:\Users\Amir\Music\AIX AVS Test files\Mosaic_A2.wav
File B: C:\Users\Amir\Music\AIX AVS Test files\Mosaic_B2.wav

06:18:47 : Test started.
06:19:38 : 00/01 100.0%
06:20:15 : 00/02 100.0%
06:20:47 : 01/03 87.5%
06:21:01 : 01/04 93.8%
06:21:20 : 02/05 81.3%
06:21:32 : 03/06 65.6%
06:21:48 : 04/07 50.0%
06:22:01 : 04/08 63.7%
06:22:15 : 05/09 50.0%
06:22:24 : 05/10 62.3%
06:23:15 : 06/11 50.0%
06:23:27 : 07/12 38.7%
06:23:36 : 08/13 29.1%
06:23:49 : 09/14 21.2%
06:24:02 : 10/15 15.1%
06:24:10 : 11/16 10.5%
06:24:20 : 12/17 7.2%
06:24:27 : 13/18 4.8%
06:24:35 : 14/19 3.2%
06:24:40 : 15/20 2.1%
06:24:46 : 16/21 1.3%
06:24:56 : 17/22 0.8%
06:25:04 : 18/23 0.5%
06:25:13 : 19/24 0.3%
06:25:25 : 20/25 0.2%
06:25:32 : 21/26 0.1%
06:25:38 : 22/27 0.1%
06:25:45 : 23/28 0.0%
06:25:51 : 24/29 0.0%
06:25:58 : 25/30 0.0%
06:26:24 : Test finished.

----------
Total: 25/30 (0.0%)

===========
foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/07/10 21:01:16

File A: C:\Users\Amir\Music\AIX AVS Test files\Just_My_Imagination_A2.wav
File B: C:\Users\Amir\Music\AIX AVS Test files\Just_My_Imagination_B2.wav

21:01:16 : Test started.
21:02:11 : 01/01 50.0%
21:02:20 : 02/02 25.0%
21:02:28 : 03/03 12.5%
21:02:38 : 04/04 6.3%
21:02:47 : 05/05 3.1%
21:02:56 : 06/06 1.6%
21:03:06 : 07/07 0.8%
21:03:16 : 08/08 0.4%
21:03:26 : 09/09 0.2%
21:03:45 : 10/10 0.1%
21:03:54 : 11/11 0.0%
21:04:11 : 12/12 0.0%
21:04:24 : Test finished.

----------
Total: 12/12 (0.0%)

==========

foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/07/10 18:50:44

File A: C:\Users\Amir\Music\AIX AVS Test files\On_The_Street_Where_You_Live_A2.wav
File B: C:\Users\Amir\Music\AIX AVS Test files\On_The_Street_Where_You_Live_B2.wav

18:50:44 : Test started.
18:51:25 : 00/01 100.0%
18:51:38 : 01/02 75.0%
18:51:47 : 02/03 50.0%
18:51:55 : 03/04 31.3%
18:52:05 : 04/05 18.8%
18:52:21 : 05/06 10.9%
18:52:32 : 06/07 6.3%
18:52:43 : 07/08 3.5%
18:52:59 : 08/09 2.0%
18:53:10 : 09/10 1.1%
18:53:19 : 10/11 0.6%
18:53:23 : Test finished.

----------
Total: 10/11 (0.6%)

============
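For readers wondering what the percentage column in these logs represents: it appears to be the probability of scoring at least that many correct answers by pure guessing (a one-sided binomial tail with a 50% chance per trial). That reading is an assumption, not something taken from the foo_abx documentation, but it is consistent with the figures logged above, e.g. 01/02 at 75.0% and 04/08 at 63.7%. The function name `abx_p_value` below is made up for illustration. A minimal sketch:

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by coin-flip guessing.

    Assumption: each ABX trial is an independent 50/50 guess under the null
    hypothesis, so the score follows a Binomial(trials, 0.5) distribution.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the outcomes with `correct` or more hits, out of 2**trials equally likely outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Final scores of the four runs posted above.
    for correct, trials in [(14, 15), (25, 30), (12, 12), (10, 11)]:
        print(f"{correct}/{trials}: {abx_p_value(correct, trials):.1%}")
```

Under this reading, a lower percentage means the score is less likely to be luck; a run that hovers around 50% is indistinguishable from guessing.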

I would FAIL every one of these tests if I was not able to quickly switch and compare. This has been proven by others. Read: https://www.audiosciencereview.com/...ity-and-reliability-of-abx-blind-testing.186/

Really, you are rehashing myths in audio. If you are here, invest in the time to learn the reality. We know your arguments. We know your experiences. They simply are not valid.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,596
Likes
239,647
Location
Seattle Area
I keep returning to Klippel for some reason:

Subjective evaluation is required to assess the audibility and the impact on perceived sound quality.

Some distortions which are audible might still be acceptable or even desirable in some applications.

Systematic listening tests, nonlinear auralization, and objective assessment based on a perceptual model are useful tools to assess regular distortion.
Careful to not mix different audio domains. In acoustics, single microphone measurements can lie as we have two ears and a brain that is not represented by them. That is where Klippel comes from.

The right measurements of course have great value. Just don't fall for generalizations in what you quoted.
 
OP
tuga

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Generally, the participant can listen to the stimulus for as long as they like, making a choice based on their own perception of whatever duration allows them to make the most reliable assessment.

Having said that, I'm not aware of any study in which it was found that long-duration listening gave subjects any better chance of making a correct choice, whereas I'm aware of numerous studies where the inverse was found to be true - not to mention a large body of research giving a neurological basis for why this would be.

I've only done part of the Philips Golden Ear Challenge and that used the same track and just a snippet.

Firstly, if I understood you correctly, you were arguing that long-term sighted listening tests are more reliable than short-duration blind listening tests.

I'm saying that's a red herring. There's nothing about either blind testing methods, or ABX testing methods, that necessarily relates in any way to stimulus duration.

Secondly, aren't the kinds of sighted tests you're arguing for nothing other than AB comparison tests, i.e. in which you compare your system with a new piece of gear in it, sighted, to your reference system without it? That would seem to me to be the definition of an AB test.

I was arguing that long-term (a week or weeks) listening assessment (it's not a direct comparison) is better suited to identifying certain shortcomings than short-duration blind listening A-B comparisons, which are chiefly tonal-balance driven.
When you replace a piece of equipment, a large difference may be readily apparent. Returning to the original setup after a week or two may also expose differences. This could be performed blind.
Perhaps long-term listening assessments, which focus on identifying shortcomings (using specific pieces of music, or even pink noise, to spot particular problems) rather than on direct comparisons, could be performed blind, but I am not sure how, nor whether it would be needed, considering that you are not performing a direct comparison.
 
OP
tuga

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
Your thinking is quite wrong. I have passed countless listening tests of smallest audible differences in double blind, controlled tests that most audiophiles won't even touch let alone be able to pass:

I would FAIL every one of these tests if I was not able to quickly switch and compare. This has been proven by others. Read: https://www.audiosciencereview.com/...ity-and-reliability-of-abx-blind-testing.186/

Really, you are rehashing myths in audio. If you are here, invest in the time to learn the reality. We know your arguments. We know your experiences. They simply are not valid.

What was the test? Were you trying to determine if there was a difference?
I did that on the Philips Golden Ear Challenge.
That is not at all what I am talking about.

 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,483
Likes
25,237
Location
Alfred, NY
"I did one vaguely related test once, so now I know about all tests ever."

Read the literature so that you can argue knowledgeably.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,596
Likes
239,647
Location
Seattle Area
That is not at all what I am talking about.
What? You said this: "I don't think that short-snippet AB comparisons are effective at determining anything but crude differences mostly/entirely? "

The differences I detected could not be smaller. Opposite of "crude differences." I also pointed you to AES paper published showing the same results. Did you not read it?
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,403
I've only done part of the Philips Golden Ear Challenge and that used the same track and just a snippet.

Ok. Just because the blind listening test you've done happened to be short-duration doesn't mean that all the others are (although the ones in which the highest sensitivity was shown to differences are) ;)

I was arguing that long-term (a week or weeks) listening assessment (it's not a direct comparison) is better suited to identifying certain shortcomings than short-duration blind listening A-B comparisons, which are chiefly tonal-balance driven.

Why do you say that short-duration (which I again maintain is a misconception) blind listening A-B comparisons are concerned chiefly with tonal balance? I can't help thinking this might be again some kind of generalisation based on the one you've done?

Let me give you a few examples of double blind ABX tests that were not interested in tonal balance:
In each of those studies, there are dozens of references to similar or related previous studies.

Perhaps long-term listening assessments, which focus on identifying shortcomings (using specific pieces of music, or even pink noise, to spot particular problems) rather than on direct comparisons, could be performed blind, but I am not sure how, nor whether it would be needed, considering that you are not performing a direct comparison.

Ok, now I think I may be starting to see where we're coming from different perspectives here. You're talking about subjective (whether sighted or blind) tests being used to identify shortcomings in audio components.

Why would you do this instead of measuring the equipment objectively, given that objective measurements provide infinitely more detailed and precise data than even the most sensitive human?

And moreover, how is it possible to listen to a piece of equipment to try to identify shortcomings without comparing it to an imagined/remembered reference?
 
OP
tuga

tuga

Major Contributor
Joined
Feb 5, 2020
Messages
3,984
Likes
4,285
Location
Oxford, England
What? You said this: "I don't think that short-snippet AB comparisons are effective at determining anything but crude differences mostly/entirely? "

The differences I detected could not be smaller. Opposite of "crude differences." I also pointed you to AES paper published showing the same results. Did you not read it?

You didn't quote my sentence fully.

I don't think that short-snippet AB comparisons are effective at determining anything but crude differences mostly/entirely? concerning tonal balance,

I was referring to equipment comparisons, not media resolution, EQ, etc.

And to A-B comparisons in relation to long-term assessment. I can't point you to any literature, I'm afraid. Scarecrow.
 