
Limitations of blind testing procedures

Status
Not open for further replies.

watchnerd

Grand Contributor
Joined
Dec 8, 2016
Messages
12,449
Likes
10,408
Location
Seattle Area, USA
I'm such a strong believer in the power of my mind to affect my perception that I no longer listen to gear before I purchase it.
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,200
Likes
16,981
Location
Riverview FL
I'm such a strong believer in the power of my mind to affect my perception that I no longer listen to gear before I purchase it.

The last thing I pre-purchase-listened-to was sometime in the latter half of the previous century.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
There are unfortunate aspects to this sort of 'science' in general:
  • often the person who instigated the experiment (who maybe wants to 'prove' something) also carries out the experiment
  • the same person decides whether the results are 'any good' and can bin any experiments that don't 'give the right answer'
  • no matter how perfect the methodology, the same person writes the headline, the summary and conclusion at the end, and in doing so attempts to 'interpret' the results. These are the only bits that most people can be bothered to read. In one fell swoop, the whole experiment boils down to a variant of the experimenter's own assumptions, biases and motivations.
So that is what I get from listening test-based experiments: tiny fractions of the overall 'problem domain' barely tested sufficiently to scrape someone's definition of "statistical significance" (almost certainly falling short of the strict conditions required for statistics to be valid) and then 'laundered' by the person who did the experiment under a single headline "Phase doesn't matter!" which everyone in the industry then refers to forever more.

Apart from that, they're a truly wonderful thing.
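The second bullet above (binning experiments that don't "give the right answer") can be put into numbers. This is a minimal sketch, assuming every experiment is independent and that there is truly no audible effect; the function name and the 0.05 threshold are illustrative assumptions, not taken from any particular study.

```python
def chance_of_false_positive(experiments: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `experiments` independent tests of a
    non-existent effect still reaches p < alpha purely by chance."""
    if experiments < 0:
        raise ValueError("experiments must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them must avoid it for nothing "significant" to show up.
    return 1.0 - (1.0 - alpha) ** experiments


if __name__ == "__main__":
    for n in (1, 5, 14, 20):
        print(f"{n:2d} experiments: {chance_of_false_positive(n):.3f}")
```

Under these assumptions, an experimenter who quietly repeats a null experiment 14 times and reports only the best run has a better-than-even chance of a "significant" headline.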
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,317
Location
Albany Western Australia
There are unfortunate aspects to this sort of 'science' in general:
  • often the person who instigated the experiment (who maybe wants to 'prove' something) also carries out the experiment
  • the same person decides whether the results are 'any good' and can bin any experiments that don't 'give the right answer'
  • no matter how perfect the methodology, the same person writes the headline, the summary and conclusion at the end, and in doing so attempts to 'interpret' the results. These are the only bits that most people can be bothered to read. In one fell swoop, the whole experiment boils down to a variant of the experimenter's own assumptions, biases and motivations.
So that is what I get from listening test-based experiments: tiny fractions of the overall 'problem domain' barely tested sufficiently to scrape someone's definition of "statistical significance" (almost certainly falling short of the strict conditions required for statistics to be valid) and then 'laundered' by the person who did the experiment under a single headline "Phase doesn't matter!" which everyone in the industry then refers to forever more.

Apart from that, they're a truly wonderful thing.


Sounds to me that you are biased :p

Here is another test I did.

I provided the same group of listeners with several tracks, two versions of each: one original, and one that had been re-recorded through a DAC and a high-quality ADC. They could listen to these files at their leisure at home. Nearly everyone preferred the technically inferior re-recorded versions.

Well, you can always test yourself with foobar and its ABX plugin, which should eliminate most of the problems mentioned above.
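For anyone scoring such a self-test by hand, here is a minimal sketch of the usual arithmetic. It is not the plugin's own code: it assumes each ABX trial is an independent 50/50 guess when no difference is audible, and the function name `abx_p_value` and the 16-trial run are illustrative assumptions.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided probability of getting at least `correct` answers right
    out of `trials` by pure guessing (binomial tail with p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the ways to get `correct` or more hits, out of 2**trials outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for correct in (8, 12, 14):
        print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

A small p-value only says the score is unlikely under guessing; it says nothing about how large or how important the audible difference is.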
 
OP
oivavoi

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,934
Location
Oslo, Norway
Many good comments here. I like the discussion. But going back to what I wrote in my starter post: Most of us here agree that sighted listening can be unreliable. But what is the rationale, then, that unsighted listening is reliable? That's a leap of faith that I don't think is fully warranted.

The underlying rationale for this leap of faith may go something like this: "The thing which really clouds our auditory judgment, is pre-existing beliefs and biases. When you remove those biases and pre-existing beliefs, our auditory judgment goes back to being pure and discriminatory".

But I don't buy into that theory about human perception. Current psychological theories about perception and cognition emphasize that we only become conscious of a very small part of our sensory input. It's like an iceberg: The conscious thing on the top is just a very small part. Most of what happens in our brains and in our nervous system never becomes conscious. It's automatic, and extremely fast. This is necessary in order to function: It's impossible to "think" about every sensory input. Imagine crossing a crowded street for example - we take in so many things at the same time, and can't think about every one of them. When it comes to audio and blind-testing, it means that we are bombarded with sensory input when we hear music. Most of the processing of this input is automatic and very fast. Our cognitive judgment about what we hear only constitutes a very small part.

This is part of the reason why sighted listening is unreliable: Because of the sheer amount of sensory input, it can be fitted into many "stories" about what we hear. But it also means that blind tests may mask objective differences. When we encounter a cacophony of auditory sensory input, without any guiding biases in our head to group and sort it, it becomes difficult to form any story at all about what we hear. That's just how the brain works: We need some biases or cognitive boxes to make sense of things.

Furthermore, there is ample evidence that our sensory processing adapts to stimuli: If we hear a ticking sound for some time, we will simply stop registering it (a funny thing is that Buddhist monks seem to be the exception here: because of their training in meditation and being aware of their sensory input, they don't adapt in the same way as others). So when it comes to blind testing, I would think that we only have a very small time window where we are able to register objective differences, before our sensory processing starts to adapt and the input becomes smeared.

What I think or hope is the future of psychoacoustic research is more direct measurement of unconscious reactions to music. That's how it's done in a lot of the really cutting-edge research in cognitive science and psychology. For example: Do eye tracking, to see how our eyes automatically react when faced with different kinds of stimuli. Measure embodied emotional reactions directly. Do brain scans. Etc.
And, as Amir said in post: Triangulate with objective measurements. And discuss/investigate how things actually work.
 
OP
oivavoi

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,934
Location
Oslo, Norway
There are unfortunate aspects to this sort of 'science' in general:
  • often the person who instigated the experiment (who maybe wants to 'prove' something) also carries out the experiment
  • the same person decides whether the results are 'any good' and can bin any experiments that don't 'give the right answer'
  • no matter how perfect the methodology, the same person writes the headline, the summary and conclusion at the end, and in doing so attempts to 'interpret' the results. These are the only bits that most people can be bothered to read. In one fell swoop, the whole experiment boils down to a variant of the experimenter's own assumptions, biases and motivations.
So that is what I get from listening test-based experiments: tiny fractions of the overall 'problem domain' barely tested sufficiently to scrape someone's definition of "statistical significance" (almost certainly falling short of the strict conditions required for statistics to be valid) and then 'laundered' by the person who did the experiment under a single headline "Phase doesn't matter!" which everyone in the industry then refers to forever more.

Apart from that, they're a truly wonderful thing.

Good comment. But again, I would say that the problem is not that they try to do science, but rather that it's a small academic field (because of the lack of funding), and that measurements have been too simple. In social psychology, there's been a huge crisis of replication lately, partly for reasons such as those you cite. Lots of spectacular findings simply failed to show up when other researchers tried to reproduce them. But then again, some findings were replicated. I would hope that psychoacoustic studies also start to get replicated.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
Sounds to me that you are biased :p
We are all biased!:)
Here is another test I did.

I provided the same group of listeners with several tracks, two versions of each: one original, and one that had been re-recorded through a DAC and a high-quality ADC. They could listen to these files at their leisure at home. Nearly everyone preferred the technically inferior re-recorded versions.
I have no objection to anyone doing any kind of listening test if that's how they like to spend their time..!

In your example above, you highlight a particular issue: you assert that the re-recorded version is "inferior" and as a result, I presume you wouldn't attempt to market it as a new recording process. But this is just one interpretation. Another person might think they had discovered something, and 'science' had demonstrated it. You can't argue with statistics after all.

Variants of MQA are actually corruptions of the original, where extra aliasing is traded off against supposedly improved 'timing'. If you found that your listeners preferred MQA and declared it to be therefore better, then it would just be the same thing, and responding to the MQA creators' 'narrative'.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,522
Likes
37,053
Many good comments here. I like the discussion. But going back to what I wrote in my starter post: Most of us here agree that sighted listening can be unreliable. But what is the rationale, then, that unsighted listening is reliable? That's a leap of faith that I don't think is fully warranted.

The underlying rationale for this leap of faith may go something like this: "The thing which really clouds our auditory judgment, is pre-existing beliefs and biases. When you remove those biases and pre-existing beliefs, our auditory judgment goes back to being pure and discriminatory".

But I don't buy into that theory about human perception. Current psychological theories about perception and cognition emphasize that we only become conscious of a very small part of our sensory input. It's like an iceberg: The conscious thing on the top is just a very small part. Most of what happens in our brains and in our nervous system never becomes conscious. It's automatic, and extremely fast. This is necessary in order to function: It's impossible to "think" about every sensory input. Imagine crossing a crowded street for example - we take in so many things at the same time, and can't think about every one of them. When it comes to audio and blind-testing, it means that we are bombarded with sensory input when we hear music. Most of the processing of this input is automatic and very fast. Our cognitive judgment about what we hear only constitutes a very small part.

This is part of the reason why sighted listening is unreliable: Because of the sheer amount of sensory input, it can be fitted into many "stories" about what we hear. But it also means that blind tests may mask objective differences. When we encounter a cacophony of auditory sensory input, without any guiding biases in our head to group and sort it, it becomes difficult to form any story at all about what we hear. That's just how the brain works: We need some biases or cognitive boxes to make sense of things.

Furthermore, there is ample evidence that our sensory processing adapts to stimuli: If we hear a ticking sound for some time, we will simply stop registering it (a funny thing is that Buddhist monks seem to be the exception here: because of their training in meditation and being aware of their sensory input, they don't adapt in the same way as others). So when it comes to blind testing, I would think that we only have a very small time window where we are able to register objective differences, before our sensory processing starts to adapt and the input becomes smeared.

What I think or hope is the future of psychoacoustic research is more direct measurement of unconscious reactions to music. That's how it's done in a lot of the really cutting-edge research in cognitive science and psychology. For example: Do eye tracking, to see how our eyes automatically react when faced with different kinds of stimuli. Measure embodied emotional reactions directly. Do brain scans. Etc.
And, as Amir said in post: Triangulate with objective measurements. And discuss/investigate how things actually work.

Pretty simple really. You claim you can jump 15 feet in the air. I will put up a 15 foot marker or pole and ask you to demonstrate. You either can or you can't.

You claim you can hear 30 kHz, I'll play you a tone and you either can or you can't.

You claim you can hear a difference between FLAC and WAV files. I'll put up FLAC and WAV files and you pick which is which. You either can or you can't. No leaps of faith either way. That is the simple version, and there is a long and deep backlog of knowledge about what we can and cannot hear, developed academically over a hundred years or more. There is additional knowledge of how the hearing mechanism is constructed and it fits with what can be demonstrably heard.

You claim you can hear artefacts of reproduction that happen at -160 dB. You either can or you can't. In addition, since the Brownian motion of air molecules is above that level, we can quite reasonably predict that you can't. If you can demonstrate that you do, well, that will need investigating.

Oh, and from other fields involving other senses and activities we know how bias and placebo color actual perception. Hearing isn't something special that can thwart that fact.

So where do you find any leaps of faith?
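To give a feel for how small the -160 dB figure above is, here is a minimal sketch of the standard decibel-to-amplitude conversion. It assumes the figure is an amplitude ratio relative to full scale, which the post does not state explicitly; the function names are illustrative.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level in dB to a linear amplitude (voltage/pressure) ratio."""
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio to a level in dB."""
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    # An artefact at -160 dB is about one hundred-millionth of full scale.
    print(f"-160 dB as a ratio: {db_to_amplitude_ratio(-160):.1e}")
    # For comparison: one quantization step of a 24-bit word, relative to full scale.
    print(f"24-bit step: {amplitude_ratio_to_db(2 ** -24):.1f} dB")
```

Under that assumption, -160 dB sits below even a single step of a 24-bit word.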
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
An article about the potential meaninglessness of listening tests and MQA, and the danger of assuming that 'science' has given us answers concerning what is merely 'aesthetic judgement':
Having abandoned keeping to the Sampling Theorem, the results become a matter of subjective opinion, and the creator of the individual MQA file or stream may have some control over this judgement. It then becomes a question of whether each listener likes the result or not. In some ways the creator of an MQA file is placed in the position of being a ‘magician’ who has to know how to get the best sound from an existing recording. Unfortunately, the music business has a mixed track record in such matters. For example, the tendency to equate ‘good’ sound with ‘sells well’ which has led to many Audio CDs being level compressed and clipped to be LOUD on the basis that this ‘sells more CDs’.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
Pretty simple really. You claim you can jump 15 feet in the air. I will put up a 15 foot marker or pole and ask you to demonstrate. You either can or you can't.

You claim you can hear 30 kHz, I'll play you a tone and you either can or you can't.

You claim you can hear a difference between FLAC and WAV files. I'll put up FLAC and WAV files and you pick which is which. You either can or you can't. No leaps of faith either way. That is the simple version, and there is a long and deep backlog of knowledge about what we can and cannot hear, developed academically over a hundred years or more. There is additional knowledge of how the hearing mechanism is constructed and it fits with what can be demonstrably heard.

You claim you can hear artefacts of reproduction that happen at -160 dB. You either can or you can't. In addition, since the Brownian motion of air molecules is above that level, we can quite reasonably predict that you can't. If you can demonstrate that you do, well, that will need investigating.

Oh, and from other fields involving other senses and activities we know how bias and placebo color actual perception. Hearing isn't something special that can thwart that fact.

So where do you find any leaps of faith?
If it's all so simple, then why use music as your test signal and not tones, noise, clicks and bleeps? This is the point where the whole "It's real science" thing falls apart. It turns a scientific experiment into a beauty parade and, as we know, beauty exists only in the eye of the beholder. The result can be that a real difference is masked by the 'emotional' content of the music, and that supposed preferences are really just a response to novelty or fashion. You can launder the results into statistics with six decimal places of course.:)
 
OP
oivavoi

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,934
Location
Oslo, Norway
You claim you can hear a difference between FLAC and wav files. I'll put up FLAC and wav files and you pick which is which. You either can or you can't. (...)
So where do you find any leaps of faith?

The leap of faith is the assumption that the subjective cognitive judgment of a listener about whether they hear something or not, is valid. It is fully possible - in principle - that there are objective differences between these files, and that these are registered fleetingly at some level in the sensory apparatus. Still, the differences might be so subtle that they don't make their way to the conscious display.

A famous example from psychology is the Gorilla experiment: http://theinvisiblegorilla.com/gorilla_experiment.html
Half of the people in the experiment failed to see a gorilla that walked onto the scene and thumped his chest, because they were so focused on other things. All of these people "saw" the gorilla, in some sense. But they weren't conscious of it, because they were consciously focusing on other things. This means that we don't have complete conscious access to our sensory or auditory input.

...there is a long and deep backlog of knowledge about what we can and cannot hear. Developed academically over a hundred years or more. There is additional knowledge of how the hearing mechanism is constructed and it fits with what can be demonstrably heard.

You claim you can hear artefacts of reproduction that happen at -160 dB. You either can or you can't. In addition, since the Brownian motion of air molecules is above that level, we can quite reasonably predict that you can't. If you can demonstrate that you do, well, that will need investigating.

No argument here. I think established and well-grounded theories about how our hearing mechanism works are very important. But such theories are usually based on more than just group listening tests, where you arrive at a slight tendency that is statistically significant by crunching the numbers quite a bit. I'm more interested in the absolute thresholds: What is it definitively not possible to hear, physiologically? But for arriving at a conclusion about that, I trust direct physical measurements of bodily reactions more than I trust subjective reports.


Oh, and from other fields involving other senses and activities we know how bias and placebo color actual perception. Hearing isn't something special that can thwart that fact.

No argument here either. I'm not saying that bias and placebo are not at work. Of course they are. My point is that both sighted and unsighted listening can be unreliable. So what I trust is not subjective reports about hearing, sighted or not, but sophisticated experiments etc.
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,317
Location
Albany Western Australia
We are all biased!:)

I have no objection to anyone doing any kind of listening test if that's how they like to spend their time..!

In your example above, you highlight a particular issue: you assert that the re-recorded version is "inferior" and as a result, I presume you wouldn't attempt to market it as a new recording process. But this is just one interpretation. Another person might think they had discovered something, and 'science' had demonstrated it. You can't argue with statistics after all.

Variants of MQA are actually corruptions of the original, where extra aliasing is traded off against supposedly improved 'timing'. If you found that your listeners preferred MQA and declared it to be therefore better, then it would just be the same thing, and responding to the MQA creators' 'narrative'.

Yes we are
No, the re-recorded version is inferior. That's a fact, not an interpretation.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
Yes we are
No, the re-recorded version is inferior. That's a fact, not an interpretation.
I agree with you, but there are respected people who think that vinyl (even derived from a digital master) is superior to straight digital playback. There are people who prefer the sound of FM radio (fed from a 14 bit 32 kHz PCM feed) over CD.

In a way you are making my point for me: if you 'know' what is best from measurements and/or simple logic, why are you even bothering with listening tests? The answer, IMO, is that you are 'playing the audiophiles' game' and thinking you can win. You can't....:)
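For reference, the on-paper gap between the two formats mentioned above can be sketched with the textbook figures for ideal linear PCM. This is an illustration only: it assumes a plain uniform quantizer and a full-scale sine, ignores any companding, pre-emphasis or FM transmission noise in the real broadcast chain, and the function names are made up for the example.

```python
def ideal_sqnr_db(bits: int) -> float:
    """Textbook signal-to-quantization-noise ratio of an ideal N-bit
    linear PCM quantizer for a full-scale sine: 6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for name, bits, rate in (("14-bit/32 kHz feed", 14, 32_000),
                             ("CD (16-bit/44.1 kHz)", 16, 44_100)):
        print(f"{name}: ~{ideal_sqnr_db(bits):.2f} dB, "
              f"bandwidth up to {nyquist_hz(rate):.0f} Hz")
```

On these numbers CD is ahead on both counts, which is the sense in which a preference for the FM feed is a preference rather than a measurement.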
 

Purité Audio

Master Contributor
Industry Insider
Barrowmaster
Forum Donor
Joined
Feb 29, 2016
Messages
9,051
Likes
12,150
Location
London
Interesting results regarding MQA, Alan. What did the 100% chap listen for, did he say?
Keith
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,317
Location
Albany Western Australia
I agree with you, but there are respected people who think that vinyl (even derived from a digital master) is superior to straight digital playback. There are people who prefer the sound of FM radio (fed from a 14 bit 32 kHz PCM feed) over CD.

In a way you are making my point for me: if you 'know' what is best from measurements and/or simple logic, why are you even bothering with listening tests? The answer, IMO, is that you are 'playing the audiophiles' game' and thinking you can win. You can't....:)

Well, this is the point I made in another thread where a certain poster insists on tweaking without any technical reference. It's all fine, like whatever you like, but don't pretend it is anything other than tweaking to your own personal preference. It has nothing to do with technical improvement.

No, actually, what the listening tests have demonstrated to the other guys is that their usual sighted comparisons, where they all chat together and are peer-influenced as well as visually biased, are quite flawed.

There are of course some whose dogma will never be changed.
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,317
Location
Albany Western Australia
Interesting results regarding MQA, Alan. What did the 100% chap listen for, did he say?
Keith

Hi Keith

I haven't spoken to him yet or given the results back; we were more motivated to go off for a curry and beer by the end of the evening :)

I have been listening to MQA for a few weeks, and the comparisons I have done indicate that the MQA is not really noticeable on some tracks, but on others there is a slight improvement in high frequencies, maybe a bit brighter. You could say cleaner, but I could see that some would even prefer the "warmer" non-MQA.

I'm seeing comments elsewhere about massive differences on Tidal streaming, but I would be suspicious about whether they are listening to the same masters.
 