
Master Thread: Are measurements Everything or Nothing?

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
Knowing what to look for is a feature, not a bug. You're still just using your ears, you just know what to listen for.
Nobody denies that. That doesn’t mean the test should accommodate this.

Just preferring it because it’s convenient isn’t really an argument, is it? It does give you different answers after all. You’ll need to account for this in analysis as well. These two results can’t just be compared or combined.

How would this go for a medical test: you get your drug 10 times double blind, and after every trial you were told whether you received the medicine or the placebo. Would this not start messing up the results after a few rounds?
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,159
Likes
36,898
Location
The Neitherlands
This thread is on fire ...

fire triangle.png
 
Last edited:
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,833
Likes
243,185
Location
Seattle Area
Is it? Doesn’t knowing the results bias you? Because next time you know what to look for. You have been influenced by the test itself.
What next time? You select a segment and see if you have reliably heard a difference. If you have not, you move on to another segment. Getting the difference at the end of some 12 trials only slows this down compared to 5 or 6. There is no other difference.
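The segment-hunting procedure described here has a quantifiable side effect worth keeping in mind. A minimal Python sketch, assuming 10-trial runs, a 9/10 pass bar, and 20 candidate segments (illustrative numbers, not from the thread):

```python
from math import comb

def p_pass(n_trials=10, k_min=9, p=0.5):
    """Chance a purely guessing listener scores >= k_min correct in n_trials."""
    return sum(comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
               for k in range(k_min, n_trials + 1))

single = p_pass()                # one 10-trial run: ~0.011
family = 1 - (1 - single)**20   # best of 20 segments tried: ~0.19
print(f"per segment: {single:.3f}, across 20 segments: {family:.3f}")
```

Hunting across many segments raises the chance that pure guessing eventually "passes" somewhere from about 1% to about 19%, so the number of segments auditioned has to be reported alongside the final score.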
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,601
Likes
25,518
Location
Alfred, NY
Nobody denies that. That doesn’t mean the test should accommodate this.

Just preferring it because it’s convenient isn’t really an argument, is it? It does give you different answers after all. You’ll need to account for this in analysis as well. These two results can’t just be compared or combined.
If it helps test sensitivity, that's a GOOD thing.

Would you feel better if there were a "practice" set of runs where the score-to-date is given, then the counting of the "real" run begins after that if the subject feels like they have a better grasp of what to listen for? Go for it- it gets you the same sensitivity, but with a larger number of trials. The important thing is that this is still ears-only.
 
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,833
Likes
243,185
Location
Seattle Area
Just preferring it because it’s convenient isn’t really an argument, is it?
Of course it is. If a difference is there, we want it found. Slowing down the process to the point where the listener gets frustrated and gives up is wrong.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
If it helps test sensitivity, that's a GOOD thing.
If that is the only thing it does.
Would you feel better if there were a "practice" set of runs where the score-to-date is given, then the counting of the "real" run begins after that if the subject feels like they have a better grasp of what to listen for? Go for it- it gets you the same sensitivity, but with a larger number of trials.
I said as much. You usually learn for a test; the same thing applies here. As a training mechanism, it's obviously excellent.
 

GXAlan

Major Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
3,962
Likes
6,119
How would this go for a medical test: you get your drug 10 times double blind, and after every trial you were told whether you received the medicine or the placebo. Would this not start messing up the results after a few rounds?
It doesn’t because each test is fresh.

If your medical test was an opinion on happiness, and there was an assumption that more happy pills = more happiness, then yes, knowing what you got each week would skew results.

For these audio differences, knowing the result each round could affect one's preference. However, there is enough agreement (I believe) that proving a difference exists is all you need when comparing audio. If you can prove a difference exists, you can still allow two people to PREFER different products. So it skews preference, but not detectability.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
Of course it is. If a difference is there we want it found. To slow down the process such that listener gets frustrated and gives up is wrong.
I’m with you guys on all of that.

My concern is more with the interpretation of results. Not on an individual level, but on the collective statistics. My guess would be that the distribution of scores will be different with and without intermediate results, because people who know how to play the system will stick out more.

So the question then is: how much more valid is that metric if only a subset of people know how to use this "cheat"? Yes, you say: you improved detectability and it shows. But those are the people who know how to exploit it. Most people don't.

So what is needed, then, is to teach everyone doing the test why the intermediate results are there, and how to utilize them.

If your medical test was an opinion on happiness, and there was an assumption that more happy pills = more happiness, then yes, knowing what you got each week would skew results.
It doesn’t matter what the pills are for. If you can find out after a few rounds which is which, the trial will fail. So thanks for tanking your own argument ;)
 
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,833
Likes
243,185
Location
Seattle Area
I said as much. You usually learn for a test.. same thing applies here. As a training mechanism, it’s obviously excellent.
I am not sure what you are saying. When you listen to two files, you routinely "hear" differences. These differences need to be put to the test to see if they are valid -- the very purpose of an automated ABX test. When differences shrink, you get hit with more false positives. You need to get through these as quickly as you can until, or if, you land on a real difference. You can't determine that section a priori. You need the feedback from the test fixture, as there is no other way to know.

Keep in mind that it is rarely the case that proper material is selected to show the difference. As such, the difference may only manifest itself in the smallest segment of music, as I have explained before. Finding that needle in a haystack may be quite difficult without the tool providing feedback as you hunt for it.
 
OP
amirm

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,833
Likes
243,185
Location
Seattle Area
So question then is: how more valid is that metric if a subset of people know how to use this “cheat”? Yes you say: you improved detectability and it shows. But those are people that know how to expect exploit it. Most people don’t know though.
You don't get nearly the kind of help you are imagining. The tests I showed often took a considerable amount of time to pass. Without the tool providing feedback, I would have just given up quickly and not even tried to reach a conclusion. So you would have been left with no data.

Also remember that you want to extrapolate my passing to the general population, some of whom may be more capable than me. In that regard, you want to give me all the help I need to find differences.
 

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,739
Likes
6,079
Location
US East
It doesn’t matter what the pills are for. If you can find out after a few rounds which is which, the trial will fail. So thanks for tanking your own argument ;)
No. In an ABX test you still don't know whether the next X is A or B. If the test lets you learn how to get better at discriminating between them, it makes the test more sensitive.
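That sensitivity gain can be put in rough numbers. A short Python sketch, assuming (purely for illustration) that feedback lifts a listener's per-trial accuracy from 60% to 75% on a 10-trial test with a 9/10 pass bar:

```python
from math import comb

def p_pass(p_correct, n=10, k_min=9):
    """Chance of scoring >= k_min out of n trials at per-trial accuracy p_correct."""
    return sum(comb(n, k) * p_correct**k * (1 - p_correct)**(n - k)
               for k in range(k_min, n + 1))

print(f"untrained (60% per trial): {p_pass(0.60):.2f}")  # ~0.05
print(f"trained   (75% per trial): {p_pass(0.75):.2f}")  # ~0.24
```

Under these assumed numbers, a modest improvement in per-trial discrimination roughly quintuples the chance of a real (non-guessing) listener actually passing the test.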
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
I would have just given up quickly and not even tried to get to the conclusion. So you would have been left with no data.
No, you’d have different data: no audible differences when you don’t know where to look. That is still data, and it can still answer a valid research question, just a different one.
Also remember that you want to extrapolate me passing to general population, some of whom may be more capable than me. In that regard, you want to give me all the help I need to find differences.
As long as you know how to. It’s like comparing a chess grandmaster to me: I’d be checkmated in 10 ;). That’s why I said: if you allow it, make everyone aware of its usefulness.

You guys keep coming up with arguments I agree with; I wonder why I’m still not totally convinced :facepalm:
 

manisandher

Addicted to Fun and Learning
Joined
Nov 6, 2016
Messages
656
Likes
614
Location
Royal Leamington Spa, UK
Am I mistaking you for someone else, or are you the guy who pops up regularly in ASR threads reporting truly *outstanding* claims of hearing differences that, by all boring old normie audio-science thinking, should be well-nigh impossible?

Don't know. If you consider a 25% post rate compared to yours as 'regular', then maybe.

The simplest test of that is to blind the golden ear, level match his samples, and let that golden ear try to do it just as he did before. (An ABX can be an efficient way to do that, though it's not the only double-blind listening protocol.) Can he still do it when the very simplest controls are put in place?

I've shared the results of a blind listening test I was involved in (along with another ASR member) before, but I think they're pertinent to this discussion.

First, some context:

1a. I was certain that I could hear differences between two different buffer settings in a software player. (The buffer settings were proven not to change the bits.)
1b. If a mechanism existed that caused bit-identical replay to sound different, I wanted to get to the bottom of it.
2. I was confident that I could pass a blind test to demonstrate that I could hear a difference between the bit-identical settings.
3. I asked someone (now a fellow ASR member) to help me conduct the blind test.
4. We agreed that the results of the test would be published online in their entirety, irrespective of how they fell.

Here are the results:
Listening Test - Cumulative results.jpg

Now, we've already had all sorts of discussions about the test setup, protocol, etc., so no need to rehash them. What I think might be of interest is a discussion of how my psychology possibly affected the results of the tests (three of them, in fact).

Test 1 (non-ABX) - 4/10

I was pretty confident that I'd pass a 'simple' ABXXXXXXXXXX test. But I'd never been involved in any blind tests before, and hadn't had any time beforehand to practice, or even to consider how to approach the test. B really did sound different to A. But then the first X threw me, and I never recovered. I believe I still heard differences between the Xs, but without a prior AB reference I couldn't assign them correctly to A or B.

Test 2 (non-ABX) - 6/10

Continuing with ABXXXXXXXXXX. I'd now had some practice, and also some time to consider a more effective approach. I decided that what I needed to do was to simply compare X with X-1. Was there a difference or not? I thought this would work, but for whatever reason it didn't.

Test 3 (ABX) - 9/10

By this point, the whole exercise had been an abject failure as far as I was concerned. We would post the 'no better than guessing' results online. I relaxed and thought, "whatever will be will be". But in the first ABX sample, the X really did sound more like A. And in the second, more like A. Etc., etc. By sample 9, we were >15 minutes into the ABX, and I was pretty shattered from all the concentration. I got #9 wrong.

**********

FWIW. (Probably not a lot in the eyes of most regulars here.)

Mani.
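For reference, the scores above can be checked against the guessing baseline: the chance of k or more correct out of 10 by coin-flipping follows the binomial tail. A small Python check (not part of the original post):

```python
from math import comb

def p_at_least(k, n=10, p=0.5):
    """One-sided binomial tail: probability of k or more correct out of n by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for score in (4, 6, 9):
    print(f"{score}/10 by chance: p = {p_at_least(score):.3f}")
# 4/10 -> 0.828, 6/10 -> 0.377, 9/10 -> 0.011
```

Only the 9/10 run sits below the conventional 0.05 bar; 4/10 and 6/10 are indistinguishable from guessing.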
 

bboris77

Senior Member
Joined
Oct 23, 2018
Messages
460
Likes
957
I would like to go back to the original set of questions about the importance of measurements in audio posed at the beginning of this thread.

Although in my 20s I firmly believed in the concept of ones and zeros in digital audio and video, I later became seduced by the allure of the magical tube-sound narrative and other subjectivist myths. Eventually I made my way back to the objectivist side, after I realized that I could not reliably discern between a $99 delta-sigma DAC and a $699 R2R one when level matched. I also experienced many noise-related issues with various tube amps and started resenting their unpredictability.

IMO, measurements are extremely useful for identifying faults in amps that will clearly be audible in normal use. Examples include audible distortion with low-impedance transducers, a high noise floor with high-sensitivity transducers, lack of RFI rejection, and audible power-supply hum and ground-loop potential in USB DACs and amps that offer only unbalanced RCA connections and use 3-pronged power plugs. I've experienced all of these issues in various DACs and amps, and they could have been caught and addressed at the design stage had proper measurements been made.

What measurements are not useful for, once we get past the threshold of audibility, is measuring things for the sake of the SINAD race, which is what has been happening increasingly over the last few years. It does not matter that this was presented as a pursuit of excellence in audio engineering; it still effectively became a dick-measuring contest. The sad part is that, in certain cases, this SINAD myopia came at the expense of focusing on things like reliability.

I’m even more sceptical about the usefulness of blind testing as a measurement tool. Since our hearing and audio memory are both flawed, subjective, and easily affected by our biases, comparing how well people do in double-blind tests can only lead to the development of a “perfect pitch” type of elitism, where one person’s opinion is worth more than another’s simply because their score is higher.

The good news is that we can all enjoy audio excellence now at very affordable prices. As someone who grew up with audio cassettes, I can truly appreciate that. There’s no reason to use the SINAD fallacy to try to con(vince) newbie audiophiles that they need to spend thousands of dollars on their DACs and amps to experience superior audio. Otherwise, measurement-obsessed objectivists are effectively doing the same thing as the various snake-oil subjectivist salesmen: shilling expensive gear that makes no actual difference to the listening experience.
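For readers new to the metric: SINAD is the power of the fundamental relative to everything else (noise plus distortion), expressed in dB. A rough FFT-based sketch; the function name, test signal, and bin width are illustrative assumptions, not an established API:

```python
import numpy as np

def sinad_db(x, fs, f0, bw=2):
    """Crude SINAD estimate: fundamental power vs. everything else (noise + distortion),
    from a Hann-windowed FFT; `bw` bins on each side of f0 count as the fundamental."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x * np.hanning(n))) ** 2
    k = round(f0 * n / fs)                # FFT bin of the fundamental
    fund = spec[k - bw:k + bw + 1].sum()  # energy attributed to the fundamental
    rest = spec.sum() - fund - spec[0]    # everything else, excluding DC
    return 10 * np.log10(fund / rest)

fs = 48_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 1e-3 * np.sin(2 * np.pi * 3000 * t)  # 1 kHz tone + harmonic 60 dB down
print(f"SINAD ~ {sinad_db(x, fs, 1000):.1f} dB")
```

With a harmonic 60 dB below the fundamental, the estimate lands near 60 dB. The audibility argument above is about whether that residual is already inaudible, not about whether it can be pushed lower.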
 

danadam

Major Contributor
Joined
Jan 20, 2017
Messages
1,016
Likes
1,583
I am not sure what you are saying. When you listen to two files, you routinely "hear" differences. These differences need to be put to test to see if they are valid -- the very purpose of automated ABX test. When differences shrink, you get hit with more false positives. You need to get through these as quickly as you can until or if you land on a real difference. You can't determine that section a priori. You need the feedback from the test fixture as there is no other way to know.
Isn't that what "Training mode" is for? It runs unlimited trials and updates results after each choice.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
Isn't that what "Training mode" is for? It runs unlimited trials and updates results after each choice.
@amirm likes to have his training wheels on all the time it seems. Makes him bike faster o_O

… just kidding guys, I’ll stop now :)

Edit: on a serious note: if you have young kids, don’t buy training wheels. They only slow down the learning curve of cycling! Just buy a good-size bike, and usually they’ll be cycling within the hour, without any training wheels.
 
Last edited:

birdog1960

Senior Member
Joined
Oct 18, 2022
Messages
309
Likes
329
Location
Virginia
I would like to go back to the original set of questions about the importance of measurements in audio posed at the beginning of this thread. [...] Otherwise, measurement-obsessed objectivists are effectively doing the same thing as various snake oil subjectivist salesmen - shelling expensive gear that makes no actual difference to the listening experience.
This argues for a measurement that could be called the "value quotient": nearness to optimal available performance, divided by cost. I'm sure a great many would be interested in that; I know I would. If optimal available performance within a subset of audio equipment can't be defined, then the science either needs to mature or is irrelevant.
 
Last edited:

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,601
Likes
25,518
Location
Alfred, NY
I’m even more sceptical on the usefulness of blind testing as a measurement tool.
It's not a measurement tool, it's a basic and absolutely necessary control to have any kind of experimental validity for determining audible factors.
 