
Master Thread: Are measurements Everything or Nothing?

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
Knowing what to look for is a feature, not a bug. You're still just using your ears, you just know what to listen for.
Nobody denies that. That doesn’t mean the test should accommodate this.

Just preferring it because it’s convenient isn’t really an argument, is it? It does give you different answers after all. You’ll need to account for this in analysis as well. These two results can’t just be compared or combined.

How would this go for a medical test: you get your drug 10 times double blind, and after every trial you were told whether you received the medicine or the placebo. Would this not start messing up the results after a few rounds?
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,159
Likes
36,898
Location
The Neitherlands
This thread is on fire ...

fire triangle.png
 
Last edited:
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,833
Likes
243,185
Location
Seattle Area
Is it? Doesn’t knowing the results bias you? Because next time you know what to look for. You have been influenced by the test itself.
What next time? You select a segment and see if you have reliably heard a difference. If you have not, you move on to another segment. Getting the difference at the end of some 12 trials only slows this down compared to 5 or 6. There is no other difference.
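The segment-hunting procedure described here has a quantifiable side effect worth keeping in mind. A minimal Python sketch, assuming 10-trial runs, a 9/10 pass bar, and 20 candidate segments (illustrative numbers, not from the thread):

```python
from math import comb

def p_pass(n_trials=10, k_min=9, p=0.5):
    """Chance a purely guessing listener scores >= k_min correct in n_trials."""
    return sum(comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
               for k in range(k_min, n_trials + 1))

single = p_pass()                # one 10-trial run: ~0.011
family = 1 - (1 - single)**20   # best of 20 segments tried: ~0.19
print(f"per segment: {single:.3f}, across 20 segments: {family:.3f}")
```

Hunting across many segments raises the chance that pure guessing eventually "passes" somewhere from about 1% to about 19%, so the number of segments auditioned has to be reported alongside the final score.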
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,601
Likes
25,518
Location
Alfred, NY
Nobody denies that. That doesn’t mean the test should accommodate this.

Just preferring it because it’s convenient isn’t really an argument, is it? It does give you different answers after all. You’ll need to account for this in analysis as well. These two results can’t just be compared or combined.
If it helps test sensitivity, that's a GOOD thing.

Would you feel better if there were a "practice" set of runs where the score-to-date is given, then the counting of the "real" run begins after that if the subject feels like they have a better grasp of what to listen for? Go for it- it gets you the same sensitivity, but with a larger number of trials. The important thing is that this is still ears-only.
 
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,833
Likes
243,185
Location
Seattle Area
Just preferring it because it’s convenient isn’t really an argument, is it?
Of course it is. If a difference is there, we want it found. Slowing down the process to the point where the listener gets frustrated and gives up is wrong.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
If it helps test sensitivity, that's a GOOD thing.
If that is the only thing it does.
Would you feel better if there were a "practice" set of runs where the score-to-date is given, then the counting of the "real" run begins after that if the subject feels like they have a better grasp of what to listen for? Go for it- it gets you the same sensitivity, but with a larger number of trials.
I said as much. You usually learn for a test; the same thing applies here. As a training mechanism, it's obviously excellent.
 

GXAlan

Major Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
3,962
Likes
6,119
How would this go for a medical test: you get your drug 10 times double blind, and after every trial you were told whether you received the medicine or the placebo. Would this not start messing up the results after a few rounds?
It doesn’t because each test is fresh.

If your medical test was an opinion on happiness, and there was an assumption that more happy pills = more happiness, then yes, knowing what you got each week would skew results.

For these audio differences, knowing the result each round could affect one's preference. However, there is enough agreement (I believe) that proving a difference exists is all you need when comparing audio. If you can prove a difference exists, you can still allow two people to PREFER different products. So it skews preference, but not detectability.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
Of course it is. If a difference is there we want it found. To slow down the process such that listener gets frustrated and gives up is wrong.
I’m with you guys on all of that.

My concern is more with the interpretation of results. Not on an individual level, but on the collective statistics. My guess would be that the distribution of scores will be different with and without intermediate results, because people who know how to play the system will stick out more.

So the question then is: how much more valid is that metric if only a subset of people know how to use this "cheat"? Yes, you say: you improved detectability and it shows. But those are the people who know how to exploit it. Most people don't.

So what is needed, then, is to teach everyone doing the test why the intermediate results are there, and how to utilize them.

If your medical test was an opinion on happiness, and there was an assumption that more happy pills = more happiness, then yes, knowing what you got each week would skew results.
It doesn’t matter what the pills are for. If you can find out after a few rounds which is which, the trial will fail. So thanks for tanking your own argument ;)
 
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,833
Likes
243,185
Location
Seattle Area
I said as much. You usually learn for a test.. same thing applies here. As a training mechanism, it’s obviously excellent.
I am not sure what you are saying. When you listen to two files, you routinely "hear" differences. These differences need to be put to the test to see if they are valid -- the very purpose of an automated ABX test. When differences shrink, you get hit with more false positives. You need to get through these as quickly as you can until, or if, you land on a real difference. You can't determine that section a priori. You need the feedback from the test fixture, as there is no other way to know.

Keep in mind that it is rarely the case that proper material is selected to show the difference. As such, the difference may only manifest itself in the smallest segment of music, as I have explained before. Finding that needle in a haystack may be quite difficult without the tool providing feedback as you hunt for it.
 
OP
amirm

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,833
Likes
243,185
Location
Seattle Area
So question then is: how more valid is that metric if a subset of people know how to use this “cheat”? Yes you say: you improved detectability and it shows. But those are people that know how to expect exploit it. Most people don’t know though.
You don't get nearly the kind of help you are imagining. The tests I showed often took a considerable amount of time to pass. Without the tool providing feedback, I would have just given up quickly and not even tried to reach a conclusion. So you would have been left with no data.

Also remember that you want to extrapolate my passing to the general population, some of whom may be more capable than me. In that regard, you want to give me all the help I need to find differences.
 

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,739
Likes
6,079
Location
US East
It doesn’t matter what the pills are for. If you can find out after a few rounds which is which, the trial will fail. So thanks for tanking your own argument ;)
No. In an ABX test you still don't know whether the next X is A or B. If the test lets you learn how to get better at discriminating between them, it makes the test more sensitive.
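That sensitivity gain can be put in rough numbers. A short Python sketch, assuming (purely for illustration) that feedback lifts a listener's per-trial accuracy from 60% to 75% on a 10-trial test with a 9/10 pass bar:

```python
from math import comb

def p_pass(p_correct, n=10, k_min=9):
    """Chance of scoring >= k_min out of n trials at per-trial accuracy p_correct."""
    return sum(comb(n, k) * p_correct**k * (1 - p_correct)**(n - k)
               for k in range(k_min, n + 1))

print(f"untrained (60% per trial): {p_pass(0.60):.2f}")  # ~0.05
print(f"trained   (75% per trial): {p_pass(0.75):.2f}")  # ~0.24
```

Under these assumed numbers, a modest improvement in per-trial discrimination roughly quintuples the chance of a real (non-guessing) listener actually passing the test.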
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
I would have just given up quickly and not even tried to get to the conclusion. So you would have been left with no data.
No, you’d have different data: no audible differences when you don’t know where to look. That is still data, and it can still answer a valid research question, just a different one.
Also remember that you want to extrapolate me passing to general population, some of whom may be more capable than me. In that regard, you want to give me all the help I need to find differences.
As long as you know how to. It’s like comparing a chess grandmaster to me: I’d be checkmated in 10 ;). That’s why I said: if you allow it, make everyone aware of its usefulness.

You guys keep coming up with arguments I agree with; I wonder why I’m still not totally convinced :facepalm:
 

manisandher

Addicted to Fun and Learning
Joined
Nov 6, 2016
Messages
656
Likes
614
Location
Royal Leamington Spa, UK
Am I mistaking you for someone else, or are you the guy who pops up regularly in ASR threads reporting truly *outstanding* claims of hearing differences that, by all boring old normie audio-science thinking, should be well-nigh impossible?

Don't know. If you consider a 25% post rate compared to yours as 'regular', then maybe.

The simplest test of that is to blind the golden ear, level match his samples, and let that golden ear try to do it just as he did before. (An ABX can be an efficient way to do that, though it's not the only double-blind listening protocol.) Can he still do it when the very simplest controls are put in place?

I've shared the results of a blind listening test I was involved in (along with another ASR member) before, but I think they're pertinent to this discussion.

First, some context:

1a. I was certain that I could hear differences between two different buffer settings in a software player. (The buffer settings were proven not to change the bits.)
1b. If a mechanism existed that caused bit-identical replay to sound different, I wanted to get to the bottom of it.
2. I was confident that I could pass a blind test to demonstrate that I could hear a difference between the bit-identical settings.
3. I asked someone (now a fellow ASR member) to help me conduct the blind test.
4. We agreed that the results of the test would be published online in their entirety, irrespective of how they fell.

Here are the results:
Listening Test - Cumulative results.jpg

Now, we've already had all sorts of discussions about the test setup, protocol, etc., so no need to rehash them. What I think might be of interest is a discussion of how my psychology possibly affected the results of the tests (three of them, in fact).

Test 1 (non-ABX) - 4/10

I was pretty confident that I'd pass a 'simple' ABXXXXXXXXXX test. But I'd never been involved in any blind tests before, and hadn't had any time beforehand to practice, or even to consider how to approach the test. B really did sound different to A. But then the first X threw me, and I never recovered. I believe I still heard differences between the Xs, but without a prior AB reference I couldn't assign them correctly to A or B.

Test 2 (non-ABX) - 6/10

Continuing with ABXXXXXXXXXX. I'd now had some practice, and also some time to consider a more effective approach. I decided that what I needed to do was to simply compare X with X-1. Was there a difference or not? I thought this would work, but for whatever reason it didn't.

Test 3 (ABX) - 9/10

By this point, the whole exercise had been an abject failure as far as I was concerned. We would post the 'no better than guessing' results online. I relaxed and thought, "whatever will be will be". But in the first ABX sample, the X really did sound more like A. And in the second, more like A. Etc., etc. By sample 9, we were >15 minutes into the ABX, and I was pretty shattered from all the concentration. I got #9 wrong.

**********

FWIW. (Probably not a lot in the eyes of most regulars here.)

Mani.
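For reference, the scores above can be checked against the guessing baseline: the chance of k or more correct out of 10 by coin-flipping follows the binomial tail. A small Python check (not part of the original post):

```python
from math import comb

def p_at_least(k, n=10, p=0.5):
    """One-sided binomial tail: probability of k or more correct out of n by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for score in (4, 6, 9):
    print(f"{score}/10 by chance: p = {p_at_least(score):.3f}")
# 4/10 -> 0.828, 6/10 -> 0.377, 9/10 -> 0.011
```

Only the 9/10 run sits below the conventional 0.05 bar; 4/10 and 6/10 are indistinguishable from guessing.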
 

bboris77

Senior Member
Joined
Oct 23, 2018
Messages
460
Likes
957
I would like to go back to the original set of questions about the importance of measurements in audio posed at the beginning of this thread.

Although in my 20s I firmly believed in the concept of ones and zeros in digital audio and video, I later became seduced by the allure of the magical tube-sound narrative and other subjectivist myths. Eventually I made my way back to the objectivist side, after I realized that I could not reliably discern between a $99 delta-sigma DAC and a $699 R2R one when level matched. I also experienced many noise-related issues with various tube amps and started resenting their unpredictability.

IMO, measurements are extremely useful for identifying faults in amps that will clearly be audible in normal use. Examples include audible distortion with low-impedance transducers, a high noise floor with high-sensitivity transducers, lack of RFI rejection, and audible power-supply hum and ground-loop potential in USB DACs and amps that offer only unbalanced RCA connections and use 3-pronged power plugs. I've experienced all of these issues in various DACs and amps, and they could have been caught and addressed at the design stage had proper measurements been made.

What measurements are not useful for, once we get past the threshold of audibility, is measuring things for the sake of the SINAD race, which is what has been happening increasingly over the last few years. It does not matter that this was presented as a pursuit of excellence in audio engineering; it still effectively became a dick-measuring contest. The sad part is that, in certain cases, this SINAD myopia came at the expense of focusing on things like reliability.

I’m even more sceptical about the usefulness of blind testing as a measurement tool. Since our hearing and audio memory are both flawed, subjective, and easily affected by our biases, comparing how well people do in double-blind tests can only lead to the development of a “perfect pitch” type of elitism, where one person’s opinion is worth more than another’s simply because their score is higher.

The good news is that we can all enjoy audio excellence now at very affordable prices. As someone who grew up with audio cassettes, I can truly appreciate that. There’s no reason to use the SINAD fallacy to try to con(vince) newbie audiophiles that they need to spend thousands of dollars on their DACs and amps to experience superior audio. Otherwise, measurement-obsessed objectivists are effectively doing the same thing as the various snake-oil subjectivist salesmen: shilling expensive gear that makes no actual difference to the listening experience.
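For readers new to the metric: SINAD is the power of the fundamental relative to everything else (noise plus distortion), expressed in dB. A rough FFT-based sketch; the function name, test signal, and bin width are illustrative assumptions, not an established API:

```python
import numpy as np

def sinad_db(x, fs, f0, bw=2):
    """Crude SINAD estimate: fundamental power vs. everything else (noise + distortion),
    from a Hann-windowed FFT; `bw` bins on each side of f0 count as the fundamental."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x * np.hanning(n))) ** 2
    k = round(f0 * n / fs)                # FFT bin of the fundamental
    fund = spec[k - bw:k + bw + 1].sum()  # energy attributed to the fundamental
    rest = spec.sum() - fund - spec[0]    # everything else, excluding DC
    return 10 * np.log10(fund / rest)

fs = 48_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 1e-3 * np.sin(2 * np.pi * 3000 * t)  # 1 kHz tone + harmonic 60 dB down
print(f"SINAD ~ {sinad_db(x, fs, 1000):.1f} dB")
```

With a harmonic 60 dB below the fundamental, the estimate lands near 60 dB. The audibility argument above is about whether that residual is already inaudible, not about whether it can be pushed lower.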
 

danadam

Major Contributor
Joined
Jan 20, 2017
Messages
1,016
Likes
1,583
I am not sure what you are saying. When you listen to two files, you routinely "hear" differences. These differences need to be put to test to see if they are valid -- the very purpose of automated ABX test. When differences shrink, you get hit with more false positives. You need to get through these as quickly as you can until or if you land on a real difference. You can't determine that section a priori. You need the feedback from the test fixture as there is no other way to know.
Isn't that what "Training mode" is for? It runs unlimited trials and updates results after each choice.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,519
Likes
18,579
Location
Netherlands
Isn't that what "Training mode" is for? It runs unlimited trials and updates results after each choice.
@amirm likes to have his training wheels on all the time it seems. Makes him bike faster o_O

… just kidding guys, I’ll stop now :)

Edit: on a serious note: if you have young kids, don’t buy training wheels. They only slow down the learning curve of cycling! Just buy a good-size bike, and usually they’ll be cycling within the hour, without any training wheels.
 
Last edited:

birdog1960

Senior Member
Joined
Oct 18, 2022
Messages
309
Likes
329
Location
Virginia
I would like to go back to the original set of questions about the importance of measurements in audio posed at the beginning of this thread. [...] Otherwise, measurement-obsessed objectivists are effectively doing the same thing as various snake oil subjectivist salesmen - shelling expensive gear that makes no actual difference to the listening experience.
This argues for a measurement that could be called the "value quotient": nearness to optimal available performance, divided by cost. I'm sure a great many would be interested in that; I know I would. If optimal available performance within a subset of audio equipment can't be defined, then the science either needs to mature or is irrelevant.
 
Last edited:

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,601
Likes
25,518
Location
Alfred, NY
I’m even more sceptical on the usefulness of blind testing as a measurement tool.
It's not a measurement tool, it's a basic and absolutely necessary control to have any kind of experimental validity for determining audible factors.
 