# Judging ABX results subjectively: what do they mean to me?

#### Blumlein 88

A couple of recent threads discussed how, with a large number of trials, you can be confident blind test results are not chance even when the percentage correct is barely above 50%. (Read Amir's reference on the topic if you have not yet.)

So 15 of 20 correct is enough: there is less than a 5% probability of doing that well by chance. Instead of guessing 2 of 4, you are correctly picking 3 of 4, a 75% correct rate.

With 5100 of 10,000 correct you also have less than a 5% probability of doing that well by chance. You are correct in 51% of your choices rather than the 50% you get by guessing. This strikes some people as odd, or at least non-intuitive. (Also, let's not get picky about precise numbers; mine are more or less in the right ballpark.)
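
The two figures above can be checked with an exact one-sided binomial tail. This is a minimal sketch, assuming every trial is an independent coin flip under pure guessing; the helper name `binomial_tail` is made up for illustration and is not taken from Amir's article.

```python
from math import comb


def binomial_tail(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by pure guessing."""
    term = comb(trials, correct)  # ways to get exactly `correct` right
    total = 0
    for k in range(correct, trials + 1):
        total += term
        # Step from C(trials, k) to C(trials, k + 1) using exact integer math.
        term = term * (trials - k) // (k + 1)
    return total / 2**trials


for correct, trials in [(15, 20), (5100, 10_000)]:
    p = binomial_tail(correct, trials)
    print(f"{correct}/{trials}: {correct / trials:.0%} correct, p = {p:.4f}")
```

Both cases come out at roughly 2%, so both clear the 5% bar even though the hit rates are 75% and 51%.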

In blind testing one should avoid getting too caught up in p-values (the probability of scoring at least this well by guessing alone). You also need to keep in mind the size of the effect. Is it really large enough to matter when listening to music?

So I would like to hear thoughts on this from others. I will also break this post up into a few more to give my thinking on how this matters.

#### Blumlein 88

Personally, having done some blind testing of things with Foobar and a few without Foobar, I have become comfortable doing blind testing up to 20 trials or so. I typically do 20 trials twice on something I am interested in. While comfortable doing it, it is drudgery.

I have found that comparing short segments of no more than 5 seconds is far more discerning than anything longer, likely due to echoic memory limits in humans. There are things I can perceive at or very near 100% with 5 second segments, and not at all over 50% using 30 second segments. So does that mean that when listening to music such an effect can't matter to me at all? (Unless we have some master works lasting 5 seconds.)

It seems to me the short 5 second testing gets closest to the physical limits of my hearing mechanism. If something is not perceivable this way, it likely isn't perceivable by my hearing mechanism. If my hearing mechanism can't pick it up, obviously the difference can't get passed on to my brain. So I believe it becomes a non-factor.

However, the drop in discernment between 5 seconds and 15, 30 or more seconds is so large that I lean toward thinking that if you need to shorten the segment to 5 seconds to hear a difference, it is almost surely of little enough subjective consequence that you can ignore it. I am leaning toward thinking that if you can't hear it with 30 second segments, it can be ignored. An effect might be perceived in well-done ABX comparisons with short segments and still have absolutely no subjectively perceived musical significance.

The other side I am thinking about is testing with large numbers of trials. If a superbly done test, with unimpeachable blind protocols, excellent fidelity gear, and trained listeners in wonderful conditions, gets you 5100 of 10,000 correct responses, the effect may be real. However, is it meaningful to me just listening to music? I would say the effect must be so small that in practice you can ignore it completely.

I know some people insist that effects not even heard short term become important long term, and I can't rule out the possibility. It doesn't match my experience, though. Other than the recent meta-analysis suggesting 30 seconds or longer is needed to perceive hi-res benefits, I don't know of such things being the case. A fertile field for investigation, perhaps.

So my current thinking is if short segment blind testing doesn't detect it then the effect is not available to human hearing. Additionally if the effect isn't heard on 30 second segments it is probably meaningless for music. I am just picking a number mostly, but regardless of the large number of trials, if you can't tell the difference correctly 75% of the time it also is probably small enough you aren't missing much to forget about it.
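
One way to put a size on the effect, separate from the p-value, is the standard correction for guessing in a two-choice test. This is only a sketch under a simple assumption that is mine, not something from the thread: on each trial you either truly hear the difference or you flip a coin. The function name `heard_fraction` is hypothetical.

```python
def heard_fraction(percent_correct: float) -> float:
    """Estimate the share of trials where the difference was truly heard.

    Assumes the listener either hears the difference (always right) or
    guesses (right half the time), so correct = heard + (1 - heard) / 2.
    """
    return max(0.0, 2.0 * percent_correct - 1.0)


for score in (0.51, 0.75, 1.00):
    print(f"{score:.0%} correct -> heard on about {heard_fraction(score):.0%} of trials")
```

Under that assumption, 75% correct means the difference was audible on about half the trials, while 51% correct means about 1 trial in 50.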

#### amirm

> I have found that comparing short segments of no more than 5 seconds is far more discerning than anything longer, likely due to echoic memory limits in humans. There are things I can perceive at or very near 100% with 5 second segments, and not at all over 50% using 30 second segments. So does that mean that when listening to music such an effect can't matter to me at all? (Unless we have some master works lasting 5 seconds.)

Answering this part first, your acuity could improve over time to the point where you hear the differences over longer segments. On the other hand, it does indicate very small differences that in the larger context may not be material.

Answering your larger point, I go for 100% ability to detect differences in ABX tests (putting aside occasional wrong selection). I don't go by 95% confidence factor. I either can find the difference or I can't.

As I note in my article, the 95% confidence is arbitrary. Any number could be selected.

> So my current thinking is if short segment blind testing doesn't detect it then the effect is not available to human hearing.

If expert listeners are used, that would most likely be true.
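
On the point that the 95% cutoff is arbitrary, here is a small sketch that finds the fewest correct answers needed at whatever confidence level you pick. It assumes independent trials with pure 50/50 guessing as the null; `min_correct` is an illustrative name, not something from the article.

```python
from math import comb


def min_correct(trials: int, confidence: float) -> int:
    """Smallest score whose chance of occurring by guessing is <= 1 - confidence."""
    alpha = 1.0 - confidence
    tail = 0  # running count of outcomes with at least `score` correct
    needed = trials + 1  # sentinel: even a perfect score is not enough
    for score in range(trials, -1, -1):
        tail += comb(trials, score)
        if tail / 2**trials > alpha:
            break
        needed = score
    return needed


for confidence in (0.95, 0.997):
    print(f"20 trials at {confidence:.1%}: need {min_correct(20, confidence)} correct")
```

For 20 trials this gives 15 correct at 95% and 17 correct at 99.7%, so a much stricter cutoff costs only two more right answers.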

#### ceedee

> So my current thinking is if short segment blind testing doesn't detect it then the effect is not available to human hearing. Additionally if the effect isn't heard on 30 second segments it is probably meaningless for music. I am just picking a number mostly, but regardless of the large number of trials, if you can't tell the difference correctly 75% of the time it also is probably small enough you aren't missing much to forget about it.

I think you're probably right. Those who are opposed to controlled testing usually throw out the "it only reveals itself over time" argument, sometimes saying that it takes weeks or months! The fantastic thing about ABX testing is that the listener can choose whatever duration or intervals he wants.

During ABX tests I have also come to the same conclusion after finding that I performed much better on the tests when using short segments.

#### Blumlein 88

> As I note in my article, the 95% confidence is arbitrary. Any number could be selected.

Yes, and the 95% level seems mainly effective at winnowing out the really obviously wrong ideas. Many physical sciences quickly moved to requiring at least 3 sigma results (99.7%) because chasing even 1 in 20 happenstance positives created chaos over time. Going to the tighter cutoff seemed to eliminate it. That was also the experience of those using quality control in manufacturing. Those who used 95% levels found that, in complex chains of manufacture, it actually worsened their control over quality compared with doing nothing. When they went to 3 sigma, they gained so much more control that it took little more effort to reach even higher levels of quality. Some have criticized the social sciences and the biomedical field for not advancing to this stage.
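
The arithmetic behind that chaos is easy to sketch. Assuming the checks are independent and every one is a true null (both simplifying assumptions of mine, and the chain length of 20 is just an illustrative number), the chance of at least one false alarm across k checks is 1 - (1 - alpha)^k.

```python
def any_false_positive(alpha: float, checks: int) -> float:
    """Chance that at least one of `checks` independent null tests fires by luck."""
    return 1.0 - (1.0 - alpha) ** checks


CHECKS = 20  # hypothetical number of independent tests or process steps
for label, alpha in [("95% cutoff", 0.05), ("3 sigma cutoff", 0.0027)]:
    risk = any_false_positive(alpha, CHECKS)
    print(f"{label}: {risk:.0%} chance of at least one false positive in {CHECKS}")
```

With 20 checks, the 95% cutoff gives roughly a 64% chance of chasing at least one phantom, against about 5% at 3 sigma.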

#### Blumlein 88

> If expert listeners are used, that would most likely be true.

Yes, testing only myself, I can only be certain it applies to me. I am old enough that my high-frequency hearing cannot stand as a proxy for the whole population, or even most of it.

#### Phelonious Ponk

ABX is good for hearing very small differences, and very small differences generally yield very small results, resulting in the need for methodological rigor, and lots of trials. But the stuff audiophiles talk about? In my subjective opinion, simple, unscientific, statistically insignificant blind AB listening beats sighted listening every time. The choice is simple: Do I stare lovingly at my new DAC, amp, pre gleaming beautifully on a shelf and decide it sounds much better than my old one? Of course. When I can't see which one is playing, is it still a night and day difference? Am I now struggling to hear the difference? Often, yes. That's valuable knowledge to have, if you can face up to it.

Tim