
I don't think ABX is the best way to test blind

But a combination of measurements and ABX testing is far better than subjective sighted listening.
I think you missed the point. I am not arguing that non-blind listening tests would be better.
But if you think you know better, make a scientific case for it and present it to the industry.
Actually this is something we can test.

We can divide the participants into 2 groups where one group does ABX and the other does blind AB.

Blind AB testing is already used in studies. It's not something new I have invented.
 
For untrained listeners (when you just want to find out what the general public prefers, if anything) I do think blind AB testing is the way to go. I think it works better with human psychology, since people don't want to fail (you can, in a sense, fail an ABX test by being unable to hear a difference that someone else can).
The way I've preferred to do ABX blind tests is first to evaluate AX, BX, AA, BB, AB, and XX pairs, not knowing which is A, B or X, and to choose same/different without expressing a preference. If there's no statistical difference between A and B, then a preference is meaningless.

Then, only if there's a statistically valid difference between A and B can one start evaluating preferences. Rapid switching between alternatives is exquisitely sensitive to small differences, which is why levels must be matched very closely; I aim for 0.1 dB.
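Incidentally, the 0.1 dB match is easy to verify in software. A minimal sketch, assuming the two versions are time-aligned files readable by the soundfile library (the file names here are placeholders):

Code:
# Check that two time-aligned versions are level-matched within 0.1 dB.
# File names are hypothetical; any WAV reader would do.
import numpy as np
import soundfile as sf

a, rate_a = sf.read("version_a.wav")
b, rate_b = sf.read("version_b.wav")
assert rate_a == rate_b, "sample rates must match"

def rms_db(x):
    # Overall RMS level in dB (relative to full scale for float input)
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

diff = abs(rms_db(a) - rms_db(b))
print(f"Level difference: {diff:.3f} dB")
print("OK for blind testing" if diff <= 0.1 else "Re-match levels first")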
I do find it interesting how people have different preferences for how to test blind.

If it works, it works; how you get there is less important, to be honest.
 
Sure, it is one way to see if you can tell the difference, but you can actually do it a different way.

You simply listen to both versions (not knowing which is which) and then pick the one that sounds best. You can set it to repeat so the versions alternate; if you are unsure, you simply listen a bit more.

Then you simply repeat that until you reach statistical significance or give up.

This gives you the added information of which one actually sounds better in a blind test, which can be interesting to know (I once got statistical significance for 16-44 sounding better than 24-96).

Maybe you don't know which version sounds best, but you found some difference anyway; then you can simply test for that each time instead, to see if it persists.

Note that the probability of getting 10 out of 10 in favor of one alternative is 2/2^10 = 0.001953125 (10-0 or 0-10)
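The same arithmetic is easy to script for any score; a quick sketch using scipy's binomtest, which reproduces the 2/2^10 figure as a two-sided p-value:

Code:
# Chance of a 10-0 split in either direction under pure guessing.
from scipy.stats import binomtest

n = 10
print(2 / 2**n)                     # 0.001953125
print(binomtest(n, n, 0.5).pvalue)  # same figure, as a two-sided p-value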

You could, for example, discover that some song sounds better to you encoded as, say, 320 kbps MP3 and get statistical significance for that. Then it makes sense to listen to the MP3 version of that song instead (even if you keep the FLAC for archival purposes).
I want the most accurate sound possible. After that, particular speakers in a unique room will have their own effect. I then use PEQ to taste, which is set differently for two listening positions in two rooms with different systems. After that, additional PEQ or tone controls for a small percentage of problematic recordings.

All that seems far more important to me than whatever theoretical difference going above 16/44.1 might yield. I trust my ears that lossy formats just aren't good enough for well recorded acoustic music, especially classical. For me, the idea that lossy formats could be preferred for some recordings isn't worth the effort of AB or ABX testing.
 
AFAIK "repeat until you reach" is actually cheating. You should decide on the number of trials in the test beforehand and either do all of them or discard the test.
It's not "cheating" but you have to keep it in mind when it comes to the results.

I ended one test when I got 30 out of 43 in favor of 16-44 over 24-96 (I was thinking of ending it at 27 of 40 but decided to do 3 extra to verify I could actually distinguish them).

But yeah, having a fixed number of trials can be better, especially if you don't want to spend a lot of time on it.
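To see why a fixed trial count matters, here's a small simulation sketch (the trial counts and thresholds are arbitrary assumptions): a pure guesser who stops as soon as the running p-value dips under 0.05 "succeeds" far more often than the nominal 5%.

Code:
# Simulate optional stopping: a pure guesser stops as soon as the
# running two-sided binomial p-value drops below 0.05.
import random
from scipy.stats import binomtest

def guesser_with_optional_stopping(max_trials=50, alpha=0.05):
    hits = 0
    for n in range(1, max_trials + 1):
        hits += random.random() < 0.5                   # coin-flip "answers"
        if n >= 10 and binomtest(hits, n, 0.5).pvalue < alpha:
            return True                                 # stopped early, claimed success
    return False

random.seed(1)
runs = 2000
fp = sum(guesser_with_optional_stopping() for _ in range(runs))
print(f"False-positive rate: {fp / runs:.3f}")  # well above the nominal 0.05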
 
One of our expert members, when doing pro audio systems testing, complained that:
With trained listeners, ABX was too sensitive to small, overlooked, uncontrolled differences (that were not part of the feature being tested).
 
I do think blind AB testing is the way to go. I think it works better with human psychology
It depends on the purpose of the test...

It's fine once you've demonstrated that there is a real difference. A/B listening is fine for comparing speakers, where ABX is pretty useless because you can always identify A or B.

But the problem is... even if A & B aren't known, it's not "as blind". Once you are convinced that A sounds better, you may always think A sounds better; you can have a placebo effect even when an ABX test shows no difference.
 
ABX is used to see if there is a difference to be heard at all. If you find there is, you can then test for preference.

The problem with trying to do both together is that there might be an audible difference but no difference in preference. In that case you will be randomly guessing a preference, which will then appear as if there is no audible difference - or at least you won't know whether there is one.
Earlier this year I set up a carefully controlled ABX test between my then current DAC and an AV processor. I wanted to prove to myself that there was no audible difference between the two.

I was shocked to discover that there was a huge difference. With further investigation I discovered the source of the huge difference. One of the factory terminated XLR cables connected to the AV processor was wired in reverse polarity. The huge difference was that when listening to the AV processor the speakers were playing out of phase!

In this case, the preference between A and B was easily picked 100% of the time and arrived at the correct answer. During ABX testing with 20 samples, the out-of-phase processor was missed 4 times out of 20, meaning A and B were confused in 4 of the 20 trials. The delay between A, B, and X added a level of confusion that produced errors on individual trials, even though the overall result was still statistically correct.
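For what it's worth, even with those 4 misses, 16 of 20 is still very unlikely under pure guessing; a quick check:

Code:
# One-sided p-value for 16 correct out of 20 under a guessing null.
from scipy.stats import binomtest

print(binomtest(16, 20, 0.5, alternative="greater").pvalue)  # ~0.0059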
 
The delay between A, B, and X
Why was there a delay? Nothing in ABX protocol says there needs to be a delay - you should be able to switch freely and instantaneously between A B and X.
 
Why was there a delay? Nothing in ABX protocol says there needs to be a delay - you should be able to switch freely and instantaneously between A B and X.
There has to be a delay as you can't listen to them simultaneously! :)
 
There has to be a delay as you can't listen to them simultaneously! :)
OK - you've lost me. So when doing just A B testing, there is no delay? You can listen to A and B simultaneously??? :confused:. How were you comparing when you got the 100% preference result?

Or have I completely misunderstood the point you were making.
 
OK - you've lost me. So when doing just A B testing, there is no delay? You can listen to A and B simultaneously??? :confused:. How were you comparing when you got the 100% preference result?

Or have I completely misunderstood the point you were making.
In my test we would play A or B, then X. The order of A and B was randomly chosen, so X would sometimes follow A and other times follow B.

When we simply played A followed by B, with no knowledge of which device was A or B, the comparison was simple and immediate.
 
Just what about 26 kHz signals do you think you can judge in the audible range? Depends how you set it up on a computer.
I'm *guessing* just harmonics, and whether they add anything to the perceived sound, i.e. pitch or timbre of the fundamental?
 
I already covered that.

But you should ideally decide that in the first round (what you are actually going to listen for).

But regardless of method, you can fail to get any significant result simply from not knowing what to listen for.
In an ABX setup you are allowed to know the true identity of A and B. Only the X is unknown.
 
When listening for tiny differences I just want to concentrate on A vs B; I find the additional task of determining whether A or B is the same as X is enough of a distraction to make me lose my concentration.

For me, it works better if I put the tracks in parallel in my DAW, shut my eyes and just concentrate on the differences I may hear while seamlessly flipping between the tracks. This makes me relax more; I was even able to hear the small difference in the reverb tail of an upsampled track vs a non-upsampled track.

It's questionable whether small differences like in the above example would make it sound better for actual music listening, but I'm open to the possibility that the less our hearing has to fill in small details that may be missing, the more natural and less fatiguing the sound will be for longer listening sessions… maybe? :)
How do you know your AB results are more reliable than your ABX results?
 
In my test we would play A or B, then X. The order of A and B was randomly chosen, so X would sometimes follow A and other times follow B.

When we simply played A followed by B, with no knowledge of which device was A or B, the comparison was simple and immediate.
OK - but then that is just down to your specific implementation of an ABX test, not an inherent limitation of the test itself. In fact, by my understanding, that is not how ABX is supposed to be. You are supposed to be able to compare A to X and B to X as you like, instantly switching from A to X or from B to X. There shouldn't be a B in the middle when you are trying to compare A to X, and vice versa.
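For comparison, a software ABX trial is usually structured along these lines (a bare sketch; play() is a hypothetical stand-in for whatever actually routes the audio):

Code:
# Bare sketch of one software ABX trial. play() is a hypothetical stand-in
# for whatever switches the audio; the listener may re-audition freely.
import random

def abx_trial(play):
    x_is_a = random.random() < 0.5              # hidden assignment of X
    while True:
        cmd = input("Play [A/B/X] or answer [A=X/B=X]: ").strip().upper()
        if cmd in ("A", "B"):
            play(cmd)                           # known references
        elif cmd == "X":
            play("A" if x_is_a else "B")        # same source, labeled only "X"
        elif cmd in ("A=X", "B=X"):
            return (cmd == "A=X") == x_is_a     # True if answered correctly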
 
Knowing in advance that there is a difference influences the test, as the expectation that one exists encourages guessing in AB testing. This can only be overcome if, in some trials, all samples are A or all are B, so the expectation bias is removed. The age of the participant is also a big issue.
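A trial list with such catch pairs is simple to generate; a sketch (the proportions are arbitrary assumptions):

Code:
# Build an AB trial list that includes same-same catch pairs (AA/BB)
# so that "there must be a difference" expectation bias can be detected.
import random

def make_trials(n=20, catch_fraction=0.25):
    n_catch = int(n * catch_fraction)
    trials = [random.choice([("A", "A"), ("B", "B")]) for _ in range(n_catch)]
    trials += [random.choice([("A", "B"), ("B", "A")]) for _ in range(n - n_catch)]
    random.shuffle(trials)
    return trials

print(make_trials())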
 
I'm *guessing* just harmonics, and whether they add anything to the perceived sound, i.e. pitch or timbre of the fundamental?
They don't. Human ears don't have any structures in them that can detect that sort of frequency. They therefore are unable to send any signals to the brain to tell it that frequency is present or not.
 
AFAIK "repeat until you reach" is actually cheating. You should decide on the number of trials in the test beforehand and either do all of them or discard the test.
No, you don't discard the test. You record it as a failure to detect a difference.

Repeating the test until you get the result you want is also cheating.
 
OK - but then that is just down to your specific implementation of an ABX test, not an inherent limitation of the test itself. In fact, by my understanding, that is not how ABX is supposed to be. You are supposed to be able to compare A to X and B to X as you like, instantly switching from A to X or from B to X. There shouldn't be a B in the middle when you are trying to compare A to X, and vice versa.
In my test, since I wanted to test a real-world example and not audio clips switched by a computer, there was a physical delay as a human turned a selector for the subject doing the listening. The listener was allowed to request a replay of A or B and then X, but switching was not instantaneous.

Short of a simulated comparison (reviewing captured audio clips), which I didn't want as it adds other uncertainties, or being in a university perceptual-studies lab with a massive budget and advanced switching equipment, I would submit that the ABX test I conducted was about as good as can be done by amateurs at home.

My point was that even something as dramatically wrong as an out-of-phase signal was difficult to detect 100% of the time when brief switching delays in the 1 to 2 second range added uncertainty.
 