
I don't think ABX is the best way to test blind

vintologi · Active Member · Joined May 7, 2023 · Messages: 105 · Likes: 18
Sure, it is one way to see if you can tell the difference, but you can actually do it a different way.

You simply listen to both versions (not knowing which is which) and then pick the one that sounds best. You can set it to repeat so the versions alternate; if you are unsure, you simply listen a bit more.

Then you repeat that until you reach statistical significance or give up.

This gives you the added information of knowing what actually sounds better in a blind test, which can be interesting to know (I once got statistical significance for 16/44.1 sounding better than 24/96).

Maybe you don't know which version sounded best, but you noticed some difference anyway; then you can simply test for that each time instead, to see if it persists.

Note that the probability of getting 10 out of 10 in favor of one alternative by chance is 2/2^10 ≈ 0.00195 (covering both 10-0 and 0-10).

You could, for example, discover that some song sounds better (to you) encoded as, say, 320 kbps MP3, and get statistical significance for that. Then it makes sense to listen to the MP3 version of that song instead (even if you keep the FLAC for archival purposes).
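The arithmetic in the post above is a two-sided sign test. A minimal sketch (Python standard library only, not from any real ABX tool) of how such a result could be checked:

```python
from math import comb

def sign_test_p(successes: int, trials: int) -> float:
    """Two-sided p-value under the fair-coin null: the probability of a
    split at least as lopsided as the observed one, in either direction."""
    k = max(successes, trials - successes)
    one_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * one_tail)

print(sign_test_p(10, 10))            # 0.001953125, i.e. 2/2**10
print(round(sign_test_p(9, 10), 5))   # 0.02148 -- 9 of 10 still clears p < 0.05
```

Note that 8 of 10 already fails a two-sided 0.05 threshold, which is why these tests usually take more trials than people expect.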
 
ABX is used to see if there is a difference to be heard at all. If you find there is, you can then test for preference.

The problem with trying to do both together is that there might be an audible difference but no preference difference. In that case you will be randomly guessing a preference which will then appear as if there is no audible difference - or at least you don't know if there is an audible difference.
 
The problem with trying to do both together is that there might be an audible difference but no preference difference. In that case you will be randomly guessing a preference which will then appear as if there is no audible difference - or at least you don't know if there is an audible difference.
I already covered that.
Maybe you don't know which version sounded the best but you found some difference anyway, then you can simply test for that each time instead to see if it persists.
But ideally you should decide that in the first round (what you are actually going to listen for).

But regardless of method, you can fail to get any significant result simply because you don't know what to listen for.
 
Good observation.

I read the phrase "what sounds best" in your post, and it prompted some thoughts.
I'd like to ask everyone some general questions on this topic regarding comparisons and testing.
This is pure curiosity on my part.

-How do you establish what sounds best?

-What are you looking for in your comparisons?

-Is your search my-fi (i.e. subjective taste) or hi-fi (i.e. objective system characteristics, and therefore valid on average for any listening)?

-What are the basic parameters you are looking for and why do you think those parameters are important to know in order to obtain a correct Hi-fi reproduction?

-Do you think that what you generally look for with comparisons is objectively essential to then set the system in a way that can give you the most correct hi-fi experience?

-Last question: are you sure you know what to investigate?

Thanks to anyone who wants to answer.
 
Sure it is one way to see if you can tell the difference but you can actually do it a different way.
There are quite a few alternate formats, depending on what specifically you're trying to test. They all have pluses and minuses, but the absolute sine qua non is matched levels and double-blind presentation. Coincidentally, in another thread, I just linked an old article of mine that reviews a few of the other formats.
 
Sure it is one way to see if you can tell the difference but you can actually do it a different way.
:)
 
Is "double blind" even a thing when it's you yourself doing the test on a computer?

Last question: are you sure you know what to investigate?
The main thing I am interested in is very high frequencies (above 26 kHz, say). I would like to figure out what difference those frequencies can make to the listening experience (if any).

I don't actually find lossy-format tests that interesting; those formats don't really make sense for music anymore, given how cheap bandwidth and storage have become.
 
Is "double blind" even a thing when it's you yourself doing the test on a computer?


The main thing I am interested in is very high frequencies (above 26 kHz, say). I would like to figure out what difference those frequencies can make to the listening experience (if any).
Just what about 26 kHz signals do you think you can judge in the audible range? As for double blind: it depends how you set it up on the computer.
 
How do you establish what sounds best?
By ear, you can determine preference only.
Measurements of what's most accurate to the source is the only way to determine which IS best. ;)
 
Is "double blind" even a thing when it's you yourself doing the test on a computer?
If the computer is in charge of switching, yes, the same purpose is served.
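As a sketch of what "the computer in charge of switching" can look like: the script below (hypothetical, not taken from any real ABX tool) draws a hidden random assignment for each trial and only reveals it after all answers are recorded, so the listener stays blind to each presentation.

```python
import random

def run_blind_trials(get_answer, labels=("A", "B"), trials=10, seed=None):
    """Keep the per-trial assignment hidden from the listener until the end.

    get_answer(trial_number) must return the listener's guess ("A" or "B");
    in a real tool it would play the hidden stimulus before asking.
    """
    rng = random.Random(seed)
    hidden = [rng.choice(labels) for _ in range(trials)]   # never shown mid-test
    guesses = [get_answer(i) for i in range(trials)]
    correct = sum(g == h for g, h in zip(guesses, hidden))
    return hidden, guesses, correct

# Stand-in listener that always answers "A"; a real listener replaces this:
hidden, guesses, correct = run_blind_trials(lambda i: "A", trials=10, seed=42)
print(correct)  # matches by chance; about 5 of 10 expected for pure guessing
```

The key point is that the random assignment and the scorekeeping are both out of the listener's hands, which is what the "second blind" is meant to guarantee.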
 
I'm getting to the point where I will simply force myself to enjoy whatever measures best.
If I see a graph, data, and a scientific explanation of the results that's convincing enough for me.
You only need to look at the divorce rate to know ignoring data is a bad idea.
 
I'm getting to the point where I will simply force myself to enjoy whatever measures best.

I won't force myself to enjoy anything. But if something measures well but sounds bad, then I know it's the song or recording that's offensive, not the speakers themselves.
 
When listening for tiny differences, I just want to concentrate on A vs B. The additional task of determining whether A or B is the same as X is enough of a distraction to make me lose my concentration.

For me, it works better if I put the tracks in parallel in my DAW, shut my eyes, and just concentrate on the differences I may hear when seamlessly flipping between them. This relaxes me more; I was even able to hear the small difference in the reverb tail of an upsampled track vs a non-upsampled one.

It's questionable whether small differences like the one above would make it sound better for actual music listening, but I'm open to the possibility that the less our hearing has to fill in the small differences that may be missing, the more natural and less fatiguing the sound will be over longer listening sessions… maybe? :)
 
But regardless of method, you can fail to get any significant result simply because you don't know what to listen for.
My view is that if it is so difficult to hear that you need to know what to listen for, then the difference is probably not worth worrying about.
 
Sure it is one way to see if you can tell the difference but you can actually do it a different way.
Dylan Hester, a Cirrus Logic engineer, once argued on the AVS forum for preferring a simpler AB blind test over an ABX test.
I don't know if the caveat about forbidding access to the volume control during an ABX test was, or still is, relevant, or whether solutions exist to mitigate that concern.
 
My view is that if it is so difficult to hear that you need to know what to listen for, then the difference is probably not worth worrying about.
What can be worse is that in many instances, once you learn to listen for a defect and hear it, you can't unlearn it and ignore it!
 
I don't now if the caveat of forbidden access to volume control in an ABX test was or still is relevant, or if solutions exist to mitigate that concern.
That is absolutely NOT a requirement for ABX (or any other format) as long as the volume control causes equal changes for both DUTs.
 
ABX testing may not be perfect, but it's the best way we know of now, and it has proven its value. Of course it does not tell you everything, but a combination of measurements and ABX testing is far better than sighted subjective listening. And as SIY says, matched volumes for both devices (whatever they are) are essential, as louder is often perceived as better.

But if you think you know better, make a scientific case for it and present it to the industry. A non-scientific opinion on a forum won't make a difference; you will just be classified as another subjectivist. You need to come with hard proof before the industry (and the people here) will accept it; that is how science works.
 
The way I've preferred to do ABX blind tests is first to evaluate AX, BX, AA, BB, AB, XX pairs, not knowing which is A, B, or X, and to choose same/different without expressing a preference. If there is no statistical difference between A and B, then a preference is meaningless.

Then, only if there is a statistically valid difference between A and B can one start evaluating preferences. Rapid switching between alternatives is exquisitely sensitive to small differences, which is why levels must be matched very closely; I aim for 0.1 dB.
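For reference, a 0.1 dB mismatch corresponds to a gain ratio of 10^(0.1/20) ≈ 1.0116, i.e. roughly a 1.2% amplitude difference. A small sketch (assuming NumPy; the signal names are illustrative) of how the RMS level difference between two clips could be checked:

```python
import numpy as np

def rms_db_difference(a, b):
    """Level of signal a relative to signal b, in dB (RMS)."""
    rms = lambda x: np.sqrt(np.mean(np.square(np.asarray(x, dtype=np.float64))))
    return 20.0 * np.log10(rms(a) / rms(b))

# A 1 kHz test tone at 48 kHz, and the same tone boosted by exactly 0.1 dB:
tone = np.sin(2 * np.pi * 1000 * np.arange(48000) / 48000)
boosted = tone * 10 ** (0.1 / 20)
print(round(rms_db_difference(boosted, tone), 3))  # 0.1
```

Comparing RMS over the whole clip like this is a simplification; real level matching is usually done with a steady test tone or band-limited noise rather than program material.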

S.
 