
DAC ABX Test Phase 1: Does a SOTA DAC sound the same as a budget DAC if proper controls are put in place? Spoiler: Probably yes. :)

voodooless

Master Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
5,637
Likes
9,279
Location
Netherlands
I was finally able to do the test; I scored 10 correct out of 16. About as good as I would have expected: some dice could have done better, as @solderdude showed :mad:

That was really hard! I thought one was a bit brighter and there was more "black between the notes" (pardon my audiophile indulgence;)), but it could all be imagination.

Also hard was the fact that I kept trying to see history in A, B and X, so I tried to correlate past memory to the new values, but obviously that doesn't work. The colours might not have helped either ;) I would also have liked a keyboard interface, so one could switch between A, B and X without needing to look... Who knows, doing it actually blind might be beneficial? Also things like: "I already picked 'A' last time, what are the odds that it's 'A' again?" :eek: Mind you, I'm not blaming my failures on any of these factors. If the difference had been obvious, it would not have mattered either way.

My basic strategy was to first listen to A and B and determine the difference, and then try to match X to it. Well... It didn't pay off :facepalm:;)

Used MacOS + Dirac on Loxjie A30 via USB to KEF LS50 (original) nearfield.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
12,607
Likes
28,284
Location
The Neitherlands
That's the difficulty and pitfall when doing AB tests.
One thinks they hear a difference, and then when a result pops up that is about equal to chance, somewhere between 50/50 and 100%, people tend to believe they actually heard differences; the result being neither 100% nor 50/50 seems to confirm this.
In all cases where it is really difficult to tell, the final conclusion should not be 'it looks like I could indeed hear a difference' but rather 'hmm, that was extremely hard to do', not 'very obvious, clear as day and night'. In the latter case, a 100% correct score (or just one miss) would validate that line of thought.
 

audio2design

Major Contributor
Joined
Nov 29, 2020
Messages
1,769
Likes
1,786
ABX is widely treated as the gold standard, but it is not the only way to do a blind test. AB forced-response preference testing works too: you are given A and B in random order, and simply must pick which one you prefer in each trial. The result is still valid, and many find this less stressful. Stress is a valid complaint w.r.t. this type of testing, and is why you should not do too many trials in a row. There can be advantages for the trained in detecting anomalies through repeated hearing, but you also simply get tired and become unable to focus. With less stress in a preference test, you can have longer test sessions.

To add to this, there are different statistical models for interpreting the results of ABX versus forced AB preference. I don't want to give the impression they are exactly the same.

One more thing: ABX as most people on this forum do it, i.e. listening back and forth between the two samples and the reference, is really more akin to a triangle test.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
12,607
Likes
28,284
Location
The Neitherlands
Also in this case: when the differences are really hard to discern, how big an issue is it in reality?
One can span blind AB over many days or even weeks, as long as everything is logged and only one physical thing is different.
The differences within the brain are probably much larger over a longer time period than the technical differences.
 

Pdxwayne

Major Contributor
Joined
Sep 15, 2020
Messages
3,219
Likes
1,156
......

Also threw dice 16x, counted odd/even, and did this 4 times:
1: 7 even, 9 odd
2: 9 even, 7 odd
3: 10 even, 6 odd
4: 12 even, 4 odd
In this case, all results should be included and we can't just use the best one.

; )

38/64 is p of 0.084

Or 26/64 is p of 0.948

Neither result was significant.
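For anyone who wants to reproduce these p-values: the one-sided binomial tail P(X >= k) can be computed with nothing but the Python standard library. This is a minimal sketch (not the specific calculator the posters used), and it lands close to the quoted 0.084 and 0.948:

```python
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """One-sided binomial tail P(X >= k) for n trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# The dice throws above: 38 'even' out of 64, and 26 'odd' out of 64.
print(f"P(X >= 38 | n=64) = {p_at_least(38, 64):.3f}")  # ~0.084
print(f"P(X >= 26 | n=64) = {p_at_least(26, 64):.3f}")  # ~0.948
```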
 

audio2design

Major Contributor
Joined
Nov 29, 2020
Messages
1,769
Likes
1,786
Also in this case: when the differences are really hard to discern, how big an issue is it in reality?
One can span blind AB over many days or even weeks, as long as everything is logged and only one physical thing is different.
The differences within the brain are probably much larger over a longer time period than the technical differences.

I find AB forced preference is much easier for "subjective" audiophiles and non-audiophiles to stomach.
 

Pdxwayne

Major Contributor
Joined
Sep 15, 2020
Messages
3,219
Likes
1,156
......
IMO the main takeaway should hopefully be that the audible difference is definitely not 'night and day' - even if both the price and measurable differences are very significant. :) Hope you found the test interesting!
I am still curious if sub bass heavy song would make a bigger difference.

If you have headphones that can go low and with low bass distortion, like k371, would you please check with "Blade Runner 2049" in your free time and tell us if you can easily hear a difference?

Thanks!
 
OP
dominikz

Addicted to Fun and Learning
Forum Donor
Joined
Oct 10, 2020
Messages
554
Likes
1,922
I am still curious if sub bass heavy song would make a bigger difference.

If you have headphones that can go low and with low bass distortion, like k371, would you please check with "Blade Runner 2049" in your free time and tell us if you can easily hear a difference?

Thanks!
I'll see what I can do - time is a bit short these days, but I'll try :)
Hello. I did the test. Very difficult! I do hear a very small difference in how the kick of the drum is rendered; one seems to me more "full" spectrum. But I think it's my mind. Sorry for my English. However, here's my result: https://abxtests.com/?results=8x0ocSYLQysKXAtv&test=https://www.dropbox.com/s/96jcoe6c5e1nv0w/DAC+ABX+test+-+Topping+E50+vs+FiiO+Taishan+D03K.yaml?dl=0
Thanks for taking the test and reporting back! For 11 correct out of 16 trials P(X>=x) p-value is 10,506%.
 
OP
dominikz

Addicted to Fun and Learning
Forum Donor
Joined
Oct 10, 2020
Messages
554
Likes
1,922
So let me give a short overview of results we have so far.

These are the results of participants that took the online test via abxtests.com - we had a total of 22 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Number of correct answers (out of 16) | p-value P(X>=x) | Number of attempts (with the same correct answer count) | Comments
 1 | 99,998% | 0 |
 2 | 99,974% | 0 |
 3 | 99,791% | 0 |
 4 | 98,936% | 1 |
 5 | 96,159% | 2 |
 6 | 89,494% | 7 |
 7 | 77,275% | 4 |
 8 | 59,819% | 3 | Note: In one attempt in this group "A" was selected every time; I suspect it was a test run of a sort and not a 'real' test attempt.
 9 | 40,181% | 3 |
10 | 22,725% | 1 |
11 | 10,506% | 0 |
12 |  3,841% | 0 |
13 |  1,064% | 0 |
14 |  0,209% | 0 |
15 |  0,026% | 0 |
16 |  0,002% | 1 |
Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here. Please let me know if anyone spots any mistakes. :)
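For reference, the P(X>=x) column can also be reproduced offline without the online calculator. A minimal Python sketch using only the standard library, under the same n=16, p=0.5 guessing model:

```python
from math import comb

N = 16  # trials per attempt

def p_at_least(k: int, n: int = N) -> float:
    """P(X >= k) under pure guessing (binomial, p = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

for k in range(1, N + 1):
    print(f"{k:2d}/{N}: p = {100 * p_at_least(k):7.3f}%")
# e.g. 10/16 -> 22.725%, 12/16 -> 3.841%, 16/16 -> 0.002%
```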

As we can see, out of the 22 finished (online) test attempts so far, we have one where all provided answers were correct (16/16, p-value = 0,002%). No other attempts achieved p-value lower than 1% (or even 5%). The second-best attempt got 10/16, for a p-value of 22,725%.

In addition to those included above, we had one participant who reported that they didn't finish the test due to not being able to hear a difference. I'm another one that fits into that category - I couldn't hear any difference between "A" and "B" so couldn't complete the test.

One participant reported also attempting the test in the foobar2000 ABX comparator, but not being able to do better than guessing there either (the ABX comparator results log was not posted, though).

As others have stated, we should be careful when interpreting results of this test (and similar ones). This is because there are still variables that are not controlled - such as individual system differences between participants, things like different operating system audio settings, possibly different browser behaviour, driver configuration, potential use of various audio enhancements, differences in audio equipment and its calibration, etc... So it seems unlikely to me that we can easily make many generalized conclusions from this test.
However, it is IMO interesting to note that while most had difficulty identifying "X" correctly within the constraints of this test, we seem to have had one attempt where a participant was able to reliably select the correct answer in all 16 trials. It would be interesting to hear what this participant used as an anchor when doing the test - e.g. was it the slight increase in 'brightness' we expected from the FiiO D03K vs the Topping E50 (as predicted from the frequency response measurement), or something else?

Please let me repeat again that my intention is not to argue that 'all DACs are the same' - IMO they are not.
E.g. the FiiO D03K has some frequency response variations, unimpressive SINAD, only 1,5 V RMS maximum output, and is sensitive to low load impedance. It is therefore absolutely imaginable that this DAC's limitations may become audible in certain setups - e.g. those with less-than-ideal gain staging and/or when driving very low impedance analogue inputs.
On the other hand, as long as the test setup is optimized to achieve good performance out of each DAC and some basic listening test controls are applied, it can IMO be surprising how close to transparency some of these budget DACs can be - even if compared to objectively much better performing units as here.

In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment. :)
Let me do an updated overview of results so far:

These are the results of participants that took the online test via abxtests.com - we had a total of 43 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Number of correct answers (out of 16) | p-value P(X>=x) | Number of attempts (with the same correct answer count)
 1 | 99,998% | 0
 2 | 99,974% | 0
 3 | 99,791% | 0
 4 | 98,936% | 2
 5 | 96,159% | 3
 6 | 89,494% | 9
 7 | 77,275% | 7
 8 | 59,819% | 7
 9 | 40,181% | 6
10 | 22,725% | 3
11 | 10,506% | 1
12 |  3,841% | 2
13 |  1,064% | 2
14 |  0,209% | 0
15 |  0,026% | 0
16 |  0,002% | 1
Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here.

As we see, we now have a total of five attempts that beat the lax <5% p-value criterion; of those five, two were borderline for the <1% criterion, and only one was well below it - scoring all 16 correct out of 16 trials.

In addition to the above, we had two participants reporting they also did the test in the foobar2000 ABX comparator: one got 40 correct out of 64 trials for a total p-value of 2,997% (beating the <5% criterion, but not the stricter <1% criterion); the other reported they couldn't hear a clear difference and gave up.

Here's a replay of closing words from my previous overview post :p
In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment. :)
 

Jojo

Member
Joined
Jan 15, 2022
Messages
7
Likes
2
Very interesting, thank you.
I think I will try this test again with another configuration (amp + speakers, or maybe on some studio monitors) to see if I can hear (or imagine :facepalm:) differences that I didn't hear on a really low-budget 2.1 system (the one I did at 10%) :)
 

tom_tom

Member
Joined
Oct 16, 2020
Messages
85
Likes
43
Wow, that wasn't easy. I got 9/16 on the first try (Lenovo laptop output over a 3.5 mm jack to gaming headphones).
 
OP
dominikz

Addicted to Fun and Learning
Forum Donor
Joined
Oct 10, 2020
Messages
554
Likes
1,922
Wow, that wasn't easy. I got 9/16 on the first try (Lenovo laptop output over a 3.5 mm jack to gaming headphones).
Thanks for taking the test! And please don't be discouraged - if you look at the statistics in post #130 you'll see that most who participated so far couldn't reliably differentiate the files. It is IMHO indeed a very difficult test!
Still, hope it was an interesting exercise! :)
 

Dumdum

Active Member
Joined
Dec 13, 2019
Messages
216
Likes
139
Location
Nottinghamshire, UK
It's not that the forum doesn't like them. It is just that they are meaningless except to the one person who has listened.

You can hear a difference - great. But the measurements tell us that the reason you hear a difference is very unlikely to be from the audio waves arriving at your ears. So what you hear tells us nothing about what anyone else will hear.
What will it be from if it’s not from the audio arriving at his ears?? I’ve seen some things dismissed…. But scoring well on an abx test where you’re comparing sound and saying it’s not what was arriving at his ears is just the funniest thing I’ve read on the most objective site ever… now you are dismissing blind abx testing as some kind of hoax??
 

Jimbob54

Master Contributor
Forum Donor
Joined
Oct 25, 2019
Messages
9,344
Likes
12,102
What will it be from if it’s not from the audio arriving at his ears?? I’ve seen some things dismissed…. But scoring well on an abx test where you’re comparing sound and saying it’s not what was arriving at his ears is just the funniest thing I’ve read on the most objective site ever… now you are dismissing blind abx testing as some kind of hoax??
I don't think the thread you have jumped on is someone scoring well on an ABX - it's user @Volikovvv who, as best as I can tell, can clearly hear differences in sighted listening. But it does get somewhat confusing!

Edit - specifically this statement: "We can get very similary measured su-9n and d1se, and give me a try. They sound very different to me, but when we look to the digits, they are the same... I know this forum don't like statements like that, so....." Note he isn't talking about the ABX in the OP, but (I think it was assumed) sighted listening tests of two other DACs, hence he knows statements like that (absent supporting evidence from controlled tests) don't go down well on here. He was right, they don't.
 

tonycollinet

Major Contributor
Joined
Sep 4, 2021
Messages
3,066
Likes
4,609
Location
UK/Cheshire
What will it be from if it’s not from the audio arriving at his ears?? I’ve seen some things dismissed…. But scoring well on an abx test where you’re comparing sound and saying it’s not what was arriving at his ears is just the funniest thing I’ve read on the most objective site ever… now you are dismissing blind abx testing as some kind of hoax

You need to read not only my statement, but the conversation that preceded it. It wasn't referring to a well-scored ABX test.
 

Weeb Labs

Senior Member
Forum Donor
Joined
Jun 24, 2020
Messages
465
Likes
1,102
Location
Ireland
Sub-optimal test conditions on my part but I will nonetheless share the results. While performing the test, I quickly reached the conclusion that I was unable to distinguish the stimuli with any real degree of reliability.

Switching from A to B (or X) did occasionally result in a perceived difference but it is very difficult to determine the cause. It could simply have resulted from the music's progression at the moment of switching (coupled with expectation bias) or it may have been a real occasion on which I happened to focus on a characteristic more audibly affected by the slight change in frequency response. Over the course of the test, I have no doubt that both factors came into play.

[Attachment: screenshot of ABX test results]
 
OP
dominikz

Addicted to Fun and Learning
Forum Donor
Joined
Oct 10, 2020
Messages
554
Likes
1,922
Let me do another update of result overview so far:

These are the results of participants that took the online test via abxtests.com - we had a total of 72 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Number of correct answers (out of 16) | p-value P(X>=x) | Number of attempts
 1 | 99,998% | 0
 2 | 99,974% | 0
 3 | 99,791% | 0
 4 | 98,936% | 2
 5 | 96,159% | 5
 6 | 89,494% | 10
 7 | 77,275% | 11
 8 | 59,819% | 13
 9 | 40,181% | 16
10 | 22,725% | 6
11 | 10,506% | 2
12 |  3,841% | 3
13 |  1,064% | 3
14 |  0,209% | 0
15 |  0,026% | 0
16 |  0,002% | 1
Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here.

Pretty distribution graph:
[Attachment: score distribution graph]


As we see, out of the total of 72 attempts we now have seven that beat the lax <5% p-value criterion; of those seven, three were borderline for the stricter <1% criterion, and only one was well below it - scoring all 16 correct out of 16 trials.
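As a rough sanity check on that count (a sketch, assuming hypothetically that all 72 attempts were pure guesses): under the same binomial model, 12/16 is the lowest score with p < 5% and 14/16 the lowest with p < 1%, so the expected number of attempts clearing each threshold by luck alone can be computed directly:

```python
from math import comb

def p_at_least(k: int, n: int = 16) -> float:
    """P(X >= k) under pure guessing (binomial, p = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

attempts = 72
exp_5pct = attempts * p_at_least(12)  # expected attempts scoring p < 5% by luck
exp_1pct = attempts * p_at_least(14)  # expected attempts scoring p < 1% by luck
print(f"expected < 5% by chance: {exp_5pct:.1f}")  # ~2.8
print(f"expected < 1% by chance: {exp_1pct:.2f}")  # ~0.15
```

So even under pure guessing, two or three attempts out of 72 would be expected to clear the 5% threshold. The observed seven is higher than that, though six of them sit at the borderline 12/16 and 13/16 scores.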

In addition to the above, we had two participants reporting they also did the test in the foobar2000 ABX comparator: one got 40 correct out of 64 trials for a total p-value of 2,997% (beating the <5% criterion, but not the stricter <1% criterion); the other reported they couldn't hear a clear difference and gave up.

Here's a(nother) replay of closing words from my original overview post :p
In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment. :)

Sub-optimal test conditions on my part but I will nonetheless share the results. While performing the test, I quickly reached the conclusion that I was unable to distinguish the stimuli with any real degree of reliability.

Switching from A to B (or X) did occasionally result in a perceived difference but it is very difficult to determine the cause. It could simply have resulted from the music's progression at the moment of switching (coupled with expectation bias) or it may have been a real occasion on which I happened to focus on a characteristic more audibly affected by the slight change in frequency response. Over the course of the test, I have no doubt that both factors came into play.

Thanks for doing the test and sharing your results! If you want a bit more control (e.g. capability to loop just a portion of the files), you can also download the source files from OP and use foobar2000 ABX comparator plugin to do the test (in this case it is suggested to use WASAPI exclusive driver to avoid OS audio processing shenanigans :)).
 

CREMA

Member
Joined
Jan 24, 2020
Messages
19
Likes
51
Location
South Korea
Thank you, Audacity!

If you see a ridiculous result on an online test, make sure to bring the person offline and have them take the test again.

If you take the spectrum analyzer away from him, his ears will be fooled.
 
