So let me give a short overview of the results we have so far.
These are the results from participants who took the online test via abxtests.com - we had a total of 22 completed attempts.
Note that I'm saying 'attempts' rather than 'participants' because a few participants reported taking the test more than once.
| Number of correct answers (out of 16) | p-value P(X>=x) | Number of attempts (with the same correct answer count) | Comments |
|---|---|---|---|
| 1 | 99.998% | 0 | |
| 2 | 99.974% | 0 | |
| 3 | 99.791% | 0 | |
| 4 | 98.936% | 1 | |
| 5 | 96.159% | 2 | |
| 6 | 89.494% | 7 | |
| 7 | 77.275% | 4 | |
| 8 | 59.819% | 3 | Note: in one attempt in this group "A" was selected every time; I suspect it was a test run of a sort and not a 'real' test attempt. |
| 9 | 40.181% | 3 | |
| 10 | 22.725% | 1 | |
| 11 | 10.506% | 0 | |
| 12 | 3.841% | 0 | |
| 13 | 1.064% | 0 | |
| 14 | 0.209% | 0 | |
| 15 | 0.026% | 0 | |
| 16 | 0.002% | 1 | |
Note: the p-values P(X>=x) were calculated with this online calculator (n=16, p=0.5, q=0.5, K = <number of correct trials>) and cross-checked here. Please let me know if anyone spots any mistakes.
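For anyone who would rather check the numbers locally instead of using the online calculator, here is a minimal Python sketch (not the tool used above, just an independent cross-check) that computes the same one-sided binomial tail probability P(X >= k) for n = 16 trials and p = 0.5:

```python
from math import comb

def binomial_tail(k: int, n: int = 16, p: float = 0.5) -> float:
    """One-sided p-value P(X >= k) for a binomial(n, p) distribution."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# The two attempts highlighted in the text:
print(f"16/16 correct: p = {binomial_tail(16):.3%}")  # ~0.002%
print(f"10/16 correct: p = {binomial_tail(10):.3%}")  # ~22.725%
```

The output should match the table values above to the rounding shown.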
As we can see, out of the 22 completed (online) test attempts so far, we have one in which all answers were correct (16/16, p-value = 0.002%). No other attempt achieved a p-value below 1% (or even 5%). The second-best attempt scored 10/16, for a p-value of 22.725%.
In addition to those included above, one participant reported that they didn't finish the test because they couldn't hear a difference. I fall into the same category - I couldn't hear any difference between "A" and "B", so I couldn't complete the test.
One participant also reported attempting the test in the foobar2000 ABX comparator, but not being able to do better than guessing there either (the ABX comparator results log was not posted, though).
As others have stated, we should be careful when interpreting the results of this test (and similar ones), because several variables remain uncontrolled - individual differences between participants' systems, operating system audio settings, possibly different browser behaviour, driver configuration, potential use of various audio enhancements, differences in audio equipment and its calibration, etc. So it seems unlikely to me that we can draw many generalized conclusions from this test.
However, it is IMO interesting to note that while most participants had difficulty identifying "X" correctly within the constraints of this test, we appear to have one attempt in which a participant reliably selected the correct answer in all 16 trials. It would be interesting to hear what that participant anchored on during the test - e.g. were they listening for the slight increase in 'brightness' we expected from the FiiO D03K vs the Topping E50 (as predicted from the frequency response measurements), or something else?
Let me repeat that my intention is not to argue that 'all DACs are the same' - IMO they are not.
E.g. the FiiO D03K has some frequency response variations, unimpressive SINAD, only 1.5 V RMS maximum output, and is sensitive to low load impedance. It is therefore entirely imaginable that this DAC's limitations become audible in certain setups - e.g. those with less-than-ideal gain staging and/or when driving very low impedance analogue inputs.
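To illustrate the load-impedance point, here is a minimal sketch of the voltage divider formed by a source's output impedance and the load's input impedance. The output impedance value below is a placeholder assumption chosen purely for illustration, not a measured figure for the D03K:

```python
from math import log10

def level_drop_db(z_out_ohm: float, z_load_ohm: float) -> float:
    """Attenuation (in dB) from the voltage divider formed by the
    source output impedance and the load input impedance."""
    ratio = z_load_ohm / (z_out_ohm + z_load_ohm)
    return 20 * log10(ratio)

# Hypothetical numbers for illustration only (not measured D03K values):
z_out = 100.0  # assumed source output impedance in ohms
for z_load in (10_000.0, 600.0, 32.0):
    print(f"{z_load:>8.0f} ohm load: {level_drop_db(z_out, z_load):6.2f} dB")
```

The general point it illustrates: into a typical high-impedance analogue input the drop is negligible, while a very low impedance load eats noticeably into an already modest maximum output level and can upset gain staging.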
On the other hand, as long as the test setup is optimized to get good performance out of each DAC and some basic listening test controls are applied, it can IMO be surprising how close to transparency some of these budget DACs can get - even when compared to objectively much better performing units, as here.
In the end, I do hope this was an interesting exercise for those involved - and hopefully one that also illustrates the importance of precise level matching and blind listening when comparing audio equipment.