
DAC ABX Test Phase 1: Does a SOTA DAC sound the same as a budget DAC if proper controls are put in place? Spoiler: Probably yes. :)

xaviescacs

[Attached image: 1665350386961.png - histogram of scores across all test attempts]

As we see, out of the total 350 attempts we have 19 that beat the lax <5% p-value criterion; of those 19, 10 were borderline for the stricter <1% criterion, and only 4 were well below it, scoring 15 or all 16 correct out of 16 trials.
As I've commented before, statistically there are two populations here, as the 4 cases on the right are very unlikely to have been generated by a Gaussian with the sample mean and variance. A simple t-test would confirm that, but it's quite obvious. Of course it can happen by chance, but that chance is really low. Put another way, the probability that the observed distribution was generated by a Gaussian with the sample mean and variance is very low, so it's more likely that there are two populations. Intuitively, Gaussians decay very quickly and very deeply; they don't revive, so to speak.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
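As a side note, the tail probabilities behind these counts are easy to check; here is a minimal sketch in Python (standard library only; the 350-attempt and 16-trial numbers are taken from the post above):

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value of
    getting at least k correct out of n ABX trials by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_top = binom_tail(16, 15)   # chance of 15 or 16 correct out of 16 by guessing
expected_top = 350 * p_top   # expected number of such scores among 350 guessing attempts
print(f"{p_top:.6f}", f"{expected_top:.3f}")  # → 0.000259 0.091
```

Under pure guessing we'd expect roughly 0.09 such top scores across all 350 attempts, so seeing 4 of them is indeed hard to attribute to chance alone.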
Do you know if the 4 attempts on the right were made by people who also made other attempts? Were they made by 4 different subjects? I ask because the person is a meaningful variable here, as one's knowledge, training and capacity make all the difference, so we can't lose track of it.

From here, perhaps the next steps would be to confirm that these people are indeed capable of consistently telling the two DACs apart, and then simply ask them how they do it, so we can all learn. Then this knowledge could be taught to a small random sample of readers, who would take the test again; if they then prove able to tell the DACs apart, you would have a very nice proof that, with training, anyone can hear very subtle differences. Or am I just building castles in the air? :)
 
OP
dominikz

@xaviescacs Thanks a lot for (another) thoughtful response!

Do you know if the 4 attempts on the right were made by people who also made other attempts? Were they made by 4 different subjects? I ask because the person is a meaningful variable here, as one's knowledge, training and capacity make all the difference, so we can't lose track of it.
I cannot reliably say, sadly - there are clues in the metadata sometimes that may indicate the same person took the test multiple times, but really I cannot be certain. There is no way for me to identify individual test participants - it is an anonymous test.

Also, please note that it is possible to 'cheat' in this test, as the FR differences between sound clips in the stream can be measured - see post #76 where I show the difference between the files, and then posts #140 and #141 where another user suggested this. Another example of how the test can be made easier can be found in post #98.

IMHO there is unfortunately not enough control in this test to assume that an individual participant who scored highly actually heard a difference, and it is also quite possible that some of those who scored under the threshold could score better under controlled circumstances and/or with training.

TBH my intention here was just to provide a simple-to-use demonstration that differences between DACs that measure very differently can be much smaller than anticipated (looking at the data and/or price) - and I hoped this would be especially interesting to those who otherwise hadn't had a chance to participate in a level-matched, double-blind ABX test.

From here, perhaps the next steps would be to confirm that these people are indeed capable of consistently telling the two DACs apart, and then simply ask them how they do it, so we can all learn. Then this knowledge could be taught to a small random sample of readers, who would take the test again; if they then prove able to tell the DACs apart, you would have a very nice proof that, with training, anyone can hear very subtle differences. Or am I just building castles in the air? :)
To be honest, I was actually originally expecting more participants would be able to tell these DACs apart because there are frequency response differences between them. I'm not sure if that is what most would call 'subtle differences', as many good DACs will have better-matched frequency responses than these two - usually the differences would be in noise level and distortion.

If we pushed this line of investigation forward I personally believe we'd just find that the >10kHz frequency response deviation of the FiiO DAC is meaningful to those with preserved high-frequency hearing who know what to listen for. Removing the FR difference (e.g. with PEQ) would likely make the test even more difficult - and perhaps impossible.

In summary, I'm not sure that concluding what makes one identify these two DACs reliably would allow us to make a conclusion regarding identification of subtle differences in general.

What I'd personally be interested in is whether we can do better in determining the maximum distortion levels that are still inaudible. IMHO then we would have a much clearer view on when mid-performing audio electronics become 'transparent' :)
However I'm not sure if such tests will ever materialize - given the widespread availability of very high performing audio electronics there seems to be less of a practical need to pinpoint the thresholds.
 

dasdoing

Got 12 of 16 on the first try, but for the first 5 or so I was still figuring out what to listen for.
Curious if I can do better on a second try; later perhaps.

Did it again and got 11.
So I guess I'm not fully guessing, but the difference is too small to be sure?
I guess I could do better with a high-pass filter, but I don't want to mess up the results.

I focused on the "s" of "myself"; a SINAD difference seems impossible.
 

xaviescacs

Your post is thoughtful! :)
Also, please note that it is possible to 'cheat' in this test, as the FR differences between sound clips in the stream can be measured - see post #76 where I show the difference between the files, and then posts #140 and #141 where another user suggested this. Another example of how the test can be made easier can be found in post #98.

IMHO there is unfortunately not enough control in this test to assume that an individual participant who scored highly actually heard a difference, and it is also quite possible that some of those who scored under the threshold could score better under controlled circumstances and/or with training.
I see, thanks for all the references. I agree with you then: the controls aren't solid enough to conclude anything, and more experiments should be performed to confirm whether someone is really capable of telling the two DACs apart. My suggestion was a bit naive, actually.
TBH my intention here was just to provide a simple-to-use demonstration illustrating that differences between very different measuring DAC can be much smaller than anticipated (looking at the data and/or price) - and I hoped this would be especially interesting to those that otherwise hadn't had a chance to participate in a level-matched, double-blind ABX test.
IMO you accomplished that, and this post is very useful as an example to point people to whenever someone says they can hear a difference between DACs, etc.
 

dasdoing

I think the shape of the results paints a clear picture: those 4 results on the right are clear outliers, which suggests some kind of cheating was used.
 

xaviescacs

Did it again and got 11.
So I guess I'm not fully guessing, but the difference is too small to be sure?
If you try 10 times and get 11 or 12 each time, it's more or less the same as scoring 15 once (I haven't done the numbers, just making my point). It's very unlikely that you would score 11 or 12 five or ten times in a row acting randomly. So it looks like you have your foot on its throat.
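To put rough numbers on this (a quick sketch, not from the original post), the one-sided binomial p-values under pure guessing can be computed directly:

```python
from math import comb

def p_at_least(n, k):
    # One-sided p-value: chance of k or more correct out of n by guessing (p = 0.5)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

p_once = p_at_least(16, 11)   # a single 11/16 score: ~0.105, unremarkable on its own
p_streak = p_once ** 5        # five 11-or-better scores in a row: ~1.3e-5
```

So one 11/16 result is entirely compatible with guessing, but repeating it consistently quickly becomes overwhelming evidence, which is the point being made above.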
 

somebodyelse

I think the shape of the results paints a clear picture: those 4 results on the right are clear outliers, which suggests some kind of cheating was used.
That's a possible explanation of the outliers. Another is that a subset of people can genuinely hear the difference, as discussed in #182 above.
 

dasdoing

That's a possible explanation of the outliers. Another is that a subset of people can genuinely hear the difference, as discussed in #182 above.

They could obviously be "built differently", but statistically there will always be cheaters lol
Maybe they tried things out, like the high-pass filter I personally considered; I would have liked to try it out... but didn't.
But hooking up an analyser just to troll is too easy for nobody to have done it.
 
OP
dominikz

Did it again and got 11.
So I guess I'm not fully guessing, but the difference is too small to be sure?
I guess I could do better with a high-pass filter, but I don't want to mess up the results.

I focused on the "s" of "myself"; a SINAD difference seems impossible.
If you only did these two trials (1st with 12/16 correct, then 2nd with 11/16) and we treat them together as a single trial with 23 correct out of 32 attempts, we get a p-value of 1.0031%:
[Attached image: 1665559562244.png - binomial test calculator screenshot]

(binomial test calculator link)

So while probability of the result being caused by chance is relatively low (but not zero!), the number of incorrect trials is a testament that the audible differences are far from obvious. :D
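For anyone who wants to check the calculator's number, the same one-sided p-value can be reproduced with a few lines of Python (exact binomial, guessing probability 0.5):

```python
from math import comb

# One-sided p-value: probability of 23 or more correct out of 32 by pure guessing
p = sum(comb(32, k) for k in range(23, 33)) / 2**32
print(f"{p:.4%}")  # prints 1.0031%
```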

Either way, good job and thanks for taking the test! :)
 

dasdoing

So while probability of the result being caused by chance is relatively low (but not zero!), the number of incorrect trials is a testament that the audible differences are far from obvious. :D

That was intuitively my conclusion too.
It also matches my impression while listening.

I think, though, that this difference might become a little(!) more obvious with material that is harsh, or on the boundary of harsh: a brightly mastered rock song with a lot of cymbals, for example.

Good job with the test.
 

PierreBCN

I find this thread fascinating and an eye-opener!


Maybe somebody already asked... in the 350-plus tests taken, can we identify some users who consistently pick above 12? (Or wherever you'd think the number of correct answers becomes statistically significant.)

Cheers,
 
OP
dominikz

I find this thread fascinating and an eye-opener!
Thanks, I'm very glad you found it interesting!

Maybe somebody already asked... in the 350-plus tests taken, can we identify some users who consistently pick above 12? (Or wherever you'd think the number of correct answers becomes statistically significant.)
I can't identify users, so I can't really say for sure. I also can't be sure whether or not some (of the very few that did score well) used spectrum meters or similar to 'cheat'. There's unfortunately no way for me to control for that - it is a limitation of a remote/online test format. :)

However, even so, as you can see there are in general very few attempts that did well - even in this test, where the two DACs measure very differently (there is even a significant frequency response difference between them). That is, this is a test that should be relatively easy.
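Incidentally, the "above 12" figure in the question matches the usual 5% significance criterion for a 16-trial ABX run; a quick sketch to confirm:

```python
from math import comb

def p_value(n, k):
    # One-sided binomial p-value for k or more correct out of n under pure guessing
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# Smallest score out of 16 trials whose one-sided p-value drops below 5%:
threshold = min(k for k in range(17) if p_value(16, k) < 0.05)
print(threshold, round(p_value(16, threshold), 4))  # prints: 12 0.0384
```

So 12/16 is the first score that is significant at the 5% level (p ≈ 3.8%), while 11/16 is not (p ≈ 10.5%).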
 

fpitas

That might be due to timing. Your test has coincided with the holiday season. Many people are traveling or hosting relatives and probably haven't had a good opportunity to sit down and do such a test yet. Be patient.

OTOH, when I've posted actual files for people to listen to and choose without knowing, the participation levels have always been abysmal.
On the bright side, no matter what the outcome is, subjectivists will go on believing what they wish. So it all works out.
 