
DAC ABX Test Phase 1: Does a SOTA DAC sound the same as a budget DAC if proper controls are put in place? Spoiler: Probably yes. :)

If there are enough participants, it is statistically unlikely that no-one scores perfectly by chance. So it's not about "tell" it's about "stat."
Exactly - this is a good point often missed in these kinds of large-scale tests (not sure what to call them). Intuitively it shouldn't matter how many times I take the test, but statistically it does matter. The relevant subject in statistics is multiple testing and p-value correction.
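The multiple-testing point can be made concrete with a quick calculation. A rough sketch (the 500-attempt and 16-trial figures come from this thread; 0.0384 is the chance of scoring 12+ of 16 in one run by guessing):

```python
# Probability that at least one of N independent guessers reaches a score
# threshold purely by chance, given the per-participant tail probability.
def family_hit_prob(p_single, n_participants):
    return 1 - (1 - p_single) ** n_participants

p_perfect = 0.5 ** 16   # one participant scoring 16/16 by pure guessing
p_ge_12 = 0.0384        # P(X >= 12) in a single 16-trial run

at_least_one_perfect = family_hit_prob(p_perfect, 500)  # well under 1%
at_least_one_pass = family_hit_prob(p_ge_12, 500)       # essentially certain
```

So with 500 attempts, a lucky 16/16 is still improbable, but a few seemingly "significant" 12/16 results are practically guaranteed to appear by chance alone.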

@dominikz, do you know if everyone has taken the test a single time, or can they just report the best result they got after an arbitrary number of tries? Why do you think there are no scores of 0, 1 or 2 by chance? Do people just not report those results?
 
If there are enough participants, it is statistically unlikely that no-one scores perfectly by chance. So it's not about "tell" it's about "stat."
That is certainly one possible explanation. Another is that a small proportion of people have identified a difference they can hear reliably. If each participant reported a different 'tell' or couldn't describe the difference then chance would seem more likely. If they all described a similar characteristic then there could be something to it. Further research is needed to find out either way - can those participants repeat their score?
 
That is certainly one possible explanation. Another is that a small proportion of people have identified a difference they can hear reliably. If each participant reported a different 'tell' or couldn't describe the difference then chance would seem more likely. If they all described a similar characteristic then there could be something to it. Further research is needed to find out either way - can those participants repeat their score?
There's some truth to this, but the counter-argument is that if there were a non-random component (e.g., some people could reliably identify the files while others guessed randomly), the score distribution would be shifted and asymmetric. It doesn't appear to be.

But yes, if someone who got 10/10 claimed it was deliberate and audible, the next experiment would be to check this to see if they can replicate it. Same thing with the people who got 0/10, which the hopeful folks try to gloss over...
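A rough way to check the "shifted distribution" intuition: under a hypothetical mixture where a small fraction of participants genuinely hear a difference, the right tail of the score distribution grows far beyond pure chance. The 5% fraction and 80% per-trial accuracy below are made-up illustration values, not estimates from this thread:

```python
from math import comb

def binom_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials.
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def mixture_pmf(k, n=16, frac=0.05, acc=0.8):
    # 95% pure guessers plus 5% hypothetical listeners with 80% accuracy.
    return (1 - frac) * binom_pmf(k, n, 0.5) + frac * binom_pmf(k, n, acc)

# Chance of a score of 14+ out of 16 under each model.
tail_chance = sum(binom_pmf(k, 16, 0.5) for k in range(14, 17))  # ~0.2%
tail_mix = sum(mixture_pmf(k) for k in range(14, 17))            # several times larger
```

Even a small minority of real discriminators would inflate the high-score tail noticeably, which is why a symmetric, chance-like histogram argues against that explanation.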
 
@dominikz, do you know if everyone has taken the test a single time, or can they just report the best result they got after an arbitrary number of tries? Why do you think there are no scores of 0, 1 or 2 by chance? Do people just not report those results?
So the thing works like this:
  • Every completed attempt is reported to me automatically, but partial attempts are not reported at all. E.g. someone could do the test 100 times without actually completing it, and then only finalize it on the 101st try - and I would only know about that last attempt.
  • In this format I have no way to reliably identify multiple attempts by the same participants; as I wrote before:
I cannot reliably say, sadly - there are clues in the metadata sometimes that may indicate the same person took the test multiple times, but really I cannot be certain. There is no way for me to identify individual test participants - it is an anonymous test.
  • All taken into account, I do believe at least some people (and possibly many) give up before completing the test once they realize they can't reliably distinguish between the files. As you suggested, this might indeed explain the lack of results with 0-2 correct responses.

That is certainly one possible explanation. Another is that a small proportion of people have identified a difference they can hear reliably. If each participant reported a different 'tell' or couldn't describe the difference then chance would seem more likely. If they all described a similar characteristic then there could be something to it. Further research is needed to find out either way - can those participants repeat their score?
Please note that my intention with this test was *not* to prove that no-one could ever hear the difference between these two files.

To the contrary, the differences in frequency response between them definitely suggest people with well-preserved high-frequency hearing *might* hear a small difference.
So this might even be considered an easy test in this context - the measured performance differences between the two DACs are dramatic here, and the difference in FR seems like it might well be audible.
In many other cases the measured differences between DACs will be smaller (e.g. no significant FR difference), so my hope is that this test puts claims of night-and-day audible differences from informal listening tests in perspective (for readers without previous experience with controlled listening tests).

This is IMHO actually what makes the results of this test interesting: we've had over 500 completed test attempts and it still seems the vast majority can't reliably differentiate these two recordings. Given that this is ASR we might even assume many participants had an above-average interest in audio and critical listening - which didn't seem to help.

When this test was launched I half-expected I'd get so many 16/16 responses that I'd have to prepare the more difficult Phase 2 test almost right away. But even after this much time I still don't see the need for it. :D
 
If you try 10 times and get 11 or 12 each time, that is more or less the same as scoring 15 once (I haven't done the numbers, I just want to make my point). It's very unlikely that you can score 11 or 12 five or ten times in a row while guessing randomly. So it looks like you have your foot on its throat.
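The "several times in a row" intuition checks out numerically. A quick sketch of the arithmetic, assuming 16 trials per run and a threshold of 11 correct:

```python
from math import comb

# One-run chance of scoring 11 or more out of 16 by pure guessing.
p_single = sum(comb(16, k) for k in range(11, 17)) / 2 ** 16  # ~10.5%

# Chance of clearing that bar in five independent runs in a row.
p_five_in_a_row = p_single ** 5  # on the order of 1 in 100,000
```

A single 11/16 is unremarkable, but repeating it five times by luck alone is roughly a one-in-a-hundred-thousand event.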
Yesterday I got 5x test results within the hour, all with identical metadata.

If we assume these were all done by the same participant (a reasonable assumption - though it could be wrong!), we can view it as a single test with 80 trials.
The results look like this:
Attempt | Correct guesses (K) | Number of trials (n) | p-value P(X >= K)
1 | 12 | 16 | 3.84%
2 | 9 | 16 | 40.18%
3 | 11 | 16 | 10.51%
4 | 7 | 16 | 77.28%
5 | 10 | 16 | 22.73%
Total | 49 | 80 | 2.83%
(Calculated with the "Easy Binomial Test Calculator")
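The same numbers can be reproduced with a few lines of Python instead of the online calculator - a sketch using the exact one-tailed binomial tail:

```python
from math import comb

def p_at_least(k, n, p=0.5):
    # One-tailed binomial p-value: P(X >= k) over n trials.
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

attempts = [(12, 16), (9, 16), (11, 16), (7, 16), (10, 16)]
per_attempt = [p_at_least(k, n) for k, n in attempts]  # ~3.8%, ~40.2%, ~10.5%, ~77.3%, ~22.7%
pooled = p_at_least(49, 80)                            # ~2.8% for the aggregated 49/80
```

Pooling works here because, under the null hypothesis of guessing, all 80 trials are independent coin flips, so they can be treated as one long binomial run.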

As we can see, even though most of the individual attempts don't have a very low p-value (though a couple are pretty good), when we aggregate the results we see there's only a 2.8% probability of getting such a result by random guessing. This still doesn't pass the stricter p-value < 1% criterion, but it's pretty close!
Age was set to 22 in each attempt, which should indicate good high frequency hearing.

Just thought it might be interesting to share. :)
 
I just completed the test with a fresh pair of HD 490s and a Topping DX3. I must admit I felt like I was guessing most of the time. It proves to me that almost all opinions on sound quality can be ignored unless there is proof from an ABX test.
p-value: 0.0667 | Correct: 11 | Incorrect: 5
 
I did not do very well. I think I got 6 right and 10 wrong. It was a bit annoying to listen to the same song so many times so there will not be attempts to improve the score from me.
 
You call it "proper controls"? This method is seriously flawed - how are you supposed to tell the difference between these samples when you play them through another DAC/amplifier/speakers? It is almost as stupid as YouTube "reviews" of audio gear that record audio setups and expect that you can hear the difference on your computer. I'm skeptical there are differences which can be heard, but this method is stupid and doesn't prove anything. You MAY be able to tell a difference if playing through an amplifier/speakers with very low distortion and in a room with very good acoustics. With your method you won't be able to tell anything, as you cover one set of distortions (of the source DACs) with other, much greater distortions (of the recording, compression, another DAC, amplifier and speakers/headphones).
 
You call it "proper controls"? This method is seriously flawed - how are you supposed to tell the difference between these samples when you play them through another DAC/amplifier/speakers? It is almost as stupid as YouTube "reviews" of audio gear that record audio setups and expect that you can hear the difference on your computer. I'm skeptical there are differences which can be heard, but this method is stupid and doesn't prove anything. You MAY be able to tell a difference if playing through an amplifier/speakers with very low distortion and in a room with very good acoustics. With your method you won't be able to tell anything, as you cover one set of distortions (of the source DACs) with other, much greater distortions (of the recording, compression, another DAC, amplifier and speakers/headphones).
The main point of these tests is to focus on the differences between the audio files being compared, rather than the specific equipment each person uses. While it's true that factors like headphones, speakers, and room acoustics can influence perceptions, the goal is to determine whether listeners can reliably identify differences. As long as the testing conditions are consistent, we can still draw meaningful conclusions about the ability to hear those differences, regardless of peoples' varying setups.
 
The main point of these tests is to focus on the differences between the audio files being compared, rather than the specific equipment each person uses. While it's true that factors like headphones, speakers, and room acoustics can influence perceptions, the goal is to determine whether listeners can reliably identify differences. As long as the testing conditions are consistent, we can still draw meaningful conclusions about the ability to hear those differences, regardless of peoples' varying setups.
How are you supposed to tell the difference if you have to play these samples through your own DAC / amplifier / speakers? A DAC introduces very low distortion in comparison to an amplifier/speakers, so IF there is a detectable difference, it may only appear in a very transparent setup, under perfect listening conditions. It looks like most of the people taking this "test" forget that they are listening to these samples distorted again by their own DAC :) It is like people judging audio setups by watching how they play on YouTube :) Or judging whether a monitor has accurate colors by watching a YouTube review of it on your laptop. It is the same level of stupidity.
 
How are you supposed to tell the difference if you have to play these samples through your own DAC / amplifier / speakers? A DAC introduces very low distortion in comparison to an amplifier/speakers, so IF there is a detectable difference, it may only appear in a very transparent setup, under perfect listening conditions. It looks like most of the people taking this "test" forget that they are listening to these samples distorted again by their own DAC :) It is like people judging audio setups by watching how they play on YouTube :)
I understand your perspective. Your emphasis on noise and distortion is indeed valid when discussing differences in DACs.

That said, I think the primary goal here is to debunk the myth regarding the different "tonalities" of DACs. Please feel free to correct me if I'm mistaken about this assumption, but this is what the OP mentioned on page 1:
This thread is meant to once and for all settle the age-old question on whether different non-broken DACs have a "sound" or if all DACs sound the same!
With this focus, I believe the tests and results are valid.
 
You call it "proper controls"? This method is seriously flawed - how are you supposed to tell the difference between these samples when you play them through another DAC/amplifier/speakers? It is almost as stupid as YouTube "reviews" of audio gear that record audio setups and expect that you can hear the difference on your computer. I'm skeptical there are differences which can be heard, but this method is stupid and doesn't prove anything. You MAY be able to tell a difference if playing through an amplifier/speakers with very low distortion and in a room with very good acoustics. With your method you won't be able to tell anything, as you cover one set of distortions (of the source DACs) with other, much greater distortions (of the recording, compression, another DAC, amplifier and speakers/headphones).

Correct me if I am wrong, but doesn't even a cheap typical dongle-IEM combo have less distortion than the best studio monitoring setup in any control room? Even with the extra AD/DA trip the signal is taking.

Edit: I am asking this because you are worried about masking distortion with distortion. But your preferred method would have MORE new distortion if I am not mistaken.

Edit2: I am on a dongle-IEM combo, just to clarify.
 
I understand your perspective. Your emphasis on noise and distortion is indeed valid when discussing differences in DACs.

That said, I think the primary goal here is to debunk the myth regarding the different "tonalities" of DACs. Please feel free to correct me if I'm mistaken about this assumption, but this is what the OP mentioned on page 1:

With this focus, I believe the tests and results are valid.
Not really. Let me give an analogy. Imagine you are doing "ABX tests" with two wide-gamut display monitors, setting up the test to see if you can tell the difference in how they reproduce colors. But to present the monitors, you use a video of them showing some pictures - a video which already compresses the color information to 8-bit - and expect people watching it on their own monitors to tell the difference. Of course they won't be able to tell any difference, because the whole test process is flawed: it can't reproduce the accuracy of the colors "end-to-end".

It is the same here. You won't be able to tell any difference, but you can't draw any conclusions from this test other than that the test is flawed. I'm not saying there are detectable differences; I'm saying this kind of test is pointless, and it may actually lead to the false conclusion that there are no differences at all when using these DACs in a well-controlled test (for example with some high-end headphones connected directly to the DACs).
 
Correct me if I am wrong but does not even a cheap typical dongle-IEM combo have less distortion than the best of studio monitoring setups in any control room? Even with the extra AD/DA trip the signal is taking.

Edit: I am asking this because you are worried about masking distortion with distortion. But your preferred method would have MORE new distortion if I am not mistaken.

Edit2: I am on a dongle-IEM combo, just to clarify.
That's correct, however you still mask the source signal with the distortion of your combo. You introduce another DAC (and its own distortions) into the path, so you can (and should) assume your own DAC introduces its own distortions, which will mask any difference from the D/A and then A/D conversions that happened before. You can't really draw any conclusions from such a test; only by testing the DACs directly with your headphones, at equal sound levels, can you draw any conclusion.
 
That's correct, however you still mask the source signal with the distortion of your combo. You introduce another DAC (and its own distortions) into the path, so you can (and should) assume your own DAC introduces its own distortions, which will mask any difference from the D/A and then A/D conversions that happened before. You can't really draw any conclusions from such a test; only by testing the DACs directly with your headphones, at equal sound levels, can you draw any conclusion.

Plugging the headphones directly into the DAC would also be testing the headphone amp performance of the DACs, yes? That would be a variable you don't want if you just want to test the D/A conversion - which is what I think we are doing, given the nature of the test.

Of course, it depends on what you want to test. But naturally, in an online test like this, the DACs are connected to a line-in impedance (10k-ish ohms?). A thing to keep in mind when thinking about which question we are looking for answers to.

edit again: I am genuinely just trying to understand what we are testing here and am not looking for an argument. I felt I needed to say this because this is the internet :D
 
Not really. Let me give an analogy. Imagine you are doing "ABX tests" with two wide-gamut display monitors, setting up the test to see if you can tell the difference in how they reproduce colors. But to present the monitors, you use a video of them showing some pictures - a video which already compresses the color information to 8-bit - and expect people watching it on their own monitors to tell the difference. Of course they won't be able to tell any difference, because the whole test process is flawed: it can't reproduce the accuracy of the colors "end-to-end".

It is the same here. You won't be able to tell any difference, but you can't draw any conclusions from this test other than that the test is flawed. I'm not saying there are detectable differences; I'm saying this kind of test is pointless, and it may actually lead to the false conclusion that there are no differences at all when using these DACs in a well-controlled test (for example with some high-end headphones connected directly to the DACs).
Your analogy is okay, but I don’t think it’s quite comparable. If you’re looking for color precision (and I admit I’m not an expert on this topic), I would guess it’s similar to frequency response or "tonality" in DACs. For instance, when comparing two different Blu-ray players on the same monitor, any differences would be noticeable on that monitor, even if it doesn’t perfectly meet color standards.

This aligns with the tests discussed in this thread, where you look to identify audible differences using personal equipment, regardless of their own coloration.

For example, if I use my Sony headphones, which have a specific frequency response, any difference in a DAC's "tonality" will still be evident, as the headphones' frequency response is a constant. The only variables in this case are the two DAC files being compared.
 
Oh, I would like to point out the 100% correct testers. Perhaps they are golden ears, or perhaps they cheated using some analysis software/gear - but even if they cheated, the difference was detectable despite the extra AD round the signals were subjected to, so at least there is no masking problem (at the extra AD stage of this test).

Unless they lucked out. But that is very unlikely.
 
That's correct, however you still mask the source signal with the distortion of your combo.

The ADC converter used in this test has a SINAD of 118 dB, while the devices under test have SINADs of 120 dB versus 91 dB. So when you replay the test files with a DAC that performs reasonably well (SINAD better than 91 dB, which is very common), there is no masking by distortion from 'your combo'. Almost any HiFi DAC performs a lot better than the FiiO Taishan D03K being tested; that's the reason this DAC was chosen for the test.
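A back-of-envelope check of this argument: if each stage's noise+distortion is uncorrelated and adds in power (a common rule of thumb, not an exact model), the 91 dB device still dominates the whole chain. The 100 dB figure for the listener's own playback DAC is an assumed example value:

```python
import math

def chain_sinad(sinads_db):
    # Combine per-stage SINAD figures (dB), assuming each stage's
    # noise+distortion is uncorrelated and adds in power.
    total_power = sum(10 ** (-s / 10) for s in sinads_db)
    return -10 * math.log10(total_power)

# Capture ADC (118 dB) + listener's own playback DAC (assumed 100 dB) + device under test.
chain_with_sota = chain_sinad([120, 118, 100])  # limited by the playback DAC, ~100 dB
chain_with_fiio = chain_sinad([91, 118, 100])   # limited by the FiiO, ~90 dB
```

The roughly 9 dB gap between the two chains survives the extra conversion stages, which is the point: the budget DAC's own distortion remains the dominant term rather than being masked.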
 
The ADC converter used in this test has a SINAD of 118 dB, while the devices under test have SINADs of 120 dB versus 91 dB. So when you replay the test files with a DAC that performs reasonably well (SINAD better than 91 dB, which is very common), there is no masking by distortion from 'your combo'. Almost any HiFi DAC performs a lot better than the FiiO Taishan D03K being tested; that's the reason this DAC was chosen for the test.
Sure, but SINAD doesn't give the whole picture - THD also plays a role, and better measurements don't always mean better perceived sound (though they are a good indicator). Anyway, with this method what you evaluate is the distortion of your headphones, amplifier, your own DAC, the A/D conversion in the middle, and one of the tested DACs. Every step introduces its own distortion, and I think we can assume that bigger distortions mask smaller ones. That's why I think that to be able to tell the difference between DACs, you must have the rest of the setup really close to perfection, so its distortions won't mask how the DACs "play".
 
For example, if I use my Sony headphones, which have a specific frequency response, any difference in a DAC's "tonality" will still be evident, as the headphones' frequency response is a constant. The only variables in this case are the two DAC files being compared.
True, but I don't think the differences between DACs are in tonality - the frequency response is almost identical across all of them, and the human ear is not very sensitive to small level changes across it and very quickly adapts to the curve. The frequency response in a room is far from linear, and we can't even tell if, say, the high-frequency band drops by 1-3 dB (which is way more than we could expect from DACs). What I would look for when comparing DACs is clarity of the sound - thinking more of how much information is lost in D/A conversion. To give an idea, we could listen to an 8-bit or 6-bit converted sample to hear how it sounds when we lose information, and then look for the same characteristic, but far, far more subtle.
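The "listen to a 6-bit version" idea is easy to try. A minimal sketch in pure Python, using a mid-tread quantizer and a 440 Hz tone as a hypothetical test signal:

```python
import math

def quantize(x, bits):
    # Round a sample in [-1, 1] to the given bit depth (mid-tread quantizer).
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

# A short stretch of a 440 Hz tone at 48 kHz, crushed to 6 bits.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(480)]
crushed = [quantize(s, 6) for s in tone]

# Worst-case quantization error is half a step, i.e. 1 / 2**bits of full scale.
max_err = max(abs(a - b) for a, b in zip(tone, crushed))
```

At 6 bits the error floor sits only about 36 dB below full scale, which makes the "lost information" character plainly audible; the idea is then to listen for the same character at a vastly smaller scale.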
 