That's why I think that to be able to tell the difference between DACs, you must have the rest of the setup really close to perfection, so its distortions won't cover how the DACs "play".
Can you specify exactly which setup you think would be 'close enough to perfection' for you to hear differences between various DACs?
Note that e.g. the $28,000/pair state-of-the-art KEF Blade 2 Meta loudspeakers (full measurements) have best-case multitone distortion of about -60dB (equivalent to only ~10 bits of resolution), rising above -40dB (~6 bits) at higher frequencies and higher volumes. Some speakers may do better, but definitely not by enough.
Basically any non-broken DAC beats this performance, including the $20 FiiO Taishan D03K from post #1.
Some headphones and IEMs can do significantly better (THD levels at about -70dB to -80dB), but again not close to the level of well-measuring DACs.
Additionally, most residential rooms won't have more than about 70-80dB (~11-13 bits) of maximum available dynamic range (limited by ambient noise at the bottom and by the human pain threshold at the top). In practice it will be even less, because most people wouldn't (shouldn't?) come close to the pain threshold while listening.
Some might use these points to argue that we shouldn't worry too much about (reasonably competently designed) DACs at all, given that most produce better than 16-bit resolution.
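For anyone who wants to check the dB-to-bits figures above, here is a minimal sketch. It assumes the standard ENOB relation, bits = (dB - 1.76) / 6.02, which appears to match the rounded values quoted in this post; the function names are just illustrative.

```python
def db_to_bits(level_db: float) -> float:
    """Approximate effective bits for a distortion level or dynamic range in dB.

    Assumption: the standard ENOB relation, bits = (dB - 1.76) / 6.02.
    The sign is ignored, so -60 dB and 60 dB give the same result.
    """
    return (abs(level_db) - 1.76) / 6.02


def bits_to_db(bits: float) -> float:
    """Inverse of db_to_bits: ideal dynamic range in dB for a given bit depth."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    # Figures taken from the post above.
    for label, level in [
        ("speaker multitone, best case", -60),
        ("speaker multitone, worst case", -40),
        ("room dynamic range, low end", 70),
        ("room dynamic range, high end", 80),
    ]:
        print(f"{label}: {level} dB is roughly {db_to_bits(level):.1f} bits")
    print(f"ideal 16-bit: about {bits_to_db(16):.1f} dB")
```

The simpler rule of thumb of about 6dB per bit gives nearly the same numbers, so the conclusion does not depend on which conversion you prefer.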
True, but I don't think differences in DACs are in tonality; the frequency response is almost identical across all of them
Except the frequency response was not the same in the two DACs used for this ABX test; please refer to the comparative measurements in post #1.
and the human ear is not very sensitive to changes in dB across it
Actually, research has shown that the human ear is quite good at identifying low-Q frequency response deviations in ABX tests with fast switching, even when they are low in level. The likely problem in this specific case is that the FR difference between these two DACs is at very high frequencies, where human hearing sensitivity is naturally lower and degrades or is lost as we get older.
Anyway, with this method what you evaluate is the distortion of your headphones, amplifier, your DAC, the A/D conversion in the middle, and one of the other DACs (tested). Every step introduces its own distortions, and I think we can assume that bigger distortions cover smaller ones.
Frequency response deviations are linear and (expressed in dB) additive, so any system with sufficient bandwidth will show the FR differences between these two recordings of DAC output.
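To illustrate the additivity point, here is a small sketch with made-up numbers. All response values below are hypothetical and are not the measurements from post #1; the only assumption is that each stage is linear, so the dB responses of cascaded stages simply add.

```python
# Hypothetical magnitude responses in dB at a few frequencies (Hz).
# None of these values are measurements; they only illustrate the arithmetic.
freqs_hz = [1000, 10000, 20000]
dac_a_db = [0.0, -0.1, -0.5]
dac_b_db = [0.0, -0.5, -3.0]
rest_of_chain_db = [0.0, -1.0, -2.0]  # ADC + amplifier + headphones combined


def cascade(*stages):
    """Overall response of linear stages in series: dB values add per frequency."""
    return [sum(values) for values in zip(*stages)]


chain_a_db = cascade(dac_a_db, rest_of_chain_db)
chain_b_db = cascade(dac_b_db, rest_of_chain_db)

# The shared part of the chain cancels out of the difference.
chain_diff_db = [a - b for a, b in zip(chain_a_db, chain_b_db)]
dac_diff_db = [a - b for a, b in zip(dac_a_db, dac_b_db)]

if __name__ == "__main__":
    for f, chain_d, dac_d in zip(freqs_hz, chain_diff_db, dac_diff_db):
        print(f"{f} Hz: chain difference {chain_d:+.2f} dB, DAC difference {dac_d:+.2f} dB")
```

Whatever the rest of the chain does to the response, it does the same to both recordings, so the difference between them is still the difference between the two DACs.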
Harmonic and non-harmonic distortions are indeed subject to masking, and (assuming a system with optimal gain-staging) distortions induced by the DAC will almost certainly be hidden by whatever headphone or loudspeaker you use, regardless of the other components in the reproduction system.
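A rough way to see why the transducer dominates is sketched below. It assumes the distortion products of the two stages are uncorrelated and so add in power, which is a simplification and not a model of psychoacoustic masking. The -60dB figure is the speaker number quoted earlier; the -110dB DAC figure is a hypothetical placeholder.

```python
import math


def power_sum_db(*levels_db: float) -> float:
    """Combine uncorrelated distortion levels (dB relative to signal) by adding powers."""
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)


speaker_db = -60.0   # best-case speaker multitone distortion quoted in this thread
dac_db = -110.0      # hypothetical well-measuring DAC, for illustration only

combined_db = power_sum_db(speaker_db, dac_db)

if __name__ == "__main__":
    print(f"speaker alone: {speaker_db:.4f} dB")
    print(f"speaker + DAC: {combined_db:.4f} dB")
    print(f"DAC contribution: {combined_db - speaker_db:.5f} dB")
```

Under these assumptions the DAC raises the total by a tiny fraction of a dB, far below anything the transducer's own distortion would let through.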
Sure, but SINAD doesn't give the whole picture; THD also plays a role
Please note that post #1 provides many more measurements than just SINAD, as well as references to the full measurement suite of both DACs and the ADC.