The DUT for this test is the DAC (the OP's objective is to compare DAC performance). So if the ADC that is part of the loopback test is fouling up the results, then you need to address that. Get a better ADC. Try to correct the ADC error. Or abandon the effort, which is what I think has happened.

All the scientists here can of course make their own assessment of the presented data, and should not be deprived of any of the possible details! (?) Given that there still doesn't seem to be adequate research into what is really detectable and what is not, it's better not to reduce the resolution of the measurements... or? That would be the scientific approach.
I assume a proper measurement setup, e.g. that the equipment used for the measurements performs significantly better than any DUT and is used in a proper manner.
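To put a rough number on "significantly better", here is a minimal sketch of how much the measuring ADC inflates a noise reading in a loopback test. It assumes the DAC and ADC noise are uncorrelated, so their powers simply add; that assumption does not hold for distortion products, which can add or partially cancel depending on phase. The function names `measured_level_db` and `inflation_db` are made up for illustration.

```python
import math


def measured_level_db(dut_db: float, analyzer_db: float) -> float:
    """Combined reading in dB when two uncorrelated noise sources add in power.

    Both inputs are levels in dB relative to the same reference (e.g. dBFS).
    Assumption: uncorrelated noise only, not harmonic distortion.
    """
    total_power = 10 ** (dut_db / 10) + 10 ** (analyzer_db / 10)
    return 10 * math.log10(total_power)


def inflation_db(margin_db: float) -> float:
    """How many dB the reading sits above the true DUT level.

    margin_db is how much lower the analyzer's own noise is than the DUT's.
    """
    return 10 * math.log10(1 + 10 ** (-margin_db / 10))


if __name__ == "__main__":
    # Print the reading error for a few analyzer margins.
    for margin in (0, 6, 10, 20):
        print(f"analyzer {margin:>2} dB better -> reading is "
              f"{inflation_db(margin):.2f} dB too high")
```

Under these assumptions an ADC that is only as good as the DAC makes the reading about 3 dB too high, while a 10 dB margin leaves an error of roughly 0.4 dB and a 20 dB margin well under 0.1 dB.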
Skip any single "figure of merit"; it will not work. It is better to keep educating us on the topic, so that we are able to analyse a full set of measurements and draw conclusions from that. The correlation to SQ is left TBD.
//