
Why DA/AD diff tests are not getting the attention they deserve in measurements

FINFET

Active Member
Forum Donor
Joined
Jul 27, 2022
Messages
113
Likes
202
Many of you may know the famous DA-AD loopback tests from Gearslutz by didier.brest. The goal is to test how much a music file changes after going through one DA-AD loop on a particular audio interface. People contribute recordings made with their own devices, and didier.brest does the diff analysis and posts the results representing the differences between the two wav files. pkane here also did an analysis using weighted metrics and reproduced many of the results. AP, REW, RMAA, and Multitone Analyzer all use some form of ADDA loopback to obtain their results, but I find that whole-music-file difference/null tests sometimes tell a different story. For example, the 20-year-old MOTU 2408 can outperform some AD/DA pairs with far better measurement scores in this test, as shown in both the Gearslutz thread and pkane's results (I'd appreciate it if someone could explain why. Thanks!)
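For readers who haven't seen such a test, here is a minimal sketch of the basic null-test idea (my own illustration, not didier.brest's actual Matlab code; the file names are placeholders): align the loopback capture to the reference, level-match, subtract, and report the residual in dB.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

fs_ref, ref = wavfile.read("reference.wav")   # placeholder file names
fs_cap, cap = wavfile.read("loopback.wav")
assert fs_ref == fs_cap, "sample rates must match"

ref = ref.astype(np.float64)
cap = cap.astype(np.float64)
if ref.ndim > 1:
    ref = ref[:, 0]   # use the first channel for simplicity
if cap.ndim > 1:
    cap = cap[:, 0]

# Coarse alignment: find the integer-sample lag that maximizes the
# cross-correlation between the two files.
lag = int(np.argmax(correlate(cap, ref, mode="full"))) - (len(ref) - 1)
cap = cap[lag:] if lag > 0 else np.concatenate([np.zeros(-lag), cap])
n = min(len(ref), len(cap))
ref, cap = ref[:n], cap[:n]

# Least-squares level match, then null.
gain = np.dot(cap, ref) / np.dot(cap, cap)
residual = ref - gain * cap
null_db = 20 * np.log10(np.sqrt(np.mean(residual**2)) /
                        np.sqrt(np.mean(ref**2)))
print(f"null depth: {null_db:.1f} dB")
```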

I find these results fascinating as they use actual music files instead of single or multiple tones, can be applied to real studio scenarios (where audio files actually do go through ADDA multiple times in order to use analog hardware), and are reproducible with relatively cheap measuring devices and simpler software. The results are also better suited to persuading audiophiles to give up their fantasies and trust a transparent device. But it seems not many reviewers use such benchmarks in their measurements.

AD/DAs don't have to be measured in pairs as didier.brest did in the Gearspace thread (most people there are audio engineers, so they care more about the performance of the whole AD/DA package within one device). But I'd really love to see each DAC output the same wav file into the same reference-level ADC, such as an AP or the more affordable E1DA Cosmos ADC, and then compare the differences in that controlled test. That could be very interesting to see, IMO.

First new thread post here, love this place!
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
2,690
Likes
6,013
Location
Berlin, Germany
Sorry to disappoint you, but that test is very broken by design and does NOT test what we are inclined to think it would. That was pointed out in that thread many times, by myself, by ASR member @pkane, and by various others. Nice effort, but completely futile and utterly misleading, actually spreading fake news about device audio quality!

The problem is that the test method is incredibly prone to being dominated by minute, simple, linear, and completely benign and inaudible frequency response differences (actually, by the microscopic phase shifts associated with them). This test does NOT test for the lowest alteration of the signal (such as from noise and distortion); rather, it mostly tests for a flat frequency (and thus flat phase) response at irrelevant -- ultra-subsonic -- frequencies.
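A quick way to convince yourself of this is a toy sketch (my own illustration, assuming a first-order 2 Hz highpass, the kind of benign subsonic filter found in effectively every ADC input stage): the filter barely changes the level, yet its phase shift alone wrecks the raw null.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 48000
t = np.arange(fs * 5) / fs
x = 0.5 * np.sin(2 * np.pi * 20.0 * t)         # 20 Hz content, worst case

# Benign, inaudible subsonic highpass at 2 Hz.
b, a = butter(1, 2.0 / (fs / 2), btype="highpass")
y = lfilter(b, a, x)

resid = x - y
null_db = 20 * np.log10(np.sqrt(np.mean(resid**2)) / np.sqrt(np.mean(x**2)))
gain_db = 20 * np.log10(np.sqrt(np.mean(y**2)) / np.sqrt(np.mean(x**2)))
print(f"level change from the filter:  {gain_db:+.3f} dB")  # ~ -0.04 dB
print(f"raw null against the original: {null_db:.1f} dB")   # ~ -20 dB
```

A -0.04 dB level change is utterly inaudible, yet the raw null stalls at about -20 dB because of the ~6 degrees of phase shift the filter leaves at 20 Hz.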

A good example is the comparison between my modded RME ADI-2 Pro (modded for DC-coupled input) and the RME Babyface. The ADI-2 Pro achieves its stellar score mainly through the absence of the highpass filters that are normally present (in effectively every ADC), whereas the Babyface, with its ADC highpass filter at a rather high frequency (in relative terms), places way down the list because of that and nothing else. It would place just as badly even if everything else were 10 times better than in any other tested DAC/ADC...

Also, one and the same ADI-2 Pro scores very differently depending on the highpass filter settings (there are two such filters, for the DAC and ADC sections). Obviously, sound quality is not affected in the slightest by that, but Didier's test fails to factor out these irrelevant microscopic linear changes.
 

storing

Active Member
Forum Donor
Joined
Aug 27, 2021
Messages
226
Likes
220
I find these results fascinating as they use actual music files instead of single or multiple tones
Personally, I'd rather have them more complete and informative, e.g. by plotting amplitude vs. frequency for sine sweeps or multitone so that one can see whether there are frequency-dependent effects, instead of one single number. And I'd like to read more about the exact methods used. For example: how exactly does his Matlab code compensate for the I/O latency? Being even one sample off would change the results. And one final nitpick: this isn't a DA/AD diff test but an audio player->OS->DAC->ADC->audio recorder test, i.e. it can be a DA/AD diff test only if none of the other components alter the sound in any way.
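To put a rough number on the one-sample concern, here is a small sketch (my own illustration):

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)       # plain 1 kHz tone

resid = x[1:] - x[:-1]                       # a one-sample misalignment
null_db = 20 * np.log10(np.sqrt(np.mean(resid**2))
                        / np.sqrt(np.mean(x[1:]**2)))
print(f"null with a 1-sample offset: {null_db:.1f} dB")
# For a sine at frequency f the residual is 2*sin(pi*f/fs) of the signal
# level: about -17.7 dB at 1 kHz, and worse still at higher frequencies.
```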

can be applied to real studio scenarios
Perhaps that's one of the reasons this doesn't get more attention here: as far as I'm aware, the majority of users aren't in real studio scenarios.
 
OP

FINFET

Active Member
Forum Donor
Joined
Jul 27, 2022
Messages
113
Likes
202
Sorry to disappoint you, but that test is very broken by design and does NOT test what we are inclined to think it would. That was pointed out in that thread many times, by myself, by ASR member @pkane, and by various others. Nice effort, but completely futile and utterly misleading, actually spreading fake news about device audio quality!
Thank you very much for the detailed explanation. I may need more time to investigate and understand everything you said. Sorry if I posted something that has already been settled many times here. I'd appreciate it if you could point me to threads where you or other members discussed this, so I can educate myself.
incredibly prone to being dominated by minute, simple, linear, and completely benign and inaudible frequency response differences
Do you think the PK metric from @pkane (or some other modified/improved metric) could somehow compensate for this by weighting the relevant and irrelevant microscopic parts, to improve the validity of this method?

Another question: can going through the loop multiple times increase the "signal-to-noise" of this measurement? We can actually hear audible differences after looping 400 times, so different DACs would then perform differently in a meaningful way. Thanks for your time!
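As a toy model of the looping idea (my own sketch, assuming each DA/AD pass adds independent noise at -100 dBFS; not a measurement of any real device): the accumulated error grows by 10*log10(N), so 400 passes raise it by about 26 dB, which is why generational loops make small differences easier to hear and to measure.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 48000
x = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)

noise_rms = 10 ** (-100 / 20)        # assumed per-pass converter noise
for n_loops in (1, 8, 400):
    y = x.copy()
    for _ in range(n_loops):         # each pass adds fresh noise
        y = y + noise_rms * rng.standard_normal(len(y))
    err_db = 20 * np.log10(np.sqrt(np.mean((y - x) ** 2))
                           / np.sqrt(np.mean(x ** 2)))
    print(f"{n_loops:4d} passes: error {err_db:6.1f} dB")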
 
OP

FINFET

Active Member
Forum Donor
Joined
Jul 27, 2022
Messages
113
Likes
202
Personally, I'd rather have them more complete and informative, e.g. by plotting amplitude vs. frequency for sine sweeps or multitone so that one can see whether there are frequency-dependent effects, instead of one single number. And I'd like to read more about the exact methods used. For example: how exactly does his Matlab code compensate for the I/O latency? Being even one sample off would change the results. And one final nitpick: this isn't a DA/AD diff test but an audio player->OS->DAC->ADC->audio recorder test, i.e. it can be a DA/AD diff test only if none of the other components alter the sound in any way.
I agree that sweep and sine-wave responses already contain most of the information we need. It's just that people may still consider a music file far more complicated than a sweep (one tone at a time) or multitone. The fact that different DACs convert a music data stream similarly might be more persuasive to people outside the measurement community. Also, I didn't know that the playback and recording software could be relevant.
Perhaps that's one of the reasons this doesn't get more attention here: as far as I'm aware, the majority of users aren't in real studio scenarios.
Yeah, you are right. They are more interested in the changes to a track that goes through ADDA multiple times, where transparency can be crucial. But most home audio users only need the music to go through a DAC once.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,523
Likes
37,056
That whole Gearspace thread unfortunately ends up misguided. It is mainly the phase differences of the different filters that you end up testing for, and that doesn't tell you which devices are more accurate in the ways that matter. (I see I have mostly restated, less well, the same ideas as KSTR.)

Now, Paul has developed lots of useful features in DeltaWave. In its simplest form it gives the same results as the Matlab code at Gearspace. It can also show you phase differences and a spectrum of the difference, and play the difference file with adjustable amplification so it can be heard. It also aligns the files in time to within less than 1/1000th of a sample. It shows frequency response differences too. Then you have the option to let it correct for phase and frequency response, so that the differences left are there for other reasons. Paul has a chart of all the devices listed at Gearspace with different levels of correction applied and the resulting nulls. The files from those devices at Gearspace can be downloaded by anyone.
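DeltaWave's actual alignment algorithm isn't described in this thread; a common way to achieve sub-sample alignment, sketched here purely as an illustration (not as DeltaWave's implementation), is a coarse integer alignment from the cross-correlation peak followed by a fractional refinement from the phase slope of the cross-spectrum, which for a pure delay d is -2*pi*f*d.

```python
import numpy as np
from scipy.signal import correlate

def integer_lag(ref, cap):
    """Integer-sample lag from the cross-correlation peak."""
    c = correlate(cap, ref, mode="full")
    return int(np.argmax(c)) - (len(ref) - 1)

def fractional_lag(ref, cap):
    """Sub-sample lag, assuming less than +/-0.5 samples remain."""
    cross = np.fft.rfft(cap) * np.conj(np.fft.rfft(ref))
    f = np.fft.rfftfreq(len(ref))               # cycles/sample
    phase = np.unwrap(np.angle(cross))
    w = np.abs(cross)                           # weight by energy
    return -np.sum(w * f * phase) / np.sum(w * f * f) / (2 * np.pi)

def apply_delay(x, d):
    """Delay x by d samples (fractional allowed) via an FFT phase ramp."""
    f = np.fft.rfftfreq(len(x))
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * d), len(x))

# Self-test: recover a 3.25-sample delay applied to a noise signal.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1 << 14)
cap = apply_delay(ref, 3.25)
k = integer_lag(ref, cap)                       # -> 3
cap_coarse = apply_delay(cap, -k)               # undo the integer part
print(f"total delay: {k + fractional_lag(ref, cap_coarse):.3f} samples")
```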


So this type of testing is very useful, but it has to be properly used and interpreted; otherwise it leads to wrong ideas.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,388
Location
Seattle Area
Many of you may know the famous DA-AD loopback tests from Gearslutz by didier.brest. The goal is to test how much a music file changes after going through one DA-AD loop on a particular audio interface.
Why do you care about the combined performance of these two subsystems? If you are producing music, then the ADC is what matters, not the DAC. If you are playing music, then the DAC matters, not the ADC.

As noted, it is *extremely* non-trivial to create proper null/difference tests. I have at times spent days trying to get what I know to be correct results, and failing.

Even if you get correct null results, interpretation is impossible. You don't know what is or is not audible, since you have taken the masking signal out of the source.
 
OP

FINFET

Active Member
Forum Donor
Joined
Jul 27, 2022
Messages
113
Likes
202
That whole Gearspace thread unfortunately ends up misguided. It is mainly the phase differences of the different filters that you end up testing for, and that doesn't tell you which devices are more accurate in the ways that matter.

Now, Paul has developed lots of useful features in DeltaWave. In its simplest form it gives the same results as the Matlab code at Gearspace. It can also show you phase differences and a spectrum of the difference, and play the difference file with adjustable amplification so it can be heard. It also aligns the files in time to within less than 1/1000th of a sample. It shows frequency response differences too. Then you have the option to let it correct for phase and frequency response, so that the differences left are there for other reasons. Paul has a chart of all the devices listed at Gearspace with different levels of correction applied and the resulting nulls. The files from those devices at Gearspace can be downloaded by anyone.


So this type of testing is very useful, but it has to be properly used and interpreted; otherwise it leads to wrong ideas.
Thanks for the reply. I love DeltaWave and Paul's work on comparing music files. I find that listening to the generated delta file often yields a very quiet version of the original music. That's when I started to realize what a difficult task a music null test is, even with levels matched and timing carefully aligned in DeltaWave.
 
OP

FINFET

Active Member
Forum Donor
Joined
Jul 27, 2022
Messages
113
Likes
202
Why do you care about the combined performance of these two subsystems? If you are producing music, then the ADC is what matters, not the DAC. If you are playing music, then the DAC matters, not the ADC.

As noted, it is *extremely* non-trivial to create proper null/difference tests. I have at times spent days trying to get what I know to be correct results, and failing.

Even if you get correct null results, interpretation is impossible. You don't know what is or is not audible, since you have taken the masking signal out of the source.
Thank you, Amir, for the explanation. I think people at Gearspace care about the combination of the two subsystems because they have to choose a main rack audio interface for their studio as an ADDA pair for many operations in their DAW. For example, they'll send a music track out to an 1176 analog compressor and record the result back into the DAW to capture the signature of that analog device, so an ADDA pair test makes sense there. Other people only care about DAC or ADC performance separately, so I assume one can use a fixed ADC or DAC to measure the other devices: for example, an E1DA Cosmos ADC to measure all DACs, and a Topping D90SE to measure all ADCs.

This thread has made me start to understand that the task is non-trivial, and that the difficulty of a solid comparison could be beyond my imagination. It will be exciting to see whether, with certain improvements, the method can be proven valid in the future.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,368
Likes
234,388
Location
Seattle Area
For example, they'll send a music track out to an 1176 analog compressor and record the result back into the DAW to capture the signature of that analog device, so an ADDA pair test makes sense there.
In such cases, the effect of the compressor, etc. dwarfs anything the DAC does.
 
OP

FINFET

Active Member
Forum Donor
Joined
Jul 27, 2022
Messages
113
Likes
202
In such cases, the effect of the compressor, etc. dwarfs anything the DAC does.
You are right. The analog device's random noise floor will dominate and swamp any efforts elsewhere in the chain. But just as with the highest SINAD we chase in DACs here, that difference may not be audible at all. They are just similar people pursuing the cleanest ADDA pair as proof of product quality, or as a symbol of the manufacturer's technical prowess. There are no controlled blind tests proving any valid practical use of this in music production, unless the track goes through the ADDA hundreds of times. Not to mention there might be some fatal flaws in their methodology.

Edit: I may have misunderstood your meaning. Yes, the compressor, hardware EQ, etc. will change the waveform, so it's impossible to do any null test with such things in the middle. The loop (connected only by wires) is just the simplest case.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,523
Likes
37,056
You are right. The analog device's random noise floor will dominate and swamp any efforts elsewhere in the chain. But just as with the highest SINAD we chase in DACs here, that difference may not be audible at all. They are just similar people pursuing the cleanest ADDA pair as proof of product quality, or as a symbol of the manufacturer's technical prowess. There are no controlled blind tests proving any valid practical use of this in music production, unless the track goes through the ADDA hundreds of times. Not to mention there might be some fatal flaws in their methodology.

Edit: I may have misunderstood your meaning. Yes, the compressor, hardware EQ, etc. will change the waveform, so it's impossible to do any null test with such things in the middle. The loop (connected only by wires) is just the simplest case.
I've got a couple of threads where I did 8th-generation copies, looping through AD to DA and back again, in case you are curious. Fairly modest gear. It shows how little degradation there is, and if I were to do it now the measured degradation would be considerably less. You get to hear the original file and the 8th-generation result.


 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
2,690
Likes
6,013
Location
Berlin, Germany
Do you think the PK metric from @pkane (or some other modified/improved metric) could somehow compensate for this by weighting the relevant and irrelevant microscopic parts, to improve the validity of this method?
In DW you can set up options so that it tries to compensate for the differences that come from "trivial" linear errors (the "non-linear calibration" section in the settings, a bit of a misleading name IMHO).
This, together with the concept of the PK metric, usually gives a good indicator.
Of course, at some point a large enough magnitude frequency response difference, say more than 0.1 dB, may become audible and should affect the rating; that is, it should not be factored out.
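To put a number on that 0.1 dB remark (simple arithmetic I'm adding, not from the thread): even a perfectly clean device with a broadband 0.1 dB level mismatch cannot null deeper than about -38.7 dB.

```python
import math

# A broadband level mismatch of 0.1 dB leaves a residual of
# (10**(0.1/20) - 1) ~ 1.16% of the signal amplitude, which caps the
# null depth no matter how clean the converters are.
residual = 10 ** (0.1 / 20) - 1
print(f"best possible null: {20 * math.log10(residual):.1f} dB")  # ~ -38.7 dB
```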
 

Grooved

Addicted to Fun and Learning
Joined
Feb 26, 2021
Messages
679
Likes
441
In such cases, the effect of the compressor, etc. dwarfs anything the DAC does.
It would still be interesting to capture as much as possible of the effect of any external processing without any effect coming from the DAC-ADC chain.
The method may not be right, but trying to get results on this is still interesting, even if not everybody needs it.
Someone who produces inside the DAW and mixes "in the box" only needs a good DAC. Then there's the possibility that the mastering will use analog external processing, in which case a DAC-ADC pair is needed.
If external processing is used during mixing, it also needs the DAC-ADC loop.
The point is to be sure you hear as much as possible of the effect of the external gear, and not the external gear plus whatever loss the DAC-ADC combo adds.
In DW you can set up options so that it tries to compensate for the differences that come from "trivial" linear errors (the "non-linear calibration" section in the settings, a bit of a misleading name IMHO).
This, together with the concept of the PK metric, usually gives a good indicator.
Of course, at some point a large enough magnitude frequency response difference, say more than 0.1 dB, may become audible and should affect the rating; that is, it should not be factored out.
I trust DeltaWave more than the Matlab script from Gearspace for that, but still, the results hold some surprises, like the 20-year-old MOTU 24i/o, 2408 MK3, and 828 MK2 producing very good PK metric scores.

There's a point I'm not sure about, and maybe you can help with it: the result in pkane's chart for the 828 MK2 isn't right because of the file used. It used the first file in the Gearspace thread, which is the only one recorded with an unbalanced cable.
I ran the test with an 828 MK2 and got a -109 dB PK Metric using a balanced cable (around the same value as the 24i/o and 2408 MK3, which is logical as they share the same chips, and just behind the Babyface Pro at -110 dB). But the simplest test in DeltaWave still gives the same difference.
It's only once you enable Level EQ, Phase EQ, and Non-linear drift correction that there is a big difference between unbalanced and balanced.

Where does this difference come from?
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,632
Likes
10,205
Location
North-East
There's a point I'm not sure about, and maybe you can help with it: the result in pkane's chart for the 828 MK2 isn't right because of the file used. It used the first file in the Gearspace thread, which is the only one recorded with an unbalanced cable.
I ran the test with an 828 MK2 and got a -109 dB PK Metric using a balanced cable (around the same value as the 24i/o and 2408 MK3, which is logical as they share the same chips, and just behind the Babyface Pro at -110 dB). But the simplest test in DeltaWave still gives the same difference.
It's only once you enable Level EQ, Phase EQ, and Non-linear drift correction that there is a big difference between unbalanced and balanced.

Where does this difference come from?
Non-linear EQ in DeltaWave corrects for many sins in the loopback chain, including phase differences and even some simple jitter errors that result from periodic tones modulating the clock. It can also correct for differences in frequency response caused by the reconstruction filter. When the simple DW configuration (without non-linear correction) produces the same result as with linear correction, that means no periodic amplitude/phase differences were detected, or perhaps the data was too noisy to find a pattern that could be corrected. Sometimes a longer recording can help overcome this, as it allows more data to be averaged.
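DeltaWave's correction internals aren't spelled out here, but the gist of factoring out a linear response can be sketched as follows (my own simplified stand-in, not DeltaWave's algorithm): estimate the transfer function H(f) from averaged cross- and auto-spectra, divide it out of the capture, and null again. The benign subsonic highpass from KSTR's example then largely drops out of the residual.

```python
import numpy as np
from scipy.signal import butter, lfilter, csd, welch

fs = 48000
rng = np.random.default_rng(2)

# Stand-in "music": white noise with no subsonic content.
x = 0.1 * lfilter(*butter(4, 20 / (fs / 2), "highpass"),
                  rng.standard_normal(fs * 4))
# Loopback model: a benign 2 Hz ADC highpass plus converter noise.
y = lfilter(*butter(1, 2 / (fs / 2), "highpass"), x) \
    + 1e-5 * rng.standard_normal(len(x))

def null_db(ref, test):
    r = ref - test
    return 20 * np.log10(np.sqrt(np.mean(r**2)) / np.sqrt(np.mean(ref**2)))

# H1 estimator H = Pxy/Pxx, averaged over segments so that noise which is
# uncorrelated with the reference stays out of the estimate.
f, Pxy = csd(x, y, fs=fs, nperseg=1 << 16)
_, Pxx = welch(x, fs=fs, nperseg=1 << 16)
H = Pxy / Pxx

# Interpolate H onto the full-length FFT grid and divide it out of y.
fg = np.fft.rfftfreq(len(x), 1 / fs)
Hg = np.interp(fg, f, H.real) + 1j * np.interp(fg, f, H.imag)
Hg[~np.isfinite(Hg) | (np.abs(Hg) < 1e-3)] = 1.0   # guard empty bins
y_corr = np.fft.irfft(np.fft.rfft(y) / Hg, len(x))

print(f"raw null:       {null_db(x, y):6.1f} dB")      # filter-dominated
print(f"corrected null: {null_db(x, y_corr):6.1f} dB")  # tens of dB deeper
```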
 

Grooved

Addicted to Fun and Learning
Joined
Feb 26, 2021
Messages
679
Likes
441
Non-linear EQ in DeltaWave corrects for many sins in the loopback chain, including phase differences and even some simple jitter errors that result from periodic tones modulating the clock. It can also correct for differences in frequency response caused by the reconstruction filter. When the simple DW configuration (without non-linear correction) produces the same result as with linear correction, that means no periodic amplitude/phase differences were detected, or perhaps the data was too noisy to find a pattern that could be corrected. Sometimes a longer recording can help overcome this, as it allows more data to be averaged.
Thanks @pkane, but is it normal to get such a difference between a -109 dB PK Metric balanced and -89 dB unbalanced with the same device? (Both tests with linear correction on.)
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,632
Likes
10,205
Location
North-East
Thanks @pkane, but is it normal to get such a difference between a -109 dB PK Metric balanced and -89 dB unbalanced with the same device? (Both tests with linear correction on.)

Compare the balanced and unbalanced captures and look at the spectrum differences. PK Metric will be worse if there are significant differences in the audible range, since PK Metric is weighted by the equal-loudness curves.
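The exact PK Metric formula isn't given in this thread; as a rough stand-in (standard A-weighting in place of the equal-loudness weighting pkane describes, purely for illustration), one can weight the residual spectrum before computing its level. The same residual energy is penalized far less at 60 Hz, typical of ground-loop hum, than at 3 kHz.

```python
import numpy as np

def a_weight(f):
    """IEC 61672 A-weighting magnitude as linear gain (0 dB at 1 kHz)."""
    f = np.asarray(f, dtype=float)
    ra = (12194.0 ** 2 * f ** 4) / (
        (f ** 2 + 20.6 ** 2)
        * np.sqrt((f ** 2 + 107.7 ** 2) * (f ** 2 + 737.9 ** 2))
        * (f ** 2 + 12194.0 ** 2))
    return ra * 10 ** (2.0 / 20)

fs = 48000
t = np.arange(fs * 2) / fs
f = np.fft.rfftfreq(len(t), 1 / fs)
for freq in (60.0, 3000.0):
    resid = 1e-4 * np.sin(2 * np.pi * freq * t)   # same residual energy
    spec = np.fft.rfft(resid) / len(resid)
    level = 20 * np.log10(
        np.sqrt(2 * np.sum(np.abs(a_weight(f) * spec) ** 2)))
    print(f"residual at {freq:6.0f} Hz -> weighted level {level:7.1f} dB")
```

The two prints differ by roughly 27 dB, even though the unweighted residual level is identical.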
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,523
Likes
37,056
Thanks @pkane , but is it normal to get such a difference between -109dB PK Metric in balanced and -89dB in unbalanced, with the same device? (both tests with linear correction on)
My first guess would be a little more hum on the unbalanced connection. With nulls of -89 dB and lower, it takes very little to worsen a null. As Paul said, it should show up in FR or maybe phase (though I don't know why there would be a phase difference between balanced and unbalanced).

So what are the differences if you null these two file captures against each other?
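A quick way to check the hum hypothesis (a sketch with placeholder file names; hum only shows up this way if it pokes above the program material in these narrow bands, so quiet passages help) is to compare each capture's power in narrow bands around the mains harmonics:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

for name in ("balanced.wav", "unbalanced.wav"):   # placeholder file names
    fs, x = wavfile.read(name)
    x = x.astype(np.float64)
    if x.ndim > 1:
        x = x[:, 0]
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak                              # normalize to full scale
    f, p = welch(x, fs=fs, nperseg=1 << 16)
    df = f[1] - f[0]
    for h in (50, 100, 150, 60, 120, 180):        # 50 Hz and 60 Hz mains
        band = (f > h - 2) & (f < h + 2)
        level = 10 * np.log10(np.sum(p[band]) * df + 1e-30)
        print(f"{name}: {h:3d} Hz band at {level:6.1f} dBFS")
```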
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,632
Likes
10,205
Location
North-East
My first guess would be a little more hum on the unbalanced connection. With nulls of -89 dB and lower, it takes very little to worsen a null. As Paul said, it should show up in FR or maybe phase (though I don't know why there would be a phase difference between balanced and unbalanced).

So what are the differences if you null these two file captures against each other?

The likely source of the difference is a simple (or complex) ground loop in the single-ended connection, which the balanced connection eliminates.
 