
E1DA 9038D performance according to df-metric

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,132
Likes
36,721
Location
The Netherlands
The ratings from the Df metric look to me like a random number generator.
For example, the Sony NWA105 got a median Df of -71.4 dB and the E1DA 9038 got -42.2 dB.

That's what the OP found to be the case, and all the 'perhaps' qualifiers in his post point to the OP wondering about the why. It looks like the OP did not find an explanation but wanted 'ASR' to know.

I wonder if the same thing would occur with the PK metric?

It would also be nice if someone else repeated the measurements, to see whether this is generic or a single device.
In any case it is a 'null test' with a poorer result than most others, and it is unknown why.
 

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,733
Likes
6,056
Location
US East
That's what the OP found to be the case, and all the 'perhaps' qualifiers in his post point to the OP wondering about the why. It looks like the OP did not find an explanation but wanted 'ASR' to know.
Alright. Here is my attempt. Below are excerpts from Serge's AES convention paper, retrieved from SoundExpert.org. The Df metric is basically computed from the Pearson correlation coefficient ρ(x,y) of the samples of two time-aligned signals, with Df = sqrt( 1 - abs( ρ(x,y) ) )

As was noted, Df is very sensitive to clock differences between the DAC and ADC (i.e., a small frequency shift in the sampled signal due to clock frequency differences between the DAC and ADC). There is no indication that the "efficient resampling method" to mitigate this problem has been implemented. I am guessing that the current mitigation is to take relatively short time slices and time-align each of those slices.

[attachments df_1.png, df_2.png: excerpts from the AES paper]

From the E1DA picture, I am guessing each time slice is 400 ms wide (notice the Tw = 400 ms).
[attachment E1DA-9038D.png: Df measurement of the E1DA 9038D, Tw = 400 ms]
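If that guess is right, the mitigation is easy to sketch. Below is a minimal toy version (entirely my own construction, not Serge's published algorithm): split the capture into 400 ms slices, re-align each slice to the reference by cross-correlation, and compute Df per slice. Note that alignment cannot remove the drift that accumulates within a slice, which is what my simulation further down quantifies.
Python:
import numpy as np
from scipy.signal import correlate
from scipy.stats import pearsonr

# Toy sketch of per-slice time alignment (my guess at the mitigation,
# not Serge's published method).
fs = 96000
slice_len = int(0.4 * fs)          # Tw = 400 ms per slice
n_slices = 10
clock_error = 1e-5                 # 10 PPM DAC/ADC clock offset
f_sig = 1000.0

t = np.arange(n_slices * slice_len) / fs
ref = np.sin(2*np.pi*f_sig*t)                       # original signal
rec = np.sin(2*np.pi*(1.0 + clock_error)*f_sig*t)   # captured signal

for k in range(n_slices):
    sl = slice(k*slice_len, (k + 1)*slice_len)
    x = ref[sl]
    # integer-sample lag that best aligns this slice with the reference
    lag = np.argmax(correlate(rec[sl], x, mode='full')) - (slice_len - 1)
    y = np.roll(rec, -lag)[sl]     # wrap-around at the array ends is ignored here
    rho = pearsonr(x, y)[0]
    print(f'slice {k}: lag {lag:+d} samples, '
          f'Df = {20*np.log10(np.sqrt(1.0 - abs(rho))):.1f} dB')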


Here are my simulations of the Df value of single-frequency sine tones (from 20 Hz to 20 kHz) captured with DAC/ADC clock offsets. The typical accuracies of crystal oscillators are in the tens of PPM. So there you go.
[attachment df_plot.png: simulated Df vs. sine frequency for several clock offsets]

[Edit] Added the Python code that generates the above plot so that others can check my errors.
Python:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

def dB(x):
    # tiny offset keeps log10 finite when the argument is exactly zero
    return 20.0*np.log10(np.abs(x) + 5.0*np.finfo(float).eps)

def Df(x, y):
    # Df = sqrt(1 - |rho(x, y)|), expressed in dB
    return dB(np.sqrt(1.0 - np.abs(pearsonr(x, y).correlation)))

duration = 0.4
fs = 96000
nsamples = int(duration * fs)
t = np.arange(nsamples) / fs

clk_error_list = np.array([1e-4, 1e-5, 1e-6, 1e-7])
frequencies = np.logspace(np.log10(20), np.log10(20480), 201)

df_list = np.empty((np.size(clk_error_list), np.size(frequencies)))

for indx1, clock_error in enumerate(clk_error_list):
    for indx2, f_sig in enumerate(frequencies):
        signal_1 = np.sin(2*np.pi*f_sig * t)
        signal_2 = np.sin(2*np.pi*(1.0+clock_error)*f_sig * t)
        df_list[indx1, indx2] = Df(signal_1, signal_2)

fig, ax = plt.subplots(figsize=(8, 5))
for indx, clock_error in enumerate(clk_error_list):
    ax.semilogx(frequencies, df_list[indx, :], label='{:.2f} PPM'.format(1e6*clock_error))
   
ax.grid(True, which='both', axis='both')
ax.legend(loc='upper left')
ax.set_title('Df of Single Frequency Sine Waves Reproduced with Clock Error\n(Sample Duration 400 ms, Fs=96000 Hz)')
ax.set_xlabel('Sine Wave Frequency [Hz]')
ax.set_ylabel('Df [dB]')
fig.tight_layout()
plt.savefig('df_plot.png')
plt.show()
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,740
Likes
10,472
Location
North-East
I wonder if the same thing would occur with the PK metric?

No need to wonder :) The DF metric has been computed by DeltaWave for years, so it's fairly easy to compare it to the other error values computed by DW, including the PK metric.

The DF metric is much more like the RMS of the null value than the PK metric, in that it doesn't use any psychoacoustics in the computation; instead, it computes a short-time correlation coefficient between two waveforms (similar to the difference, but a different and less obvious computation). It's an engineering metric based on the whole, unweighted frequency spectrum, just like the RMS null is. For this reason, phase differences, low- or high-frequency differences, DC filters, etc. can all affect the result of both the DF metric and the RMS of the null.

The PK metric is a psychoacoustically weighted version of the difference (null) file that takes into account equal-loudness curves at various levels, and frequency and level masking. The DF metric does no such thing. After conversations with Serge here and on other fora, I have not seen any explanation of why this metric might be appropriate or better than any other, including the RMS of the null file.
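As a quick illustration of that last point, here's a toy example (mine, not DeltaWave's code): a 1 kHz tone reproduced with a constant 1-degree phase error, which few would call audible, already lands both unweighted numbers in the -30s:
Python:
import numpy as np
from scipy.stats import pearsonr

fs = 96000
t = np.arange(int(0.4 * fs)) / fs
x = np.sin(2*np.pi*1000*t)                    # reference tone
y = np.sin(2*np.pi*1000*t + np.deg2rad(1.0))  # same tone, 1 degree phase error

# RMS of the null (difference) relative to the reference, in dB
rms_null = 20*np.log10(np.linalg.norm(y - x) / np.linalg.norm(x))
# Df from the Pearson correlation coefficient
df = 20*np.log10(np.sqrt(1.0 - abs(pearsonr(x, y)[0])))
print(f'RMS of null: {rms_null:.1f} dB   Df: {df:.1f} dB')  # about -35 and -38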
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,132
Likes
36,721
Location
The Netherlands
The question remains whether the findings would be the same (is it better because of the weighting?).
And the other question also remains: what could be the cause, and is it audible?
I mean, if it is just amplitude variations on the order of 0.03 dB over time, for instance, no one is going to notice. Small phase shifts over a wide bandwidth are not audible either.
So the question (which has been raised in the past) is whether the relation between audible and measurable, which was the goal after all, is demonstrated here.
The other question also remains: what is causing the findings?
That's why it would be interesting for someone else to repeat the test, with an E1DA under load, comparing DF and PK.
Of course the ADC will most likely differ, and it may react differently to HF noise etc.

I mean, it is just one measurement of one device on one ADC, not under the same circumstances. The fact that Serge only 'suspects' and does not come up with an actual explanation in his report tells me the reporting is incomplete.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,132
Likes
36,721
Location
The Netherlands
Alright. Here is my attempt. Below are excerpts from Serge's AES convention paper, retrieved from SoundExpert.org. The Df metric is basically computed from the Pearson correlation coefficient ρ(x,y) of the samples of two time-aligned signals, with Df = sqrt( 1 - abs( ρ(x,y) ) )

As was noted, Df is very sensitive to clock differences between the DAC and ADC (i.e., a small frequency shift in the sampled signal due to clock frequency differences between the DAC and ADC). There is no indication that the "efficient resampling method" to mitigate this problem has been implemented. I am guessing that the current mitigation is to take relatively short time slices and time-align each of those slices.

Here are my simulations of the Df value of single-frequency sine tones (from 20 Hz to 20 kHz) captured with DAC/ADC clock offsets. The typical accuracies of crystal oscillators are in the tens of PPM. So there you go.
[attachment 313273: simulated Df plot]
[Edit] Added the Python code that generates the above plot so that others can check my errors.
Yes, the ADC is also a part of it, but if the same one is used for all tests it can be said to reach at least -80 dB, so that should be O.K.
It must have come from the E1DA.

Now the question pops up: is it indeed the clock generator, and would that not show up in a J-test?
The second question is whether this particular E1DA happens to have a poorly functioning oscillator, or whether this is typical for these dongles (I don't think it is). Would a second one show the same thing in the same setup?
So it would be interesting to get a second opinion (a repeat of the test).
 

MaxwellsEq

Major Contributor
Joined
Aug 18, 2020
Messages
1,783
Likes
2,712
This does feel like a test of DUT clock behaviour relative to the test rig clock behaviour. A malfunctioning DUT clock would result in more jitter, which should be measurable.

One claim for DF seems to be a reasonable correlation with subjective reviews of devices. So a logical argument might be that subjective reviewers are reacting to small frequency/amplitude differences introduced by clock behaviour.

EDIT: could this be simulated by taking thousands of frequency sweeps and looking at the distribution? A stable clock would give a different result than one with variable behaviour.
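A rough sketch of that idea (repeated single tones instead of sweeps, and an invented Gaussian wander model, so purely illustrative): a fixed clock offset yields the same Df on every capture, while a clock that wanders from capture to capture yields a spread.
Python:
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
fs = 96000
t = np.arange(int(0.4 * fs)) / fs

def df_db(x, y):
    # small floor keeps the log finite when the signals match exactly
    return 20*np.log10(np.sqrt(1.0 - abs(pearsonr(x, y)[0])) + 1e-12)

def one_capture(wander_ppm):
    # nominal 10 PPM offset plus an (invented) per-capture Gaussian wander
    err = 1e-5 + 1e-6 * wander_ppm * rng.standard_normal()
    x = np.sin(2*np.pi*1000*t)
    y = np.sin(2*np.pi*1000*(1.0 + err)*t)
    return df_db(x, y)

stable = np.array([one_capture(0.0) for _ in range(500)])
wander = np.array([one_capture(5.0) for _ in range(500)])
print(f'stable clock:    median {np.median(stable):.1f} dB, spread {stable.std():.2f} dB')
print(f'wandering clock: median {np.median(wander):.1f} dB, spread {wander.std():.2f} dB')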
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,740
Likes
10,472
Location
North-East
The question remains whether the findings would be the same (is it better because of the weighting?).
And the other question also remains: what could be the cause, and is it audible?

Should be easy to tell if someone records a known track, say, this one: http://www.mediafire.com/file/5hg6wl6ygql7217/Original2.wav/file with an excellent ADC and 9038d. DeltaWave shows frequency, phase, and timing errors, including jitter and clock drift.

That's why it would be interesting for someone else to repeat the test, with an E1DA under load, comparing DF and PK.

Analysis can be done by anyone, if the recording is shared by someone who has access to a 9038d and a high quality ADC.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,740
Likes
10,472
Location
North-East
That's why it would be interesting for someone else to repeat the test, with an E1DA under load, comparing DF and PK.

Agreed. Until someone makes such a recording, here's an unrelated example showing various DeltaWave results (including the DF metric) using a well-known ADI-2 Pro FS, with THD+N of about -112 dB and SNR of about 115 dB. The music recording was done by someone on the GearSpace forum:

RMS of difference (null): -53.6dB, -68.9dBA
DF Metric: -39.8dB
PK Metric: -80.4dBr

Here are the actual results:
[attachments: DeltaWave result screenshots]
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
Speaking of rain and parade and stuff - what specific underlying properties are these metrics supposed to isolate? (That would make the difference between a metric and a good metric.) What do they do better than the ones established for 20-30 years+? Or is it just the usual jumbled mess that people with no actual background in converter technology (or even EE at all) tend to come up with, a linear combination of so many factors that you can't make heads or tails of it? (The Gearslutz diff test debacle comes to mind.) I sure can't, and I've been dabbling in digital audio for like 20 years. I mean, I do know what output impedance is, so there's that, but otherwise...
The most specific underlying property isolated by df-metric (and not defined/detected by the ones established for 20-30+ years) is the level of overall transparency of a device. Above a certain level of transparency, an audio device can be bought without listening to it, as its artifact/sound signature is reliably imperceptible to human hearing. Such a level of transparency can easily be set and measured within df-metric (this will be the topic of another article/paper).
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
I think I understand, but it would be helpful if you explained your methodology better. Are you arguing that most measurements in the frequency domain are the sum of thousands of "instances" in the time domain? So if you look closely at the behaviour of those instances, you can differentiate between two devices whose frequency behaviour or whose averaged-time behaviour look the same?

Sorry for the lack of explanation. As csglinux has already described the methodology pretty simply and clearly, I can only add that your angle of view is also valid: the behavior of a DUT in the time/frequency domains with real music material characterizes it in detail. So we can compare and sort the measured DUTs.
 

pjug

Major Contributor
Forum Donor
Joined
Feb 2, 2019
Messages
1,776
Likes
1,563
The most specific underlying property isolated by df-metric (and not defined/detected by the ones established for 20-30+ years) is the level of overall transparency of a device. Above a certain level of transparency, an audio device can be bought without listening to it, as its artifact/sound signature is reliably imperceptible to human hearing. Such a level of transparency can easily be set and measured within df-metric (this will be the topic of another article/paper).
Can you put up the white noise files so we can compare them in DeltaWave?
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
My DAC measurements are at high impedance. You can't compare them to loading the output with 32 ohm. They are also at -1 dBFS. Your fine print says -10 dBFS??? If so, that is another variation. I have not seen anyone measure a DAC at -10 dBFS. Or with loading down to 32 ohm.

We also measured the device without the 32 Ω load:
[attachment [df40]E1DA-9038D(BW)[Wf].png: Df waterfall with load]
[attachment [df40]E1DA-9038D(BW,noLoad)[Wf].png: Df waterfall without load]

The differences can be considered insignificant for the current discussion.
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
The ratings from the Df metric look to me like a random number generator.
For example, the Sony NWA105 got a median Df of -71.4 dB and the E1DA 9038 got -42.2 dB.
[attachments 313108 and 313109: Df slides]

And here are Amir's measurements. Noise alone in the Sony is much worse than in the E1DA. The Df numbers made no sense to me.
[attachments: Amir's measurement screenshots]


Thanks for the df-slides. They show that with sine signals our measurements are also better for the #9038, but with more complex signals the NWA105 does better. That is the point.
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
Alright. Here is my attempt. Below are excerpts from Serge's AES convention paper, retrieved from SoundExpert.org. The Df metric is basically computed from the Pearson correlation coefficient ρ(x,y) of the samples of two time-aligned signals, with Df = sqrt( 1 - abs( ρ(x,y) ) )

As was noted, Df is very sensitive to clock differences between the DAC and ADC (i.e., a small frequency shift in the sampled signal due to clock frequency differences between the DAC and ADC). There is no indication that the "efficient resampling method" to mitigate this problem has been implemented. I am guessing that the current mitigation is to take relatively short time slices and time-align each of those slices.

[attachments 313269 and 313270: excerpts from the AES paper]
From the E1DA picture, I am guessing each time slice is 400 ms wide (notice the Tw = 400 ms).
[attachment 313272: Df measurement of the E1DA 9038D]

Here are my simulations of the Df value of single-frequency sine tones (from 20 Hz to 20 kHz) captured with DAC/ADC clock offsets. The typical accuracies of crystal oscillators are in the tens of PPM. So there you go.
[attachment 313273: simulated Df plot]
[Edit] Added the Python code that generates the above plot so that others can check my errors.
[quoted Python code omitted; see the post above]
The “efficient resampling method” was developed. The time-warping algo does it with any predefined accuracy; in the current version, for example, white noise can be shrunk/stretched with df = -100 dB accuracy. So the origin of the time inconsistency in the #9038 is not the time-warping algo.
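For what it's worth, the basic mechanics of such a warp correction are easy to sketch with plain linear interpolation (the warp factor is assumed known here, whereas a real algorithm has to estimate it, and a higher-order interpolator would be needed to approach the -100 dB figure quoted above): warping the capture back onto the reference time base removes almost all of the clock-offset penalty.
Python:
import numpy as np
from scipy.stats import pearsonr

fs = 96000
t = np.arange(int(0.4 * fs)) / fs
err = 1e-5                                # known 10 PPM clock offset
x = np.sin(2*np.pi*1000*t)                # reference
y = np.sin(2*np.pi*1000*(1.0 + err)*t)    # capture from the faster clock

def df_db(a, b):
    return 20*np.log10(np.sqrt(1.0 - abs(pearsonr(a, b)[0])) + 1e-12)

# The capture runs fast by a factor (1 + err), so resample it at the
# compressed instants t/(1 + err) to undo the stretch.
y_warped = np.interp(t/(1.0 + err), t, y)

print(f'Df before warping: {df_db(x, y):.1f} dB')         # roughly -40 dB
print(f'Df after warping:  {df_db(x, y_warped):.1f} dB')  # much lower (about -80 dB here)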
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
No need to wonder :) The DF metric has been computed by DeltaWave for years, so it's fairly easy to compare it to the other error values computed by DW, including the PK metric.

The DF metric is much more like the RMS of the null value than the PK metric, in that it doesn't use any psychoacoustics in the computation; instead, it computes a short-time correlation coefficient between two waveforms (similar to the difference, but a different and less obvious computation). It's an engineering metric based on the whole, unweighted frequency spectrum, just like the RMS null is. For this reason, phase differences, low- or high-frequency differences, DC filters, etc. can all affect the result of both the DF metric and the RMS of the null.

The PK metric is a psychoacoustically weighted version of the difference (null) file that takes into account equal-loudness curves at various levels, and frequency and level masking. The DF metric does no such thing. After conversations with Serge here and on other fora, I have not seen any explanation of why this metric might be appropriate or better than any other, including the RMS of the null file.

Yes, that is the difference between df-metric and all the others )) - it does not account for psychoacoustics; instead it defines a threshold of transparency that is easily measurable and achievable at the current level of chip manufacturing. At that level the psychoacoustic properties of a degradation are not important and can be safely ignored.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,740
Likes
10,472
Location
North-East
Yes, that is the difference between df-metric and all the others )) - it does not account for psychoacoustics; instead it defines a threshold of transparency that is easily measurable and achievable at the current level of chip manufacturing. At that level the psychoacoustic properties of a degradation are not important and can be safely ignored.

Well, I wouldn't say it's "the difference between df-metric and all others" :) The RMS of the null difference is just as valid a threshold-of-transparency measure as df-metric, and it is perhaps easier to understand, as it is simply the RMS of the difference signal computed by subtracting the recorded waveform from the original.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,132
Likes
36,721
Location
The Netherlands
DF Metric: -39.8dB
PK Metric: -80.4dBr
That's quite a difference in numbers. Seeing that the PK metric is weighted, it makes more sense as an indication of audibility.
I mean... I have not seen any negative comments about the E1DA where sound quality is concerned.
-40 dB is bordering on audible, whereas -80 dB is inaudible.

Maybe the unit Serge had has a clock issue, or some other issue.
For the sake of science the test should be repeated with another device. Maybe even some ABX testing?
Otherwise it is just a meaningless number, skewed by something that may not even be audible.
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
Well, I wouldn't say it's "the difference between df-metric and all others" :) The RMS of the null difference is just as valid a threshold-of-transparency measure as df-metric, and it is perhaps easier to understand, as it is simply the RMS of the difference signal computed by subtracting the recorded waveform from the original.
The null difference has one disadvantage: it depends on the level of the reference signal. Df is the ratio of the null RMS to the level of the reference signal. So Df is a relative parameter, but otherwise it has the same physical meaning as the simple RMS of the null difference.
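In fact, for zero-mean signals matched in level (assuming rms(x) = rms(y) and positive ρ), the two are tied together directly:

rms(x - y)^2 = rms(x)^2 + rms(y)^2 - 2*ρ*rms(x)*rms(y) = 2*rms(x)^2*(1 - ρ)

so that

Df = sqrt(1 - ρ) = rms(x - y) / ( sqrt(2)*rms(x) )

i.e. Df in dB sits a fixed 3 dB below the relative null depth. The much larger gaps seen in practice (e.g. the -53.6 dB null vs. -39.8 dB Df posted above) would then come from how each tool windows, aligns, and aggregates the short-time values, not from the definitions themselves.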
 