
E1DA 9038D performance according to df-metric

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,132
Likes
36,721
Location
The Netherlands
The ratings from the Df metric look to me like a random number generator.
For example, the Sony NWA105 got a median Df of -71.4 dB and the E1DA 9038 got -42.2 dB.

That's what the OP found to be the case, and all the 'perhaps' qualifiers in his post point to the OP wondering about the why. It looks like the OP did not find an explanation but wanted 'ASR' to know.

I wonder if the same thing would occur with the PK metric?

It would also be nice if someone else repeated the measurements, to see whether this is generic or a single device.
In any case it is a 'null test' with a poorer result than most others, and it is unknown why.
 

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,733
Likes
6,056
Location
US East
That's what the OP found to be the case, and all the 'perhaps' qualifiers in his post point to the OP wondering about the why. It looks like the OP did not find an explanation but wanted 'ASR' to know.
Alright. Here is my attempt. Below are excerpts from Serge's AES convention paper, retrieved from SoundExpert.org. The Df metric is basically computed from the Pearson correlation coefficient ρ(x,y) of the samples of two time-aligned signals, with Df = sqrt( 1 - abs( ρ(x,y) ) )

As was noted, Df is very sensitive to clock differences between the DAC and ADC (i.e., a small frequency shift in the sampled signal due to clock frequency differences between the DAC and ADC). There is no indication that the "efficient resampling method" to mitigate this problem has been implemented. I am guessing that the current mitigation is to take relatively short time slices and time-align each of those slices.

[attachments df_1.png, df_2.png: excerpts from the AES paper]

From the E1DA picture, I am guessing each time slice is 400 ms wide (notice the Tw = 400 ms).
[attachment E1DA-9038D.png: Df measurement of the E1DA 9038D, Tw = 400 ms]
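If that guess is right, the mitigation is easy to sketch. Below is a minimal toy version (entirely my own construction, not Serge's published algorithm): split the capture into 400 ms slices, re-align each slice to the reference by cross-correlation, and compute Df per slice. Note that alignment cannot remove the drift that accumulates within a slice, which is what my simulation further down quantifies.
Python:
import numpy as np
from scipy.signal import correlate
from scipy.stats import pearsonr

# Toy sketch of per-slice time alignment (my guess at the mitigation,
# not Serge's published method).
fs = 96000
slice_len = int(0.4 * fs)          # Tw = 400 ms per slice
n_slices = 10
clock_error = 1e-5                 # 10 PPM DAC/ADC clock offset
f_sig = 1000.0

t = np.arange(n_slices * slice_len) / fs
ref = np.sin(2*np.pi*f_sig*t)                       # original signal
rec = np.sin(2*np.pi*(1.0 + clock_error)*f_sig*t)   # captured signal

for k in range(n_slices):
    sl = slice(k*slice_len, (k + 1)*slice_len)
    x = ref[sl]
    # integer-sample lag that best aligns this slice with the reference
    lag = np.argmax(correlate(rec[sl], x, mode='full')) - (slice_len - 1)
    y = np.roll(rec, -lag)[sl]     # wrap-around at the array ends is ignored here
    rho = pearsonr(x, y)[0]
    print(f'slice {k}: lag {lag:+d} samples, '
          f'Df = {20*np.log10(np.sqrt(1.0 - abs(rho))):.1f} dB')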


Here are my simulations of the Df value of single-frequency sine tones (from 20 Hz to 20 kHz) captured with DAC/ADC clock offsets. The typical accuracies of crystal oscillators are in the tens of PPM. So there you go.
[attachment df_plot.png: simulated Df vs. sine frequency for several clock offsets]

[Edit] Added the Python code that generates the above plot so that others can check my errors.
Python:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

def dB(x):
    # tiny offset keeps log10 finite when the argument is exactly zero
    return 20.0*np.log10(np.abs(x) + 5.0*np.finfo(float).eps)

def Df(x, y):
    # Df = sqrt(1 - |rho(x, y)|), expressed in dB
    return dB(np.sqrt(1.0 - np.abs(pearsonr(x, y).correlation)))

duration = 0.4
fs = 96000
nsamples = int(duration * fs)
t = np.arange(nsamples) / fs

clk_error_list = np.array([1e-4, 1e-5, 1e-6, 1e-7])
frequencies = np.logspace(np.log10(20), np.log10(20480), 201)

df_list = np.empty((np.size(clk_error_list), np.size(frequencies)))

for indx1, clock_error in enumerate(clk_error_list):
    for indx2, f_sig in enumerate(frequencies):
        signal_1 = np.sin(2*np.pi*f_sig * t)
        signal_2 = np.sin(2*np.pi*(1.0+clock_error)*f_sig * t)
        df_list[indx1, indx2] = Df(signal_1, signal_2)

fig, ax = plt.subplots(figsize=(8, 5))
for indx, clock_error in enumerate(clk_error_list):
    ax.semilogx(frequencies, df_list[indx, :], label='{:.2f} PPM'.format(1e6*clock_error))
   
ax.grid(True, which='both', axis='both')
ax.legend(loc='upper left')
ax.set_title('Df of Single Frequency Sine Waves Reproduced with Clock Error\n(Sample Duration 400 ms, Fs=96000 Hz)')
ax.set_xlabel('Sine Wave Frequency [Hz]')
ax.set_ylabel('Df [dB]')
fig.tight_layout()
plt.savefig('df_plot.png')
plt.show()
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,740
Likes
10,472
Location
North-East
I wonder if the same thing would occur with the PK metric?

No need to wonder :) The DF metric has been computed by DeltaWave for years, so it's fairly easy to compare it to the other error values computed by DW, including the PK metric.

The DF metric is much more like the RMS of the null value than the PK metric, in that it doesn't use any psychoacoustics in the computation; instead, it computes a short-time correlation coefficient between two waveforms (similar to the difference, but a different and less obvious computation). It's an engineering metric based on the whole, unweighted frequency spectrum, just like the RMS null is. For this reason, phase differences, low- or high-frequency differences, DC filters, etc. can all affect the result of both the DF metric and the RMS of the null.

The PK metric is a psychoacoustically weighted version of the difference (null) file that takes into account equal-loudness curves at various levels, and frequency and level masking. The DF metric does no such thing. After conversations with Serge here and on other fora, I have not seen any explanation of why this metric might be appropriate or better than any other, including the RMS of the null file.
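As a quick illustration of that last point, here's a toy example (mine, not DeltaWave's code): a 1 kHz tone reproduced with a constant 1-degree phase error, which few would call audible, already lands both unweighted numbers in the -30s:
Python:
import numpy as np
from scipy.stats import pearsonr

fs = 96000
t = np.arange(int(0.4 * fs)) / fs
x = np.sin(2*np.pi*1000*t)                    # reference tone
y = np.sin(2*np.pi*1000*t + np.deg2rad(1.0))  # same tone, 1 degree phase error

# RMS of the null (difference) relative to the reference, in dB
rms_null = 20*np.log10(np.linalg.norm(y - x) / np.linalg.norm(x))
# Df from the Pearson correlation coefficient
df = 20*np.log10(np.sqrt(1.0 - abs(pearsonr(x, y)[0])))
print(f'RMS of null: {rms_null:.1f} dB   Df: {df:.1f} dB')  # about -35 and -38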
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,132
Likes
36,721
Location
The Netherlands
The question remains whether the findings would be the same (is it better because of the weighting?).
And the other question also remains: what could be the cause, and is it audible?
I mean, if it is just amplitude variations on the order of 0.03 dB over time, for instance, no one is going to notice. Small phase shifts over a wide bandwidth are not audible either.
So the question (which has been raised in the past) is whether the relation between audible and measurable, which was the goal after all, is demonstrated here.
The other question also remains: what is causing the findings?
That's why it would be interesting for someone else to repeat the test, with an E1DA under load, comparing DF and PK.
Of course the ADC will most likely differ, and it may react differently to HF noise etc.

I mean, it is just one measurement of one device on one ADC, not under the same circumstances. The fact that Serge only 'suspects' and does not come up with an actual explanation in his report tells me the reporting is incomplete.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,132
Likes
36,721
Location
The Netherlands
Alright. Here is my attempt. Below are excerpts from Serge's AES convention paper, retrieved from SoundExpert.org. The Df metric is basically computed from the Pearson correlation coefficient ρ(x,y) of the samples of two time-aligned signals, with Df = sqrt( 1 - abs( ρ(x,y) ) )

As was noted, Df is very sensitive to clock differences between the DAC and ADC (i.e., a small frequency shift in the sampled signal due to clock frequency differences between the DAC and ADC). There is no indication that the "efficient resampling method" to mitigate this problem has been implemented. I am guessing that the current mitigation is to take relatively short time slices and time-align each of those slices.

Here are my simulations of the Df value of single-frequency sine tones (from 20 Hz to 20 kHz) captured with DAC/ADC clock offsets. The typical accuracies of crystal oscillators are in the tens of PPM. So there you go.
[attachment 313273: simulated Df plot]
[Edit] Added the Python code that generates the above plot so that others can check my errors.
Yes, the ADC is also a part of it, but if the same one is used for all tests it can be said to reach at least -80 dB, so that should be O.K.
It must have come from the E1DA.

Now the question pops up: is it indeed the clock generator, and would that not show up in a J-test?
The second question is whether this particular E1DA happens to have a poorly functioning oscillator, or whether this is typical for these dongles (I don't think it is). Would a second one show the same thing in the same setup?
So it would be interesting to get a second opinion (a repeat of the test).
 

MaxwellsEq

Major Contributor
Joined
Aug 18, 2020
Messages
1,783
Likes
2,712
This does feel like a test of DUT clock behaviour relative to the test rig clock behaviour. A malfunctioning DUT clock would result in more jitter, which should be measurable.

One claim for DF seems to be a reasonable correlation with subjective reviews of devices. So a logical argument might be that subjective reviewers are reacting to small frequency/amplitude differences introduced by clock behaviour.

EDIT: could this be simulated by taking thousands of frequency sweeps and looking at the distribution? A stable clock would give a different result than one with variable behaviour.
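A rough sketch of that idea (repeated single tones instead of sweeps, and an invented Gaussian wander model, so purely illustrative): a fixed clock offset yields the same Df on every capture, while a clock that wanders from capture to capture yields a spread.
Python:
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
fs = 96000
t = np.arange(int(0.4 * fs)) / fs

def df_db(x, y):
    # small floor keeps the log finite when the signals match exactly
    return 20*np.log10(np.sqrt(1.0 - abs(pearsonr(x, y)[0])) + 1e-12)

def one_capture(wander_ppm):
    # nominal 10 PPM offset plus an (invented) per-capture Gaussian wander
    err = 1e-5 + 1e-6 * wander_ppm * rng.standard_normal()
    x = np.sin(2*np.pi*1000*t)
    y = np.sin(2*np.pi*1000*(1.0 + err)*t)
    return df_db(x, y)

stable = np.array([one_capture(0.0) for _ in range(500)])
wander = np.array([one_capture(5.0) for _ in range(500)])
print(f'stable clock:    median {np.median(stable):.1f} dB, spread {stable.std():.2f} dB')
print(f'wandering clock: median {np.median(wander):.1f} dB, spread {wander.std():.2f} dB')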
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,740
Likes
10,472
Location
North-East
The question remains whether the findings would be the same (is it better because of the weighting?).
And the other question also remains: what could be the cause, and is it audible?

Should be easy to tell if someone records a known track, say, this one: http://www.mediafire.com/file/5hg6wl6ygql7217/Original2.wav/file with an excellent ADC and 9038d. DeltaWave shows frequency, phase, and timing errors, including jitter and clock drift.

That's why it would be interesting for someone else to repeat the test, with an E1DA under load, comparing DF and PK.

Analysis can be done by anyone, if the recording is shared by someone who has access to a 9038d and a high quality ADC.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,740
Likes
10,472
Location
North-East
That's why it would be interesting for someone else to repeat the test, with an E1DA under load, comparing DF and PK.

Agreed. Until someone makes such a recording, here's an unrelated example showing various DeltaWave results (including the DF metric) using a well-known ADI-2 Pro FS, with THD+N of about -112 dB and SNR of about 115 dB. The music recording was done by someone on the GearSpace forum:

RMS of difference (null): -53.6dB, -68.9dBA
DF Metric: -39.8dB
PK Metric: -80.4dBr

Here are the actual results:
[attachments: DeltaWave result screenshots]
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
Speaking of rain and parade and stuff - what specific underlying properties are these metrics supposed to isolate? (That would make the difference between a metric and a good metric.) What do they do better than the ones established for 20-30 years+? Or is it just the usual jumbled mess that people with no actual background in converter technology (or even EE at all) tend to come up with, a linear combination of so many factors that you can't make heads or tails of it? (The Gearslutz diff test debacle comes to mind.) I sure can't, and I've been dabbling in digital audio for like 20 years. I mean, I do know what output impedance is, so there's that, but otherwise...
The most specific underlying property isolated by df-metric (and not defined/detected by the ones established for 20-30+ years) is the level of overall transparency of a device. Above a certain level of transparency, an audio device can be bought without listening to it, as its artifact/sound signature is reliably imperceptible to human hearing. Such a level of transparency can easily be set and measured within df-metric (this will be the topic of another article/paper).
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
I think I understand, but it would be helpful if you explained your methodology better. Are you arguing that most measurements in the frequency domain are the sum of thousands of "instances" in the time domain? So if you look closely at the behaviour of those instances, you can differentiate between two devices whose frequency behaviour or whose averaged-time behaviour look the same?

Sorry for the lack of explanation. As csglinux has already described the methodology pretty simply and clearly, I can only add that your angle of view is also valid: the behavior of a DUT in the time/frequency domains with real music material characterizes it in detail. So we can compare and sort the measured DUTs.
 

pjug

Major Contributor
Forum Donor
Joined
Feb 2, 2019
Messages
1,776
Likes
1,563
The most specific underlying property isolated by df-metric (and not defined/detected by the ones established for 20-30+ years) is the level of overall transparency of a device. Above a certain level of transparency, an audio device can be bought without listening to it, as its artifact/sound signature is reliably imperceptible to human hearing. Such a level of transparency can easily be set and measured within df-metric (this will be the topic of another article/paper).
Can you put up the white noise files so we can compare them in DeltaWave?
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
My DAC measurements are at high impedance. You can't compare them to loading the output with 32 ohm. They are also at -1 dBFS. Your fine print says -10 dBFS??? If so, that is another variation. I have not seen anyone measure a DAC at -10 dBFS. Or with loading down to 32 ohm.

We also measured the device without the 32 Ω load:
[attachment [df40]E1DA-9038D(BW)[Wf].png: Df waterfall with load]
[attachment [df40]E1DA-9038D(BW,noLoad)[Wf].png: Df waterfall without load]

The differences can be considered insignificant for the current discussion.
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
The ratings from the Df metric look to me like a random number generator.
For example, the Sony NWA105 got a median Df of -71.4 dB and the E1DA 9038 got -42.2 dB.
[attachments 313108 and 313109: Df slides]

And here are Amir's measurements. Noise alone in the Sony is much worse than in the E1DA. The Df numbers made no sense to me.
[attachments: Amir's measurement screenshots]


Thanks for the df-slides. They show that with sine signals our measurements are also better for the #9038, but with more complex signals the NWA105 does better. That is the point.
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
Alright. Here is my attempt. Below are excerpts from Serge's AES convention paper, retrieved from SoundExpert.org. The Df metric is basically computed from the Pearson correlation coefficient ρ(x,y) of the samples of two time-aligned signals, with Df = sqrt( 1 - abs( ρ(x,y) ) )

As was noted, Df is very sensitive to clock differences between the DAC and ADC (i.e., a small frequency shift in the sampled signal due to clock frequency differences between the DAC and ADC). There is no indication that the "efficient resampling method" to mitigate this problem has been implemented. I am guessing that the current mitigation is to take relatively short time slices and time-align each of those slices.

[attachments 313269 and 313270: excerpts from the AES paper]
From the E1DA picture, I am guessing each time slice is 400 ms wide (notice the Tw = 400 ms).
[attachment 313272: Df measurement of the E1DA 9038D]

Here are my simulations of the Df value of single-frequency sine tones (from 20 Hz to 20 kHz) captured with DAC/ADC clock offsets. The typical accuracies of crystal oscillators are in the tens of PPM. So there you go.
[attachment 313273: simulated Df plot]
[Edit] Added the Python code that generates the above plot so that others can check my errors.
[quoted Python code omitted; see the post above]
The “efficient resampling method” was developed. The time-warping algo does it with any predefined accuracy; in the current version, for example, white noise can be shrunk/stretched with df = -100 dB accuracy. So the origin of the time inconsistency in the #9038 is not the time-warping algo.
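For what it's worth, the basic mechanics of such a warp correction are easy to sketch with plain linear interpolation (the warp factor is assumed known here, whereas a real algorithm has to estimate it, and a higher-order interpolator would be needed to approach the -100 dB figure quoted above): warping the capture back onto the reference time base removes almost all of the clock-offset penalty.
Python:
import numpy as np
from scipy.stats import pearsonr

fs = 96000
t = np.arange(int(0.4 * fs)) / fs
err = 1e-5                                # known 10 PPM clock offset
x = np.sin(2*np.pi*1000*t)                # reference
y = np.sin(2*np.pi*1000*(1.0 + err)*t)    # capture from the faster clock

def df_db(a, b):
    return 20*np.log10(np.sqrt(1.0 - abs(pearsonr(a, b)[0])) + 1e-12)

# The capture runs fast by a factor (1 + err), so resample it at the
# compressed instants t/(1 + err) to undo the stretch.
y_warped = np.interp(t/(1.0 + err), t, y)

print(f'Df before warping: {df_db(x, y):.1f} dB')         # roughly -40 dB
print(f'Df after warping:  {df_db(x, y_warped):.1f} dB')  # much lower (about -80 dB here)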
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
No need to wonder :) The DF metric has been computed by DeltaWave for years, so it's fairly easy to compare it to the other error values computed by DW, including the PK metric.

The DF metric is much more like the RMS of the null value than the PK metric, in that it doesn't use any psychoacoustics in the computation; instead, it computes a short-time correlation coefficient between two waveforms (similar to the difference, but a different and less obvious computation). It's an engineering metric based on the whole, unweighted frequency spectrum, just like the RMS null is. For this reason, phase differences, low- or high-frequency differences, DC filters, etc. can all affect the result of both the DF metric and the RMS of the null.

The PK metric is a psychoacoustically weighted version of the difference (null) file that takes into account equal-loudness curves at various levels, and frequency and level masking. The DF metric does no such thing. After conversations with Serge here and on other fora, I have not seen any explanation of why this metric might be appropriate or better than any other, including the RMS of the null file.

Yes, that is the difference between df-metric and all the others )) - it does not account for psychoacoustics; instead it defines a threshold of transparency that is easily measurable and achievable at the current level of chip manufacturing. At that level the psychoacoustic properties of a degradation are not important and can be safely ignored.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,740
Likes
10,472
Location
North-East
Yes, that is the difference between df-metric and all the others )) - it does not account for psychoacoustics; instead it defines a threshold of transparency that is easily measurable and achievable at the current level of chip manufacturing. At that level the psychoacoustic properties of a degradation are not important and can be safely ignored.

Well, I wouldn't say it's "the difference between df-metric and all others" :) The RMS of the null difference is just as valid a threshold-of-transparency measure as df-metric, and it is perhaps easier to understand, as it is simply the RMS of the difference signal computed by subtracting the recorded waveform from the original.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,132
Likes
36,721
Location
The Netherlands
DF Metric: -39.8dB
PK Metric: -80.4dBr
That's quite a difference in numbers. Seeing that the PK metric is weighted, it makes more sense as an indication of audibility.
I mean... I have not seen any negative comments about the E1DA where sound quality is concerned.
-40 dB is bordering on audible, whereas -80 dB is inaudible.

Maybe the unit Serge had has a clock issue, or some other issue.
For the sake of science the test should be repeated with another device. Maybe even some ABX testing?
Otherwise it is just a meaningless number, skewed by something that may not even be audible.
 
OP

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
Well, I wouldn't say it's "the difference between df-metric and all others" :) The RMS of the null difference is just as valid a threshold-of-transparency measure as df-metric, and it is perhaps easier to understand, as it is simply the RMS of the difference signal computed by subtracting the recorded waveform from the original.
The null difference has one disadvantage: it depends on the level of the reference signal. Df is the ratio of the null RMS to the level of the reference signal. So Df is a relative parameter, but otherwise it has the same physical meaning as the simple RMS of the null difference.
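In fact, for zero-mean signals matched in level (assuming rms(x) = rms(y) and positive ρ), the two are tied together directly:

rms(x - y)^2 = rms(x)^2 + rms(y)^2 - 2*ρ*rms(x)*rms(y) = 2*rms(x)^2*(1 - ρ)

so that

Df = sqrt(1 - ρ) = rms(x - y) / ( sqrt(2)*rms(x) )

i.e. Df in dB sits a fixed 3 dB below the relative null depth. The much larger gaps seen in practice (e.g. the -53.6 dB null vs. -39.8 dB Df posted above) would then come from how each tool windows, aligns, and aggregates the short-time values, not from the definitions themselves.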
 