For @bobbooo, who keeps bringing the topic up in other review threads.
Thanks! I was about to create a thread myself actually. It was just two review threads, by the way. Both were in response to related comments from other members - the first asking Amir directly if he could do null difference tests using DeltaWave (which also computes the Df metric), and the second a discussion about whether the high measured SINAD (and other standard metrics) of the Okto dac8 make it the 'best DAC in the world'. My point there was that this could not be determined without quantifying the DAC's performance when playing real music (or a close analogue thereof, in the form of an already standardized test signal developed to have similar spectral content to music, i.e. Program Simulation Noise). But I agree those two threads were getting overloaded with this discussion, which deserves its own thread. (Unfortunately it seems the previous thread on the topic devolved into ad hominem and only tangentially related arguments, so I think a fresh start is warranted.)
The Df values on Serge's site aren't referenced to electrical values, SPL or psychoacoustic metrics. They are self-contained and only map null results relative to each other. "Total sound degradation" measured this way is an abstraction.
The electrical setup is very specific. There would have to be a lot of work put into a standard set of tests to explain the Df metric. It would be time-consuming to say the least.
For example, I looked up Program Simulation Noise, which apparently is soft-clipped filter-shaped pink noise. What device behaviour is causing the Questyle QP1R to react to that signal? Is it clipping? Is the DAC filter inadequate?
What is the Df between Program Simulation Noise and regular pink noise? What is the Df between pink noise or white noise generated by different sources? [Edit: I meant between different digital sources. Because pink/white noise is generated according to a probability function, there will be differences in the signal despite it sounding the same.] What about the Df for the same signal, with one copy attenuated by 10dBFS?
Apparently the Program Simulation Noise Df for the Chord Hugo 2 is -32.6dB, the best result, and is -25.6dB for the FiiO M11, out of eleven DUTs. What does that ~7dB range mean?
Unless I've misunderstood, the Df "sound signature" can't be mapped audibly and it can't be used to diagnose engineering problems.
I'll repost my answers from the previous threads for reference as I've gone over some of this before:
It's the Df metric Serge of SoundExpert uses, i.e. the ratio of the RMS level of the difference signal to the RMS level of the original test signal, as defined in his AES paper.
They're [SINAD and the Df metric] both effectively signal-to-'noise' ratios (in the most general sense of 'noise' as unwanted sound), or a 'noise'-to-signal ratio in the case of the Df metric. The latter just seems a more generalized version of SINAD to me, taking into account all signal degradation instead of just THD+noise (where 'noise' means actual noise in the narrow sense), and applicable to any input signal, including real music.
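For anyone who wants the arithmetic spelled out, here's a minimal sketch of that definition (my own illustration, not SoundExpert's or DeltaWave's actual code; it assumes the two signals are already time- and level-aligned):

```python
# Minimal sketch of the Df idea described above: the RMS level of the
# difference signal relative to the RMS level of the original test signal,
# expressed in dB (more negative = smaller difference = less degradation).
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def df_db(reference, captured):
    """Df in dB, assuming the signals are already time- and level-aligned."""
    return 20.0 * np.log10(rms(captured - reference) / rms(reference))

# Example: a 1 kHz sine plus a little noise gives a strongly negative Df.
fs = 48000
t = np.arange(fs) / fs
ref = np.sin(2 * np.pi * 1000 * t)
cap = ref + 1e-3 * np.random.default_rng(0).standard_normal(fs)
print(round(df_db(ref, cap), 1))  # around -57 dB for this amount of noise
```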
So if the Df metric is an abstraction, SINAD is just as much so. Serge does actually specify reference levels for playback tests - the same maximum level recommended by the EN 50332-2 standard (see his full methodology at the bottom of this page). It could be argued this is an arbitrary choice, but the same could be said about the level used for SINAD measurements.

Note: the EN 50332-2 level was chosen by Serge because he was initially interested in testing portable devices, which the standard was made for. The standard also specifies testing with 32 ohm loads, I presume to simulate an average pair of headphones. (As almost all modern portable players follow this standard these days, just setting them to max volume will usually yield the same 150mV level across the 32 ohm loads, as Serge's testing diagram prescribes.) But this does not mean the Df metric could not also be used for larger DACs/amps intended for speaker playback - standard test levels just need to be chosen (again, just like for SINAD), and the 32 ohm loads would not be needed. Testing portable players without loads as well would give their line-out performance anyway, so that would still be useful and easily obtainable data if adding loads proved too time-consuming (although that would really just be a one-time soldering job).
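(As a back-of-envelope check of what that reference level actually amounts to - assuming the 150mV figure is an RMS value - it's well under a milliwatt into each load:)

```python
# Rough power implied by the EN 50332-2-style reference level mentioned above:
# 150 mV (assumed RMS) across a 32 ohm load.
V, R = 0.150, 32.0
print(f"{(V ** 2 / R) * 1e3:.2f} mW")  # ~0.70 mW
```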
I presume you meant what device behaviour is causing the FiiO M11 (not the Questyle QP1R) to 'react' badly to the Program Simulation Noise (hereafter PSN), yet not the sine signal? (The inverse is true for the QP1R.) I think that's the beauty of the Df metric in a way - it highlights all possible sound degradation when playing actual (or simulated) music, some of which we may not currently know the cause or mechanism of. In the FiiO M11's case I can't imagine it's clipping, otherwise I would have thought the sine Df would also be adversely affected, no? Maybe it's some kind of as-yet-unknown nonlinear effect due to the complexity of the PSN/music waveform, who knows. Of course, from an engineering perspective this would be useful to know, but for ranking sound degradation (what the Df metric was intended for), not really - all that matters are the correlations between the inputs and outputs; everything else can be a black box for that purpose.
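To make 'soft-clipped, filter-shaped pink noise' a bit more concrete, here's a very rough stand-in generator - to be clear, this is my own sketch and not the actual Program Simulation Noise defined in the standard (which specifies the exact filter shape and clipping); the 1/sqrt(f) spectral shaping, the 100Hz-8kHz band-pass and the tanh clipper here are all assumptions purely for illustration:

```python
# Very rough PSN-like signal: pink-ish noise -> band-shaping filter -> soft clip.
# NOT the standardized Program Simulation Noise, just an illustrative stand-in.
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 48000
n = 5 * fs  # 5 seconds

# Pink-ish noise via 1/sqrt(f) spectral shaping of white noise (power ~ 1/f).
spectrum = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n, 1 / fs)
scale = np.ones_like(freqs)
scale[1:] = 1.0 / np.sqrt(freqs[1:])
pink = np.fft.irfft(spectrum * scale, n)

# Assumed band-shaping filter (the real PSN filter is defined in the standard).
sos = signal.butter(2, [100, 8000], btype="bandpass", fs=fs, output="sos")
shaped = signal.sosfilt(sos, pink)

# Illustrative soft clipper standing in for the standard's clipping stage.
shaped /= np.max(np.abs(shaped))
psn_like = np.tanh(2.0 * shaped)
```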
To your point about generated pink noise (and so the PSN) not being identical due to probability functions, this can easily be overcome by just using the same identical source file for all tests, such as the ones pre-generated and included in Audio Precision's Audio Player Test Utility. As for the Df between the PSN and pink noise, that could be determined by generating the PSN from a known pink noise file, saving both, and running them through DeltaWave to compute the Df. (Not exactly sure why you want to know this value though.) Linear level (as well as time shift) differences are adjusted for in the Df computation, so two signals whose only difference is a 10dBFS attenuation would have identical Df values.
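To illustrate that last point, here's a small sketch (my own, using a simple least-squares gain fit as a stand-in for whatever level alignment DeltaWave actually performs) showing that a pure 10dB level offset contributes essentially nothing to the residual once linear gain is matched out:

```python
# A -10 dB copy of a signal (plus a trace of noise) nulls almost perfectly
# once the linear gain difference is fitted out before differencing.
import numpy as np

rng = np.random.default_rng(0)

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

def df_db(reference, test):
    return 20.0 * np.log10(rms(test - reference) / rms(reference))

fs = 48000
t = np.arange(fs) / fs
reference = np.sin(2 * np.pi * 1000 * t)
captured = 10 ** (-10 / 20) * reference + 1e-5 * rng.standard_normal(fs)

# Naive difference: dominated by the level offset.
print(round(df_db(reference, captured), 1))        # about -3 dB

# Least-squares gain match, then difference: only the tiny noise remains.
g = np.dot(reference, captured) / np.dot(captured, captured)
print(round(df_db(reference, g * captured), 1))    # about -87 dB
```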
Can you say what a 7dB difference in SINAD really means intuitively? That's not that easy for me, and I don't think you can get a feel for that for Df values either until a large number and range of devices have been measured. What you can see (and hear) using DeltaWave that is intuitive to understand is the actual difference signal between the original and the recorded sound file produced by any device, which is quite fascinating. If you listen to a difference signal of real music and turn your headphones/speakers up, you can actually still hear the form of the original music, and by comparing these difference signals between devices you can hear the differences in level and noise across DUTs, directly listening to the degradation each device imparts on your music.

Thinking about this, it may be possible to work out a limit on perceptible relative Df values between devices, for example by ABXing difference signals of ever-closer Df value until they can no longer be distinguished. Of course this doesn't take into account perceptual masking when listening to real music, but it could be a useful hard lower limit at which it can be safely said that two devices with Df values closer than this limit will have comparable levels of degradation to your ears (the limit could even be individual, depending on your performance in the ABX test). In a similar way, a hard lower limit on absolute Df value audibility, and so pretty much guaranteed transparency with music, could be determined by ABXing difference signals of ever-decreasing Df value against digital silence, ending up with whatever the Df equivalent of the often-quoted ~120dB limit on SINAD audibility turns out to be.
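For anyone who wants to try that kind of listening, here's a rough sketch of how a difference signal could be exported for audition/ABX (my own illustration; 'ref.wav' and 'cap.wav' are hypothetical filenames, and it assumes the reference and the capture are already time- and level-aligned, e.g. as exported from DeltaWave):

```python
# Export an audible 'difference signal' (the residual a device adds/changes),
# boosted so it can be heard comfortably when auditioned or ABX'd.
import numpy as np
import soundfile as sf  # pip install soundfile

ref, fs = sf.read("ref.wav")     # hypothetical aligned reference file
cap, fs2 = sf.read("cap.wav")    # hypothetical aligned loop-back capture
assert fs == fs2 and ref.shape == cap.shape

diff = cap - ref
gain_db = 40.0                            # boost purely for audibility
boosted = diff * 10 ** (gain_db / 20)
boosted = np.clip(boosted, -1.0, 1.0)     # avoid clipping the output file

sf.write("difference_plus40dB.wav", boosted, fs)
```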
Additionally, Serge is also attempting to quantify the correlation between null difference measurements and listening test results here, which has shown some promising results. Is there strong quantifiable evidence for the correlation between SINAD and listening tests?
This AES paper by Steve Temme and Sean Olive doesn't sound too promising (my emphasis):
In summary, there appears to be some moderate positive correlations between the amount of THD measured in the headphones and their sound quality rating.
In summary, among the distortion metrics we chose in this study, non-coherent distortion based on music appears to be more correlated with listeners’ preference ratings than the THD, IM and Multitone.
Finally, this study provides further experimental evidence that traditional nonlinear distortion measurements are not particularly useful at predicting how good or bad a high caliber headphone sounds.
I personally see the Df metric as being most useful as an objective, pure measure of audio signal degradation though, i.e. a natural extension and expansion of SINAD that encompasses all unwanted changes in the electrical audio chain and uses real (or simulated) music instead of test tones, for a more accurate relation to real-world listening.