
Df Measurement

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,004
Likes
36,218
Location
The Neitherlands
One naive question on the Df metric. It is defined as a function of the linear correlation of two signals.
What about non linear correlations?

Linear (phase and amplitude) and non linear differences are all included.
However, when one measures with a resistive load, one completely misses the effects output resistance can have on amplitude, so the method is rather limited in this respect. Perhaps he could measure Df with two or three loads, where two of them are rather extreme.
The pitfall is that all linear and non-linear differences become amplitude differences.
Something that should be done is listening to the Df signal to determine whether it sounds 'nasty' (non-linear distortion) or perhaps pleasant (linear distortion), and one should know how much of it is phase/time error. (Sample/DAC frequency variations are also converted to amplitude differences, but these may not be audible as long as there are no 'jumps' and the adjustment is gradual.)
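As a rough illustration of the nulling idea (a sketch only, not Serge's actual implementation; `df_level` is a hypothetical helper, and real Df computation also involves time alignment):

```python
import numpy as np

def df_level(reference: np.ndarray, output: np.ndarray) -> float:
    """Difference level in dB: RMS of the null (difference) signal
    relative to the RMS of the reference, after a least-squares gain match."""
    gain = np.dot(reference, output) / np.dot(output, output)
    diff = reference - gain * output            # the null / difference signal
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    return 20 * np.log10(rms(diff) / rms(reference))

# toy example: a DUT that adds a tiny amount of 2nd-order distortion
t = np.linspace(0, 1, 48000, endpoint=False)
ref = np.sin(2 * np.pi * 440 * t)
out = ref + 0.001 * ref ** 2
print(round(df_level(ref, out), 1))             # a Df around -60 dB
```

Everything the device does 'wrong' (noise, non-linear distortion, frequency-response and phase errors) ends up in `diff`, which is exactly why listening to that residual signal is informative.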

Serge is seriously working on this, which I applaud. For me to consider it, he would need to show what psychoacoustic evaluations he uses if he wants to convince me that a single number has relevance to 'accurate' sound quality.

I am a big proponent of nulling (especially for amps with actual loads and cables, which lend themselves to it perfectly), but it is important to listen to the null and to have 'classic' measurements (SINAD is not needed) to look for linear and non-linear distortions.
This holds the key to a successful evaluation of the Df. Of course, if Serge has an evaluation model that is as good as, better than, or approaching human hearing, and is able to assess the Df this way, I would be really happy.

As for convincing 'subjective' audiophiles in their quest for 'nicer sounding' gear: I don't think there will ever be full correlation, as these individuals use their own 'filter' (the brain), which takes into consideration not only their preference for tonal balance but also input from the eyes, how far they have turned up the volume control knob, the 'gear' they have in there (mindset), etc.
I wish Serge success with his endeavors, but IMO he is still not nearly there until he has a successful Df evaluation procedure and also tests DAPs/phones under different loads.
 
Last edited:
OP
pozz

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
Thanks for posting the details. I have no conclusions to advance; I'm just processing the information and testing it for validity.

I will assume that he is reporting the results without any bias or manipulation.


Sometimes the demand for "controlled listening tests" is overused. Whether one is necessary depends on the purpose. If you wanted to conclude, based on a small sample space, whether two sounds are different, then you would need a controlled listening test for the express reason of eliminating other variables that could explain the difference.

But this is more of a statistical sampling where the listening conditions are sufficiently randomized over a large sample space, so that presumably no single variable dominates. In such a scenario, if you tossed a coin and reported the results over a sufficiently large number of attempts, they would fall on a binomial distribution around the middle: roughly as many people would say the two are different/worse as would say they are the same. But if there is a skew in that curve towards same or worse, it is possible to conclude, to statistical significance, that the two are considered the same or considered different/worse.
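The coin-toss reasoning above can be made concrete with a one-sided binomial test (a plain-Python sketch; `p_different` is a made-up helper name, and the 60-of-100 figure is invented for illustration):

```python
from math import comb

def p_different(k: int, n: int, p: float = 0.5) -> float:
    """One-sided binomial p-value: the probability of seeing k or more
    'different' votes out of n if listeners were really guessing (chance p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# e.g. 60 of 100 listeners voted "different": unlikely under pure guessing
print(round(p_different(60, 100), 4))
```

If the p-value falls below a pre-chosen significance level (say 0.05), the skew away from the 50/50 middle is statistically significant.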

As a form of empirical science, that would be a valid observation.

If he is able to find a positive correlation between Df values and the results that detected a difference to statistical significance then, while it might not be a perfect measure, it may have some validity. Note that in such empirical studies, it is not necessary to have a mechanistic explanation as to why this is so or even have a theory to explain it.

So the following

isn't necessarily a knock against it, given the limited conclusions being drawn. If he claimed that Df somehow captures a mechanistic explanation of why sounds are perceived differently, then his metric would have to be consistent with psychoacoustic analysis. I don't see him making any such claim, so he shouldn't be held to that standard.



It is a valid question as to whether his measurement itself is robust and repeatable. While he might have done a large sampling of listeners, the number of samples he has is relatively small and so it would be difficult to make the case that his method is well-defined enough. All he can say is that for that set of measurements done, there was a positive correlation between Df values computed and perceived difference to statistical significance.

But for that conclusion to have validity and confidence, this needs to be tested against a totally different set of samples (preferably by someone else using similar equipment, which is why repeatability is important in the scientific process) and a similarly large sampling of listeners conducted. If it shows the same positive correlation between his thresholds and perceived difference, then the confidence level in the correlation increases and the measure becomes more useful. If the second set showed no such correlation, then his first sampling was a statistical aberration. If it showed a very different cut-off point, then the reliability of the metric to capture "badness" would be in question.

The final test is one of forecastability. Take another sample, compute the Df values and based on earlier tests predict which ones would be perceived as different/worse. Now test that hypothesis over a sufficiently large population. If the predictions were supported in a statistically significant way, then it would be a valid metric for that purpose.
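A minimal sketch of such a forecast check (all numbers hypothetical, invented for illustration, not Serge's data; the threshold is assumed to have been fixed from the earlier sample):

```python
import numpy as np

# hypothetical holdout data: median Df per device (dB) and whether
# listeners detected a difference at statistical significance
df_values = np.array([-62.0, -58.0, -54.0, -49.0, -45.0, -41.0, -38.0, -33.0])
detected  = np.array([False, False, False, True, True, True, True, True])

THRESHOLD = -47.0                     # fixed in advance from the earlier sample
predicted = df_values > THRESHOLD     # forecast: "will be heard as different"

accuracy = float(np.mean(predicted == detected))
print(f"{accuracy:.2f} of holdout forecasts correct")
```

With a large enough holdout population, the same binomial logic applies: the question is whether this forecast accuracy beats chance to statistical significance.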


This is a valid concern. It is possible that the values fall so close to each other that arbitrarily drawing a threshold between them may not be statistically justifiable. It could mean that the metric does not have sufficient resolution/granularity to separate goodness from badness. BUT it could also mean that the units tested were too similar to produce much spread, in which case the audibility correlation might be an accidental consequence of small numbers (spread), something that would be caught with a totally different sample.

At the least, it suggests he needs a more "discriminating" metric that would spread the values out. But then he is constrained by trying to differentiate between units that show very similar SINAD, which is the very claim his metric is supposed to support. That may just not be possible with his current metric.



Yes, it could be a diagnostic tool for QC purposes: not as a defining metric, but rather as a way to prompt further enquiry if the number were to fall in the badness range, with the caveat that it could be a false positive.

For a metric like that to be used here, it would require the forecastability test I mentioned above. On the other hand, we don't have any such result for SINAD other than at extremes. So, we are discussing a known devil vs an unknown devil. :)
The not-fully-expressed undercurrent in this discussion is that Df is supposedly a better metric for ranking than SINAD. Let's go with that.

Here's a pretty serious claim:
[attached screenshot]

http://soundexpert.org/articles/-/blogs/audio-quality-of-sbc-xq-bluetooth-audio-codec

It's hard to evaluate the claim itself, so let's look at his rationale: both are under -50dB DF and their artefact signature is similar:
[attached screenshot]

http://soundexpert.org/articles/-/blogs/audio-quality-of-bluetooth-aptx

The approach makes sense. But, reading the bolded section and the sentence before it, that means there can be no general ranking based on Df if what we also expect is correlation with audibility (Serge's claims about the Df threshold of transparency notwithstanding). So there is no added utility over SINAD.
 
OP
pozz

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
The current approach, suggesting elaboration of some set of objective audio parameters that will allow to produce audio equipment with some moderate but inaudible degradation of an audio signal is ineffective, leads to endless discussion about goodness or badness of various types of distortion and hardly results in any consensus as such discussion involves subjective tastes. This approach creates those muddy waters - a comfortable environment for the audio industry, allowing the manufacturers to produce audio devices of mediocre quality but convincing consumers that their “distortions” are best/inaudible/pleasant. This semantic level of audio information should not be touched at all, audio quality requirements can be safely defined on syntactic level, on the level of signal.

The final goal is to control the audio market that is now driven by manufacturers, which use asymmetry of information on the market and absence of reliable AQ metric for profiting. By means of creative marketing they have learned to control both supply and demand curves, which determine the price. Audio consumers, who do not want to be fooled by marketers, should return the “demand curve” under their control. Purchasing decisions should be rational (not emotional) and cooperative. Self-organization is the key for achieving this goal. Internet offers many opportunities for such self-organization, we just do not use them and instead - fighting with each other about tiny aspects of audio reproduction, lament over reluctance of manufacturers to follow recommendations of audio science (and the same time searching for excuses why it is not possible to achieve), complain about stupidity of audiophile community, etc. What an epic waste of time and intellectual resources.
This is worth quoting to understand his perspective. One of Serge's posts here.
 
OP
pozz

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
Ah. I did not see this mentioned anywhere on Serge's website. But here he wrote:
Df level -50dB is for mass portable audio market. For other listening environments it could be, say, -70dB for critical listening environments, -90dB for studio equipment, -120dB for laboratory equipment (just rough estimations as an example). The solution is to make them very low, where audibility of distortion does not matter.
So things are getting more rather than less relative.
 

Vasr

Major Contributor
Joined
Jun 27, 2020
Messages
1,409
Likes
1,925
The approach makes sense. But, reading the bolded section and the sentence before it, that means there can be no general ranking based on Df if what we also expect is correlation with audibility (Serge's claims about the Df threshold of transparency notwithstanding). So there is no added utility over SINAD.

I am not sure it is an all-or-nothing situation. Just because you can rank all units with a computed metric like SINAD doesn't mean it necessarily models anything in physical reality. Or that if you cannot do such ranking, it isn't useful.

The most we can say with SINAD is that as the number increases beyond a certain point (not very well defined), we can be confident the device is audibly transparent, and if it falls below a certain point, it may have audible artifacts (but no guarantees). The ordering within those broad thresholds has no necessary meaning in physical reality. So just being able to order with a metric linearly isn't necessarily an indicator of its usefulness. It just makes for easily consumed candy. :)

I think any problems with interpreting his thesis come from evaluating it as a similar type of metric to SINAD. It isn't.

From what I understand of his thesis, he seems to be saying that Df measurements are useful within "similar" pieces of equipment, defined as units with similar artifact signatures. He claims, with correlation studies, that the measurement of device outputs within such "similar" devices shows perceived differences in audibility AND (this is his main claim) that the Df measurement within those has sufficient "resolution" to be correlated with that perception. Whether you agree with that claim or not, such a metric, if it works, will provide more information than SINAD, which would classify all of them as of similar quality for practical purposes. It is like looking in a microscope at 10x power to separate out cells with similar structure, and then turning up the power to 50x to notice differences between them that correlate with their behavioral differences. His thesis seems to be that current metrics are that 10x lens and he has a 50x lens (the magnification numbers are arbitrary, used only as an analogy, and not related to any of his numbers).

So while Df may not be useful to rank every unit (with different artifact signatures), it may further differentiate those with similar signatures (or, roughly translated, similar SINAD numbers). IF his thesis were to hold, then you can look at SINAD as broadly classifying pieces of equipment (much as the colored bands do in the charts on this site) with no audibility-difference claims within each band, while Df would be a magnifying glass that separates out audibly different units within segments of the SINAD chart.

The minimum threshold is a different claim/thesis from the above. The claim there is that, within the above correlation, a Df value below a certain threshold would correlate with units that cannot be perceived as different in any audibility tests (whether they are transparent or not is irrelevant). So that would be the number below which one can say one unit is pretty much the same as another unit also with a number below it, whatever their absolute quality may be. It is not an indicator of transparency but of resolution.

It seems to be a well-defined thesis but that is different from whether it holds sufficient evidence to back it up. It could be an incorrect well-defined thesis. :)
 
OP
pozz

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
I am not sure it is an all-or-nothing situation. Just because you can rank all units with a computed metric like SINAD doesn't mean it necessarily models anything in physical reality. Or that if you cannot do such ranking, it isn't useful.

The most we can say with SINAD is that as the number increases beyond a certain point (not very well defined), we can be confident the device is audibly transparent, and if it falls below a certain point, it may have audible artifacts (but no guarantees). The ordering within those broad thresholds has no necessary meaning in physical reality. So just being able to order with a metric linearly isn't necessarily an indicator of its usefulness. It just makes for easily consumed candy. :)

I think any problems with interpreting his thesis come from evaluating it as a similar type of metric to SINAD. It isn't.

From what I understand of his thesis, he seems to be saying that Df measurements are useful within "similar" pieces of equipment, defined as units with similar artifact signatures. He claims, with correlation studies, that the measurement of device outputs within such "similar" devices shows perceived differences in audibility AND (this is his main claim) that the Df measurement within those has sufficient "resolution" to be correlated with that perception. Whether you agree with that claim or not, such a metric, if it works, will provide more information than SINAD, which would classify all of them as of similar quality for practical purposes. It is like looking in a microscope at 10x power to separate out cells with similar structure, and then turning up the power to 50x to notice differences between them that correlate with their behavioral differences. His thesis seems to be that current metrics are that 10x lens and he has a 50x lens (the magnification numbers are arbitrary, used only as an analogy, and not related to any of his numbers).

So while Df may not be useful to rank every unit (with different artifact signatures), it may further differentiate those with similar signatures (or, roughly translated, similar SINAD numbers). IF his thesis were to hold, then you can look at SINAD as broadly classifying pieces of equipment (much as the colored bands do in the charts on this site) with no audibility-difference claims within each band, while Df would be a magnifying glass that separates out audibly different units within segments of the SINAD chart.

The minimum threshold is a different claim/thesis from the above. The claim there is that, within the above correlation, a Df value below a certain threshold would correlate with units that cannot be perceived as different in any audibility tests (whether they are transparent or not is irrelevant). So that would be the number below which one can say one unit is pretty much the same as another unit also with a number below it, whatever their absolute quality may be. It is not an indicator of transparency but of resolution.

It seems to be a well-defined thesis but that is different from whether it holds sufficient evidence to back it up. It could be an incorrect well-defined thesis. :)
I agree.

This is one of those cases where a lot of resources, staff and a lab would help.
 

bobbooo

Major Contributor
Joined
Aug 30, 2019
Messages
1,479
Likes
2,079
To be clear, there are several things here that need to be delineated. I think this quote from Serge helps (my emphasis):
The purpose of such analysis is to find whether these results support our two main working hypotheses:

(1) There is a dependency between degradation of waveform of a stimulus signal and degradation of perceived sound quality of that stimulus signal.
(2) This dependency is more pronounced when type/nature of degradation is similar for tested items.

Note #1. We measure degradation of waveforms with Difference level (Df, dB), perceived sound quality – with Quality scores (Qs) from listening tests and similarity of degradation – with Spearman's distance between sequences of Df values. Each Df sequence represents piece-wise (400ms) degradation of a stimulus signal and can be considered as a degradation signature of a codec. So, similarity of degradation is similarity between degradation signatures.

So, as we see here, there are actually three distinct metrics Serge is using in his analysis: the difference level (median Df value), the subjective sound quality score (Qs), and a similarity metric between sequences of Df values for different DUTs (all Df values computed for every 400ms section of the difference signal). Qs will obviously depend on the quality of the subjective data, which, as I'm sure everyone is aware, is no small feat to achieve. The similarity metric, while commendably ambitious, is necessarily complex and probably even more difficult to get right. The first metric, however, the median Df value, is relatively straightforward and can be used independently of the other metrics, yet still have utility in objective audio measurements. I have been talking exclusively about this first metric, the median Df value, so far, as I think it's important we nail that down and determine its advantages/disadvantages first, before moving on to the other, more complex and ambitious metrics.
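Serge's Note #1 can be sketched roughly as follows (a simplified stand-in: a real Df pipeline also does time/gain alignment, and I'm assuming "Spearman's distance" means 1 minus Spearman's rank correlation, which may not match his exact definition):

```python
import numpy as np

FS = 48000
WIN = int(0.4 * FS)                      # 400 ms windows, per Serge's note

def df_sequence(reference, output):
    """Per-window difference levels in dB: a 'degradation signature'."""
    n = min(len(reference), len(output)) // WIN * WIN
    seq = []
    for i in range(0, n, WIN):
        r, o = reference[i:i + WIN], output[i:i + WIN]
        rms = lambda x: np.sqrt(np.mean(x ** 2))
        seq.append(20 * np.log10(rms(r - o) / rms(r)))
    return np.array(seq)

def spearman_rho(a, b):
    """Spearman rank correlation (assumes no ties); a 'Spearman distance'
    between two signatures could then be taken as 1 - rho."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# demo on synthetic data: 2 s of noise as a stand-in stimulus
rng = np.random.default_rng(0)
ref = rng.standard_normal(2 * FS)
out = ref + 0.001 * rng.standard_normal(len(ref))   # mild degradation
seq = df_sequence(ref, out)
print(len(seq), round(float(np.median(seq)), 1))    # 5 windows and their median Df
```

The median of `seq` is the single-number Df metric discussed above; comparing two devices' whole `seq` signatures with the rank correlation is what the similarity metric adds on top.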

The not-fully-expressed undercurrent in this discussion is that Df is supposedly a better metric for ranking than SINAD. Let's go with that.

Here's a pretty serious claim:
View attachment 74805
http://soundexpert.org/articles/-/blogs/audio-quality-of-sbc-xq-bluetooth-audio-codec

It's hard to evaluate the claim itself, so let's look at his rationale: both are under -50dB DF and their artefact signature is similar:
View attachment 74806
http://soundexpert.org/articles/-/blogs/audio-quality-of-bluetooth-aptx

The approach makes sense. But, reading the bolded section and the sentence before it, that means that there can be no general ranking based on Df if we what also expect is correlation with audibility (Serge's claims about the DF threshold of transparency notwithstanding). So there is no added utility over SINAD.

Please let me know which (if any) of the below points you disagree with that leads you to conclude median Df value (not any of Serge's other metrics I mentioned above) has no added utility over SINAD:
  1. SINAD is a measure of total signal degradation, only for cases in which the signal is a pure sine tone.
  2. DUTs can be ranked by their SINAD, which will tell you how much they degrade a sine tone input signal.
  3. An approximate limit of SINAD audibility can be determined (suggested as ~120dB on here), above which a DUT can be said to impart no audible signal degradation on the sine tone. (This limit will likely depend on the noise floor of the listening environment, so could in practice be lower in different situations with higher noise floor.)
  4. Df is a measure of total signal degradation, for all possible input signals.
  5. DUTs can be ranked by their median Df value, which will tell you how much they degrade any input signal, including real/simulated music.
  6. An approximate limit of median Df value audibility can be determined (suggested as ~-50dB by Serge, for portable listening situations), below which a DUT can be said to impart no audible signal degradation to music, which is after all what we listen to, not sine tones. (This limit will likely depend on the noise floor of the listening environment, so could in practice be lower in different situations with lower noise floor.)
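Points 1 and 4 can be illustrated numerically (a toy model, not anyone's actual measurement pipeline; device behavior and levels are invented). For a pure sine input, the two metrics essentially coincide up to sign; the point of Df is that, unlike SINAD, the same computation also applies to music:

```python
import numpy as np

FS, F0, N = 48000, 1000, 48000
t = np.arange(N) / FS
clean = np.sin(2 * np.pi * F0 * t)

# hypothetical DUT output: a touch of 3rd harmonic plus a little noise
rng = np.random.default_rng(1)
dut = (clean
       + 1e-4 * np.sin(2 * np.pi * 3 * F0 * t)
       + 1e-5 * rng.standard_normal(N))

# SINAD from the spectrum: fundamental bin vs everything else (excl. DC)
spec = np.abs(np.fft.rfft(dut)) ** 2
k = F0 * N // FS                      # fundamental falls exactly on bin k here
sinad = 10 * np.log10(spec[k] / (spec.sum() - spec[k] - spec[0]))

# Df from the time-domain null against the known clean signal
rms = lambda x: np.sqrt(np.mean(x ** 2))
df = 20 * np.log10(rms(dut - clean) / rms(clean))

print(round(sinad, 1), round(df, 1))  # for a pure sine, Df is roughly -SINAD
```

Swap `clean` for a music waveform and the `df` line still works unchanged, while the SINAD computation no longer means anything, which is the substance of points 4-6.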
 
Last edited:
OP
pozz

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
@bobbooo This is becoming unproductive. I took a lot of time to address the technique and my issues with it in detail.

In short: before this technique is adopted, controlled testing should show how it responds to different signals and electrical conditions: for example, a few reviews which show how Df can be used to diagnose problems traditional measurements cannot, and how Df can be used to characterize the total electrical behaviour of a DUT. This would be enough. After that's been established, relate the numerical Df claims to psychoacoustic metrics (e.g., loudness analysis).
 

patate91

Active Member
Joined
Apr 14, 2019
Messages
253
Likes
137
At this point I think experts, scientists, and people interested in trying new paths and pushing boundaries should participate.

Change is "hard" to deal with, and industry corporations won't change anytime soon. The research will remain underground for a while.
 

bobbooo

Major Contributor
Joined
Aug 30, 2019
Messages
1,479
Likes
2,079
@bobbooo This is becoming unproductive. I took a lot of time to address the technique and my issues with it in detail.

In short: before this technique is adopted, controlled testing should show how it responds to different signals and electrical conditions: for example, a few reviews which show how Df can be used to diagnose problems traditional measurements cannot, and how Df can be used to characterize the total electrical behaviour of a DUT. This would be enough. After that's been established, relate the numerical Df claims to psychoacoustic metrics (e.g., loudness analysis).

I'm sorry you feel that way; I think this thread has actually been quite productive, and it has consolidated my understanding of both the Df metric and SINAD, with the help of various members' comments, including yours, as I hope it has done for others too. I really appreciate you taking the time to look into this in detail :) Those comments will be useful when looking at the other two ancillary metrics Serge uses that attempt to relate Df to subjective sound quality. I think some wires have been crossed somewhere, maybe partly due to my previous loose language not making it clear that I was talking specifically about the median Df value, and not about any correlation it has with perceived sound quality.

The approach makes sense. But, reading the bolded section and the sentence before it, that means there can be no general ranking based on Df if what we also expect is correlation with audibility (Serge's claims about the Df threshold of transparency notwithstanding). So there is no added utility over SINAD.

My point was that we shouldn't expect the part I've bolded here for the median Df value alone, just as we don't expect it for SINAD, as they are both just total signal degradation metrics. (We should expect it, however, in conjunction with Serge's subjective sound quality metric Qs and the Df similarity metric.)

I now see you do have a caveat about the audible transparency threshold, which I must have missed on first read. Given that, do you agree that the median Df value (or another null metric such as @pkane 's DeltaWave RMS of the difference signal, for which the argument is the same) has utility over SINAD, as described by the benefits of point 6 over point 3 in my previous post? That is, it can be used to determine the total signal degradation of a DUT when playing music and, if a transparency threshold can be determined, to highlight those DUTs that are audibly transparent when playing music rather than just sine tones, as SINAD is limited to.

I do agree more real-world tests need to be done. I have done some myself, but only using my PC soundcard's ADC for the measurements. To really progress, though, a high-quality ADC with low signal degradation is needed, and unfortunately I'm not in a position to afford that at this time.
 
Last edited:

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,698
Likes
37,434
I'm sorry you feel that way; I think this thread has actually been quite productive, and it has consolidated my understanding of both the Df metric and SINAD, with the help of various members' comments, including yours, as I hope it has done for others too. I really appreciate you taking the time to look into this in detail :) Those comments will be useful when looking at the other two ancillary metrics Serge uses that attempt to relate Df to subjective sound quality. I think some wires have been crossed somewhere, maybe partly due to my previous loose language not making it clear that I was talking specifically about the median Df value, and not about any correlation it has with perceived sound quality.



My point was that we shouldn't expect the part I've bolded here for the median Df value alone, just as we don't expect it for SINAD, as they are both just total signal degradation metrics. (We should expect it, however, in conjunction with Serge's subjective sound quality metric Qs and the Df similarity metric.)

I now see you do have a caveat about the audible transparency threshold, which I must have missed on first read. Given that, do you agree that the median Df value (or another null metric such as @pkane 's DeltaWave RMS of the difference signal, for which the argument is the same) has utility over SINAD, as described by the benefits of point 6 over point 3 in my previous post? That is, it can be used to determine the total signal degradation of a DUT when playing music and, if a transparency threshold can be determined, to highlight those DUTs that are audibly transparent when playing music rather than just sine tones, as SINAD is limited to.

I do agree more real-world tests need to be done. I have done some myself, but only using my PC soundcard's ADC for the measurements. To really progress, though, a high-quality ADC with low signal degradation is needed, and unfortunately I'm not in a position to afford that at this time.
Here is the problem with null testing if you just drop it into a discussion as one number: you can have null results of -50 dB or so which are fully transparent, and others which are not at all transparent. You'll have similar issues with the Df number, which doesn't seem to make much of a step toward eliminating that problem. In fact you have the same issues, though with a much larger gray area, with SINAD.

In the Gearslutz null testing thread, you have some mighty high-end items with pedestrian results. In such cases, what you often find if you investigate is that phase issues at the lower and upper frequency extremes are ruining the null result, even though such effects are completely innocuous to a human listener. I don't see that Df gets us past that.
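A toy numpy example of how easily timing/phase alone can wreck a one-number null (hypothetical signals; serious null tools like DeltaWave align time and gain first, which is why it is the residual band-edge phase errors that bite):

```python
import numpy as np

rng = np.random.default_rng(2)
ref = rng.standard_normal(48000)   # broadband stand-in for program material
out = np.roll(ref, 1)              # "DUT": a 1-sample (~21 µs) delay, nothing else

rms = lambda x: np.sqrt(np.mean(x ** 2))
df = 20 * np.log10(rms(ref - out) / rms(ref))
print(round(df, 1))                # ≈ +3 dB: the residual is louder than the signal
```

A 21 µs shift is sonically nothing, yet the raw null is catastrophic; after time alignment the null is perfect again, and only genuine linear/non-linear errors remain in the residual.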
 

bobbooo

Major Contributor
Joined
Aug 30, 2019
Messages
1,479
Likes
2,079
you have the same issues though with a much larger gray area with SINAD

So that's another area in which Df improves on SINAD.
In the Gearslutz null testing thread, you have some mighty high end items with pedestrian results.

How are you defining 'high end'? I hope not by price, because as @pozz has shown here, that's not a good predictor of SINAD, so there's no reason to expect it to be a good predictor of other performance metrics, including Df. If by 'high end' you instead mean 'high SINAD', that's also problematic. Of the 45 devices with full Df measurements by Serge so far, there's poor correlation between the Df value of a DUT when playing a sine tone and the Df when playing real/simulated music, e.g. the previously mentioned FiiO M11, which has a relatively low (good) measured sine Df yet a relatively high (bad) music Df. As sine Df is indicative of SINAD (effectively the approximate inverse ratio), this shows that SINAD is a poor predictor of a DUT's total audio degradation when playing music. So I see this 'discrepancy' of 'high end' devices showing 'pedestrian' results as merely exposing the true audio degradation of the DUT under the actual conditions of its usage (playing music).

Then we are left with the question of the threshold of audibility of this degradation. The high bar of 120dB for SINAD was chosen so as to be safely above the 'gray area' in all listening situations. An equivalently high bar can be determined for the median Df value. As Serge said, this hard limit is likely to be significantly below -50dB (which is just the value he suggested for portable listening on-the-go, where the environmental noise floor is likely to be relatively high). That hard lower limit of audibility will be safely beyond any gray area, just like the 120dB limit for SINAD, so at that point you won't need to worry about any psychoacoustics. The Gearslutz results show low enough Df values are definitely achievable, with a few DUTs reaching ~-70dB. And remember, these are DA/AD loop tests of audio interfaces, so Df values of high-quality dedicated DACs as measured by a high-quality ADC, e.g. an AP analyzer, could reveal even lower true Df values. Popularizing the Df metric and computing it for more DUTs will only drive manufacturers to improve their performance when playing music and lower these Df values even further, just as ASR's SINAD measurements have done for sine-tone performance.

As for your point about inaudible phase differences ruining the null: notwithstanding the above argument that choosing a low enough hard Df limit would render that moot, DeltaWave can limit the Df computation to the audible band, so any phase differences that remain are restricted to this range, and there is always a chance they could be audible if above the hard audibility limit. And if you wish, you can even use DeltaWave's 'non-linear phase EQ' option for its Df computation, which will reduce these phase differences, resulting in an alternative, more 'lenient' (lower) Df value. (I personally would prefer not to do this, to guarantee inaudibility if the Df value is below the hard limit, but maybe both the original and the non-linear-phase-EQed Df values could be given for those who want them - it would only take an extra few seconds for a second computation with the Program Simulation Noise recording anyway.)
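A crude numpy sketch of the band-limiting idea (a brick-wall FFT filter on the difference signal, not DeltaWave's actual implementation; `df_band_limited` is a hypothetical helper and the 30 kHz error is invented): an error placed entirely above the audible band dominates the full-band Df but vanishes from the band-limited one.

```python
import numpy as np

FS = 96000  # high sample rate so an error can sit above the audible band

def rms(x):
    return np.sqrt(np.mean(x ** 2))

def df_full(reference, output):
    return 20 * np.log10(rms(reference - output) / rms(reference))

def df_band_limited(reference, output, lo=20.0, hi=20000.0):
    """Df restricted to the audible band via a brick-wall FFT filter
    applied to the difference signal."""
    diff = reference - output
    spec = np.fft.rfft(diff)
    freqs = np.fft.rfftfreq(len(diff), d=1 / FS)
    spec[(freqs < lo) | (freqs > hi)] = 0.0   # keep the audible band only
    diff_band = np.fft.irfft(spec, n=len(diff))
    return 20 * np.log10(rms(diff_band) / rms(reference))

# toy DUT whose only error is an ultrasonic 30 kHz tone
rng = np.random.default_rng(3)
t = np.arange(FS) / FS
ref = rng.standard_normal(FS)
out = ref + 0.01 * np.sin(2 * np.pi * 30000 * t)

full_df = df_full(ref, out)
band_df = df_band_limited(ref, out)
print(round(full_df, 1), band_df < -100)
```

The full-band number makes the device look mediocre, while the band-limited number correctly reports (on this toy model) that nothing in the audible range differs.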
 
OP
pozz

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
I'm sorry you feel that way; I think this thread has actually been quite productive, and it has consolidated my understanding of both the Df metric and SINAD, with the help of various members' comments, including yours, as I hope it has done for others too. I really appreciate you taking the time to look into this in detail :) Those comments will also be useful when looking at the other two ancillary metrics Serge uses to relate Df to subjective sound quality. I think some wires have been crossed somewhere, maybe partly because my previous loose language didn't make it clear that I was talking specifically about the median Df value, and not about any correlation it has with perceived sound quality.



My point was that we shouldn't expect the part I've bolded here from the median Df value alone, just as we don't expect it from SINAD, since both are simply total-signal-degradation metrics. (We should expect it, however, in conjunction with Serge's subjective sound quality metric Qs and the Df similarity metric.)

I now see you do have a caveat about the audible transparency threshold, which I must have missed on first read. Given that, do you agree that the median Df value (or another null metric such as @pkane's DeltaWave RMS of the difference signal, for which the argument is the same) has utility over SINAD, as described by the benefits of point 6 over point 3 in my previous post? That is, it can be used to determine the total signal degradation of a DUT when playing music, and, if a transparency threshold can be determined, to highlight those DUTs that are audibly transparent when playing music rather than just the sine tones SINAD is limited to.

I do agree more real-world tests need to be done. I have done some myself, but only using my PC soundcard's ADC for the measurements. To really make progress, though, a high-quality ADC with low signal degradation is needed, and unfortunately I'm not in a position to afford one at this time.
I wrote that Df has "no added utility" over SINAD because both are useful but share similar problems. In the end they are one-number metrics which say nothing about the spectra or characteristic nonlinearities they capture.

There is also the supposed ease of the testing procedure: just play Program Simulation Noise, null the results, and post the Df value. But testing is not that easy: Df results are specific to the electrical conditions, the signal type and the timing accuracy of the null. This doesn't make Df results invalid, it just narrows their scope of validity. To assess what a given Df figure has captured, you would need another round of traditional tests on the DUT to establish its electrical behaviour, and a separate analysis of the content of the recorded signal. Until this is done there will be no sense of what causes Df to vary for that particular device, and it will have to be repeated across a range of devices before Df's sensitivity becomes intuitive.

I'm also glad I found Serge's comments about varying thresholds. His listening tests targeted degradation as such, without considering listening level or spectrum. This returns us to the familiar ground of listening conditions determining audibility, and it throws his numbers into question. Again, that doesn't mean his results are invalid, just that they aren't generalizable as yet.

There is a lot of work left to be done before Df can be meaningfully introduced.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,670
Likes
10,300
Location
North-East
The Gearslutz results show low enough Df values are definitely achievable, with a few DUTs reaching ~-70dB.
...
And if you wish, you can even use DeltaWave's 'non-linear phase EQ' option for its Df computation, which reduces these phase differences and yields an alternative, more lenient (lower) Df value.

This is exactly the reason Df and Gearslutz results need to be taken with a grain of salt. A small non-linear phase difference, say due to the reconstruction or anti-aliasing filter, can cause significantly different results with both metrics. As you say, DeltaWave can measure and compensate for these differences. They are often very minor and most likely inaudible (this still needs to be demonstrated), and yet they can shift Df and Gearslutz numbers very significantly, sometimes on the order of 10-30dB! That same -70dB result posted in the Gearslutz thread can turn into better than -90dB once the phase difference is eliminated using DeltaWave. That's something I've discussed with both Serge and Didier (the Gearslutz thread author), but neither accepted that minor phase differences might matter that much in their respective computations. And yet it's something that's easy to measure.
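The size of this effect is easy to reproduce numerically. In the sketch below (assuming numpy; a constant phase rotation of a single tone stands in for a filter's phase response, which is a simplification), a phase error far below any plausible audibility threshold already caps the achievable null depth:

```python
# Two equal-amplitude sines offset in phase by dphi differ by a signal
# of amplitude 2*sin(dphi/2), so even a tiny phase rotation limits the
# null depth regardless of how clean the device otherwise is.
import numpy as np

fs, f = 48000, 1000.0
t = np.arange(fs) / fs                  # one second, integer cycles
ref = np.sin(2 * np.pi * f * t)

for deg in (0.1, 1.0, 5.0):
    shifted = np.sin(2 * np.pi * f * t + np.radians(deg))
    diff = ref - shifted
    null_db = 20 * np.log10(np.sqrt(np.mean(diff ** 2)) /
                            np.sqrt(np.mean(ref ** 2)))
    print(f"{deg:4.1f} deg phase error -> null depth {null_db:6.1f} dB")
```

With 0.1, 1 and 5 degrees of phase error the null bottoms out around -55, -35 and -21 dB respectively, the same order as the 10-30dB swings mentioned above.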
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,698
Likes
37,434
So that's another area in which Df improves on SINAD.


How are you defining 'high end'? I hope not by price, because as @pozz has shown here that's not a good predictor of SINAD, so there's no reason to expect it to be a good predictor of other performance metrics, including Df. If by 'high end' you instead mean 'high SINAD', that's also problematic. Of the 45 devices with full Df measurements by Serge so far, there's poor correlation between the Df value of a DUT when playing a sine tone and the Df when playing real/simulated music e.g. the previously mentioned FiiO M11 which has a relatively low (good) measured sine Df, yet a relatively high (bad) music Df. As sine Df is indicative of SINAD (effectively the approximate inverse ratio), this shows that SINAD is a poor predictor of a DUT's total audio degradation when playing music. So I see this 'discrepancy' of 'high end' devices showing 'pedestrian' results as merely exposing the true audio degradation of the DUT under the actual conditions of its usage (playing music).

High end being actual high performance.

I'm quite familiar with DeltaWave as I've been beta testing it since Paul first released it.
 