
When do we start looking at NCD?

What is tough is that the difference is so high. That is, if the original file and the recording null to -44 dB on the PK Metric, we can confidently say that the reproduction chain is not transparent to the digital file, but the question is: what is the difference? To use a statistics analogy, you can run an ANOVA, but Tukey post-hoc tests help you home in on more details.
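For what it's worth, that ANOVA-then-Tukey workflow is only a couple of lines with scipy; the per-device numbers below are invented purely to show the calls (this just illustrates the statistics analogy, nothing DeltaWave-specific):

```python
from scipy import stats  # tukey_hsd requires scipy >= 1.8

# Hypothetical scores from repeated runs on three devices (numbers made up).
device_a = [0.12, 0.15, 0.13, 0.14]
device_b = [0.21, 0.19, 0.22, 0.20]
device_c = [0.13, 0.14, 0.12, 0.15]

print(stats.f_oneway(device_a, device_b, device_c))   # ANOVA: is there any difference at all?
print(stats.tukey_hsd(device_a, device_b, device_c))  # Tukey HSD: which pairs actually differ?
```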

I’d get that sort of score with a UB9000 into a Topping PA5 into an E1DA Cosmos, which should be reasonable, except that my recording level is lower, where noise may in fact dominate.

Is that noise noticeable only if I were to raise the recording to a theoretical +0 dB?

That is, the source recording has no reference volume. Presumably, for a digital file with reasonable dynamic range in a 24-bit container, you could have 120 dB of true dynamic range out of 144 dB. Let’s say it’s movie standard, so 124 dB average, 144 dB peaks, and a noise floor around 24 dB. Compared to a recording that may reflect a 70 dB average volume and 90 dB peaks, the noise in the system is going to dominate, and when the recording is level-matched to the digital source for the PK Metric, the noise will really rise!
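A back-of-the-envelope reading of those numbers (my own arithmetic from the figures quoted above, not anything DeltaWave reports):

```latex
\begin{align*}
\text{source noise floor} &\approx 144~\text{dB (peaks)} - 120~\text{dB (dynamic range)} = 24~\text{dB} \\
\text{gain needed to match peaks} &\approx 144~\text{dB} - 90~\text{dB (recording peaks)} = 54~\text{dB} \\
\text{recording noise floor} &\rightarrow \text{raised by that same } 54~\text{dB after level matching}
\end{align*}
```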

I guess question #2: for level matching, does DeltaWave match the comparison volume to the reference volume? In which case, should I have the recording as the reference and the digital source as the comparison? Or does DeltaWave raise the volume of the quieter file to match the louder one?

It’s an easy test but I am admittedly away from my desktop right now…

This is why DeltaWave includes various other analysis tools. PK Metric is just the perceptually-weighted version of the difference file. There are 20+ other analysis tools built into DeltaWave to let you study the differences between two files in minute detail.

DeltaWave corrects for all the major linear errors between the reference and comparison files. It does this by applying corrections to the comparison file, never to the reference. The default linear corrections include (a rough sketch follows the list):

1. Phase error correction (delay removal) up to a tiny fraction of a sample
2. Level matching, including DC offset removal
3. Clock drift removal when two devices are used that run off different clocks (like in your example)
4. An option to trim the front and end of the recording if the equipment doesn't start recording at 100% (this can be caused by digital filters, PLLs that don't sync right away, fade-in volume controls, etc.).
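For a sense of what the first two items involve, here is a minimal numpy/scipy sketch (my own illustration, not DeltaWave's code; names are invented, and clock-drift removal and trimming are not shown):

```python
import numpy as np
from scipy.signal import correlate

def rough_align_and_match(reference, comparison):
    """Crude version of corrections 1 and 2 for equal-length float arrays."""
    # 2. DC offset removal, then RMS level matching of the comparison to the reference.
    ref = reference - np.mean(reference)
    cmp_ = comparison - np.mean(comparison)
    cmp_ = cmp_ * np.sqrt(np.mean(ref ** 2) / np.mean(cmp_ ** 2))

    # 1. Whole-sample delay estimate via cross-correlation. A real tool would
    #    interpolate around the peak for sub-sample accuracy and would trim or
    #    pad rather than wrap around with np.roll.
    lag = np.argmax(correlate(ref, cmp_, mode="full", method="fft")) - (len(cmp_) - 1)
    cmp_ = np.roll(cmp_, lag)
    return ref, cmp_
```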

For other types of errors, such as those introduced in the frequency (and phase) domain by filters, DeltaWave also provides tools to compensate, but these are more advanced and require some understanding of what you're doing. The basis for these tools is blind deconvolution.
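As a very loose illustration of the frequency-domain idea (emphatically not DeltaWave's algorithm; true blind deconvolution, where no clean reference is assumed, is considerably harder), a regularized per-bin filter estimate might look like this:

```python
import numpy as np

def estimate_linear_filter(reference, comparison, eps=1e-6):
    # Least-squares estimate of the linear filter mapping reference -> comparison,
    # one FFT bin at a time; eps regularizes bins where the reference has little
    # energy. In practice the estimate would be smoothed across bins so that only
    # broad filter differences get removed, not the fine errors being measured.
    n = len(reference)
    R = np.fft.rfft(reference, n)
    C = np.fft.rfft(comparison, n)
    return (C * np.conj(R)) / (np.abs(R) ** 2 + eps)
```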

Once linear differences are removed, DeltaWave computes various metrics, including RMS null and PK Metric (also DF Metric, and others). Which one you use is up to you, but you need to understand the purpose and the limitations of each.
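For the unweighted RMS null figure specifically, a simplified sketch (assuming the files are already aligned and level-matched; the exact normalization DeltaWave uses may differ) would be:

```python
import numpy as np

def rms_null_db(reference, corrected_comparison):
    # RMS of the null (difference) file relative to the RMS of the reference,
    # in dB; more negative means a deeper null.
    diff = reference - corrected_comparison
    return 20 * np.log10(np.sqrt(np.mean(diff ** 2)) / np.sqrt(np.mean(reference ** 2)))
```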
 
To be fair, @pkane is offering software for free that could easily be sold for thousands. Additionally, as we go beyond simple THD, IMD, and multitone measurements, the more tools/options we have the better…
Oh sure, the software is great, and even better that it's free; we're all grateful for that. I routinely recommend Distort, for example, to those claiming that minuscule amounts of distortion are audible and ask them to do ABX tests to demonstrate their extraordinary claims (sadly, rarely if ever do they take this up). But we should also recognize the limitations of the PK metric: requiring samples of 30 seconds to a minute, and failing to properly account for differences that are likely audible (and were, in the above-mentioned case of sharp bursts of noise), are indeed limitations of the software. (Note I have never mentioned anything about exact audible thresholds.)
 
So where did Paul not clearly acknowledge the limitations like needing 30 seconds or more? Or needing care in how you use the software? All testing has parameters you need to pay attention to in order to get usable results.
 
Neither the minimum 30-second requirement nor the failure of the algorithm when presented with sharp bursts of noise was acknowledged as a limitation of the software; rather, the tester was blamed for "incorrect use of the software", and I was accused of not understanding the goal of the metric, which I do (it's not difficult), because I read it here:
The goal is to compute a difference result that more directly answers the question of whether the difference between two devices is likely to be audible or not
The metric fails at this goal in the above example. A perceptual metric that does not properly account for all listening conditions is a limitation of the software, and that is through no fault of the tester, nor of the person who points it out.
 

What’s your obsession with the number of seconds? Why is it a hindrance for anyone doing a measurement to record 30+ seconds, and how is this a failure of the algorithm? It works exactly as designed. The requirement for a longer recording is documented, so I really fail to see the problem: the tester used too short a recording. Is this that hard to understand? If you really wanted to know why a longer recording is necessary, instead of trying to argue, I could explain the reason, but my feeling is that it would be a waste of time.

And if a perceptual metric doesn't work properly for samples less than 30s long, that's also a failure of the metric to model human perception, because the ear doesn't need this long to detect differences in distortion/noise.

What are you talking about? How is the need to record a 30+ second measurement a failure of an algorithm that was never meant to be a perfect model of human hearing or perception? I'm sorry, but I do think you're either confused or just trolling at this point.

The goal is to compute a difference result that more directly answers the question of whether the difference between two devices is likely to be audible or not

I'll say it one more time: the point of PK Metric is to improve the null metric by applying perceptual weights that were enumerated in the PK Metric thread. That's all it is: an improvement on a null difference to include some of the more common perceptual weights. Anything else you're reading into it, it is not.
 
NewsFLASH

A small Windows testing program does not perfectly model and mimic human hearing and brain processing of hearing exactly to perfection...........
.........more at 11.
 
@GaryH I was wondering if we could put aside the customer complaint about PK's software. It derailed this thread, and there doesn't seem to be an issue in the first place. Even if there were an issue, it's not relevant here.
 
Yeah, believe it or not, this thread is supposed to be about non-coherent distortion (which actually has scientific backing in the form of controlled listening tests), not alternative metrics brought up by others and claimed to be better without such evidence. So, back on topic. Despite what some may have you believe earlier in this thread, Temme and Olive did not find that NCD was 'no better than THD' in its (anti)correlation with preference. Here's what the actual paper says:
non-coherent distortion based on music appears to be more correlated with listeners’ preference ratings than the THD, IM and Multitone
These results for headphone distortion were corroborated in a follow-up study by Temme, this time on car audio distortion (reproduced binaurally over headphones, and oh look, using samples less than 30 seconds!), which found:
non-coherent distortion based on music correlates significantly better with listeners’ preference ratings than THD, IM and NCD with Pink Noise measurements
non-coherent distortion using music as a stimulus showed the best correlation to human perception
The study also reported on audibility, with listeners hearing noticeable distortion above 0.2% NCD between 200 Hz and 2 kHz at 100 dBC.
 
and oh look, using samples less than 30 seconds!

Still going on about the 30 seconds? Too bad; I suspect you'll never get the difference between a measurement and actual perceptual modeling. Moving on...

not alternative metrics brought up by others and claimed to be better without such evidence
What is wrong with you, man? Where was any such claim made, and why do you keep arguing about this? I was asked about adding NCD to one of my software tools, and answered that I already have a similar metric and pointed to a scientific paper that shows an NCD definition (a non-AES one) that is very similar to what DeltaWave computes. That was it. All this other crap about how PK Metric failed, how it requires 30 seconds, and how it's much better than NCD is all in your head. Please stop the nonsense!

How about posting a concise definition of NCD (a mathematical one, please) and the results of these controlled tests showing the correlation to audibility for those of us not willing to pay for AES membership?
 

Section 3 of this earlier 2006 paper by some of the same authors gives NCD in mathematical terms. What you do for the PK metric incorporates this and more, from what I can tell. You can look at it anyway if you're interested.
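For those who can't get at the paper, the usual coherence-based way of writing non-coherent distortion is roughly the following (my paraphrase of the general approach; see Section 3 of the paper for the authors' exact formulation):

```latex
\gamma^2(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f)\,S_{yy}(f)}, \qquad
\mathrm{NCD} \approx \sqrt{\frac{\sum_f \bigl(1-\gamma^2(f)\bigr)\,S_{yy}(f)}{\sum_f S_{yy}(f)}}
```

Here S_xx and S_yy are the stimulus and response auto-spectra, S_xy is the cross-spectrum, and (1-γ²)·S_yy is the portion of the output power that is not coherent with the input (distortion plus noise).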
 

Yes, that's the paper I cited in my first response to the OP, and the formulation here is what led me to think that DeltaWave already computes this or something very similar.
 
@GXAlan : I spent a little more time studying various papers on NCD and similar measures.

It appears that NCD is very similar to the DF metric that's already implemented in DeltaWave -- this is not a perceptually weighted metric. DF was first proposed by @Serge Smirnoff here. I found this metric to be not very different from the simple null metric that DeltaWave already computes; plus, the null metric provides the error signal and the spectrum, phase, etc. plots, while DF does not.

Another similar metric is MTND (multitone total nonlinear distortion), which is more like the original DeltaWave RMS null (originally computed by AudioDiffMaker).

Both MTND and NCD include nonlinear distortion and IMD products, along with noise, and can be measured with multitone or music test signals. Of course, so do DeltaWave's null, Serge's DF, and PK Metric. Out of all of these, PK Metric is the only measure that applies some of the better-known perceptual weights to the resulting error function.

The authors of the NCD metric (Steve Temme et al.) published a paper about four years later describing a different metric, this time based on a perceptual model (PTHD, or perceptual total harmonic distortion). PTHD takes as inputs a few perceptually weighted results, including frequency and level masking. These results were then fed into a neural network trained on preferences collected from a human study. The result was a good correlation between human preferences and the PTHD output, except that the details of the neural model were never published (at least as far as I can tell), so there's no way for anyone to reproduce this without running a large study with many human subjects. What's more, the neural network layer is a complex non-linear function that offers no explanation of why it works, other than that it fits the data.
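For readers unfamiliar with that kind of setup, here is a toy version of "perceptually weighted features in, preference rating out" (feature sizes and training data are invented purely for illustration; the paper's actual model is unpublished, which is exactly the reproducibility problem described above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a few perceptually weighted distortion features per clip,
# and a listener preference rating to predict (all numbers made up).
X = rng.normal(size=(64, 3))
y = rng.normal(size=(64, 1))

# One hidden layer trained with plain gradient descent on squared error.
W1, b1 = rng.normal(size=(3, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    err = (h @ W2 + b2) - y             # prediction error
    gW2, gb2 = h.T @ err / len(X), err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)    # backprop through tanh
    gW1, gb1 = X.T @ dh / len(X), dh.mean(0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= 0.1 * g                    # in-place parameter update

# The trained net is a black box: it may fit the ratings, but it offers no
# explanation of why, which is the criticism made above.
```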
 

For me, once I am able to detect a difference, I am not sure where to go next to dig deeper. Is there a way to do an unweighted versus weighted PK metric? Or to do unweighted, THD-weighted, noise-weighted, and THD+N-weighted versions?

As it stands, PK Metric is very good at reducing the risk of “false positive” differences, but once you have a true difference, it can be hard to figure out more.

Obviously, everything you do is out of the goodness of your heart and, I assume, a passion for audio science.

The latest version of Multitone, which is able to detect the frequency response and phase errors between two channels of a well-designed DAC, is a nice step forward in figuring out whether things that may be heard subjectively can be measured in a way that is not obvious with current common approaches.
 

The unweighted PK metric is the same as the error spectrum -- it's the actual null or difference file that DeltaWave computes. Expressed as a single number, it's the RMS of the difference file. There's also an A-weighted version, and while it's not as good as PK Metric at applying perceptual weights, it should give a good approximation based on just the audibility of the frequency spectrum:

[attached screenshot]
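For anyone who wants to approximate that A-weighted number themselves, the standard A-weighting curve is straightforward to compute; how exactly DeltaWave folds it into a single figure is my guess here, so treat the aggregation as illustrative:

```python
import numpy as np

def a_weight_db(f):
    # IEC 61672 A-weighting curve in dB as a function of frequency in Hz.
    f = np.asarray(f, dtype=float)
    ra = (12194.0 ** 2 * f ** 4) / (
        (f ** 2 + 20.6 ** 2)
        * np.sqrt((f ** 2 + 107.7 ** 2) * (f ** 2 + 737.9 ** 2))
        * (f ** 2 + 12194.0 ** 2)
    )
    return 20 * np.log10(ra) + 2.0

def a_weighted_error_db(freqs, error_spectrum_db):
    # Apply A-weighting to a per-bin error spectrum (in dB) and collapse it to
    # a single power-averaged figure -- one rough way to get an A-weighted
    # equivalent of the RMS null number.
    weighted = np.asarray(error_spectrum_db) + a_weight_db(freqs)
    return 10 * np.log10(np.mean(10 ** (weighted / 10)))
```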
 