Alternative method for measuring distortion

Blumlein 88 · Dec 8, 2019

Serge Smirnoff said:
Normally in df-metric there is no necessity to weight various distortions because all of them are already perfectly weighted in output music signal (m-signal) of a DUT. And we can measure and research distortion of that m-signal directly. Technical signals (t-signals) are only useful during development of audio equipment; no need to use them for assessment of audio quality. All required info about the latter is in the output m-signal.

There is another reason why distortion of m-signal has the highest status. Scale of its distortion has a point (around -50dB to my current understanding) when DUT becomes transparent for any listener. In other words, the output signal follows the input one so accurately that human hearing can not discern them. In the end a listener wants to have at the output of his amplifier exactly the same waveform, which he has in file.

So, color scale is absolute as well as the scale of Df values. And yes, some signals are distorted badly even in high quality audio devices, so they are red, in accordance with the measurements )).

A question: when you say a Df level is -50 dB, is that dBFS of the difference, or is that RMS level of the difference vs RMS level of the original? In other words, what is -50 dB the ratio of in your way of listing it?

Serge Smirnoff · Dec 8, 2019

PierreV said:
OK, so most of the info is "by engineers for engineers", fair enough. I was more thinking in terms of the customers/average users point of view who would possibly be interested in a SQ synthetic measure/estimate that is grounded in science and also would like to know which particular characteristic "fails" an otherwise well-performing device.

Yes, I offer that average user an aggregated natural (not synthetic) audio parameter: the level of distortion of two hours of various music material (the median of the histogram). Hardly any other parameter could be better for predicting perceived audio quality.

Serge Smirnoff · Dec 8, 2019

Blumlein 88 said:
is that RMS level of the difference vs RMS level of the original

correct
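
For readers who want to see the ratio confirmed above written out, here is a minimal Python sketch. It assumes the two signals are already time-aligned and level-matched, and the function names (`rms`, `diff_level_db`) are made up for illustration. It only shows the RMS-of-difference vs RMS-of-original ratio as described in this exchange; it is not Serge's df-metric code.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def diff_level_db(reference, output):
    """Level of (output - reference) relative to the reference, in dB.

    Assumption: both sequences are already time-aligned and level-matched.
    """
    if len(reference) != len(output):
        raise ValueError("signals must have equal length")
    difference = [o - r for r, o in zip(reference, output)]
    diff_rms = rms(difference)
    if diff_rms == 0.0:
        return float("-inf")  # identical signals: nothing left after subtraction
    return 20.0 * math.log10(diff_rms / rms(reference))
```

With this definition, an output that deviates from the original by 1% of its RMS level comes out at -40 dB.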

Serge Smirnoff · Dec 8, 2019

pkane said:
Is this metric using differences in dB levels over a number of short fragments and then selecting the median value as representative of the device

Correct. 18564 pieces of two-hour signal (Tw=400ms)
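
A rough Python sketch of the "many short fragments, then the median" idea, for anyone who wants to try it on their own files. It assumes aligned, level-matched signals and non-overlapping windows, and it leaves out the per-window time-warping step entirely. The function names and the choice to skip silent or identical windows are assumptions made for this sketch, not details taken from the df-metric.

```python
import math
import statistics


def window_levels_db(reference, output, sample_rate, window_ms=400):
    """Difference level in dB for each consecutive, non-overlapping window."""
    window = int(sample_rate * window_ms / 1000)
    if window < 1:
        raise ValueError("window is shorter than one sample")
    usable = min(len(reference), len(output))
    levels = []
    for start in range(0, usable - window + 1, window):
        ref = reference[start:start + window]
        out = output[start:start + window]
        ref_power = sum(r * r for r in ref)
        diff_power = sum((o - r) ** 2 for r, o in zip(ref, out))
        # Assumption: silent or identical windows are skipped, because
        # their level in dB is undefined or minus infinity.
        if ref_power == 0.0 or diff_power == 0.0:
            continue
        levels.append(10.0 * math.log10(diff_power / ref_power))
    return levels


def median_level_db(reference, output, sample_rate, window_ms=400):
    """Median of the per-window levels, used as a single figure."""
    return statistics.median(
        window_levels_db(reference, output, sample_rate, window_ms)
    )
```

As a sanity check on the numbers quoted above: 18564 windows of 400 ms add up to 7425.6 s, i.e. a little under 2 hours 4 minutes of material.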

Serge Smirnoff · Dec 8, 2019

pma said:
I think I understand. He takes two level matched time aligned files, applies algorithm to minimize phase differences, makes a difference and gets something like this:

Correct.

Serge Smirnoff · Dec 8, 2019

Xulonn said:
Are the listener testing results "preference" or "objective" data - or a combination of the two?

SE listening tests of codecs are normal blind tests with artifact amplification for high-bitrate codecs.

Serge Smirnoff · Dec 8, 2019

Blumlein 88 said:
Might be interesting if @Serge Smirnoff could look at Deltawave, and try it with his algorithm on some given files and see if the results are the same or very similar to Deltawave results. You can download Deltawave here:
https://deltaw.org/

Oh, and welcome to ASR Serge Smirnoff.

Yes, the core idea is the same - to measure and research difference signal for making inferences about audio quality. I will check it for sure.

Thanks, Blumlein.

msmucr · Dec 8, 2019

I admit, I haven't read all accompanying technical articles linked before. Honestly, I'm not really sure about such metrics.

Similar methods have been around for quite a long time. I recall, for example, a thread on the Gearslutz site where someone collects AudioDiffmaker results from various AD/DA loops.. I personally don't find it very useful, rather misleading. This seems to be similar, just with a defined set of various test signals and statistics.
I commonly use residuum analysis after subtraction for various small isolated comparisons of changed components in a common chain, debugging of issues, quick checking of transparency etc., and it's indeed very useful for that.
However, I was always a bit skeptical about a single all-encompassing figure based on signal difference level or correlation depth for generic evaluation of audio devices. Or, more precisely, about the relation of such a single figure to perceptual differences.

Such a difference level is a rough indication of the similarity of the input and output signals and, in general, of course, the less difference, the better. But in practice it's not such a good indicator for overall comparisons IMO, because there can be a multitude of reasons why signals are different (like all kinds of distortions with very different characteristics, phase shift...) and basically not all those reasons have the same weight in perceptible difference.
So two DUTs can have a very similar overall level of diff signal, but one of them will be more transparent sounding.. for example because the distortion will have a different frequency distribution and different characteristics. Or with a different example.. some device would have, say, a 30 dB lower level of diff signal than another one. At first look it can indicate that the second device is vastly inferior to the first one; however, there can be just a minimum phase interpolation filter at its DAC, which causes a big difference in diff level, but in a listening test it's not necessarily perceived as problematic.
Also there can be issues with clock drifting of source and capture devices if those aren't synced, there are certainly ways to compensate that in measurement with varying degree of efficiency, but in that case you're also altering effects of clocking at source device, which is still one of important aspects for comparisons.

I'll definitely check those papers.. and please don't take my initial comment as sheer negativity towards your work or "diffing" in general. Just with my previous experiences, it had several natural limitations, which is why I personally wouldn't use it to make any general chart of various audio devices.

Michal

Serge Smirnoff · Dec 8, 2019

msmucr said:
Such a difference level is a rough indication of the similarity of the input and output signals and, in general, of course, the less difference, the better. But in practice it's not such a good indicator for overall comparisons IMO, because there can be a multitude of reasons why signals are different (like all kinds of distortions with very different characteristics, phase shift...) and basically not all those reasons have the same weight in perceptible difference.
So two DUTs can have a very similar overall level of diff signal, but one of them will be more transparent sounding.. for example because the distortion will have a different frequency distribution and different characteristics. Or with a different example.. some device would have, say, a 30 dB lower level of diff signal than another one. At first look it can indicate that the second device is vastly inferior to the first one; however, there can be just a minimum phase interpolation filter at its DAC, which causes a big difference in diff level, but in a listening test it's not necessarily perceived as problematic.
Also there can be issues with clock drifting of source and capture devices if those aren't synced, there are certainly ways to compensate that in measurement with varying degree of efficiency, but in that case you're also altering effects of clocking at source device, which is still one of important aspects for comparisons.

Right you are, the amount of diff signal can be misleading in some cases. And I have a simple criterion for discovering such cases: the artifact signatures of the tested devices differ too much (by more than 2-3 dB). In other cases Df levels correlate well with perceived quality scores - http://soundexpert.org/articles/-/b...asurements-to-predict-listening-test-results-
Real example of this approach is in the use case - http://soundexpert.org/articles/-/blogs/audio-quality-of-sbc-xq-bluetooth-audio-codec

pkane · Dec 8, 2019

msmucr said:
I admit, I haven't read all accompanying technical articles linked before. Honestly, I'm not really sure about such metrics.

Similar methods have been around for quite a long time. I recall, for example, a thread on the Gearslutz site where someone collects AudioDiffmaker results from various AD/DA loops.. I personally don't find it very useful, rather misleading. This seems to be similar, just with a defined set of various test signals and statistics.
I commonly use residuum analysis after subtraction for various small isolated comparisons of changed components in a common chain, debugging of issues, quick checking of transparency etc., and it's indeed very useful for that.
However, I was always a bit skeptical about a single all-encompassing figure based on signal difference level or correlation depth for generic evaluation of audio devices. Or, more precisely, about the relation of such a single figure to perceptual differences.

Such a difference level is a rough indication of the similarity of the input and output signals and, in general, of course, the less difference, the better. But in practice it's not such a good indicator for overall comparisons IMO, because there can be a multitude of reasons why signals are different (like all kinds of distortions with very different characteristics, phase shift...) and basically not all those reasons have the same weight in perceptible difference.
So two DUTs can have a very similar overall level of diff signal, but one of them will be more transparent sounding.. for example because the distortion will have a different frequency distribution and different characteristics. Or with a different example.. some device would have, say, a 30 dB lower level of diff signal than another one. At first look it can indicate that the second device is vastly inferior to the first one; however, there can be just a minimum phase interpolation filter at its DAC, which causes a big difference in diff level, but in a listening test it's not necessarily perceived as problematic.
Also there can be issues with clock drifting of source and capture devices if those aren't synced, there are certainly ways to compensate that in measurement with varying degree of efficiency, but in that case you're also altering effects of clocking at source device, which is still one of important aspects for comparisons.

I'll definitely check those papers.. and please don't take my initial comment as sheer negativity towards your work or "diffing" in general. Just with my previous experiences, it had several natural limitations, which is why I personally wouldn't use it to make any general chart of various audio devices.

Michal

Michal, all valid points but AudioDiffMaker is a bit out of date. Please take a look at DeltaWave. This has the ability to adjust for variable group delay, as well as for filter frequency-related attenuation, etc. before computing a null. The results are significantly lower nulls than with DiffMaker, since various additional distortions can be taken into account during processing.

Serge Smirnoff · Dec 8, 2019

msmucr said:
Or with a different example.. some device would have, say, a 30 dB lower level of diff signal than another one. At first look it can indicate that the second device is vastly inferior to the first one; however, there can be just a minimum phase interpolation filter at its DAC, which causes a big difference in diff level, but in a listening test it's not necessarily perceived as problematic.

The precise time-warping algo completely removes all linear deformations of the time axis of the output signal; they are not counted. So a minimum phase interpolation filter will not cause a big difference in diff level.

Serge Smirnoff · Dec 8, 2019

pkane said:
The results are significantly lower nulls than with DiffMaker, since various additional distortions can be taken into account during processing.

BTW, Df levels are the lowest possible values for any two given waveforms (in the digital domain), thanks to an iterative search for the global minimum, which is always unique and can be found with any required accuracy (currently 1e-4 dB).

pkane · Dec 8, 2019

Serge Smirnoff said:
The precise time-warping algo completely removes all linear deformations of the time axis of the output signal; they are not counted. So a minimum phase interpolation filter will not cause a big difference in diff level.

You'd think that, but not all devices have linear deformations. Here is a good example. I don't know what kind of filter caused this, but it certainly isn't correctable by simple time warping algorithm. Blue is the phase difference plot between the original file and file played back through the DUT:

Serge Smirnoff · Dec 9, 2019

pkane said:
Here is a good example. I don't know what kind of filter caused this, but it certainly isn't correctable by simple time warping algorithm. Blue is the phase difference plot between the original file and file played back through the DUT:

Yes, in most cases deformation of time scale is not linear. But with 400ms window of Df computing this is not a problem in real life because those non-linear time deformations are usually slow and within 400ms time window can be considered as linear. If such non-linearity occurs within 400ms window then it really increases Df level. Technically it is possible to find real Df level in such case by gradually decreasing time window. But 400ms is a pretty big time period for human hearing, there is high probability that such time deformation of the signal will be perceived. So, it should be registered/accounted. Further analysis of artifact signatures will show if such time distortion is important or not.
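
To make "linear within the window" concrete, here is a small Python sketch of a linear time-axis correction applied to one window. It is deliberately naive: it uses a brute-force grid search over candidate scale and offset values with linear interpolation, whereas the post describes an iterative search for the global minimum, so treat this as an illustration of the idea and not as the algorithm discussed here. All function names and the candidate grids are assumptions.

```python
import math


def resample_linear(signal, scale, offset):
    """Read `signal` at positions offset + scale * n (linear interpolation)."""
    last = len(signal) - 1
    resampled = []
    for n in range(len(signal)):
        # Positions outside the window are clamped, which causes edge errors.
        pos = min(max(offset + scale * n, 0.0), float(last))
        i = min(int(pos), last - 1)
        frac = pos - i
        resampled.append(signal[i] * (1.0 - frac) + signal[i + 1] * frac)
    return resampled


def level_db(reference, output):
    """Power of (output - reference) relative to the reference, in dB."""
    ref_power = sum(r * r for r in reference)
    diff_power = sum((o - r) ** 2 for r, o in zip(reference, output))
    if diff_power == 0.0:
        return float("-inf")
    return 10.0 * math.log10(diff_power / ref_power)


def best_linear_warp(reference, output, scales, offsets):
    """Return (level_db, scale, offset) of the best candidate on the grid."""
    best = None
    for scale in scales:
        for offset in offsets:
            level = level_db(reference, resample_linear(output, scale, offset))
            if best is None or level < best[0]:
                best = (level, scale, offset)
    return best
```

If the time deformation inside the window really is linear (for example a small constant clock-rate mismatch), a correction like this can bring the difference level down sharply. A deformation that bends within the window cannot be removed this way and stays in the result, which matches the behaviour described above.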

PierreV · Dec 9, 2019

Serge Smirnoff said:
Real example of this approach is in the use case - http://soundexpert.org/articles/-/blogs/audio-quality-of-sbc-xq-bluetooth-audio-codec

Hmmm, an interesting read and, indeed, a lot of analysis going on... but nice to make some data and the methodology accessible so anyone can form an opinion.

pkane · Dec 9, 2019

Serge Smirnoff said:
Yes, in most cases deformation of time scale is not linear. But with 400ms window of Df computing this is not a problem in real life because those non-linear time deformations are usually slow and within 400ms time window can be considered as linear. If such non-linearity occurs within 400ms window then it really increases Df level. Technically it is possible to find real Df level in such case by gradually decreasing time window. But 400ms is a pretty big time period for human hearing, there is high probability that such time deformation of the signal will be perceived. So, it should be registered/accounted. Further analysis of artifact signatures will show if such time distortion is important or not.

Sure. On a short interval the phase will not be nearly as big an issue. What is interesting though, in testing DeltaWave over hundreds of test files, correcting for non-linear phase differences improved the computed delta dramatically. Off the cuff, many of the recordings produced a difference closer to -80dB to -90dB RMS over the entire file, compared to only around -50 to -60dB without the non-linear phase corrections.

Again, most likely this will not be so dramatic on a 400ms clip. This should be easy to test. If you don't object, I could try to add the df type of measure to DeltaWave to see how it will perform. All the necessary computations are already done by the software, except on a larger time scale.

Serge Smirnoff · Dec 9, 2019

pkane said:
What is interesting though, in testing DeltaWave over hundreds of test files, correcting for non-linear phase differences improved the computed delta dramatically. Off the cuff, many of the recordings produced a difference closer to -80dB to -90dB RMS over the entire file, compared to only around -50 to -60dB without the non-linear phase corrections.

This is how I see the time inconsistency of signal on diffrogram

The image name: iBassoDX50_sine12.5k_mono_100-78.9035-64.8288-46.2091.png
100 = time window in ms
-78.9035-64.8288-46.2091 = Min Median Max of Df levels (excluding the first and the last Df levels of the signal as they are almost always erroneous due to edge effects)
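
If someone wants to collect these numbers from a folder of diffrogram images, the naming scheme above can be parsed with a few lines of Python. This sketch assumes the name always ends in `_<window>-<min>-<median>-<max>.png` and that all three Df levels are negative, so each hyphen doubles as a minus sign. The function name is made up for illustration.

```python
import re

# Assumed layout: ..._<window ms><min><median><max>.png, all levels negative.
_NAME = re.compile(
    r"_(\d+)"              # time window in ms
    r"(-\d+(?:\.\d+)?)"    # minimum Df level, dB
    r"(-\d+(?:\.\d+)?)"    # median Df level, dB
    r"(-\d+(?:\.\d+)?)"    # maximum Df level, dB
    r"\.png$"
)


def parse_diffrogram_name(filename):
    """Return (window_ms, df_min, df_median, df_max) from an image name."""
    match = _NAME.search(filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    window_ms = int(match.group(1))
    df_min, df_median, df_max = (float(g) for g in match.groups()[1:])
    return window_ms, df_min, df_median, df_max
```

For the image named above this gives a 100 ms window with Df levels of -78.9035, -64.8288 and -46.2091 dB.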

pkane said:
Again, most likely this will not be so dramatic on a 400ms clip. This should be easy to test. If you don't object, I could try to add the df type of measure to DeltaWave to see how it will perform. All the necessary computations are already done by the software, except on a larger time scale.

Cool! If you need any details I'm ready. The problem I see is the time-warping algo. You definitely use a different one; this can affect the resulting Df levels. Unfortunately my algo is computationally very intensive (12 hours to compute the histogram for a df-slide) and I have no idea how to make it more efficient. On the other hand it works robustly with any signals, does not require any adjustments for time-warping and returns the lowest possible Df level for two given signals and a given time window. This is the beauty of using linear-only phase/pitch correction. So, I probably have the so-called reference implementation of the required processing. Matlab code is here - http://soundexpert.org/articles/-/blogs/visualization-of-distortion#part3

pkane · Dec 9, 2019

Serge Smirnoff said:
This is how I see the time inconsistency of signal on diffrogram

The image name: iBassoDX50_sine12.5k_mono_100-78.9035-64.8288-46.2091.png
100 = time window in ms
-78.9035-64.8288-46.2091 = Min Median Max of Df levels (excluding the first and the last Df levels of the signal as they are almost always erroneous due to edge effects)

Cool! If you need any details I'm ready. The problem I see is the time-warping algo. You definitely use a different one; this can affect the resulting Df levels. Unfortunately my algo is computationally very intensive (12 hours to compute the histogram for a df-slide) and I have no idea how to make it more efficient. On the other hand it works robustly with any signals, does not require any adjustments for time-warping and returns the lowest possible Df level for two given signals and a given time window. This is the beauty of using linear-only phase/pitch correction. So, I probably have the so-called reference implementation of the required processing. Matlab code is here - http://soundexpert.org/articles/-/blogs/visualization-of-distortion#part3

My implementation is in the frequency domain, and takes a minute or less for a 2-3min file on a modern PC. It'll take longer to do the histogram for multiple 400ms sections, but I suspect it will not be much more than a few minutes. Do you overlap the sections (and if so, by how much?) and is 400ms the best size or should I make this a variable setting?

Also, can you please point me to a few test files with the corresponding df results using your method? It'll make it easier for me to validate my implementation and to see if we are getting similar results.

Serge Smirnoff · Dec 9, 2019

I do not overlap the sections, but I plan to implement this. It does not affect Df values but it is important for generating correct audio files (time-warped).

Different windows are helpful for psycho-acoustic research; I implemented "any value" solution; for the purposes of testing audio 50 and 400ms are enough.

A few pairs of 30s test signals will be OK?

pkane · Dec 9, 2019

Serge Smirnoff said:
I do not overlap the sections, but I plan to implement this. It does not affect Df values but it is important for generating correct audio files (time-warped).

Different windows are helpful for psycho-acoustic research; I implemented "any value" solution; for the purposes of testing audio 50 and 400ms are enough.

A few pairs of 30s test signals will be OK?

That would be perfect!
