
Alternative method for measuring distortion

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,700
Likes
10,386
Location
North-East
Some initial DeltaWave results for these test files. This uses the DeltaWave resampler to match sample rates, and then aligns the files and corrects linear timing errors:

(1) se-pwn44.wav vs. se-pwn48.wav
DF Metric (step=100ms, overlap=0%): Median=-147.5dB Max=-99.6dB Min=-300dB 1% > -300.0dB 10% > -300.0dB 25% > -276.59dB 50% > -147.5dB 75% > -143.21dB 90% > -124.56dB

(2) se-pwn44.wav vs. se-pwn48flip.wav
DF Metric (step=100ms, overlap=0%): Median=-0.1dB Max=0.1dB Min=-0.3dB 1% > -0.28dB 10% > -0.19dB 25% > -0.16dB 50% > -0.1dB 75% > -0.05dB 90% > -0.01dB

(3) se-pwn44.wav vs. se-pwn48mix.wav
DF Metric (step=100ms, overlap=0%): Median=-5.4dB Max=-5dB Min=-5.7dB 1% > -5.62dB 10% > -5.52dB 25% > -5.44dB 50% > -5.36dB 75% > -5.27dB 90% > -5.2dB

I can now vary the time interval and the overlap, if desired. Varying these parameters produces small changes, maybe ±1dB or so, but nothing huge.
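For readers who want to experiment with a windowed metric of this kind, here is a minimal plain-Python sketch. It assumes a per-window df defined as the residual level after optimal gain matching, df = 10·log10(1 − ρ²), where ρ is the correlation between the reference and test windows — my reading of the metric for illustration, not necessarily the exact DeltaWave or SoundExpert definition:

```python
import math
import random

def df_per_window(ref, test, win, step):
    """Windowed df levels: level of the residual left after the best
    per-window gain match, df = 10*log10(1 - rho^2) dB, where rho is
    the correlation of the reference and test windows.
    (This df definition is an assumption for illustration.)"""
    out = []
    for start in range(0, len(ref) - win + 1, step):
        r = ref[start:start + win]
        t = test[start:start + win]
        er = sum(v * v for v in r)
        et = sum(v * v for v in t)
        dot = sum(a * b for a, b in zip(r, t))
        rho2 = dot * dot / (er * et) if er and et else 1.0
        resid = max(1.0 - rho2, 1e-30)      # guard against -Inf for identical windows
        out.append(10.0 * math.log10(resid))
    return out

def percentile(vals, p):
    """Crude percentile by rank, enough for the Median / 10% / 90% style summary."""
    s = sorted(vals)
    return s[min(int(p / 100 * len(s)), len(s) - 1)]

random.seed(1)
ref = [random.gauss(0, 1) for _ in range(48000)]
noise = [random.gauss(0, 1) for _ in range(48000)]
# test = ref plus low-level uncorrelated noise, about -40 dB residual
test = [a + 0.01 * n for a, n in zip(ref, noise)]

dfs = df_per_window(ref, test, win=4800, step=4800)  # 100 ms windows @ 48 kHz
print("median df: %.1f dB" % percentile(dfs, 50))
```

With a 4800-sample window at 48 kHz this reproduces the 100 ms stepping used above; the percentile helper gives the Median/1%/10%... style summary.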

Here's the spectrogram of the difference of the spectra for the last comparison, (3):

[screenshot: spectrogram of the spectral difference]


And the DF metric distribution:
[screenshot: DF metric distribution]
 

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
Looking forward, I think we should continue developing both measurement tools - yours and mine. Your algorithm is faster and can be used by everyone for experiments and research. My Matlab code, being simpler, more straightforward and ... slow, can be used for computing diffrograms and creating beautiful df-slides )). What is really important is to ensure that our measurements agree. For this purpose I propose defining a set of test signals with theoretically predictable df levels. I think two types of signals - sine and band-limited white noise - at different sampling rates will cover all practical df measurements with all possible audio signals (including m-signals). The signals can be as follows:

(a) Band-limited white noise (2Hz - 22kHz) @176kHz and @192kHz
(b) Sine (2Hz, 1kHz, 22kHz) @176kHz and @192kHz

Sampling and combining these signals, we can design all required test cases with the extreme points (df = -Inf dB and df = 0 dB) and two anchor df levels - one close to 0 dB (-5.3329 dB with the mix signal) and another close to -100 dB (?).

I'm not sure the 2Hz - 22kHz range is necessary (20Hz - 20kHz could be enough).
The high anchor signal could be, for example, a mix with a different proportion, resulting in a df level between -90dB and -100dB. But many other variants are possible.
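As a sketch of how the anchor mix proportions can be derived: assume the df of a mix z = x + ε·y (with y uncorrelated with x and of equal RMS) works out to 10·log10(ε²/(1+ε²)), i.e. the residual level after optimal gain matching. That df definition is my assumption for illustration, not necessarily the exact one used here:

```python
import math

def mix_epsilon(df_target_db):
    """Weight eps for z = x + eps*y (y uncorrelated with x, equal RMS)
    so that 10*log10(eps^2 / (1 + eps^2)) equals df_target_db.
    (This df definition is an assumption for illustration.)"""
    d = 10.0 ** (df_target_db / 10.0)
    if d >= 1.0:
        raise ValueError("target df must be below 0 dB")
    return math.sqrt(d / (1.0 - d))

for target in (-10.0, -20.0, -100.0):
    eps = mix_epsilon(target)
    achieved = 10.0 * math.log10(eps ** 2 / (1.0 + eps ** 2))
    print("df %6.1f dB -> eps = %.6g (check: %.4f dB)" % (target, eps, achieved))
```

For a -10 dB anchor this gives ε = 1/3 exactly; for the low anchors near -100 dB, ε becomes tiny (about 10⁻⁵), which is why accuracy there is the interesting question.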
 

pkane
Serge Smirnoff said:
[proposal for the calibration test-signal set, quoted in full above]

Fully agree. I've added the DF metric to DeltaWave and can make it available for others to start testing. I wanted to hold off on doing so until we have confirmation that the DW algorithm produces results very similar to those of your Matlab code.

By design, DW was created to analyze complex music waveforms recorded through different equipment, in order to measure how much distortion is introduced. While it's pretty accurate, it has limitations and will most likely never equal a perfect, exhaustive optimization algorithm.

I actually started with a minimization algorithm but soon gave up because of how long a single comparison would take. The final result is a more pragmatic approach to matching waveforms. It may not be perfect, but it is rarely far off.

My other software (DISTORT) may be useful in creating some test signals as it is capable of introducing a controlled amount of harmonic, time and noise distortions into any digital signal.
 

pkane
For anyone curious, the updated version of DeltaWave, v1.0.49, includes the DF metric as one of the measurements.

Suggested use:

1. For simple testing, disable the non-linear EQ settings. Here are the settings I used for running the last set of tests Serge provided:
[screenshot: DeltaWave settings]

For more advanced testing, especially with real music recordings, you can try turning on the non-linear EQ settings, as these may help improve the DF metric in the presence of non-linear phase effects.


2. Select the tabs with the measurements you want to see, and turn off any you don't want in settings. Here are the ones I found useful (don't forget to turn ON the DF metric):
[screenshot: measurement tab selection]


3. Select the two files you want to compare, as you'd normally do with DeltaWave, and press Match:
[screenshot: file selection and Match button]

4. When finished, switch to the DF Metric tab to see the df results. The Results tab will also contain the measured DF values, including the median, min/max, and various percentiles:

[screenshot: DF Metric results]


Click the plot settings button near the top right to change the window size (400ms is the default) and the window overlap percentage (0 is the default).

5. Let me know what you find!

EDIT: One more thing. If you are comparing simple periodic waveforms, such as sine waves, square waves, etc., select 'Measure Simple Waveforms' in settings; otherwise the cross-correlation step in DeltaWave will get confused by too many equally good matches:

[screenshot: 'Measure Simple Waveforms' setting]
 
Serge Smirnoff
I ended up choosing these test vectors for calibrating the accuracy of df-measurements. They are all derived from one instance of band-limited pseudo-white noise consisting of 200,000 sine waves of random frequencies/phases:

- se2-pwn44.wav - band-limited (20Hz-20kHz) pseudo-white noise @44.1kHz sampling rate
- se2-pwn48.wav - band-limited (20Hz-20kHz) pseudo-white noise @48kHz sampling rate, with a waveform mathematically identical to se2-pwn44.wav
- se2-pwn48flip.wav - the same as se2-pwn48.wav but flipped left to right
- se2-pwn48mix10.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, with a resulting df level of -10dB
- se2-pwn48mix20.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, with a resulting df level of -20dB
..............................
- se2-pwn48mix90.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, with a resulting df level of -90dB
- se2-pwn48mix100.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, with a resulting df level of -100dB

They are in the test package - https://www.dropbox.com/s/fk17sdhookkv8dv/se2-testVectors.rar?dl=0

Multiple test points with decreasing df levels give a better idea of the accuracy of the df computations. All signals are 30s long and have 32-bit depth. Below are diffrograms (100ms) of the signals. For each test signal, two types of computations were made - without time warping (48 vs. 48) and with it (44 vs. 48). The Min, Median and Max of the 300 df levels are indicated.
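A minimal plain-Python sketch of how such a pair of "mathematically identical" vectors can be generated: the waveform is defined analytically as a sum of random sines, and the same continuous-time function is sampled at both rates. Only 200 components are used here instead of 200,000, and the amplitude normalization is arbitrary — both illustrative assumptions, not Serge's exact Matlab procedure:

```python
import math
import random

def make_components(n, f_lo, f_hi, seed=0):
    """Random (frequency, phase) pairs defining a continuous-time,
    band-limited pseudo-white-noise waveform."""
    rng = random.Random(seed)
    return [(rng.uniform(f_lo, f_hi), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n)]

def sample(components, fs, seconds):
    """Sample the analytic waveform at rate fs. Because the same
    continuous-time function is evaluated, the 44.1 kHz and 48 kHz
    renderings share a mathematically identical underlying waveform."""
    n = int(fs * seconds)
    scale = 1.0 / math.sqrt(len(components))  # keep RMS roughly rate-independent
    return [scale * sum(math.sin(2.0 * math.pi * f * k / fs + p)
                        for f, p in components)
            for k in range(n)]

comps = make_components(200, 20.0, 20000.0, seed=42)  # 200, not 200 000, for speed
x44 = sample(comps, 44100, 0.01)
x48 = sample(comps, 48000, 0.01)
print(len(x44), len(x48))  # 441 480
```

The "flip" vector is then just the reversed sample list, and the mix vectors are weighted sums of the two, as described above.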

(0) se2-pwn48.wav vs. se2-pwn48flip.wav
se2-pwn48flip.wav(48)__se2-pwn48.wav(48)__mono_100-0.1746+0.0065+0.1698

(0w) se2-pwn44.wav vs. se2-pwn48flip.wav
se2-pwn48flip.wav(48)__se2-pwn44.wav(44)__mono_100-0.3252-0.2045-0.1314

While the non-warped (0) df level is almost perfectly zero, the warped one (0w) is a bit lower. It turned out that the main reason for this is the time-warping algorithm, which searches for a global minimum in order to compute df. As independent sequences have no such minimum, the algorithm ends up finding a fairly random combination of scale and shift parameters that gives maximum correlation with the reference signal. This is correct operation of the algorithm; when the signals have even a small correlation, the minimum exists and the algorithm finds it with ease. So this is a degenerate case in which the two signals are completely uncorrelated. If such signals are known to be already perfectly time-aligned, then the utility (with time warping disabled) will compute the correct df level with 64-bit accuracy.
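The degenerate case can be illustrated directly: with correlated signals, cross-correlation has a sharp global peak at the true offset, while for independent sequences the "best" offset is essentially arbitrary. A brute-force sketch (not the actual warping code, just the principle):

```python
import random

def best_lag(ref, test, max_lag):
    """Brute-force cross-correlation peak search over integer lags."""
    best, best_c = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        c = sum(ref[i] * test[i + lag]
                for i in range(max_lag, len(ref) - max_lag))
        if c > best_c:
            best, best_c = lag, c
    return best

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]

shifted = [0.0] * 5 + x[:-5]       # x delayed by 5 samples
print(best_lag(x, shifted, 20))    # finds the true shift: 5

unrelated = [rng.gauss(0, 1) for _ in range(2000)]
print(best_lag(x, unrelated, 20))  # essentially arbitrary - no true minimum exists
```

The same logic scaled up to continuous shift and rate parameters is what makes the warped df of two uncorrelated files come out slightly below zero.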


(10) se2-pwn48.wav vs. se2-pwn48mix10.wav
se2-pwn48mix10.wav(48)__se2-pwn48.wav(48)__mono_100-10.3455-9.9933-9.7099

(10w) se2-pwn44.wav vs. se2-pwn48mix10.wav
se2-pwn48mix10.wav(48)__se2-pwn44.wav(44)__mono_100-10.3466-9.9968-9.7116


(20) se2-pwn48.wav vs. se2-pwn48mix20.wav
se2-pwn48mix20.wav(48)__se2-pwn48.wav(48)__mono_100-20.3701-19.9974-19.6535

(20w) se2-pwn44.wav vs. se2-pwn48mix20.wav
se2-pwn48mix20.wav(48)__se2-pwn44.wav(44)__mono_100-20.3712-20.0009-19.6543


(30) se2-pwn48.wav vs. se2-pwn48mix30.wav
se2-pwn48mix30.wav(48)__se2-pwn48.wav(48)__mono_100-30.3675-30.0017-29.6405

(30w) se2-pwn44.wav vs. se2-pwn48mix30.wav
se2-pwn48mix30.wav(48)__se2-pwn44.wav(44)__mono_100-30.3685-30.0026-29.6413


(40) se2-pwn48.wav vs. se2-pwn48mix40.wav
se2-pwn48mix40.wav(48)__se2-pwn48.wav(48)__mono_100-40.3656-40.0010-39.6374

(40w) se2-pwn44.wav vs. se2-pwn48mix40.wav
se2-pwn48mix40.wav(48)__se2-pwn44.wav(44)__mono_100-40.3667-40.0020-39.6382


(50) se2-pwn48.wav vs. se2-pwn48mix50.wav
se2-pwn48mix50.wav(48)__se2-pwn48.wav(48)__mono_100-50.3649-50.0008-49.6365

(50w) se2-pwn44.wav vs. se2-pwn48mix50.wav
se2-pwn48mix50.wav(48)__se2-pwn44.wav(44)__mono_100-50.3660-50.0018-49.6374


(60) se2-pwn48.wav vs. se2-pwn48mix60.wav
se2-pwn48mix60.wav(48)__se2-pwn48.wav(48)__mono_100-60.3647-60.0007-59.6363

(60w) se2-pwn44.wav vs. se2-pwn48mix60.wav
se2-pwn48mix60.wav(48)__se2-pwn44.wav(44)__mono_100-60.3658-60.0019-59.6373


(70) se2-pwn48.wav vs. se2-pwn48mix70.wav
se2-pwn48mix70.wav(48)__se2-pwn48.wav(48)__mono_100-70.3646-70.0007-69.6362

(70w) se2-pwn44.wav vs. se2-pwn48mix70.wav
se2-pwn48mix70.wav(48)__se2-pwn44.wav(44)__mono_100-70.3658-70.0016-69.6373


(80) se2-pwn48.wav vs. se2-pwn48mix80.wav
se2-pwn48mix80.wav(48)__se2-pwn48.wav(48)__mono_100-80.3646-80.0007-79.6362

(80w) se2-pwn44.wav vs. se2-pwn48mix80.wav
se2-pwn48mix80.wav(48)__se2-pwn44.wav(44)__mono_100-80.3635-79.9984-79.6359


(90) se2-pwn48.wav vs. se2-pwn48mix90.wav
se2-pwn48mix90.wav(48)__se2-pwn48.wav(48)__mono_100-90.3646-90.0007-89.6361

(90w) se2-pwn44.wav vs. se2-pwn48mix90.wav
se2-pwn48mix90.wav(48)__se2-pwn44.wav(44)__mono_100-90.3347-89.9750-89.6132


(100) se2-pwn48.wav vs. se2-pwn48mix100.wav
se2-pwn48mix100.wav(48)__se2-pwn48.wav(48)__mono_100-100.3646-100.0008-99.6363

(100w) se2-pwn44.wav vs. se2-pwn48mix100.wav
se2-pwn48mix100.wav(48)__se2-pwn44.wav(44)__mono_100-100.0446-99.6933-99.3487

We can see that starting from -90dB, the time-warping algorithm begins to introduce an increasing error. At -95dB the error is about 0.1dB. Thus all df values below -95dB computed with this utility are higher (worse) than the true values. It is safe to say that the true df values in such cases are no worse than the measured ones.

(INF) se2-pwn48.wav vs. se2-pwn48.wav
se2-pwn48.wav(48)__se2-pwn48.wav(48)__mono_100-Inf-Inf-159.5459 (-Inf values are Grey on the diffrogram)

(INFw) se2-pwn44.wav vs. se2-pwn48.wav
se2-pwn48.wav(48)__se2-pwn44.wav(44)__mono_100-111.8120-111.3480-110.8553

This test case, with mathematically identical white-noise waveforms, shows the lowest computable df level with time warping enabled (all lower true df levels will be computed as -111.3480dB).

--------------------------------------------------------------------------------------------------------------------------------------------

The test cases below are just for better understanding of df measurements (these test signals are not included in the test package).


(A) The same case as (100w), but computed with the signal at a 192kHz sample rate.

se2-pwn44.wav vs. se2-pwn192mix100.wav
se2-pwn192mix100.wav(192)__se2-pwn44.wav(44)__mono_100-100.1766-99.8061-99.4227

The resulting df value is almost the same (with 2x the computation time, though )). So the accuracy of df measurements with this code is the same for any sample rate of the output signal.



The next case is the same as (100w), but the main signal in the mix is replaced with a 20kHz sine of the same RMS level as the removed white noise. In other words, we now measure the df level for a sine signal distorted with very low-level white noise.

(B) se2-sin20k48.wav vs. se2-sin20k48mix100.wav
se2-sin20k48mix100.wav(48)__se2-sin20k48.wav(48)__mono_100-100.3576-100.0022-99.6147

(Bw) se2-sin20k44.wav vs. se2-sin20k48mix100.wav
se2-sin20k48mix100.wav(48)__se2-sin20k44.wav(44)__mono_100-99.9127-99.5943-99.2114

The resulting df levels are almost exactly the same as in the (100)/(100w) cases, which means the accuracy of df measurements doesn't depend on the waveform. This is a very important feature of df measurements: if we can measure df levels for white noise of some bandwidth with a given accuracy, then we can measure df levels for any waveform within that bandwidth with the same accuracy. That is why test vectors with white noise are sufficient for testing any df-measurement procedure. All signals within the 20Hz-20kHz bandwidth (t-signals and m-signals) will be measured with the same accuracy.



And a practical case. Since we can measure df levels at different sample rates, testing resamplers becomes easy. Foobar2000 has an internal PPHS resampler with two modes: Normal and Ultra. We'll test 44100 → 48000 resampling.

(Cnormal) se2-pwn44.wav vs. se2-pwn44_48p.wav
se2-pwn44_48p.wav(48)__se2-pwn44.wav(44)__mono_100-41.6251-39.6948-38.2761

(Cultra) se2-pwn44.wav vs. se2-pwn44_48pu.wav
se2-pwn44_48pu.wav(48)__se2-pwn44.wav(44)__mono_100-111.4497-111.0006-110.4725

Don't forget to check the “Ultra mode” check box when using this resampler in Foobar2000 ))
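The same style of check can be sketched without foobar2000. Below, a deliberately crude linear-interpolation resampler (a stand-in for a poor resampler — not PPHS) is measured against an analytically known 48 kHz rendering of the same sum-of-sines waveform. All the helper names and the residual-based df definition are illustrative assumptions; a high-quality resampler would score dramatically lower (better):

```python
import math
import random

def tone_bank(n, f_hi, seed=7):
    """Random sine components defining a band-limited test waveform."""
    rng = random.Random(seed)
    return [(rng.uniform(20.0, f_hi), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n)]

def sample(comps, fs, n):
    s = 1.0 / math.sqrt(len(comps))
    return [s * sum(math.sin(2.0 * math.pi * f * k / fs + p) for f, p in comps)
            for k in range(n)]

def lin_resample(x, fs_in, fs_out, n_out):
    """Crude linear-interpolation resampler (intentionally low quality)."""
    y = []
    for k in range(n_out):
        t = k * fs_in / fs_out          # position in input samples
        i = int(t)
        frac = t - i
        a = x[i] if i < len(x) else 0.0
        b = x[i + 1] if i + 1 < len(x) else a
        y.append(a + frac * (b - a))
    return y

def df_db(ref, test):
    """Residual level after optimal gain match: 10*log10(1 - rho^2)."""
    er = sum(v * v for v in ref)
    et = sum(v * v for v in test)
    dot = sum(a * b for a, b in zip(ref, test))
    rho2 = dot * dot / (er * et)
    return 10.0 * math.log10(max(1.0 - rho2, 1e-30))

comps = tone_bank(100, 10000.0)
ref48 = sample(comps, 48000, 4800)      # ideal 48 kHz rendering (0.1 s)
src44 = sample(comps, 44100, 4500)      # 44.1 kHz source
up48 = lin_resample(src44, 44100, 48000, 4800)
print("crude resampler df: %.1f dB" % df_db(ref48, up48))
```

Linear interpolation lands only a few tens of dB down, in the same ballpark as PPHS Normal above; a proper polyphase filter is what buys the roughly -111 dB of Ultra mode.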



Another (more complicated) practical case: bit reduction of a 32-bit signal (white noise) to 16 bits, with and without dithering. No time warping is needed here.

(Dn) No dithering
se2-pwn44_16.wav(44)__se2-pwn44.wav(44)__mono_100-90.1591-89.8082-89.3917

(Dd) With triangle dithering
[the image is in the next post]
se2-pwn44_16t.wav(44)__se2-pwn44.wav(44)__mono_100-85.4232-85.0169-84.5709

Adding dither during bit reduction results in a lower (worse) df level, and the df metric may seem not to work here, since dithering always improves perceived audio quality. But if we compute the distance between the artifact signatures of these two operations, it is exactly 4.78dB. As I said earlier (#55), a distance of 1.5-2.0 dB is critical for relating df measurements to subjective scores, so the perceived quality of these two operations (with/without dithering) can't be assessed by df measurements alone. Thus df measurements must be accompanied by analysis of artifact signatures - not all measured distortions are equal from the listener's perspective.
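The roughly 4.8 dB gap between the two cases matches theory: undithered 16-bit quantization error has power Δ²/12, while non-subtractive TPDF dither of ±1 LSB adds Δ²/6 more, so the total error level rises by 10·log10(3) ≈ 4.77 dB. A minimal plain-Python sketch (the helper names are hypothetical, and white noise stands in for the test vector):

```python
import math
import random

Q = 1.0 / 32768.0  # 16-bit step for full-scale +/-1.0 signals

def quantize(x, dither=False, rng=None):
    """Round to the 16-bit grid; optionally add TPDF dither of +/-1 LSB
    (difference of two uniform random values) before rounding."""
    out = []
    for v in x:
        d = (rng.random() - rng.random()) * Q if dither else 0.0
        out.append(round((v + d) / Q) * Q)
    return out

def err_rms(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

rng = random.Random(3)
x = [rng.uniform(-0.5, 0.5) for _ in range(200000)]  # white-noise stand-in

e_plain = err_rms(x, quantize(x))
e_dither = err_rms(x, quantize(x, dither=True, rng=rng))
print("df penalty from dither: %.2f dB" % (20.0 * math.log10(e_dither / e_plain)))
# theory: 10*log10(3) = 4.77 dB
```

This only quantifies the error level; as the post notes, it says nothing about the perceptual benefit of decorrelating the error from the signal, which is exactly why artifact signatures are needed alongside df levels.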



Thanks to this discussion, I have also calibrated the accuracy of my Matlab code (all df levels above were computed with the new 2.4 version), and after updating its manual I will put the new version on the SE web site (I'll post here when it's up). The manual will contain the Matlab code used to create the above test vectors. The ones that take too much time to compute will be included in the package.
 
Serge Smirnoff
(Dd) With triangle dithering
se2-pwn44_16t.wav(44)__se2-pwn44.wav(44)__mono_100-85.4232-85.0169-84.5709
 

pkane
Serge Smirnoff said:
[test-vector calibration post quoted in full above]

Hi Serge,

That's a lot of work you've done here! I'll try to test these soon. Since the DeltaWave version that supports the DF computation is now publicly available, others can run these tests as well. And some users of DW are already starting to :)
 
Serge Smirnoff
The df metric makes it possible to present objective measurements of a group of DUTs in a clear and innovative way. The most informative (and in most cases sufficient) presentation of such results is a similarity map - a 3D similarity space showing the relative positions of DUTs according to the similarity of their artifact signatures. The previously measured portable players (#55) are shown below (the colors of the DUTs correspond to their df levels with the m-signal):

daps-in-sim-space.png


How would such a picture look with 50, 100, 200... tested DUTs? The answer is pretty obvious. The more accurately a DUT reproduces the reference signal, the less "pronounced" its artifact signature, and the more similar that signature is to those of other accurate DUTs. Consequently, all accurate DUTs will cluster in one place in our similarity space, and by definition there is only one such place. The three cases below illustrate this feature of the similarity space. A 30s excerpt from "Jingle Bells" was distorted with white noise at various df levels, resulting in 100 simulated DUTs:

sim-space.png

[interactive version of the central case is here - http://soundexpert.org/sim-space]

The method of adding distortions was varied slightly from case to case, resulting in different positioning of the DUTs in the space. But in all cases there is one "cold" center/core where the most accurate DUTs are located. Thus, measuring a class of real-world DUTs (portable audio, for example), we will see one cold core of the most accurate/transparent devices, while all the others form some 3D shape around or near that center. The exact shape will depend on the variety/similarity of audio solutions on the portable audio market. Such an interactive 3D similarity space populated with real DUTs gives the whole picture of their relative quality. The fourth dimension - color - indicates the accuracy of m-signal reproduction. It should be noted that the best DUTs form the core not because they have the best df levels, but because they have similar artifact signatures, so it is a double-check. An accompanying df-slide for each device will give the technical details of its audio performance. This is how modern objective audio quality control looks with the df-metric: reliable, simple and visual.

At the moment I don't have an audio interface of the required accuracy and - what is more important - I don't have access to the audio market of the first world. So I can test a meaningful number of audio devices only in cooperation with those who meet both conditions. Recent co-work with csglinux from the head-fi forum proved the efficiency of such cooperation. If you like the idea of df-measurements, we can start them somewhere on this forum.

Merry Christmas and Happy New Year!
 

pkane

I ended up choosing these test vectors for calibrating the accuracy of df-measurements. They are all derived from one instance of pseudo-white noise, band-limited and consisting of 200,000 sine waves of random frequencies/phases:

- se2-pwn44.wav - band-limited (20Hz-20kHz) pseudo-white noise @44.1kHz sampling rate
- se2-pwn48.wav - band-limited (20Hz-20kHz) pseudo-white noise @48kHz sampling rate, with a waveform mathematically identical to se2-pwn44.wav
- se2-pwn48flip.wav - the same as se2-pwn48.wav but flipped left to right (time-reversed)
- se2-pwn48mix10.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, giving a resulting df level of -10dB
- se2-pwn48mix20.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, giving a resulting df level of -20dB
..............................
- se2-pwn48mix90.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, giving a resulting df level of -90dB
- se2-pwn48mix100.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, giving a resulting df level of -100dB

They are in the test package - https://www.dropbox.com/s/fk17sdhookkv8dv/se2-testVectors.rar?dl=0
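For reference, mix vectors like those above can be reproduced in principle by adding the time-reversed noise at a gain of 10^(df/20), since the flipped copy has exactly the same RMS as the original. A minimal sketch with synthetic noise as a stand-in for the actual files (`mix_at_df` is a hypothetical helper, not part of the SE package):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(44100)  # stand-in for se2-pwn48.wav
flip = x[::-1].copy()           # time-reversed copy: identical RMS,
                                # essentially uncorrelated with x

def mix_at_df(ref, artifact, df_db):
    """Add `artifact` to `ref` at the gain that yields the requested
    df level, 20*log10(RMS(mix - ref)/RMS(ref)) == df_db (hypothetical)."""
    gain = 10 ** (df_db / 20) * np.sqrt(np.mean(ref**2) / np.mean(artifact**2))
    return ref + gain * artifact

y = mix_at_df(x, flip, -40.0)
measured = 20 * np.log10(np.sqrt(np.mean((y - x)**2) / np.mean(x**2)))
print(round(measured, 2))  # -40.0, exact by construction
```

Because the added difference is exactly `gain * flip`, the resulting df level hits the target to within floating-point error, which is what makes these files usable as calibration points.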

Multiple test points with decreasing df levels give a better idea of the accuracy of df computations. All signals are 30s long and have 32-bit depth. Below are diffrograms (100ms) of the signals. For each test signal, two types of computation were made: without time warping (48 vs. 48) and with it (44 vs. 48). Min, Median and Max of the 300 df levels are indicated.

(0) se2-pwn48.wav vs. se2-pwn48flip.wav
View attachment 43495
se2-pwn48flip.wav(48)__se2-pwn48.wav(48)__mono_100-0.1746+0.0065+0.1698

(0w) se2-pwn44.wav vs. se2-pwn48flip.wav
View attachment 43496
se2-pwn48flip.wav(48)__se2-pwn44.wav(44)__mono_100-0.3252-0.2045-0.1314

While the non-warped (0) df level is almost perfectly equal to zero, the warped one (0w) is a bit lower. It turned out that the main reason for this is the time-warping algorithm, which searches for a global minimum in order to compute df. As independent sequences have no such minimum, the algorithm ends up finding a fairly random combination of scale and shift parameters that maximizes correlation with the reference signal. This is correct operation of the algorithm; when the signals have any small correlation, the minimum exists and the algorithm finds it with ease. So this is a degenerate case in which both signals are completely uncorrelated. If such signals are known to be already perfectly time-aligned, then the utility (with time warping disabled) will compute the correct df level with 64-bit accuracy.
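The global-minimum search can be illustrated with a much simpler stand-in: plain FFT cross-correlation. For genuinely shifted copies the search recovers the true lag, while for uncorrelated signals it still returns some "best" lag, which is exactly the degenerate behaviour described above (`best_lag` is a simplified sketch, not the actual warping algorithm):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(20000)
shifted = np.roll(x, 137)               # correlated pair: a true lag exists
unrelated = rng.standard_normal(20000)  # degenerate pair: no true lag

def best_lag(a, b):
    """Global search for the lag maximizing cross-correlation of a with b,
    via zero-padded FFT (a simplified stand-in for the warping search)."""
    n = 1 << (len(a) + len(b) - 1).bit_length()
    c = np.fft.irfft(np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n)), n)
    lag = int(np.argmax(c))
    return lag if lag <= n // 2 else lag - n

print(best_lag(shifted, x))    # recovers the true 137-sample shift
print(best_lag(unrelated, x))  # some arbitrary "best" lag: the degenerate case
```

With any real correlation present, the peak is sharp and the search is reliable; without it, the reported alignment is essentially noise.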


(10) se2-pwn48.wav vs. se2-pwn48mix10.wav
View attachment 43497
se2-pwn48mix10.wav(48)__se2-pwn48.wav(48)__mono_100-10.3455-9.9933-9.7099

(10w) se2-pwn44.wav vs. se2-pwn48mix10.wav
View attachment 43498
se2-pwn48mix10.wav(48)__se2-pwn44.wav(44)__mono_100-10.3466-9.9968-9.7116


(20) se2-pwn48.wav vs. se2-pwn48mix20.wav
View attachment 43499
se2-pwn48mix20.wav(48)__se2-pwn48.wav(48)__mono_100-20.3701-19.9974-19.6535

(20w) se2-pwn44.wav vs. se2-pwn48mix20.wav
View attachment 43500
se2-pwn48mix20.wav(48)__se2-pwn44.wav(44)__mono_100-20.3712-20.0009-19.6543


(30) se2-pwn48.wav vs. se2-pwn48mix30.wav
View attachment 43501
se2-pwn48mix30.wav(48)__se2-pwn48.wav(48)__mono_100-30.3675-30.0017-29.6405

(30w) se2-pwn44.wav vs. se2-pwn48mix30.wav
View attachment 43502
se2-pwn48mix30.wav(48)__se2-pwn44.wav(44)__mono_100-30.3685-30.0026-29.6413


(40) se2-pwn48.wav vs. se2-pwn48mix40.wav
View attachment 43503
se2-pwn48mix40.wav(48)__se2-pwn48.wav(48)__mono_100-40.3656-40.0010-39.6374

(40w) se2-pwn44.wav vs. se2-pwn48mix40.wav
View attachment 43504
se2-pwn48mix40.wav(48)__se2-pwn44.wav(44)__mono_100-40.3667-40.0020-39.6382


(50) se2-pwn48.wav vs. se2-pwn48mix50.wav
View attachment 43505
se2-pwn48mix50.wav(48)__se2-pwn48.wav(48)__mono_100-50.3649-50.0008-49.6365

(50w) se2-pwn44.wav vs. se2-pwn48mix50.wav
View attachment 43506
se2-pwn48mix50.wav(48)__se2-pwn44.wav(44)__mono_100-50.3660-50.0018-49.6374


(60) se2-pwn48.wav vs. se2-pwn48mix60.wav
View attachment 43507
se2-pwn48mix60.wav(48)__se2-pwn48.wav(48)__mono_100-60.3647-60.0007-59.6363

(60w) se2-pwn44.wav vs. se2-pwn48mix60.wav
View attachment 43508
se2-pwn48mix60.wav(48)__se2-pwn44.wav(44)__mono_100-60.3658-60.0019-59.6373


(70) se2-pwn48.wav vs. se2-pwn48mix70.wav
View attachment 43509
se2-pwn48mix70.wav(48)__se2-pwn48.wav(48)__mono_100-70.3646-70.0007-69.6362

(70w) se2-pwn44.wav vs. se2-pwn48mix70.wav
View attachment 43510
se2-pwn48mix70.wav(48)__se2-pwn44.wav(44)__mono_100-70.3658-70.0016-69.6373


(80) se2-pwn48.wav vs. se2-pwn48mix80.wav
View attachment 43511
se2-pwn48mix80.wav(48)__se2-pwn48.wav(48)__mono_100-80.3646-80.0007-79.6362

(80w) se2-pwn44.wav vs. se2-pwn48mix80.wav
View attachment 43512
se2-pwn48mix80.wav(48)__se2-pwn44.wav(44)__mono_100-80.3635-79.9984-79.6359


(90) se2-pwn48.wav vs. se2-pwn48mix90.wav
View attachment 43513
se2-pwn48mix90.wav(48)__se2-pwn48.wav(48)__mono_100-90.3646-90.0007-89.6361

(90w) se2-pwn44.wav vs. se2-pwn48mix90.wav
View attachment 43514
se2-pwn48mix90.wav(48)__se2-pwn44.wav(44)__mono_100-90.3347-89.9750-89.6132


(100) se2-pwn48.wav vs. se2-pwn48mix100.wav
View attachment 43515
se2-pwn48mix100.wav(48)__se2-pwn48.wav(48)__mono_100-100.3646-100.0008-99.6363

(100w) se2-pwn44.wav vs. se2-pwn48mix100.wav
View attachment 43516
se2-pwn48mix100.wav(48)__se2-pwn44.wav(44)__mono_100-100.0446-99.6933-99.3487

We can see that starting from -90dB the time-warping algorithm begins to introduce an increasing error. At -95dB the error is about 0.1dB. Thus all df values below -95dB computed with this utility are higher (worse) than the true values. It is safe to say that the true df values in such cases are no worse than the measured ones.

(INF) se2-pwn48.wav vs. se2-pwn48.wav
View attachment 43517
se2-pwn48.wav(48)__se2-pwn48.wav(48)__mono_100-Inf-Inf-159.5459 (-Inf values are Grey on the diffrogram)

(INFw) se2-pwn44.wav vs. se2-pwn48.wav
View attachment 43518
se2-pwn48.wav(48)__se2-pwn44.wav(44)__mono_100-111.8120-111.3480-110.8553

This test case with mathematically identical white-noise waveforms shows the lowest computable df level with time warping enabled (all lower true df levels will be computed as -111.3480dB).

--------------------------------------------------------------------------------------------------------------------------------------------

The test cases below are just for a better understanding of df measurements (these test signals are not included in the test package).


(A) The same case as (100w) but computed with the signal at a 192k sample rate.

se2-pwn44.wav vs. se2-pwn192mix100.wav
View attachment 43519
se2-pwn192mix100.wav(192)__se2-pwn44.wav(44)__mono_100-100.1766-99.8061-99.4227

The resulting df value is almost the same (with x2 computation time, though )). So the accuracy of df measurements with this code is the same for any sample rate of the output signal.



The next case is the same as (100w), but the main signal in the mix is replaced with a 20kHz sine of the same RMS level as the removed white noise. In other words, we now measure the df level for a sine signal distorted with very-low-level white noise.

(B) se2-sin20k48.wav vs. se2-sin20k48mix100.wav
View attachment 43520
se2-sin20k48mix100.wav(48)__se2-sin20k48.wav(48)__mono_100-100.3576-100.0022-99.6147

(Bw) se2-sin20k44.wav vs. se2-sin20k48mix100.wav
View attachment 43521
se2-sin20k48mix100.wav(48)__se2-sin20k44.wav(44)__mono_100-99.9127-99.5943-99.2114

The resulting df levels are almost exactly the same as in the (100)/(100w) cases, which means that the accuracy of df measurements doesn't depend on the waveform. And this is a very important feature of df measurements: if we can measure df levels for white noise of some bandwidth with a given accuracy, then we can measure df levels for any waveform within that bandwidth with the same accuracy. That is why test vectors with white noise are sufficient for testing any df-measurement procedure. All signals within the 20Hz-20kHz bandwidth (t-signals and m-signals) will be measured with the same accuracy.



And a practical case. As we can measure df levels at different sample rates, testing of resamplers becomes easy. Foobar2000 has an internal PPHS resampler with two modes, Normal and Ultra. We'll test 44100 → 48000 resampling.

(Cnormal) se2-pwn44.wav vs. se2-pwn44_48p.wav
View attachment 43522
se2-pwn44_48p.wav(48)__se2-pwn44.wav(44)__mono_100-41.6251-39.6948-38.2761

(Cultra) se2-pwn44.wav vs. se2-pwn44_48pu.wav
View attachment 43523
se2-pwn44_48pu.wav(48)__se2-pwn44.wav(44)__mono_100-111.4497-111.0006-110.4725

Don't forget to check the “Ultra mode” box when using this resampler in Foobar2000 ))



Another (more complicated) practical case: bit reduction of a 32-bit signal (white noise) to 16 bits, with and without dithering. No need for time warping here.

(Dn) No dithering
View attachment 43524
se2-pwn44_16.wav(44)__se2-pwn44.wav(44)__mono_100-90.1591-89.8082-89.3917

(Dd) With triangle dithering
[the image is in the next post]
se2-pwn44_16t.wav(44)__se2-pwn44.wav(44)__mono_100-85.4232-85.0169-84.5709

Adding dither during bit reduction results in a lower (worse) df level, so df-metric may seem not to work here, as dithering always improves perceived audio quality. But if we compute the distance between the artifact signatures of these two operations, it is exactly 4.78dB. As I said earlier (#55), a distance of 1.5-2.0dB is critical for relating df measurements to subjective scores, so the perceived quality of these two operations (with/without dithering) can't be assessed by df measurements alone. Thus df measurements must be accompanied by study of artifact signatures: not all measured distortions are equal from the listener's perspective.
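The with/without-dither comparison is easy to reproduce on synthetic noise. Assuming non-subtractive TPDF dither and rounding to a 16-bit grid, the dithered error power is theoretically 3× the undithered one, i.e. about 10·log10(3) ≈ 4.77dB worse, which lines up with the 4.78dB figure above (`df_db` and the signal here are stand-ins, not the actual test vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.clip(0.5 * rng.standard_normal(44100), -1.0, 1.0)  # stand-in 32-bit noise
q = 2.0 ** -15                                            # 16-bit quantization step

def df_db(ref, test):
    """df-style level: RMS of difference relative to RMS of reference, in dB."""
    d = test - ref
    return 20 * np.log10(np.sqrt(np.mean(d**2) / np.mean(ref**2)))

plain = np.round(x / q) * q                      # plain rounding to the 16-bit grid
tpdf = rng.uniform(-q/2, q/2, x.size) + rng.uniform(-q/2, q/2, x.size)
dithered = np.round((x + tpdf) / q) * q          # non-subtractive TPDF dither

d_plain, d_dith = df_db(x, plain), df_db(x, dithered)
print(round(d_plain, 1), round(d_dith, 1), round(d_dith - d_plain, 2))
```

The last number comes out near 4.77dB: the dithered version measures worse even though it sounds better, which is exactly the limitation of a pure level metric discussed above.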



Thanks to this discussion I also calibrated the accuracy of my matlab code (all df levels above are computed with the new 2.4 version), and after updating its manual I will put the new version on the SE web site (and will announce it here). The manual will contain the matlab code that was used to create the above test vectors. The ones that take too long to compute will be included in the package.

Starting to go through the test cases using DeltaWave, now that I have a few minutes free from family obligations :) If I'm not missing anything, the results are really close to those produced by your matlab algorithm.

(0) se2-pwn48.wav vs. se2-pwn48flip.wav
DF Metric (step=100ms, overlap=0%): Median=0dB Max=0.2dB Min=-0.2dB

(0w) se2-pwn44.wav vs. se2-pwn48flip.wav:
DF Metric (step=100ms, overlap=0%): Median=0dB Max=0.2dB Min=-0.2dB

(10) se2-pwn48.wav vs. se2-pwn48mix10.wav
DF Metric (step=100ms, overlap=0%): Median=-10dB Max=-9.6dB Min=-10.4dB

(10w) se2-pwn44.wav vs. se2-pwn48mix10.wav
DF Metric (step=100ms, overlap=0%): Median=-10dB Max=-9.6dB Min=-10.4dB

(20) se2-pwn48.wav vs. se2-pwn48mix20.wav
DF Metric (step=100ms, overlap=0%): Median=-20dB Max=-19.6dB Min=-20.3dB

(20w) se2-pwn44.wav vs. se2-pwn48mix20.wav
DF Metric (step=100ms, overlap=0%): Median=-20dB Max=-19.6dB Min=-20.3dB

___ skipped some tests for now (all in the interest of ... being lazy :) ) ___

(90) se2-pwn48.wav vs. se2-pwn48mix90.wav
DF Metric (step=100ms, overlap=0%): Median=-90dB Max=-89.6dB Min=-90.4dB

(90w) se2-pwn44.wav vs. se2-pwn48mix90.wav
DF Metric (step=100ms, overlap=0%): Median=-90dB Max=-89.6dB Min=-90.3dB

(100) se2-pwn48.wav vs. se2-pwn48mix100.wav
DF Metric (step=100ms, overlap=0%): Median=-100dB Max=-99.6dB Min=-100.4dB

(100w) se2-pwn44.wav vs. se2-pwn48mix100.wav
DF Metric (step=100ms, overlap=0%): Median=-100dB Max=-96.1dB Min=-100.3dB
 

pkane


@Serge Smirnoff :

Interesting results: after modifying the step size to 400ms and changing the step overlap to 80%, the results are tighter than with the 100ms step. For example:

(10) se2-pwn48.wav vs. se2-pwn48mix10.wav
DF Metric (step=100ms, overlap=0%): Median=-10dB Max=-9.6dB Min=-10.4dB
DF Metric (step=400ms, overlap=80%): Median=-10dB Max=-9.8dB Min=-10.2dB

(90) se2-pwn48.wav vs. se2-pwn48mix90.wav
DF Metric (step=100ms, overlap=0%): Median=-90dB Max=-89.6dB Min=-90.4dB
DF Metric (step=400ms, overlap=80%): Median=-90dB Max=-89.8dB Min=-90.2dB

(100w) se2-pwn44.wav vs. se2-pwn48mix100.wav
DF Metric (step=100ms, overlap=0%): Median=-100dB Max=-96.1dB Min=-100.3dB
DF Metric (step=400ms, overlap=80%): Median=-100dB Max=-99.8dB Min=-100.2dB

1577643870088.png
 

pkane


And an even tighter spread of results when switching to 800ms/90% overlap. This may not be a better representation of the subjective audio difference, but it is a consistently better statistical result (+/-0.1dB spread between max and min for nearly all the tests).

DF Metric (step=800ms, overlap=90%): Median=-10dB Max=-9.9dB Min=-10.1dB

DF Metric (step=800ms, overlap=90%): Median=-90dB Max=-89.9dB Min=-90.1dB

DF Metric (step=800ms, overlap=90%): Median=-100dB Max=-99.9dB Min=-100.1dB
 

Serge Smirnoff

If I'm not missing anything, the results are really close to those produced by your matlab algorithm.
Yes, excellent results.

Interesting results by modifying the step size to 400ms and changing the step overlap to 80%, the results are tighter than with 100ms step.
And even a tighter spread of results when switching to 800ms/90% overlap. This may not be a better representation of the subjective audio difference, but it is a consistently better statistical result (+/-0.1dB spread between max and min values for nearly all the tests).
As we measure df levels with white noise here, the width of the time window predictably affects the variance of the results. I can't say for sure what overlapping does, but neither of these variables can be optimized at the moment, because the only criterion for optimization in df-metric is relevance to subjective estimations. I chose 400ms because it is the minimum reasonable time frame for psychoacoustic research (enough time to complete all processing in the brain in reaction to a sound stimulus). If reliable evidence emerges that longer windows provide better correlation of df measurements with subjective scores, the width should be increased. So at this stage of the research I prefer to stay with the simplest df measurements, using a rectangular time window of 400ms.
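The window/overlap effect on variance is easy to verify on synthetic noise. Below is a sketch of per-window df levels with a rectangular window (`df_windows` is a hypothetical helper mirroring the step/overlap parameters discussed here, not DeltaWave's or the matlab code); with noise-like signals the spread between min and max shrinks as the window grows:

```python
import numpy as np

def df_windows(ref, test, fs=48000, win_ms=400, overlap=0.0):
    """Per-window df levels using a rectangular window; `win_ms` and
    `overlap` mirror the step/overlap parameters discussed here."""
    win = int(fs * win_ms / 1000)
    hop = max(1, int(win * (1.0 - overlap)))
    out = []
    for start in range(0, len(ref) - win + 1, hop):
        r = ref[start:start + win]
        d = test[start:start + win] - r
        out.append(20 * np.log10(np.sqrt(np.mean(d**2) / np.mean(r**2))))
    return np.array(out)

rng = np.random.default_rng(3)
x = rng.standard_normal(48000 * 5)                      # 5 s of noise
y = x + 10 ** (-50 / 20) * rng.standard_normal(x.size)  # true df = -50 dB

for win_ms, ov in ((100, 0.0), (400, 0.8), (800, 0.9)):
    lv = df_windows(x, y, win_ms=win_ms, overlap=ov)
    print(win_ms, round(float(np.median(lv)), 1),
          round(float(lv.max() - lv.min()), 2))  # median stays near -50; spread shrinks
```

The median is essentially invariant; only the min/max spread tightens with longer windows, matching the 100ms/400ms/800ms results reported above.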
 

pkane


Agreed. Noise as the signal is the reason a larger number of samples produces a tighter distribution of values. I did test this on a few real music recordings, and there it seems a larger window size makes the distribution wider, so it has the opposite effect.
 

Serge Smirnoff

Sorry for shouting, I need to be heard.

The solution to the Audio Quality Problem (what it is and how to control it) is not in the field of distortion research (developing some audio parameter or a set of them). The solution is in the field of circuit/algorithm design. The current level of audio technology already allows DUTs to be produced with whatever small null level we like. We don't need to research distortion any further. Instead we need to motivate manufacturers to produce consumer audio electronics with such small distortion levels that the structure of the distortion does not matter. Such a low distortion level can then be controlled with a single parameter. Near the audio-singularity level it suffices to have only one quantitative parameter; qualitative assessment of distortion is not required. Research into the audibility of distortion will then be needed only in areas where transfer of the original waveform with the required accuracy is not possible due to some constraints (mob. telephony ...). In consumer audio such constraints are absent (the best audio devices show excellent df values already). The paradigm of “inevitable inaccuracy” in audio should be dropped for good. “Lossless” analog audio can be a new reality.

Because of the above advance in technology, the audio market will be commoditized (sooner or later). We, consumers, should take steps to ensure that the commoditized audio products meet our requirements. Our recommendations will be strict and well grounded in a clear concept and a reliable measurement procedure.

The current approach, which suggests elaborating a set of objective audio parameters that will allow audio equipment to be produced with some moderate but inaudible degradation of the audio signal, is ineffective; it leads to endless discussion about the goodness or badness of various types of distortion and hardly ever results in consensus, as such discussion involves subjective tastes. This approach creates those muddy waters, a comfortable environment for the audio industry, allowing manufacturers to produce audio devices of mediocre quality while convincing consumers that their “distortions” are best/inaudible/pleasant. This semantic level of audio information should not be touched at all; audio quality requirements can be safely defined at the syntactic level, the level of the signal.

The final goal is to control the audio market, which is now driven by manufacturers that profit from the asymmetry of information on the market and the absence of a reliable AQ metric. By means of creative marketing they have learned to control both the supply and demand curves, which determine the price. Audio consumers who do not want to be fooled by marketers should return the “demand curve” to their control. Purchasing decisions should be rational (not emotional) and cooperative. Self-organization is the key to achieving this goal. The Internet offers many opportunities for such self-organization; we just do not use them and instead fight with each other about tiny aspects of audio reproduction, lament the reluctance of manufacturers to follow the recommendations of audio science (while at the same time searching for excuses why it is not achievable), complain about the stupidity of the audiophile community, etc. What an epic waste of time and intellectual resources.
 

rajapruk

I do not have the competence to understand your work in detail, but it seems profoundly important to me.
I wish you the best of luck with this.
 

pkane


Serge, I agree with you that the df metric can help measure the engineering accuracy of a DUT: how closely (mathematically speaking) the DUT reproduces the desired waveform. This is not very different from the simple RMS error metric that DeltaWave and other apps compute; in fact, if these are computed on short intervals of 400ms and a median is taken, the result will be almost exactly the same.

The limitation of such a metric is that the distance between any two measurements is not tied to the perception of such a difference. You stated a df audibility threshold of -50dB, as I recall. The problem is, df doesn't tell me whether a distortion with a df metric of -35dB is just as inaudible, or much more audible, or whether I can even tell the difference between -35dB and -45dB. As far as I can tell, df doesn't provide any guidance on the relative audibility of the error/distortion it measures because it's not tied to audio perception. I think this is why j_j and others were objecting to your definition.
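The near-equivalence claimed above (whole-file RMS error vs. median of 400ms windows) is easy to check numerically for a stationary error; this is a sketch with synthetic signals, not DeltaWave's actual code:

```python
import numpy as np

rng = np.random.default_rng(6)
fs = 48000
x = rng.standard_normal(fs * 4)                          # 4 s reference
y = x + 10 ** (-45 / 20) * rng.standard_normal(x.size)   # -45 dB stationary error

# Whole-file RMS-error level:
global_db = 20 * np.log10(np.sqrt(np.mean((y - x)**2) / np.mean(x**2)))

# Median of the same ratio over 400 ms rectangular windows (df-style):
win = int(0.4 * fs)
per_win = [20 * np.log10(np.sqrt(np.mean((y[i:i + win] - x[i:i + win])**2)
                                 / np.mean(x[i:i + win]**2)))
           for i in range(0, x.size - win + 1, win)]
print(round(global_db, 1), round(float(np.median(per_win)), 1))  # both near -45.0
```

The two numbers agree to within a small fraction of a dB here; they would diverge only when the error level varies strongly over time, which is where the windowed median becomes informative.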
 

solderdude

We don't need to research distortion furthermore. Instead we need to motivate manufacturers to produce consumer audio electronics with such small distortion levels that the structure of distortion does not matter.

You will need to shout a LOT louder if you want to convince, say, [insert many 'audiophile' amp brands], and even then they will ignore you, because they too believe they have found the grail of perfect sound in the form of a certain harmonic profile that they and their customers desire, or think they desire.

In the meantime it is important that gear is still measured with standard metrics while you expand yours.
It would be great if there were a link with subjectively perceived sound quality.
 