I ended up choosing these test vectors for calibrating the accuracy of df-measurements. They are all derived from one instance of band-limited pseudo-white noise consisting of 200 000 sine waves of random frequencies and phases:
- se2-pwn44.wav - band-limited (20Hz-20kHz) pseudo-white noise @44.1kHz sampling rate
- se2-pwn48.wav - band-limited (20Hz-20kHz) pseudo-white noise @48kHz sampling rate, with a waveform mathematically identical to se2-pwn44.wav
- se2-pwn48flip.wav - the same as se2-pwn48.wav but flipped left to right
- se2-pwn48mix10.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, giving a df level of -10dB
- se2-pwn48mix20.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, giving a df level of -20dB
- ... (se2-pwn48mix30.wav to se2-pwn48mix80.wav, in 10dB steps)
- se2-pwn48mix90.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, giving a df level of -90dB
- se2-pwn48mix100.wav - mix of se2-pwn48.wav and a portion of se2-pwn48flip.wav, giving a df level of -100dB
They are in the test package -
https://www.dropbox.com/s/fk17sdhookkv8dv/se2-testVectors.rar?dl=0
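A minimal Python sketch of how a 44.1kHz/48kHz pair with a mathematically identical waveform could be built from one set of sines. It assumes equal amplitudes and uniformly distributed random frequencies and phases (the post does not state the distributions, and the real vectors were made with Matlab code that is not shown here). `make_pwn` and `N_SINES` are hypothetical names, the number of sines is cut from 200 000 to 200 so it runs quickly, and "flipped left to right" is read as time reversal, which is also an assumption.

```python
import math
import random

def make_pwn(freqs, phases, fs, n_samples):
    """Evaluate a sum of equal-amplitude sines at sample rate fs (Hz)."""
    scale = 1.0 / math.sqrt(len(freqs) / 2.0)  # brings the sum to roughly unit RMS
    signal = []
    for n in range(n_samples):
        t = n / fs
        s = sum(math.sin(2.0 * math.pi * f * t + p) for f, p in zip(freqs, phases))
        signal.append(scale * s)
    return signal

rng = random.Random(1)
N_SINES = 200  # assumption for speed; the post uses 200 000
freqs = [rng.uniform(20.0, 20000.0) for _ in range(N_SINES)]
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N_SINES)]

# 10 ms of the same continuous waveform, sampled on two different grids
pwn44 = make_pwn(freqs, phases, 44100, 441)
pwn48 = make_pwn(freqs, phases, 48000, 480)
pwn48flip = pwn48[::-1]  # time-reversed copy (assumed meaning of "flipped")

# The two grids coincide every 1/300 s: sample 147 @44.1k and sample 160 @48k
print(pwn44[147], pwn48[160])
```

Because both files are evaluated from the same formula rather than resampled from each other, neither one carries resampler artifacts.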
Multiple test points with decreasing df levels give a better idea of the accuracy of df computations. All signals are 30s long and have 32-bit depth. Below are diffrograms (100ms) of the signals. For each test signal two types of computations were made: without time warping (48 vs. 48) and with it (44 vs. 48). The Min, Median and Max of the 300 df levels are indicated.
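A rough Python sketch of the mixing and of the per-window statistics. It assumes that the df level is 10*log10(1 - r), where r is the normalized correlation between reference and output; that definition is not given in this post, but it agrees with the 0dB result for uncorrelated signals and the -Inf result for identical ones. Gaussian noise stands in for the test vectors, and `df_level`, `mix_gain` and `diffrogram_stats` are hypothetical names, not the utility's functions.

```python
import math
import random
import statistics

def df_level(ref, out):
    """Assumed df definition: 10*log10(1 - r), r = normalized correlation."""
    sxy = sum(a * b for a, b in zip(ref, out))
    sxx = sum(a * a for a in ref)
    syy = sum(b * b for b in out)
    one_minus_r = 1.0 - sxy / math.sqrt(sxx * syy)
    return 10.0 * math.log10(one_minus_r) if one_minus_r > 0.0 else -math.inf

def mix_gain(target_db):
    """Gain g so that ref + g*other has the target df level, assuming
    ref and other are uncorrelated and have equal power (r = 1/sqrt(1+g^2))."""
    r = 1.0 - 10.0 ** (target_db / 10.0)
    return math.sqrt(1.0 / (r * r) - 1.0)

def diffrogram_stats(ref, out, fs, window_s=0.1):
    """Min, median and max of df levels over consecutive windows."""
    win = int(round(fs * window_s))
    levels = [df_level(ref[i:i + win], out[i:i + win])
              for i in range(0, len(ref) - win + 1, win)]
    return min(levels), statistics.median(levels), max(levels)

rng = random.Random(7)
FS = 48000
n = 2 * FS  # 2 s (20 windows) instead of 30 s (300 windows), for speed
ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
other = [rng.gauss(0.0, 1.0) for _ in range(n)]

g = mix_gain(-20.0)
mixed = [a + g * b for a, b in zip(ref, other)]
lo, med, hi = diffrogram_stats(ref, mixed, FS)
print(f"gain={g:.4f}  min={lo:.2f}  median={med:.2f}  max={hi:.2f} dB")
```

The spread between min and max comes from the finite window length: in each 100ms window the two noise sequences are only approximately uncorrelated and of equal power.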
(0) se2-pwn48.wav vs. se2-pwn48flip.wav
View attachment 43495
se2-pwn48flip.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -0.1746, Median +0.0065, Max +0.1698)
(0w) se2-pwn44.wav vs. se2-pwn48flip.wav
View attachment 43496
se2-pwn48flip.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -0.3252, Median -0.2045, Max -0.1314)
While the non-warped (0) df level is almost exactly zero, the warped one (0w) is a bit lower. It turned out that the main reason for this is the time warping algo, which searches for a global minimum in order to compute df. As independent sequences have no such minimum, the algo ends up finding some fairly random combination of scale and shift parameters that gives max correlation with the reference signal. This is correct operation of the algo; when the signals have even a small correlation, the minimum exists and the algo finds it with ease. So this is a degenerate case in which both signals are completely uncorrelated. If such signals are known to be perfectly time aligned already, then the utility (with time warping disabled) will compute the correct df level with 64-bit accuracy.
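The degenerate case can be reproduced with a much simpler search than the real time warping algo. The Python sketch below only tries integer shifts (no scale parameter) and keeps the one with the highest correlation, so it is a stand-in for illustration, not the utility's algorithm. It again assumes df = 10*log10(1 - r); all names are hypothetical.

```python
import math
import random

def correlation(a, b):
    """Normalized correlation of two equally long sequences."""
    sxy = sum(x * y for x, y in zip(a, b))
    sxx = sum(x * x for x in a)
    syy = sum(y * y for y in b)
    return sxy / math.sqrt(sxx * syy)

def df_from_r(r):
    """Assumed df definition: 10*log10(1 - r)."""
    return 10.0 * math.log10(1.0 - r) if r < 1.0 else -math.inf

rng = random.Random(3)
LENGTH, MAX_SHIFT = 1000, 100
ref = [rng.gauss(0.0, 1.0) for _ in range(LENGTH)]
# Independent sequence, longer so that every shifted window stays in range
out = [rng.gauss(0.0, 1.0) for _ in range(LENGTH + 2 * MAX_SHIFT)]

# Correlation of the reference with the output at every candidate shift
r_by_shift = {
    shift: correlation(ref, out[MAX_SHIFT + shift:MAX_SHIFT + shift + LENGTH])
    for shift in range(-MAX_SHIFT, MAX_SHIFT + 1)
}
best_shift = max(r_by_shift, key=r_by_shift.get)

df_aligned = df_from_r(r_by_shift[0])            # no search, shift fixed at 0
df_searched = df_from_r(r_by_shift[best_shift])  # best of 201 random correlations
print(f"aligned: {df_aligned:.3f} dB, searched: {df_searched:.3f} dB (shift {best_shift})")
```

With nothing real to lock onto, the search simply returns the largest of many small chance correlations, which pulls the reported df level slightly below 0dB.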
(10) se2-pwn48.wav vs. se2-pwn48mix10.wav
View attachment 43497
se2-pwn48mix10.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -10.3455, Median -9.9933, Max -9.7099)
(10w) se2-pwn44.wav vs. se2-pwn48mix10.wav
View attachment 43498
se2-pwn48mix10.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -10.3466, Median -9.9968, Max -9.7116)
(20) se2-pwn48.wav vs. se2-pwn48mix20.wav
View attachment 43499
se2-pwn48mix20.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -20.3701, Median -19.9974, Max -19.6535)
(20w) se2-pwn44.wav vs. se2-pwn48mix20.wav
View attachment 43500
se2-pwn48mix20.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -20.3712, Median -20.0009, Max -19.6543)
(30) se2-pwn48.wav vs. se2-pwn48mix30.wav
View attachment 43501
se2-pwn48mix30.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -30.3675, Median -30.0017, Max -29.6405)
(30w) se2-pwn44.wav vs. se2-pwn48mix30.wav
View attachment 43502
se2-pwn48mix30.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -30.3685, Median -30.0026, Max -29.6413)
(40) se2-pwn48.wav vs. se2-pwn48mix40.wav
View attachment 43503
se2-pwn48mix40.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -40.3656, Median -40.0010, Max -39.6374)
(40w) se2-pwn44.wav vs. se2-pwn48mix40.wav
View attachment 43504
se2-pwn48mix40.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -40.3667, Median -40.0020, Max -39.6382)
(50) se2-pwn48.wav vs. se2-pwn48mix50.wav
View attachment 43505
se2-pwn48mix50.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -50.3649, Median -50.0008, Max -49.6365)
(50w) se2-pwn44.wav vs. se2-pwn48mix50.wav
View attachment 43506
se2-pwn48mix50.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -50.3660, Median -50.0018, Max -49.6374)
(60) se2-pwn48.wav vs. se2-pwn48mix60.wav
View attachment 43507
se2-pwn48mix60.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -60.3647, Median -60.0007, Max -59.6363)
(60w) se2-pwn44.wav vs. se2-pwn48mix60.wav
View attachment 43508
se2-pwn48mix60.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -60.3658, Median -60.0019, Max -59.6373)
(70) se2-pwn48.wav vs. se2-pwn48mix70.wav
View attachment 43509
se2-pwn48mix70.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -70.3646, Median -70.0007, Max -69.6362)
(70w) se2-pwn44.wav vs. se2-pwn48mix70.wav
View attachment 43510
se2-pwn48mix70.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -70.3658, Median -70.0016, Max -69.6373)
(80) se2-pwn48.wav vs. se2-pwn48mix80.wav
View attachment 43511
se2-pwn48mix80.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -80.3646, Median -80.0007, Max -79.6362)
(80w) se2-pwn44.wav vs. se2-pwn48mix80.wav
View attachment 43512
se2-pwn48mix80.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -80.3635, Median -79.9984, Max -79.6359)
(90) se2-pwn48.wav vs. se2-pwn48mix90.wav
View attachment 43513
se2-pwn48mix90.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -90.3646, Median -90.0007, Max -89.6361)
(90w) se2-pwn44.wav vs. se2-pwn48mix90.wav
View attachment 43514
se2-pwn48mix90.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -90.3347, Median -89.9750, Max -89.6132)
(100) se2-pwn48.wav vs. se2-pwn48mix100.wav
View attachment 43515
se2-pwn48mix100.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -100.3646, Median -100.0008, Max -99.6363)
(100w) se2-pwn44.wav vs. se2-pwn48mix100.wav
View attachment 43516
se2-pwn48mix100.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -100.0446, Median -99.6933, Max -99.3487)
We can see that starting from -90dB the time warping algo begins to introduce a growing error. At -95dB the error is about 0.1dB. Thus all df values below -95dB computed with this utility are higher (worse) than the true values. It is safe to say that the true df values in such cases are not worse than the measured ones.
(INF) se2-pwn48.wav vs. se2-pwn48.wav
View attachment 43517
se2-pwn48.wav(48)__se2-pwn48.wav(48)__mono_100 (Min -Inf, Median -Inf, Max -159.5459; -Inf values are grey on the diffrogram)
(INFw) se2-pwn44.wav vs. se2-pwn48.wav
View attachment 43518
se2-pwn48.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -111.8120, Median -111.3480, Max -110.8553)
This test case with mathematically identical waveforms of white noise shows the lowest computable df level with time warping enabled (all lower true df levels will be computed as -111.3480dB).
--------------------------------------------------------------------------------------------------------------------------------------------
The test cases below are just for better understanding of df measurements (these test signals are not included in the test package).
(A) The same case as (100w) but computed with the signal @192k sample rate.
se2-pwn44.wav vs. se2-pwn192mix100.wav
View attachment 43519
se2-pwn192mix100.wav(192)__se2-pwn44.wav(44)__mono_100 (Min -100.1766, Median -99.8061, Max -99.4227)
The resulting df value is almost the same (though with 2x the computation time). So the accuracy of df measurements with this code is the same for any sample rate of the output signal.
The next case is the same as (100w), but the main signal in the mix is replaced with a 20kHz sine of the same RMS level as the removed white noise. In other words, we now measure the df level for a sine signal distorted with white noise of very low level.
(B) se2-sin20k48.wav vs. se2-sin20k48mix100.wav
View attachment 43520
se2-sin20k48mix100.wav(48)__se2-sin20k48.wav(48)__mono_100 (Min -100.3576, Median -100.0022, Max -99.6147)
(Bw) se2-sin20k44.wav vs. se2-sin20k48mix100.wav
View attachment 43521
se2-sin20k48mix100.wav(48)__se2-sin20k44.wav(44)__mono_100 (Min -99.9127, Median -99.5943, Max -99.2114)
The resulting df levels are almost exactly the same as in the (100)/(100w) cases, which means that the accuracy of df measurements doesn't depend on the waveform. This is a very important feature of df measurements: if we can measure df levels with some accuracy for white noise of some bandwidth, then we can measure df levels with the same accuracy for any waveform within that bandwidth. That is why test vectors with white noise are sufficient for testing any df-measurement procedure. All signals within the 20Hz-20kHz bandwidth (t-signals and m-signals) will be measured with the same accuracy.
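The waveform independence can be checked numerically under the same assumed definition, df = 10*log10(1 - r). In this Python sketch the same low-level noise is added once to white noise and once to a 20kHz sine of equal RMS, and the two df levels are compared. Gaussian noise is used in place of the band-limited test vectors, no time warping is involved, and all names are hypothetical.

```python
import math
import random

def df_level(ref, out):
    """Assumed df definition: 10*log10(1 - r); fsum keeps the sums accurate."""
    sxy = math.fsum(a * b for a, b in zip(ref, out))
    sxx = math.fsum(a * a for a in ref)
    syy = math.fsum(b * b for b in out)
    one_minus_r = 1.0 - sxy / math.sqrt(sxx * syy)
    return 10.0 * math.log10(one_minus_r) if one_minus_r > 0.0 else -math.inf

def unit_rms(x):
    """Scale a sequence to an RMS level of 1."""
    rms = math.sqrt(math.fsum(v * v for v in x) / len(x))
    return [v / rms for v in x]

rng = random.Random(11)
FS, N = 48000, 48000       # 1 s at 48 kHz
TARGET_DB = -100.0

# Gain that gives the target df level for unit-RMS, uncorrelated signals
r = 1.0 - 10.0 ** (TARGET_DB / 10.0)
gain = math.sqrt(1.0 / (r * r) - 1.0)
distortion = [gain * rng.gauss(0.0, 1.0) for _ in range(N)]

noise_ref = unit_rms([rng.gauss(0.0, 1.0) for _ in range(N)])
sine_ref = unit_rms([math.sin(2.0 * math.pi * 20000.0 * n / FS) for n in range(N)])

df_noise = df_level(noise_ref, [a + d for a, d in zip(noise_ref, distortion)])
df_sine = df_level(sine_ref, [a + d for a, d in zip(sine_ref, distortion)])
print(f"white noise: {df_noise:.3f} dB, 20 kHz sine: {df_sine:.3f} dB")
```

Under this definition the result depends only on the power ratio between the added distortion and the main signal, not on the shape of the main signal.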
Now a practical case. As we can measure df levels at different sample rates, testing resamplers becomes easy. Foobar2000 has an internal PPHS resampler, which has two modes: Normal and Ultra. We'll test 44100 → 48000 resampling.
(Cnormal) se2-pwn44.wav vs. se2-pwn44_48p.wav
View attachment 43522
se2-pwn44_48p.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -41.6251, Median -39.6948, Max -38.2761)
(Cultra) se2-pwn44.wav vs. se2-pwn44_48pu.wav
View attachment 43523
se2-pwn44_48pu.wav(48)__se2-pwn44.wav(44)__mono_100 (Min -111.4497, Median -111.0006, Max -110.4725)
Don't forget to check the “Ultra mode” check box when using this resampler in Foobar2000 ))
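The idea of grading a resampler against a mathematically identical waveform can be sketched in Python as follows. Everything here is an assumption made for illustration: the resampler is plain linear interpolation (deliberately crude, and unrelated to PPHS), the output is compared with an exactly computed 48kHz waveform instead of going through time warping against the 44.1kHz file, and df is taken as 10*log10(1 - r). All names are hypothetical.

```python
import math
import random

def df_level(ref, out):
    """Assumed df definition: 10*log10(1 - r), r = normalized correlation."""
    sxy = math.fsum(a * b for a, b in zip(ref, out))
    sxx = math.fsum(a * a for a in ref)
    syy = math.fsum(b * b for b in out)
    one_minus_r = 1.0 - sxy / math.sqrt(sxx * syy)
    return 10.0 * math.log10(one_minus_r) if one_minus_r > 0.0 else -math.inf

def sum_of_sines(freqs, phases, fs, n_samples):
    """The same continuous waveform evaluated on a grid with sample rate fs."""
    return [sum(math.sin(2.0 * math.pi * f * n / fs + p)
                for f, p in zip(freqs, phases))
            for n in range(n_samples)]

def resample_linear(x, fs_in, fs_out, n_out):
    """Crude resampler: linear interpolation between neighbouring input samples."""
    y = []
    for n in range(n_out):
        pos = n * fs_in / fs_out   # position on the input sample grid
        i = int(pos)
        frac = pos - i
        y.append((1.0 - frac) * x[i] + frac * x[i + 1])
    return y

rng = random.Random(5)
freqs = [rng.uniform(20.0, 20000.0) for _ in range(50)]
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(50)]

src44 = sum_of_sines(freqs, phases, 44100, 2206)    # about 50 ms of input
exact48 = sum_of_sines(freqs, phases, 48000, 2400)  # ideal 48 kHz result
resampled48 = resample_linear(src44, 44100, 48000, 2400)

df_linear = df_level(exact48, resampled48)
df_ideal = df_level(exact48, exact48)  # a perfect resampler would land here
print(f"linear interpolation: {df_linear:.2f} dB, ideal: {df_ideal} dB")
```

A better resampler would move the first number down towards the second one.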
Another (more complicated) practical case: bit reduction of a 32-bit signal (white noise) to 16 bits, with and without dithering. No need for time warping here.
(Dn) No dithering
View attachment 43524
se2-pwn44_16.wav(44)__se2-pwn44.wav(44)__mono_100 (Min -90.1591, Median -89.8082, Max -89.3917)
(Dd) With triangle dithering
[the image is in the next post]
se2-pwn44_16t.wav(44)__se2-pwn44.wav(44)__mono_100 (Min -85.4232, Median -85.0169, Max -84.5709)
Adding dithering during bit reduction results in a higher (worse) df level, so the df metric may seem not to work here, as dithering always improves perceived audio quality. But if we compute the distance between the artifact signatures of these two operations, it comes to exactly 4.78dB. As I said earlier (#55), a distance of 1.5-2.0 dB is critical for relating df measurements to subjective scores, so the perceived quality of these two operations (with/without dithering) can't be assessed by df measurements. Thus df measurements must be accompanied by research into artifact signatures: not all measured distortions are equal from the listener's perspective.
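As a sanity check on the two df levels themselves (not on the artifact signature distance, which is a separate computation not shown in this post), the gap between them can be compared with the textbook noise model: plain rounding adds error power of LSB²/12, and triangular dither spanning ±1 LSB adds another LSB²/6, so the total is three times larger, i.e. 10*log10(3) ≈ 4.77dB. The Python sketch below assumes df = 10*log10(1 - r), uses Gaussian noise as the signal, and all names are hypothetical.

```python
import math
import random

def df_level(ref, out):
    """Assumed df definition: 10*log10(1 - r), r = normalized correlation."""
    sxy = math.fsum(a * b for a, b in zip(ref, out))
    sxx = math.fsum(a * a for a in ref)
    syy = math.fsum(b * b for b in out)
    one_minus_r = 1.0 - sxy / math.sqrt(sxx * syy)
    return 10.0 * math.log10(one_minus_r) if one_minus_r > 0.0 else -math.inf

LSB = 1.0 / 32768.0  # 16-bit step for a signal with full scale of +/-1

def quantize(x, rng=None):
    """Round to the 16-bit grid; if rng is given, add triangular (TPDF) dither first."""
    out = []
    for v in x:
        # Difference of two uniform numbers is triangular on (-1, 1), in LSB units
        dither = (rng.random() - rng.random()) if rng is not None else 0.0
        out.append(math.floor(v / LSB + dither + 0.5) * LSB)
    return out

rng = random.Random(9)
signal = [rng.gauss(0.0, 0.2) for _ in range(50000)]  # stand-in for the white noise

df_plain = df_level(signal, quantize(signal))
df_dither = df_level(signal, quantize(signal, rng))
gap = df_dither - df_plain
print(f"no dither: {df_plain:.2f} dB, TPDF dither: {df_dither:.2f} dB, "
      f"gap: {gap:.2f} dB, model: {10.0 * math.log10(3.0):.2f} dB")
```

So the higher df level with dithering is what the added noise power predicts; what the df level alone cannot show is that the dithered error is less objectionable to the listener.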
Thanks to this discussion I also calibrated the accuracy of my Matlab code (all df levels above were computed with the new version 2.4), and after updating its manual I will put the new version on the SE web site (I will let you know here). The manual will contain the Matlab code that was used to create the above test vectors. The ones that require too much time to compute will be included in the package.