I've created a synthetic "solo piano" recording using a physically modelled process, with Lexicon reverb added (about 50% mix). Only a few notes are played.
https://we.tl/t-bUw4vBzBnE
(Download expires after 1 week.)
The original file is "PIANO.WAV" (32-bit float, 44.1kHz).
The processed files are either dithered, rounded, truncated toward negative infinity, or truncated toward zero.
A MATLAB script to do this, kindly provided by JJ, is included ("howtoscrewupsomethingnice.m").
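For readers without MATLAB, the four kinds of requantization mentioned above can be sketched roughly as follows. This is not JJ's script: the function name `quantize`, the mode names, the `bits` parameter and the choice of TPDF dither are all assumptions made for illustration only.

```python
import math
import random

def quantize(samples, bits, mode, rng=None):
    """Requantize float samples (nominally in [-1, 1)) to a grid of `bits` bits.

    Hypothetical helper: returns floats that lie on the coarse grid.
    No clipping is applied, to keep the sketch short.
    """
    rng = rng or random.Random(0)
    scale = 2 ** (bits - 1)              # quantization steps per unit amplitude
    out = []
    for x in samples:
        v = x * scale
        if mode == "dither":
            # Assumed TPDF dither: difference of two uniforms, spans +/-1 LSB
            v += rng.random() - rng.random()
            q = math.floor(v + 0.5)
        elif mode == "round":            # round to nearest step
            q = math.floor(v + 0.5)
        elif mode == "floor":            # truncate toward negative infinity
            q = math.floor(v)
        elif mode == "trunc_zero":       # truncate toward zero
            q = math.trunc(v)
        else:
            raise ValueError(f"unknown mode: {mode}")
        out.append(q / scale)
    return out

if __name__ == "__main__":
    for mode in ("dither", "round", "floor", "trunc_zero"):
        print(mode, quantize([-0.3, 0.3], 4, mode))
```

The two truncation modes differ only for negative samples: truncating toward negative infinity always moves the value down, while truncating toward zero moves negative values up.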
Here are the results of df-measurements for the piano samples. Besides the piano sounds, the samples contain a substantial amount of digital silence (more than half of the entire sample). The pauses are an integral part of the sample (they should also be listened to and measured), but they can skew the results and make them hard to interpret. So I additionally performed df-measurements for this sample with the pauses removed.
(1) The full piano sample. In this case more than half of the df levels computed for the signal (400ms window) refer to the silent part, so the median is not a good estimator of the overall distortion of the signal. I used the mean value instead (given in brackets at the end of each file name).
dsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-14.1298-0.9066-0.0413(-3.7151)
dsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-25.9008-3.8944-0.0437(-8.5301)
rsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-20.5367-4.7371-0.0029(-6.0758)
rsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-32.5971-8.8625-0.0252(-11.2105)
tdsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-18.5240-8.2642-2.3131(-8.4641)
tdsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-30.2572-8.8987-2.4562(-13.2600)
tzsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-15.9514-1.0166-0.0021(-3.8553)
tzsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-28.9211-5.9214-0.0092(-9.6782)
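Why the median is a poor estimator when more than half of the windows are silence can be seen in a toy example. The per-window levels below are hypothetical numbers chosen only to illustrate the point; they are not df-measurements of these files.

```python
from statistics import mean, median

# Hypothetical per-window levels in dB (NOT measured values):
# 6 of the 10 windows are "silence" windows that share one level.
silence_windows = [-0.05] * 6
music_windows = [-9.0, -12.0, -7.5, -10.5]
levels = silence_windows + music_windows

# With more than half the windows being silence, the median is
# simply the silence level and ignores the music windows entirely.
print("median:", median(levels))

# The mean is influenced by both the silence and the music windows.
print("mean:  ", round(mean(levels), 2))
```

Once the pauses are removed, the silence windows no longer dominate and the median again describes the music part.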
The similarity of their artifact signatures:
The shortest distance (between the art.signatures of tdsig6 and dsig6) is 1.35dB, which means that almost all conclusions about audible closeness to the original will be at the limit of what the df-metric can resolve. Here we have two different groups of similar distortions (distance < 2dB): dsig6/tdsig6/tzsig6 and dsig4/tdsig4/tzsig4. The similarity holds within each group but not between groups. Within each group the samples can be sorted according to their mean df levels:
group1_1_tdsig6 (-13.26dB)
group1_2_tzsig6 (-9.68dB)
group1_3_dsig6 (-8.53dB)
group2_1_tdsig4 (-8.46dB)
group2_2_tzsig4 (-3.86dB)
group2_3_dsig4 (-3.72dB)
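The ordering above can be reproduced with a few lines of Python. This is only a sketch of the sorting step: the mean values are copied from the brackets in the file names and the groups are the ones stated in the post, while the helper name `rank_groups` is made up for illustration. How the groups themselves are derived from art.signature distances is not covered here.

```python
# Mean df levels in dB, copied from the bracketed values in the file names.
mean_df = {
    "dsig6": -8.5301, "tdsig6": -13.2600, "tzsig6": -9.6782,
    "dsig4": -3.7151, "tdsig4": -8.4641, "tzsig4": -3.8553,
}

# Groups of similar art.signatures (distance < 2dB), as stated in the post.
groups = [["dsig6", "tdsig6", "tzsig6"], ["dsig4", "tdsig4", "tzsig4"]]

def rank_groups(groups, mean_df):
    """Sort each group by mean df level, lowest (closest to the original) first."""
    return [sorted(group, key=lambda name: mean_df[name]) for group in groups]

for g_idx, group in enumerate(rank_groups(groups, mean_df), start=1):
    for pos, name in enumerate(group, start=1):
        print(f"group{g_idx}_{pos}_{name} ({mean_df[name]:.2f}dB)")
```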
For convenient listening, the above samples, named according to this order ("1" is closest), can be downloaded here:
https://www.dropbox.com/s/ucgc25gi0shda3k/xr100_8_400.zip?dl=0
It is worth remembering that while listening one should
- assess audible closeness to the original (not pleasantness)
- pay attention to the whole sample, including the pauses.
Taking into account the composite character of the sample and the extreme similarity of the art.signatures, the correlation between predicted and audible closeness is not bad. To my taste, I would only swap the first and the second samples in group1.
(2) The piano sample without pauses. The pauses were removed in the same way in all samples and the df-measurements were performed from scratch. In this case medians are valid estimators, but for methodological consistency I will use means (here they are close to the medians).
dsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-15.2553-6.9809-1.0308(-7.4052)
dsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-27.0974-17.6638-4.7017(-17.1097)
rsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-20.1883-10.7021-0.8304(-10.6211)
rsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-31.8961-22.5251-10.3141(-22.1555)
tdsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-19.8591-11.1020-3.3886(-11.3859)
tdsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-31.7600-22.3796-5.9886(-21.2367)
tzsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-15.9694-4.3747-0.0013(-6.0109)
tzsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-28.4143-17.9826-4.8017(-17.8143)
The similarity of the artifact signatures:
Now we have shorter distances between the art.signatures and the following groups of similar distortions (distance < 2dB):
group1_1_rsig6_nopause (-22.16dB)
group1_2_tdsig6_nopause (-21.24dB)
group1_3_tzsig6_nopause (-17.81dB)
group1_4_dsig6_nopause (-17.11dB)
group1_5_rsig4_nopause (-10.62dB)
group2_1_tdsig4 (-11.39dB)
group2_2_dsig4 (-7.41dB)
Files for listening, named according to this order, are here:
https://www.dropbox.com/s/az5eugggb4kag18/xr100_8_400_nopause.zip?dl=0
To my taste the correlation is better in this case, but still, I would swap “2” and “3” in group1.
To be honest, these examples of distortion are exaggerated and involve dithering, which aims to replace one type of distortion with another that is more pleasant to human hearing (a simple kind of psychoacoustic treatment). The art.signature mechanism, which is responsible for accounting for the psychoacoustic features of distortion in the df-metric, was never intended for such extreme cases, and I'm a bit surprised it is able to produce meaningful results/predictions even here. Does any other objective audio metric exist that is capable of producing better results in this case? If so, can anybody apply it to this case in order to compare the results?