Audibility thresholds of amp and DAC measurements

Serge Smirnoff · Jan 25, 2020

xr100 said:
The processed files are either dithered, rounded, truncated toward negative, or truncated toward zero.

Do you want them to be examined with df-metric?

xr100 · Jan 25, 2020

Serge Smirnoff said:
Yes, from a listener's perspective both files are degraded/distorted versions of the original. One can be preferred to the other depending on personal audio taste.

Have you listened to the "piano" example posted?

Serge Smirnoff said:
And your music sample is exactly from that era ))

Yes, it is.

The equipment that was used in the producer's own recording studios is well documented.

(There is actually a Sound on Sound article about the recording, as a retrospective piece from a few years ago, but it is littered with so many errors that I don't want to "endorse" it with a link.)

These included a Fairlight sampler (the strings on that record, for example), Calrec Soundfield Microphone (not used as a "soundfield" mic, but on all vocals as it was the best microphone they could find,) Lexicon 480L reverb, an SSL 4000E desk; they were early adopters of 24 track digital (Sony DASH) multitrack, and subsequently 48 track digital multitrack. Obviously an "audiophile" (for want of a better word--"no compromise"?) approach would not have been taken but, accepting that stylistically it could be considered "of its era," the combination of (then) high-end equipment and the engineering effort that went into making the record meant that it sounded good then, and it still sounds good today.

I dare say some digital "nasties" are lurking within; for example, the bass instrument on that record was definitely from an FM synth, and even today FM synthesis is a problem because (when modulating one operator with another) of the generation of harmonics/overtones to infinity (aliasing.) Particularly the original Yamaha DX7 (FM synth from 1983), besides all of the DSP compromises needed then and the 12-bit DAC used, also had a very poor analogue section, with a very low S/N ratio (probably something like AM radio!) Fortunately, the worst of these (both digital and analogue) issues would (hopefully) be eliminated through "EQ" (inc. LPF), masking (inc. by other parts), and noise gating.

(The rest is suspect, too, ranging from the old ADC/DACs used to the DSP (presumably recirculating all-pass filter structures etc.) in the Lexicon...)

In any case, it's a pretty clean and bright recording, and it was supposed to be, so the intended aesthetic is emphatically ruined by "bit-crushing"--and the last thing you'd want to do is add more digital "nasties" "cascaded" with those bound to be lurking on the recording.

(I suspect that when anyone complains that this kind of material sounds bad to them, they actually mean it sounds too clean, bright, lush (chorusing/modulation FX), etc. "Grunge" it is not!)

xr100 · Jan 25, 2020

Serge Smirnoff said:
Do you want them to be examined with df-metric?

Yes, please.

Serge Smirnoff · Jan 25, 2020

xr100 said:
BTW, it is not easy to "read" the small monochromatic images that you have been uploading. It might be better to upload larger images?

The diffrograms are the output of Matlab code. They show degradation of a signal with time. One horizontal pixel represents 400ms of a signal and its color corresponds to df level for that 400ms portion of the signal. You can enlarge them in the nearest-neighbor mode to preserve the pixel structure of the image.
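For anyone who wants to enlarge the images programmatically, here is a minimal Python sketch of nearest-neighbour enlargement using NumPy. The function name `enlarge_nearest` and the three example pixel values are made up for illustration; this is not the Matlab code that produces the diffrograms.

```python
import numpy as np

def enlarge_nearest(image, factor):
    """Repeat every pixel `factor` times along rows and columns.

    No interpolation is performed, so each original pixel becomes a
    solid factor-by-factor block and the pixel structure is preserved.
    """
    image = np.asarray(image)
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

# Hypothetical one-row strip: three pixels, i.e. 3 x 400 ms of signal.
row = np.array([[0, 128, 255]], dtype=np.uint8)
big = enlarge_nearest(row, 4)
print(big.shape)
```

The same call works on an (H, W, 3) colour array, because only the first two axes are repeated.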

j_j · Jan 25, 2020

xr100 said:
How is their "similarity" defined/known?

By abandoning least mean squares techniques and using knowledge of perception.

Which is the point I've been making for some time now.

Serge Smirnoff · Jan 25, 2020

j_j said:
By abandoning least mean squares techniques and using knowledge of perception.

I also take perception into account, but in a simpler way, with the help of art.signatures; and for comparing them, the MAE technique (a version of the least mean squares technique) is the best of all that I've tried. Admittedly, this part of df-metric is less developed, but only because I believe that research into the audibility of distortion is already not very pressing today and will be even less so in the future.

j_j · Jan 25, 2020

Serge Smirnoff said:
I also take perception into account, but in a simpler way, with the help of art.signatures; and for comparing them, the MAE technique (a version of the least mean squares technique) is the best of all that I've tried. Admittedly, this part of df-metric is less developed, but only because I believe that research into the audibility of distortion is already not very pressing today and will be even less so in the future.

Well, until you compare original signal to error signal, you're still not going anywhere useful. You should start by looking into masking thresholds, and the meaning of loudness at elevated thresholds.

xr100 · Jan 25, 2020

Some more plots courtesy of "DeltaWave"...

The same source file as previously was used*; in iZotope RX it was attenuated by 1dB and then upsampled 4x (to 176.4kHz).

(EDIT: *Meaning the "Rickroll'd" file, NOT the "Piano" file.)

It was then truncated to 4 bits (using Reaper's "Bit Reduction/Dither with Noise Shaping" plug-in/script), and, separately, iZotope Ozone's Dither process was used (Dither setting: Strong, Noise shaping: Max.) Since iZotope Ozone's Dither process supports only a minimum bit-depth of 8 bits, this setting was used; however, the signal was bit-shifted down by 4 bits before Ozone, and then bit-shifted up by 4 bits following it.
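The bit-shift trick can be checked numerically. Below is a minimal Python sketch, assuming floating-point samples in the range -1 to 1 and using plain rounding as a stand-in for the quantiser (Ozone's actual dither and noise shaping are not modelled). The helper name `round_to_bits` is hypothetical.

```python
import numpy as np

def round_to_bits(x, bits):
    """Round float samples in [-1, 1) to a signed grid of `bits` bits."""
    step = 2.0 ** (1 - bits)          # size of one LSB
    return np.floor(np.asarray(x, dtype=float) / step + 0.5) * step

rng = np.random.default_rng(0)
x = rng.uniform(-0.9, 0.9, 1000)      # arbitrary test samples

# Shift down by 4 bits (divide by 16), quantise to 8 bits, shift back up.
via_shift = round_to_bits(x / 16.0, 8) * 16.0

# Quantise directly to 4 bits.
direct = round_to_bits(x, 4)

print(np.array_equal(via_shift, direct))
```

Because the shifts are exact powers of two, both paths land on the same 4-bit grid; any dither added by the 8-bit stage is scaled up by the same factor of 16, so it ends up sized relative to the 4-bit LSB.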

Spectrum of the upsampled file:

Delta spectrogram for the dithered/noise-shaped file:

For the truncated file:

A marked difference in character between the two "in-band" delta spectra can be seen.

Loading the files into iZotope RX, the difference in character of the two processed signals above ~20kHz (on its spectrogram) is quite obvious, too. (And, ballpark, typically somewhere around 30dB higher in amplitude.)

The "DF Metric" for each as calculated by DeltaWave? -18.7dB for the truncated, -2.8dB for the dithered/noise shaped. Hmm...

pkane · Jan 26, 2020

xr100 said:
Some more plots courtesy of "DeltaWave"...

The same source file as previously was used; in iZotope RX it was attenuated by 1dB and then upsampled 4x (to 176.4kHz).

It was then truncated to 4 bits (using Reaper's "Bit Reduction/Dither with Noise Shaping" plug-in/script), and, separately, iZotope Ozone's Dither process was used (Dither setting: Strong, Noise shaping: Max.) Since iZotope Ozone's Dither process supports only a minimum bit-depth of 8 bits, this setting was used; however, the signal was bit-shifted down by 4 bits before Ozone, and then bit-shifted up by 4 bits following it.

Spectrum of the upsampled file:

View attachment 47409

Delta spectrogram for the dithered/noise-shaped file:

View attachment 47410

For the truncated file:

View attachment 47411

A marked difference in character between the two "in-band" delta spectra can be seen.

Loading the files into iZotope RX, the difference in character of the two processed signals above ~20kHz (on its spectrogram) is quite obvious, too.

The "DF Metric" for each as calculated by DeltaWave? -18.7dB for the truncated, -2.8dB for the dithered/noise shaped. Hmm...

Which of the files is the truncated and which dithered/noise shaped?

Look at the Original or Matched waveform to see what the difference looks like in the time domain. Here's an example (blue is the original PIANO.wav, white is dsig6.wav):

j_j · Jan 26, 2020

pkane said:
Which of the files is the truncated and which dithered/noise shaped?

Look at the Original or Matched waveform to see what the difference looks like in the time domain. Here's an example (blue is the original PIANO.wav, white is dsig6.wav):

View attachment 47428

So you know, the files ending in 6 are 6 bit files.

pkane · Jan 26, 2020

j_j said:
So you know, the files ending in 6 are 6 bit files.

And dsig* is dithered and noise shaped, or just dithered? and rsig* is what? I'm sure I'm just being slow

j_j · Jan 26, 2020

pkane said:
And dsig* is dithered and noise shaped, or just dithered? and rsig* is what? I'm sure I'm just being slow

dsig is TPD, but no shaping.
rsig is rounded, no dither
tz is "truncate to zero"
and td is truncate down (toward negative)
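
For readers without MATLAB, the four modes can be sketched in Python roughly as follows. This is an illustration only, not the script that produced the files: the function name `quantize`, the assumption of float samples in the range -1 to 1, and the absence of any clipping are all choices made for the example. The mode names follow the file prefixes.

```python
import numpy as np

def quantize(x, bits, mode, rng=None):
    """Quantise float samples to `bits` bits and return floats on that grid.

    mode: "dsig" = TPDF dither then round, "rsig" = round to nearest,
          "tz"   = truncate toward zero, "td" = truncate toward negative.
    """
    step = 2.0 ** (1 - bits)                      # one LSB
    scaled = np.asarray(x, dtype=float) / step    # signal in units of LSB
    if mode == "dsig":
        if rng is None:
            rng = np.random.default_rng()
        # Sum of two uniform variables gives a triangular PDF, +/-1 LSB peak.
        dither = (rng.uniform(-0.5, 0.5, scaled.shape)
                  + rng.uniform(-0.5, 0.5, scaled.shape))
        q = np.floor(scaled + dither + 0.5)
    elif mode == "rsig":
        q = np.floor(scaled + 0.5)
    elif mode == "tz":
        q = np.trunc(scaled)
    elif mode == "td":
        q = np.floor(scaled)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return q * step

x = np.array([-0.40, 0.40])
for mode in ("rsig", "tz", "td"):
    print(mode, quantize(x, 3, mode))
```

At 3 bits the step is 0.25, so the three undithered modes already disagree on these two samples; the dithered mode is random per sample but correct on average.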

xr100 · Jan 26, 2020

pkane said:
xr100 said:

Spectrum of the upsampled file:

View attachment 47409

Delta spectrogram for the dithered/noise-shaped file:

View attachment 47410

For the truncated file:

View attachment 47411


Which of the files is the truncated and which dithered/noise shaped?

As "labelled."

I think I've managed to cause some confusion here. To be clear, this was the earlier "Rickroll'd" file, upsampled 4x ("original"), then truncated to 4 bits for one file, and dither/noise shaped to 4 bits for the other.

The upsampled file spectrum shows it's "brickwalled" at ~20kHz, and it appears that all of the added "out of band" (>20kHz) energy has been included in the "DF Metric" calculation.

IOW, more sophisticated perceptual aspects need not be considered before saying "Houston, we have a problem..."

pkane said:
Look at the Original or Matched waveform to see what the difference looks like in the time domain.

Thanks, will try that.

pkane · Jan 26, 2020

xr100 said:
As "labelled."

I think I've managed to cause some confusion here. To be clear, this was the earlier "Rickroll'd" file, upsampled 4x ("original"), then truncated to 4 bits for one file, and dither/noise shaped to 4 bits for the other.

The upsampled file spectrum shows it's "brickwalled" at ~20kHz, and it appears that all of the added "out of band" (>20kHz) energy has been included in the "DF Metric" calculation.

IOW, more sophisticated perceptual aspects need not be considered before saying "Houston, we have a problem..."

Thanks, will try that.

I agree. Insofar as there is significant out-of-band energy included in the signal, the DF calculation may be overestimating the real 'audible' difference. I have a slightly modified version in DeltaWave that applies an audibility curve to the measurement. This has the benefit of cutting out inaudible frequencies while giving more weight to those where the ear is more sensitive. I'm not sure if the result is really better. I'll play with different ways to do this computation (I have some other ideas to try), but for now it doesn't apply to the DF metric, only to the DeltaWave delta metric.
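
To show how much out-of-band energy can move a difference measurement, here is a minimal Python sketch that computes the RMS level of the delta signal with and without a hard 20kHz cut-off. It is neither the DF metric nor DeltaWave's audibility weighting; the function `delta_level_db`, the brick-wall FFT mask, and the synthetic test tones are all assumptions made for the example.

```python
import numpy as np

def delta_level_db(reference, processed, fs, f_max=None):
    """RMS level (dB re 1.0) of processed - reference.

    If f_max is given, spectral bins above f_max Hz are zeroed first,
    so only in-band difference energy is counted.
    """
    delta = np.asarray(processed, dtype=float) - np.asarray(reference, dtype=float)
    spectrum = np.fft.rfft(delta)
    freqs = np.fft.rfftfreq(delta.size, d=1.0 / fs)
    if f_max is not None:
        spectrum = np.where(freqs <= f_max, spectrum, 0.0)
    band = np.fft.irfft(spectrum, n=delta.size)
    rms = np.sqrt(np.mean(band ** 2))
    return 20 * np.log10(max(rms, 1e-12))   # floor avoids log of zero

fs = 176400                                  # 4x 44.1kHz, as in the test above
t = np.arange(fs) / fs                       # one second
reference = 0.5 * np.sin(2 * np.pi * 1000 * t)
# Hypothetical error: a small in-band tone plus a large ultrasonic one.
processed = (reference
             + 0.001 * np.sin(2 * np.pi * 5000 * t)
             + 0.1 * np.sin(2 * np.pi * 30000 * t))

full_band = delta_level_db(reference, processed, fs)
in_band = delta_level_db(reference, processed, fs, f_max=20000)
print(round(full_band, 1), round(in_band, 1))
```

In this made-up case the full-band figure is dominated by the 30kHz component, while the in-band figure reflects only the 5kHz error.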

Serge Smirnoff · Jan 29, 2020

xr100 said:
I've created a synthetic "solo piano" recording using a physically-modelled process, to which Lexicon reverb is added (about 50% mix.) Only a few notes are played.

https://we.tl/t-bUw4vBzBnE

(Download expires after 1 week.)

The original file is "PIANO.WAV" (32-bit float/44.1kHz.)

The processed files are either dithered, rounded, truncated toward negative, or truncated toward zero.

A MATLAB script to do this, kindly provided by JJ, is included ("howtoscrewupsomethingnice.m")

Here are the results of the df-measurements for the piano samples. Besides the piano sounds, the samples contain a substantial amount of digital silence (more than half of the entire sample). The pauses are an integral part of the sample (they should also be listened to and measured), but they can skew the results and make them hard to interpret. So I additionally performed df-measurements on this sample with the pauses removed.

(1) The full piano sample. In this case more than half of the df levels computed for the signal (400ms window) belong to the silent parts, so the median is not a good estimator of the overall distortion of the signal. I therefore used the mean value instead (given in brackets at the end of each file name).

dsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-14.1298-0.9066-0.0413(-3.7151)

dsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-25.9008-3.8944-0.0437(-8.5301)

rsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-20.5367-4.7371-0.0029(-6.0758)

rsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-32.5971-8.8625-0.0252(-11.2105)

tdsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-18.5240-8.2642-2.3131(-8.4641)

tdsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-30.2572-8.8987-2.4562(-13.2600)

tzsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-15.9514-1.0166-0.0021(-3.8553)

tzsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-28.9211-5.9214-0.0092(-9.6782)

The similarity of their artifact signatures:

The shortest distance (between the art.signatures of tdsig6 and dsig6) is 1.35dB, which means that almost all conclusions about audible closeness to the original will be at the limit of what df-metric can resolve. Here we have two different groups of similar distortions (distance < 2dB): dsig6/tdsig6/tzsig6 and dsig4/tdsig4/tzsig4. The similarity holds within each group but not between groups. Within each group the samples can be sorted according to their mean df levels:

group1_1_tdsig6 (-13.26dB)
group1_2_tzsig6 (-9.68dB)
group1_3_dsig6 (-8.53dB)

group2_1_tdsig4 (-8.46dB)
group2_2_tzsig4 (-3.86dB)
group2_3_dsig4 (-3.72dB)

For convenient listening here you can download the above samples named according to this order ("1" is closest) - https://www.dropbox.com/s/ucgc25gi0shda3k/xr100_8_400.zip?dl=0
It is worth remembering that while listening one should
- assess audible closeness to the original (not pleasantness)
- pay attention to the whole sample including pauses.

Taking into account the composite character of the sample and the extreme similarity of the art.signatures, the correlation between predicted and audible closeness is not bad. To my taste, I would only swap the first and the second samples in group1.

(2) The piano sample without pauses. The pauses were removed in the same way from all samples and the df-measurements were performed from scratch. In this case medians are valid estimators, but for methodological consistency I will use means (here they are close to the medians).

dsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-15.2553-6.9809-1.0308(-7.4052)

dsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-27.0974-17.6638-4.7017(-17.1097)

rsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-20.1883-10.7021-0.8304(-10.6211)

rsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-31.8961-22.5251-10.3141(-22.1555)

tdsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-19.8591-11.1020-3.3886(-11.3859)

tdsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-31.7600-22.3796-5.9886(-21.2367)

tzsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-15.9694-4.3747-0.0013(-6.0109)

tzsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-28.4143-17.9826-4.8017(-17.8143)

The similarity of the artifact signatures:

Now we have shorter distances between the art.signatures and the following groups of similar distortions (distance < 2dB):

group1_1_rsig6_nopause (-22.16dB)
group1_2_tdsig6_nopause (-21.24dB)
group1_3_tzsig6_nopause (-17.81dB)
group1_4_dsig6_nopause (-17.11dB)
group1_5_rsig4_nopause (-10.62dB)

group2_1_tdsig4 (-11.39dB)
group2_2_dsig4 (-7.41dB)

Files for listening named according to this order are here - https://www.dropbox.com/s/az5eugggb4kag18/xr100_8_400_nopause.zip?dl=0

To my taste the correlation is better in this case, but still, I would swap “2” and “3” in group1.

To be honest, these examples of distortion are exaggerated and involve dithering which is aimed at replacing one type of distortion by another more pleasant for human hearing (a kind of psychoacoustic treatment, a simple one). The art.signature mechanism, which is responsible for accounting for the psychoacoustic features of distortion in df-metric, was never intended for such extreme cases, and I'm a bit surprised that it is able to produce some meaningful results/predictions even here. Is there any other objective audio metric that can produce better results in this case? If so, can anybody apply it to this case in order to compare the results?

Zek · Jan 29, 2020

Links for samples and pics don't work???

xr100 · Jan 29, 2020

Serge Smirnoff said:
Here are the results of the df-measurements for the piano samples.

Thanks for your reply and posting all those results, etc.

(I am absolutely exhausted from being out all day yesterday and very little sleep subsequently--so this will be brief, and, forgive me if overly crudely expressed...)

BTW, I realise you are using the word "sample" in a different sense, but just to reiterate, and to make sure it's quite clear, as the implications will soon be relevant, the sound is generated entirely synthetically. The piano is physically modelled, and (intentionally on my part) the output of this process is mono.

It is then fed into Lexicon's "Room" plug-in, i.e. an "artificial" digital reverb process--about 50% mix between the "dry" piano input and "wet" reverb output. Note that this is a stereo process, and, its "model" includes early reflections; it can create "convincing" spatial cues. (Just how "convincing," including the limitations of stereo per se, isn't really relevant here.) This is very obvious when listening to the original file.

Serge Smirnoff said:
Besides the piano sounds, the samples contain a substantial amount of digital silence (more than half of the entire sample).

That's not the case for the 32-bit float original.

"Between notes," i.e. after the note is "released," the signal decays but is typically still at around -90dB when the next note is triggered.

Serge Smirnoff said:
involve dithering which is aimed at replacing one type of distortion by another more pleasant for human hearing (a kind of psychoacoustic treatment, a simple one).

Dither (as in "vanilla" TPDF) is NOT a "psychoacoustic treatment," it (basically) "linearises" the system. In this case, the distortion in the rounded/truncated files is perceptually "nasty," at worst almost turning single piano notes into sounding like "power chords" played on electric guitar through "overdrive" distortion/FX.

Moreover, though, in the rounded/truncated files, the stereo image is somewhere between highly compromised and annihilated, very clearly wandering all over the place. The release stages of the notes are "cut off."

By comparison, even the 4-bit dithered version retains the spatial cues such that there is a sense of a piano that is centrally "panned," clearly placed and surrounded by a "room." "Tails" may well be "lost" under the noise but the release stage of the notes is kept, too, rather than completely modifying the release envelope.

IOW, the dither is not "masking" anything in order for these aspects of the signal to actually be encoded rather than "lost!"

scott wurcer · Jan 29, 2020

xr100 said:
Moreover, though, in the rounded/truncated files e.g. the reverb tail is annihilated... i.e. The dither is not "masking" anything in order for the signal to actually be encoded rather than lost!

There was an AES presentation years ago on the problems with un-dithered commercial CD's. It's easy to show that the correct amount of TPDF dither acts as additive noise and there is no "averaging" out or masking. There are some esoteric issues around moments of random distributions but that is out of my experience.

xr100 · Jan 29, 2020

scott wurcer said:
There was an AES presentation years ago on the problems with un-dithered commercial CD's. It's easy to show that the correct amount of TPDF dither acts as additive noise and there is no "averaging" out or masking. There are some esoteric issues around moments of random distributions but that is out of my experience.

We get quickly into the question of "self-dither" (by the inherent "noise floor" on a recording) and what "nonsubtractive" dither "does" and "does not," and into the territory of "Lipshitz et al." An "interesting" area to explore further, as you suggest.

Stage one, though, is acceptance, i.e. accepting that it, essentially, does work, even if "how" isn't exactly intuitive. The easiest way to "accept" that, yes, it really works is just to do some tests that demonstrate it in action. Examples are in this thread.

16-bit dithered can carry e.g. a 1kHz sine wave at far, far below -100dB where truncated would yield absolute silence.
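
One such test, as a minimal Python sketch: a 1kHz sine at -110dBFS (about 0.1 LSB at 16 bits) is quantised with plain rounding and with TPDF dither, then the 1kHz component is recovered by correlating against a unit sine. The 10 second duration, the fixed random seed, and the variable names are arbitrary choices for the example; rounding is used for the undithered case because truncating toward negative would toggle the lowest bit instead of giving pure silence.

```python
import numpy as np

fs = 44100
n = fs * 10                                   # 10 seconds
t = np.arange(n) / fs
lsb = 1.0 / 32768                             # 16-bit step for a +/-1.0 range
amp = 10 ** (-110 / 20)                       # -110 dBFS, roughly 0.1 LSB
probe = np.sin(2 * np.pi * 1000 * t)          # unit-amplitude 1kHz sine
tone = amp * probe

# Undithered rounding: every sample is below half an LSB, so all become 0.
rounded = np.floor(tone / lsb + 0.5) * lsb

# TPDF dither (sum of two uniform variables, +/-1 LSB peak), then round.
rng = np.random.default_rng(0)
tpdf = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)
dithered = np.floor(tone / lsb + tpdf + 0.5) * lsb

# Amplitude of the 1kHz component in the dithered file, via correlation.
recovered_amp = 2 * np.mean(dithered * probe)
recovered_db = 20 * np.log10(abs(recovered_amp))

print("rounded file is silent:", bool(np.all(rounded == 0)))
print("1kHz level in dithered file (dBFS):", round(recovered_db, 1))
```

The tone is still in the dithered file at its original level, buried in a noise floor that is constant and uncorrelated with it.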

(And for most practical purposes, if in doubt, use TPDF dither on conversion to fixed-point, whilst allowing some margin to avoid sample-value clipping, if not intersample clipping; i.e. don't add it to a "0dB normalised" "signal.")

j_j · Jan 29, 2020

Yeah, I was going to object to the portrayal of dither as "psychoacoustic treatment" because it is absolutely nothing of the sort, it is linearization of the quantizer, no more, no less. It replaces signal-correlated distortions with a constant noise floor. Again, there is no masking, and the resulting quantizations create no spectral lines with the TPDF. Were we to use uniform dither, the second order error (power) would not be linearized, you would hear noise come and go with signal energy, which is why we use TPDF. It means that both the signal is linearized, AND that the added noise level is exactly constant.
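
The difference between uniform and triangular dither can be seen in a small simulation. The Python sketch below feeds a constant input (in units of one LSB) through a rounding quantiser and measures the total error power for each dither type. The function name `error_power`, the sample count, and the chosen DC levels are assumptions made for the example.

```python
import numpy as np

def error_power(dc_levels, dither, n=200_000, seed=0):
    """Mean squared error (in LSB^2) of a dithered rounding quantiser
    for each constant input level, given in units of one LSB."""
    rng = np.random.default_rng(seed)
    powers = []
    for x in dc_levels:
        if dither == "rpdf":      # uniform dither, 1 LSB wide
            d = rng.uniform(-0.5, 0.5, n)
        elif dither == "tpdf":    # triangular dither, 2 LSB wide
            d = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)
        else:
            raise ValueError(f"unknown dither: {dither}")
        q = np.floor(x + d + 0.5)             # round to nearest integer LSB
        powers.append(np.mean((q - x) ** 2))
    return np.array(powers)

levels = np.linspace(0.0, 0.5, 6)             # fractions of one LSB
rpdf_power = error_power(levels, "rpdf")
tpdf_power = error_power(levels, "tpdf")
print("uniform:   ", np.round(rpdf_power, 3))
print("triangular:", np.round(tpdf_power, 3))
```

With uniform dither the error power swings between 0 and 0.25 LSB^2 depending on where the input sits between steps, which is the noise that comes and goes with the signal; with TPDF it stays at about 0.25 LSB^2 everywhere.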

The fact you need to edit the files says to me that you aren't looking at the entire problem, and again, it shows clearly that you MUST consider perception.

It is easy to adjust the matlab to other levels of quantization, you should give it a try.
