
Audibility thresholds of amp and DAC measurements

xr100

Addicted to Fun and Learning
Joined
Jan 6, 2020
Messages
517
Likes
214
Location
London, UK
Yes, from a listener's perspective both files are degraded/distorted versions of the original. One can be preferred to another depending on personal audio taste.
Have you listened to the "piano" example posted?

And your music sample is exactly from that era ))
Yes, it is.

The equipment that was used in the producer's own recording studios is well documented.

(There is actually a Sound on Sound article about the recording, as a retrospective piece from a few years ago, but it is littered with so many errors that I don't want to "endorse" it with a link.)

These included a Fairlight sampler (the strings on that record, for example), a Calrec Soundfield microphone (not used as a "soundfield" mic, but on all vocals, as it was the best microphone they could find), a Lexicon 480L reverb, and an SSL 4000E desk; they were early adopters of 24-track digital (Sony DASH) multitrack, and subsequently 48-track digital multitrack. Obviously an "audiophile" (for want of a better word--"no compromise"?) approach would not have been taken but, accepting that stylistically it could be considered "of its era," the combination of (then) high-end equipment and the engineering effort that went into making the record meant that it sounded good then, and it still sounds good today.

I dare say some digital "nasties" are lurking within; for example, the bass instrument on that record was definitely from an FM synth, and even today FM synthesis is a problem because, when modulating one operator with another, harmonics/overtones are generated to infinity (aliasing). The original Yamaha DX7 in particular (an FM synth from 1983), besides all of the DSP compromises needed then and the 12-bit DAC used, also had a very poor analogue section, with a very low S/N ratio (probably something like AM radio!) Fortunately, the worst of these (both digital and analogue) issues would (hopefully) be eliminated through "EQ" (inc. LPF), masking (inc. by other parts), and noise gating.

(The rest is suspect, too, ranging from the old ADC/DACs used to the DSP (presumably recirculating all-pass filter structures etc.) in the Lexicon...)

In any case, it's a pretty clean and bright recording, and it was supposed to be, so the intended aesthetic is emphatically ruined by "bit-crushing"--and the last thing you'd want to do is add more digital "nasties" "cascaded" with those bound to be lurking on the recording.

(I suspect that when anyone complains that this kind of material sounds bad to them, they actually mean it sounds too clean, bright, lush (chorusing/modulation FX), etc. "Grunge" it is not!)
 
Joined
Dec 7, 2019
Messages
207
Likes
76
BTW, it is not easy to "read" the small monochromatic images that you have been uploading. It might be better to upload larger images?
The diffrograms are the output of Matlab code. They show the degradation of a signal over time. One horizontal pixel represents 400ms of the signal, and its colour corresponds to the df level for that 400ms portion. You can enlarge them in nearest-neighbour mode to preserve the pixel structure of the image.
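For reference, the windowing scheme can be sketched like this (a Python/NumPy sketch of the general idea only; the actual df computation in the Matlab code involves more than this simple RMS ratio):

```python
import numpy as np

def df_levels(ref, test, fs=44100, win_ms=400):
    """Per-window difference ('df') levels in dB -- one value per 400ms
    window, i.e. one horizontal pixel of a diffrogram. A sketch of the
    general idea, not the exact Matlab metric."""
    n = int(fs * win_ms / 1000)
    levels = []
    for i in range(0, min(len(ref), len(test)) - n + 1, n):
        r = ref[i:i + n]
        d = test[i:i + n] - r
        ref_rms = np.sqrt(np.mean(r ** 2))
        diff_rms = np.sqrt(np.mean(d ** 2))
        if ref_rms == 0:
            levels.append(np.nan)  # undefined over digital silence
        else:                      # floor avoids log10(0)
            levels.append(20 * np.log10(max(diff_rms, 1e-12 * ref_rms) / ref_rms))
    return np.array(levels)

fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
print(df_levels(x, x, fs))  # identical signals: every window at the numerical floor
print(df_levels(x, x + 1e-3 * np.sin(2 * np.pi * 5000 * t), fs))  # about -60 dB per window
```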
 
By abandoning least mean squares techniques and using knowledge of perception.
I also account for perception, but I do it in a simpler way, with the help of art.signatures; and the MAE technique (a variant of the least mean squares technique) is the best I've tried for comparing them. Indeed, this part of the df-metric is less elaborated, but only because I believe that research into the audibility of distortion is already not very relevant today and will be even less relevant in the future.
 

j_j

Addicted to Fun and Learning
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
606
Likes
860
Location
My dining room.
I also account perception but I do it in a more simple way with the help of art.signatures; [...]
Well, until you compare original signal to error signal, you're still not going anywhere useful. You should start by looking into masking thresholds, and the meaning of loudness at elevated thresholds.
 

xr100

Some more plots courtesy of "DeltaWave"...

The same source file as previous was used*, and in iZotope RX, was scaled down by -1dB, and then upsampled 4x. (=176.4kHz.)

(EDIT: *Meaning the "Rickroll'd" file, NOT the "Piano" file.)

It was then truncated to 4 bits (using Reaper's "Bit Reduction/Dither with Noise Shaping" plug-in/script), and, separately, iZotope Ozone's Dither process was used (Dither setting: Strong, Noise shaping: Max.) Since iZotope Ozone's Dither process supports only a minimum bit-depth of 8 bits, this setting was used; however, the signal was bit-shifted down by 4 bits before Ozone, and then bit-shifted up by 4 bits following it.
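The bit-shift workaround relies on the fact that an 8-bit quantizer applied to a signal scaled down by 4 bits, then scaled back up, lands on exactly the 4-bit grid. A quick sketch (Python; plain TPDF dither as a stand-in, since Ozone's "Strong"/"Max" dither and noise shaping are proprietary and not modelled here):

```python
import numpy as np

def tpdf_quantize(x, bits, rng):
    """Quantize to `bits` (signed, full scale +/-1) with +/-1 LSB TPDF
    dither. A plain-dither stand-in, not Ozone's actual process."""
    lsb = 2.0 ** (1 - bits)
    dither = (rng.random(x.shape) - rng.random(x.shape)) * lsb
    return np.round((x + dither) / lsb) * lsb

x = 0.5 * np.sin(2 * np.pi * np.arange(4096) / 64)

# Direct 4-bit dither vs. the workaround: shift down 4 bits, dither at
# 8 bits, shift back up. Same RNG seed so the dither draws match.
direct = tpdf_quantize(x, 4, np.random.default_rng(2))
shifted = tpdf_quantize(x / 16.0, 8, np.random.default_rng(2)) * 16.0

print(np.allclose(direct, shifted))                     # True: identical result
print(np.allclose(shifted, np.round(shifted * 8) / 8))  # True: on the 4-bit grid
```

Because scaling by a power of two is exact in floating point, the two paths produce the same samples, which is why the trick works.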

Spectrum of the upsampled file:

ASR46.png


Delta spectrogram for the dithered/noise-shaped file:

ASR47.png


For the truncated file:

ASR48.png


A marked difference in character between the two "in-band" delta spectra can be seen.

Loading the files into iZotope RX, the difference in character of the two processed signals above >~20kHz (on its spectrogram) is quite obvious, too. (And, ballpark, typically somewhere around 30dB higher in amplitude.)

The "DF Metric" for each as calculated by DeltaWave? -18.7dB for the truncated, -2.8dB for the dithered/noise shaped. Hmm...
 

pkane

Major Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
1,235
Likes
1,713
Location
North-East
Some more plots courtesy of "DeltaWave"... [...]
Which of the files is the truncated and which dithered/noise shaped?

Look at the Original or Matched waveform to see what the difference looks like in the time domain. Here's an example (blue is the original PIANO.wav, white is dsig6.wav):

1580011758212.png
 

j_j

Which of the files is the truncated and which dithered/noise shaped? [...] Here's an example (blue is the original PIANO.wav, white is dsig6.wav):
So you know, the files ending in 6 are 6-bit files.
 

j_j

And dsig* is dithered and noise shaped, or just dithered? and rsig* is what? I'm sure I'm just being slow :)
dsig is TPDF dithered, but no shaping.
rsig is rounded, no dither
tz is "truncate to zero"
and td is truncate down (toward negative)
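For anyone wanting to reproduce these four treatments without the Matlab script, they can be sketched as follows (Python; the signed full-scale +/-1 and LSB scaling are my assumptions, not necessarily what the script does):

```python
import numpy as np

def quantize_four_ways(x, bits, rng):
    """The four treatments listed above, for signed full-scale +/-1
    audio. (Assumed scaling -- the Matlab script isn't reproduced.)"""
    q = 2.0 ** (1 - bits)                  # one LSB
    s = x / q                              # signal in LSB units
    tpdf = rng.random(x.shape) - rng.random(x.shape)  # +/-1 LSB triangular
    return {
        "dsig": np.round(s + tpdf) * q,    # TPDF dithered, no noise shaping
        "rsig": np.round(s) * q,           # rounded, no dither
        "tzsig": np.trunc(s) * q,          # truncated toward zero
        "tdsig": np.floor(s) * q,          # truncated down (toward negative)
    }

out = quantize_four_ways(np.array([0.30, -0.30]), 4, np.random.default_rng(0))
# 4-bit LSB is 0.125, so +/-0.30 is +/-2.4 LSB:
print(out["rsig"])   # [ 0.25  -0.25 ]
print(out["tzsig"])  # [ 0.25  -0.25 ]
print(out["tdsig"])  # [ 0.25  -0.375] -- floor differs on negative values
```

Note how "truncate down" lands one step lower than "truncate to zero" for negative samples, which is why the two produce different error signatures.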
 

xr100

Which of the files is the truncated and which dithered/noise shaped?
As "labelled."

I think I've managed to cause some confusion here. To be clear, this was the earlier "Rickroll'd" file, upsampled 4x ("original"), then truncated to 4 bits for one file, and dither/noise shaped to 4 bits for the other.

The upsampled file spectrum shows it's "brickwalled" at ~20kHz, and it appears that all of the added "out of band" (>20kHz) energy has been included in the "DF Metric" calculation.

IOW, more sophisticated perceptual aspects need not be considered before saying "Houston, we have a problem..."
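The effect of excluding the out-of-band energy can be illustrated with a crude sketch (this is just the idea of band-limiting the delta before the level calculation, not DeltaWave's actual df computation):

```python
import numpy as np

def df_level(ref, test, fs, cutoff=None):
    """Overall difference level in dB. If `cutoff` is given, the delta is
    brickwall low-passed first, so energy above the original's ~20kHz
    band limit is excluded. (A sketch of the idea only -- not DeltaWave's
    actual df computation.)"""
    delta = test - ref
    if cutoff is not None:
        spec = np.fft.rfft(delta)
        spec[np.fft.rfftfreq(len(delta), 1 / fs) > cutoff] = 0
        delta = np.fft.irfft(spec, len(delta))
    rms = lambda v: np.sqrt(np.mean(v ** 2))
    return 20 * np.log10(max(rms(delta), 1e-15) / rms(ref))

fs = 176400
t = np.arange(fs) / fs
ref = np.sin(2 * np.pi * 1000 * t)                 # content below 20kHz only
test = ref + 0.1 * np.sin(2 * np.pi * 60000 * t)   # error entirely ultrasonic

print(df_level(ref, test, fs))                # full band: about -20dB
print(df_level(ref, test, fs, cutoff=20000))  # in band: the error vanishes
```

Here a purely ultrasonic error dominates the full-band figure yet contributes essentially nothing in-band, which is the complaint being made above.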

Look at the Original or Matched waveform to see what the difference looks like in the time domain.
Thanks, will try that. :)
 

pkane

I think I've managed to cause some confusion here. [...] The upsampled file spectrum shows it's "brickwalled" at ~20kHz, and it appears that all of the added "out of band" (>20kHz) energy has been included in the "DF Metric" calculation. [...]
I agree. Insofar as there is significant out-of-band energy included in the signal, the DF calculation may be overestimating the real 'audible' difference. I have a slightly modified version in DeltaWave that applies an audibility curve to the measurement. This has the benefit of cutting out inaudible frequencies while giving more weight to those where the ear is more sensitive. I'm not sure if the result is really better. I'll play with different ways to do this computation (I have some other ideas to try), but for now it doesn't apply to the DF metric, only to the DeltaWave delta metric.
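As an illustration of the idea (not pkane's implementation), a standard A-weighting curve can stand in for the audibility curve; the same error energy then counts for less when it sits where the ear is less sensitive:

```python
import numpy as np

def a_weight_db(f):
    """IEC 61672 A-weighting in dB, used here as a stand-in audibility
    curve (the actual curve in DeltaWave may well differ)."""
    f2 = np.asarray(f, dtype=float) ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2))
    return 20 * np.log10(np.maximum(ra, 1e-30)) + 2.00

def weighted_delta_rms(ref, test, fs):
    """RMS of (test - ref) after weighting its spectrum for audibility."""
    delta = test - ref
    spec = np.fft.rfft(delta)
    f = np.fft.rfftfreq(len(delta), 1 / fs)
    spec *= 10 ** (a_weight_db(f) / 20)    # de-emphasize insensitive bands
    return np.sqrt(np.mean(np.fft.irfft(spec, len(delta)) ** 2))

# Equal error energy at 2kHz (near peak sensitivity) vs. at 16kHz:
fs = 44100
t = np.arange(fs) / fs
ref = np.zeros(fs)
err_2k = 0.01 * np.sin(2 * np.pi * 2000 * t)
err_16k = 0.01 * np.sin(2 * np.pi * 16000 * t)
print(weighted_delta_rms(ref, err_2k, fs) >
      weighted_delta_rms(ref, err_16k, fs))   # True: 16kHz counts for less
```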
 
I've created a synthetic "solo piano" recording using a physically-modelled process, to which Lexicon reverb is added (about 50% mix.) Only a few notes are played.

https://we.tl/t-bUw4vBzBnE

(Download expires after 1 week.)

The original file is "PIANO.WAV" (32-bit float/44.1kHz.)

The processed files are either dithered, rounded, truncated toward negative, or truncated toward zero.

A MATLAB script to do this, kindly provided by JJ, is included ("howtoscrewupsomethingnice.m")
Here are the results of df-measurements for the piano samples. The samples contain, besides the piano sounds, a substantial amount of digital silence (more than half of each sample). As the pauses are an integral part of the sample (they should also be listened to and measured), they can skew the results, and the latter might be hard to interpret. So, I additionally performed df-measurements for this sample with the pauses removed.

(1) The full piano sample. In this case more than half of the df levels computed for the signal (400ms window) refer to the silent part, so the median is not a good estimator of the overall distortion of the signal. So, I used the mean value instead (at the end of the file names, in brackets).

dsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-14.1298-0.9066-0.0413(-3.7151).png

dsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-25.9008-3.8944-0.0437(-8.5301).png

rsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-20.5367-4.7371-0.0029(-6.0758).png

rsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-32.5971-8.8625-0.0252(-11.2105).png

tdsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-18.5240-8.2642-2.3131(-8.4641).png

tdsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-30.2572-8.8987-2.4562(-13.2600).png

tzsig4.wav_cut.wav(44)__PIANO.wav(44)__mono_400-15.9514-1.0166-0.0021(-3.8553).png

tzsig6.wav_cut.wav(44)__PIANO.wav(44)__mono_400-28.9211-5.9214-0.0092(-9.6782).png

The similarity of their artifact signatures:
dendro_xr100_8_400.png

The shortest distance (between the art.signatures of tdsig6 and dsig6) is 1.35dB, which means that almost all conclusions about audible closeness to the original will be at the edge of the df-metric's capabilities. Here we have two different groups of similar distortions (distance < 2dB): dsig6/tdsig6/tzsig6 and dsig4/tdsig4/tzsig4. The similarity holds within each group and not between groups. Within each group the samples can be sorted according to their mean df levels:

group1_1_tdsig6 (-13.26dB)
group1_2_tzsig6 (-9.68dB)
group1_3_dsig6 (-8.53dB)

group2_1_tdsig4 (-8.46dB)
group2_2_tzsig4 (-3.86dB)
group2_3_dsig4 (-3.72dB)

For convenient listening, you can download the above samples named according to this order ("1" is closest) here - https://www.dropbox.com/s/ucgc25gi0shda3k/xr100_8_400.zip?dl=0
It is worth reminding that while listening one should
- assess audible closeness to the original (not pleasantness)
- pay attention to the whole sample, including pauses.

Taking into account the composite character of the sample and the extreme similarity of the art.signatures, the correlation of the predicted closeness with the audible one is not bad. To my taste, I would only swap the first and the second samples in group1.
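For what it's worth, the "distance < 2dB" grouping can be reproduced as single-linkage clustering over a pairwise distance matrix; the numbers below are purely illustrative, NOT the measured art.signature distances behind the dendrogram:

```python
import numpy as np

# Hypothetical pairwise art.signature distances in dB -- illustrative
# numbers only, chosen to mimic the two-group structure described above.
names = ["dsig6", "tdsig6", "tzsig6", "dsig4", "tdsig4", "tzsig4"]
dist = np.array([
    [0.00, 1.35, 1.60, 5.00, 5.20, 5.10],
    [1.35, 0.00, 1.50, 5.10, 5.00, 5.20],
    [1.60, 1.50, 0.00, 5.20, 5.10, 5.00],
    [5.00, 5.10, 5.20, 0.00, 1.40, 1.70],
    [5.20, 5.00, 5.10, 1.40, 0.00, 1.60],
    [5.10, 5.20, 5.00, 1.70, 1.60, 0.00],
])

def group_by_distance(dist, names, threshold=2.0):
    """Single-linkage grouping: samples whose signature distance is below
    the threshold end up in the same group (connected components)."""
    label = list(range(len(names)))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i, j] < threshold:
                old, new = label[j], label[i]
                label = [new if l == old else l for l in label]
    groups = {}
    for name, l in zip(names, label):
        groups.setdefault(l, []).append(name)
    return list(groups.values())

print(group_by_distance(dist, names))
# [['dsig6', 'tdsig6', 'tzsig6'], ['dsig4', 'tdsig4', 'tzsig4']]
```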


(2) The piano sample without pauses. The pauses were removed identically in all samples and the df-measurements were performed from scratch. In this case the medians are valid estimators, but for methodological consistency I will use the means (here they are close to the medians).

dsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-15.2553-6.9809-1.0308(-7.4052).png

dsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-27.0974-17.6638-4.7017(-17.1097).png

rsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-20.1883-10.7021-0.8304(-10.6211).png

rsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-31.8961-22.5251-10.3141(-22.1555).png

tdsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-19.8591-11.1020-3.3886(-11.3859).png

tdsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-31.7600-22.3796-5.9886(-21.2367).png

tzsig4_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-15.9694-4.3747-0.0013(-6.0109).png

tzsig6_nopause.wav_cut.wav(44)__piano_nopause.wav(44)__mono_400-28.4143-17.9826-4.8017(-17.8143).png

The similarity of the artifact signatures:
dendro_xr100_nopause.png

Now we have shorter distances between the art.signatures, and the following groups of similar distortions (distance < 2dB):

group1_1_rsig6_nopause (-22.16dB)
group1_2_tdsig6_nopause (-21.24dB)
group1_3_tzsig6_nopause (-17.81dB)
group1_4_dsig6_nopause (-17.11dB)
group1_5_rsig4_nopause (-10.62dB)

group2_1_tdsig4 (-11.39dB)
group2_2_dsig4 (-7.41dB)

Files for listening named according to this order are here - https://www.dropbox.com/s/az5eugggb4kag18/xr100_8_400_nopause.zip?dl=0

To my taste the correlation is better in this case, but still, I would swap “2” and “3” in group1.

To be honest, these examples of distortion are exaggerated, and they involve dithering, which aims to replace one type of distortion with another that is more pleasant to human hearing (a kind of psychoacoustic treatment, a simple one). The art.signature mechanism, which is responsible for accounting for the psychoacoustic features of distortion in the df-metric, was never intended for such extreme cases, and I'm a bit surprised it is able to produce some meaningful results/predictions even here. Does any other objective audio metric exist that is capable of producing better results in this case? If yes, can anybody apply it to this case so the results can be compared?
 

xr100

Here is the results of df-measurements for piano samples.
Thanks for your reply and posting all those results, etc. :)

(I am absolutely exhausted from being out all day yesterday and very little sleep subsequently--so this will be brief, and, forgive me if overly crudely expressed...)

BTW, I realise you are using the word "sample" in a different sense, but just to reiterate, and to make sure it's quite clear, as the implications will soon be relevant, the sound is generated entirely synthetically. The piano is physically modelled, and (intentionally on my part) the output of this process is mono.

It is then fed into Lexicon's "Room" plug-in, i.e. an "artificial" digital reverb process--about 50% mix between the "dry" piano input and "wet" reverb output. Note that this is a stereo process, and, its "model" includes early reflections; it can create "convincing" spatial cues. (Just how "convincing," including the limitations of stereo per se, isn't really relevant here.) This is very obvious when listening to the original file.

The samples contain, besides the piano sounds, substantial amount of digital silence (more than half of the entire sample).
That's not the case for the 32-bit float original.

"Between notes," i.e. after a note is "released," the signal decays but is typically still around -90dB when the next note is triggered.

involve dithering which is aimed at replacing one type of distortion by another more pleasant for human hearing (a kind of psychoacoustic treatment, a simple one).
Dither (as in "vanilla" TPDF) is NOT a "psychoacoustic treatment"; it (basically) linearises the system. In this case, the distortion in the rounded/truncated files is perceptually "nasty," at worst almost turning single piano notes into something sounding like "power chords" played on an electric guitar through "overdrive" distortion/FX.

Moreover, though, in the rounded/truncated files, the stereo image is somewhere between highly compromised and annihilated, very clearly wandering all over the place. The release stages of the notes are "cut off."

By comparison, even the 4-bit dithered version retains the spatial cues such that there is a sense of a piano that is centrally "panned," clearly placed and surrounded by a "room." "Tails" may well be "lost" under the noise but the release stage of the notes is kept, too, rather than completely modifying the release envelope.

IOW, the dither is not "masking" anything; rather, it allows these aspects of the signal to actually be encoded rather than "lost!"
 

scott wurcer

Addicted to Fun and Learning
Audio Luminary
Technical Expert
Joined
Apr 24, 2019
Messages
732
Likes
1,173
Moreover, though, in the rounded/truncated files e.g. the reverb tail is annihilated... i.e. The dither is not "masking" anything in order for the signal to actually be encoded rather than lost!
There was an AES presentation years ago on the problems with un-dithered commercial CDs. It's easy to show that the correct amount of TPDF dither acts as additive noise and there is no "averaging out" or masking. There are some esoteric issues around moments of random distributions, but that is outside my experience.
 

xr100

There was an AES presentation years ago on the problems with un-dithered commercial CD's. [...] There are some esoteric issues around moments of random distributions but that is out of my experience.
We get quickly into the question of "self-dither" (by the inherent "noise floor" on a recording) and what "nonsubtractive" dither "does" and "does not," and into the territory of "Lipshitz et al." An "interesting" area to explore further, as you suggest.

Stage one, though, is getting to acceptance, i.e. that it essentially does work, even if "how" isn't exactly intuitive. The easiest way to "accept" that, yes, it really works is just to run some tests that demonstrate it in action. Examples are in this thread.

16-bit dithered audio can carry e.g. a 1kHz sine wave at far, far below -100dB, where truncation would yield absolute silence.
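This is easy to demonstrate (a Python sketch: a 1kHz sine at -110dBFS, i.e. well below the 16-bit LSB):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 44100
t = np.arange(fs) / fs
lsb = 2.0 ** -15                                      # 16-bit step, full scale +/-1
x = 10 ** (-110 / 20) * np.sin(2 * np.pi * 1000 * t)  # 1kHz at -110dBFS

truncated = np.trunc(x / lsb) * lsb               # 16-bit truncation
tpdf = (rng.random(fs) - rng.random(fs)) * lsb    # +/-1 LSB TPDF dither
dithered = np.round((x + tpdf) / lsb) * lsb       # 16-bit dithered rounding

print(np.all(truncated == 0))                     # True: the tone is simply gone

# The dithered version still carries the tone, buried in the dither noise;
# correlating against the sine recovers its amplitude (a ratio close to 1):
tone = np.sin(2 * np.pi * 1000 * t)
print(np.dot(dithered, tone) / np.dot(x, tone))
```

Averaging (here, the correlation) pulls the tone back out of the dithered stream, exactly because the quantizer has been linearised; the truncated stream has nothing left to recover.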

(And for most practical purposes, if in doubt, use TPDF dither on conversion to fixed-point, whilst allowing some margin to avoid base-rate sample-value clipping, i.e. don't add it to a "0dB normalised" signal, if not intersample clipping.)
 

j_j

Yeah, I was going to object to the portrayal of dither as "psychoacoustic treatment" because it is absolutely nothing of the sort, it is linearization of the quantizer, no more, no less. It replaces signal-correlated distortions with a constant noise floor. Again, there is no masking, and the resulting quantizations create no spectral lines with the TPDF. Were we to use uniform dither, the second order error (power) would not be linearized, you would hear noise come and go with signal energy, which is why we use TPDF. It means that both the signal is linearized, AND that the added noise level is exactly constant.
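The "noise comes and goes with signal energy" point is easy to verify numerically: with uniform (RPDF) dither the quantizer's error power depends on where the signal sits within the quantization step, while with TPDF it is constant (a sketch, working in units of one LSB):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200000                                    # samples per test level

def error_power(x, dither):
    """Mean squared error of a nonsubtractive dithered quantizer,
    in units of one LSB."""
    e = np.round(x + dither) - x
    return np.mean(e ** 2)

levels = np.linspace(0.0, 1.0, 9)             # DC inputs spanning one LSB
rpdf_p, tpdf_p = [], []
for v in levels:
    rpdf_p.append(error_power(v, rng.random(n) - 0.5))            # +/-0.5 LSB uniform
    tpdf_p.append(error_power(v, rng.random(n) - rng.random(n)))  # +/-1 LSB TPDF

# RPDF: error power swings between ~0 and ~0.25 LSB^2 as the signal moves
# through the quantization step (audible noise modulation); TPDF: pinned
# near 0.25 LSB^2 at every level, i.e. a constant noise floor.
print(min(rpdf_p), max(rpdf_p))
print(min(tpdf_p), max(tpdf_p))
```

This is exactly the second-moment point: uniform dither linearises the mean error but not the error power, whereas TPDF pins both.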

The fact you need to edit the files says to me that you aren't looking at the entire problem, and again, it shows clearly that you MUST consider perception.

It is easy to adjust the matlab to other levels of quantization, you should give it a try.
 