
Let's develop an ASR inter-sample test procedure for DACs!

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,486
Likes
4,113
Location
Pacific Northwest
By way of contrast, I'll mention that this problem isn't nearly as bad in video or film. Most of the shows I watch, whether on streaming or disc, have good dynamic range and low overall levels. This applies across a broad spectrum of studios / productions. It appears that in video & film production, the audio is engineered to consistent standards having reasonable dynamic range. I wish the music industry would adopt similar standards.
 
OP
splitting_ears
Joined
Aug 15, 2023
Messages
88
Likes
200
Location
Saint-Étienne, France
Again, show me a music track (not a test signal) with peak levels -3 dB or lower having ISOs. I've never seen one, what I know of reconstruction theory tells me they would be extremely rare and I wonder if any real-world examples exist at all.
With this much headroom, I don't think there are any. The problem is that you won't easily find 3 dB of sample headroom in music released after the 90s. Even a bluegrass album from that era is likely to have less than 1 dB of it.

Which is another way of saying they are causing the problem by recording too loud. Put differently, there are 2 ways ISOs happen:
  1. The recording or mastering engineer makes the recording too loud.
  2. DA reconstruction raises a waveform more than 3 dB above a sampled amplitude.
(2) has only been demonstrated with carefully constructed test signals, and while theoretically possible with music, is so rare we don't have any actual examples. Even if they did exist, they would be so rare as to be a non-issue.
About (2): in post #4, I shared 'real world examples' with reconstructed peaks almost 5 dB above the sampled peaks.
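For anyone who wants to check numbers like these, here is a minimal Python sketch of how a reconstructed (true) peak can be estimated. It is not the method used for the examples in post #4: it assumes a Hann-windowed sinc interpolator at 8x oversampling, and the names `true_peak_db`, `oversample` and `taps` are made up for illustration. The input is the worst-case Fs/4 sine with a 45° phase offset, scaled so that its samples sit exactly at full scale.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def true_peak_db(samples, oversample=8, taps=32):
    """Estimate the reconstructed peak in dB re full scale (hypothetical helper)."""
    peak = 0.0
    # Skip the first and last `taps` samples, where the kernel would be truncated.
    for i in range(taps, len(samples) - taps):
        for k in range(oversample):
            t = i + k / oversample                     # fractional sample position
            acc = 0.0
            for j in range(i - taps + 1, i + taps + 1):
                d = t - j
                window = 0.5 * (1.0 + math.cos(math.pi * d / taps))  # Hann taper
                acc += samples[j] * sinc(d) * window
            peak = max(peak, abs(acc))
    return 20.0 * math.log10(peak)

# Fs/4 sine, 45 degree phase: every sample lands at +/-0.7071 of the true amplitude.
# Scaling by sqrt(2) puts the samples exactly at full scale (0 dBFS sample peak).
tone = [math.sqrt(2.0) * math.sin(math.pi * (n % 4) / 2 + math.pi / 4) for n in range(128)]

sample_peak_db = 20.0 * math.log10(max(abs(s) for s in tone))
tp_full = true_peak_db(tone)
tp_attenuated = true_peak_db([s * 10 ** (-3.5 / 20) for s in tone])

print(f"sample peak: {sample_peak_db:+.2f} dBFS")
print(f"true peak:   {tp_full:+.2f} dBTP")
print(f"after -3.5 dB of gain: {tp_attenuated:+.2f} dBTP")
```

With this tone the samples never exceed 0 dBFS, yet the estimate should come out close to +3 dBTP. Pulling the level down by 3.5 dB brings it back under full scale, which is the headroom argument in a nutshell.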

Just because they do this intentionally, does not imply they enjoy it.
I didn't get that from your previous messages, so thanks for elaborating. I'm glad to see that you have at least a bit of empathy for fellow audio engineers :)

It appears that in video & film production, the audio is engineered to consistent standards having reasonable dynamic range. I wish the music industry would adopt similar standards.
Yep. Big budgets = big companies = easier standardisation. Throw independent artists/labels/content creators into the mix (which, in my humble opinion, is a huge artistic plus compared to these fields) and all bets are off.

All this discussion aside, I share your curiosity about how different DACs handle this.
That's great to hear!
 

melowman

Member
Joined
Sep 18, 2020
Messages
68
Likes
28
@MRC01 Still mistaking peaks for loudness…
Lowering peaks does not mean less crushed/squashed sound. In certain situations it could even be the opposite (i.e. lower peak levels = even more crushed sound)
Do you think that lowering Metallica’s Death Magnetic by 3dB will make it sound better? Do you also think that normalizing the records that you love to 0 will make them sound worse (ISOs put aside)?

All this debate has to do with peak headroom, not loudness. I won’t comment any further on that if you’re still confusing the two.
 

DVDdoug

Major Contributor
Joined
May 27, 2021
Messages
3,033
Likes
3,995
It's not a tired cliche, but a well known fact that audio engineers in certain genres of popular music are deliberately pushing things to sound as loud as possible.
But like he said, they are only doing what they are told to by their clients. If you are paying the mastering engineer he will do whatever you ask.

By way of contrast, I'll mention that this problem isn't nearly as bad in video or film.
Dolby has some loudness guidelines that allow for headroom. But I'm not sure if they are mandated, and in any case they are allowed to use the headroom for loud parts. Also, movie theaters are loudness-calibrated. People would complain and walk out of the theater if movies were "loudness war" compressed and limited. As it is, people often complain about movie theaters being too loud.

I wish the music industry would adopt similar standards.
I'm NOT optimistic. :( But I'm old and I already own most of the music I want (or will want) so I don't really care that much. :p

We've had a couple of generations of musicians & producers that have grown up with highly compressed music and they probably think music is supposed to be constant loudness (no dynamics) if not constantly loud or constantly "intense".

But it's not only that... The loudness wars started back in the analog days.

When CDs were introduced I remember thinking that popular music would become more dynamic to take advantage of the increased dynamic range. Obviously, that didn't happen and it didn't take long for them to figure out how to use DSP to compress and limit more than was possible with analog processing (better digital weapons for fighting the loudness war).

I keep reading that "the loudness wars are over" because the streaming services all use loudness normalization making it harder to "win" the war. But I use ReplayGain on the files that I own and the last couple of CDs I bought and ripped (I don't buy a lot) were getting a ReplayGain adjustment around -10dB. That much gain reduction is unusual compared to most of the (mostly older) music that I have so it hasn't gone away and it might be getting worse!
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
2,781
Likes
6,223
Location
Berlin, Germany
The "headroom issue" is not a limitation of the recording format or the DAC. The root cause is recording & mastering folks intentionally abusing the recording format to make recordings as loud as possible. Addressing this in the DAC doesn't address the root cause, but only puts a band-aid over the effects. Doing so would likely be counterproductive, since if we "fix" it in the DAC then those folks will only up the ante and make recordings even louder.
You cannot make a mix louder by introducing ISOs, because those are always short transients that contribute almost no energy. They don't make it sound louder, only more broken, maybe (assuming it's not the benign simple-clipping type but rather one of those nasty misbehaviors like overflows/wraparounds).

So, fixing it in the DAC would solve it, and the reasonable strategy is stable soft-clipping (because that is very unlikely to be audible). Some DAC chips do handle it this way already (AKM AK4493 for example), some don't, and that's what we'd like to know... I do support the OP's request for a test.

Here is one that doesn't, showing the textbook overflow with wraparound.
image-27.png

Source: https://forums.stevehoffman.tv/thre...our-dac-or-software-player-reduce-it.1173425/
For grins, this is an R2R NOS*) DAC (note the stair-steps), so this problem is apparently not restricted to oversampling DS-DACs. DSP/FPGA code flipping out :)
EDIT: *) Looking at the time scale, it appears that the DAC was set up in oversampling mode.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,486
Likes
4,113
Location
Pacific Northwest
@MRC01 Still mistaking peaks for loudness…
Lowering peaks does not mean less crushed/squashed sound.
I am not confusing anything. I didn't say that lowering peaks means less crushed sound. What I said is it eliminates the ISOs. That is: take a recording that has ISOs, reduce the overall level a few dB and the ISOs go away. Doesn't matter how squashed or dynamic the recording is, its dynamic range won't change.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,674
Likes
241,066
Location
Seattle Area
Maybe it is not audible, but +3dB of noise at -100dB is not audible either.
Oh that can definitely be audible. It all depends on what level you play at. If you play at 97 dBSPL, then your noise would be at 0 dBSPL. Our hearing threshold is actually a bit better than that. So even in this situation noise can be audible. My reference for inaudibility is playback level of 120 dBSPL which would easily make this noise audible.
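As a quick sketch of that arithmetic (assuming the noise figure is relative to full scale and that full scale corresponds to the stated peak playback level; `noise_spl` is just an illustrative name):

```python
def noise_spl(peak_playback_spl, noise_dbfs):
    """Absolute noise level in dB SPL when full scale plays at peak_playback_spl."""
    return peak_playback_spl + noise_dbfs

noise_dbfs = -100 + 3              # the quoted "+3dB of noise at -100dB"
print(noise_spl(97, noise_dbfs))   # peaks at 97 dBSPL  -> 0
print(noise_spl(120, noise_dbfs))  # peaks at 120 dBSPL -> 23
```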
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,674
Likes
241,066
Location
Seattle Area
All this discussion aside, I share your curiosity about how different DACs handle this.
Let me say that the discussion has been useful and I am glad we are having it. So nothing is lost here.
 

popej

Active Member
Joined
Jul 13, 2023
Messages
281
Likes
185
If you play at 97 dBSPL, then your noise would be at 0 dBSPL. Our hearing threshold is actually a bit better than that.
I live in a city. On a quiet night I get about 30dBSPL ambient noise. I wonder if anyone can get near 0dBSPL, maybe headphones? But then you should hear your heart too ;)
I think one can hear sounds below noise level, but not the noise itself.
 

melowman

Member
Joined
Sep 18, 2020
Messages
68
Likes
28
I am not confusing anything. I didn't say that lowering peaks means less crushed sound. What I said is it eliminates the ISOs. That is: take a recording that has ISOs, reduce the overall level a few dB and the ISOs go away. Doesn't matter how squashed or dynamic the recording is, its dynamic range won't change.
Even if you’re not actually confused, you do look confused: you are the one who replied to my comment on loudness.
Anyways, cheers. [Edit: my double-beer emoticon didn't appear, there’s supposed to be one accompanying my cheers :)]
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,674
Likes
241,066
Location
Seattle Area
I live in a city. On a quiet night I get about 30dBSPL ambient noise. I wonder if anyone can get near 0dBSPL, maybe headphones? But then you should hear your heart too ;)
I think one can hear sounds below noise level, but not the noise itself.
A large number of our readers use headphones. For speakers, you need to look at the spectrum of the noise, not a single value. 30 dBSPL is likely all bass, not midrange. See: https://www.audiosciencereview.com/forum/index.php?threads/dynamic-range-how-quiet-is-quiet.14/

And related video from my channel:

 

popej

Active Member
Joined
Jul 13, 2023
Messages
281
Likes
185
For speakers, you need to look at the spectrum of noise, not single value. 30 dBSPL is likely all bass, not midrange.
The same is valid for a DAC. A single-value SPL is the noise power spread over the entire band; the only difference is that the frequency characteristic is flat. If you draw the noise characteristic of a DAC, it will lie much lower than its SNR value. How much lower depends on the critical bandwidth of human hearing. At 1kHz the critical band is about 130Hz. This would mean that the noise power at this frequency is something like 20dB less than the single-value SNR (if I have done the math correctly).

If we talk about 97dB SNR (noise at -97dB), then the noise curve should sit at about -117dB at 1kHz. Assuming 97dBSPL output, that noise is a few dB lower than the lowest measured ambient noise in your quote.
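A rough sketch of that estimate, assuming white noise spread evenly over a 20 kHz audio band (the 20 kHz figure and the function name are my assumptions; the 130 Hz critical bandwidth is the value mentioned above):

```python
import math

def in_band_reduction_db(total_bandwidth_hz, band_hz):
    """How far the noise power inside one band sits below the wideband total, in dB."""
    return 10.0 * math.log10(total_bandwidth_hz / band_hz)

reduction = in_band_reduction_db(20_000, 130)
print(f"{reduction:.1f} dB below the single-value figure")    # about 21.9 dB
print(f"in-band noise for -97 dB wideband: {-97 - reduction:.1f} dB")
```

Under these assumptions the reduction is a little more than the round 20dB used above.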

Sorry about the video, I haven't watched it. I can't stand this medium for delivering information.
 
OP
splitting_ears
Joined
Aug 15, 2023
Messages
88
Likes
200
Location
Saint-Étienne, France
So, fixing it in the DAC would solve it, and the reasonable strategy is stable soft-clipping (because that is very unlikely to be audible). Some DAC chips do handle it this way already (AKM AK4493 for example), some don't, and that's what we'd like to know... I do support the OP's request for a test.

Here is one that doesn't, showing the textbook overflow with wraparound.
image-27.png

Source: https://forums.stevehoffman.tv/thre...our-dac-or-software-player-reduce-it.1173425/
For grins, this is an R2R NOS*) DAC (note the stair-steps), so this problem is apparently not restricted to oversampling DS-DACs. DSP/FPGA code flipping out :)
EDIT: *) Looking at the time scale, it appears that the DAC was set up in oversampling mode.

Very informative, thank you. Do you know how we might simulate this? Is it similar to wave-folding distortion, or something else entirely?
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
2,781
Likes
6,223
Location
Berlin, Germany
Very informative, thank you. Do you know how we might simulate this? Is it similar to wave-folding distortion, or something else entirely?
This appears to be systematic signed integer wraparound which is trivial to emulate, basically a modulo N operation.
Wave-folding is different but closely related, and equally easy to emulate.

Basically any simple transfer characteristic, statically mapping input sample values to output sample values according to whatever law, is easy to emulate.
I'm doing exactly this in my "ESS hump" emulator: https://www.audiosciencereview.com/...-hump-style-distortions-to-audio-files.30758/
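To make that concrete, here is a minimal Python sketch of such static transfer functions on samples normalized to ±1.0. It is not the code from my emulator; the function names are made up for illustration, and the 16-bit variant assumes two's-complement wraparound.

```python
def wrap(x):
    """Signed wraparound for normalized samples: a modulo over the range [-1, 1)."""
    return (x + 1.0) % 2.0 - 1.0

def wrap_int16(v):
    """Same idea on 16-bit integers (two's-complement overflow)."""
    return (v + 32768) % 65536 - 32768

def fold(x):
    """Wave-folding: the excursion is mirrored back at +/-1 instead of jumping."""
    return abs((x - 1.0) % 4.0 - 2.0) - 1.0

def hard_clip(x):
    """Plain clipping, the benign case."""
    return max(-1.0, min(1.0, x))

for x in (0.5, 1.2, -1.2):
    print(f"{x:+.1f} -> wrap {wrap(x):+.2f}, fold {fold(x):+.2f}, clip {hard_clip(x):+.2f}")
print(wrap_int16(32767 + 1000))
```

An over to +1.2 wraps to about -0.8 (a near full-scale jump of the opposite sign), folds back to about +0.8, and clips to +1.0. In-range values pass through all three unchanged.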
 

danadam

Addicted to Fun and Learning
Joined
Jan 20, 2017
Messages
994
Likes
1,545
In the attachment there are 44.1 kHz and 48 kHz files with Fs/4 tones which:
  • start at -0.5 dBTP,
  • in the last 0.1 s of each second ramp up by 0.1 dB
  • until they reach +2.9 dBTP.
(there are also 4 pulses at the beginning and at the end: for aligning, to "wake-up" the DAC, etc)
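
For reference, a signal like this could be generated along the following lines. This is a sketch under assumptions rather than the script used for the attached files: it assumes a 45° phase offset (so every sample sits at 1/√2 of the true peak), a ramp that is linear in dB, and it leaves out the alignment pulses. The function and parameter names are made up.

```python
import math

def intersample_tone(fs=44100, start_db=-0.5, stop_db=2.9, step_db=0.1, ramp_s=0.1):
    """Fs/4 tone whose true-peak level rises by step_db at the end of each second."""
    seconds = round((stop_db - start_db) / step_db) + 1
    out = []
    for n in range(seconds * fs):
        sec, frac = divmod(n / fs, 1.0)
        # Hold the level, then ramp during the last ramp_s of the second.
        ramp = max(0.0, (frac - (1.0 - ramp_s)) / ramp_s)
        level_db = min(stop_db, start_db + step_db * (sec + ramp))
        amplitude = 10.0 ** (level_db / 20.0)
        # n % 4 keeps the phase exact: samples land only at 45, 135, 225, 315 degrees.
        out.append(amplitude * math.sin(math.pi * (n % 4) / 2 + math.pi / 4))
    return out

signal = intersample_tone(fs=4000)   # low rate only to keep this demo quick
print(len(signal) / 4000, "seconds")
print(f"highest sample value: {max(abs(s) for s in signal):.4f}")
```

With these assumptions the samples stay just below full scale even at +2.9 dBTP, so the file is legal PCM although its reconstruction exceeds 0 dBFS.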

intersample.44k.wav.png

intersample.44k.png


Below are spectrograms of outputs of a few things I have lying around:
  • DragonFly Red
  • Fiio K3 (USB-C) with and without gain
  • Cowon Plenue D
  • Samsung dongle
  • Speaka (AFAIK it's a clone of DragonFly Black)
  • Lenovo T480s
The 44.1 kHz file was played on Linux through ALSA using aplay command and the outputs were captured with ADI-2 Pro FS BE:
  • 192 kHz sampling rate,
  • ref level +13 dBu,
  • different Trim Gain settings to have at least 3.5 dB of headroom when playing calibrating 220 Hz tone (3 dB for true peaks, 0.5 dB as a spare)
For spectrograms I downsampled the captured files to 96 kHz and used only the left channel, to save some vertical space. I also aligned them so that during the 10-th second the signal should be at 0 dBTP [0].

[0] For those more observant among you: yes, if the signal starts at -0.5 dBTP then it should reach 0 dBTP during the 9-th second. Well, I miscalculated a bit and in the file that I used the signal was from -0.6 dBTP to +2.8 dBTP ¯\_(ツ)_/¯

t_df_red.png t_fiiok3_nogain.png t_fiiok3_gain.png t_plenued.png t_samsung_dongle.png t_speaka.png t_t480s.png

For ADI-2 itself I could only do a loopback recording at 44.1 kHz sampling rate. I used:
  • Main out at +2 dB, ref level +4 dBu
  • Phones out at 0 dB, low power setting
For phones it looks at first like it doesn't clip, so I also added spectrograms with zoomed dB scale (for both main and phones):

t_adi2_main.png t_adi2_main_zoom.png t_adi2_ph.png t_adi2_ph_zoom.png

And now I'll go listen to my volume-normalized music and contemplate the point of this exercise :oops: ;)
 

Attachments

  • intersample.44k.flac.zip
    729.6 KB · Views: 28
  • intersample.48k.flac.zip
    792.2 KB · Views: 30

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,674
Likes
241,066
Location
Seattle Area
The same is valid for a DAC. A single-value SPL is the noise power spread over the entire band; the only difference is that the frequency characteristic is flat. If you draw the noise characteristic of a DAC, it will lie much lower than its SNR value. How much lower depends on the critical bandwidth of human hearing. At 1kHz the critical band is about 130Hz. This would mean that the noise power at this frequency is something like 20dB less than the single-value SNR (if I have done the math correctly).
Your math for ERB is fine. It is your knowledge of the topic that is the problem. ERB is a simplification of the auditory filter bank at one loudness level. It is inappropriate to use as is for audibility of noise. I provided a reference from Stuart in the link I gave you on this. Importantly, if you read the second reference, Dynamic-Range Issues in the Modern Digital Audio Environment, from J. AES, you will see this listening test:

1698966491912.png


As I have highlighted, the audibility threshold for such wideband noise can even go down to a negative SPL of -2 dB. And this is for mono. For stereo the noise adds, lowering the threshold even more.

Also notice how environmental noise did not impact the result. The paper goes into more detail on how directional noise is more audible than diffused noise in a room.
 

popej

Active Member
Joined
Jul 13, 2023
Messages
281
Likes
185
It is your knowledge of the topic that is the problem.
OK, let me summarize: in a quiet room white noise can be heard on average at around 3.8dB SPL.

I assume that a DAC with 0dB SPL noise is good enough. If the SNR of a DAC is 97dB, then we can set peak volume at 97dB SPL without worrying about hearing noise.

What if we set the peak output level to 105dB? When can we expect to hear some noise at 8dB SPL? Without a signal the DAC most probably goes to mute and noise is suppressed. With a music signal, 8dB of noise doesn't matter. I think there could be a second at the beginning of a 24-bit recording, where a fade-in is applied and the recorded signal is below -97dBFS. Maybe at this moment the DAC's noise could be perceived.

Sure, it is good to have high SNR. I like resilience to ISPs too. Looks like it is a personal preference which feature to allocate these 3dB to ;)
 

NTK

Major Contributor
Forum Donor
Joined
Aug 11, 2019
Messages
2,716
Likes
6,007
Location
US East
What if we set the peak output level to 105dB? When can we expect to hear some noise at 8dB SPL? Without a signal the DAC most probably goes to mute and noise is suppressed. With a music signal, 8dB of noise doesn't matter. I think there could be a second at the beginning of a 24-bit recording, where a fade-in is applied and the recorded signal is below -97dBFS. Maybe at this moment the DAC's noise could be perceived.
Not exactly what you asked. The charts show the threshold shift at 4 kHz, evaluated 2 minutes after exposure to noise. 4 kHz is near our most sensitive frequency, and it is where our hearing is most fragile.
TTSRecovery.jpg

Source: https://www.sfu.ca/sonic-studio-webdav/cmns/Handbook Tutorial/Audiology.html
 

levimax

Major Contributor
Joined
Dec 28, 2018
Messages
2,393
Likes
3,520
Location
San Diego
In the attachment there are 44.1 kHz and 48 kHz files with Fs/4 tones which:
  • start at -0.5 dBTP,
  • in the last 0.1 s of each second ramp up by 0.1 dB
  • until they reach +2.9 dBTP.
(there are also 4 pulses at the beginning and at the end: for aligning, to "wake-up" the DAC, etc)

View attachment 323287
View attachment 323288

Below are spectrograms of outputs of a few things I have lying around:
  • DragonFly Red
  • Fiio K3 (USB-C) with and without gain
  • Cowon Plenue D
  • Samsung dongle
  • Speaka (AFAIK it's a clone of DragonFly Black)
  • Lenovo T480s
The 44.1 kHz file was played on Linux through ALSA using aplay command and the outputs were captured with ADI-2 Pro FS BE:
  • 192 kHz sampling rate,
  • ref level +13 dBu,
  • different Trim Gain settings to have at least 3.5 dB of headroom when playing calibrating 220 Hz tone (3 dB for true peaks, 0.5 dB as a spare)
For spectrograms I downsampled the captured files to 96 kHz and used only the left channel, to save some vertical space. I also aligned them so that during the 10-th second the signal should be at 0 dBTP [0].

[0] For those more observant among you: yes, if the signal starts at -0.5 dBTP then it should reach 0 dBTP during the 9-th second. Well, I miscalculated a bit and in the file that I used the signal was from -0.6 dBTP to +2.8 dBTP ¯\_(ツ)_/¯

View attachment 323289 View attachment 323290 View attachment 323291 View attachment 323292 View attachment 323293 View attachment 323294 View attachment 323295

For ADI-2 itself I could only do a loopback recording at 44.1 kHz sampling rate. I used:
  • Main out at +2 dB, ref level +4 dBu
  • Phones out at 0 dB, low power setting
For phones it looks at first like it doesn't clip, so I also added spectrograms with zoomed dB scale (for both main and phones):

View attachment 323296 View attachment 323297 View attachment 323298 View attachment 323299

And now I'll go listen to my volume-normalized music and contemplate the point of this exercise :oops: ;)
Wow... nice work. So is this evidence that DACs can sound different if they are presented with intersample overs? The differences look like they could be audible.
 