
"I can hear them church bells pre-ringing!" - A (mostly) time domain investigation into DAC-like reconstruction filters

Not quite sure what you want to say. It seems like you are talking about downsampling filters in mastering/production, but this post is about reconstruction filters after upsampling in our DAC at home. Are you hinting at problems with intersample overs?
My point is: The topic of ringing is not only about directly hearing frequencies at or near Nyquist. Real listeners do not listen to raw reconstruction filter output. Their chain contains real-world analog equipment which introduces nonlinearities. As is well known, unwanted ultrasonic content in combination with nonlinearities in analog circuitry or transducers leads to intermodulation distortion.

About intersample overs specifically: Intersample overs created by the oversampling process in delta sigma DACs cause the same effect as clipped source content. They can cause intermodulation distortion in the audio band with real-world downstream analog devices.

Not to write too much, I used ASK AI again:

Query: Explain, if intersample overflows in delta sigma DACs can cause intermodulation distortion.

The ASK AI answer:

Intersample Overflows in Oversampling Delta-Sigma DACs and Intermodulation Distortion

Introduction

When digital audio is mastered up to 0 dBFS (full scale), the waveform peaks may touch or even "clip" at the sample points. However, due to the band-limited nature of digital audio (especially after steep anti-aliasing filtering), the actual analog waveform reconstructed by a DAC can sometimes exceed 0 dBFS between sample points—a phenomenon known as intersample overshoot or intersample peaks[1][2][3]. This is particularly relevant for oversampling delta-sigma DACs, which are ubiquitous in modern digital audio playback.

The question:
Can these intersample overflows cause intermodulation distortion (IMD) at the analog output of a delta-sigma DAC?


Fundamentals

1. Delta-Sigma DAC Operation

  • Oversampling: Delta-sigma DACs operate by upsampling (often by factors of 64x or more) and noise-shaping the input signal, pushing quantization noise out of the audio band before converting it to analog with a low-order analog filter[4][5].
  • Reconstruction: Theoretically, if all stages are linear and have sufficient headroom, the reconstructed analog signal will faithfully reproduce any legal input within ±1.0 (normalized full scale).

2. Intersample Peaks/Overflows

  • Definition: Intersample peaks occur when the continuous-time waveform, reconstructed from discrete samples via sinc interpolation (the mathematical ideal), exceeds the maximum amplitude represented by any individual sample[1][6].
  • Magnitude: For signals mastered aggressively near 0 dBFS, intersample peaks can be as much as +3 dB above full scale[7].
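The +3 dB worst case is easy to reproduce numerically (my own sketch, not part of the quoted answer): a tone at exactly fs/4 with a 45-degree phase offset puts every sample at about 0.707 of the continuous waveform's peak, so normalizing the samples to full scale hides a +3 dB intersample peak. scipy's FFT-based resampler stands in here for ideal sinc interpolation:

```python
import numpy as np
from scipy import signal

fs = 44100
n = np.arange(1024)

# fs/4 tone with a 45-degree phase offset: every sample lands at
# ~0.707 of the continuous sinusoid's peak.
x = np.sin(2 * np.pi * 0.25 * n + np.pi / 4)
x /= np.max(np.abs(x))           # normalize sample peaks to exactly 0 dBFS

# 8x band-limited interpolation approximates the ideal sinc reconstruction
y = signal.resample(x, 8 * len(x))

true_peak_db = 20 * np.log10(np.max(np.abs(y)))
print(f"sample peak: 0.00 dBFS, true peak: +{true_peak_db:.2f} dBFS")
```

This prints a true peak of about +3.01 dBFS even though no individual sample exceeds 0 dBFS.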

Mechanisms for IMD Creation

A. Where Does IMD Originate?

  • Linear Systems: Purely linear systems do not create new frequencies; thus, no IMD occurs in ideal digital processing or ideal reconstruction.
  • Nonlinearities: IMD arises only when a nonlinear element is present—such as an overloaded analog stage, clipping circuit, or nonideal behavior in DAC output drivers[8][9].

B. How Intersample Peaks Lead to Nonlinearity

  • If an intersample peak exceeds the voltage range that downstream analog circuitry (e.g., opamps, I/V converters) can handle without distortion:
    • The circuit may clip or compress those peaks.
    • This introduces nonlinearity.
    • Nonlinearities generate both harmonic and intermodulation products; if multiple frequencies are present near these peaks, their sum/difference frequencies appear in the output spectrum[10][11].

Step-by-Step Analysis

1. Digital Input at Full Scale

  • Mastered content reaches 0 dBFS at sample points.

2. Oversampling/Interpolation

  • The delta-sigma modulator reconstructs a higher-rate version of the signal.
  • Sinc interpolation reveals true waveform between samples.
  • Intersample peaks may exceed ±1.0.

3. Analog Output Stage

  • If designed with insufficient headroom for >0 dBFS signals:
    • Analog circuitry clips/compresses on these peaks.
    • Nonlinear transfer function is engaged.

4. Resulting Distortion

  • Clipping/compression produces harmonics and IMD products.
    • Harmonics: Integer multiples of fundamentals.
    • IMD: Sum/difference frequencies from interaction of two or more tones—these may fall into audible range even if original tones are not harmonically related[12][13].
  • Artifacts manifest as added spurious tones/noise in output.
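A toy simulation of step 4 (my own sketch, not part of the quoted answer; the tone frequencies and clip level are arbitrary): hard-clipping the sum of an 18 kHz and a 20 kHz tone produces odd-order products such as 2*f1 - f2 = 16 kHz, squarely in the audio band.

```python
import numpy as np

fs, N = 176400, 17640            # 0.1 s of signal, 10 Hz FFT bin spacing
t = np.arange(N) / fs
f1, f2 = 18000.0, 20000.0

x = 0.6 * np.sin(2 * np.pi * f1 * t) + 0.6 * np.sin(2 * np.pi * f2 * t)
clipped = np.clip(x, -1.0, 1.0)  # hard clip: peaks reach 1.2, headroom ends at 1.0

spec = np.abs(np.fft.rfft(clipped)) / (N / 2)
bin_of = lambda f: int(round(f * N / fs))

imd = spec[bin_of(2 * f1 - f2)]  # third-order product at 16 kHz
fund = spec[bin_of(f1)]
print(f"2*f1-f2 level: {20 * np.log10(imd / fund):.1f} dB rel. fundamental")
```

Without the clip, the 16 kHz bin contains essentially nothing; with it, a clearly measurable in-band product appears.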

Real-world Evidence & Engineering Practice

Bob Katz warns that mastering up to digital full scale without accounting for intersample overshoots risks "unexpected distortion" on consumer playback equipment—especially with delta-sigma DACs whose output stages may not accommodate >0 dBFS signals[14] (PRINT).
Ken Pohlmann notes that "intersample overload" is a practical concern in real-world D/A conversion and recommends mastering engineers leave headroom below full scale to avoid such issues[15] (PRINT).
Douglas Self explains that when opamps or line drivers are overdriven by excessive input levels—even momentarily—they generate both harmonic and intermodulation distortion products[16] (PRINT).
AES technical papers confirm that commercial DACs often exhibit measurable distortion when fed with test signals engineered to produce large intersample peaks—even though individual samples never exceed full scale[17] (Academic Journal).


Conclusion

Yes, intersample overflows in an oversampling delta-sigma DAC can create intermodulation distortion products at its analog output—but only if those overflows cause downstream analog circuitry to operate nonlinearly due to insufficient headroom above 0 dBFS.
This is not a property of delta-sigma modulation itself but rather a consequence of real-world limitations in analog hardware design following D/A conversion. The resulting IMD artifacts depend on both program material and hardware implementation.


References


World's Most Authoritative Sources

  1. Pohlmann, Ken C. Principles of Digital Audio. McGraw-Hill Education. (PRINT)
  2. Watkinson, John. The Art of Digital Audio. Focal Press / Taylor & Francis Group. (PRINT)
  3. Zölzer, Udo (ed.). DAFX: Digital Audio Effects. Wiley & Sons. (PRINT)
  4. Smith, Julius O. III. Introduction to Digital Filters. W3K Publishing. (PRINT)
  5. Lyons, Richard G. Understanding Digital Signal Processing. Pearson Education. (PRINT)
  6. Lipshitz, S. P.; Vanderkooy, J.; Wright, E. M. "A Perceptual Evaluation of Digital Audio Reconstruction Filters." Journal of the Audio Engineering Society, Vol. 29 (1981): 126–142. (Academic Journal)
  7. Rumsey, Francis; McCormick, Tim. Sound and Recording: An Introduction. Focal Press / Taylor & Francis Group. (PRINT)
  8. Self, Douglas. Audio Power Amplifier Design Handbook. Newnes / Elsevier Science. (PRINT)
  9. Borwick, John (ed.). Loudspeaker and Headphone Handbook. Focal Press / Taylor & Francis Group. (PRINT)
 
My point is: The topic of ringing is not only about directly hearing frequencies at or near Nyquist. Real listeners do not listen to raw reconstruction filter output. Their chain contains real-world analog equipment which introduces nonlinearities. As is well known, unwanted ultrasonic content in combination with nonlinearities in analog circuitry or transducers leads to intermodulation distortion.

About intersample overs specifically: Intersample overs created by the oversampling process in delta sigma DACs cause the same effect as clipped source content. They can cause intermodulation distortion in the audio band with real-world downstream analog devices.

...
Yes, IMD is real and can be a concern if ultrasonic content passes from the DAC to downstream devices. And yes, DACs ideally need headroom for intersample overs, otherwise the output will be clipped. And last but not least, bad masters are also a thing, sadly.

In my original post, I focused exclusively on the DAC reconstruction filters and pretty much ignored what happens before or after, because you have to make a cut somewhere. Since my filter simulation runs floating point math, intersample overs also just "happen" (-> amplitude values >|1| are allowed) and have no ill effect on the result. There's - if you will - infinite headroom in the simulated upsampler, which is of course not how it works in the real world. On any real DAC, the exact outcome will depend on the installed chip and the config the device manufacturer uses. In general, you can't rely on there being any headroom but on well engineered DACs, you will have a dB or two.
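The "infinite headroom" of a floating point pipeline is easy to illustrate with a couple of hypothetical sample values (just a sketch): float samples beyond |1| pass through untouched, while a 16-bit fixed-point stage has no choice but to clip them.

```python
import numpy as np

x = np.array([0.9, 1.2, -1.4])   # hypothetical intersample-over values

# Floating point pipeline: values beyond full scale survive untouched.
f = x * 1.0

# 16-bit fixed point: anything beyond full scale is hard-clipped.
i16 = np.clip(np.round(x * 32767), -32768, 32767).astype(np.int16)
back = i16 / 32767.0
print(back)  # 1.2 and -1.4 are gone, pinned to roughly +/- full scale
```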
 
First off, I’ve been wondering about this for a while, mostly in the context of designing FIR crossover filters. It never made sense to me that filters would generate tons of ringing when reproducing music, so this is great.

The single sample impulse test presents a signal that only exists on a test bench. The steepest transient in actual music is several orders of magnitude slower and lower in frequency.

All my music comes from my computer. Would upsampling/interpolating everything to 88.2/96kHz when playing back 44.1kHz files simply move any filter nonsense out of audibility?

If you upsample before passing the signal to your DAC, your upsampler (software player or operating system) will do the filtering for you. The DAC will then upsample and filter again. So if you want to avoid the filtering, this isn't the solution. You also shouldn't avoid the filtering. My advice: I would just not worry about it.

I think the ideal solution would be to just pass the original audio bit perfect to the DAC and let it handle the upsampling and filtering. In Windows, that would probably mean using WASAPI exclusive mode. The engineers at AKM, ESS or Cirrus Logic are paid well to figure out how to best handle audio signals fed to their chips and the companies have a couple of decades of experience with that stuff, so I would let them worry about the details. Pushing bit perfect audio to the DAC would do exactly that.

On my machine, I use a far lazier approach: Upsample everything to 96 kHz in the OS/driver, use EAPO for EQ and stop thinking about the rest. I checked a couple of sampling rate combinations for input and upsample target and the 96 kHz output looked fine for 44.1 and 48 kHz input, which covers 98% of the material I listen to. Since I'm too lazy to run exclusive mode, there will be some resampling taking place on my PC either way. That's why this solution works best for my use case.
 
My advice: I would just not worry about it.

I think the ideal solution would be to just pass the original audio bit perfect to the DAC and let it handle the upsampling and filtering.

This. Linear phase oversampling filtering is the best compromise we have. Pick that, forget about it and enjoy the music. :)

(This is when talking about digital audio reconstruction. Crossover filters are another story.)
 
There's also some good stuff like [1, 2, 3, 4, 5]. A lot of the good stuff focuses on frequency domain analysis, impulse responses and "illegal" signals like impulses, square waves or clipping.
There's also https://troll-audio.com/articles/filter-ringing/ . This one focuses on a "not-quite-impulse" :-)

This post therefore focuses on time-domain analysis and real audio samples.
And yet, the "real audio sample" which shows the difference between filters the clearest (the 7.5 to 7.8 ms part in Michael Jackson example), looks almost like a synthetic square wave :-)

The ground truth is what would be captured by a microphone in the recording studio or concert hall:
...
A couple of things to note while looking at these signals:
  • The downsampled data points are part of the ground truth signal
  • There are peaks and valleys in the ground truth in-between data points of the downsampled signal which are higher/lower than the closest downsampled data points. Or in other words: A straight line connecting the downsampled data points would not be a good representation of the ground truth signal.
  • Since the sampling rate of our downsampled data stream is more than twice the maximum sine frequency contained in it (44.1 kHz vs. 16 kHz), a faithful reconstruction of the ground truth signal is theoretically possible (see Nyquist frequency, we will get back to this)
That's a somewhat idealized situation, I think. In general, the signal captured by a microphone will be low-passed before sampling, say, at a high sample rate, and then low-passed again before downsampling to 44.1k. So the data points won't necessarily coincide with the "ground truth signal" as it is defined above.

I'd rather say that first you choose the bandwidth you are interested in, and then the "ground truth signal" is a signal within that bandwidth. That bandwidth also determines the sampling rate needed.

Aside from that, something that complicates things even more is processing, like compression, limiting, etc. I suspect it may modify the samples in such a way that they no longer look like anything that could ever come out of an ADC. Like the Michael Jackson example mentioned earlier. Was there really an almost-square signal recorded, or is that a processing artifact or recording error?

The reason they are not perfect is that the filters don’t totally suppress all signals beyond the Nyquist frequency of 22.05 kHz and they also inflict some suppression on signals below that threshold.
Do you mean here that's because the passband ripple is non-zero and stopband attenuation is not -inf, or do you mean something else?

Why not use a real DAC and measure that?
I don't have an ADC good enough for the task and I don't intend on buying one just for this.
Here is a sum of 4 not-so-randomly selected sines, highest is 16 kHz:
waveform and spectrum of the squarish input signal


played through different filters of ADI-2 Pro, bluish is the input, reddish is the output:
ADI-2 Pro output compared to the input

(I did a similar linear to minimum comparison, only fully in digital domain, here)

Or if that was too squarish, the same 4 sines with their phases mangled and amplitude modulated:
waveform and spectrum of the non-squarish input signal

ADI-2 Pro output compared to the input


Your linear phase filters perform pretty similar to each other, why is that?
I double checked this and could find no error. The FR and IR plots use the same filter as the processed audio snippets and they look 100% fine. If you have any idea, drop it in a post below.
There's probably not enough high frequency content to create images big enough to visibly affect the reconstructed shape. Here, for example, is 18 kHz tone at -2 dBFS played using Sharp and Slow filter:
ADI-2 pro output compared to the input and the output spectrum
 
If you upsample before passing the signal to your DAC, your upsampler (software player or operating system) will do the filtering for you. The DAC will then upsample and filter again. So if you want to avoid the filtering, this isn't the solution. You also shouldn't avoid the filtering. My advice: I would just not worry about it.

I think the ideal solution would be to just pass the original audio bit perfect to the DAC and let it handle the upsampling and filtering. In Windows, that would probably mean using WASAPI exclusive mode. The engineers at AKM, ESS or Cirrus Logic are paid well to figure out how to best handle audio signals fed to their chips and the companies have a couple of decades of experience with that stuff, so I would let them worry about the details. Pushing bit perfect audio to the DAC would do exactly that.

On my machine, I use a far lazier approach: Upsample everything to 96 kHz in the OS/driver, use EAPO for EQ and stop thinking about the rest. I checked a couple of sampling rate combinations for input and upsample target and the 96 kHz output looked fine for 44.1 and 48 kHz input, which covers 98% of the material I listen to. Since I'm too lazy to run exclusive mode, there will be some resampling taking place on my PC either way. That's why this solution works best for my use case.
I have a Mac and the Audio MIDI Setup app sets sample rates. Before going further, I did a search and found this page:

OK, then. The resampling done by the Core Audio section of macOS was basically perfect 10 years ago, so there's nothing but possible upside to it today.
 
All good info in this thread.
Great and easy to read info by @RandomEar and others.
Even Cameron makes sense this time. :)

Conclusion:
If you want the best possible reproduction of the digital content; use fast linear phase filters.

When your DAC is not great at inter-sample overs (some DACs do have issues), use digital volume control and set that to -3 dB or so.

Upsampling using a fast linear phase filter can improve the technical sound quality of DACs that have 'poor' filters (i.e. not fast linear phase) or even lack filters entirely, the so-called NonOverSampling (NOS) DACs, regardless of the conversion method used. You are then using a proper filter, and the 'poor' filter of the DAC itself is doing its poor job way up higher, where there is no 'input' signal anyway.
The ultrasonic noise crap of that DAC is now higher up, where the chance of it wreaking havoc in the audible band is much lower.

When your DAC already has good linear phase filters, upsampling is pointless and is only a way of fooling yourself. For some folks that seems to help them get more enjoyment. Perception is a wonderful thing. Go with it. Placebo can do great things for perception.

Now... why some people prefer a technically 'poorer performing' DAC is another matter, and it does matter.
If someone simply 'prefers' slow filters, apodizing filters, minimum phase filters or even no filter at all, for whatever reasons they think/believe/know matter, and likes what they hear, then that's fine too. The downside is that these folks (the 'everything matters' folks) are usually quite vocal and vent their opinions, but state them as facts.
After all, for them it is about 'maximum enjoyment', and if that means being less accurate to the waveform that is digitally described in a file (the end result of a digital music recording in a certain format), then they should go for it.
They just should realize that whatever they are listening to is not the 'intended sound' but an altered sound they prefer.
It is not technically better but only 'sounds' better to them (preference, for whatever reason).
 
They just should realize that whatever they are listening to is not the 'intended sound' but an altered sound they prefer.
I feel like most R2R DAC enjoyers do (and should) admit this, to be fair; you'd be pretty insane not to.

Great study all up, really cool stuff in an easy-to-digest way.
 
I feel like most R2R DAC enjoyers do (and should) admit this, to be fair; you'd be pretty insane not to.
Most folks who enjoy filterless NOS (R2R, DS or whatever other method) firmly believe any 'filter' is 'bad for sound'. Because it sounds 'better' to them, and because they see near-perfect square-wave and impulse plots (all illegal signals that do not exist outside of test signals), they believe they are hearing a 'more correct' sound thanks to the lack of filters.

Also, the R2R believers (NOS is not the same as R2R; they are totally different things) simply believe that this conversion method is 'better' and 'more accurate' than the cheap 'approximating DS DAC chips'.
People buy R2R-based converters because they read (and believe) that the sound is more 'analog' and 'superior'. It is all based on folklore found on the web.
They buy/use 'NOS' because their favorite audiophile websites/magazines tell them it 'sounds better'. And indeed, because of the roll-off of NOS DACs, that 'setting' actually sounds different, and they may prefer that. But that is just preference, and they won't admit that they are basically listening to a (negatively) altered version of the 'intended' signal.

This is all based on misinformation and misunderstanding. Nobody wants to admit that and rather believes their own perception and the 'info' they find on websites that are close to their audio-worldview.
Preference is just preference, can go in all kinds of directions and, in the end, for enjoyment, that's all that counts. Regardless of what the actual reasons for that preference are.

For a lot of ASR members good sound starts with good technical performance.
For 'the listening people' good sound is just preference and they don't care about numbers and plots, just what they hear.

Pick your poison.

In any case, the OP showed the technical side of the filters... it was not about the different conversion methods. These are two very different things that are often completely mixed up.
 
There's also https://troll-audio.com/articles/filter-ringing/ . This one focuses on a "not-quite-impulse" :-)
Nice. The captured band-limited pulse at the end is a good example.

And yet, the "real audio sample" which shows the difference between filters the clearest (the 7.5 to 7.8 ms part in Michael Jackson example), looks almost like a synthetic square wave :-)
I'm not going to tell MJ how to make that music ;)

That's a somewhat idealized situation, I think. In general, the signal captured by a microphone will be low-passed before sampling, say, at a high sample rate, and then low-passed again before downsampling to 44.1k. So the data points won't necessarily coincide with the "ground truth signal" as it is defined above.

I'd rather say that first you choose the bandwidth you are interested in, and then the "ground truth signal" is a signal within that bandwidth. That bandwidth also determines the sampling rate needed.
Yes, this is idealized. I'm also not an audio production engineer, so my knowledge of what happens in the studio is kinda limited.

Aside from that, something that complicates things even more is processing, like compression, limiting, etc. I suspect it may modify the samples in such a way that they no longer look like anything that could ever come out of an ADC. Like the Michael Jackson example mentioned earlier. Was there really an almost-square signal recorded, or is that a processing artifact or recording error?
Yes, I purposely ignored everything related to quantization and the effects of any further processing/mastering. I think the example with the ground truth and downsampled signals would be less clear if they deviated significantly from each other.

Do you mean here that's because the passband ripple is non-zero and stopband attenuation is not -inf, or do you mean something else?
Mainly that the filters are not infinitely steep. They already suppress some content below Nyquist (not much for the "fast" filters) and let a more significant amount of power pass above it. But yes, the passband and stopband ripple are also something to consider, which I didn't mention.

[...]

There's probably not enough high frequency content to create images big enough to visibly affect the reconstructed shape. Here, for example, is 18 kHz tone at -2 dBFS played using Sharp and Slow filter:
View attachment 507041
That may be the case. But the difference between minimum phase slow and fast is quite pronounced, so I was surprised the linear phase pair looked so similar. Maybe the phase shift of the MP filters plays a role in that difference?
 
Fantastic post, thank you. Since pre-ringing happens around the Nyquist frequency, can you comment on filters that use thousands or millions of taps? Since they should be innocuous in terms of pre-ringing, are they really beneficial in time accuracy?
 
Fantastic post, thank you. Since pre-ringing happens around the Nyquist frequency, can you comment on filters that use thousands or millions of taps? Since they should be innocuous in terms of pre-ringing, are they really beneficial in time accuracy?
Thanks! You can design filters with a million taps and lots of (pre-)ringing. In that regard, more taps just give you more freedom in what to optimize: steepness vs. phase vs. passband and stopband ripple. If you do not care about the additional delay, more taps will technically be beneficial and will result in a reconstruction which is closer to the theoretical optimum. There might also be a practical limit somewhere, because at some point numerical accuracy becomes a problem. Not sure where that will be, though.
 
@Antastik, the filter IS the ringing. A longer filter can produce longer ringing. If the filter is not minimum-phase, it also produces pre-ringing. The ringing frequency is the filter cut-off frequency, which can be anywhere <= Nyquist and has no influence on the ringing as such. Time accuracy (with regard to the input data) is preserved with linear phase filters.
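That the ringing sits at the cut-off frequency is easy to verify numerically. In this sketch (filter parameters are mine, chosen arbitrarily), the ring frequency of a windowed-sinc lowpass is estimated from the zero-crossing spacing of its impulse-response tail and comes out at roughly the cut-off:

```python
import numpy as np
from scipy import signal

fs, fc, taps = 352800, 20000, 255
h = signal.firwin(taps, fc, fs=fs)   # linear-phase lowpass (windowed sinc)

# Look at the tail, away from the main lobe: it oscillates at ~fc,
# with zero crossings spaced fs / (2 * fc) samples apart.
tail = h[160:240]
crossings = np.sum(np.diff(np.sign(tail)) != 0)
duration = len(tail) / fs
fc_est = crossings / (2 * duration)
print(f"estimated ringing frequency: {fc_est:.0f} Hz (cut-off: {fc} Hz)")
```

Moving `fc` moves the ringing frequency with it; changing the tap count only changes how long the ringing lasts.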
 
INTRO

I've recently encountered a couple of posts about pre-ringing in linear phase reconstruction filters in DACs. Most claim the usual: How they can definitely hear the difference between filters, how that evil pre-ringing makes linear phase much worse and so on. Impressions from uncontrolled, sighted listening tests of course.

In general, the members stating this appear to have misunderstood important aspects of the topic, and it’s sometimes apparent that they are not very familiar with the frequency domain, FFTs and related technical details. One major problem is that there's a lot of bullshit about DAC filters on the net, perpetuated by manufacturer marketing departments, dealers and misinformed or mistaken reviewers, but also by regular people on forums. There's also some good stuff like [1, 2, 3, 4, 5]. A lot of the good stuff focuses on frequency domain analysis, impulse responses and "illegal" signals like impulses, square waves or clipping. Archimago specifically also investigated real music upsampled using foobar2000 + SoX [2]. On ASR, I have also found one practical comparison using an ADC to capture DAC outputs.

For engineers, looking at frequency and impulse response plots is typically enough for an informed decision. But most people are not engineers. And without the specific knowledge about what a frequency response tells you, how to interpret impulse responses and how a Fourier series and the Nyquist frequency are related to all of this, that information might not be helpful or even misleading to non-engineer readers.

This post therefore focuses on time-domain analysis and real audio samples. It will likely not bring any surprises to those familiar with the math behind audio, but is hopefully insightful for those who are not. I also tried to avoid overly technical descriptions in the important parts to keep this write-up helpful to everybody.

I'd like to point out that I'm not a DSP expert, but some fellow forum members definitely are. If you happen to find any errors in this post, feel free to point them out politely.


FILTER DESIGN

The technical stuff. This is mostly for the people familiar with the math. If you are not one of 'em, feel free to gloss over this section.

The filters are designed based on parameters (passband freq, stopband freq, etc.) from the attached ESS ES9039Q2M data sheet. I've made some effort to get them reasonably close to the originals, but for a multitude of reasons, they are not identical. The design process was as follows:
  • All filters are created as linear phase FIR filters using the Parks-McClellan design method
  • Minimum phase filters are derived from the linear phase design using the cepstrum method
  • Fast filters use an order of 512 (also referred to as 512 "taps")
  • Slow filters use an order of 128
  • The passband ripple is <0.0005 dB for all filters
  • Fast filters offer a stopband attenuation of 100 dB or better
  • Slow filters offer a stopband attenuation of 90 dB or better
  • Fast filters are designed with an attenuation of less than 0.01 dB @ 20 kHz
  • Slow filters are designed with an attenuation of 2.8 dB @ 19 kHz, which results in 4.3 dB @ 20 kHz
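As a downsized illustration of that recipe (fewer taps and relaxed specs compared to the 512-tap originals, to keep the Remez exchange quick and well-behaved; all numbers here are mine, not from the data sheet):

```python
import numpy as np
from scipy import signal

fs = 352800                       # 8x oversampled rate the filters run at
taps = 129                        # odd length suits the minimum-phase step

# Linear-phase equiripple lowpass via Parks-McClellan:
# passband up to 20 kHz, stopband from 30 kHz (a wider transition band
# than the originals, hence the smaller tap count).
h_lin = signal.remez(taps, [0, 20000, 30000, fs / 2], [1, 0], fs=fs)

# Minimum-phase version via the cepstrum ("homomorphic") method.
# Note: with this method the result has roughly the square root of the
# prototype's magnitude response, so to match a given spec you would
# design the linear-phase prototype against doubled-dB targets.
h_min = signal.minimum_phase(h_lin, method='homomorphic')

w, H = signal.freqz(h_lin, worN=8192, fs=fs)
stop_db = 20 * np.log10(np.max(np.abs(H[w >= 40000])))
print(f"stopband attenuation beyond 40 kHz: {stop_db:.1f} dB")
```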
The generated filters look like this:

View attachment 506438
View attachment 506439

The plots show that the filters are not identical to the ESS ones, but qualitatively similar. The fast filters are essentially flat in the audible band up to 20 kHz. For the slow filters, the decline in the frequency response starts around 15 kHz (-0.2 dB, followed by -0.5 dB @ 16 kHz) for input signals with 44.1 kHz. This decline could be audible, depending on the content and the listening level. However, if you are older than about 40, this is likely not a concern for you anymore ;) [6]

The filter delays are 726 µs (linear fast), 181 µs (linear slow), 57 µs (minimum fast) and 43 µs (minimum slow). These numbers are lower than the delays given in the ESS data sheet, but from my understanding the ones from the data sheet represent the total delay through the DAC pipeline, including more than just the filters.
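The linear-phase numbers follow directly from the tap counts: a symmetric N-tap FIR delays everything by (N-1)/2 samples, which at the 352.8 kHz filter rate gives:

```python
fs = 352800                       # 8x oversampled rate the filters run at
for name, taps in (("linear fast", 512), ("linear slow", 128)):
    delay_us = (taps - 1) / 2 / fs * 1e6
    print(f"{name}: {delay_us:.0f} us")
```

This prints about 724 µs and 180 µs, matching the quoted numbers to within a couple of microseconds of rounding. The minimum-phase filters have no single such number, since their delay varies with frequency.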

Now that our filters are ready, we need a data pipeline to use them in. It looks like this:
  • Read real audio data from a file (44.1 kHz bit perfect, uncompressed CD rip) or generate a synthetic signal
  • For real audio: Convert to mono by dropping one channel (reduces clutter in plots)
  • Apply 8x upsampling to the audio by inserting zeros between samples, generating a signal with an effective sampling rate of 352.8 kHz
  • Apply one of the specified reconstruction filters to the upsampled audio signal
  • Do some analysis, like measuring the filter delay
  • Plot the results
Clearly, this pipeline is less complex than what happens in a real DAC chip. But it contains everything that is relevant for comparing our reconstruction filters.
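A minimal version of this pipeline, with a synthetic tone standing in for file input and a generic windowed-sinc lowpass standing in for the filters designed above:

```python
import numpy as np
from scipy import signal

fs, L = 44100, 8
fs_hi = fs * L

# Synthetic stand-in for file input: a 1 kHz tone at half scale.
t = np.arange(4410) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)

# Zero-stuffing upsampler: insert L-1 zeros after each sample. The
# factor L compensates the 1/L amplitude loss the stuffed signal would
# otherwise show after lowpass filtering.
up = np.zeros(len(x) * L)
up[::L] = x * L

# Generic linear-phase reconstruction lowpass (windowed sinc).
h = signal.firwin(129, 20000, fs=fs_hi)
y = signal.convolve(up, h, mode='same')
```

The filtered output `y` is a 352.8 kHz signal whose amplitude matches the original tone, with the spectral images from the zero-stuffing suppressed by the filter.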


SYNTHETIC DATA

I know, I promised to look at real music. But let's start out with a look at synthetic data constructed from a mix of pure sine waves. We sum up four different sines and add amplitude modulation to the highest one to make the signal a bit more interesting. The following plot shows 1 ms of two signals with this sine mix: Our high-res "ground truth" and the "downsampled" data points as they would be stored on a CD or in a typical file or stream.

View attachment 506442

The ground truth is what would be captured by a microphone in the recording studio or concert hall: A continuous signal with very high time and amplitude resolution. This is also close to what the signal looks like throughout the mastering process, assuming it is kept at a high sample rate and bit depth. The downsampled signal is what is provided to us as buyers and what we feed into our DACs. After upsampling and applying the reconstruction filter, we want to arrive as close to the ground truth as possible at the analog output of our DAC.



A couple of things to note while looking at these signals:
  • The downsampled data points are part of the ground truth signal
  • There are peaks and valleys in the ground truth in-between data points of the downsampled signal which are higher/lower than the closest downsampled data points. Or in other words: A straight line connecting the downsampled data points would not be a good representation of the ground truth signal.
  • Since the sampling rate of our downsampled data stream is more than twice the maximum sine frequency contained in it (44.1 kHz vs. 16 kHz), a faithful reconstruction of the ground truth signal is theoretically possible (see Nyquist frequency, we will get back to this)
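The last bullet can be demonstrated numerically: Whittaker-Shannon (sinc) interpolation recovers a band-limited signal exactly in-between its samples. A sketch using a Gaussian-windowed 16 kHz burst (windowed so that edge truncation is negligible; all parameter values are illustrative):

```python
import numpy as np

fs = 44_100
n = np.arange(2001)
tc = 1000 / fs                  # centre of the burst
sigma = 2e-3                    # 2 ms Gaussian envelope

def ground_truth(t):
    """Band-limited test signal: Gaussian-windowed 16 kHz sine."""
    return np.sin(2 * np.pi * 16_000 * t) * np.exp(-((t - tc) ** 2) / (2 * sigma ** 2))

samples = ground_truth(n / fs)  # what would be stored on the CD

# Reconstruct the signal half-way between the stored samples via
# sinc interpolation: x(t) = sum_n x[n] * sinc(fs*t - n)
t_eval = (np.arange(500, 1500) + 0.5) / fs
recon = np.sinc(fs * t_eval[:, None] - n[None, :]) @ samples

err = np.max(np.abs(recon - ground_truth(t_eval)))
print(err)  # tiny: the in-between values are fully determined by the samples
```

This is exactly what an ideal reconstruction filter does; the real filters discussed here approximate it with a finite number of taps.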
Let’s now take a first look at how two of our self-made filters perform on this sine wave mix. Comparing the ground truth of our signal with the data reconstructed from the downsampled signal using our two “fast” filters gives the following graph:

View attachment 506443

It’s apparent that neither of our reconstructions is perfect, but at least one of them is pretty close. The reason they are not perfect is that the filters don’t totally suppress all signal content beyond the Nyquist frequency of 22.05 kHz, and they also attenuate signals below that threshold somewhat. In addition, the minimum phase filter delays signal components of different frequencies by different amounts. For filters with a finite number of taps and signals of finite length, you can’t avoid all of these deficiencies at once. All real filters are a compromise.
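The two phase types can be sketched with SciPy: `firwin` yields a symmetric (linear phase) lowpass, and `scipy.signal.minimum_phase` derives a minimum phase counterpart. Note that the default homomorphic method approximately square-roots the magnitude response, so this is a qualitative sketch with illustrative tap counts, not a recreation of the ESS-style filters above:

```python
import numpy as np
from scipy import signal

# Linear phase lowpass with a cutoff at 1/8 of the (oversampled) Nyquist,
# i.e. ~22.05 kHz when running at 352.8 kHz. Tap count is illustrative.
h_lin = signal.firwin(127, 0.125)

# Minimum phase counterpart (approximate magnitude, shorter, asymmetric)
h_min = signal.minimum_phase(h_lin)

# Linear phase: impulse response is symmetric, peak in the middle.
# Minimum phase: energy is concentrated near the start -> low latency,
# no "future" taps, hence asymmetric pre/post behaviour.
print(np.argmax(np.abs(h_lin)), np.argmax(np.abs(h_min)))
```

Comparing the two impulse responses makes the compromises discussed above visible: the symmetric filter trades latency for symmetry, the minimum phase one trades symmetry for latency.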

Looking at the graph a little closer, we can make out some differences between the filters: While the linear phase one follows the ground truth signal closely and deviations are hard to spot, the minimum phase filter is different: Sometimes, it is very close to the ground truth and at other times, it is pretty far away from it. We can also see that even though there's some high frequency 16 kHz content in our signal, there's no hint of pre-ringing visible in this example.

And just for fun, let’s have a look at what the unfiltered, non-oversampled (NOS / sample and hold) output would look like:

View attachment 506444

It should be pretty clear that NOS is not a good approximation of our ground truth signal at all. Contrary to some audiophile beliefs, NOS is the polar opposite of being “true to the original music”. Unless you’re doing some scientific experiments and explicitly need a square wave output or you sufficiently upsample and filter your audio before sending it to your DAC, don’t use NOS mode.

After this excursion into pure tones, let's now switch to actual music.


REAL AUDIO DATA

From now on, the downsampled data will be our "original" audio signal straight from the CD, from which we try to reconstruct something close to the now unknown ground truth. Let's begin with comparing all four reconstruction filters on a 10 ms sample from Eurythmics - Thorn In My Side (plot timestamp “0” starts at 9.445 s in the song):

View attachment 506447

You can see the different filter delays, which make it a bit hard to compare the filters. Let's zoom in more to the left side and compensate for the different delays by aligning the signals:

View attachment 506448

That shows us that all filters deliver pretty similar results on this part of the song. This is actually how most of the audio signal looks, but we will go on the hunt for the critical sections. The minimum phase filters appear to have a bit more overshoot – for example around the 0.6 ms mark. We can punch in even closer and look at the small region from 0.45 to 0.65 ms:

View attachment 506449

The linear phase fast and slow filters perform very similarly in this example and are on average much closer to the data points of the original signal than the minimum phase filters. The latter appear to “swing” more in-between samples. There is also a bigger difference between the fast and slow variants within the minimum phase pair. This audio snippet wasn't too hard on our filters. Let’s take a look at a slightly more interesting section around 7.4 ms:

View attachment 506450

There’s a pretty steep gradient in the center of this plot, which is worth taking a closer look at. Let’s zoom in:

View attachment 506451

All filters appear to “wobble around” somewhat between the samples in front of the gradient (7.3-7.4 ms). It’s important to remember that the ground truth between two original audio samples is rarely a straight line. In consequence, the reconstructions all look plausible at first glance. The minimum phase filters do both overshoot towards the end of the gradient around 7.45 ms, though.

The section in front of this gradient is especially tricky for our filters, because all four original samples between 7.3 and 7.4 ms have almost the same amplitude. If we take a closer look at the linear phase reconstruction in that region, we can see that the curve completes one full (inverted) sine wave between samples one and three of the original audio. At 44.1 kHz, the section between three sample points is exactly 45.35 µs long, corresponding to a wave with a frequency of 22.05 kHz. Something we will keep in mind for later. For the minimum phase filter, the curve takes about 50% longer to complete a full wave in this region, corresponding to roughly 14.7 kHz.

Overall, Thorn In My Side didn’t pose much of a challenge to our reconstruction filters. Nonetheless, it allowed us to discover some distinct differences between linear and minimum phase filters and gave us a rough idea of how they perform. Let's now switch to a more difficult track: An excerpt from Michael Jackson – Beat It (timestamp “0” = 72.38 s):

View attachment 506453

Now that is much more interesting! There is a big dip around 7.5 ms and some of our filters produce quite a bit of ringing trying to cope with it:

View attachment 506454

Some also reach the clipping threshold at an amplitude of -1.0. It is clear from the plot that all our filters do ring. But the minimum phase ones perform much worse in this case. Let’s take a closer look:

View attachment 506455

Both linear phase filters stay close to the original samples, but there is a bit of a depression at the foot of the gradient. The minimum phase filters “go wild” here, with the fast variety even reaching the clipping threshold just after the gradient. Clipping is always bad, because it introduces distortion and high frequency noise into the output. If we compare the relative change in signal amplitude over the gradient from 7.45 – 7.55 ms, we can calculate that the fast linear phase filter overshoots the original signal by 6.3%, while the fast minimum phase filter does so by 22.3% – more than triple the deviation.

If we take a look at the ringing period again, it comes out to two samples for the two linear phase filters, which corresponds to 22.05 kHz, as seen above. This is the Nyquist frequency for discrete signals with a 44.1 kHz sampling rate – the highest frequency that can be reproduced faithfully (without aliasing). It is also well outside of the audible range for anybody except young children, maybe a handful of lucky teenagers [6] and your dog. Apart from the fact that it is a mathematically valid reconstruction of the signal, the “evil ringing” in that section is therefore also inaudible for the vast majority of listeners. For the minimum phase filter, the ringing period after the gradient comes out to about 2.5 samples or 56.7 µs, which is equivalent to 17.6 kHz and could potentially be audible for young-ish listeners or those with excellent HF hearing up to maybe 35 years of age.
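The period-to-frequency arithmetic used above is simple enough to verify directly:

```python
fs = 44_100  # CD sampling rate

def ringing_freq(period_samples, fs=fs):
    """Convert a ringing period measured in samples to a frequency in Hz."""
    return fs / period_samples

print(ringing_freq(2.0))   # linear phase: 22050.0 Hz = Nyquist
print(ringing_freq(2.5))   # minimum phase: 17640.0 Hz, potentially audible
```

The 2.5-sample period also corresponds to 2.5 / 44100 s ≈ 56.7 µs, matching the value read off the plot.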

Small note: Our examples work with CD-quality music (44.1 kHz). For 48 kHz material, the potential ringing of linear phase filters will be inaudible for everybody and with "high res audio" (88.2 kHz +), you don't need to worry about ringing at all, regardless of the selected filter – unless you are a bat.

Back to the time domain: In our latest audio snippet, we are looking at a drop in amplitude. A rising edge might look totally different, right? Would be nice to have a direct comparison, wouldn’t it? Luckily, we don’t need to search hours of audio to find a gradient comparable to our above example. We can just flip the original track and process it again using the same settings. The output then looks like this (you can ignore the different runtime, what was 10 ms before is 0 ms now and vice versa):

View attachment 506456

As we can see, the linear phase filters do not care at all in which direction they are applied to the source material. Same result as before, same limited amount of ringing, no surprises. This is due to their symmetrical impulse response and it is a distinct advantage of this type of filter.

For the minimum phase filters, the results look much different: Gone are the heavy ringing and clipping. We are left with a mostly smooth reconstruction of the signal, albeit with significant overshoot at the top end of the gradient. Clearly, minimum phase filters are not symmetrical, which we can also see in their impulse response. As in our non-flipped example, the relative amplitude error over the gradient is also higher for the minimum phase filters compared to the linear phase ones.
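This symmetry argument can be checked numerically: for a symmetric (linear phase) FIR, filtering a reversed signal and reversing the result gives the same output as filtering the signal directly, while an asymmetric (minimum phase style) filter does not have this property. A sketch with illustrative hand-picked taps:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)   # stand-in for an audio snippet

# Symmetric taps (linear phase) vs. front-loaded taps (minimum phase style);
# both sets are illustrative, normalised to unity DC gain
h_sym = np.array([1.0, 2.0, 3.0, 2.0, 1.0]) / 9.0
h_asym = np.array([3.0, 2.0, 1.0, 0.5, 0.25]) / 6.75

def filter_flipped(h, x):
    """Filter the reversed signal, then reverse the result back."""
    return np.convolve(h, x[::-1], mode="full")[::-1]

# Symmetric filter: the direction of processing doesn't matter
assert np.allclose(filter_flipped(h_sym, x), np.convolve(h_sym, x, mode="full"))

# Asymmetric filter: reversing the input changes the result
assert not np.allclose(filter_flipped(h_asym, x), np.convolve(h_asym, x, mode="full"))
```

This is exactly why flipping the track produced identical output for the linear phase filters but a very different picture for the minimum phase ones.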

There’s an additional perspective on filter symmetry: Our linear phase filters put a lot of weight on the present and smaller, equal weights on the past and future of the original signal when generating their output. This symmetry gives them the advantage that rising and falling signal amplitudes are processed equally. In contrast, our minimum phase filters put some weight on the present, some weight on the past and no weight at all on the future. This asymmetry gives them the advantage of low latency, but comes with disadvantages in other areas.

This concludes our investigation into real music snippets. It’s worth keeping in mind that I deliberately selected difficult sections of the two songs presented in this post. For the majority of these tracks, the differences between the reconstruction filters are less pronounced than presented here.

What does all this mean for the end-boss of all audiophiles: Pre-ringing?
  • We have seen in our theoretical investigation that the ringing some see as a defect isn’t one per se: It is a valid reconstruction of the ground truth signal in-between the stored samples. This is true regardless of the position of said ringing relative to its trigger (like a steep gradient) – it can appear before or after it.
  • We have seen that the ringing frequency for the filter type most often criticized by audiophiles – fast linear phase – is about equal to the Nyquist frequency and thereby inaudible for the vast majority of humans if 44.1 kHz source material is played. For higher sampling rates starting at 48 kHz, it is inaudible for all humans.
  • We have seen that our DAC-like minimum phase filters trade inaudible pre-ringing for lower delay and potentially audible post-ringing, which is also higher in amplitude compared to that of our linear phase filters.


VERDICT & TLDR

In conclusion, pre-ringing is not an effect of concern in actual music. All reconstruction filters can produce ringing under specific circumstances, but the effect typically represents a valid reconstruction of the audio signal. For fast linear phase filters, the ringing frequency for CD-quality audio is already outside of the audible band for nearly everybody except young children and some teenagers. For minimum phase filters, the ringing is typically slightly lower in frequency and potentially audible for a good portion of listeners in case of CD-quality material. Ringing is never an audible concern for high res audio (≥88.2 kHz).

Among the options available on DACs, fast linear phase filters on average deliver the most faithful reconstruction of ANY audio signal. Their symmetrical nature means they do not care about the direction of change in the signal amplitude. Minimum phase filters on average deliver a slightly less faithful reconstruction and e.g. exhibit higher overshoot, but have other advantages like a significantly lower delay.

Impulses and square waves are not music and the impulse response from a data sheet is not intuitive to read for most people. If you are not an engineer, listen to what independent audio engineers explain and don't get scared by bullshitters trying to sell you the next even more expensive piece of equipment you don't need.

Also, don't use NOS mode. It sucks :)
Good article. Enjoyable read and understandable. Thank you.
 
Last edited:
All good info in this thread.
Great and easy to read info by @RandomEar and others.
Even Cameron makes sense this time. :)

Conclusion:
If you want the best possible reproduction of the digital content; use fast linear phase filters.

When your DAC is not great at inter-sample-overs (some DACs do have issues) use digital volume control and set that to -3dB or so.
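For reference, a -3 dB digital volume setting corresponds to multiplying the samples by about 0.71, which leaves headroom for the typical size of intersample overshoots:

```python
# Linear gain factor corresponding to a -3 dB digital volume setting
gain_db = -3.0
gain = 10 ** (gain_db / 20)
print(round(gain, 3))  # 0.708
```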

Upsampling with a fast linear phase filter can improve the technical sound quality of DACs that have 'poor' (i.e. not fast linear phase) filters, or that lack filters entirely: the so-called NonOverSampling (NOS) DACs, regardless of the conversion method used. You are then applying a proper filter yourself, and the DAC's own 'poor' filter is doing its poor job way up higher, where there is no 'input' signal anyway.
The ultrasonic noise crap of that DAC is now higher up where the chance of it wreaking havoc in the audible band is much lower.

When your DAC already has good linear phase filters upsampling is pointless and is only a way of fooling yourself. For some folks that seems to help them get more enjoyment. Perception is a wonderful thing. Go with it. Placebo can do great things for perception.

Now... why some people prefer a technically 'poorer performing' DAC is another matter, and it does matter.
If someone simply 'prefers' slow filters, apodizing filters, minimum phase filters or even no filters at all for whatever reasons they think/believe/know matters and like what they hear then that's fine too. The downside is that usually these folks (the everything matters folks) are quite vocal and vent their opinions but state those as facts.
After all for them it is about 'maximum enjoyment' and if that means less accurate to the waveform that is digitally described in a file (the end result of a digital music recording into a certain format) then they should go for it.
They just should realize that whatever they are listening to is not the 'intended sound' but an altered sound they prefer.
It is not technically better but only 'sounds' better to them (preference for whatever reason).
Well said.
 
INTRO

I've recently encountered a couple of posts about pre-ringing in linear phase reconstruction filters in DACs. Most claim the usual: How they can definitely hear the difference between filters, how that evil pre-ringing makes linear phase much worse and so on. Impressions from uncontrolled, sighted listening tests of course.

etc
Oh, very nice. And gels perfectly with another discussion we are participating in. Will find time to give that a detailed read later. Going out on my bike now
:)

EDIT:
And now I have. Great read, and helped me to confirm my pre-existing view of linear vs minimum phase filters.

This is not intended to teach anyone to suck eggs, just an explanation of my reasoning in favour of linear phase filters.

I've always chosen linear phase because the result is identical to "naturally" band limited signals, such as discontinuous waveforms (e.g. triangle or square waves) built from the Fourier series for those waveforms, but only with harmonics up to the band limit.

Examples of a square wave constructed from harmonics up to 1st, 5th, 11th, and 49th.

K=11 would be the appearance of a 1.8kHz square wave band limited to 20kHz.
1772040599372.png

Ringing (pre and post) is a natural feature of a band limited signal. It is not "created" by a filter - it is there all the time, but "revealed" when higher frequencies are removed by the filter. (Not a perfect description - but as close as I can get)
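This can be reproduced in a few lines: summing the odd harmonics of a 1.8 kHz square wave up to the 20 kHz band limit (harmonics 1 through 11, matching the K=11 panel above) yields a waveform that already contains the pre- and post-ringing, before any filter is involved:

```python
import numpy as np

f0 = 1_800                  # fundamental of the square wave
band_limit = 20_000         # keep only harmonics below this
t = np.linspace(0, 2 / f0, 4000, endpoint=False)   # two periods

# Fourier series of a square wave: odd harmonics with 1/k weights
sig = np.zeros_like(t)
for k in range(1, band_limit // f0 + 1, 2):        # k = 1, 3, ..., 11
    sig += (4 / (np.pi * k)) * np.sin(2 * np.pi * k * f0 * t)

# The ideal square wave swings between -1 and +1; the band-limited
# version overshoots (Gibbs ringing) on both sides of each edge.
print(np.max(sig))  # > 1: the ringing is inherent to the band limit
```

No filter appears anywhere in this construction: the ringing is a property of the band-limited signal itself.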

A linear phase filter does not change the phase of the ringing, so you see the "natural" pre and post ringing. A minimum phase filter shifts it to mostly post-ringing, due to the phase distortion created by the filter. I think audibility of this phase shift is still open to debate.
 
Last edited: