
"I can hear them church bells pre-ringing!" - A (mostly) time domain investigation into DAC-like reconstruction filters

RandomEar

INTRO

I've recently encountered a couple of posts about pre-ringing in linear phase reconstruction filters in DACs. Most claim the usual: How they can definitely hear the difference between filters, how that evil pre-ringing makes linear phase much worse and so on. Impressions from uncontrolled, sighted listening tests of course.

In general, the members stating this appear to have misunderstood important aspects of the topic and it’s sometimes apparent that they are not very familiar with the frequency domain, FFTs and related technical details. One major problem is that there's a lot of bullshit about DAC filters on the net. Perpetuated by manufacturer marketing departments, dealers and misinformed or mistaken reviewers, but also by regular people on forums. There's also some good stuff like [1, 2, 3, 4, 5]. A lot of the good stuff focuses on frequency domain analysis, impulse responses and "illegal" signals like impulses, square waves or clipping. Archimago specifically also investigated real music upsampled using foobar2000 + SoX [2]. On ASR, I have also found one practical comparison using an ADC to capture DAC outputs.

For engineers, looking at frequency and impulse response plots is typically enough for an informed decision. But most people are not engineers. Without specific knowledge about what a frequency response tells you, how to interpret impulse responses and how a Fourier series and the Nyquist frequency relate to all of this, that information might be unhelpful or even misleading to non-engineer readers.

This post therefore focuses on time-domain analysis and real audio samples. It will likely not bring any surprises to those familiar with the math behind audio, but is hopefully insightful for those who are not. I also tried to avoid overly technical descriptions in the important parts to keep this write-up helpful to everybody.

I'd like to point out that I'm not a DSP expert, but some fellow forum members definitely are. If you happen to find any errors in this post, feel free to point them out politely.


FILTER DESIGN

The technical stuff. This is mostly for the people familiar with the math. If you are not one of ’em, feel free to gloss over this section.

The filters are designed based on parameters (passband freq, stopband freq, etc.) from the attached ESS ES9039Q2M data sheet. I've made some effort to get them reasonably close to the originals, but for a multitude of reasons, they are not identical. The design process was as follows:
  • All filters are created as linear phase FIR filters using the Parks-McClellan design method
  • Minimum phase filters are derived from the linear phase design using the cepstrum method
  • Fast filters use an order of 512 (i.e. 513 coefficients, loosely also referred to as 512 "taps")
  • Slow filters use an order of 128
  • The passband ripple is <0.0005 dB for all filters
  • Fast filters offer a stopband attenuation of 100 dB or better
  • Slow filters offer a stopband attenuation of 90 dB or better
  • Fast filters are designed with an attenuation of less than 0.01 dB @ 20 kHz
  • Slow filters are designed with an attenuation of 2.8 dB @ 19 kHz, which results in 4.3 dB @ 20 kHz
The generated filters look like this:

DAC filter comp_all FRs.png

DAC filter comp_all IRs.png


The plots show that the filters are not identical to the ESS ones, but qualitatively similar. The fast filters are essentially flat in the audible band up to 20 kHz. For the slow filters, the decline in the frequency response starts around 15 kHz (-0.2 dB, followed by -0.5 dB @ 16 kHz) for input signals sampled at 44.1 kHz. This decline could be audible, depending on the content and the listening level. However, if you are older than about 40, this is likely not a concern for you anymore ;) [6]
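
The cepstrum step from the design list can be sketched roughly as follows. This is a minimal illustration of the real-cepstrum "folding" idea and not the MATLAB/Octave code used for the plots: the function names, the tiny 3-tap demo filter, the FFT length and the magnitude floor are all assumptions chosen to keep the example short.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, fine for short demo filters."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = 0j
        for m, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * m / n)
        result.append(acc / n if inverse else acc)
    return result


def minimum_phase_from_linear(taps, n_fft=64, floor=1e-12):
    """Derive a minimum phase filter with the same magnitude response.

    Hypothetical helper: n_fft and floor are illustrative defaults. A real
    DAC-like filter with a deep stopband needs a much larger n_fft.
    """
    padded = list(taps) + [0.0] * (n_fft - len(taps))
    # 1) log-magnitude spectrum (floored to avoid log(0) in the stopband)
    log_mag = [math.log(max(abs(v), floor)) for v in dft(padded)]
    # 2) real cepstrum = inverse transform of the log-magnitude
    cepstrum = [v.real for v in dft(log_mag, inverse=True)]
    # 3) fold the cepstrum onto positive time to make it causal
    half = n_fft // 2
    folded = [0.0] * n_fft
    folded[0] = cepstrum[0]
    for n in range(1, half):
        folded[n] = 2.0 * cepstrum[n]
    folded[half] = cepstrum[half]
    # 4) back to the spectrum (exp undoes the log), then to the time domain
    min_spectrum = [cmath.exp(v) for v in dft(folded)]
    h_min = [v.real for v in dft(min_spectrum, inverse=True)]
    return h_min[:len(taps)]


# Symmetric (linear phase) demo filter with zeros at z = -0.5 and z = -2
linear_taps = [1.0, 2.5, 1.0]
min_taps = minimum_phase_from_linear(linear_taps)
print([round(v, 6) for v in min_taps])  # approximately [2.0, 2.0, 0.5]
```

The demo filter keeps its magnitude response, but the zero outside the unit circle is mirrored to the inside, so the energy moves to the front of the impulse response.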

The filter delays are 726 µs (linear fast), 181 µs (linear slow), 57 µs (minimum fast) and 43 µs (minimum slow). These numbers are lower than the delays given in the ESS data sheet, but from my understanding the ones from the data sheet represent the total delay through the DAC pipeline, including more than just the filters.
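
The two linear phase delays can be cross-checked from the filter order alone, since a symmetric FIR filter delays every frequency by half its order in samples. A small sketch, assuming the filters run at the 8x upsampled rate of 352.8 kHz (the function name is made up for illustration):

```python
def linear_phase_delay_us(order, sample_rate_hz):
    """Group delay of a symmetric (linear phase) FIR filter in microseconds."""
    return (order / 2) / sample_rate_hz * 1e6


UPSAMPLED_RATE_HZ = 8 * 44_100  # 352.8 kHz

fast_delay = linear_phase_delay_us(512, UPSAMPLED_RATE_HZ)
slow_delay = linear_phase_delay_us(128, UPSAMPLED_RATE_HZ)
print(f"fast: {fast_delay:.1f} us, slow: {slow_delay:.1f} us")
# fast: 725.6 us, slow: 181.4 us
```

The minimum phase delays cannot be derived this way, because their delay depends on frequency and has to be measured.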

Now that our filters are ready, we need a data pipeline to use them in. It looks like this:
  • Read real audio data from a file (44.1 kHz bit perfect, uncompressed CD rip) or generate a synthetic signal
  • For real audio: Convert to mono by dropping one channel (reduces clutter in plots)
  • Apply 8x upsampling to the audio by inserting zeros between samples, generating a signal with an effective sampling rate of 352.8 kHz
  • Apply one of the specified reconstruction filters to the upsampled audio signal
  • Do some analysis, like measuring the filter delay
  • Plot the results
Clearly, this pipeline is less complex than what happens in a real DAC chip. But it contains everything that is relevant for comparing our reconstruction filters.
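
The core of the pipeline above can be sketched in a few lines. This is a toy version under stated assumptions: the helper names are invented, the stereo data is a made-up interleaved list, the upsampling factor is 2 instead of 8, and the 3-tap triangular kernel is only a stand-in for the real reconstruction filters.

```python
def drop_to_mono(interleaved_stereo):
    """Keep the left channel of interleaved L/R samples, drop the right."""
    return interleaved_stereo[0::2]


def zero_stuff(samples, factor):
    """Upsample by inserting factor - 1 zeros after every sample."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0.0] * (factor - 1))
    return out


def fir_filter(signal, taps):
    """Full convolution of the signal with the FIR filter taps."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        if x == 0.0:
            continue  # stuffed zeros contribute nothing
        for k, h in enumerate(taps):
            out[i + k] += x * h
    return out


stereo = [1.0, -1.0, 3.0, -3.0]          # hypothetical L, R, L, R samples
mono = drop_to_mono(stereo)               # [1.0, 3.0]
upsampled = zero_stuff(mono, 2)           # [1.0, 0.0, 3.0, 0.0]
# Triangular kernel = linear interpolation; its DC gain of 2 equals the
# upsampling factor and restores the level lost by zero stuffing.
reconstructed = fir_filter(upsampled, [0.5, 1.0, 0.5])
print(reconstructed)  # [0.5, 1.0, 2.0, 3.0, 1.5, 0.0]
```

Note the one-sample delay in the output: the original samples 1.0 and 3.0 show up at indices 1 and 3, with the interpolated value 2.0 between them.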


SYNTHETIC DATA

I know, I promised to look at real music. But let's start out with a look at synthetic data constructed from a mix of pure sine waves. We sum up four different sines and add amplitude modulation to the highest one to make the signal a bit more interesting. The following plot shows 1 ms of two signals with this sine mix: Our high res "ground truth" and the "downsampled" data points as they would be stored on a CD or in a typical file or stream.

DAC filter comp_SYNTHETIC_ground truth vs downsampled.png


The ground truth is what would be captured by a microphone in the recording studio or concert hall: A continuous signal with very high time and amplitude resolution. This is also close to what the signal looks like throughout the mastering process, assuming it is kept at a high sample rate and bit depth. The downsampled signal is what is provided to us as buyers and what we feed into our DACs. After upsampling and applying the reconstruction filter, we want to arrive back as close to the ground truth as possible on the analog output of our DAC.

Note: The rest of this post focuses on the sampling rate and ignores the bit depth, floating vs. fixed point formats and quantization errors. You can forget about those for now.

A couple of things to note while looking at these signals:
  • The downsampled data points are part of the ground truth signal
  • There are peaks and valleys in the ground truth in-between data points of the downsampled signal which are higher/lower than the closest downsampled data points. Or in other words: A straight line connecting the downsampled data points would not be a good representation of the ground truth signal.
  • Since the sampling rate of our downsampled data stream is more than twice the maximum sine frequency contained in it (44.1 kHz vs. 16 kHz), a faithful reconstruction of the ground truth signal is theoretically possible (see Nyquist frequency, we will get back to this)
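
A sketch of how such a test signal and its "CD" version could be generated. Only the 16 kHz maximum is taken from the text; the other three tone frequencies, the modulation rate and depth, and all names are assumptions for illustration.

```python
import math

CD_RATE_HZ = 44_100
FACTOR = 8
HIGH_RATE_HZ = CD_RATE_HZ * FACTOR  # 352.8 kHz "ground truth" grid

TONES_HZ = [1_000, 4_000, 9_000, 16_000]  # hypothetical mix, max is 16 kHz
AM_RATE_HZ = 500                          # hypothetical modulation rate


def sine_mix(t):
    """Sum of four sines, the highest one amplitude modulated."""
    value = sum(math.sin(2 * math.pi * f * t) for f in TONES_HZ[:-1])
    envelope = 1.0 + 0.5 * math.sin(2 * math.pi * AM_RATE_HZ * t)
    value += envelope * math.sin(2 * math.pi * TONES_HZ[-1] * t)
    return value / len(TONES_HZ)


n_high = HIGH_RATE_HZ // 1000  # roughly 1 ms of ground truth
ground_truth = [sine_mix(n / HIGH_RATE_HZ) for n in range(n_high)]

# "Downsampling" here just picks every 8th point, so the CD samples
# are by construction part of the ground truth signal.
cd_samples = ground_truth[::FACTOR]

# Highest component (16 kHz plus the AM sideband) stays below Nyquist.
nyquist_ok = max(TONES_HZ) + AM_RATE_HZ < CD_RATE_HZ / 2
print(len(ground_truth), len(cd_samples), nyquist_ok)
```

Picking every 8th point without a filter is only safe here because the signal is known to contain nothing above the Nyquist frequency of the target rate.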
Let’s now take a first look at how two of our self-made filters perform on this sine wave mix. Comparing the ground truth of our signal with the data reconstructed from the downsampled signal using our two “fast” filters gives the following graph:

DAC filter comp_SYNTHETIC_ground truth vs reconstructed.png


It’s apparent that our reconstructions are both not perfect, but at least one of them is pretty close. The reason they are not perfect is that the filters don’t totally suppress all signals beyond the Nyquist frequency of 22.05 kHz and they also inflict some suppression on signals below that threshold. In addition, the minimum phase filter has different delays for signal components of different frequency. For filters with a finite number of taps and signals of finite length, you can’t avoid all of these deficiencies at once. All real filters are a compromise.

Looking at the graph a little closer, we can make out some differences between the filters: While the linear phase one follows the ground truth signal closely and deviations are hard to spot, the minimum phase filter is different: Sometimes, it is very close to the ground truth and at other times, it is pretty far away from it. We can also see that even though there's some high frequency 16 kHz content in our signal, there's no hint of pre-ringing visible in this example.

And just for fun, let’s have a look at what the unfiltered, non-oversampled (NOS / sample and hold) output would look like:

DAC filter comp_SYNTHETIC_NOS vs filter.png


It should be pretty clear that NOS is not a good approximation of our ground truth signal at all. Contrary to some audiophile beliefs, NOS is the polar opposite of being “true to the original music”. Unless you’re doing some scientific experiments and explicitly need a square wave output or you sufficiently upsample and filter your audio before sending it to your DAC, don’t use NOS mode.
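
To put a rough number on how far sample and hold strays from the underlying signal, here is a small sketch for a single 16 kHz tone at CD rate. It assumes an ideal hold with no analog filter afterwards, evaluates the error on the 8x grid only, and uses made-up helper names.

```python
import math

CD_RATE_HZ = 44_100
FACTOR = 8
TONE_HZ = 16_000


def tone(t):
    return math.sin(2 * math.pi * TONE_HZ * t)


def sample_and_hold(samples, factor):
    """NOS-style output: repeat every sample until the next one arrives."""
    return [s for s in samples for _ in range(factor)]


n_cd = 441  # 10 ms of CD samples
cd_samples = [tone(n / CD_RATE_HZ) for n in range(n_cd)]
held = sample_and_hold(cd_samples, FACTOR)
truth = [tone(n / (CD_RATE_HZ * FACTOR)) for n in range(n_cd * FACTOR)]

worst_error = max(abs(h - t) for h, t in zip(held, truth))
print(f"worst deviation from the true tone: {worst_error:.2f} (tone amplitude is 1.0)")
```

The worst-case deviation comes out larger than the amplitude of the tone itself, which matches the visual impression from the plot.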

After this excursion into pure tones, let's now switch to actual music.


REAL AUDIO DATA

From now on, the downsampled data will be our "original" audio signal straight from the CD, from which we try to reconstruct something close to the now unknown ground truth. Let's begin with comparing all four reconstruction filters on a 10 ms sample from Eurythmics - Thorn In My Side (plot timestamp “0” starts at 9.445 s in the song):

DAC filter comp_Eurythmics_all filters vs original.png


You can see the different filter delays, which make it a bit hard to compare the filters. Let's zoom in more to the left side and compensate for the different delays by aligning the signals:
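
One possible way to measure a delay and align the curves is a simple cross-correlation search. This is an assumed approach, not necessarily the one used for the plots, and the names and toy data are illustrative.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) at which 'delayed' best matches 'reference'."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * d for r, d in zip(reference, delayed[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(delayed, lag):
    """Compensate the delay by dropping the first 'lag' samples."""
    return delayed[lag:]


reference = [0.0, 1.0, 0.5, -0.8, 0.3, 0.0, -0.2, 0.6]  # toy signal
delayed = [0.0] * 5 + reference                          # same signal, 5 samples late
lag = estimate_delay(reference, delayed, max_lag=8)
print(lag, align(delayed, lag) == reference)  # 5 True
```

For the minimum phase filters a single shift is only an approximation, because their delay differs between frequency components.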

DAC filter comp_Eurythmics_all filters_0-1.png


That shows us that all filters deliver pretty similar results on this part of the song. This is actually how most of the audio signal looks, but we will go on the hunt for the critical sections. The minimum phase filters appear to have a bit more overshoot – for example around the 0.6 ms mark. We can punch in even closer and look at the small region from 0.45 to 0.65 ms:

DAC filter comp_Eurythmics_all filters short_0.45-0.65.png


The linear phase fast and slow filters perform very similarly in this example and are on average much closer to the data points of the original signal than the minimum phase filters. The latter appear to “swing” more in-between samples. There is also a bigger difference between the fast and slow filters for the minimum phase pair. This audio snippet wasn't too hard on our filters. Let’s take a look at a slightly more interesting section around 7.4 ms:

DAC filter comp_Eurythmics_all filters_6.9-7.9.png


There’s a pretty steep gradient in the center of this plot, which is worth taking a closer look at. Let’s zoom in:

DAC filter comp_Eurythmics_all filters short_7.3-7.5.png


All filters appear to “wobble around” somewhat between the samples in front of the gradient (7.3-7.4 ms). It’s important to remember that the ground truth between two original audio samples is rarely a straight line. In consequence, the reconstructions all look plausible at first glance. The minimum phase filters do both overshoot towards the end of the gradient around 7.45 ms, though.

The section in front of this gradient is especially tricky for our filters, because all four original samples between 7.3 and 7.4 ms have almost the same amplitude. If we take a closer look at the linear phase reconstruction in that region, we can see that the curve completes one full (inverted) sine wave between samples one and three of the original audio. At 44.1 kHz, the span from the first to the third sample point covers two sample periods, or about 45.35 µs, corresponding to a wave with a frequency of 22.05 kHz. Something we will keep in mind for later. For the minimum phase filter, the curve takes about 50% longer to complete a full wave in this region (about three sample periods or 68 µs), corresponding to roughly 14.7 kHz.
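
The conversion from a ringing period counted in samples to a frequency is plain arithmetic. A short sketch with hypothetical helper names:

```python
CD_RATE_HZ = 44_100


def ringing_period_us(period_in_samples, sample_rate_hz=CD_RATE_HZ):
    """Duration of one ringing cycle in microseconds."""
    return period_in_samples / sample_rate_hz * 1e6


def ringing_frequency_khz(period_in_samples, sample_rate_hz=CD_RATE_HZ):
    """Frequency of a wave that completes one cycle in the given number of samples."""
    return sample_rate_hz / period_in_samples / 1e3


for period in (2, 3):
    print(f"{period} samples: {ringing_period_us(period):.2f} us, "
          f"{ringing_frequency_khz(period):.2f} kHz")
# 2 samples: 45.35 us, 22.05 kHz
# 3 samples: 68.03 us, 14.70 kHz
```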

Overall, Thorn In My Side didn’t pose much of a challenge to our reconstruction filters. Nonetheless, it allowed us to discover some distinct differences between linear and minimum phase filters and gave us a rough idea of how they perform. Let's now switch to a more difficult track: An excerpt from Michael Jackson – Beat It (timestamp “0” = 72.38 s):

DAC filter comp_MJ_full snippet RAW.png


Now that is much more interesting! There is a big dip around 7.5 ms and some of our filters produce quite a bit of ringing trying to cope with it:

DAC filter comp_MJ_all filters.png


Some also reach the clipping threshold at an amplitude of -1.0. It is clear from the plot that all our filters do ring. But the minimum phase ones perform much worse in this case. Let’s take a closer look:

DAC filter comp_MJ_all filters short_7.45-7.65.png


Both linear phase filters stay close to the original samples, but there is a bit of a depression at the foot of the gradient. The minimum phase filters “go wild” here, with the fast variety even reaching the clipping threshold just after the gradient. Clipping is always bad, because it introduces distortion and high frequency noise into the output. If we compare the relative change in signal amplitude over the gradient from 7.45 – 7.55 ms, we can calculate that the fast linear phase filter overshoots the original signal by 6.3%, while the fast minimum phase filter does so by 22.3% – more than triple the deviation.
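
The percentages above compare the amplitude change of the reconstruction with that of the original samples over the gradient. One plausible way to express this is sketched below; the exact definition behind the quoted numbers is an assumption, and the sample values are invented so that the result lands at 6.3%.

```python
def overshoot_percent(step_start, step_end, reconstructed_extreme):
    """Relative excess of the reconstructed swing over the original swing."""
    original_step = abs(step_end - step_start)
    reconstructed_step = abs(reconstructed_extreme - step_start)
    return (reconstructed_step / original_step - 1.0) * 100.0


# Hypothetical values: the original samples drop from 0.5 to -0.5,
# the reconstruction dips to -0.563 just after the gradient.
print(f"{overshoot_percent(0.5, -0.5, -0.563):.1f} %")  # 6.3 %
```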

If we take a look at the ringing period again, it comes out to two samples for the two linear phase filters, which corresponds to 22.05 kHz, as seen above. This is the Nyquist frequency for discrete signals with a 44.1 kHz sampling rate: only frequencies below it can be reproduced faithfully (without aliasing). It is also well outside of the audible range for anybody except young children, maybe a handful of lucky teenagers [6] and your dog. Apart from the fact that it is a mathematically valid reconstruction of the signal, the “evil ringing” in that section is therefore also inaudible for the vast majority of listeners. For the minimum phase filter, the ringing period after the gradient comes out to about 2.5 samples or 56.7 µs, which is equivalent to 17.6 kHz and could potentially be audible for young-ish listeners or those with excellent HF hearing up to maybe 35 years of age.

Small note: Our examples work with CD-quality music (44.1 kHz). For 48 kHz material, the potential ringing of linear phase filters will be inaudible for everybody and with "high res audio" (88.2 kHz +), you don't need to worry about ringing at all, regardless of the selected filter – unless you are a bat.

Back to the time domain: In our latest audio snippet, we are looking at a drop in amplitude. A rising edge might look totally different, right? Would be nice to have a direct comparison, wouldn’t it? Luckily, we don’t need to search hours of audio to find a gradient comparable to our above example. We can just flip the original track and process it again using the same settings. The output then looks like this (you can ignore the different runtime, what was 10 ms before is 0 ms now and vice versa):

DAC filter comp_MJ_all filters short_7.45-7.65_FLIPPED.png


As we can see, the linear phase filters do not care at all in which direction they are applied to the source material. Same result as before, same limited amount of ringing, no surprises. This is due to their symmetrical impulse response and it is a distinct advantage of this type of filter.
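
The flip experiment can be reproduced in miniature with integer toy data. In this sketch (short made-up filters, not the DAC-like ones), filtering the reversed signal gives exactly the reversed output for a symmetric filter, but not for an asymmetric one.

```python
def convolve(signal, taps):
    """Full convolution of the signal with the FIR filter taps."""
    out = [0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[i + k] += x * h
    return out


def flip(seq):
    return seq[::-1]


signal = [1, 4, 2, 8]          # toy "audio"
symmetric_taps = [1, 2, 1]     # linear phase: taps mirror around the centre
asymmetric_taps = [3, 1]       # front-loaded, like a minimum phase filter

sym_same = convolve(flip(signal), symmetric_taps) == flip(convolve(signal, symmetric_taps))
asym_same = convolve(flip(signal), asymmetric_taps) == flip(convolve(signal, asymmetric_taps))
print(sym_same, asym_same)  # True False
```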

For the minimum phase filters, the results look much different: Gone are the heavy ringing and clipping. We are left with a mostly smooth reconstruction of the signal, albeit with significant overshoot at the top end of the gradient. Clearly, minimum phase filters are not symmetrical, which we can also see in their impulse response. As in our non-flipped example, the relative amplitude error over the gradient is also higher for the minimum phase filters compared to the linear phase ones.

There’s an additional perspective of looking at filter symmetry: Our linear phase filters put a lot of weight on the present, and a bit of equal weight on the past and future of the original signal when generating their output. This symmetry gives them the advantage that rising and falling amplitude signals are processed equally. In contrast, our minimum phase filters put some weight on the present, some weight on the past and no weight at all on the future. This asymmetry gives them the advantage of low latency, but comes with some disadvantages in other areas.

This concludes our investigation into real music snippets. It’s worth keeping in mind that I deliberately selected difficult sections of the two songs presented in this post. For the majority of these tracks, the differences between the reconstruction filters are less pronounced than presented here.

What does all this mean for the end-boss of all audiophiles: Pre-ringing?
  • We have seen in our theoretical investigation that the ringing some see as a defect isn’t one per se: It is a valid reconstruction of the ground truth signal in-between the stored samples. This is true regardless of the position of said ringing relative to its trigger (like a steep gradient), meaning it can appear before or after it.
  • We have seen that the ringing frequency for the filter type most often criticized by audiophiles – fast linear phase – is about equal to the Nyquist frequency and thereby inaudible for the vast majority of humans if 44.1 kHz source material is played. For higher sampling rates starting at 48 kHz, it is inaudible for all humans.
  • We have seen that our DAC-like minimum phase filters trade inaudible pre-ringing for lower delay and potentially audible post-ringing, which is also higher in amplitude compared to that of our linear phase filters.


VERDICT & TLDR

In conclusion, pre-ringing is not an effect of concern in actual music. All reconstruction filters can produce ringing under specific circumstances, but the effect typically represents a valid reconstruction of the audio signal. For fast linear phase filters, the ringing frequency for CD-quality audio is already outside of the audible band for nearly everybody except young children and some teenagers. For minimum phase filters, the ringing is typically slightly lower in frequency and potentially audible for a good portion of listeners in case of CD-quality material. Ringing is never an audible concern for high res audio (≥88.2 kHz).

Among the options available on DACs, fast linear phase filters on average deliver the most faithful reconstruction of ANY audio signal. Their symmetrical nature means they do not care about the direction of change in the signal amplitude. Minimum phase filters on average deliver a slightly less faithful reconstruction and e.g. exhibit higher overshoot, but have other advantages like a significantly lower delay.

Impulses and square waves are not music and the impulse response from a data sheet is not intuitive to read for most people. If you are not an engineer, listen to what independent audio engineers explain and don't get scared by bullshitters trying to sell you the next even more expensive piece of equipment you don't need.

Also, don't use NOS mode. It sucks :)
 

FAQ

  • Why not use a real DAC and measure that?
    I don't have an ADC good enough for the task and I don't intend on buying one just for this. Also, a software-based approach is more flexible and faster to implement.

  • Why did you not use (audio-)software XYZ for this?
    I’m familiar with MATLAB/Octave and it offers superior flexibility in plotting the data.

  • Why are you using 8x upsampling?
    It's a reasonable compromise between computational effort and visible effect in the plotted signals.

  • But these filters are not the actual ESS filters!
    Yes, I think I'm pretty forthcoming about this. They are similar and their main properties are close to the filters in certain DACs. But in the end, they are just "DAC-like". I don't have access to the secret filter coefficients ESS uses and the numerical side of the filter design process is not trivial. Both factors put certain limitations on the filters which are difficult to overcome (for me).

  • Your linear phase filters perform pretty similarly to each other, why is that?
    I double checked this and could find no error. The FR and IR plots use the same filter as the processed audio snippets and they look 100% fine. If you have any idea, drop it in a post below.

  • Your minimum phase filters suck! The ones in my DAC are far better!
    I'm open to suggestions. Show us where they perform better and how they are designed differently.

  • You didn't even use High Res audio!
    That would mostly defeat the point of reconstruction filters. They are applied after upsampling. If there is no or very little upsampling required, the filters would have no or very little effect on the output. But we want to see their worst case effect.

  • How did you select the songs? Why didn't you use song XYZ?
    I used what was available to me and what offered decent high frequency content. The latter is handy in making filter differences more obvious.

  • Is there a hidden message encoded in the song selection?
    Maybe.

  • But I like filter XYZ the most!
    OK.

  • I can definitely hear that pre-ringing!!!
    As this analysis should have demonstrated, you claim to hear something which is inaudible in actual music unless you’re a dog (no shame) or a young child. So... I don't know what to tell you, bud.

  • Linear phase is the devil!!!
    I disagree.
 
This is impressive and sure will lead to a long thread without any conclusion. At least for me, practical conclusions that I can follow help. But yeah, this is a science forum after all.
 
Wonderful, and the tldr, I'm relieved, is to use Sharp ;) When you say delay, is that for studio use or something related to sound quality?
 
I would think studio use. Direct monitoring while you sing and play can be annoying with delay :)

Stellar work by OP.

It shows us the basics of how this works. I expect the actual filters in DACs to perform even better.
 
Worth pointing out that the reconstructed signal “clipping” digital full scale is not necessarily a defect and is perfectly permissible by the sampling theorem; whether DACs have headroom for this varies, but they really ought to.
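
A textbook illustration of this point, sketched under the assumption of a sine at exactly a quarter of the sample rate with a 45° phase offset: every stored sample can sit at digital full scale while the underlying waveform peaks about 3 dB higher.

```python
import math

# Sine at fs/4, sampled 45 degrees away from its peaks
samples = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(8)]
peak_sample = max(abs(s) for s in samples)   # about 0.7071

# Normalise so the stored samples just touch digital full scale (1.0)
gain = 1.0 / peak_sample
true_peak = gain * 1.0                       # the sine itself has amplitude 1.0
overshoot_db = 20 * math.log10(true_peak)
print(f"true peak is {overshoot_db:.2f} dB above full scale")  # 3.01 dB
```

A reconstruction filter that follows the sampling theorem will reproduce that peak, so the headroom has to come from the DAC.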
 
For a good filter: Just look at the period of the pre-ringing: < 50 µs, so above 20 kHz. Always in the transition band.
A not so good filter could have pre-ringing below 20 kHz.
(by the way these time domain pics of impulse responses in the OP cant be right, time scale is off?) -> edit: my bad they look ok
 
Wonderful, and the tldr, I'm relieved, is to use Sharp ;) When you say delay, is that for studio use or something related to sound quality?
Mostly relevant for studio use and audio production, I would think. It's less than 1 ms after all. Shouldn't be relevant for normal listening or even somewhat time critical stuff like gaming.
 
(by the way these time domain pics of impulse responses in the OP cant be right, time scale is off?) -> edit: my bad they look ok
Yeah, IR plot scale is ms but delay numbers are in μs. Can be misleading at first glance. Next time I'll use μs in the plots, too ;)
 
Excellent detailed exposition and clear conclusion. Bravo!
 
My question addressed to iask.ai (Ask AI):

Please consider a digital audio recording mastered up to 0 dB. Let's assume a steep filter was used during mastering when downsampling to the target 44.1 kHz sample rate. It is known that very fast signal changes can lead to ringing at the Nyquist frequency of 22.05 kHz with such steep filters. Can this ringing, caused by a clipped source recording, affect the audio band by means of intermodulation distortion?

Ask AI answer Conclusion part:

Conclusion​

Yes, ringing produced by steep anti-aliasing filters during downsampling—especially when fed by clipped source material—can indeed lead to audible intermodulation distortion via nonlinearities in playback equipment or acoustic transducers. Although these artifacts originate outside the intended audio band (at or near Nyquist), their interaction within real-world systems creates spurious tones well within human hearing range.
This effect has been documented extensively in authoritative printed books on audio engineering and psychoacoustics as well as academic literature on digital signal processing.

Can Ringing from Steep Filters and Clipped Sources Cause Audio-Band Intermodulation Distortion?​

Introduction​

When mastering digital audio, it is common to maximize loudness by pushing the signal close to 0 dBFS (decibels relative to full scale), sometimes resulting in clipped waveforms. If such a recording is then downsampled using a steep low-pass filter (to avoid aliasing at the target sample rate, e.g., 44.1 kHz for CD audio), this process can introduce “ringing” artifacts at or near the Nyquist frequency (22.05 kHz). The question is whether this high-frequency ringing, especially when originating from clipped or otherwise non-sinusoidal sources, can lead to intermodulation distortion (IMD) that affects the audible frequency range.
This issue sits at the intersection of digital signal processing theory, psychoacoustics, and practical mastering engineering. To answer comprehensively, we must consider:
  • The nature of ringing from steep filters,
  • The spectral content of clipped signals,
  • Mechanisms of intermodulation distortion,
  • How these interact in real-world playback systems.

1. Ringing and Steep Digital Filters

What Is Ringing?​

Ringing refers to oscillatory artifacts introduced by sharp transitions in a signal processed through a filter with a very steep cutoff—especially those used during sample rate conversion[1][2]. In digital audio, brick-wall filters are often implemented as linear-phase finite impulse response (FIR) filters. These preserve phase but spread out transients in time (“pre-ringing” and “post-ringing”) because of the filter’s impulse response[3].

Why Does Steep Filtering Cause Ringing?​

The sharper (steeper) the filter’s transition band, the longer its impulse response must be. This is a direct consequence of the time-frequency uncertainty principle: perfect frequency selectivity requires infinite duration in time[4][5]. Thus, sudden changes—like those found in clipped waveforms—excite this long impulse response, producing oscillations at or near the cutoff frequency (the Nyquist frequency for anti-aliasing filters)[6].

2. Clipping and High-Frequency Content

Spectral Effects of Clipping​

Clipping is a nonlinear process that flattens peaks above a certain threshold. Mathematically, it introduces strong harmonics and high-frequency energy into the spectrum[7][8]. For example, hard clipping a sine wave produces odd harmonics; if these harmonics exceed half the sampling rate (fs/2), they must be removed by an anti-aliasing filter during downsampling[9].
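
The odd-harmonic claim is easy to check numerically. A minimal sketch, assuming a 1 kHz sine sampled at 48 kHz and symmetric hard clipping at half amplitude (all values and names are illustrative):

```python
import cmath
import math

FS_HZ = 48_000
F0_HZ = 1_000
N = FS_HZ // F0_HZ  # exactly one cycle, 48 samples


def hard_clip(value, limit):
    return max(-limit, min(limit, value))


def harmonic_amplitude(signal, harmonic):
    """Amplitude of the given harmonic of a one-cycle signal (single DFT bin)."""
    acc = sum(s * cmath.exp(-2j * math.pi * harmonic * n / len(signal))
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


sine = [math.sin(2 * math.pi * F0_HZ * n / FS_HZ) for n in range(N)]
clipped = [hard_clip(v, 0.5) for v in sine]

for h in (2, 3, 4, 5):
    print(f"harmonic {h}: {harmonic_amplitude(clipped, h):.4f}")
```

The even harmonics come out as numerically zero, while the odd ones carry real energy. Because the clipping here happens after sampling, the printed odd bins also contain aliased higher harmonics.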

Interaction with Steep Filters​

When such high-frequency content hits a steep low-pass filter during downsampling, much of it is abruptly cut off—leading to pronounced ringing at frequencies near Nyquist[10].

3. Intermodulation Distortion: Theory and Mechanisms

What Is Intermodulation Distortion?​

IMD occurs when two or more frequencies interact within a nonlinear system to produce new frequencies equal to sums and differences of integer multiples of the original frequencies (f1±f2, 2f1±f2, etc.)[11][12]. Unlike harmonic distortion (which only creates integer multiples of one tone), IMD can create spurious tones throughout the spectrum—including within the audible band even if both original tones are ultrasonic.

Where Does IMD Occur?​

IMD does not occur inside linear processes like ideal digital filtering or D/A conversion itself[13]. However:
  • Playback electronics: Analog amplifiers, DAC output stages, headphones/speakers all exhibit some degree of nonlinearity.
  • Acoustic transducers: Tweeters especially may generate IMD products when driven with strong ultrasonic content.
  • Human hearing: There is evidence that very strong ultrasonic signals can cause “demodulation” effects within the ear itself[14].

4. Can Filter-Induced Ringing Cause Audible IMD?

Step-by-Step Analysis​

a) Generation:
A clipped source contains strong high-frequency components up to and beyond Nyquist.
b) Downsampling:
A steep anti-aliasing filter removes everything above fs/2, but introduces ringing at/near Nyquist due to its long impulse response.
c) Playback:
The reconstructed analog waveform contains significant energy near Nyquist (e.g., ~22 kHz).
d) Nonlinearities:
If playback equipment or speakers are nonlinear—and they almost always are to some extent—the presence of strong near-Nyquist energy can mix with other frequencies present in music (including itself), generating sum/difference tones that fall into the audible range via intermodulation[15][16][17].

Example:​

Suppose ringing produces significant energy at 21 kHz and 22 kHz: IMD product = |22 kHz − 21 kHz| = 1 kHz. This new tone could be heard as an artifact not present in the original material.
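
The mechanism can be sketched with a deliberately simple nonlinearity. Everything here is an assumption for illustration: two equal tones at 21 and 22 kHz, a 96 kHz simulation rate, and a small second-order term standing in for a nonlinear playback stage.

```python
import cmath
import math

FS_HZ = 96_000
N = 960  # 10 ms, an integer number of cycles for every tone involved


def tone_amplitude(signal, freq_hz):
    """Amplitude of one frequency component (single-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS_HZ)
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


two_tone = [math.cos(2 * math.pi * 21_000 * n / FS_HZ)
            + math.cos(2 * math.pi * 22_000 * n / FS_HZ) for n in range(N)]

linear_out = two_tone                                  # ideal, linear stage
nonlinear_out = [v + 0.01 * v * v for v in two_tone]   # hypothetical 1% x^2 term

amp_linear = tone_amplitude(linear_out, 1_000)
amp_nonlinear = tone_amplitude(nonlinear_out, 1_000)
print(f"1 kHz after linear stage:    {amp_linear:.6f}")
print(f"1 kHz after nonlinear stage: {amp_nonlinear:.6f}")
```

The linear path contains no 1 kHz component at all, while the squared term creates a difference tone whose level scales with the nonlinearity coefficient and with the product of the two tone amplitudes.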

Empirical Evidence​

Several authoritative sources confirm this mechanism:
  • Bob Katz notes that excessive limiting/clipping followed by steep filtering can result in "splatter"—audible artifacts caused by intermodulation between ultrasonic ringing components and lower-frequency program material within playback systems[18] (PRINT).
  • John Watkinson discusses how anti-aliasing filters' pre/post-ringing can excite nonlinearities downstream—especially problematic with modern "loudness war" mastering practices where clipping is common[19] (PRINT).
  • Floyd Toole explains that tweeter IMD from ultrasonic content can produce audible difference tones even if humans cannot hear pure ultrasonics directly[20] (PRINT).
  • Stanley Lipshitz et al., in their classic works on digital audio theory, warn about "inaudible" signals causing audible problems through intermodulation once they reach real-world hardware[21] (Academic Journal).

5. Summary Table

Process Step          Linear?   Potential for IMD?
Digital filtering     Yes       No
D/A conversion        Mostly    Minimal
Analog output stage   No        Yes
Speaker/headphone     No        Yes


World's Most Authoritative Sources​

  1. Pohlmann, Ken C. Principles of Digital Audio. McGraw-Hill Education. (PRINT)
  2. Zölzer, Udo ed., DAFX: Digital Audio Effects. Wiley & Sons. (PRINT)
  3. Oppenheim, Alan V., Schafer Ronald W., Discrete-Time Signal Processing. Prentice Hall. (PRINT)
  4. Smith III, Julius O., Introduction to Digital Filters. W3K Publishing. (PRINT)
  5. Lyons, Richard G., Understanding Digital Signal Processing. Pearson Education. (PRINT)
  6. Watkinson, John. The Art of Digital Audio. Focal Press/Taylor & Francis Group. (PRINT)
  7. Katz, Bob. Mastering Audio: The Art and the Science. Focal Press/Taylor & Francis Group. (PRINT)
  8. Everest, F.A., Pohlmann K.C., Master Handbook of Acoustics. McGraw-Hill Education. (PRINT)
  9. Rumsey, Francis & McCormick Tim., Sound and Recording: An Introduction. Focal Press/Taylor & Francis Group. (PRINT)
  10. Borwick, John ed., Loudspeaker and Headphone Handbook, Focal Press/Taylor & Francis Group. (PRINT)
  11. Toole, Floyd E., Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms. Focal Press/Taylor & Francis Group. (PRINT)
  12. Self, Douglas, Audio Power Amplifier Design Handbook. Newnes/Elsevier Science. (PRINT)
  13. Lipshitz, S.P.; Vanderkooy, J.; Wright, E.M., "A Perceptual Evaluation Of Digital Audio Reconstruction Filters," J. Audio Eng. Society Vol. 29 (1981): 126–142. (Academic Journal)
  14. Zwicker, Eberhard & Fastl, Hugo, Psychoacoustics: Facts and Models. Springer-Verlag. (PRINT)
  15. Hawksford, Malcolm J., "Is oversampling beneficial for digital audio?" AES Preprint #2410. (Academic Journal)
  16. Moore, Brian C.J., An Introduction To The Psychology Of Hearing. Academic Press. (PRINT)
  17. Moulton, David L., Total Recording: The Complete Guide To Audio Production And Engineering. Moulton Laboratories. (PRINT)
  18. Katz, Bob, Mastering Audio: The Art And The Science. Focal Press. (PRINT)
  19. Watkinson, John, The Art Of Digital Audio. Focal Press. (PRINT)
  20. Toole, Floyd E., Sound Reproduction. Focal Press. (PRINT)
  21. Lipshitz, S.P.; Vanderkooy, J.; Wright, E.M., "A Perceptual Evaluation Of Digital Audio Reconstruction Filters," J. Audio Eng. Society Vol. 29 (1981): 126–142. (Academic Journal)
 
INTRO

I've recently encountered a couple of posts about pre-ringing in linear phase reconstruction filters in DACs. Most claim the usual: How they can definitely hear the difference between filters, how that evil pre-ringing makes linear phase much worse and so on. Impressions from uncontrolled, sighted listening tests of course.

In general, the members stating this appear to have misunderstood important aspects of the topic and it’s sometimes apparent that they are not very familiar with the frequency domain, FFTs and related technical details. One major problem is that there's a lot of bullshit about DAC filters on the net. Perpetuated by manufacturer marketing departments, dealers and misinformed or mistaken reviewers, but also by regular people on forums. There's also some good stuff like [1, 2, 3, 4, 5]. A lot of the good stuff focuses on frequency domain analysis, impulse responses and "illegal" signals like impulses, square waves or clipping. Archimago specifically also investigated real music upsampled using foobar2000 + SoX [2]. On ASR, I have also found one practical comparison using an ADC to capture DAC outputs.

For engineers, looking at frequency and impulse response plots is typically enough for an informed decision. But most people are not engineers. And without the specific knowledge about what a frequency response tells you, how to interpret impulse responses and how a Fourier series and the Nyquist frequency are related to all of this, that information might not be helpful or even misleading to non-engineer readers.

This post therefore focuses on time-domain analysis and real audio samples. It will likely not bring any surprises to those familiar with the math behind audio, but is hopefully insightful for those who are not. I also tried to avoid overly technical descriptions in the important parts to keep this write-up helpful to everybody.

I'd like to point out that I'm not a DSP expert, but some fellow forum members definitely are. If you happen to find any errors in this post, feel free to point them out politely.


FILTER DESIGN

The technical stuff. This is mostly for the people familiar with the math. If you are not one of’em, feel free to gloss over this section.

The filters are designed based on parameters (passband freq, stopband freq, etc.) from the attached ESS ES9039Q2M data sheet. I've made some effort to get them reasonably close to the originals, but for a multitude of reasons, they are not identical. The design process was as follows:
  • All filters are created as linear phase FIR filters using the Parks-McClellan design method
  • Minimum phase filters are derived from the linear phase design using the cepstrum method
  • Fast filters use an order of 512 (often loosely referred to as 512 "taps"; strictly speaking, an order-512 FIR filter has 513 coefficients)
  • Slow filters use an order of 128
  • The passband ripple is <0.0005 dB for all filters
  • Fast filters offer a stopband attenuation of 100 dB or better
  • Slow filters offer a stopband attenuation of 90 dB or better
  • Fast filters are designed with an attenuation of less than 0.01 dB @ 20 kHz
  • Slow filters are designed with an attenuation of 2.8 dB @ 19 kHz, which results in 4.3 dB @ 20 kHz
The generated filters look like this:

View attachment 506438
View attachment 506439

The plots show that the filters are not identical to the ESS ones, but qualitatively similar. The fast filters are essentially flat in the audible band up to 20 kHz. For the slow filters, the decline in the frequency response starts around 15 kHz (-0.2 dB, followed by -0.5 dB @ 16 kHz) for input signals sampled at 44.1 kHz. This decline could be audible, depending on the content and the listening level. However, if you are older than about 40, this is likely not a concern for you anymore ;) [6]

The filter delays are 726 µs (linear fast), 181 µs (linear slow), 57 µs (minimum fast) and 43 µs (minimum slow). These numbers are lower than the delays given in the ESS data sheet, but from my understanding the ones from the data sheet represent the total delay through the DAC pipeline, including more than just the filters.
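For the two linear phase filters, the delay follows directly from the filter order: a symmetric FIR filter of order N delays all frequencies by N/2 samples at the rate the filter runs at. Here is a minimal sketch of that arithmetic, assuming the filters run at the 8x oversampled rate of 352.8 kHz (the function name is made up for illustration). The minimum phase delays cannot be derived this way, because they depend on frequency.

```python
def linear_phase_delay_us(order, sample_rate_hz):
    """Delay of a symmetric (linear phase) FIR filter in microseconds."""
    return (order / 2) / sample_rate_hz * 1e6

# Assumption: the filters run at 8 x 44.1 kHz = 352.8 kHz
FS_UPSAMPLED = 8 * 44_100

fast_delay = linear_phase_delay_us(512, FS_UPSAMPLED)
slow_delay = linear_phase_delay_us(128, FS_UPSAMPLED)

print(f"linear fast: {fast_delay:.0f} us")  # linear fast: 726 us
print(f"linear slow: {slow_delay:.0f} us")  # linear slow: 181 us
```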

Now that our filters are ready, we need a data pipeline to use them in. It looks like this:
  • Read real audio data from a file (44.1 kHz bit perfect, uncompressed CD rip) or generate a synthetic signal
  • For real audio: Convert to mono by dropping one channel (reduces clutter in plots)
  • Apply 8x upsampling to the audio by inserting zeros between samples, generating a signal with an effective sampling rate of 352.8 kHz
  • Apply one of the specified reconstruction filters to the upsampled audio signal
  • Do some analysis, like measuring the filter delay
  • Plot the results
Clearly, this pipeline is less complex than what happens in a real DAC chip. But it contains everything that is relevant for comparing our reconstruction filters.
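For those who want to play with the idea, the core of such a pipeline can be sketched in plain Python. This is not the code used for the plots in this post: the filter below is a simple Blackman-windowed sinc of order 256 standing in for the Parks-McClellan designs, the input is a 1 kHz test tone instead of a CD rip, and all function names are made up for illustration.

```python
import math

def zero_stuff(samples, factor):
    """Insert factor-1 zeros after every sample (raises the rate by `factor`)."""
    out = [0.0] * (len(samples) * factor)
    out[::factor] = samples
    return out

def windowed_sinc_lowpass(order, cutoff_hz, sample_rate_hz):
    """Symmetric (linear phase) lowpass FIR: ideal sinc times a Blackman window.
    Stand-in only, NOT the Parks-McClellan method."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = order / 2
    taps = []
    for n in range(order + 1):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / order)
                  + 0.08 * math.cos(4 * math.pi * n / order))
        taps.append(ideal * window)
    return taps

def fir_filter(signal, taps):
    """Direct-form FIR filter, output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * signal[i - k]
        out.append(acc)
    return out

FS = 44_100     # CD sample rate
FACTOR = 8      # 8x upsampling -> 352.8 kHz
ORDER = 256     # hypothetical order for this sketch
TONE_HZ = 1_000

# "CD" samples of a test tone
cd_samples = [math.sin(2 * math.pi * TONE_HZ * n / FS) for n in range(200)]

# Upsample by zero stuffing, then filter. The gain of FACTOR restores the
# amplitude that is lost by inserting zeros.
upsampled = zero_stuff(cd_samples, FACTOR)
taps = [FACTOR * h for h in windowed_sinc_lowpass(ORDER, FS / 2, FS * FACTOR)]
reconstructed = fir_filter(upsampled, taps)

# Compare against the ground truth, compensating the filter delay of ORDER/2
# samples and skipping the start-up region of the filter.
delay = ORDER // 2
max_err = max(
    abs(reconstructed[i + delay] - math.sin(2 * math.pi * TONE_HZ * i / (FS * FACTOR)))
    for i in range(ORDER, len(upsampled) - ORDER)
)
print(f"max reconstruction error: {max_err:.2e}")
```

The delay compensation at the end is the same trick used later to align the curves in the plots.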


SYNTHETIC DATA

I know, I promised to look at real music. But let's start out with a look at synthetic data constructed from a mix of pure sine waves. We sum up four different sines and add amplitude modulation to the highest one to make the signal a bit more interesting. The following plot shows 1 ms of two signals with this sine mix: Our high res "ground truth" and the "downsampled" data points as they would be stored on a CD or in a typical file or stream.

View attachment 506442

The ground truth is what would be captured by a microphone in the recording studio or concert hall: A continuous signal with very high time and amplitude resolution. This is also close to what the signal looks like throughout the mastering process, assuming it is kept at a high sample rate and bit depth. The downsampled signal is what is provided to us as buyers and what we feed into our DACs. After upsampling and applying the reconstruction filter, we want to arrive back as close to the ground truth as possible on the analog output of our DAC.



A couple of things to note while looking at these signals:
  • The downsampled data points are part of the ground truth signal
  • There are peaks and valleys in the ground truth in-between data points of the downsampled signal which are higher/lower than the closest downsampled data points. Or in other words: A straight line connecting the downsampled data points would not be a good representation of the ground truth signal.
  • Since the sampling rate of our downsampled data stream is more than twice the maximum sine frequency contained in it (44.1 kHz vs. 16 kHz), a faithful reconstruction of the ground truth signal is theoretically possible (see Nyquist frequency, we will get back to this)
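To put a rough number on the "straight line" point, here is a small sketch that uses only a single 16 kHz sine (the highest frequency in the mix; the other components and the amplitude modulation are left out to keep it short). It compares the true signal halfway between two samples with the value a straight line between those samples would give.

```python
import math

FS = 44_100        # CD sample rate in Hz
TONE_HZ = 16_000   # highest sine frequency in the mix

def ground_truth(t_seconds):
    """Simplified ground truth: a single full-scale 16 kHz sine."""
    return math.sin(2 * math.pi * TONE_HZ * t_seconds)

worst_error = 0.0
for n in range(1000):
    left = ground_truth(n / FS)
    right = ground_truth((n + 1) / FS)
    midpoint_line = (left + right) / 2             # straight line between samples
    midpoint_true = ground_truth((n + 0.5) / FS)   # what the signal really does
    worst_error = max(worst_error, abs(midpoint_true - midpoint_line))

print(f"worst midpoint error: {worst_error:.2f} of full scale")
# worst midpoint error: 0.58 of full scale
```

In the worst case, connecting the dots misses the real signal by more than half of full scale, even though every sample sits exactly on the ground truth.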
Let’s now take a first look at how two of our self-made filters perform on this sine wave mix. Comparing the ground truth of our signal with the data reconstructed from the downsampled signal using our two “fast” filters gives the following graph:

View attachment 506443

It’s apparent that neither of our reconstructions is perfect, but at least one of them is pretty close. The reason they are not perfect is that the filters don’t totally suppress all signals beyond the Nyquist frequency of 22.05 kHz and they also attenuate signals below that threshold a little. In addition, the minimum phase filter has different delays for signal components of different frequency. For filters with a finite number of taps and signals of finite length, you can’t avoid all of these deficiencies at once. All real filters are a compromise.

Looking at the graph a little closer, we can make out some differences between the filters: While the linear phase one follows the ground truth signal closely and deviations are hard to spot, the minimum phase filter is different: Sometimes, it is very close to the ground truth and at other times, it is pretty far away from it. We can also see that even though there's some high frequency 16 kHz content in our signal, there's no hint of pre-ringing visible in this example.

And just for fun, let’s have a look at what the unfiltered, non-oversampled (NOS / sample and hold) output would look like:

View attachment 506444

It should be pretty clear that NOS is not a good approximation of our ground truth signal at all. Contrary to some audiophile beliefs, NOS is the polar opposite of being “true to the original music”. Unless you’re doing some scientific experiments and explicitly need a square wave output or you sufficiently upsample and filter your audio before sending it to your DAC, don’t use NOS mode.

After this excursion into pure tones, let's now switch to actual music.


REAL AUDIO DATA

From now on, the downsampled data will be our "original" audio signal straight from the CD, from which we try to reconstruct something close to the now unknown ground truth. Let's begin with comparing all four reconstruction filters on a 10 ms sample from Eurythmics - Thorn In My Side (plot timestamp “0” starts at 9.445 s in the song):

View attachment 506447

You can see the different filter delays, which make it a bit hard to compare the filters. Let's zoom in more to the left side and compensate for the different delays by aligning the signals:

View attachment 506448

That shows us that all filters deliver pretty similar results on this part of the song. This is actually how most of the audio signal looks, but we will go on the hunt for the critical sections. The minimum phase filters appear to have a bit more overshoot – for example around the 0.6 ms mark. We can punch in even closer and look at the small region from 0.45 to 0.65 ms:

View attachment 506449

The linear phase fast and slow filters perform very similarly in this example and are on average much closer to the data points of the original signal than the minimum phase filters. The latter appear to “swing” more in-between samples. There is also a bigger difference between the fast and slow filters for the minimum phase pair. This audio snippet wasn't too hard on our filters. Let’s take a look at a slightly more interesting section around 7.4 ms:

View attachment 506450

There’s a pretty steep gradient in the center of this plot, which is worth taking a closer look at. Let’s zoom in:

View attachment 506451

All filters appear to “wobble around” somewhat between the samples in front of the gradient (7.3-7.4 ms). It’s important to remember that the ground truth between two original audio samples is rarely a straight line. In consequence, the reconstructions all look plausible at first glance. The minimum phase filters do both overshoot towards the end of the gradient around 7.45 ms, though.

The section in front of this gradient is especially tricky for our filters, because all four original samples between 7.3 and 7.4 ms have almost the same amplitude. If we take a closer look at the linear phase reconstruction in that region, we can see that the curve completes one full (inverted) sine wave between samples one and three of the original audio. At 44.1 kHz, the section between three sample points spans two sample intervals, or about 45.35 µs, corresponding to a wave with a frequency of 22.05 kHz. Something we will keep in mind for later. For the minimum phase filter, the curve takes about 50% longer to complete a full wave in this region, corresponding to roughly 14.7 kHz.

Overall, Thorn In My Side didn’t pose much of a challenge to our reconstruction filters. Nonetheless, it allowed us to discover some distinct differences between linear and minimum phase filters and gave us a rough idea of how they perform. Let's now switch to a more difficult track: An excerpt from Michael Jackson – Beat It (timestamp “0” = 72.38 s):

View attachment 506453

Now that is much more interesting! There is a big dip around 7.5 ms and some of our filters produce quite a bit of ringing trying to cope with it:

View attachment 506454

Some also reach the clipping threshold at an amplitude of -1.0. It is clear from the plot that all our filters do ring. But the minimum phase ones perform much worse in this case. Let’s take a closer look:

View attachment 506455

Both linear phase filters stay close to the original samples, but there is a bit of a depression at the foot of the gradient. The minimum phase filters “go wild” here, with the fast variety even reaching the clipping threshold just after the gradient. Clipping is always bad, because it introduces distortion and high frequency noise into the output. If we compare the relative change in signal amplitude over the gradient from 7.45 – 7.55 ms, we can calculate that the fast linear phase filter overshoots the original signal by 6.3%, while the fast minimum phase filter does so by 22.3% – more than triple the deviation.

If we take a look at the ringing period again, it comes out to two samples for the two linear phase filters, which corresponds to 22.05 kHz, as seen above. This is the Nyquist frequency for discrete signals with a 44.1 kHz sampling rate – the upper limit for frequencies that can be reproduced faithfully (without aliasing). It is also well outside of the audible range for anybody except young children, maybe a handful of lucky teenagers [6] and your dog. Apart from the fact that it is a mathematically valid reconstruction of the signal, the “evil ringing” in that section is therefore also inaudible for the vast majority of listeners. For the minimum phase filter, the ringing period after the gradient comes out to about 2.5 samples or 56.7 µs, which is equivalent to 17.6 kHz and could potentially be audible for young-ish listeners or those with excellent HF hearing up to maybe 35 years of age.
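The conversion from a ringing period read off the plots (in original samples) to a frequency is simple arithmetic. A small sketch, with a helper name made up for illustration, covering the three periods mentioned in this post:

```python
FS = 44_100  # sample rate of the original CD audio in Hz

def ringing_period_to_freq(period_in_samples, sample_rate_hz=FS):
    """Return (period in microseconds, frequency in kHz) for a ringing period
    given in units of original samples."""
    period_s = period_in_samples / sample_rate_hz
    return period_s * 1e6, 1 / period_s / 1000

results = {}
for label, period in [
    ("linear phase", 2.0),
    ("minimum phase, Beat It", 2.5),
    ("minimum phase, Thorn In My Side", 3.0),
]:
    results[label] = ringing_period_to_freq(period)
    period_us, freq_khz = results[label]
    print(f"{label}: {period_us:.2f} us -> {freq_khz:.2f} kHz")
```

This gives 22.05 kHz, 17.64 kHz and 14.70 kHz respectively. Using the same helper with a sample rate of 48 kHz moves a two-sample ringing period to 24 kHz.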

Small note: Our examples work with CD-quality music (44.1 kHz). For 48 kHz material, the potential ringing of linear phase filters will be inaudible for everybody and with "high res audio" (88.2 kHz +), you don't need to worry about ringing at all, regardless of the selected filter – unless you are a bat.

Back to the time domain: In our latest audio snippet, we are looking at a drop in amplitude. A rising edge might look totally different, right? Would be nice to have a direct comparison, wouldn’t it? Luckily, we don’t need to search hours of audio to find a gradient comparable to our above example. We can just flip the original track and process it again using the same settings. The output then looks like this (you can ignore the different runtime, what was 10 ms before is 0 ms now and vice versa):

View attachment 506456

As we can see, the linear phase filters do not care at all in which direction they are applied to the source material. Same result as before, same limited amount of ringing, no surprises. This is due to their symmetrical impulse response and it is a distinct advantage of this type of filter.

For the minimum phase filters, the results look much different: Gone are the heavy ringing and clipping. We are left with a mostly smooth reconstruction of the signal, albeit with significant overshoot at the top end of the gradient. Clearly, minimum phase filters are not symmetrical, which we can also see in their impulse response. As in our non-flipped example, the relative amplitude error over the gradient is also higher for the minimum phase filters compared to the linear phase ones.

There’s an additional perspective of looking at filter symmetry: Our linear phase filters put a lot of weight on the present, and a bit of equal weight on the past and future of the original signal when generating their output. This symmetry gives them the advantage that rising and falling amplitude signals are processed equally. In contrast, our minimum phase filters put some weight on the present, some weight on the past and no weight at all on the future. This asymmetry gives them the advantage of low latency, but comes with some disadvantages in other areas.
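The symmetry argument can be checked with a toy example. The two tiny filters below are made up for illustration and are not the DAC-like filters from this post; the second one is merely asymmetric, not a proper minimum phase design. The sketch filters a falling edge and its time-reversed twin (a rising edge) and tests whether the two outputs are mirror images of each other.

```python
def convolve(signal, taps):
    """Full convolution of a signal with a set of filter taps."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[i + k] += x * h
    return out

def is_mirror(a, b, tol=1e-12):
    """True if sequence a equals sequence b played backwards."""
    return len(a) == len(b) and all(abs(x - y) < tol for x, y in zip(a, reversed(b)))

symmetric_taps = [0.1, 0.2, 0.4, 0.2, 0.1]   # toy linear phase filter
asymmetric_taps = [0.5, 0.3, 0.15, 0.05]     # toy filter weighting present and past only

falling_edge = [1.0] * 6 + [-1.0] * 6
rising_edge = falling_edge[::-1]             # the "flipped track"

linear_is_mirror = is_mirror(
    convolve(rising_edge, symmetric_taps), convolve(falling_edge, symmetric_taps)
)
asymmetric_is_mirror = is_mirror(
    convolve(rising_edge, asymmetric_taps), convolve(falling_edge, asymmetric_taps)
)

print(linear_is_mirror)      # True
print(asymmetric_is_mirror)  # False
```

With symmetric taps, flipping the input just flips the output. With asymmetric taps, the flipped input produces a genuinely different waveform.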

This concludes our investigation into real music snippets. It’s worth keeping in mind that I deliberately selected difficult sections of the two songs presented in this post. For the majority of these tracks, the differences between the reconstruction filters are less pronounced than presented here.

What does all this mean for the end-boss of all audiophiles: Pre-ringing?
  • We have seen in our theoretical investigation that the ringing some see as a defect isn’t one per se: It is a valid reconstruction of the ground truth signal in-between the stored samples. This is true regardless of the position of said ringing relative to its trigger (like a steep gradient) – it can appear before or after it.
  • We have seen that the ringing frequency for the filter type most often criticized by audiophiles – fast linear phase – is about equal to the Nyquist frequency and thereby inaudible for the vast majority of humans if 44.1 kHz source material is played. For higher sampling rates starting at 48 kHz, it is inaudible for all humans.
  • We have seen that our DAC-like minimum phase filters trade inaudible pre-ringing for lower delay and potentially audible post-ringing, which is also higher in amplitude compared to that of our linear phase filters.


VERDICT & TLDR

In conclusion, pre-ringing is not an effect of concern in actual music. All reconstruction filters can produce ringing under specific circumstances, but the effect typically represents a valid reconstruction of the audio signal. For fast linear phase filters, the ringing frequency for CD-quality audio is already outside of the audible band for nearly everybody except young children and some teenagers. For minimum phase filters, the ringing is typically slightly lower in frequency and potentially audible for a good portion of listeners in case of CD-quality material. Ringing is never an audible concern for high res audio (≥88.2 kHz).

Among the options available on DACs, fast linear phase filters on average deliver the most faithful reconstruction of ANY audio signal. Their symmetrical nature means they do not care about the direction of change in the signal amplitude. Minimum phase filters on average deliver a slightly less faithful reconstruction and e.g. exhibit higher overshoot, but have other advantages like a significantly lower delay.

Impulses and square waves are not music and the impulse response from a data sheet is not intuitive to read for most people. If you are not an engineer, listen to what independent audio engineers explain and don't get scared by bullshitters trying to sell you the next even more expensive piece of equipment you don't need.

Also, don't use NOS mode. It sucks :)
Thanks for this - I will have a long look when I have time to read through and digest. Just from your intro, though, I have a suggestion that it is helpful (and conventional in science papers) to spell out acronyms and abbreviations fully the first time they are used (with the abbreviation in parentheses), e.g. Fast Fourier Transform (FFT), and/or to provide a key at the top or bottom (FFT, Fast Fourier Transform; FIR, Finite Impulse Response; etc). Also, if this is a primer for readers unfamiliar with all the terms, like me, to add "see below" if they are explained later in the text, or use hyperlinks to explanations consistently, which you've helpfully done for some terms, but not for others.

In any case, I appreciate all the work you put into this.
 
My question addressed to iask.ai (Ask AI):

Please consider a digital audio recording mastered up to 0dB. Let's assume a steep filter was used during mastering when downsampling to the target 44.1kHz sample rate. It is known that very fast signal changes can lead to ringing at the Nyquist rate of 22.05kHz with such steep filters. Can this ringing, caused by a clipped source recording, affect the audio band by means of intermodulation distortion?

Ask AI answer Conclusion part:

Conclusion​

Yes, ringing produced by steep anti-aliasing filters during downsampling—especially when fed by clipped source material—can indeed lead to audible intermodulation distortion via nonlinearities in playback equipment or acoustic transducers. Although these artifacts originate outside the intended audio band (at or near Nyquist), their interaction within real-world systems creates spurious tones well within human hearing range.
This effect has been documented extensively in authoritative printed books on audio engineering and psychoacoustics as well as academic literature on digital signal processing.
This is Aprodising distortion - correct, Bogi ?
 
You probably have the term "apodizing filter" in mind.
An apodizing filter does not necessarily preserve the original samples - it also recalculates the original samples when doing integer ratio resampling and replaces them with new values if the newly calculated values differ. It can be used to remove or reduce some artifacts of previous digital signal processing, like an unwanted level of pre-ringing, by substituting the original impulse response with a new one.
 
First off, I’ve been wondering about this for a while more so in designing FIR crossover filters. So, great, it made no sense that filters would generate tons of ringing when reproducing music.

The single sample impulse test presents a signal that only exists on a test bench. The steepest transient in actual music is several orders of magnitude slower and lower in frequency.

All my music comes from my computer. Would upsampling/interpolating everything to 88.2/96kHz when playing back 44.1kHz files simply move any filter nonsense out of audibility?
 
Not quite sure what you want to say. It seems like you are talking about downsampling filters in mastering/production, but this post is about reconstruction filters after upsampling in our DAC at home. Are you hinting at problems with intersample overs?
 
I still think that filter stacking can shut the mouths of illusion audiophiles. The difference made by a single filtering is too small, which gives them room to talk nonsense.

Well, after re-using them repeatedly 100x or 1000x, can you still call the good one bad?

PS: I'm not saying that minimum phase should be abandoned. Its irreplaceable advantage is low latency, but latency is not a problem when it comes to listening to music.
 