This thread is my attempt to demystify digital audio (or at least the basic parts of it) for readers who aren't very familiar with this area of digital signal processing. The goal is to help readers understand how digitized audio works, and thereby dispel some of the unfounded folklore around it.
This opening post explains what the (much maligned) sinc reconstruction filter is and how it works. First, let's see what an interpolation process is and why we need it.
When we digitize audio, we take “snapshots” of the audio waveform at discrete sampling points that are equally spaced in time; the interval between adjacent samples is the sampling period. To convert the digitized samples back into a continuous time signal, we need to “recreate” the missing parts of the analog waveform between the digital samples. We know the recreated waveform, if it is true to the original, must match the digital sample exactly at each of the sampling points, so we need to somehow “connect the dots” to recover the missing parts. The mathematical term for this connect-the-dots operation is interpolation. Therefore, if we need to convert a digitized signal back to a continuous time signal, we need to interpolate. The rest of this post explains how the sinc function works as an interpolator. In the next post I'll compare the sinc function to 2 other interpolators.
Figure 1 shows an example of a short segment of an analog waveform (the original signal) and its digitized samples. To simplify matters, the sampling rate used in this example is normalized to 1 sample per unit of time. This means the time stamps of the digital samples are at t = n, where n is an integer: …, -3, -2, -1, 0, 1, 2, 3, …
Figure 2 shows the sinc function. This is the normalized form of the sinc function, sinc(t) = sin(πt)/(πt), and it is the form we will use in this discussion. [The unnormalized version is sinc(t) = sin(t)/t.] The sinc function is zero at all integer values of t except t = 0, where it is 1. [Technically, from the equation shown in Figure 2 the sinc function is undefined at t = 0, as the equation gives 0 divided by 0. However, when t is very, very close to 0, the value given by the equation gets very, very close to 1. So we say that as t approaches 0, sin(πt)/(πt) approaches 1. We follow this property and define sinc(0) = 1.] Note that the sinc function is generally non-zero when t is not an integer.
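As a quick numerical check of these properties, here is a minimal Python sketch. It assumes NumPy is available, and the helper name `normalized_sinc` is my own choice for illustration. It handles t = 0 explicitly by returning the limiting value 1. NumPy's built-in `np.sinc` uses the same normalized definition.

```python
import numpy as np

def normalized_sinc(t):
    """Normalized sinc: sin(pi*t)/(pi*t), with the value at t = 0 defined as 1."""
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t == 0, 1.0, t)          # placeholder to avoid 0/0
    return np.where(t == 0, 1.0, np.sin(np.pi * safe_t) / (np.pi * safe_t))

n = np.arange(-3, 4)                        # integer sample times -3 .. 3
values_at_integers = normalized_sinc(n)     # ~0 everywhere, 1 at n = 0
value_at_half = normalized_sinc(0.5)        # non-zero between samples: 2/pi
```

The values at the integers are zero only up to floating point rounding, which is why a comparison with a small tolerance is appropriate.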
Since all our sampling points fall on integer values of time t, the fact that the sinc function is zero at all of the sampling points except one makes it very convenient to use as an interpolator. As shown in Figure 3, if we have a digital signal that has only one non-zero sample, and we scale and time shift a sinc function to match the non-zero sample, this sinc function will automatically pass through all the other samples (which are all zero). This scaled and time shifted sinc function is therefore a valid interpolation for our digital signal with a single non-zero sample.
If we have a digital signal with 2 non-zero samples, we can use 2 sinc functions to separately fit each of the samples (see Figure 4). The property that the sinc function is zero at all but one integer value of time comes in handy again. When we sum the 2 sinc functions together, the resultant sum is a valid interpolation of our signal with the 2 non-zero samples. Each sinc function contributes only to fitting its own sample and does not disturb the value at any other sampling point. The sum of the sinc functions therefore interpolates all the samples: the 2 non-zero ones and all of the zero ones.
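A small numerical sketch of the two-sample case, assuming NumPy and using made-up sample values and positions (0.8 at t = 2 and -0.5 at t = 3; these are not the values plotted in Figure 4):

```python
import numpy as np

# Hypothetical digital signal: zero everywhere except two samples.
n = np.arange(0, 8)                  # sample times 0 .. 7
x = np.zeros(len(n))
x[2], x[3] = 0.8, -0.5               # illustrative values only

def component(t, amplitude, shift):
    """One sinc, scaled to the sample's amplitude and shifted to its time."""
    return amplitude * np.sinc(t - shift)    # np.sinc is the normalized sinc

def two_sinc_sum(t):
    """Sum of the two scaled and shifted sinc functions."""
    return component(t, x[2], 2) + component(t, x[3], 3)

at_samples = two_sinc_sum(n)         # matches x at every sample time
between = two_sinc_sum(2.5)          # non-zero value between the two samples
```

At the sample times the sum reproduces the digital signal (up to rounding), while between the samples both sinc functions contribute.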
We can therefore split any digitized signal into a series of component signals, each component having only a single non-zero sample. The top plot in Figure 5a shows a small segment (11 samples) of our example digitized signal. The time stamps are labeled 0 to 10. Note that this signal starts long before time 0 and continues long after time 10, which is to say our example signal is actually much longer than 11 samples. Below the top plot is our signal segment split into its 11 single non-zero sample component signals.
To the right (see Figure 5b), we fit a sinc interpolator to each of these component signals. When we sum up these sinc functions, we get a function that interpolates the original digitized samples (see top plot in Figure 5b). The dashed curve is the sum of the sinc interpolators, and it is our reconstructed continuous time signal. As seen visually, the reconstructed waveform matches the original continuous time waveform quite precisely. When we have more than a few “active” samples, the “ringing” or oscillations seen in the 1 or 2 non-zero sample cases (Figures 3 and 4) disappear.
This operation of taking the samples one at a time, fitting the interpolator function to each one, and then summing up all of these interpolators is convolution.
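In formula form, with the sampling period normalized to 1, the reconstruction is x(t) = sum over n of x[n] · sinc(t - n). Below is a minimal Python sketch of that one-sample-at-a-time procedure, assuming NumPy. The function name `sinc_interpolate` and the test signal (a sinusoid at 0.1 cycles per sample) are my own illustrative choices, not the signal in the figures, and the finite list of samples truncates what is in theory an infinite sum.

```python
import numpy as np

def sinc_interpolate(samples, t, first_index=0):
    """Evaluate sum_n samples[n] * sinc(t - n) at the times in t.

    samples[k] is taken to sit at time first_index + k (sampling period = 1).
    """
    t = np.asarray(t, dtype=float)
    y = np.zeros_like(t)
    for k, value in enumerate(samples):
        # One scaled, time-shifted sinc per sample, accumulated into the sum.
        y += value * np.sinc(t - (first_index + k))
    return y

# Illustrative test signal: many samples on either side of the segment 0 .. 10.
n = np.arange(-1000, 1001)
samples = np.sin(2 * np.pi * 0.1 * n)

t_fine = np.linspace(0.0, 10.0, 201)            # times between the samples
reconstructed = sinc_interpolate(samples, t_fine, first_index=-1000)
original = np.sin(2 * np.pi * 0.1 * t_fine)
max_error = np.max(np.abs(reconstructed - original))
```

Here `max_error` should come out small but not exactly zero, because only 2001 samples are summed rather than an infinitely long signal. The loop is deliberately the slow, easy-to-follow form of the computation.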
Several comments:
- It is evident, at least for this example, that the sinc interpolation is a pretty good one. The reconstructed signal is nice and smooth, and bears no resemblance to the stair-steps that are often associated with reconstructed digital signals in advertisements. The fit also looks better than a linear interpolation where we connect the samples with straight lines.
- The convolution process shown above is the continuous time equivalent of the convolution process we use in FIR filters. In the FIR filter case, we convolve the input digital signal with a specially crafted (finite) impulse response that is the filter “kernel”. Here we convolve the input with the sinc function that is the interpolator kernel.
- The convolution process of the FIR filter is convolution in the digital domain, i.e. convolving a digital signal (input) with a digital impulse response (convolution kernel). The convolution in this post is a bit different. The input is a series of digital samples, but the convolution kernel is a continuous time function (the sinc function). This seeming incompatibility between a digital input and a continuous time kernel is resolved mathematically by considering the input as an “impulse train” in continuous time.
(An impulse train is a continuous time signal consisting of a series of impulses at regular intervals. Between the impulses the value of the impulse train is zero. These impulses align with the digital samples, and the “strength” of each impulse is equal to the amplitude of the corresponding digital sample. We'll revisit the concept of the impulse train in post #3 when we show that the discrete time to continuous time signal interpolation/reconstruction process is the same as passing an impulse train through a low pass filter. We'll also see why low pass “filtering” a “digitized signal” will give us its analog reconstruction.)
- The method of computing the convolution shown in this example is not an efficient way to compute a convolution. It is, however, easier to understand how the convolution process operates with this method than with the more computationally efficient ones.
- Only a very short segment of 11 samples is shown in the example, but the signal is much, much longer. One second of CD-quality audio is 44100 samples per channel. There are many samples before and after the shown segment. The sinc function spreads out horizontally (in time), and is theoretically infinitely wide. Its magnitude decays at a rate inversely proportional to the horizontal “distance” from its center peak. Therefore, even the sample 500 sampling periods before the shown segment (i.e. t = -500) or the sample 500 periods after (i.e. t = 510) will affect the interpolated waveform in the shown segment, if only slightly. We often hear that we need an infinitely long sinc function for the “perfect” interpolation. However, all our digitized signals are quantized. For example, at 16-bit resolution the least significant bit is 1/65536 of the full-scale range. We therefore don't really need infinitely long sinc functions, as their tails drop below the quantization noise floor within a finite distance. There are also clever ways (which we aren't getting into) to shape the sinc function to accelerate its decay; these cause only very minor degradation of the interpolation accuracy but greatly reduce the length of the convolution kernel.
- The discrete time signal to continuous time signal reconstruction method shown here cannot be practically implemented in electrical circuits, and therefore can't be used for an actual D to A converter. However, we can easily see that we can use this method for (integer or non-integer multiple) sampling rate conversion. Most over-sampling D/A converters over-sample in integer multiples of the sampling frequency. They use simpler and more efficient methods.
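To put a rough number on the comment above about quantization and sinc length: away from its peak, the normalized sinc is bounded by the envelope 1/(π|t|), so we can ask how far out that envelope falls below one least significant bit. The Python sketch below is a back-of-the-envelope estimate only. It assumes a sinc of unit peak height and an LSB of 1/2^bits, as in the 16-bit example; the function name is hypothetical, and it ignores how the tails of many samples add up.

```python
import math

def periods_until_below_lsb(bits):
    """Distance (in sampling periods) beyond which 1/(pi*|t|) < 1/2**bits."""
    return 2 ** bits / math.pi

distance_16 = periods_until_below_lsb(16)    # about 20861 sampling periods
seconds_at_cd_rate = distance_16 / 44100     # about 0.47 s at 44.1 kHz
```

So under these assumptions the unshaped sinc is finite in practice but still very long, which is one way to see why shaping it for faster decay is attractive.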