• Welcome to ASR. There are many reviews of audio hardware and expert members to help answer your questions.

Alternative reconstruction method for PCM upsampling: impulse response, spectrograms, and samples

Well, let’s see that… reconstruct a 19 kHz sine please!
A single sine is the best-case scenario for sinc, so I expect this method to show measurable distortion on that test. I'll do it, though, as soon as I find some free time.
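A few lines of code can show why 19 kHz is the hard case for a short local kernel. This sketch uses plain linear interpolation as a hypothetical stand-in for "a local method"; it is not the 8-point method discussed in this thread. After 4x upsampling, the first image of a 19 kHz tone sits at 44.1 - 19 = 25.1 kHz, only 6.1 kHz away from the tone, and a two-point kernel barely separates the two.

```python
import numpy as np

fs, L = 44_100, 4
n = np.arange(4410)                          # 0.1 s = exactly 1900 cycles of 19 kHz
x = np.sin(2 * np.pi * 19_000 * n / fs)

# Hypothetical local interpolator: linear interpolation, wrapped so the signal stays periodic
t = np.arange(len(x) * L) / L
y = np.interp(t, np.arange(len(x) + 1), np.append(x, x[0]))

spec = np.abs(np.fft.rfft(y))
bin_hz = fs * L / len(y)                     # 10 Hz per FFT bin
tone = spec[round(19_000 / bin_hz)]
image = spec[round((fs - 19_000) / bin_hz)]  # first image: 44.1 kHz - 19 kHz = 25.1 kHz
image_db = 20 * np.log10(image / tone)
print(f"25.1 kHz image vs 19 kHz tone: {image_db:.1f} dB")  # about -4.6 dB
```

An ideal sinc reconstruction leaves nothing at 25.1 kHz, so anything measured there is reconstruction error.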
 
I think for a good comparison, a flat-EQ'd version would be best.

Here's the composite, EQ'd in the FFT domain to match sinc's magnitude. The phase structure is preserved; only the magnitude is corrected. I haven't had a chance to A/B it on a proper setup yet, so I'm curious whether anyone hears a remaining difference vs. the sinc version.

Spectrally flattened composite, EQ'd in the FFT domain to match sinc's magnitude spectrum bin by bin. Verification (delta vs. sinc per band):
Sub-bass (20-60Hz): -0.02 dB
Bass (60-250Hz): +0.04 dB
Low-mid (250-2kHz): +0.03 dB
High-mid (2-6kHz): -0.00 dB
Presence (6-12kHz): +0.05 dB
Air (12-20kHz): +0.19 dB
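Here is a minimal sketch for anyone who wants to reproduce the flattening step. It does per-bin magnitude matching in the FFT domain and assumes one FFT over the whole file; the exact procedure used above isn't given. The output keeps the phase of `x` and takes the magnitude of `ref`. `match_magnitude` is a hypothetical helper name.

```python
import numpy as np

def match_magnitude(x: np.ndarray, ref: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return x with the per-bin FFT magnitude of `ref` but x's own phase (whole-file FFT)."""
    X = np.fft.rfft(x)
    R = np.fft.rfft(ref, n=len(x))           # same bin grid as x (ref cropped/zero-padded to len(x))
    mag = np.abs(X)
    # Unit phasors carrying x's phase; bins where x is (numerically) empty get phase 0
    phase = np.where(mag > eps, X / np.maximum(mag, eps), 1.0)
    return np.fft.irfft(np.abs(R) * phase, n=len(x))

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)                # stand-in for the composite output
ref = rng.standard_normal(4096)              # stand-in for the sinc output
flat = match_magnitude(x, ref)
```

A single whole-file FFT makes this a circular, zero-phase correction. Block-wise processing or a designed FIR would be the more conventional route for long material.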

https://drive.google.com/file/d/12dgelGk6rmJxT5hJmxNCmQAuTZJEfMiC/view sinc version
https://drive.google.com/file/d/1ehPvD_buNNZWwRayxh_IGUQDgjn9dyY7/view composite flattened
 
IMO the reconstruction method should mirror the sampling method - which is always sinc-based in downsamplers inside the oversampling ADC chips.
That's the mathematically clean position, and I don't disagree: sinc reconstruction is the theoretically exact inverse of the band-limited sampling process. What I'm exploring is whether there's a perceptually useful alternative path, similar to how some DACs use minimum-phase or short filters instead of the "correct" linear-phase sinc, trading theoretical perfection for different temporal behavior.

Not claiming it's better by the standard metric, just seeing if the tradeoff is interesting.
 
A fun exercise :)

foo_abx 2.2.1 report
foobar2000 v2.25.7
2026-03-10 21:30:20

File A: ROYALTY FREE By HilaryDrummer - The Jazz Lounge (Royalty Free Music) - 01 Mellow Mood_composite_4x_flattened.flac
SHA1: c16ae34db66329ab6db72f8cbea36fe00ba942c3
File B: ROYALTY FREE By HilaryDrummer - The Jazz Lounge (Royalty Free Music) - 01 Mellow Mood_sinc_4x.flac
SHA1: 30c48e2a100513ecc51fa5634a613211cff25ab2

Output:
Default : Primary Sound Driver
Crossfading: NO

21:30:20 : Test started.
21:30:47 : Test restarted.
21:30:47 : 00/01
21:31:25 : Test restarted.
21:31:25 : 01/02
21:31:41 : Test restarted.
21:31:41 : 02/03
21:31:56 : Test restarted.
21:31:56 : 03/04
21:32:18 : Test restarted.
21:32:18 : 04/05
21:32:46 : Test restarted.
21:32:46 : 05/06
21:32:59 : Test restarted.
21:32:59 : 06/07
21:33:13 : Test restarted.
21:33:13 : 07/08
21:33:27 : Test restarted.
21:33:27 : 08/09
21:33:38 : Test restarted.
21:33:38 : 09/10
21:33:38 : Test finished.

----------
Total: 9/10
p-value: 0.0107 (1.07%)

-- signature --
0c635fc01092216d4d4b93aedf17b17f3ff48434
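The p-value foo_abx prints is consistent with a one-sided binomial test: the chance of scoring at least that well by pure guessing (p = 0.5 per trial). Below is a minimal sketch in which `abx_p_value` is a hypothetical helper. It reproduces the figures in both ABX reports in this thread (9/10 above and the later 7/10).

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial test: P(at least `correct` hits out of `trials`) when guessing at p = 0.5."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

print(f"9/10: {abx_p_value(9, 10):.4f}")   # 9/10: 0.0107
print(f"7/10: {abx_p_value(7, 10):.4f}")   # 7/10: 0.1719
```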

DeltaWave PK Metric is -44.6 dBFS, RMS level is well matched (0.03 dB), and peak level shows about 0.13 dB difference. Delta of Spectra shows a small amplitude difference of about 0.5 dB above 6500 Hz.

Subjective: The composite version has some weird sizzle in the piano notes. Sounds unpleasant to me, but is less objectionable at lower volumes.
 

Thank you for doing this. Thanks to your feedback and feedback from others, I was able to locate a flaw in the approach and I think I've managed to fix it (the math was right, my implementation skewed it).

The interpolation is correct now and the spectral balance is much closer to flat. I'm preparing new samples and will update.

This exchange has been very worthwhile and helpful for me. I don't come from the DSP field originally, and the scrutiny here is exactly what this needed.

I will post the fixed samples and plots first, and then I will try your sinc implementation and test against it.

Thanks again.
 
Here are the regenerated plots and samples after fixing the reconstruction algorithm. Unfortunately, I had left some junk in the code that skewed everything. The plots, samples and data are correct now. Everything is uploaded to the same folder with the V2- prefix; the old files are left there for reference.
https://drive.google.com/drive/folders/1mpEib3wkGSQMkhKZ-LXbcFBdX5vlj1Z1?usp=sharing

I'm reposting the plots here:

V2 Impulse response (this one is the same):

V2-impulse_response_overlay.png



V2 Spectrogram:

V2-spectrogram_compare.png

V2 Sample level comparison:

V2-sample_level_comparison.png

V2 Spectral balance (delta vs. sinc):

Sub-bass (20–60 Hz): +0.00 dB
Bass (60–250 Hz): -0.00 dB
Low-mid (250 Hz–2 kHz): -0.02 dB
High-mid (2–6 kHz): -0.16 dB
Presence (6–12 kHz): -0.68 dB
Air (12–20 kHz): -0.98 dB
Ultrasonic (>20 kHz): +52.32 dB

Output is now spectrally close to sinc through 6 kHz, with a gentle rolloff above (-0.7 dB at presence, -1.0 dB at air). Correlation with sinc output: 0.997.
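Here is a sketch of how per-band deltas and the correlation figure can be computed. The band edges follow the table above, and the "Ultrasonic" band runs to Nyquist. It uses one whole-file FFT per signal, which is not necessarily how the posted numbers were produced; Welch-averaged spectra would be another reasonable choice. `band_deltas_db` and `correlation` are hypothetical helper names.

```python
import numpy as np

# Band edges in Hz, following the table above; "Ultrasonic" extends to the Nyquist frequency.
BANDS = {
    "Sub-bass": (20, 60), "Bass": (60, 250), "Low-mid": (250, 2_000),
    "High-mid": (2_000, 6_000), "Presence": (6_000, 12_000),
    "Air": (12_000, 20_000), "Ultrasonic": (20_000, np.inf),
}

def band_deltas_db(test: np.ndarray, ref: np.ndarray, fs: float) -> dict:
    """Energy of `test` relative to `ref` in each band, in dB."""
    f = np.fft.rfftfreq(len(ref), d=1 / fs)
    p_test = np.abs(np.fft.rfft(test, n=len(ref))) ** 2
    p_ref = np.abs(np.fft.rfft(ref)) ** 2
    out = {}
    for name, (lo, hi) in BANDS.items():
        sel = (f >= lo) & (f < hi)
        out[name] = 10 * np.log10(p_test[sel].sum() / p_ref[sel].sum())
    return out

def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation of two equally long sample streams."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(1)
ref = rng.standard_normal(176_400)           # 1 s of noise at 176.4 kHz as a stand-in reference
louder = ref * 10 ** (1 / 20)                # +1 dB in every band
print(band_deltas_db(louder, ref, 176_400), correlation(ref, louder))
```

Because a sinc reference has almost no energy above 20 kHz, the ultrasonic delta becomes very large, as in the table.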

Thanks again for helping me straighten this out.
 
Once the high-frequency content is low-pass filtered, you will still fall into the trap of sinc/ringing.

Any attempt to have both no ringing and band limiting is futile, because it goes against the mathematics.

In my opinion, your method is an attempt to find another way to "fill in" the high-frequency content above Nyquist, rather than preserving the mirror image (as a "slow" filter does).
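The ringing point is the Gibbs phenomenon, and it is easy to show numerically. Band-limit a sharp step with an ideal brickwall and the result overshoots above the step and undershoots below it, on both sides of the edge. The signal length and cutoff below are arbitrary choices for illustration.

```python
import numpy as np

N = 4096
x = np.zeros(N)
x[N // 4 : 3 * N // 4] = 1.0      # one period of a square wave: a step up and a step down

X = np.fft.rfft(x)
X[256:] = 0.0                      # ideal brickwall low-pass: keep only the lowest 256 bins
y = np.fft.irfft(X, n=N)

overshoot = y.max() - 1.0          # ringing above the top of the step
undershoot = -y.min()              # ringing below the bottom of the step
print(f"overshoot {100 * overshoot:.1f} %, undershoot {100 * undershoot:.1f} %")
```

Moving the cutoff changes how long the ringing lasts, not how tall it is. For any brickwall cutoff the peak tends toward roughly 9 % of the step height.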
 
Just looking at the waveforms, all those sharp peaks are a total nightmare. The good news is it's ultrasonics, so not audible, but this looks like truly shocking levels of distortion. When waveforms look like that in the audible band, they also sound like that. Metallic and harsh AF.

What does the impulse response look like when it's filtered back down to 44.1 kHz? My gut tells me that, as @OHtaru says, this is going to end up not far from where it started.
 
@OHtaru @kemmler3D You're both right, of course. Local polynomial interpolation is not band-limited, and yes, the waveform peaks above 20 kHz are real and wouldn't be there with a sinc kernel. No argument on either point.

But there's a method behind this. What I'm trying to establish is whether upsampling can produce something musically useful entirely through reconstruction, with no filtering step at all. The output is a 176.4 kHz file where every original 44.1 kHz sample is passed through exactly, and the interpolated samples between them (now that I've fixed the glitch I carelessly introduced) are computed from local weighted fits over an 8-point stencil. No post-processing, no cleanup.
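The weights of the 8-point local fit aren't given in the thread. As a hypothetical stand-in, here is the simplest 8-point local interpolator that also passes every original sample through exactly: a degree-7 Lagrange polynomial over samples n-3 .. n+4, evaluated at the three fractional positions needed for 4x. The sketch assumes a periodic signal so the edges can wrap. `lagrange_upsample` and `STENCIL` are illustrative names.

```python
import numpy as np

STENCIL = np.arange(-3, 5)   # the 8 input offsets around the interval [n, n + 1]: n-3 .. n+4

def lagrange_weights(t: float) -> np.ndarray:
    """Weights of the degree-7 polynomial through the 8 stencil points, evaluated at n + t."""
    w = np.ones(len(STENCIL))
    for j, sj in enumerate(STENCIL):
        for sm in STENCIL:
            if sm != sj:
                w[j] *= (t - sm) / (sj - sm)
    return w

def lagrange_upsample(x: np.ndarray, L: int = 4) -> np.ndarray:
    """Upsample a periodic signal by L using only the 8 nearest input samples per output."""
    shifted = [np.roll(x, -s) for s in STENCIL]   # shifted[j][n] == x[n + STENCIL[j]], wrapping
    y = np.empty(len(x) * L)
    for k in range(L):
        w = lagrange_weights(k / L)                # k = 0 gives weights (0,0,0,1,0,0,0,0)
        y[k::L] = sum(wj * xs for wj, xs in zip(w, shifted))
    return y

fs = 44_100
x = np.sin(2 * np.pi * 1_000 * np.arange(4410) / fs)   # 100 whole cycles, so wrapping is seamless
y = lagrange_upsample(x)
exact = np.sin(2 * np.pi * 1_000 * np.arange(len(y)) / (4 * fs))
print(np.array_equal(y[::4], x), np.max(np.abs(y - exact)))
```

Each output sample depends only on eight neighbours, which is the "local" property being discussed. The kernel has finite support and is therefore not band-limited.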

Why does that matter? Because if the reconstruction itself is good enough to stand on its own, it opens the door to a different kind of signal processing downstream. You'd have a clean, unfiltered, high-rate signal that you can then EQ, room-correct, or run through any DSP chain without fighting a filter that's already baked in. The filtering becomes a choice you make later, not something the upsampler decided for you.

The specific things I'm trying to find out:
- Whether local reconstruction gives tighter time-domain accuracy on transients than a global sinc kernel, particularly for downstream stereo signal adaptation such as room correction, where time-domain precision matters

- Whether busy, contested passages reconstruct more naturally when interpolation only looks at the local neighborhood instead of pulling in distant samples

- Whether having a tunable amount of ultrasonic content (bounded by PNC constraints, not just left wild) is a useful design variable

Whether any of this produces a perceptible difference (or anything I can work with) is genuinely the open question, I don't know yet. But these are the stakes, not experimentation for its own sake.

@kemmler3D, I actually already ran the filter-back-to-44.1 kHz test multiple times while building this. It does come back close to where it started, as you said, but with a small amount of distortion. That's expected: the ultrasonic content that wasn't in the original folds back in. It's minor, but it's there, and it confirms the output isn't just a reformatted sinc result.
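Here is the filter-back-to-44.1 kHz test sketched end to end. Two stand-ins are assumptions, not the code used here. A 4-point Catmull-Rom spline plays the local interpolator, and an ideal FFT brickwall plays the band-limiting filter before decimating by 4. The printed figure is the residual relative to the input for a few test tones.

```python
import numpy as np

def catmull_rom_4x(x: np.ndarray) -> np.ndarray:
    """Hypothetical local interpolator: 4-point Catmull-Rom, periodic ends, 4x output."""
    L = 4
    p0, p1, p2, p3 = np.roll(x, 1), x, np.roll(x, -1), np.roll(x, -2)
    y = np.empty(len(x) * L)
    for k in range(L):
        t = k / L
        y[k::L] = 0.5 * (2 * p1 + (p2 - p0) * t
                         + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                         + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)
    return y

def brickwall_decimate(y: np.ndarray, L: int) -> np.ndarray:
    """Keep only content below the original Nyquist frequency (ideal FFT brickwall), then decimate by L."""
    M = len(y) // L
    Y = np.fft.rfft(y)[: M // 2 + 1]
    return np.fft.irfft(Y, n=M) / L

def round_trip_error_db(f_hz: float, fs: int = 44_100, n: int = 4410) -> float:
    """Residual of 44.1k -> 4x local interpolation -> back to 44.1k, in dB relative to the input."""
    x = np.sin(2 * np.pi * f_hz * np.arange(n) / fs)   # whole number of cycles, so the FFT is exact
    back = brickwall_decimate(catmull_rom_4x(x), 4)
    return 20 * np.log10(np.sqrt(np.mean((back - x) ** 2)) / np.sqrt(np.mean(x ** 2)))

for f in (1_000, 10_000, 19_000):
    print(f"{f:>6} Hz: {round_trip_error_db(f):7.1f} dB")
```

With an ideal brickwall the ultrasonic images are simply removed. What remains is the interpolator's own in-band error, which grows toward 20 kHz. A real decimation filter adds its own passband ripple on top.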

The point of doing reconstruction this way rather than just turning off a sinc filter is that the interpolation method is designed to work without one, it's not a band-limited method with its safety net removed.
 
I think this is a bit of a misunderstanding. There is no post-processing or cleanup when using a regular reconstruction filter. You zero-pad the original stream and pass it through the filter function, that's it. In practical implementations, this is done in a single step numerically. This is no different from the approach of calculating the points between samples by any other method in one step.

Just because it's called a "filter" doesn't mean it works like the air filter on a vacuum cleaner, where the filter is optional and the vacuuming is the goal. Filtering is not a post-processing step; it's a convolution of the (zero-stuffed) original signal that results in the upsampled data stream.
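The point that "zero-stuff, then filter" is a single convolution rather than an extra stage can be checked numerically. The sketch computes the same 4x upsampling twice. The first version is literal: insert zeros, then convolve with a windowed-sinc kernel. The second is the polyphase form, where each in-between sample is computed directly from neighbouring input samples, like any other interpolator. The Kaiser-windowed sinc and its parameters are arbitrary assumptions, not a filter used in this thread.

```python
import numpy as np

def windowed_sinc(L: int = 4, half_taps: int = 64, beta: float = 8.0) -> np.ndarray:
    """Kaiser-windowed sinc for factor L (odd length, centred). h[centre] == 1 and
    h[centre + m*L] ~ 0 for m != 0, so the original samples pass through."""
    n = np.arange(-half_taps, half_taps + 1)
    return np.sinc(n / L) * np.kaiser(len(n), beta)

def upsample_zero_stuff(x: np.ndarray, h: np.ndarray, L: int) -> np.ndarray:
    """Textbook form: insert L-1 zeros between samples, then convolve once with h."""
    u = np.zeros(len(x) * L)
    u[::L] = x
    return np.convolve(u, h, mode="same")

def upsample_polyphase(x: np.ndarray, h: np.ndarray, L: int) -> np.ndarray:
    """Same result without storing zeros: output phase k is a short FIR applied to x."""
    c = len(h) // 2
    idx = np.arange(len(h))
    y = np.empty(len(x) * L)
    for k in range(L):
        sel = idx[(idx - c - k) % L == 0]   # taps that line up with real (non-zero) input samples
        d_min = (sel[0] - c - k) // L       # most negative input offset this phase reaches
        full = np.convolve(x, h[sel])
        y[k::L] = full[-d_min : -d_min + len(x)]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
h = windowed_sinc(L=4)
a = upsample_zero_stuff(x, h, 4)
b = upsample_polyphase(x, h, 4)
print(np.allclose(a, b), np.allclose(a[::4], x))   # True True
```

Because this kernel is zero at every multiple of L except the centre, the original samples pass through unchanged here as well. The difference from a local method lies in the kernel, not in the number of steps.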

Again: The filtering is not an optional step, it is the upsampler. It does stand on its own and (in case of sinc / linear phase fast roll-off) is the mathematically most faithful reconstruction of that signal.

Well, based on my last ABX before the update, there was a perceptible difference and not for the better :confused:

I am speculating here, but it could be that the sharp peaks in the output cause some form of artifacting when run through a regular DAC with a regular reconstruction filter. Possibly inter-sample overs, because the waveform is almost "illegal" in parts, specifically because it's not band-limited. It sounded sizzly, with some weird ultra-short ticks in there.
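One way to check whether a file contains inter-sample overs is to oversample with a band-limited interpolator and compare peaks, which is roughly what true-peak meters do. This minimal sketch uses FFT zero-padding. It treats the signal as one period, so it suits a test tone or a windowed excerpt rather than a whole track. The demo uses the classic fs/4 tone whose samples all miss the crests; `fft_oversample` is a hypothetical helper name.

```python
import numpy as np

def fft_oversample(x: np.ndarray, L: int = 8) -> np.ndarray:
    """Band-limited interpolation by zero-padding the spectrum (treats x as one period)."""
    N = len(x)
    X = np.fft.rfft(x)
    if N % 2 == 0:
        X[-1] *= 0.5                          # share the Nyquist bin between +fs/2 and -fs/2
    Y = np.zeros(N * L // 2 + 1, dtype=complex)
    Y[: len(X)] = X
    return np.fft.irfft(Y, n=N * L) * L       # rescale for the longer inverse FFT

n = np.arange(4096)
x = np.sin(np.pi * n / 2 + np.pi / 4)         # fs/4 tone: every sample lands at +/-0.7071
sample_peak = np.max(np.abs(x))
true_peak = np.max(np.abs(fft_oversample(x)))
over_db = 20 * np.log10(true_peak / sample_peak)
print(f"sample peak {sample_peak:.4f}, true peak {true_peak:.4f}, over {over_db:.2f} dB")
# sample peak 0.7071, true peak 1.0000, over 3.01 dB
```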
 
Ah, this is valuable. You're right, there probably is a misunderstanding on my part. I was using "filter" loosely. What I mean is that sinc reconstruction band-limits via a global kernel (or at least one much bigger than mine), and I'm replacing that with a local kernel that doesn't band-limit. Both are actually single-step interpolation, just with different kernels, and I was treating band-limiting as filtering, which muddied my framing. Messy terminology on my part.

The sizzle and ticks you heard could be V1's bug (original samples were being modified, which is fixed in V2), or it could be the DAC reacting to non-band-limited content, as you suggest. I don't hear sizzle on either version on my setup, but different DACs can handle this differently. V2 now passes through every original sample exactly and constrains the interpolated values. I've regenerated the same Jazz Lounge track with V2 if you want to run another ABX: raw output, no EQ flattening, same source file:
https://drive.google.com/file/d/1QXhnIYL4MlnxUYJm9HViOtdnrrw_4YhM/view v2 sinc version
https://drive.google.com/file/d/1-5qmt6Wt6Ld4H9N-Prckj3Lmgz3Dchzs/view v2 composite version

No flattening this time; V2's spectral balance is close enough that it shouldn't need it.
 
Significantly more difficult to hear a difference with V2:
foo_abx 2.2.1 report
foobar2000 v2.25.7
2026-03-11 15:03:42

File A: V2-ROYALTY FREE By HilaryDrummer - The Jazz Lounge (Royalty Free Music) - 01 Mellow Mood_sinc_4x.flac
SHA1: 30c48e2a100513ecc51fa5634a613211cff25ab2
File B: V2-ROYALTY FREE By HilaryDrummer - The Jazz Lounge (Royalty Free Music) - 01 Mellow Mood_composite_4x.flac
SHA1: 43545fc7c2e718e9b5c99071493c52692a21942f

Output:
Default : Primary Sound Driver
Crossfading: NO

15:03:42 : Test started.
15:04:16 : Test restarted.
15:04:16 : 01/01
15:05:23 : Test restarted.
15:05:23 : 02/02
15:06:04 : Test restarted.
15:06:04 : 02/03
15:06:31 : Test restarted.
15:06:31 : 03/04
15:07:13 : Test restarted.
15:07:13 : 04/05
15:08:22 : Test restarted.
15:08:22 : 05/06
15:08:40 : Test restarted.
15:08:40 : 06/07
15:09:22 : Test restarted.
15:09:22 : 06/08
15:09:36 : Test restarted.
15:09:36 : 07/09
15:09:56 : Test restarted.
15:09:56 : 07/10
15:09:56 : Test finished.

----------
Total: 7/10
p-value: 0.1719 (17.19%)

-- signature --
d38fa518bfa49755a1ef883825739054f34c902c

I'm pretty sure the small level difference (0.5 dB above 6.5 kHz, about 1 dB at 18 kHz) is what remains audible. You can see from the correct guesses that my ears were getting tired around trial 7 listening for any clues. I think that there is a tiny bit of added sizzle left in the composite version which is just barely audible, but that might be my imagination. As the PK metric of -60.1 dB (average) and -47 dB (peak) shows, the differences are small overall - even a good loopback recording only hits an average of about -80 dB. The DW Spectrum of Delta is below -83 dB for the full range.

That being said, if it ends up sounding the same but comes with a lot of ultrasonic stuff potentially causing problems like IMD in some setups, why risk using it?
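A toy model illustrates the IMD concern. Two inaudible tones at 24 and 25 kHz pass through a mildly nonlinear stage (y = x + a·x²), and a 1 kHz difference tone lands in the audible band. The tone levels and the coefficient `a2` are arbitrary assumptions, not measurements of any real amplifier or tweeter.

```python
import numpy as np

fs = 176_400
t = np.arange(fs) / fs                        # 1 s, so FFT bins are exactly 1 Hz apart
A, a2 = 0.5, 0.01                             # tone amplitude and hypothetical 2nd-order coefficient
x = A * (np.cos(2 * np.pi * 24_000 * t) + np.cos(2 * np.pi * 25_000 * t))  # both tones ultrasonic
y = x + a2 * x ** 2                           # weakly nonlinear stage

amp = 2 * np.abs(np.fft.rfft(y)) / len(y)     # single-sided amplitude spectrum
print(f"1 kHz difference tone: {20 * np.log10(amp[1000]):.1f} dBFS")  # -52.0 dBFS
```

The x² term contains a·A²·cos(2π·1000·t), so the difference tone's level follows directly from the nonlinearity and the ultrasonic level. How nonlinear a given chain is at those frequencies is what varies between setups.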
 
Really interesting to follow this thread. I have absolutely zero knowledge but I'm enjoying the discussion nonetheless.
 
That's not a bad result in my eyes, and a perfectly fine question :)) 7/10 at p = 0.17 is basically where I'd hope to be: close enough that the difference is marginal. The remaining audible cue is probably the ~1 dB rolloff above 6 kHz, which I can probably tighten.

As for why risk it: mostly because this potentially opens a door for me to do more with the signal itself. It gives me a different kind of signal to work with downstream, specifically for room correction and spatial processing. Whether that pans out is still an open question, but that's the motivation.

And as always, I'm immensely grateful for your time here.
 
Your sinc result is actually quite poor, btw.
Coefficients are attached.
I have implemented @RandomEar's sinc lin-fast 512-tap filter as the comparison baseline. The coefficients are natively designed for 8x, so I zero-stuff by 8, convolve, and decimate by 2 to get the 4x output. That should be faithful to the filter's intended operation, if I did this right.
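Here is a sketch of the 8x-to-4x procedure described above: zero-stuff by 8, convolve, keep every second sample. RandomEar's coefficients aren't reproduced; `h8` is a hypothetical odd-length windowed sinc designed for 8x. Two things are worth verifying with the real 512-tap set.

- **Gain.** Zero-stuffing by 8 divides the level by 8, so the taps need a DC gain of 8 unless they were already scaled that way. The sketch normalises them, which is an assumption.
- **Delay.** A linear-phase FIR delays by (N - 1)/2 samples. For 512 taps that is 255.5 samples at 8x, so an even-length filter leaves a half-sample (8x) offset that an integer trim cannot remove.

```python
import numpy as np

def upsample_4x_via_8x(x: np.ndarray, h8: np.ndarray) -> np.ndarray:
    """Zero-stuff by 8, convolve with an 8x interpolation filter, keep every 2nd sample (-> 4x)."""
    h8 = h8 * (8.0 / h8.sum())            # assumption: normalise to DC gain 8 (undoes the zero-stuffing loss)
    u = np.zeros(len(x) * 8)
    u[::8] = x                            # zero-stuffing by 8
    y8 = np.convolve(u, h8)               # 'full' convolution at the 8x rate
    d = (len(h8) - 1) // 2                # integer part of the linear-phase delay (N - 1) / 2
    return y8[d : d + len(u)][::2]        # trim the delay, then decimate 8x -> 4x

# Hypothetical 8x kernel: Kaiser-windowed sinc, 257 taps (odd length, so the delay is an integer)
taps = np.arange(-128, 129)
h8 = np.sinc(taps / 8) * np.kaiser(len(taps), 8.0)

fs = 44_100
x = 0.5 * np.sin(2 * np.pi * 1_000 * np.arange(441) / fs)
y4 = upsample_4x_via_8x(x, h8)
print(len(y4), np.max(np.abs(y4[::4] - x)))   # original instants land on every 4th output sample
```

Computing the full 8x stream and discarding half of it is wasteful but harmless. A polyphase version would compute only the kept phases.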

This is on the previous 19 kHz file. Does this look any better now?

composite_vs_sinc.png
 
Here is also the sample-level comparison plot from before, now using @RandomEar's filter. Hope I got all that right.

sample_level_comparison.png
 