
Alternative reconstruction method for PCM upsampling: impulse response, spectrograms, and samples

tmilovan

Member
Joined
Mar 9, 2026
Messages
25
Likes
5
**UPDATE: V2 samples and plots posted below (post #27). The original version had a processing error that skewed the spectral balance. Corrected version is flat through 6 kHz with a gentle HF rolloff.**

Hi all,

I've been working on a reconstruction method for PCM upsampling that takes a different approach from sinc-based interpolation. Instead of band-limited reconstruction, it uses locally adaptive polynomial fitting: each inter-sample region gets its own curve based on the surrounding sample geometry. The result is nonlinear and input-dependent, so it doesn't have a fixed impulse response in the traditional sense. It works in the time domain: the inter-sample curve is fitted directly from sample values, not derived from a band-limiting assumption.

I'm sharing measurements and listening samples. I'm curious what you see in the data and what concerns you'd have.

Impulse Response Comparison

V2-impulse_response_overlay.png


Difference:
Top: Impulse response. This method (Composite, green) shows no pre-ringing and faster settling, but a sharper post-impulse dip. Bottom: Amplitude response. Both methods are closely matched through the audible band. Composite does not enforce a band limit, so spectral content extends above 22 kHz. This is a design choice, not an oversight (see note on ultrasonic content below).


Spectrogram Comparison

Source: 16-bit/44.1 kHz → 24-bit/176.4 kHz (4× upsampling)

The method generates spectral content above the original 22.05 kHz Nyquist. Sinc does not, by design. The ultrasonic content correlates with the harmonic series of the source material — it's not noise or random artifacts.

V2-spectrogram_compare.png


Note on ultrasonic content: This can be filtered out with a standard low-pass at 22 kHz post-reconstruction. The samples here are unfiltered to better showcase what the reconstruction actually produces. The audible-band behavior is preserved either way.
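
A minimal sketch of such a post-filter, assuming SciPy. Linear interpolation is used as a stand-in signal source because it also leaves content above the source Nyquist; it is not the method discussed in this thread. The helper names, the 20 kHz / 24.1 kHz band edges and the 100 dB target are illustrative assumptions.

```python
import numpy as np
from scipy import signal

FS_IN, UP = 44_100, 4
FS_OUT = FS_IN * UP

def ultrasonic_ratio_db(y, fs=FS_OUT, edge_hz=22_050.0):
    """Energy above edge_hz relative to energy below it, in dB."""
    spec = np.abs(np.fft.rfft(y * np.hanning(len(y)))) ** 2
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    return 10 * np.log10(spec[freqs > edge_hz].sum() / spec[freqs <= edge_hz].sum())

def lowpass_22k(y, fs=FS_OUT, atten_db=100.0):
    """Linear-phase FIR low-pass: passband to 20 kHz, stopband from 24.1 kHz."""
    numtaps, beta = signal.kaiserord(atten_db, (24_100 - 20_000) / (fs / 2))
    numtaps |= 1  # odd length so the group delay is a whole number of samples
    taps = signal.firwin(numtaps, 22_050, window=("kaiser", beta), fs=fs)
    return signal.fftconvolve(y, taps, mode="same")  # "same" compensates the delay

# Stand-in signal: a 5 kHz tone at 44.1 kHz, upsampled 4x by linear interpolation.
n = FS_IN  # one second of source audio
x = np.sin(2 * np.pi * 5_000 * np.arange(n) / FS_IN)
t_out = np.arange((n - 1) * UP + 1) / FS_OUT
y = np.interp(t_out, np.arange(n) / FS_IN, x)

before_db = ultrasonic_ratio_db(y)
after_db = ultrasonic_ratio_db(lowpass_22k(y))
print(f"ultrasonic energy: {before_db:.1f} dB before, {after_db:.1f} dB after")
```

Comparing the filtered and unfiltered waveforms sample by sample (not only their spectra) would show how much of the inter-sample shape depends on the ultrasonic content.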


Spectral Band Analysis (Method vs Sinc)


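| Band | Frequency range | Delta (dB) |
|---|---|---|
| Sub-bass | 20–60 Hz | +0.00 |
| Bass | 60–250 Hz | -0.00 |
| Low-mid | 250 Hz–2 kHz | -0.02 |
| High-mid | 2–6 kHz | -0.16 |
| Presence | 6–12 kHz | -0.68 |
| Air | 12–20 kHz | -0.98 |
| Ultrasonic | >20 kHz | +52.32 |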

Output is spectrally close to sinc through 6 kHz, with a gentle rolloff above (-0.7 dB in the presence band, -1.0 dB in the air band). Correlation with sinc output: 0.997.
Clipping: 1,503 samples out of 50M+ (0.003%).
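
For anyone who wants to reproduce numbers of this kind on the shared files, here is one possible way to compute per-band deltas, correlation and a clipping count. It is a sketch under assumptions: the thread does not say how the figures above were obtained, so the Welch PSD estimate, the segment length and all function names are my own choices.

```python
import numpy as np
from scipy import signal

BANDS_HZ = {
    "Sub-bass": (20, 60), "Bass": (60, 250), "Low-mid": (250, 2_000),
    "High-mid": (2_000, 6_000), "Presence": (6_000, 12_000), "Air": (12_000, 20_000),
}

def band_delta_db(test, ref, fs, bands=BANDS_HZ, nperseg=8192):
    """Per-band power of `test` relative to `ref`, in dB (Welch PSD estimate)."""
    freqs, p_test = signal.welch(test, fs=fs, nperseg=nperseg)
    _, p_ref = signal.welch(ref, fs=fs, nperseg=nperseg)
    deltas = {}
    for name, (lo, hi) in bands.items():
        in_band = (freqs >= lo) & (freqs < hi)
        deltas[name] = 10 * np.log10(p_test[in_band].sum() / p_ref[in_band].sum())
    return deltas

def correlation(test, ref):
    """Pearson correlation between two equally long signals."""
    return float(np.corrcoef(test, ref)[0, 1])

def clipped_fraction(x, full_scale=1.0):
    """Fraction of samples at or beyond full scale."""
    return np.count_nonzero(np.abs(x) >= full_scale) / len(x)

# Synthetic check: white noise versus a copy attenuated by exactly 1 dB.
rng = np.random.default_rng(0)
ref = rng.standard_normal(2 * 176_400) * 0.1
test = ref * 10 ** (-1 / 20)
deltas = band_delta_db(test, ref, fs=176_400)
for name, value in deltas.items():
    print(f"{name:9s} {value:+.2f} dB")
```

With real files, `test` and `ref` would be the two 176.4 kHz renders of the same track, time-aligned and truncated to equal length.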

Sample-level plots added in V2

V2-sample_level_comparison.png

Black dots are the original 44.1 kHz samples and are now drawn properly.

Listening Samples
Four tracks, all CC-BY licensed. Each folder contains: original 16/44.1, sinc 4×, this method 4×. Look at files prefixed with V2.

https://drive.google.com/drive/folders/1mpEib3wkGSQMkhKZ-LXbcFBdX5vlj1Z1?usp=sharing

I'm not claiming this sounds "better". I'm saying it sounds different, and I'd like to understand why from a measurement perspective. If you hear no difference level-matched, that's a valid and useful result.

What it is
  • Locally adaptive — different reconstruction paths between different pairs of samples
  • Nonlinear overall (though each local reconstruction is a polynomial, so locally smooth)
  • Input-dependent — no fixed impulse response, no fixed frequency response
  • Deterministic — same input always produces same output, no randomness

What it is not
  • Not a filter in the LTI sense — concepts like "minimum phase" or "linear phase" don't directly apply
  • Not AI/ML — no neural networks, no training data
  • Not just EQ on top of sinc — the spectral shaping emerges from the reconstruction weights, not from a separate filter chain
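
The properties in the two lists above can be illustrated with a well-known stand-in. SciPy's `PchipInterpolator` is a piecewise cubic whose slopes depend on the local shape of the data, so it is deterministic, passes through the samples, and is nonlinear. It is not the method from this post, whose weights are not published here. The sketch checks superposition, which any LTI filter (and a plain cubic spline) satisfies and a data-dependent interpolator does not.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

x = np.arange(8.0)
impulse = np.array([0, 0, 0, 1, 0, 0, 0, 0], dtype=float)
ramp = x.copy()
t = np.linspace(0, 7, 7 * 4 + 1)  # 4x denser grid that includes the samples

def superposition_error(interp_cls):
    """max |I(a + b) - (I(a) + I(b))| over the dense grid."""
    combined = interp_cls(x, impulse + ramp)(t)
    separate = interp_cls(x, impulse)(t) + interp_cls(x, ramp)(t)
    return float(np.max(np.abs(combined - separate)))

linear_err = superposition_error(CubicSpline)          # fixed rule, linear in the data
adaptive_err = superposition_error(PchipInterpolator)  # slopes depend on local shape
print(f"cubic spline: {linear_err:.2e}   PCHIP: {adaptive_err:.2e}")

# Deviation from the source samples at their own positions (zero for PCHIP).
passes_through = np.max(np.abs(PchipInterpolator(x, impulse + ramp)(x) - (impulse + ramp)))
```

A nonzero superposition error is why a single impulse response plot cannot fully characterise this class of reconstruction: the response to an impulse does not predict the response to an impulse riding on other material.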

What do you see in the impulse response? What would concern you?

Thanks for reading all this. If any of this caught your eye, even just the IR plot, I'd appreciate your take. And if you're up for it: download, level-match, ABX. Null results are just as valuable as positive ones. Happy to provide any additional measurements the community would find useful.

 

Short feedback:
  • Pre-ringing is not an audible concern
  • "Locally adaptive" sounds good, but it's also way less predictable, which is bad
  • The impulse response looks very "choppy" (gut feeling: bad)
  • The impulse response is qualitatively similar to minimum phase slow roll off filters and so is the drooping FR. A non-flat FR should really be left to EQ and not be inherent to a reconstruction filter
  • The ultrasonics may correlate with the audible band, but from looking at the presented graphs, that's probably because it's unsuppressed mirroring (bad)
  • Clearly, there's a drop of 1.35 dB from 20 Hz to 22 kHz (bad), so simple level matching will not work and won't deliver comparable results. You need a multiband EQ to bring the amplitude response back to flat for any fair comparison. Otherwise, there will always be audible differences.
  • The Sinc filter in your comparison is really bad: just -40 dB at 25 kHz is about a factor of 1000 (or 60 dB) less than it should be
I would like to see some sample-level plots of real audio upsampled with this. Show some boring sections, but also some impulse-like sharp gradients (+/-). How does the filter react?
 
Short feedback:
  • Pre-ringing is not an audible concern
Agreed, pre-ringing at -60 dB or below is almost certainly inaudible on its own. The differences between the methods are more complex than pre-ringing alone. I'm not positioning this as a "pre-ringing fix."

  • "Locally adaptive" sounds good, but it's also way less predictable, which is bad
Understood. Predictability is a feature of LTI systems: you know exactly what you're getting for any input. This method trades that for input-dependence. Whether that tradeoff is worth it is the question I'm trying to answer.

  • The impulse response looks very "choppy" (gut feeling: bad)
I see what you mean. It's not the smooth ringing pattern you'd expect from a windowed sinc. The shape is a direct consequence of the polynomial fitting, which doesn't enforce smoothness constraints beyond the local window.

  • The impulse response is qualitatively similar to minimum phase slow roll off filters and so is the drooping FR. A non-flat FR should really be left to EQ and not be inherent to a reconstruction filter
The amplitude shape is superficially similar, but the mechanism is different: a minimum-phase filter has a fixed IR that's the same for every input, whereas this adapts per gap. You're right that a non-flat FR is a valid criticism. I could post-EQ to flat and the reconstruction differences would still be there. Would a flat-EQ'd comparison be useful to you?

  • The ultrasonics may correlate with the audible band, but from looking at the presented graphs, that's probably because it's unsuppressed mirroring (bad)
Could be. From a strict signal theory view, anything above Nyquist that wasn't in the source is an artifact. As noted, this content can be lowpassed out; the audible-band reconstruction is independent of the ultrasonic content.

  • Clearly, there's a drop of 1.35 dB from 20 Hz to 22 kHz (bad), so simple level matching will not work and won't deliver comparable results. You need a multiband EQ to bring the amplitude response back to flat for any fair comparison. Otherwise, there will always be audible differences.
Yes, there is a 1.35 dB tilt, and you're right that RMS matching alone doesn't account for the spectral shape difference. A per-band matched comparison would be more rigorous. I'll try to put one together.
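
One way a per-band correction could be built is a linear-phase FIR designed with `scipy.signal.firwin2` from the band deltas posted in the first message. This is a sketch under assumptions: the gains simply negate the posted deltas, the band centres are geometric means of the band edges, and the tap count and names are arbitrary choices of mine. A tilt that varies with the input cannot be fully undone by any fixed EQ.

```python
import numpy as np
from scipy import signal

FS_OUT = 176_400

# (band centre in Hz, gain in dB that undoes the posted delta for that band)
CORRECTION = [(35, 0.00), (122, 0.00), (707, 0.02), (3_464, 0.16),
              (8_485, 0.68), (15_492, 0.98)]

def design_flattening_eq(points=CORRECTION, fs=FS_OUT, numtaps=1023):
    """Linear-phase FIR whose gain follows the correction points."""
    freqs = [0.0] + [f for f, _ in points] + [fs / 2]
    gains_db = [points[0][1]] + [g for _, g in points] + [points[-1][1]]
    gains = 10 ** (np.asarray(gains_db) / 20)
    return signal.firwin2(numtaps, freqs, gains, fs=fs)

def gain_db_at(taps, freq_hz, fs=FS_OUT):
    """Magnitude response of the FIR at a single frequency, in dB."""
    _, h = signal.freqz(taps, worN=np.array([freq_hz]), fs=fs)
    return float(20 * np.log10(np.abs(h[0])))

taps = design_flattening_eq()
print(f"{gain_db_at(taps, 100):+.2f} dB at 100 Hz, "
      f"{gain_db_at(taps, 15_492):+.2f} dB at 15.5 kHz")
```

The taps would be applied to the composite render only (for example with `scipy.signal.fftconvolve(y, taps, mode="same")`), followed by ordinary RMS level matching against the sinc render.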

  • The Sinc filter in your comparison is really bad: just -40 dB at 25 kHz is about a factor of 1000 (or 60 dB) less than it should be
What filter would you recommend as a proper baseline? I used scipy's resample_poly. If there's a better reference implementation I should be comparing against, I'd like to redo the comparison.
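
One option that stays inside SciPy is to keep `resample_poly` but hand it a longer low-pass through its `window` argument, which accepts FIR coefficients as well as a window specification. The sketch below designs a Kaiser-windowed sinc and compares its stopband with a filter built to mimic the default. The 120 dB target and the band edges are assumptions, and the default's tap count (2 × 10 × up + 1) is my reading of the SciPy source, so check it against your installed version.

```python
import numpy as np
from scipy import signal

FS_IN, UP = 44_100, 4
FS_OUT = FS_IN * UP

def long_kaiser_lowpass(atten_db=120.0, pass_hz=20_000.0, stop_hz=24_100.0):
    """Kaiser-windowed sinc at the output rate, unity gain at DC."""
    numtaps, beta = signal.kaiserord(atten_db, (stop_hz - pass_hz) / (FS_OUT / 2))
    numtaps |= 1  # odd length: symmetric type I filter
    cutoff_hz = 0.5 * (pass_hz + stop_hz)
    return signal.firwin(numtaps, cutoff_hz, window=("kaiser", beta), fs=FS_OUT)

def worst_gain_db(taps, from_hz=25_000.0):
    """Largest gain at or above from_hz, in dB."""
    freqs, h = signal.freqz(taps, worN=16_384, fs=FS_OUT)
    return float(20 * np.log10(np.max(np.abs(h[freqs >= from_hz]))))

# Mimic of the resample_poly default: Kaiser beta 5 with 2*10*UP + 1 taps (assumed).
default_taps = signal.firwin(2 * 10 * UP + 1, 1.0 / UP, window=("kaiser", 5.0))
long_taps = long_kaiser_lowpass()
default_db, long_db = worst_gain_db(default_taps), worst_gain_db(long_taps)
print(f"worst gain above 25 kHz: default {default_db:.1f} dB, long {long_db:.1f} dB")

# resample_poly scales the supplied coefficients by UP itself, so pass unity-gain taps.
x = np.sin(2 * np.pi * 1_000 * np.arange(FS_IN // 10) / FS_IN)
y = signal.resample_poly(x, UP, 1, window=long_taps)
```

Dedicated resamplers may well be a better reference still; this only shows that the weak stopband comes from the short default design, not from sinc interpolation as such.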

I would like to see some sample-level plots of real audio upsampled with this. Show some boring sections, but also some impulse-like sharp gradients (+/-). How does the filter react?
Good suggestion, I'll put together sample-level plots of a few sections: a sustained note, a sharp transient (snare hit), and a quiet passage. Will post as soon as I catch some time for that.
 
HQPlayer comes with a couple of polynomial interpolating filters, if memory serves.
 
Here are sample-level plots as requested: three sections from the same track (Bill Frisell, "Isfahan"), composite 4× vs sinc 4×. Black dots are the original 44.1 kHz samples. Right column should show the difference (composite/sinc).

Sustained note (top): Nearly identical. RMS difference 0.0014.

Sharp transient (middle): Visibly different reconstruction paths between samples. RMS difference 0.077, peak 0.25. This is where the method diverges most: the polynomial fitting takes a different path through the same sample points than sinc does.

Quiet passage (bottom): Clean I guess. RMS difference 0.0003. No artifacts injected at low signal levels.

The black dots confirm both methods pass through the same original samples; the difference is entirely in how they connect the dots.

UPDATE: overestimated my plotting capabilities, removed the black dots from the plot.
sample_level_comparison.png
 
HQPlayer comes with a couple of polynomial interpolating filters, if memory serves.

Yes, I'm aware HQPlayer offers polynomial interpolation options alongside its many other features. I'm not sure of the specifics of its implementation. The approach here probably differs in that the reconstruction is locally adaptive, with different weights for each inter-sample gap based on the surrounding sample geometry, rather than a fixed polynomial interpolation kernel applied uniformly (which I suppose is what HQPlayer uses). But you're right to point out that polynomial-based reconstruction is not a new thing.
 
The amplitude shape is superficially similar, but the mechanism is different: a minimum-phase filter has a fixed IR that's the same for every input, whereas this adapts per gap. You're right that a non-flat FR is a valid criticism. I could post-EQ to flat and the reconstruction differences would still be there. Would a flat-EQ'd comparison be useful to you?
I think for a good comparison, a flat EQd version would be best.

Could be. From a strict signal theory view, anything above Nyquist that wasn't in the source is an artifact. I've noted these can be lowpassed out, the audible-band reconstruction is independent of the ultrasonic content.
You could lowpass, but the result would look very different. Try it on your "Isfahan" sample.

What filter would you recommend as a proper baseline? I used scipy's resample_poly, if there's a better reference implementation I should be comparing against, I'd like to redo the comparison.
I don't know what's available in scipy. I recently designed some filters "by hand" with 512 taps. If you are interested, I could give you the coefficients. They are designed to be applied to 8x upsampled data, though, and I'm sure there are more/better standard implementations out there.
 
Here are sample-level plots as requested: three sections from the same track (Bill Frisell, "Isfahan"), composite 4× vs sinc 4×. Black dots are the original 44.1 kHz samples. Right column should show the difference (composite/sinc).

Sustained note (top): Nearly identical. RMS difference 0.0014.

Sharp transient (middle): Visibly different reconstruction paths between samples. RMS difference 0.077, peak 0.25. This is where the method diverges most: the polynomial fitting takes a different path through the same sample points than sinc does.

Quiet passage (bottom): Clean I guess. RMS difference 0.0003. No artifacts injected at low signal levels.

The black dots confirm both methods pass through the same original samples; the difference is entirely in how they connect the dots.
Pretty wild, what's going on there :oops:

sample_level_comparison_crop.png

I've marked the most interesting regions in blue. The composite filter just decides to ignore samples from time to time and do its own thing :D On the left, it doesn't pass through any samples for 100 µs or so. There are also all those small, almost triangular ripples in the center section and those sharp spikes in the rightmost one. That's just odd and looks super wrong. I mean, it's interesting, but it feels overly sharp and edgy for a music signal.
 
I think for a good comparison, a flat EQd version would be best.
Ok, I get that. I'll try to prepare a per-band EQ'd version. Will post when ready.
You could lowpass, but the result would look very different. Try it on your "Isfahan" sample.
Ha, it probably will. I'll try it. If the audible band changes significantly after a lowpass at 22 kHz, that's kinda important to know, and I'll share the result either way.
I don't know what's available in scipy. I recently designed some filters "by hand" with 512 taps. If you are interested, I could give you the coefficients. They are designed to be applied to 8x upsampled data, though, and I'm sure there are more/better standard implementations out there.
Yes please. I took the easy route with scipy, so I'd like to run a comparison against your 512-tap filters. Even though they're made for 8×, I can try to adapt and test. It might be a much more rigorous baseline than scipy's default.
 
Pretty wild, what's going on there :oops:

I've marked the most interesting regions in blue. The composite filter just decides to ignore samples from time to time and do its own thing :D On the left, it doesn't pass through any samples for 100 µs or so. There are also all those small, almost triangular ripples in the center section and those sharp spikes in the rightmost one. That's just odd and looks super wrong. I mean, it's interesting, but it feels overly sharp and edgy for a music signal.
Lol, yes. I messed up plotting the dots. I should use the source file for it, but don't know how to yet:). I've removed them for now and will repost with corrected plots.

The observations you're making about the waveform shape are still valid though: the "triangular ripples" and sharp transitions are real characteristics of the polynomial reconstruction, not plotting artifacts. The corrected version is posted already, and I'll add the dots again when I figure out how to do it properly.
 
Lol, yes. I messed up plotting the dots. I should use the source file for it, but don't know how to yet:). I've removed them for now and will repost with corrected plots.

The observations you're making about the waveform shape are still valid though: the "triangular ripples" and sharp transitions are real characteristics of the polynomial reconstruction, not plotting artifacts. The corrected version is posted already, and I'll add the dots again when I figure out how to do it properly.
Here is the plot with the correct original dots. Now that I think about it, the published samples include a norm constraint on the polynomial reconstruction that controls overshoot on transient content. This means the output doesn't pass through the original sample values exactly when the norm constraint is engaged: it trades exact sample-passing for controlled inter-sample behavior. The sinc baseline does pass through the original samples. So the dots on the plot represent the original sample values, and you can see that the sinc (red) hits them while the composite (green) is close but not exact. This is actually how the method is configured.

UPDATE: fixed in V2, output now passes through original samples.
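
Whether an upsampled file passes through the source samples can be checked numerically instead of by eye. A minimal sketch, assuming the upsampled stream is aligned so that every `factor`-th output sample coincides with a source sample (no extra delay); the function names and the stand-in data are hypothetical.

```python
import numpy as np

def passthrough_error(upsampled, original, factor):
    """Largest deviation from the source samples at their own time positions."""
    on_grid = np.asarray(upsampled)[::factor][: len(original)]
    return float(np.max(np.abs(on_grid - np.asarray(original)[: len(on_grid)])))

def rms_and_peak_difference(a, b):
    """RMS and peak of the sample-wise difference between two equal-length signals."""
    diff = np.asarray(a) - np.asarray(b)
    return float(np.sqrt(np.mean(diff ** 2))), float(np.max(np.abs(diff)))

# Stand-in data: linear interpolation passes through every source sample.
rng = np.random.default_rng(1)
original = rng.uniform(-0.5, 0.5, 1_000)
factor = 4
t_out = np.arange((len(original) - 1) * factor + 1) / factor
exact = np.interp(t_out, np.arange(len(original)), original)
limited = np.clip(exact, -0.4, 0.4)  # crude stand-in for a constraint that alters samples

print("exact:  ", passthrough_error(exact, original, factor))
print("limited:", passthrough_error(limited, original, factor))
print("rms, peak:", rms_and_peak_difference(exact, limited))
```

Note that a windowed-sinc resampler is generally only approximately sample-preserving, so a small nonzero error for the sinc render is expected and is not by itself a sign of a bug.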


sample_level_comparison.png
 
Why don't you start with basic measurements? THD, IMD, and most interesting - multitone spectrum.
 
But why? What’s the point of all this?
Curiosity... mostly:)). I wanted to see what happens when you reconstruct PCM without the band-limiting assumption, fit the samples directly in the time domain and let the spectrum be whatever it turns out to be. The norm constraint is there because raw polynomial fitting overshoots on transient content, so it needs to be tamed.

Whether the result is better, worse, or just different from sinc, that's what I'm trying to find out here. So far the measurements say "different," and I don't have a definitive answer yet on whether the differences matter perceptually.
 
Why don't you start with basic measurements? THD, IMD, and most interesting - multitone spectrum.

Suggestion accepted:). A few things to keep in mind though: standard THD and IMD measurements use steady-state sinusoidal signals, where sinc reconstruction is provably perfect, so any deviation from sinc will show as "worse" THD by definition. I expect this method to measure worse on those tests, and that's not surprising or controversial.

The multitone spectrum is the more interesting one. That's where the locally adaptive behavior actually diverges from sinc in a meaningful way. I'll try to put one together. Thanks.
 
Whether the result is better, worse, or just different from sinc, that's what I'm trying to find out here. So far the measurements say "different,"
That very much depends on the metric to determine that. If you look at what the original signal was, most certainly your reconstruction is worse, not just different.
 
A few things to keep in mind though: standard THD and IMD measurements use steady-state sinusoidal signals where sinc reconstruction is provably perfect
Well, let’s see that… reconstruct a 19 kHz sine please!
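
A sketch of how such a 19 kHz test could be scored, assuming SciPy. It upsamples a 19 kHz sine 4× and reports the level of the first image at 44.1 − 19 = 25.1 kHz relative to the fundamental. Linear interpolation is included only as a familiar non-band-limited stand-in; the composite method is not implemented here, and the function names are illustrative.

```python
import numpy as np
from scipy import signal

FS_IN, UP = 44_100, 4
FS_OUT = FS_IN * UP
TONE_HZ = 19_000
IMAGE_HZ = FS_IN - TONE_HZ  # first image, 25.1 kHz

def level_db(y, freq_hz, fs=FS_OUT):
    """Level of the spectral peak within 50 Hz of freq_hz (Hann-windowed FFT)."""
    spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    near = np.abs(freqs - freq_hz) <= 50
    return float(20 * np.log10(spec[near].max()))

def image_dbc(y):
    """Image level relative to the 19 kHz fundamental, in dB."""
    return level_db(y, IMAGE_HZ) - level_db(y, TONE_HZ)

n = FS_IN  # one second of source audio
x = 0.5 * np.sin(2 * np.pi * TONE_HZ * np.arange(n) / FS_IN)

y_default = signal.resample_poly(x, UP, 1)  # SciPy default filter design
y_linear = np.interp(np.arange((n - 1) * UP + 1) / FS_OUT, np.arange(n) / FS_IN, x)

print(f"resample_poly default: {image_dbc(y_default):.1f} dBc")
print(f"linear interpolation:  {image_dbc(y_linear):.1f} dBc")
```

Running the same measurement on a composite render of the same tone would put it on the same scale as both references.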
 
That very much depends on the metric to determine that. If you look at what the original signal was, most certainly your reconstruction is worse, not just different.
Fair point, by that metric, absolutely. I'm exploring whether there's a perceptually relevant dimension where a different reconstruction path matters, not trying to beat sinc at its own game.
 