
Understanding Upsampling/Interpolation

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,267
Likes
4,758
Location
My kitchen or my listening room.
If you look at this Deep convolutional neural networks using perceptual loss, you can see the progress that has been made. The problem will probably be that machine learning requires adequate examples, and maybe it differs a lot by type of music: synthetic, generated, acoustic, etc.

What do you actually expect to change vs. accurate spectral and time domain replication? SRC is not a perceptual process, it is defined precisely in mathematics, and it's not even terribly expensive in the modern world. If an SRC is broken enough to have perceptual loss, it's broken. EOF.

This is nothing like image processing at all. NOTHING.
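To make the point concrete that SRC is plain, deterministic mathematics rather than a perceptual process, here is a minimal pure-NumPy sketch (my own illustration, not from the thread; the 1 kHz tone and 128-tap half-width are arbitrary choices). Windowed-sinc interpolation predicts the value between two samples of a band-limited signal to within the filter's accuracy, with no perceptual model anywhere:

```python
import numpy as np

# A band-limited tone sampled at 48 kHz.
fs = 48_000
f0 = 1_000.0
n = 2_048
x = np.sin(2 * np.pi * f0 * np.arange(n) / fs)

def sinc_interp_half(x, k, taps=128):
    """Estimate the value halfway between samples k and k+1 by
    windowed-sinc (band-limited) interpolation."""
    m = np.arange(k - taps + 1, k + taps + 1)   # neighbouring sample indices
    w = np.hamming(2 * taps)                    # taper to limit truncation error
    return float(np.sum(x[m] * np.sinc((k + 0.5) - m) * w))

k = n // 2
estimate = sinc_interp_half(x, k)
truth = np.sin(2 * np.pi * f0 * (k + 0.5) / fs)  # what a 96 kHz ADC would capture
```

The estimate matches the true mid-point value to roughly the accuracy of the truncated, windowed filter; a longer filter gets arbitrarily closer. Nothing here is "guessed".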
 

TabCam

Active Member
Joined
Feb 16, 2020
Messages
192
Likes
162
What do you actually expect to change vs. accurate spectral and time domain replication? SRC is not a perceptual process, it is defined precisely in mathematics, and it's not even terribly expensive in the modern world. If an SRC is broken enough to have perceptual loss, it's broken. EOF.

This is nothing like image processing at all. NOTHING.
That is the point of the whole topic: comparing upsampling to image processing as an argument that lost detail cannot be recreated. The latest techniques come close to the original. If we did the same for music, training neural networks on reduced data and checking how close they come to the original, would we not get a much better result? Quite likely not in real time, but maybe as a preprocessing stage?
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,267
Likes
4,758
Location
My kitchen or my listening room.
That is the point of the whole topic: comparing upsampling to image processing as an argument that lost detail cannot be recreated. The latest techniques come close to the original. If we did the same for music, training neural networks on reduced data and checking how close they come to the original, would we not get a much better result? Quite likely not in real time, but maybe as a preprocessing stage?

No, there is no image processing issue involved in the ACTUAL process of audio upsampling. No. Imaging works in the spatial domain. Sound works in a peculiar time/frequency domain dictated by how the human cochlea is actually known (tested, verified, and understood) to function. Yes, you have to worry about images and aliases in the signal; that's a completely different thing, which you can see discussed below in some detail.

Image detail requires "making up" information based on SPATIAL cues that must be eliminated (pixelation) and other cues that should be carried through (image edges). The information is spatial in character, and conversion to the frequency domain may be useful as a processing step, but it is not the key to understanding the perception of the image.

Audio is frequency based. There is one, I repeat, ONE issue: do not muck up the spectrum. If you double the sampling rate, do not add anything and do not take anything away, because there is no detectable feature to be "inserted", barring some very young ears listening. Even if there were, the structure of audio signal creation makes "guessing right" much more difficult.

As a result, the examination of actual audio upsampling is one of the very few things in the audio domain for which least-mean-squares error is actually important. What do you even MEAN "compared to the original"? What original do you have in mind?

With PCM you ***GET*** all of the original in the bandwidth you started with. Your idea does not even fit into the reality of the process.

If you mean fixing the output of a perceptual codec (the equivalent of reducing pixel count in an image, to some poorly equatable extent) then you're arguing about something that has exactly ZERO to do with upsampling, or downsampling.

So, look, don't condescend to me here, by telling me what upsampling means. Yes, I know about both image and audio, and the two problems are simply not the same in any reasonable regard.

Ditto downsampling, by the way.

Here, this is how sampling rate conversion works, and why least mean squares matters for audio. I haven't done the same for imaging, because it's MUCH more in its infancy (the work you show is reasonable for images), partially because of the rather substantially different perceptual constraints, and I prefer to work on audio.

So read this for audio sampling rate conversion, covering both upsampling AND downsampling.

https://www.aes-media.org/sections/pnw/pnwrecaps/2016/jjsrc_jan2016/

Scroll up at http://www.aes-media.org/sections/pnw/pnwrecaps/index.htm if you need some updating on how the ear works.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,630
Likes
10,205
Location
North-East
That is the point of the whole topic: comparing upsampling to image processing as an argument that lost detail cannot be recreated. The latest techniques come close to the original. If we did the same for music, training neural networks on reduced data and checking how close they come to the original, would we not get a much better result? Quite likely not in real time, but maybe as a preprocessing stage?

Audio resampling is not a perceptual process; there's no detail lost if it's done properly. Up to 1/2 of the sampling frequency there's no missing information that needs to be filled in.

If you're talking about recovering missing frequencies above 1/2 Fs then, sure, try convolutional neural nets or any other interpolation/extrapolation you want. But for normal, everyday audio resampling there's no reason to guess, interpolate, infer, or extrapolate, unless your sampling frequency is so low that it can't fully represent the audible frequency range. And by the way, the same holds for image resampling.
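The claim that nothing below half the sampling rate is missing can be checked numerically. The sketch below is my own (the 18.75 kHz tone and window length are chosen so the signal is exactly periodic in the FFT window, an assumption that keeps the demonstration exact): it upsamples 48 kHz samples to 96 kHz by FFT zero-padding and compares them with the same tone sampled natively at 96 kHz.

```python
import numpy as np

fs_lo, fs_hi = 48_000, 96_000
n = 256
f0 = 18_750.0  # below fs_lo/2; exactly 100 cycles in the window
x = np.sin(2 * np.pi * f0 * np.arange(n) / fs_lo)

# Upsample 2x in the frequency domain: copy the 48 kHz spectrum into the
# corresponding bins of a 96 kHz spectrum; the new upper half stays zero.
# (The Nyquist bin X[n//2] is ~0 for this tone and is dropped.)
X = np.fft.fft(x)
Y = np.zeros(2 * n, dtype=complex)
Y[:n // 2] = X[:n // 2]
Y[-(n // 2):] = X[-(n // 2):]
y = 2.0 * np.real(np.fft.ifft(Y))  # factor 2 restores the amplitude

# Compare with the same tone sampled natively at 96 kHz.
direct = np.sin(2 * np.pi * f0 * np.arange(2 * n) / fs_hi)
err = np.max(np.abs(y - direct))
```

Under the periodicity assumption the upsampled samples agree with the natively sampled ones to machine precision: there was nothing to "fill in".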
 

UliBru

Active Member
Technical Expert
Joined
Jul 10, 2019
Messages
123
Likes
337
I'd like to share some basics for a better understanding of upsampling.

Let's start with a logsweep signal from 10 Hz to 48 kHz, sampled at 96 kHz, with a length of 10 seconds. The frequency response looks like this:
[Attachment: Downsampling96to48.png]

The red curve shows the 96 kHz sweep; downsampled to 48 kHz it becomes the green curve. It should be clear that the downsampled signal cannot contain the frequencies from 24 kHz to 48 kHz (marked area).
The time signal for the two signals looks like this:
[Attachment: Downsampling96to48_time.png]

It becomes clear that the high frequency content is now simply nulled. In this example the HF content sits on the right side; logically it does not matter whether it is separate from or mixed in with the rest of the signal.

Logic also tells us that there is no way to reconstruct the red part from the green part, as we simply do not know whether the original sweep ended at 48 kHz or already at 40 kHz.

Now we try upsampling by inserting a zero between each pair of green samples (zero-stuffing). This looks like:
[Attachment: Zerostuffing.png]

Zooming into the detail reveals a bit more:
[Attachment: ZerostuffingZoom.png]

Now let's look at the frequency response of the brown zero-stuffed signal:
[Attachment: Upsampling48to96.png]

This chart displays the frequency axis on a linear scale. We can see that the right side above 24 kHz is a mirror image of the left side below 24 kHz: aliasing. The right side is not allowed to exist, as the original signal (the green curve) carries no information about any frequency content above 24 kHz.
[Attachment: Upsampling48to96linearview.png]

So obviously we have to remove the frequencies above 24 kHz with a brickwall filter:
[Attachment: sincFilter.png]

The ideal brickwall filter is a sinc filter of infinite length. In practice shorter filters are applied, and there is much discussion about the required length, windowing, and linear versus minimum phase. In any case, convolving the brown zero-stuffed signal with the sinc filter results in:
[Attachment: Upsampling48to96sinc.png]

The cyan curve is the 96 kHz upsampling of the 48 kHz downsampling of the original 96 kHz logsweep.
It clearly ends at 24 kHz; there is no information in the downsampled signal that would allow reconstruction of the upper right part of the red curve.
For frequencies below 24 kHz, convolving the zero-stuffed signal with the sinc filter yields a nearly perfect reconstruction of the time domain signal, shown here as a comparison between zero-stuffing and interpolation:
[Attachment: Upsampling48to96_time.png]

And finally, a comparison between the original and an upsampled curve segment:
[Attachment: Comparison.png]


I hope it has become understandable that there is no logic (not even a deep convolutional neural network) that can reconstruct the frequency content of a signal once it is lost. It is gone forever.
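The zero-stuffing walkthrough above can be reproduced in a few lines of NumPy. This sketch is my own illustration, not the tool that generated the plots (the 3 kHz tone, 255 taps, and Hamming window are arbitrary choices): it shows the alias image appearing at 48 - 3 = 45 kHz after zero-stuffing, and its removal by a windowed-sinc lowpass at 24 kHz.

```python
import numpy as np

fs_in, fs_out = 48_000, 96_000
n = 4_096
x = np.sin(2 * np.pi * 3_000 * np.arange(n) / fs_in)  # 3 kHz tone at 48 kHz

# Step 1: zero-stuffing doubles the rate but mirrors the spectrum,
# producing an image of the 3 kHz tone at 48 - 3 = 45 kHz.
stuffed = np.zeros(2 * n)
stuffed[::2] = x

# Step 2: windowed-sinc brickwall lowpass with cutoff 24 kHz
# (0.25 * fs_out), with gain 2 folded in to compensate for the
# inserted zeros: h_ideal[m] = 2 * 0.5 * sinc(0.5 * m).
taps = 255
m = np.arange(taps) - (taps - 1) / 2
h = np.sinc(0.5 * m) * np.hamming(taps)
y = np.convolve(stuffed, h, mode="same")

def peak_near(sig, f_hz):
    """Spectral magnitude at the bin closest to f_hz."""
    spec = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    freqs = np.fft.rfftfreq(len(sig), 1.0 / fs_out)
    return spec[np.argmin(np.abs(freqs - f_hz))]

image_before = peak_near(stuffed, 45_000)  # large: the alias image exists
image_after = peak_near(y, 45_000)         # small: the lowpass removed it
```

With a 255-tap Hamming-windowed sinc the image is suppressed by roughly the window's stopband attenuation (about 50 dB); a longer or better-windowed filter pushes it further down, which is exactly the "required length and windowing" trade-off mentioned above.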
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,267
Likes
4,758
Location
My kitchen or my listening room.
Sure, perception is very different. The math governing resampling is the same, except for the extra dimension.

No, the math is quite different, because you do not preserve spatial frequency content in images, it's a near-meaningless idea (beyond MTF at least), as grating sensation tests show. What you must do is preserve edges, with some control over frequency noise. Sorry to insist, but basic accurate (in frequency) interpolation looks pretty crappy indeed.
 

ElNino

Addicted to Fun and Learning
Joined
Sep 26, 2019
Messages
557
Likes
722
No, the math is quite different

pkane is correct... the math is the same.

Some of your comments in post #25 above suggest that you would benefit from taking a course on the mathematics of signal processing (convolution, etc.).
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,630
Likes
10,205
Location
North-East
No, the math is quite different, because you do not preserve spatial frequency content in images, it's a near-meaningless idea (beyond MTF at least), as grating sensation tests show. What you must do is preserve edges, with some control over frequency noise. Sorry to insist, but basic accurate (in frequency) interpolation looks pretty crappy indeed.

Maybe true for “pretty” pictures where lossy, perceptually-weighted algorithms are acceptable. Not in most scientific image processing, where data preservation is a must. Try to extract a proper star profile from an edge-enhanced, resampled image. Or use a deconvolution algorithm on it, or measure the flux fall-off due to an exoplanet transiting a star. The results will be catastrophic.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,267
Likes
4,758
Location
My kitchen or my listening room.
pkane is correct... the math is the same.

Some of your comments in post #25 above suggest that you would benefit from taking a course on the mathematics of signal processing (convolution, etc.).


I'm sorry, I do signal processing for a living, and I have a couple of IEEE awards to show for it.

The math is different BECAUSE THE PERCEPTION IS DIFFERENT. Live with it. Nonlinearities make sense with image interpolation.
They are tragically disastrous for audio interpolation.

These are testable, verifiable facts, and your vile professional insult shall be retracted promptly.

I would suggest that rather than make false professional attacks, you wander back up thread a couple of steps, and read a couple of the tutorials I cited. You might dig up a few of my papers as well, and find the examples of use of both convolution and deconvolution (both fast and numerical), as well as filter design, perceptual analysis of both audio and video, and consider that you may be way off course here.

Furthermore, I note that your intentionally vague accusations about "#25" specifically avoid being specific, so as to further your false professional attack.
 
Last edited:

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,267
Likes
4,758
Location
My kitchen or my listening room.
Maybe true for “pretty” pictures where lossy, perceptually-weighted algorithms are acceptable. Not in most scientific image processing, where data preservation is a must. Try to extract a proper star profile from an edge-enhanced, resampled image. Or use a deconvolution algorithm on it, or measure the flux fall-off due to an exoplanet transiting a star. The results will be catastrophic.

You're talking about a completely different issue here: that of accurate, least-mean-squares interpolation. Except for the debate on separable vs. nonseparable filtering, something like deconvolving a telescope image is reasonably similar, BUT image interpolation as discussed by the fellow above is about dealing with perceptual issues, not LMS. For images to be VIEWED, as opposed to analyzed, preserving lines and edges and avoiding any "blockiness" are the key.
 

ElNino

Addicted to Fun and Learning
Joined
Sep 26, 2019
Messages
557
Likes
722
I'm sorry, I do signal processing for a living, and I have a couple of IEEE awards to show for it.

The math is different BECAUSE THE PERCEPTION IS DIFFERENT. Live with it. Nonlinearities make sense with image interpolation.
They are tragically disastrous for audio interpolation.

These are testable, verifiable facts, and your vile professional insult shall be retracted promptly.

I would suggest that rather than make false professional attacks, you wander back up thread a couple of steps, and read a couple of the tutorials I cited. You might dig up a few of my papers as well, and find the examples of use of both convolution and deconvolution (both fast and numerical), as well as filter design, perceptual analysis of both audio and video, and consider that you may be way off course here.

Furthermore, I note that your intentionally vague accusations about "#25" specifically avoid being specific, so as to further your false professional attack.

Sorry, I'm not understanding your response here or why it's so emotionally charged. I didn't say anything that could be construed as a professional attack -- I honestly have no idea who you are, and I don't know anything about your background, but I have studied signal processing at the graduate level, and it simply isn't true that audio and 2D image signal processing are fundamentally different.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,267
Likes
4,758
Location
My kitchen or my listening room.
Sorry, I'm not understanding your response here or why it's so emotionally charged. I didn't say anything that could be construed as a professional attack -- I honestly have no idea who you are, and I don't know anything about your background, but I have studied signal processing at the graduate level, and it simply isn't true that audio and 2D image signal processing are fundamentally different.

I do signal processing for a living, I have written papers on various things using, involving, speeding up, etc., convolution and the like, and you turn around and tell me that I am ignorant of a basic subject on which I have written rather extensively, both on theory (less so; I use the processes) and on practice (which I have written a lot about).

You choose to make this very serious professional accusation without actually stating any specifics, in a vague, offhand fashion, and then you wonder why I'm offended.

THEN you play the old "emotionally charged" card after you uttered truly horrible professional disparagement.

AND THEN YOU CHANGE THE SUBJECT, or try to. This is not all of image processing, the subject here is UPSAMPLING of images, for viewing. NOT for instrumentation, NOT for sharpening, but for viewing.

Given I have posted a couple of papers on sub-band image coding, you'd think I know something or other about actual use of filters in the real world, yes? Yeah, I do. The subject is upsampling for viewing. It is not all of image processing. So why try to move the goalposts now?

Why? Because you're trying to play "king of the hill".

I suggest you learn something about both audio and video perception, and then maybe you'll see why your entire set of posts is simply confusing the entire issue.

FINALLY you accused me of not understanding basic FIR filters, so please, show, exactly, where I exhibited that? You said it, now either produce or do not.
 

ElNino

Addicted to Fun and Learning
Joined
Sep 26, 2019
Messages
557
Likes
722
And it shouldn't matter who I am, he should be careful with really nasty accusations and learn not to defend mistakes. Besides, I DID send him a link to a c.v. including the IEEE awards in signal processing. So, he DOES know.

You only emailed me your CV at 4:34pm EST, after I had posted.

Sorry, I’m not interested in engaging with the level of vitriol you’re expressing here.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,267
Likes
4,758
Location
My kitchen or my listening room.
You only emailed me your CV at 4:34pm EST, after I had posted.

Sorry, I’m not interested in engaging with the level of vitriol you’re expressing here.

Really? Color me skeptical. As to "vitriol" the vitriol is 100.00% yours. You picked a fight, and refused to admit you stepped into a pile thereof. In the future, perhaps you should concentrate on the technology rather than on winning an argument.

The dishonest argumentation methods you've used here are rather obvious. You made the "emotional" accusation, then you accuse me of "vitriol", after claiming I don't even know my field.

If in fact you got the CV after you said that, I apologize. It's easy to miss one comment in the midst of all of your defamatory nonsense.
 