
Upconverting and Resampling Questions

mike7877

Addicted to Fun and Learning
Joined
Aug 5, 2021
Messages
818
Likes
184
Consider below:

[Attached image: three zoomed-in plots of a sawtooth wave crossing zero, with the low-passed output shown in grey]



These are zoomed-in sections of a sawtooth wave crossing zero. Grey is the filtered (low-passed) output.

1 is the original (16/48)
2 is the original with twice the horizontal resolution (16/96)
3 is the original with twice the horizontal resolution and twice the vertical resolution (17/96)


When we upsample from 48kHz to 96kHz in a program like foobar2000, does each sample just get stretched to twice as wide, as illustrated in figure 2?

In digital audio, samples don't usually go from 0 to +1 to +2 to +3 on both the x and y axes at the same time, like my crude illustrations do (too late, I made 'em).
So let's say you have 0, 10, 24, 44, 60 on samples 1, 2, 3, 4, and 5. If you double the number of samples to 10, is foobar2000 going to interpolate along the line of best fit and send 0, 6, 10, 19, 24, 35, 44, 54, 60, 68? Or will there be two 10s, two 24s, two 44s, and two 60s?

Also, when we convert 16-bit audio to 24-bit, are there samples on only 65,536 levels, with large gaps between the possible levels (with 0 = 0 and 65,535 mapping to ~16,700,000)?
Or does a bunch of interpolation (like above) end up happening?
 
Obligatory link which answers all these questions and more! https://xiph.org/video/vid1.shtml

But the answer is something like your "line of best fit", except that the line is the reconstructed waveform, so it is a perfect (if band-limited) replica of the original analogue input. The devil is in the details, so I strongly recommend watching the video and its sequel.
 
Everything is interpolated.

When down-sampling, a low-pass filter is applied to prevent aliasing.

Up-sampling doesn't really gain anything since your DAC "connects the dots" and makes a continuously-varying output, and a low-pass filter "smooths" the wave.

When you increase the bit depth, the least significant bits are filled with zeros. It's something like writing your bank balance as $1K, $1,000, $1,000.00 or $1,000.0000, etc. You may have to round when you reduce the bit depth, but no detail/resolution is gained when you increase it.
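If it helps, here's a minimal sketch of that zero-padding in Python (hypothetical sample values, numpy assumed):

```python
import numpy as np

# A few hypothetical 16-bit samples (signed integers)
samples_16 = np.array([0, 10, 24, -44, 60], dtype=np.int16)

# 16 -> 24 bit: shift left by 8, i.e. pad the 8 new least significant
# bits with zeros. The waveform is unchanged and full scale stays
# full scale (0 dBFS is still 0 dBFS).
samples_24 = samples_16.astype(np.int32) << 8

print(samples_24)  # [0, 2560, 6144, -11264, 15360]: each value is 256x the original
```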

As you may know, FLAC is lossless compression and it's "smart". If you up-sample from 16 to 24 bits, the FLAC file will be nearly the same size.

levels with large gaps between possible levels (with 0 = 0 and 65,535 mapping to ~16,700,000)?
The numbers are bigger, so numerically the "gaps" are bigger. But as a percentage, or in decibels, it's the same. When you play the file, it's automatically scaled up or down to match the bit depth of your DAC.
 

To confirm my understanding: if I were to convert a 16-bit WAV file to 24 bits, then convert it to FLAC, would that FLAC file be the same size as if I had just made it from the 16-bit WAV file? And would this be because all of the samples would sit on only ~65,000 different vertical levels, and this data is what gets compressed into the FLAC?
edit: and I don't mean the first 65,000 levels, because then the audio would be extremely quiet. 0 dB is still 0 dB.
The first level of 16 bit would be like the 256th level of 24 bit (2^24 / 2^16 = 256).
 
When we upsample from 48kHz to 96kHz in a program like foobar2000, does each sample just get stretched to twice as wide, as illustrated in figure 2?
No. That would be considered a zero-order hold.

In a simple case such as this, you would merely insert a zero-valued sample between every pair of original samples, and then run the whole shebang through a sinc-ish interpolation filter (usually, but not always, the FIR kind) to apply the necessary band-limiting. Foobar2000's engine operates in 32-bit or 64-bit float depending on whether it's a 32- or 64-bit build, so that's the precision the samples have at this point (everything that goes through it gets converted to that first). They then eventually need to be requantized to two's-complement 16-bit integer if that's what you want, either simply truncated or optionally with dither applied first.
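A toy numpy sketch of that zero-stuff-then-filter idea (the windowed-sinc filter here is arbitrary, not the actual filter any player uses):

```python
import numpy as np

def upsample_2x(x, taps=63):
    """2x upsampling: zero-stuffing followed by a windowed-sinc low-pass."""
    # Insert a zero between every pair of original samples.
    stuffed = np.zeros(2 * len(x))
    stuffed[::2] = x

    # Windowed-sinc low-pass with cutoff at the original Nyquist
    # (1/4 of the new sample rate); a Hamming window tames the ripple.
    n = np.arange(taps) - (taps - 1) / 2
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(taps)

    # Gain of 2 makes up for the energy removed by the zero-stuffing.
    return 2.0 * np.convolve(stuffed, h, mode="same")

x48 = np.sin(2 * np.pi * 1000 * np.arange(480) / 48000)  # 1 kHz tone at 48 kHz
x96 = upsample_2x(x48)                                    # same tone at 96 kHz
```

Note that the interpolated samples land on the band-limited curve through the originals, not at duplicated heights.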

A typical 8X oversampling filter in a DAC consists of three such stages in series. The remainder, up to whatever rate the delta-sigma modulator is running at (e.g. 64fs, 128fs, 192fs, 256fs, etc.), is then typically bridged in larger steps with more basic filtering.

To confirm my understanding: if I were to convert a 16-bit WAV file to 24 bits, then convert it to FLAC, would that FLAC file be the same size as if I had just made it from the 16-bit WAV file?
You can easily try that using Foobar2000's trusty converter.
Here, a file of 25,740,635 bytes grew to 25,834,685 bytes, so only a very minor difference.
 
To confirm my understanding: if I were to convert a 16-bit WAV file to 24 bits, then convert it to FLAC, would that FLAC file be the same size as if I had just made it from the 16-bit WAV file? And would this be because all of the samples would sit on only ~65,000 different vertical levels, and this data is what gets compressed into the FLAC?
edit: and I don't mean the first 65,000 levels, because then the audio would be extremely quiet. 0 dB is still 0 dB.
The first level of 16 bit would be like the 256th level of 24 bit (2^24 / 2^16 = 256).

The file size will increase a bit since every sample will contain an additional 8 bits (which are useless). What really blows up file sizes is uselessly increasing the sample rate, i.e. going from 44.1 kHz to 96 kHz or 192 kHz.

At 16/44.1 you are storing 44,100 samples per second with each sample requiring 16 bits (2 bytes) of storage or bandwidth.

At 24/96 you are storing 96,000 samples per second with each sample requiring 24 bits (3 bytes) of storage or bandwidth. This will increase the file size quite a bit to store even more useless data.
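As a rough worked example for uncompressed stereo PCM (before any FLAC compression), multiplying those numbers out:

```python
channels = 2  # stereo

# bytes per second = sample rate * bytes per sample * channels
cd_pcm    = 44_100 * 2 * channels   # 16/44.1 -> 176,400 bytes/s
hires_pcm = 96_000 * 3 * channels   # 24/96   -> 576,000 bytes/s

print(hires_pcm / cd_pcm)  # ~3.27x the storage for the same duration
```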
 
To confirm my understanding: if I were to convert a 16-bit WAV file to 24 bits, then convert it to FLAC, would that FLAC file be the same size as if I had just made it from the 16-bit WAV file? And would this be because all of the samples would sit on only ~65,000 different vertical levels, and this data is what gets compressed into the FLAC?
edit: and I don't mean the first 65,000 levels, because then the audio would be extremely quiet. 0 dB is still 0 dB.
The first level of 16 bit would be like the 256th level of 24 bit (2^24 / 2^16 = 256).
There would be no point to this bit-depth alteration. Likewise for upsampling. You can't add information to a digital audio file. The file size would increase a little depending on how FLAC actually compresses, but with no effect on audio quality.

At playback time some tricks are used (like oversampling), and others at recording time (like noise shaping and dithering). But unless the file you have is badly made and you know exactly how to fix it, there's no point altering it prior to playback.

Some of the answers in this thread are close enough to correct, but omit detail which you will find in Monty's xiph.org videos. They really are the best way to understand how digital audio works! In fact, I think I'll rewatch them later today; he is such a brilliant and watchable communicator.

Those links again: https://xiph.org/video/vid1.shtml and https://xiph.org/video/vid2.shtml
 
There would be no point to this bit-depth alteration. Likewise for upsampling. You can't add information to a digital audio file. The file size would increase a little depending on how FLAC actually compresses, but with no effect on audio quality.
In some cases there is a point, even if there is no effect on audio quality. I create FIR room correction filters at 96 kHz/24 bit and resample everything (up or down, depending on the source) to 96 kHz/24 bit so it works correctly with the filter. In addition, I have a tri-amp system that uses a Motu multi-channel interface which doesn't like to change sample rates on the fly. Resampling everything to the same bit depth and sample rate solves both issues.
 
...
In digital audio, samples don't usually go from 0 to +1 to +2 to +3 on both the x and y axes at the same time, like my crude illustrations do (too late, I made 'em).
So let's say you have 0, 10, 24, 44, 60 on samples 1, 2, 3, 4, and 5. If you double the number of samples to 10, is foobar2000 going to interpolate along the line of best fit and send 0, 6, 10, 19, 24, 35, 44, 54, 60, 68? Or will there be two 10s, two 24s, two 44s, and two 60s?
...
Post #4 of this thread illustrates how interpolation in up/over-sampling works.
 
In some cases there is a point, even if there is no effect on audio quality. I create FIR room correction filters at 96 kHz/24 bit and resample everything (up or down, depending on the source) to 96 kHz/24 bit so it works correctly with the filter. In addition, I have a tri-amp system that uses a Motu multi-channel interface which doesn't like to change sample rates on the fly. Resampling everything to the same bit depth and sample rate solves both issues.
I'm also living the shared-mode life here: 48 kHz on the PC, 192 kHz on the laptop with a slightly dodgy digital filter. SoX resampler DSP for Foobar FTW. (Though the Windows sound stack's upsampling as of Windows 10 is also fine.)
 
There would be no point to this bit-depth alteration. Likewise for upsampling. You can't add information to a digital audio file. The file size would increase a little depending on how FLAC actually compresses, but with no effect on audio quality.

At playback time some tricks are used (like oversampling), and others at recording time (like noise shaping and dithering). But unless the file you have is badly made and you know exactly how to fix it, there's no point altering it prior to playback.

Some of the answers in this thread are close enough to correct, but omit detail which you will find in Monty's xiph.org videos. They really are the best way to understand how digital audio works! In fact, I think I'll rewatch them later today; he is such a brilliant and watchable communicator.

Those links again: https://xiph.org/video/vid1.shtml and https://xiph.org/video/vid2.shtml
Ah, a second video. I liked his delivery too, I just didn't hear anything that answered my question in the first. I'll watch #2.

I get that you can't add information. Well, technically I think information is added by the low-pass filter, which makes the waveform resemble the original input more closely. If you were to do the very best job interpolating the 65k different levels into 16.7 million different levels, I believe what you'd end up with is an identically formed waveform (once low-passed). There's obviously no point to this; I was just wondering whether it's done. Why? To avoid the process whenever possible, in case of imperfect interpolation methods. If the closest height (word chosen for ease of use and applicability) of the 16-bit waveform is approximated in the 24-bit space, then to me, that's as good as it gets.

I am still wondering whether, when doubling the sample rate, you get two samples at the same height, or whether the additional sample is interpolated. For the same reason as above. Maybe it's in video 2; I'll watch it tonight.
 
In some cases there is a point, even if there is no effect on audio quality. I create FIR room correction filters at 96 kHz/24 bit and resample everything (up or down, depending on the source) to 96 kHz/24 bit so it works correctly with the filter. In addition, I have a tri-amp system that uses a Motu multi-channel interface which doesn't like to change sample rates on the fly. Resampling everything to the same bit depth and sample rate solves both issues.
This is partially why I'm wondering - I output in 24 bit all the time, even when playing unmodified 16/44.1

I think when using DSP and (to a lesser extent) EQ, it's good to switch to 24 bit first: when the signal is transformed there will be samples at all heights, and you get rid of rounding errors (however small). It must be better for phase, especially in combination with 96 or 192 kHz. When I eventually get to making my speakers active, I'll definitely be working at 24/192.
 
The file size will increase a bit since every sample will contain an additional 8 bits (which are useless). What really blows up file sizes is uselessly increasing the sample rate, i.e. going from 44.1 kHz to 96 kHz or 192 kHz.

At 16/44.1 you are storing 44,100 samples per second with each sample requiring 16 bits (2 bytes) of storage or bandwidth.

At 24/96 you are storing 96,000 samples per second with each sample requiring 24 bits (3 bytes) of storage or bandwidth. This will increase the file size quite a bit to store even more useless data.

I'm wondering more about outputting at an increased bit depth and sample rate. I don't think there's any point upsampling or increasing bit depth for storage, because the information that could be stored by the increased resolution isn't there to begin with.
 
In some cases there is a point, even if there is no effect on audio quality. I create FIR room correction filters at 96 kHz/24 bit and resample everything (up or down, depending on the source) to 96 kHz/24 bit so it works correctly with the filter. In addition, I have a tri-amp system that uses a Motu multi-channel interface which doesn't like to change sample rates on the fly. Resampling everything to the same bit depth and sample rate solves both issues.
"Fir" enough but you could just as easily resample at playback time rather than altering the source files.
 
"Fir" enough but you could just as easily resample at playback time rather than altering the source files.

I think he just means his output is always at 24/96
 
This is partially why I'm wondering - I output in 24 bit all the time, even when playing unmodified 16/44.1

I think when using DSP and (to a lesser extent) EQ, it's good to switch to 24 bit first: when the signal is transformed there will be samples at all heights, and you get rid of rounding errors (however small). It must be better for phase, especially in combination with 96 or 192 kHz. When I eventually get to making my speakers active, I'll definitely be working at 24/192.
There's no point altering the files, though; just have your player of choice resample on the fly to match what your DSP device expects. Or let the DSP device do it internally, which it has to do anyway, since they're all floating point internally.
 
I am still wondering whether, when doubling the sample rate, you get two samples at the same height, or whether the additional sample is interpolated. For the same reason as above. Maybe it's in video 2; I'll watch it tonight.
See post #6.
This is partially why I'm wondering - I output in 24 bit all the time, even when playing unmodified 16/44.1

I think when using DSP and (to a lesser extent) EQ, it's good to switch to 24 bit first: when the signal is transformed there will be samples at all heights, and you get rid of rounding errors (however small).
Perhaps more simply: when doing these things you need extra headroom, which raises your dynamic range requirements. Even playback volume normalization (i.e. ReplayGain) counts. Here I'm attenuating a bunch of material to around -15 dBFS peaks at most, since the RG preamp is set fairly low. That's 2.5 more bits that are technically needed. Of course, that's relatively loud material that doesn't exactly make use of most of the 16 bits to begin with, but still.
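That 2.5-bit figure follows from each bit of word length being worth about 6.02 dB of dynamic range:

$$\frac{15\ \text{dB}}{6.02\ \text{dB per bit}} \approx 2.5\ \text{bits}, \qquad \text{where } 20\log_{10}(2) \approx 6.02\ \text{dB}$$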
 
... When we upsample from 48kHz to 96kHz in a program like foobar2000, does each sample just get stretched to twice as wide, as illustrated in figure 2?
...
Or does a bunch of interpolation (like above) end up happening?
In addition to the good answers already provided, I'll add that one thing that helped me understand what's going on is the theoretical perspective. Consider the Shannon-Whittaker reconstruction formula, which provides mathematically perfect reconstruction of the sampled wave. Note that it is not a stair-step but a summation of sinc functions, which is continuous, and its first derivative is also continuous. This is necessarily so because infinite rates of change cannot happen in the real world.
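For reference, that formula reconstructs the continuous signal from the samples x[n] and sample period T as:

$$x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}\!\left(\frac{t - nT}{T}\right), \qquad \operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}$$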

DACs do not implement this, because it is computationally infeasible: each output point requires an infinite sum across all sample points. But since its result is the mathematically perfect ideal, a DAC's goal is to reconstruct that same wave by more efficient means. The algorithms that DACs use for this, such as delta-sigma modulation, are essentially more efficient methods of getting approximately the same wave. Here "approximately" can be very close; in well-engineered DACs it is so close that the differences are below the level of the analog circuitry's noise. Practically speaking, they have essentially perfect reconstruction.
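For a flavor of that trade-off, here's a toy first-order delta-sigma loop in Python (illustrative only; real DACs use higher-order modulators running at much higher rates):

```python
def delta_sigma_1bit(x):
    """Toy first-order delta-sigma: floats in [-1, 1] -> +/-1 bitstream."""
    v, y, out = 0.0, 0.0, []
    for s in x:
        v += s - y                      # integrate input-minus-output error
        y = 1.0 if v >= 0.0 else -1.0   # 1-bit quantizer
        out.append(y)
    return out

bits = delta_sigma_1bit([0.5] * 1000)
print(sum(bits) / len(bits))  # ~0.5: low-passing the bitstream recovers the input
```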

The pragmatic answer is that there is no point to upsampling. Many DACs already do it internally anyway as part of their conversion process. No need to waste your disc space or processing time modifying the files, nor even doing it on the fly in real time.
 