Upconverting and Resampling Questions

Perhaps more easily: When doing these things you need extra headroom, upping your dynamic range requirements. Even playback volume normalization (i.e. ReplayGain) counts. Here I'm attenuating a bunch of material to around -15 dBFS peak tops, since RG preamp is set fairly low. That's 2.5 more bits that are technically needed. Of course that's relatively loud material that doesn't exactly make use of most of the 16 bits to begin with, but still.
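The "2.5 more bits" figure follows from one bit being worth about 6.02 dB of amplitude. A minimal sketch of that arithmetic (the function name is only for illustration):

```python
import math

def bits_for_attenuation(attenuation_db: float) -> float:
    """Extra bits of word length needed to attenuate by attenuation_db
    without discarding any of the original resolution."""
    # One bit is a factor of 2 in amplitude, i.e. 20*log10(2), about 6.02 dB.
    return attenuation_db / (20 * math.log10(2))

print(round(bits_for_attenuation(15.0), 2))  # 2.49
```

So attenuating 16-bit material by 15 dB calls for roughly 18.5 bits on the output side, which a 24-bit output covers comfortably.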
It is fundamental that you need to do your attenuation before any peaking filters, and also ensure that you don't break 0dB at any point in the filter chain.

But typically this is done by using 56 bit floating point or 32 bit fixed point for all the internal DSP calculations. For example a MiniDSP flex will take your 44.1/16 input, convert it to 96/32 itself, do all the work, then output analog signal. All you have to do is remember to use the built-in attenuation of the MiniDSP to make enough headroom for the rest of your filters. Upsampling to 96/24 before you hand it to the MiniDSP isn't going to hurt, but it's not adding anything either.

If for some specific reason you do need to attenuate 16 bit content before putting the signal into the DSP pipeline then converting to 24 bit before attenuation is of course the only way to do it without loss of info. I'm curious what ReplayGain software you are using that doesn't do all this automatically?
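To illustrate why the conversion to 24 bit has to come first, here is a small sketch that attenuates every possible 16-bit code by 15 dB, once staying in 16 bit and once after widening to 24 bit. The -15 dB value and the plain rounding (no dither) are assumptions chosen for the example, not what any particular player does.

```python
def attenuate(sample: int, gain_db: float, shift_bits: int = 0) -> int:
    """Scale an integer PCM sample by gain_db after widening it by shift_bits."""
    gain = 10 ** (gain_db / 20)
    return round((sample << shift_bits) * gain)

codes_16 = range(-32768, 32768)   # every signed 16-bit value
gain_db = -15.0                   # assumed attenuation for the example

# Attenuate and requantize to 16 bit: neighbouring codes collapse together.
out_16 = {attenuate(s, gain_db) for s in codes_16}
# Widen to 24 bit first (8 extra low bits), then attenuate.
out_24 = {attenuate(s, gain_db, shift_bits=8) for s in codes_16}

print(len(out_16), len(out_24))
```

In the 24-bit case all 65536 input codes still map to distinct output values, so the operation stays reversible in principle; in the 16-bit case many inputs share an output value.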
 
I'm more wondering for outputting at increased bit depth and sample rate. I don't think there's any point to upsampling/increasing bit depth for storage because the information that could be stored by the increased resolution isn't there to begin with.

Useless. Don't bother.
 
Consider below:
Input is an amplitude modulated 4 kHz tone, peak at -30 dBFS, in 8-bit and 12 kHz sampling rate. We start without using dither.
(all audio files in attachment)
8_12k.nd.png


Input only bit-extended to 16-bit:
8_12k.16_12k.nd.png


Input only upsampled to 96 kHz:
(Note: of course all the upsampling calculations are done in whatever format SoX is using internally, only the result is then bit-reduced to 8-bit)
8_12k.8_96k.nd.png


Input both bit-extended and upsampled:
8_12k.16_96k.nd.png


And the same but with dither:
8_12k.d.png

8_12k.16_12k.d.png

8_12k.8_96k.d.png

8_12k.16_96k.d.png
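A rough sketch of the bit-extension step for anyone without SoX at hand. The 4 kHz tone, -30 dBFS peak, 8 bits and 12 kHz come from the description above; the 50 Hz modulation rate and the sine-shaped envelope are assumptions, and no dither is applied.

```python
import math

FS = 12_000               # sampling rate in Hz
F_TONE = 4_000            # carrier frequency in Hz
F_MOD = 50                # modulation rate in Hz (assumed)
PEAK = 10 ** (-30 / 20)   # -30 dBFS as a linear amplitude

def make_8bit(n_samples: int) -> list[int]:
    """Amplitude-modulated tone quantized to signed 8-bit without dither."""
    out = []
    for n in range(n_samples):
        t = n / FS
        envelope = 0.5 * (1 + math.sin(2 * math.pi * F_MOD * t))
        x = PEAK * envelope * math.sin(2 * math.pi * F_TONE * t)
        out.append(round(x * 127))
    return out

samples_8 = make_8bit(FS)                   # one second of audio
samples_16 = [s << 8 for s in samples_8]    # bit-extend: append 8 zero bits
restored_8 = [s >> 8 for s in samples_16]   # drop them again

print(restored_8 == samples_8)                      # True
print(len(set(samples_8)), len(set(samples_16)))    # same number of levels
```

The 16-bit version uses exactly as many distinct levels as the 8-bit one, and the round trip is lossless, so bit extension alone adds no information.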
 

Attachments

  • bitextend_and_upsample.zip
If for some specific reason you do need to attenuate 16 bit content before putting the signal into the DSP pipeline then converting to 24 bit before attenuation is of course the only way to do it without loss of info. I'm curious what ReplayGain software you are using that doesn't do all this automatically?
Technically the Foobar2000 playback chain is 16 bit material --> float32 --> DSP and RG --> conversion to output (24 bit) --> Windows sound stack --> sound driver.

The attenuation is not primarily intended to tackle overs, it just happens that RG doing its job also tends to generate enough headroom in a lot of cases. Now I have a number of classical records with positive album gain, and while "avoid clipping" would do its job, it would also partially undo RG's efforts, so I dialed in some negative preamp. On the bedside-fi setup that's -7.9 dB, which accommodates even the worstcase in my collection (+7.83 dB, peak 1.01something). So that particular recording would match up nearly 1:1, everything else gets attenuated to some degree.
 
I upsample my CD rips (made with dBpoweramp, FLAC level 6) to 176.4 kHz/24 bits. For upsampling I use Weiss Saracon, a professional sample rate converter. I write the upsampled files as WAV and then compress them to FLAC level 6 with dBpoweramp, the same program and FLAC level as for the original CD rips.
They do sound subtly better: the bass is a bit tighter, and the high frequencies are more defined and less aggressive. Best of all is the digital filter I can use. I own an SMSL D400 PRO, which uses the latest AKM chipset, AK 4191 plus AK 4499EX. If I use the Super Slow filter, I get great sound but also some audible aliasing at Red Book resolution. With upsampled CDs at 176.4/24 and the same Super Slow filter, I get the same smooth sound WITHOUT aliasing.
By the way, when I upsample my CD rips with Weiss Saracon and then compress them to FLAC level 6 with dBpoweramp, I DO get bigger files.
I took some pictures while playing CD rips on my UHD BD player, a Sony UBP X-800 M2. The bit rate shown is that of the actual file, not the rate after decompressing to PCM.
I played two albums from two different genres: Phil Collins' But Seriously (Audio Fidelity, Steve Hoffman's 2010 remaster) and the soundtrack for Star Trek The Motion Picture (La-La Land 3 CD edition, remixed and remastered, from 2012).
Have a look at the pictures.
IMG_20241109_150817~2.jpg
 
I upsample my CD rips (made with dBpoweramp, FLAC level 6) to 176.4 kHz/24 bits. For upsampling I use Weiss Saracon, a professional sample rate converter. I write the upsampled files as WAV and then compress them to FLAC level 6 with dBpoweramp, the same program and FLAC level as for the original CD rips.
It seems kind of dumb to waste disk space on storing upsampled versions when the Foobar2000 SoX resampler DSP takes like 1-3% of CPU in realtime on a 15-year-old Core 2 Duo, but to each their own. I hope you're giving Saracon enough negative gain to reliably avoid overs, it does have a setting for that according to the manual. You could probably set -6.000000 dB and be done with it, 176/24 gives you plenty of extra dynamic range after all (about 54 dB more than 16/44 - obviously that's just on digital level and will be limited by analog DAC noise, but the D400 Pro isn't exactly a slouch either).
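The "about 54 dB" figure can be checked with a short calculation. This sketch assumes white quantization noise spread evenly up to Nyquist, so that the noise falling inside a fixed audio band drops by about 3 dB for every doubling of the sample rate; the function name is made up for the example.

```python
import math

def dynamic_range_gain_db(bits_from: int, bits_to: int,
                          fs_from: float, fs_to: float) -> float:
    """Theoretical in-band dynamic range gained by a longer word length
    and a higher sample rate (white quantization noise assumed)."""
    word_length_db = 20 * math.log10(2) * (bits_to - bits_from)
    oversampling_db = 10 * math.log10(fs_to / fs_from)
    return word_length_db + oversampling_db

print(round(dynamic_range_gain_db(16, 24, 44_100, 176_400), 1))  # 54.2
```

About 48 dB of that comes from the 8 extra bits and about 6 dB from the 4x sample rate.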
With upsampled CDs at 176.4/24 and the same Super Slow filter, I get the same smooth sound WITHOUT aliasing.
That's exactly how these kinds of slow rolloff filters are supposed to be used. Although "Super Slow" is so insanely slow to roll off that I'd almost want to go to 352.8 kHz. 176.4 is about the minimum for that. It's basically the "no filter" option. At 176.4, "Short Slow" may be a good fit.
 
I don't use a computer to play my audio files, but a Sony UBP X-800 M2 UHD BD player. Call me old-fashioned, but I don't want a computer in my audio setup.
Regarding gain, it's not needed for upsampling Red Book Rips to 176.4/24. I know the option is there, but it's intended for other purposes.
The gain control in Weiss Saracon is mostly used for DSD to PCM conversion and vice versa. Converting DSD to PCM requires a level increase of +6 dB on average; converting PCM to DSD requires a -6 dB level decrease. In a nutshell, -6 dB DSD = 0 dB PCM.
 
Regarding gain, it's not needed for upsampling Red Book Rips to 176.4/24. I know the option is there, but it's intended for other purposes.
Not sure about the "not needed" part... I've seen true peak values exceeding +3dBFS at times.
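A textbook case of an intersample over, as a hedged sketch: a sine at fs/4 whose samples all miss the crest by 45 degrees. Here the underlying waveform is known analytically, so it is simply evaluated on a finer grid instead of running a real oversampling filter; actual true-peak meters work differently.

```python
import math

def sample_peak_and_true_peak_db(oversample: int = 16):
    """Return (sample peak, true peak) in dBFS for an fs/4 sine, 45 degree phase."""
    phase = math.pi / 4
    # Four samples per cycle, each landing at +/-0.7071 of the crest value.
    samples = [math.sin(math.pi / 2 * n + phase) for n in range(8)]
    # Normalize so the highest SAMPLE sits at 0 dBFS, as a peak normalizer would.
    norm = 1.0 / max(abs(s) for s in samples)
    sample_peak = max(abs(s * norm) for s in samples)
    # Evaluate the same continuous sine between the sample instants.
    fine = [norm * math.sin(math.pi / 2 * (k / oversample) + phase)
            for k in range(8 * oversample)]
    true_peak = max(abs(v) for v in fine)
    return 20 * math.log10(sample_peak), 20 * math.log10(true_peak)

sp_db, tp_db = sample_peak_and_true_peak_db()
print(f"true peak: {tp_db:+.2f} dBFS")  # about +3.01 dBFS
```

The samples never exceed full scale, yet the reconstructed waveform peaks about 3 dB above it, which is the kind of over a resampler can run into without some negative gain.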
 
In addition to the good answers already provided, I'll add that one thing that helped me understand what's going on is the theoretical perspective. Consider the Shannon-Whittaker reconstruction formula, which provides mathematically perfect reconstruction of the sampled wave. Note that it is not a stair-step but a summation of sinc(t) functions, which is continuous and its 1st derivative is also continuous. This is necessarily so because infinite rates of change cannot happen in the real world.

DACs do not implement this because it is computationally infeasible, since each sample point requires an infinite sum across all sampling points. But since its result is the mathematically perfect ideal, a DAC's goal is to reconstruct that same wave by more efficient means. The algorithms that DACs use for this, such as Delta-Sigma, are essentially more efficient methods to get approximately the same wave. Here "approximately" can be very close, in well engineered DACs it can be so close that the differences are below the level of analog circuitry noise. Practically speaking, they have essentially perfect reconstruction.
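As a small illustration of the reconstruction formula, here is a sketch that evaluates a truncated version of the sinc sum in Python. The 64-sample test sine and the truncation to a finite window are assumptions for the example; the ideal formula sums over all samples, which is exactly what makes it impractical.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t: float) -> float:
    """Truncated Whittaker-Shannon sum; t is measured in sample periods."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# A sine with 8 samples per cycle, 64 samples long (assumed test signal).
samples = [math.sin(2 * math.pi * n / 8) for n in range(64)]

# At a sample instant the sum returns the stored sample (up to rounding).
print(reconstruct(samples, 10.0), samples[10])
# Between samples it approximates the underlying sine, not a stair-step.
print(reconstruct(samples, 32.5), math.sin(2 * math.pi * 32.5 / 8))
```

The value between samples is only approximate here because the sum is cut off after 64 terms; a longer window shrinks that error, at the cost of more computation per output point.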

The pragmatic answer is that there is no point to upsampling. Many DACs already do it internally anyway as part of their conversion process. No need to waste your disk space or processing time modifying the files, nor even doing it on the fly in real time.

The DAC chips' constraints are most evident in the maximum slope of the reconstruction filter. This is typically much lower than what can be achieved using a computer. Of course, the ideal of infinity cannot be achieved either way but one can get significantly closer to it. This also comes at the cost of latency since more original samples have to be used to calculate each new sample.

For 44.1 kHz, the steepest filters on most DACs do not attenuate output up to 20 kHz but also only fully attenuate output from 24 kHz upwards. Of course, the whole thing can be moved around 9% lower in frequency if one wants to achieve full attenuation at fs/2 but then output is attenuated beginning around 18.4 kHz.
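The "around 9%" and "around 18.4 kHz" figures come from scaling the whole filter down so that its stopband edge lands on fs/2. A quick sketch of the arithmetic, taking 20 kHz and 24 kHz as the passband and stopband edges mentioned above:

```python
FS = 44_100
NYQUIST = FS / 2            # 22.05 kHz
passband_edge = 20_000      # Hz, flat up to here
stopband_edge = 24_000      # Hz, full attenuation from here

# Factor needed to move the stopband edge down to fs/2.
scale = NYQUIST / stopband_edge

print(round((1 - scale) * 100, 1))    # 8.1 (percent lower)
print(round(passband_edge * scale))   # 18375 (Hz)
```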

Overall, upsampling allows for fine-tuning the reconstruction filter. This only makes sense if its cutoff frequency is close to the audible range, i.e. for 44.1 kHz and 48 kHz recordings, and it only needs to be done to 2x. For anything above 44.1 kHz or 48 kHz, there is no point really.
 
... For 44.1 kHz, the steepest filters on most DACs do not attenuate output up to 20 kHz but also only fully attenuate output from 24 kHz upwards. ...
This is because aliases introduced by insufficient filtering mirror-image around Nyquist. If we consider the passband to end at 20 kHz and Nyquist is 22,050, the proper filter transition band is 20k to 22.05k. It just so happens that 24.1 kHz exactly doubles the width of the transition band. If we set the stopband at 24.1 kHz instead of 22.05 where it should be, any alias introduced must be at least 20 kHz or higher, which is inaudible. Thus stretching the transition band twice as wide makes the filter easier to implement, and any noise it introduces is inaudible. Engineering-wise it is improper, but it's a pragmatic kludge given the limited processing power for filtering in real-time (during playback).

... Overall, upsampling allows for fine-tuning the reconstruction filter. This only makes sense if its cutoff frequency is close to the audible range, i.e. for 44.1 kHz and 48 kHz recordings, and it only needs to be done to 2x. For anything above 44.1 kHz or 48 kHz, there is no point really.
Yes. The above kludge is not necessary at 48 kHz and higher, because at those higher sampling rates the transition band is wide enough.
 
I understand the symmetry of the transition band around fs/2.

However, the images can still produce intermodulation distortion. Also a shallower filter is going to ring more for content below fs/2 possibly leading to more intermodulation distortion.

Edit: The images themselves are not mirrored at fs/2. Images are the content mirrored at fs/2.
 
This is because aliases introduced by insufficient filtering mirror-image around Nyquist. If we consider the passband to end at 20 kHz and Nyquist is 22,050, the proper filter transition band is 20k to 22.05k. It just so happens that 24.1 kHz exactly doubles the width of the transition band. If we set the stopband at 24.1 kHz instead of 22.05 where it should be, any alias introduced must be at least 20 kHz or higher, which is inaudible.
AFAIU...
Aliasing happens only in AD conversion (or in downsampling), so the above description applies only to ADCs. DACs on the other hand (or upsampling) produce images, so the consequence of stretching the transition band to 24 kHz should be just a bit more HF noise in the output. The HF noise can potentially cause IMD down the line but that's different from aliasing.

Oh, and I wrote that already in the past:
 
A DAC without a proper filter will create frequencies that weren't encoded. Aliases, images, whatever you want to call them. To-may-to, to-mah-to.
 
However, the images can still produce intermodulation distortion. Also a shallower filter is going to ring more for content below fs/2 possibly leading to more intermodulation distortion. ...
True, and that IMD can be around 1-3 kHz where our hearing is very sensitive. But any IMD generated is probably very low level.
 
The images between fs/2 and fs are mirrors of the content below fs/2. Their frequency is fi = fs - f for any f below fs/2. Thus, for 44.1kHz the range from 22.05 kHz to 24.1 kHz contains the content from 20 kHz to 22.05 kHz. With the filter, attenuation is increasing as the mirrored frequency decreases.
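The mapping fi = fs - f is easy to tabulate. A minimal sketch (the function name is made up for the example) covering the first image only, i.e. the one between fs/2 and fs:

```python
def image_frequency(f_hz: float, fs_hz: float) -> float:
    """First image of a baseband component f (0..fs/2): fi = fs - f."""
    if not 0 <= f_hz <= fs_hz / 2:
        raise ValueError("f must lie between 0 and fs/2")
    return fs_hz - f_hz

FS = 44_100
for f in (20_000, 21_000, 22_050):
    print(f, "->", image_frequency(f, FS))
# 20000 -> 24100
# 21000 -> 23100
# 22050 -> 22050
```

Content at 20 kHz lands at 24.1 kHz, and content right at fs/2 stays at fs/2, so the band from 22.05 kHz to 24.1 kHz holds the mirrored 20 kHz to 22.05 kHz content.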
 
A DAC without a proper filter will create frequencies that weren't encoded. Aliases, images, whatever you want to call them. To-may-to, to-mah-to.
My point is not so much about names but about the description. At least to me the description above strongly implies that 24.1 kHz is some special value that won't affect the audible band (0-20k) while anything more will. But that's not true in DACs. You could have a filter starting at 20k and stopping at 30k and other than potential IMD it wouldn't affect the audible band.
 
Yes, that is what I meant: the mirroring across Nyquist applies to both AD and DA. If not, tell me where I can read more.

Yet if it's not, are you saying that it's just a coincidence that most DACs set the stopband for 44.1 kHz sampling at 24.1 kHz, and it just so happens that 22,050 (Nyquist) is exactly half-way between 20 kHz and this stopband? That's hard to believe.
 