
Upconverting and Resampling Questions

Not sure about the "not needed" part... I've seen true peak values exceeding +3 dBFS at times.
I check EVERY SINGLE CD rip I do with Audiogate to see the level of the mastering. On top of that, I only upconvert recordings worth upconverting, like old MoFi or Audio Fidelity CDs, which are not mastered "the Loudness Wars way", or properly mastered Japanese CDs and old CDs from the 80s or early 90s that are mastered at quite a low level.
I may be wasting some SSD space, but I'm not dumb.
 
How is the upsampling that you are doing, any different from the upsampling most DACs do internally, automatically, as it decodes and plays the content?
 
Very simple: which uses the better upsampling algorithm, a $100 IC or $1,000 professional resampling software? And believe me, I've been using Weiss Saracon for a long time, and I know how it works like the back of my hand.
 
Do you think the algorithm makes a difference when upsampling at integer multiples, which is computationally simpler? If so, it would be interesting to see measurements or comparisons like we've seen here with the various "sox" settings.

The notion that a $100 IC, or even a $10 IC, can do the job isn't crazy. It just depends on the job. My 40 year old hand-held calculator does arithmetic just as well as my modern expensive desktop PC.
 
There is absolutely no question that software upsamplers can do much better than DAC filters, simply by being able to throw a lot more computing power at the problem instead of being limited to a hundred-ish taps at best. You just don't see DAC filters with -170 dB ultimate stopband rejection, or super-low passband ripple while still reaching the ultimate attenuation at fs/2.

The question is whether the DAC filters aren't good enough anyway. The sharp one in the AK4191EX obviously involves a cascade of two half-band filters, with passband ripple of ±0.001 dB, a 0.4535fs cutoff, -12 dB at fs/2 and stopband of -150 dB (of which Amir only found about -110) being reached by 0.5465fs, all with a group delay of 41.7 samples. That's really not bad as far as compromise filters go.
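Anyone can read numbers like these off a filter themselves with scipy's `freqz`. A minimal sketch, using a stand-in Kaiser-window lowpass with a roughly comparable transition band (not the AK4191EX's actual coefficients, which aren't published; the tap count and beta here are illustrative):

```python
import numpy as np
from scipy import signal

fs = 88200  # working at the 2x rate for a 44.1k input

# Stand-in linear-phase lowpass, cutoff centred on the original Nyquist.
# NOT the AK4191EX's coefficients -- just a design with similar band edges.
h = signal.firwin(201, 22050, window=("kaiser", 12.0), fs=fs)

w, H = signal.freqz(h, worN=1 << 15, fs=fs)
mag = 20 * np.log10(np.abs(H) + 1e-300)

passband = mag[w <= 0.4535 * 44100]   # up to ~20 kHz
stopband = mag[w >= 0.5465 * 44100]   # from ~24.1 kHz up
print(f"passband ripple: +/-{(passband.max() - passband.min()) / 2:.2e} dB")
print(f"stopband attenuation: {-stopband.max():.1f} dB")
```

Playing with the tap count and the Kaiser beta shows the basic trade-off: deeper stopbands and narrower transition bands both cost taps, which is exactly where DAC chips are squeezed and software isn't.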

BTW, you can compare foo_dsp_resampler (which performs like SoX VHQ or Audacity >=2.0.3) and Weiss Saracon 1.6 among others... guess which one has the better filter? Well, the free software, of course.
 
Great comparison site, exactly what I was asking for. Nice to see that SoX is so well implemented; it's open source and installed by default on many Linux systems.
 
Are you serious about the performance of the "free" upconverting software versus Weiss Saracon, or are you being sarcastic?
Even the Sony Super Audio CD Center in the US recommends Weiss Saracon for converting natively PCM-recorded albums to DSD for SACD release.
 
Of course this is serious. Free or low-cost software (open source or not) does not lag behind costly commercial software and hardware in the professionalism and dedication its creators bring to the table; quite the contrary.
That does not mean much. Weiss does have a (now historic) reputation, of course, but that alone does not imply their stuff is beyond everything else.
 
You can see it for yourself, can't you? Saracon is by no means bad, but it's using a half-band filter of rather average steepness, more like a decent DAC. (Perhaps that was even done on purpose in order to keep overs in check.) It does seem to be using higher internal precision, so probably float64 instead of float32, but that's about it.

If anything, the correlation between price and resampling quality is negative. There is some high-dollar software that doesn't do well at all. ProTools' filter is just "meh". Sony Vegas is particularly egregious.
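On the float64-vs-float32 point: the float32 round-off floor is already far below anything a 16-bit (or even 24-bit) source carries, so higher internal precision buys very little by itself. A quick NumPy illustration of float32 round-off in general (not a measurement of Saracon):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 1 << 16)   # full-scale float64 "audio"

# Error introduced by one pass through float32 precision
err = x - x.astype(np.float32).astype(np.float64)
rms_db = 20 * np.log10(np.sqrt(np.mean(err ** 2)))
print(f"float32 round-off noise: {rms_db:.1f} dBFS")  # well below -140 dBFS
```

That floor sits far under the roughly -96 dB noise floor of 16-bit material, so float64 internals are nice-to-have rather than audible.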
 
Regarding Sony Vegas, I haven't used that one in ages. Does it feature DSD-to-PCM conversion? If so, does it use Sony's Super Bit Mapping Direct algorithm? That was Sony's first DSD-to-PCM converter, meant to get the most out of DSD when converting to Red Book resolution, for example to bring a DSD master down to 44.1/16 for the CD layer of a hybrid SACD.
Super Bit Mapping Direct was developed over 25 years ago; I'm sure there is current DSD-to-PCM conversion software that performs better than SBM Direct.
I've only tested Audiogate and Saracon for DSD-to-PCM conversion, and I much prefer the sound of Saracon when converting to PCM at any resolution.
 
I discovered in the data sheet for the ESS ES9039Q2M that it uses 128 coefficients for the 2x FIR and 32 coefficients for the 4x FIR used after it in the 8x oversampling filter. No idea what number of coefficients that corresponds to for a single 8x oversampling filter, though.
[Attachment: ES9039Q2M datasheet screenshot]
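One way to answer that: push a unit impulse through such a cascade and measure the span of the combined response. The coefficients below are generic firwin stand-ins with the datasheet's tap counts and rate structure (ESS doesn't publish the real ones):

```python
import numpy as np
from scipy import signal

# Generic stand-ins with the datasheet's tap counts: a 128-tap 2x stage
# and a 32-tap 4x stage (NOT ESS's actual coefficients).
h2x = signal.firwin(128, 0.5)    # cutoff at half the 2x-rate Nyquist
h4x = signal.firwin(32, 0.25)    # cutoff at a quarter of the 8x-rate Nyquist

impulse = np.array([1.0])
y = signal.upfirdn(2 * h2x, impulse, up=2)   # 2x stage (gain 2 restores level)
y = signal.upfirdn(4 * h4x, y, up=4)         # 4x stage (gain 4)

# Span of the cascade's impulse response = equivalent single-stage length
print(len(y))   # 540
```

With these stand-ins the combined impulse response spans (128 − 1) × 4 + 32 = 540 samples at the 8x rate, so the cascade behaves like a single ~540-tap 8x filter while only needing 128 + 32 stored coefficients.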
 
Yes that is what I meant, the mirroring across Nyquist applies to both AD and DA. If not,
I'm not saying that mirroring doesn't happen. I'm saying that in DA (or upsampling) it doesn't affect the 0-22k band if you don't filter out the mirror images from >22k band. No, you won't recover the signal, as a whole, as it was before sampling, but the 0-22k part of the signal will be the same.

tell me where I can read more.
Usually here on ASR, when someone corrects those who say that images reflect back to 0-22k in DA process, like here or here :)

I don't know, nothing I read about the sampling theorem ever implied to me that images reflect back, so it's rather hard to point out any source that explicitly says that it does or doesn't happen.

But I can actually show what I described previously. Here's a 24/44k multitone generated by REW (in attachment):
[Attachment: mt.input spectrum]


I can upsample it using a steep filter with 20k-22k transition band and with a shallow filter with 20k-28k transition band:
Code:
sox "mt.input.wav" -r88200 -b32 "mt.fast.wav" upsample 2 sinc -a 170 -t2k -21k vol 2
sox "mt.input.wav" -r88200 -b32 "mt.slow.wav" upsample 2 sinc -a 170 -t8k -24k vol 2
[Attachment: mt.fast spectrum]

[Attachment: mt.slow spectrum]


As far as I understand what you are saying, there should be some difference between them in the 16k-22k band? The null shows only the images:
Code:
sox -m -v1 "mt.fast.wav" -v -1 "mt.slow.wav" "mt.null.wav"
[Attachment: mt.null spectrum]


And at this point this signal is like any other signal sampled at 88.2k. The images became part of it and there isn't any special relation between 0-22k and 22k-44k bands anymore.
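The "images don't touch the baseband" point can also be shown without sox: zero-stuffing by 2 (upsampling with no filter at all) leaves the original spectrum bins exactly as they were, and only adds the mirrored copy above the old Nyquist. A minimal NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
x = rng.standard_normal(n)      # any "44.1k" signal

y = np.zeros(2 * n)
y[::2] = x                      # zero-stuff to "88.2k", no filtering

X = np.fft.rfft(x)              # bins 0..n/2   (0 to old Nyquist)
Y = np.fft.rfft(y)              # bins 0..n     (0 to new Nyquist)

# Baseband bins are untouched; the upper half is the mirror image.
print(np.allclose(Y[: n // 2 + 1], X))              # True
print(np.allclose(Y[n // 2 :], np.conj(X[::-1])))   # True: mirrored copy
```

Filtering (or not filtering) the >22k images afterwards changes only the upper half of the spectrum; the 0-22k bins are already final.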

Yet if it's not, are you saying that it's just a coincidence that most DACs set the stopband for 44.1 kHz sampling at 24.1 kHz, and it just so happens that 22,050 (Nyquist) is exactly half-way between 20 kHz and this stopband? That's hard to believe.
If I had to guess, I'd say that's because they are using halfband filters. An advantage of halfband filters is that every other coefficient is 0, which is very desirable because it reduces cost. A disadvantage is that they can only be centered around Fs/4. That means that when you use them for 2x upsampling to 88,200 Hz, 22,050 Hz lies right in the middle of the transition band, and that's where the 20k-24k transition band comes from.

For example, there are 2 filters in the attachments [1]:
  • The halfband filter with 20k-24k transition band, 100 dB stopband attenuation and 0.0001 dB passband ripple; it has 131 taps and only 67 of them non-zero.
  • The steep filter with the same stopband attenuation and passband ripple but only 20k-22k transition band; it has 263 taps and none of them 0.
So that's 4x difference in non-zero taps [2]. This, I assume, translates to costs, size, power consumption, etc, and these are probably deemed too much for what they offer.
Code:
sox "impulse.wav" -r88200 "impulse.halfband.wav" upsample 2 fir "fir.halfband.txt" vol 2
sox "impulse.wav" -r88200 "impulse.steep.wav"    upsample 2 fir "fir.steep.txt"    vol 2
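The every-other-tap-is-zero structure (and the 67-of-131 count) is easy to reproduce: any symmetric lowpass with its cutoff exactly at fs/4 comes out half-band. A quick scipy check with a generic windowed design (not the attached coefficients):

```python
import numpy as np
from scipy import signal

# 131 taps, cutoff at half of Nyquist (i.e. fs/4) -> half-band filter
h = signal.firwin(131, 0.5)

centre = len(h) // 2   # index 65, the lone non-zero tap of its parity
zero_idx = [i for i in range(len(h)) if i % 2 == centre % 2 and i != centre]

print(np.allclose(h[zero_idx], 0.0, atol=1e-12))   # True: every other tap is 0
print(np.sum(~np.isclose(h, 0.0, atol=1e-12)))     # 67 non-zero taps
```

131 taps but only 67 multiplies, which is exactly the saving described above; the steep 263-tap design gets no such discount.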
[Attachment: stopband comparison]

[Attachment: passband comparison]


[1] The filters were created by following these blog posts: "Designing Generic FIR Filters with pyFDA and NumPy" and "Half-Band Filters, a Workhorse of Decimation Filters".

[2] AFAIU, these filters should be optimal, but I'm just a dude on the internet, so maybe something better is possible :). But I don't think the improvement could be significant without changing the filter characteristics.
 

Attachments

  • fir.halfband.txt (1.8 KB)
  • fir.steep.txt (6.5 KB)
  • impulse.flac.zip (1,022 bytes)
  • mt.input.flac.zip (191.7 KB)
There would be no point to this bit-depth alteration. Likewise for upsampling. You can't add information to a digital audio file.

[Attachment: waveform comparison plot]
 
Here's what I had in mind. Thinking of this in terms of reflecting "back" or "forward" can lead to confusion. The point is that sampling points are ambiguous; each frequency below Nyquist has a "sister" frequency above Nyquist that shares the same sampling points.

Here's a simplified visual example of how this mirroring happens. Sample at 10 Hz and take a 3 Hz wave. Nyquist is 5 Hz, so the alias of 3 Hz is (5 - 3) + 5 = 7 Hz.

Here are the 3 Hz and 7 Hz waves superimposed:
[Attachment: 3 Hz and 7 Hz waves superimposed]


Note that they intersect at regular intervals - the same sampling points. I'll highlight them:
[Attachment: shared sampling points highlighted]


Now back to your example: Suppose you properly filter and digitally record a sound with lots of HF energy, like castanets, at 44.1 kHz. It happens to have plenty of energy at 19 kHz. The sampling points that capture that 19 kHz wave are exactly the same as they would be for a 25.1 kHz wave. (22050 - 19000) + 22050 = 25100. This is why the filter is essential when decoding and playing back this data. Without the filter, you can construct a 19 kHz wave, or a 25.1 kHz wave. Both are "correct" (if you ignore Nyquist) since both match the sampling points. Put differently: if a reconstruction algorithm produces all waves that fit the sampling points, it will produce both 19 kHz and 25.1 kHz. The Nyquist limit is essential to make the sampling points unambiguous. The 25.1 kHz wave is above Nyquist, thus invalid and must be filtered out.

Your example shows the same mirroring. The 20 kHz wave mirrors to 24.1, the 19k to 25.1, etc.
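The shared-sample-points claim is easy to verify numerically. Using the 3 Hz / 7 Hz pair sampled at 10 Hz (cosines here so the pair matches exactly; with sines the 7 Hz alias comes out sign-inverted, since phase mirrors along with frequency):

```python
import numpy as np

fs = 10.0
t = np.arange(100) / fs            # 10 seconds of samples at 10 Hz

low = np.cos(2 * np.pi * 3 * t)    # 3 Hz
high = np.cos(2 * np.pi * 7 * t)   # 7 Hz = (5 - 3) + 5, its mirror about Nyquist

print(np.allclose(low, high))      # True: identical sample values
```

Sampled at 10 Hz, the two waves are literally the same data; only the reconstruction filter decides which one you get back.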
 

ESS uses a 2x FIR followed by a 4x FIR to do 8x oversampling. In the ES9039Q2M, the 2x FIR has 128 taps and the 4x FIR has 32 taps. For the linear phase fast roll-off filter, pass band ripple is 0.0031 dB and stop band attenuation is 115 dB with transition band from 0.45 fs to 0.55 fs.
[Attachment: ES9039Q2M filter response screenshot]

The pass band ripple has both a higher and a lower frequency component. The higher frequency component is similar to your filter, with a low peak at 10 kHz and 9 high peaks before the drop-off. The lower frequency component is likely due to the following 4x FIR filter. Looking at the drop-off in the stop band around 80 kHz, the 4x FIR may be a half-band filter as well, with 88.2 kHz being 1/4 of 352.8 kHz, which is of course 8 × 44.1 kHz.

[Attachment: stop band detail screenshot]
 

That is called imaging and it only happens above fs/2. What you originally wrote some time ago is:
This is because aliases introduced by insufficient filtering mirror-image around Nyquist. [...] If we set the stopband at 24.1 kHz instead of 22.05 where it should be, any alias introduced must be at least 20 kHz or higher, which is inaudible.

This is aliasing, which does not occur in upsampling or DA conversion. The images are not folded back below fs/2.
 

What information is added? It is an undithered 16-bit 1 kHz sine at -80 dBFS. It should have steps and ringing like the one on the left.

Of course, filtering can be applied and noise and distortion added, but the point is that increasing the number of bits does not reduce the noise and distortion, and increasing the sample rate does not increase the bandwidth.
 
Er, not really, Alan. The one on the left just hasn't had any filtering or noise reduction applied. In the plot below, the blue dots are samples. The grey staircase is what the output would look like if there were no filter. But square waves have infinite bandwidth, so when the reconstruction filter is applied we end up with the lovely smooth blue output.

I'll once again recommend the amazing videos by Monty of xiph.org, where he both explains this _and_ demos the whole process with vintage analogue gear! https://xiph.org/video/


[Attachment: samples, staircase, and reconstructed output]
 
@GXAlan @MaxwellsEq another plot, this time to illustrate quantisation noise at low levels like -80 dB. The samples are stored as integers, so we get the red dots instead of the orange dots we would like. This is why dither is applied: it adds noise but improves resolution. To reiterate, this process doesn't add information, but it does turn a correlated source of distortion (quantisation) into an uncorrelated one (noise).

[Attachment: -80 dB sine, quantised]

If we go to -60 dB the quantisation noise is almost gone:

[Attachment: -60 dB sine, quantised]

And here's -80 dB again but with 10 rounds of random dither - note how the white noise is increased but the underlying signal is much clearer:

[Attachment: -80 dB sine with 10 rounds of dither]
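The 10-rounds-of-dither effect is easy to reproduce numerically. A sketch in NumPy, working in units of one 16-bit LSB, with TPDF dither (the sum of two uniform draws, the standard choice):

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 44100
t = np.arange(4410) / fs
x = 3.28 * np.sin(2 * np.pi * 1000 * t)  # ~-80 dBFS sine in 16-bit LSB units

undithered = np.round(x)                 # plain quantisation: correlated steps

def dither_round(v):
    """Quantise with fresh TPDF dither (two uniform draws, +/-1 LSB total)."""
    d = rng.uniform(-0.5, 0.5, v.shape) + rng.uniform(-0.5, 0.5, v.shape)
    return np.round(v + d)

one_round = dither_round(x)
ten_rounds = np.mean([dither_round(x) for _ in range(10)], axis=0)

rms = lambda e: np.sqrt(np.mean(e ** 2))
print(f"undithered error:   {rms(undithered - x):.3f} LSB")
print(f"1 dithered round:   {rms(one_round - x):.3f} LSB")   # noisier...
print(f"10 rounds averaged: {rms(ten_rounds - x):.3f} LSB")  # ...but averages down
```

A single dithered pass is noisier than plain rounding, but because its error is uncorrelated with the signal, averaging independent rounds pulls the signal out of the noise, just like in the last plot. Undithered, the error is locked to the signal, and no amount of averaging removes it.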
 
