Let's dial the emotions down folks. It is not doing us any good.
Straight resampling does nothing to restore original data, as discussed in the OP. It simply makes the data suitable for the new sample rate without adding artifacts.

Although the math and perceptual senses are different, the underlying goal is the same: how can we regenerate the original signal as accurately as possible? We know we have enough information if we sample at more than double the highest frequency. If we halve the sample rate, we cannot restore the signal. So the question is not whether we can regenerate the signal from a 10 kHz file, but whether we can use deep NN techniques to reconstruct the signal more accurately than with other means.
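To make the Nyquist point concrete, here is a minimal numpy sketch (the tone frequencies and sample rate are illustrative choices, not anything from the thread): a 6 kHz tone sampled at 20 kHz lands on the right FFT bin, while a 14 kHz tone sampled at the same rate aliases down to exactly the same 6 kHz bin, so no algorithm can tell the two apart from the samples alone.

```python
import numpy as np

fs = 20_000          # sample rate, Hz (Nyquist = 10 kHz)
n = 2_000            # number of samples (0.1 s)
t = np.arange(n) / fs

def dominant_freq(x, fs):
    """Return the frequency of the strongest bin in the one-sided FFT."""
    spec = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), 1 / fs)[np.argmax(spec)]

# 6 kHz is below Nyquist: the samples pin it down exactly.
below = np.sin(2 * np.pi * 6_000 * t)
# 14 kHz is above Nyquist: it aliases to fs - 14 kHz = 6 kHz.
above = np.sin(2 * np.pi * 14_000 * t)

print(dominant_freq(below, fs))  # 6000.0
print(dominant_freq(above, fs))  # 6000.0 -- indistinguishable from the 6 kHz tone
```

Both tones produce identical sample sequences up to sign, which is exactly why the information lost below half the rate cannot be "restored", only guessed at.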
Although the math and perceptual senses are different, the underlying goal is the same: how can we regenerate the original signal as accurately as possible?
The problem, of course, with that kind of processing is that you have to assume what the material you threw out sounded like. You can't actually describe it accurately, or you'd have to increase the bit rate (Shannon, as always, wins), so you have to develop a hypothesis. Then you put in "We Shall Be Happy" from Jazz by Ry Cooder, and the hypothesis is WRONG! (I'm thinking of the way Marisa Tomei says "wrong" in "My Cousin Vinny".)

This is used in low-bit-rate music codecs that use lower than 44.1 kHz sampling rates to reproduce the "original CD" sound. We developed this at Microsoft, and while it worked surprisingly well on some content, it also generated annoying high-frequency response at times.
If you know the type of content you have, more optimal algorithms can be used for better results. But there will always be limitations.
Easiest to take a lot of high-resolution recordings, sample those down to 44.1 kHz/16 bit, and see which upsampling algorithm comes closest. It will still be difficult, but it is at least repeatable/verifiable. An electrical approach can also be taken, but then there is the extra DAC/ADC step, which adds its own additional distortion. Just as a last remark: the DAC conversion is the reason we want as good quality as we can get, either with upsampling or at the original sample rate.

There's a couple of things here. First, what do you use as an accuracy measure? For images for viewing, the process is enormously nonlinear and has to be (that is, for viewing, not for analysis). For sound, nonlinearity must be avoided at all costs.
But then you say "the original signal". OK, if you have a fixed photo that's been downsampled, maybe you have an original. If you have a statue and you can replicate the lighting, that's "an original".
But what's the original audio signal? Seriously. What is the "original"?
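The downsample-then-upsample comparison proposed above can be sketched directly, assuming scipy is available. The test signal, the 4:1 rate pair, and the RMS metric below are illustrative choices (one possible accuracy measure, per the objection that a measure must first be picked), not anyone's proposed standard.

```python
import numpy as np
from scipy.signal import resample_poly

fs_hi, fs_lo = 176_400, 44_100          # 4:1 ratio keeps the polyphase math simple
t = np.arange(fs_hi) / fs_hi            # 1 second of "high-res" audio

# Band-limited test signal: a few tones all well below 20 kHz,
# so the 44.1 kHz intermediate can in principle keep every component.
x = sum(np.sin(2 * np.pi * f * t) for f in (440, 1_000, 5_000, 12_000))

down = resample_poly(x, 1, 4)           # 176.4 kHz -> 44.1 kHz
up = resample_poly(down, 4, 1)          # 44.1 kHz -> 176.4 kHz

# Score only the middle of the file to ignore filter edge effects.
core = slice(fs_hi // 10, -fs_hi // 10)
err = np.sqrt(np.mean((x[core] - up[core]) ** 2) / np.mean(x[core] ** 2))
print(f"relative in-band RMS error: {err:.2e}")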
My apologies if this is not a pertinent question for this thread (it *seems* related to me, but I'm still very ignorant). Are these information-theoretic properties of sampling frequency and audible frequency the reason why there are comparatively many wireless subwoofers and adaptors vs. "normal" speakers? Because, e.g., you can transmit A3 and below at 24-bit resolution but use only ~10.6 kbit/s of bandwidth if you low-pass filter and "down-sample" the line signal, vs. ~1.5 Mbit/s for the full audible spectrum?
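The arithmetic behind those bandwidth figures can be checked directly. This is just the back-of-envelope calculation redone (mono subwoofer feed, stereo full-band feed, no channel coding); it does not describe any actual wireless protocol.

```python
# A3 is 220 Hz; Nyquist says the sample rate must exceed 2x the highest frequency.
sub_rate = 2 * 220            # 440 samples/s for content at or below A3
sub_bits = sub_rate * 24      # 24-bit samples
print(sub_bits)               # 10560 bits/s, i.e. ~10.6 kbit/s

# Full audible band at CD parameters, two channels:
full_bits = 44_100 * 16 * 2
print(full_bits)              # 1411200 bits/s, i.e. ~1.41 Mbit/s
```

So the ~10.6 kbit/s figure checks out, and the full-band number comes out at CD's 1.41 Mbit/s, close to the ~1.5 Mbit/s quoted: a roughly 130:1 bandwidth saving for the subwoofer feed.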
Easiest to take a lot of high-resolution recordings, sample those down to 44.1 kHz/16 bit, and see which upsampling algorithm comes closest. It will still be difficult, but it is at least repeatable/verifiable.
The question is more: does upsampling help to improve timing reconstruction accuracy?

So the question you're really asking (it's a valid question, but not likely to enlighten much) is: does the signal above 20 kHz matter to adult humans, adolescent humans, youngsters?
It is easy to show (see this presentation https://www.aes-media.org/sections/pnw/pnwrecaps/2016/jjsrc_jan2016/ ) that the in-band results can be made arbitrarily close to the in-band part of the original signal.
The question is more: does upsampling help to improve timing reconstruction accuracy?
I think that mixed approach makes for so many different experiences in soundstage, localization, etc., not only with the same speaker but also with different electronics.
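The timing question above can be probed with a small experiment (all numbers here are illustrative): a band-limited tone delayed by a tenth of a sample period still produces a distinct, recoverable sample sequence, so timing accuracy is not limited to the sample period, and no upsampling is required to recover the delay.

```python
import numpy as np

fs = 48_000
n = 4_096
t = np.arange(n) / fs

# A 3 kHz tone, and the same tone delayed by 2.1 microseconds --
# about a tenth of the 20.8 us sample period.
f0 = 3_000
delay = 2.1e-6
a = np.sin(2 * np.pi * f0 * t)
b = np.sin(2 * np.pi * f0 * (t - delay))

# Recover the delay from the phase difference at the tone's FFT bin.
k = round(f0 * n / fs)          # 3 kHz falls exactly on bin 256
phase = np.angle(np.fft.rfft(a)[k]) - np.angle(np.fft.rfft(b)[k])
est = phase / (2 * np.pi * f0)
print(est)                      # ~2.1e-6 seconds, far below one sample period
```

This is the standard result that timing resolution of a sampled band-limited signal is set by bandwidth and noise, not by the sampling interval; upsampling by itself adds no new timing information.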
If you halve the frequency, you only have four samples per wave, but is that enough to also account for modulation of intensity and frequency?
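Whether four samples per wave can carry modulation comes down to bandwidth: amplitude-modulating a carrier at f with rate m puts sidebands at f ± m, so the samples still fully determine the signal as long as f + m stays below Nyquist. A small sketch (frequencies chosen for illustration):

```python
import numpy as np

fs = 20_000                 # 4 samples per cycle of the 5 kHz carrier
n = 20_000                  # 1 second
t = np.arange(n) / fs

# Amplitude-modulated 5 kHz tone, 100 Hz modulation rate.
x = (1 + 0.5 * np.cos(2 * np.pi * 100 * t)) * np.sin(2 * np.pi * 5_000 * t)

freqs = np.fft.rfftfreq(n, 1 / fs)
spec = np.abs(np.fft.rfft(x))
peaks = freqs[spec > 0.1 * spec.max()]
print(peaks)   # carrier plus sidebands: 4900, 5000, 5100 Hz -- all below Nyquist
```

So modulation is captured for free as long as the modulated signal's total bandwidth fits under Nyquist; it only becomes a problem when the sidebands would cross half the sample rate.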
I think your calculation of 0.12 ns is way off.
What are "more advanced upsample algorithms"?

In short, we could use high-accuracy recordings (>=176 kHz/24 bit/DSD512) downsampled to 44.1/16 or 48/16 and use more advanced upsample algorithms to see if those reconstruction filters provide more accuracy, and look at benefits and/or drawbacks, if any.
You seem to assume that our auditory system uses 20 kHz tones for localization.