Based on my interview with Rob Watts, this can be measured and understood in the digital domain. As I understand it, this involves feeding in the input data and reviewing the output data. My understanding is that you cannot measure it in the analog domain, but that doesn't make it false. Also, not being able to measure something doesn't make it nonexistent. The Higgs boson couldn't be found/measured prior to 2012 but was theorised in 1964. Similarly, many of the medicines we are familiar with today (Lithium, Tylenol, Penicillin) are still not understood in terms of how they work. We just know that they do.
Further to that, my superficial understanding of the sinc function theory behind the 1,000,000-tap design of the M-Scaler, and of how it does the upsampling (possibly more important than the fact that it is upsampling), is that the fine-detail accuracy of the reconstructed waveform increases as the filter approaches the ideal, infinitely long sinc. If that is true and the sinc function theory isn't wrong, then does it not stand to reason that more processing with that algorithm will result in tighter timing accuracy?
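For reference, the sinc theory mentioned here is usually written as the Whittaker-Shannon interpolation formula. The version below is the textbook form, not a description of the M-Scaler's actual filter. The symbols T (sample period), x[n] (the samples) and N (the number of samples kept on each side) are assumptions introduced for illustration.

```latex
% Ideal reconstruction of a band-limited signal (f_max < 1/(2T)):
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,
       \operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}

% Finite approximation keeping only the samples within N periods of t:
\hat{x}_N(t) = \sum_{|n - t/T| \le N} x[n]\,
               \operatorname{sinc}\!\left(\frac{t - nT}{T}\right)
```

The ideal filter is an infinite sum, and a finite filter with more taps is a closer approximation of it. The formula says nothing on its own about whether the remaining approximation error is audible.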
Thank you for your thoughtful and detailed reply - much appreciated!
I think the issue with Watts’ reply to your question is that, as has been noted by others, we don’t (in fact, we can’t) listen to a digital signal in the digital domain. So yes, a higher sample rate will reduce the gaps between samples, and therefore the digital waveform will visually look “smoother” and more “refined” or “high-res” when you examine it in an audio editing program or a similar app or device.
But digital sampling theory tells us that this visual appearance does not correspond to any difference in sound. Any frequency we want to be able to reproduce needs to be sampled only slightly more than twice per cycle; that is, the sample rate just has to exceed twice the highest frequency in the signal. This is not “just a theory,” or an “it sounds good enough most of the time” type of theory. This is a “mathematical-truth,” “cell phones and the sound on all your favorite streaming services wouldn’t work at all if it weren’t true” type of theory. I know it seems almost inconceivable that Rob Watts would ignore this or be incorrect if he claimed it was untrue. But I don’t know what else to say: it’s true, and you don’t have to take my or Amir’s or anyone else’s word for it, as it’s well established and copiously documented in the scientific literature. Can’t say the same for Watts’ claim about a more “refined” sound coming from upsampling.
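If it helps to see the sampling theorem in action, here is a minimal Python sketch. It assumes a 5 kHz sine sampled at 44.1 kHz and at 176.4 kHz, and it rebuilds the waveform between the samples with a truncated sinc sum. The function names, the tone, the two rates and the `half_width` truncation are all my own choices for illustration, not anything taken from a real DAC or from the M-Scaler.

```python
import math

def sinc(x):
    # Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1.
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(t, fs, tone_hz, half_width=2000):
    # Truncated Whittaker-Shannon interpolation at time t (seconds),
    # using the samples n = -half_width .. +half_width of a sine tone.
    total = 0.0
    for n in range(-half_width, half_width + 1):
        sample = math.sin(2 * math.pi * tone_hz * n / fs)
        total += sample * sinc(t * fs - n)
    return total

def max_error(fs, tone_hz=5000.0):
    # Evaluate at instants that fall between the 44.1 kHz sample points
    # and compare against the true continuous sine.
    times = [(k + 0.37) / 44100.0 for k in range(-3, 4)]
    return max(
        abs(reconstruct(t, fs, tone_hz) - math.sin(2 * math.pi * tone_hz * t))
        for t in times
    )

err_44 = max_error(44100.0)
err_176 = max_error(176400.0)
print("max error at 44.1 kHz: ", err_44)
print("max error at 176.4 kHz:", err_176)
```

With these settings both errors should come out tiny, and what is left comes from cutting the sum off at a finite number of samples rather than from the sample rate. Both sets of samples describe the same continuous 5 kHz wave.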
As I’ve written in another thread, we all experience this truth whenever we listen to music because a bass drum at 50Hz gets 100 times as many samples per cycle as a cymbal or a vocal harmonic at 5kHz, regardless of the sample rate. And no one ever claims that bass sounds are always more “refined” than midrange or treble sounds within the very same recording.
So if a bass drum that’s 100x oversampled compared to a cymbal doesn’t sound more refined or high-res, then upsampling a digital recording by only 2x or 4x (or 20x) isn’t going to do a thing.
Please note that the previous two paragraphs are just one possibly easy way to think about this idea. Again, this is not a perceptual “take my word for it, you won’t hear the difference” rule; it’s a hard and fast “there is no difference, by definition” rule.
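The bass drum versus cymbal arithmetic above is easy to check. This is a small Python sketch; `samples_per_cycle` is a hypothetical helper name and the three sample rates are just common examples I picked.

```python
def samples_per_cycle(sample_rate_hz, tone_hz):
    # Number of samples that land inside one period of the tone.
    return sample_rate_hz / tone_hz

rows = []
for fs in (44100, 96000, 192000):
    bass = samples_per_cycle(fs, 50)      # 50 Hz bass drum fundamental
    cymbal = samples_per_cycle(fs, 5000)  # 5 kHz cymbal or vocal harmonic
    rows.append((fs, bass, cymbal, bass / cymbal))
    print(fs, "Hz:", bass, "vs", cymbal, "samples per cycle")
```

At 44.1kHz that works out to 882 samples per cycle for the bass drum against 8.82 for the cymbal, and the ratio stays at 100 to 1 whatever sample rate you choose.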