
Stereo and higher sampling rates - "time domain" question

It's not immediately intuitive, particularly as a lot of marketing still promotes the "stair step" misconception.
And then they advertise NOS DACs so that those stair steps interfere even more :).
 
OK - maybe I need a primer in stereo then - because aren't all these examples just a simple sine wave which would just be a single mono channel?
Any real signal (finite energy, finite length) can be constructed from a sum of sine waves, or an integral if in the continuous domain.

Mono, stereo, 44.8.16, whatever. (made the 44... one up though, there's no point to that)
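That decomposition is easy to check numerically. A minimal sketch (the two component frequencies here are arbitrary choices, not from the thread):

```python
import numpy as np

fs = 44100                      # sample rate, Hz
t = np.arange(fs) / fs          # one second of time stamps

# build a "real signal" from two sinusoids (arbitrary example components)
x = 0.6 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)

# the DFT recovers exactly those components and nothing else
spectrum = np.abs(np.fft.rfft(x)) / (fs / 2)   # normalise to amplitude
freqs = np.fft.rfftfreq(fs, 1 / fs)            # 1 Hz bins for a 1 s signal
peaks = freqs[spectrum > 0.01]                 # -> [440., 1000.]
```

This holds per channel; a stereo file is just two such decompositions side by side.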
 
This is true. For a sine wave in the non-dithered case, it depends on the frequency and effective bit depth (see the first link in post #2). With proper dither, I believe it should be identical to a non-quantized, continuous time (i.e. analog) signal with the same SNR.
Yes, it's bandwidth, not sampling rate, that matters. Dithering matters, but down at the nanosecond level.
 
Maybe it was about a poor reconstruction filter algorithm (Fs 96kHz vs 44.1kHz).
 
Perfect, thanks all.
Now that I have a better understanding of where he might be going wrong with an assumption, I will try to have a reasoned conversation and work out why he feels there is a time-domain issue.

"They're perfectly accurate in isolation, hence also perfectly accurate to each other." I think that line is the one I need!
This is a great 101 primer for digital audio:

Intersample timing is demonstrated from around 20:50


 
Yes, it's bandwidth, not sampling rate, that matters. Dithering matters, but down at the nanosecond level.
Thanks; your comment made me realize that the formula given in the first link in post #2 is in fact incorrect—it uses the frequency of a given sine wave rather than the actual signal bandwidth after quantization.

As an aside, I did a small dither experiment: I generated a full-scale sine wave at 20Hz with a 44.1kHz sample rate, duplicated it to make two channels, delayed one channel by 1/1000th of a sample (using a Thiran allpass filter), then quantized both channels to 8 bits with simple noise-shaped dither. Here's the spectrum of the residual:
residual_20Hz_8bit_22.7ns.png


Edit: Can one encode a time difference of just 10 picoseconds at 3.5kHz with Red Book (16-bit, 44.1kHz)? Sure! Here's the residual:
residual_3.5kHz_redbook_10ps.png

Shown with averaging. The noise shaping filter is a 9-tap "psychoacoustically optimal" filter proposed by R. A. Wannamaker.
What about a single picosecond? Yep, still discernible over the noise floor:
residual_3.5kHz_redbook_1ps.png
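For anyone wanting to reproduce this, here's a rough sketch under simplifying assumptions: a first-order error-feedback noise shaper stands in for the 9-tap Wannamaker filter, and a lock-in correlation replaces the averaged spectrum, but the 20Hz tone, 8-bit quantization, and 1/1000-sample (~22.7ns) Thiran delay match the post (amplitude 0.9 rather than full scale, to leave headroom for the dither):

```python
import numpy as np
from scipy import signal

fs = 44100
f0 = 20.0
amp = 0.9                  # near full scale, with headroom for dither
delay = 0.001              # delay in samples (1/1000 sample ~= 22.7 ns)

t = np.arange(fs) / fs
left = amp * np.sin(2 * np.pi * f0 * t)

# first-order Thiran allpass approximating a 0.001-sample delay
eta = (1 - delay) / (1 + delay)
right = signal.lfilter([eta, 1.0], [1.0, eta], left)

def quantize(x, bits, rng):
    """Quantize with TPDF dither and first-order noise shaping
    (a simple stand-in for the 9-tap Wannamaker filter)."""
    q = 2.0 ** (1 - bits)
    y = np.empty_like(x)
    e = 0.0
    for i in range(x.size):
        d = (rng.random() - rng.random()) * q   # TPDF dither, +/- 1 LSB
        v = x[i] - e                            # error feedback
        y[i] = np.round((v + d) / q) * q
        e = y[i] - v                            # shaped error
    return y

rng = np.random.default_rng(0)
ql = quantize(left, 8, rng)
qr = quantize(right, 8, rng)

# Lock-in detection: the residual of a slightly delayed sine is
# ~ amp * w0 * tau * cos(w0 t), so correlating against a cosine at 20 Hz
# recovers the inter-channel delay. Skip the filter's start-up transient
# and use a whole number of cycles (0.8 s = 16 cycles of 20 Hz).
n = np.arange(8820, 44100)
ref = np.cos(2 * np.pi * f0 * n / fs)
w0 = 2 * np.pi * f0

tau_true = delay / fs                           # ~22.7 ns
tau_clean = (2 / n.size) * np.sum((left - right)[n] * ref) / (amp * w0)
tau_quant = (2 / n.size) * np.sum((ql - qr)[n] * ref) / (amp * w0)
```

`tau_quant` comes out within a few percent of 22.7ns, i.e. the sub-sample timing survives 8-bit quantization, because noise shaping pushes the error away from 20Hz.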
 
He claims that 96kHz is far better for stereo
Regardless of the reasoning used by your friend, it is a fact that many musical instruments and many transient signals (like hand clapping) have spectra with frequencies far exceeding 22kHz. So 44.1kHz sampling inevitably results in spectrum truncation and signal distortion (I do not mean harmonic distortion). Do not ask me whether it is audible or not, I am not sure. And I would be careful about saying that it is always inaudible. To me, 96kHz sampling makes sense for audio signal fidelity.
 
Regardless of the reasoning used by your friend, it is a fact that many musical instruments and many transient signals (like hand clapping) have spectra with frequencies far exceeding 22kHz. So 44.1kHz sampling inevitably results in spectrum truncation and signal distortion (I do not mean harmonic distortion). Do not ask me whether it is audible or not, I am not sure. And I would be careful about saying that it is always inaudible. To me, 96kHz sampling makes sense for audio signal fidelity.
Without knowing if it's audible or not, it seems strange to say that 96kHz sampling makes sense. Given that we know that 20kHz is actually extremely generous as a limit to human high-frequency hearing, and most people are well shy of that, it seems more likely that 44.1kHz is already more than necessary. For various reasons, moving to 48kHz as a standard can be sensible. But 96kHz or higher just seems wasteful (vanishingly little content but a lot of noise and other potential spuriae can be found in that extra octave or so) and potentially harmful given that most systems are not designed to reproduce extremely high frequencies.
 
Regardless of the reasoning used by your friend, it is a fact that many musical instruments and many transient signals (like hand clapping) have spectra with frequencies far exceeding 22kHz. So 44.1kHz sampling inevitably results in spectrum truncation and signal distortion (I do not mean harmonic distortion). Do not ask me whether it is audible or not, I am not sure. And I would be careful about saying that it is always inaudible. To me, 96kHz sampling makes sense for audio signal fidelity.

There is some thin, but unproven justification of going to 64kHz sampling, based on impulse responses (FIR or IIR, doesn't matter which, for different reasons) of the anti-aliasing filters interacting with the nonlinearity of the ear. THIN, mind you, and as of yet without evidence, but that can be potentially made hypothetical by the extremes of human hearing as we know today.
 
There is some thin, but unproven justification of going to 64kHz sampling, based on impulse responses (FIR or IIR, doesn't matter which, for different reasons) of the anti-aliasing filters interacting with the nonlinearity of the ear. THIN, mind you, and as of yet without evidence, but that can be potentially made hypothetical by the extremes of human hearing as we know today.


Whereas (and feel free to correct me if I am wrong - I am a layperson preaching here) the first-order effect is that the biomechanical properties of the ear will filter out the frequencies above the limit of around 20 kHz. So even if those frequencies (e.g. from hand clapping) are left in the signal by a 96 kHz sample rate, and if they get through the output transducer - it will make no difference to audibility since they don't reach the brain in any case.

And related to the topic of this thread:

While the absolute limits of audibility are academically interesting, far too much emphasis is placed on this (IMO) in the context of enjoyment of recorded music.

If the only way you can detect a difference is by close and careful listening with fast switching between "with" and "without", and even then you can't be certain - and can't trivially get 100 out of 100 correct - then in real-world listening it just doesn't matter.
 
You also have "sound from ultrasound": nonlinearity in the air demodulates ultrasonics to audible sound. This surely might cause audible effects. But if so, then they were already recorded in the original recording (assuming that a microphone was used).

But surely you can construct an artificial signal that would yield an audible sound with only audio > 20 kHz and assuming you have a playback chain that can put such a signal in the air.
 
Whereas (and feel free to correct me if I am wrong - I am a layperson preaching here) the first-order effect is that the biomechanical properties of the ear will filter out the frequencies above the limit of around 20kHz. So even if those frequencies (e.g. from hand clapping) are left in the signal by a 96kHz sample rate, and if they get through the output transducer - it will make no difference to audibility since they don't reach the brain in any case.

First, young people CAN (at least some of them) hear a bit beyond 20kHz. So that, while not particularly of interest to audiophiles, is something that has been established. I know it's been demonstrated in several labs at various times.

Now, your "filtered at 20kHz" for the hand clap, or glockenspiel hit, or rimshot, etc...

The filter must broaden the signal AT 20kHz, that's kind of required. If it's an IIR filter, there will be substantial phase shift at the point where the cutoff starts. That phase shift changes the relationships of the attack at various frequencies. While I'm not too concerned about the time alignment there, it DOES create problems with ring tones, by shifting the phase of the tones, and thence the envelope. Does it matter? That's unclear, but it does demonstrably matter at lower frequencies, so it's hard to say. Note, we're not hearing above 20kHz, we're talking about effects in-band due to the filter.
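The in-band phase shift is easy to see numerically. A sketch with SciPy, using a made-up 8th-order elliptic lowpass as a stand-in for a converter's anti-aliasing filter (the order, ripple, and attenuation are assumptions, not any particular DAC's design):

```python
import numpy as np
from scipy import signal

fs = 44100
# hypothetical anti-aliasing filter: 8th-order elliptic lowpass,
# 0.1 dB passband ripple, 90 dB stopband, edge at 20 kHz
b, a = signal.ellip(8, 0.1, 90, 20000, fs=fs)

# group delay in samples at a few in-band frequencies; it varies
# strongly near the cutoff, so the components of a transient's attack
# no longer arrive with their original phase relationships
w, gd = signal.group_delay((b, a), w=[1000.0, 10000.0, 19000.0], fs=fs)
```

The delay at 19kHz comes out several times the delay at 1kHz: that frequency-dependent delay is exactly the in-band phase shift described above.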

Now, for FIR filters of the 'constant delay' variety, there is pre-ringing. If you analyze those on a short-term basis (well shorter than the filter), you may (and do) find that there is some 'early' content that MIGHT cause cochlear compression before the main peak (again, with codecs this is very well understood to be a <redacted> mother of a problem, at lower frequencies), thence changing the perceived timbre of the attack. Does it matter? Well, there's a chicken/egg problem there, you'd have to have yourself a 96k DAC with a very slow (short) anti-aliasing filter, maybe starting to roll off at 25kHz and finishing at 48k to be sure you're not confounding yourself. Then, you'd need a young, very well trained subject with an excellent set of headphones and very carefully prepared sounds that were ALSO captured properly or alternatively synthesized properly. Then you could filter that set of signals with a variety of filters, and run an ABC/hr test on the lot.
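The pre-ringing itself is easy to demonstrate: a linear-phase ("constant delay") FIR is symmetric about its centre tap, so whatever rings after the main peak necessarily also rings before it. A sketch (tap count and cutoff are arbitrary choices):

```python
import numpy as np
from scipy import signal

fs = 44100
numtaps = 255
# linear-phase lowpass at 20 kHz (Hamming-windowed sinc, firwin's default)
h = signal.firwin(numtaps, 20000, fs=fs)

peak = int(np.argmax(np.abs(h)))           # centre tap: (numtaps - 1) // 2
pre_energy = float(np.sum(h[:peak] ** 2))  # energy arriving BEFORE the peak
post_energy = float(np.sum(h[peak + 1:] ** 2))
# symmetry forces pre_energy == post_energy: the pre-ring is unavoidable
# for this filter class, and only a minimum-phase design removes it
```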

I hope I don't have to explain how much that's going to cost, between building the hardware, testing the hardware, finding a suitable subject or three to take the test, and then run it until you eliminate both type 1 and type 2 errors to a satisfactory level.

Yeah. Got a few million laying around? No, I didn't think so.

Note: 64kHz would bypass all of this, start to roll off (somewhat slowly) at 20kHz down to 32kHz, and I'd be pretty convinced things would be fine. Yeah. Wanna get me the egg, or the chicken, here?

 
So even if those frequencies (eg from hand clapping) are left in the signal by a 96kHz sample rate, and if they get through the output transducer - it will make no difference to audibility since they don't reach the brain in any case.
This is quite a courageous statement, as it only takes into account a certain kind of signal path (to the brain) and certain physical, sensor-like processing. Even in such a simplified case, intermodulation effects cannot be completely excluded.

Just as an example, here is a recorded section of hand clapping. Note that it is a non-stationary, transient signal - a kind of surge wave.

hand-clapping.png
 
You also have "sound from ultrasound": nonlinearity in the air demodulates ultrasonics to audible sound. This surely might cause audible effects. But if so, then they were already recorded in the original recording (assuming that a microphone was used).

But surely you can construct an artificial signal that would yield an audible sound with only audio > 20 kHz and assuming you have a playback chain that can put such a signal in the air.

This is not a large problem at sane levels, but it is a problem with stuff like rimshots, etc, that start out in a nonlinear range of "spl" (more like a shock wave at the start), which also interferes with miking, etc, since the timbre changes with distance for some amount of distance.
 
You also have "sound from ultrasound": nonlinearity in the air demodulates ultrasonics to audible sound. This surely might cause audible effects. But if so, then they were already recorded in the original recording (assuming that a microphone was used).

But surely you can construct an artificial signal that would yield an audible sound with only audio > 20 kHz and assuming you have a playback chain that can put such a signal in the air.
But this is the argument that says reproducing inaudible frequencies can actually reduce audible quality.

Since the inaudible frequencies are inaudible - they can't improve the sound. But they can inter-modulate with other frequencies (especially in the output transducer) to create audible distortion that otherwise wouldn't exist.

Again - whether this effect is sufficient to impact real-world listening enjoyment is highly debatable, but it is measurable and real - and a good reason to have the reconstruction filter not too much higher than the audible limit.
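A toy model makes the intermodulation argument concrete: two purely ultrasonic tones passed through a mildly nonlinear transducer model produce an audible difference tone. The tone frequencies and the second-order coefficient are made up for illustration:

```python
import numpy as np

fs = 192000                 # high rate so ultrasonic content is representable
t = np.arange(fs) / fs      # one second, 1 Hz FFT bins

# two ultrasonic tones; each is inaudible on its own
x = 0.4 * np.sin(2 * np.pi * 24000 * t) + 0.4 * np.sin(2 * np.pi * 25000 * t)

# crude nonlinear transducer model: a little second-order distortion
# (the 0.1 coefficient is a made-up illustration, not a measured value)
y = x + 0.1 * x ** 2

spec = np.abs(np.fft.rfft(y)) / (fs / 2)    # amplitude spectrum
# second-order intermodulation lands at 25 kHz - 24 kHz = 1 kHz: audible
level_1k = spec[1000]
# for comparison, the linear (distortion-free) signal has nothing there
level_1k_linear = np.abs(np.fft.rfft(x))[1000] / (fs / 2)
```

The difference tone at 1kHz exists only because of the nonlinearity; remove the ultrasonic content before the transducer and it disappears, which is the whole argument for band-limiting.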


EDIT : though high enough to avoid the filter artefacts (phase shift etc) around 20kHz as pointed out by @j_j


I'm still of the opinion - possibly formed mostly from my own listening and limited auditory bandwidth (around 13 to 14kHz) - that most or all of these effects are insufficient to impact real world listening enjoyment.

EDIT SOME MORE:
First, young people CAN (at least some of them) hear a bit beyond 20kHz. So that, while not particularly of interest to audiophiles, is something that is determined. I know it's been demonstrated in several labs at various times.
Which is why I wrote : "around 20kHz" :D
 