Let me say that I have experienced no audible degradation when using the USB connection, except when there are verifiable packet errors.
Same here. The only time I've experienced any degradation at all using USB was when I crafted a
really poor cable just to see what would happen.
I do, however, dispute that 8 kHz is the only frequency content on the 5 V USB supply due to PC activity. My own experience contradicts that, for whatever that is worth.
The 5 V USB power (Vbus) can of course have all manner of noise on it. That has nothing to do with jitter, though.
The USB receiver must take the information from the packet/frame and transfer it one sample at a time to the DAC for processing at the correct rate. Surely this means that the input data has to be transferred into a data pipe using a clock, and then transferred into the DAC also using a clock.
Let's assume that the DAC produces an output which is linearly dependent on the data value, and also is dependent in some fashion on the duration of the value.
This means that the DAC converts a series of samples whose magnitude is the data, but whose duration is defined by the varying period of the clock; that variation is the "jitter". Effectively, at that point you move from the data domain to the analog domain. The dependence on the clock may be something as simple as the duration of a zero order hold. When the output is low pass filtered, the variation in the duration of the zero order hold, in this instance, appears as noise on the output waveform.
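To put a rough number on that, here's a quick back-of-the-envelope simulation (plain NumPy, nothing DAC-specific, all values illustrative). It models jitter as evaluating a sine at slightly wrong instants, which to first order produces the same slope-times-jitter error as a varying hold duration, and compares the measured noise against the standard first-order estimate:

```python
import numpy as np

fs = 48_000                       # audio sample rate
f_in = 1_000.0                    # test tone frequency
t_rms = 1e-9                      # 1 ns RMS clock jitter (illustrative)

rng = np.random.default_rng(0)
n = 1 << 16
t = np.arange(n) / fs
dt = t_rms * rng.standard_normal(n)             # timing error per sample

ideal = np.sin(2 * np.pi * f_in * t)
jittered = np.sin(2 * np.pi * f_in * (t + dt))  # sampled at the wrong instants
noise = jittered - ideal

snr_measured = 10 * np.log10(np.mean(ideal ** 2) / np.mean(noise ** 2))
snr_theory = -20 * np.log10(2 * np.pi * f_in * t_rms)  # first-order estimate
print(f"measured: {snr_measured:.1f} dB, theory: {snr_theory:.1f} dB")
```

With 1 ns RMS jitter on a full-scale 1 kHz tone this lands around 104 dB SNR, which gives a sense of how small the jitter has to be before it matters at all.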
I'm going to assume we're talking about DACs with asynchronous USB interfaces here. Anything else is pretty much non-existent. I'm also going to assume a typical sigma-delta DAC chip with internal oversampling.
Whenever a USB packet is received, the sample data it contains is placed into a FIFO buffer. Separately, and continuously, samples are read from the other end of the FIFO buffer and transferred to the DAC chip. I'm assuming everything is working properly and the FIFO buffer never over- or underflows.
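In code terms, the decoupling looks something like this toy model (hypothetical class and method names, and it ignores over/underflow handling, just as assumed above): the USB side writes in bursts, one packet at a time, while the DAC side drains at a steady one sample per local clock tick.

```python
from collections import deque

class SampleFifo:
    """Toy model of the receiver FIFO (illustration only)."""

    def __init__(self):
        self.buf = deque()

    def write_packet(self, samples):
        # USB side: called once per received packet, i.e. in bursts.
        self.buf.extend(samples)

    def read_sample(self):
        # DAC side: called at the steady local sample clock.
        # Assumes, as above, that the buffer never underflows.
        return self.buf.popleft()
```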
The input to the DAC chip is generally an I2S link. This is a serial interface with three signals: LR (or word) clock, bit clock, and data. The data values are sampled on the rising edge of the bit clock while the LR clock edges indicate the boundaries of the sample values as well as which channel (left or right) they belong to. Within the DAC chip, this bit stream enters something equivalent to a serial-in/parallel-out shift register with the output latched on the rising LR clock edge. However, and this is important, the parallel samples emerging from this circuit do not go directly to the D/A conversion stage.
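For illustration, a simplified software model of that deserialiser might look like the sketch below (a hypothetical function; real I2S also delays the data one bit-clock period after the LR edge and this ignores the partial first and last words, but the shift-and-latch structure is the point):

```python
def i2s_deserialise(lr, data, width=24):
    # lr[k], data[k]: LR clock and data line as sampled on the k-th rising
    # bit-clock edge. Emits a (channel, word) pair whenever the LR clock
    # toggles, i.e. at each word boundary.
    shift, words, prev = 0, [], lr[0]
    for k in range(len(data)):
        if lr[k] != prev:                  # LR edge: latch the finished word
            words.append(("left" if prev == 0 else "right", shift))
            shift, prev = 0, lr[k]
        shift = ((shift << 1) | (data[k] & 1)) & ((1 << width) - 1)
    return words
```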
After deserialisation the audio samples enter the digital interpolation stage which, as the name implies, interpolates the data, producing a sample rate typically 8x higher than the input. This is then further oversampled using zero-order hold to the rate of the sigma-delta modulator. The modulator output is what enters the D/A conversion stage, and here the timing is important.
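As a rough sketch of that chain, with rates matching the 48 kHz example (resample_poly stands in for the chip's interpolation filter, and the 128x fs modulator rate is just a typical figure):

```python
import numpy as np
from scipy.signal import resample_poly

fs = 48_000
x = np.sin(2 * np.pi * 1000 * np.arange(4800) / fs)  # 1 kHz test tone

x8 = resample_poly(x, 8, 1)   # 8x interpolation filter -> 384 kHz
xm = np.repeat(x8, 16)        # zero-order hold -> 6.144 MHz modulator rate
# xm is what feeds the sigma-delta modulator; only from here onward does
# sample timing directly shape the analog output.
```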
The digital interpolation and sigma-delta modulator are operated from a separate master (or system) clock input, often 24.576 MHz when the audio sample rate is 48 kHz or a multiple thereof. Although the I2S input is typically (ESS being the notable exception) required to be synchronous with the master clock, there is enough internal buffering that some wavering is tolerated. For example, TI/Burr-Brown DACs only require that the LR clock remain within ±6 bit clock periods of the ideal.
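To put that TI/Burr-Brown figure in perspective, assuming the common 64x fs bit clock (an assumption, but typical for these parts):

```python
fs = 48_000
mclk = 24.576e6
print(mclk / fs)       # 512 master clock cycles per audio sample
bclk = 64 * fs         # assumed 64x fs I2S bit clock (3.072 MHz)
print(6 / bclk * 1e6)  # +/-6 bit clock periods ~= +/-1.95 microseconds of LR wander
```

That is a window of nearly two microseconds, an enormous allowance compared to the picoseconds of jitter usually being argued about.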
In a well designed DAC, the master clock is located close to the DAC chip with ample power supply decoupling, perhaps even a dedicated regulator. This makes it difficult for activity at the USB receiver to cause jitter at the critical point, i.e. the D/A conversion stage. Even if the readout from the receiver FIFO buffer has some jitter, the data is reclocked again within the DAC chip.
Now someone might point out that switching noise in the I2S input buffer will be tied to whatever jitter is present on the bit clock. This is true. However, this input stage is tiny compared to all the other digital circuitry making up the interpolation filter and modulator. It stands to reason that any jitter-correlated noise will be swamped by switching noise from the rest of the chip, which as discussed is operated from a clean clock.
Of all the things that can adversely affect the output quality of a DAC, jitter on the USB link really should be very far down the list. If there is a problem, it is almost certainly caused by something else.
It's this idea which is at the heart of the crystal paper and which, potentially, creates a way for noise on a digital interface, either from variations in the data edges (if the clock is recovered from the data) or from the clock itself (if the clock is generated by the host), to become audible noise at the output of a DAC.
In an asynchronous USB DAC, the clock is not derived from the host in any way whatsoever. The USB receiver does recover a clock signal from the NRZI data stream, but only for the purpose of extracting the data. This clock is not used for anything else.
In general terms, the data transfer rate discrepancy has to be fed back to the host, which then deals with the problem by transferring more or less data per frame on average.
That is exactly how it works. The receiver compares the received packet rate with an 8 kHz reference derived from the local clock and informs the host how many (fractional) samples per packet it wants. This average rate is then maintained by rounding up or down as needed for each packet. The desired rate is continuously recalculated to account for drifting clocks.
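A minimal sketch of that per-packet rounding (toy code with illustrative names; a real device reports the desired rate back through an isochronous feedback endpoint as a fixed-point samples-per-frame value, and keeps recomputing `target` as the clocks drift):

```python
def packet_sizes(target, n_packets):
    # The device asks for an average of `target` samples per packet,
    # e.g. 44.1 for 44.1 kHz with 1 ms packets. The host sends whole
    # samples, rounding so the running total tracks the fractional rate.
    acc, sizes = 0.0, []
    for _ in range(n_packets):
        acc += target
        n = int(round(acc))   # whole samples in this packet
        acc -= n              # carry the fractional remainder forward
        sizes.append(n)
    return sizes

sizes = packet_sizes(44.1, 10)
print(sizes, sum(sizes))  # mostly 44s with the odd 45; totals 441 over 10 ms
```

Over any 10 packets the sizes sum to exactly 441 samples, so the long-term average is precisely 44.1 kHz even though every individual packet carries a whole number of samples.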