A comparison of S/PDIF to I2S is fine, but a comparison to USB doesn't really make sense: S/PDIF and I2S are both protocols for transmitting audio, USB is not.
I2S is a very basic communication protocol for moving audio data from one chip to another, and the data it carries is raw: unedited, untouched and free from any processing. It is a three-wire protocol consisting of the bit clock, the data line and the LR clock. The LR clock indicates which channel the incoming data belongs to and runs at the sample frequency. The data line carries the audio samples and is clocked at the bit clock frequency (usually 64x fs these days). The bit clock, in combination with the data line, tells the receiver exactly when to sample the data stream.
With audio transmission the stability of all three of the above clocks is absolutely paramount to achieving low jitter. These days a fourth line is added, typically called the master clock. It isn't needed for data transmission, and most devices don't even require the master clock's leading edges to be synchronised with the others, just that it runs at an exact multiple of them. Once upon a time, with R2R multi-bit DACs, you didn't need anything other than the bit clock, data line and LR clock, as those carried everything. These days though, with delta-sigma (DS) DACs and ADCs, the master clock is required to run the oversampling circuitry that operates at much higher frequencies. The master clock isn't really part of what I2S was originally about, but almost everyone includes one as it's necessary for DS device operation. Just like the other three lines, though, clock stability is absolutely necessary for the master clock too if you want low jitter.
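To make the three-line relationship concrete, here's a toy Python sketch of how one stereo frame maps onto the LR clock, bit clock and data line. It's illustrative only: real I2S shifts the data one bit clock after each LR transition and does all of this in hardware, and the 16-bit-sample-in-32-bit-slot layout is just one common arrangement (giving the 64x fs bit clock mentioned above).

```python
def i2s_frame(left: int, right: int, bits: int = 16, slot: int = 32):
    """Serialise one stereo frame into (lr_clock, data_bit) pairs,
    one pair per bit-clock tick. Ignores I2S's one-bit data delay."""
    stream = []
    for lr, sample in ((0, left), (1, right)):              # LR clock selects the channel
        word = (sample & ((1 << bits) - 1)) << (slot - bits)  # MSB-justify in the slot
        for i in reversed(range(slot)):                     # MSB first, one bit per tick
            stream.append((lr, (word >> i) & 1))
    return stream

def i2s_decode(stream, bits: int = 16, slot: int = 32):
    """Receiver side: sample the data line on every bit clock, steer by LR clock."""
    acc = {0: 0, 1: 0}
    for lr, bit in stream:
        acc[lr] = (acc[lr] << 1) | bit
    return acc[0] >> (slot - bits), acc[1] >> (slot - bits)

frame = i2s_frame(0x1234, 0xABCD)
assert len(frame) == 64                       # 64 bit clocks per frame, i.e. 64x fs
assert i2s_decode(frame) == (0x1234, 0xABCD)  # samples recovered intact
```

The point of the sketch: the receiver does nothing clever, it just samples the data line on bit-clock edges, which is why the stability of those edges is everything.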
Whatever format your audio is in, you need some way to turn it into an I2S data stream. This could be standard I2S, left justified or some form of TDM; they all use the same wires, it's just the protocol that differs. This is why the DAC-X26 has dip-switches on the bottom, and is reason enough to show that I2S wasn't designed to be used over external cables. It's not standardised in the same way that S/PDIF is, i.e. for faultless plug-and-play communication between any connected devices, but I digress. You need some way to turn your stored data into an I2S stream.
Back in the day we used CDs and had the associated silicon required to control the disc, read the data off it and convert it into an I2S stream. This was all controlled by referencing the data extraction to the rate at which a master clock ran. *Note: here I'm using master to describe a clock that must be obeyed, not today's usage, which usually means the one that runs the fastest.* Back then the data extraction was timed and delivered by the rate at which the master clock ran: if the master clock ran fast, so did your CD, and people would sound like chipmunks. Obviously no one actually installed a 48kHz-based oscillator into a 44.1kHz Red Book system, but if you did, Alvin, Simon and Theodore would greet you.
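The chipmunk effect is easy to put numbers on: running Red Book data off a 48kHz-based clock speeds everything up by the clock ratio.

```python
import math

f_wrong, f_expected = 48_000, 44_100      # installed oscillator base vs. Red Book rate
ratio = f_wrong / f_expected              # playback speed multiplier
semitones = 12 * math.log2(ratio)         # resulting musical pitch shift

print(round((ratio - 1) * 100, 1))   # 8.8  -> everything plays ~8.8% fast
print(round(semitones, 2))           # 1.47 -> about a semitone and a half sharp
```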
The data extracted would be timed precisely to the ticks of the master clock, and the CD player silicon would generate the bit clock, LR clock and data line from it: your I2S data stream. This would then be fed directly into a DAC chip, conversion would happen and you'd get audio. If your master clock oscillator was of high quality then you'd get low jitter performance. It's as simple as that: change the clock for something better and you'd get lower jitter.
Everything was timed and communicated from the master clock oscillator and everything was synchronous. This is literally the holy grail for audio quality: what you get off the CD goes straight into the DAC, the literal definition of bit perfect and low jitter. If you placed a DSP directly before the DAC then it would process the data, and the output wouldn't be bit perfect any more, because you've changed it by whatever you told the DSP to do, but it would all still be timed/referenced to the master clock (the one that must be obeyed), so you'd still have very low jitter performance. This would be the holy grail of how to do a DSP active loudspeaker: put the DSP between whatever gives you your I2S data stream and the DAC, and reference it all to the master clock.
So if this was perfect, why fix what wasn't broken? Because it became necessary to transmit audio data from one box to another in a simple and reliable way. This is essentially what S/PDIF is, and as nothing is as simple as one wire, one wire was chosen. There isn't any problem at the transmitter end either, because the transmitter is still referenced to the master clock. The problem is at the other end, with the receiver. This is where things go horribly pear-shaped.
Above I said this.
The bit clock is used in combination with the data line to tell the receiver when it needs to sample the data stream.
The trouble with S/PDIF is that there is no separate clock to tell the receiver when the data needs to be sampled; the data and the clock are baked into one line. Protocols exist to extract the clock and data, but the extraction is an adaptive, ever-changing process, usually based on PLLs. This is the opposite of the clock stability required for low jitter. Essentially the S/PDIF receiver doesn't know when what's coming next is actually going to come; it knows roughly when, but there's nothing it can rely on, so it uses a tunable PLL that varies its frequency to match the rate at which data is output as closely as possible to the rate at which data arrives. Obviously there are buffers involved, so if the buffer starts to run empty, or starts to fill up too quickly, the output data rate is adjusted so there isn't any break in the data stream. But PLLs don't have the clock stability that oscillators do, which is what ultra low jitter performance requires. I make it sound far worse than it is; it's not like you can perceive the PLL's shifting as changes in pitch (like Alvin, Simon and Theodore up above), everything runs in a stable fashion. But the quality of the clock that a PLL can create is limited by the inherent limitations of PLLs, so although the data gets there, it does so with additional jitter versus what came out of the original box.
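A crude way to picture the PLL-plus-buffer arrangement is a feedback loop that nudges the output rate based on how full the FIFO is. This Python toy isn't any real receiver's algorithm, and every number in it (gain, FIFO size, noise level) is made up, but it shows the two behaviours described above: the recovered clock locks onto the incoming average rate, yet it never sits perfectly still.

```python
import random

random.seed(1)
fs_in = 44_100.0     # transmitter's true rate, unknown to the receiver
nominal = 44_000.0   # receiver's free-running guess, deliberately wrong
target = 32.0        # aim to keep a 64-sample FIFO half full
gain = 10_000.0      # Hz of correction per sample of FIFO error (illustrative)

fill, rates = target, []
for _ in range(2000):
    fs_out = nominal + gain * (fill - target)   # "VCO": rate steered by FIFO error
    fill += fs_in / fs_out - 1.0                # net samples gained this output tick
    fill += random.uniform(-0.001, 0.001)       # cable/interface timing noise
    rates.append(fs_out)

tail = rates[-500:]
mean = sum(tail) / len(tail)
wander = max(tail) - min(tail)
# mean sits at ~44100 Hz (the loop locks onto the source),
# but wander stays well above zero: the recovered clock is never rock steady.
```

Replace the recovered clock with a fixed crystal and the FIFO would drain or overflow; that trade-off is exactly why S/PDIF receivers can't just use a clean oscillator.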
Effectively, with S/PDIF we've broken the link back to the ultra high quality master clock oscillator in our CD player and are now forced into some nifty tricks to create something similar again. It's not just less than ideal, it's a shambles in terms of what you don't want to be doing, but it's simple and it works. As an aside, jitter is only an anomaly that crops up during real-time playback. If you were a sound/recording engineer and used S/PDIF to transmit digital audio from one device into a PC for some post processing then it would be absolutely fine: all the data would arrive intact. All the ones and zeros are in the right place; it's just that during immediate playback, from the clocks generated by the S/PDIF receiver, you end up with subtle timing errors, and those are what create jitter.
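To see why timing errors matter even when every bit is correct, consider playing a bit-perfect sine wave through a DAC whose clock edges land at slightly wrong instants. A sketch, using an illustrative 1 ns RMS jitter figure:

```python
import math, random

random.seed(0)
fs, f = 48_000, 10_000        # sample rate and test-tone frequency (Hz)
jitter_rms = 1e-9             # 1 ns RMS clock jitter, an illustrative figure

errs = []
for n in range(fs):           # one second of samples
    t_ideal = n / fs
    t_actual = t_ideal + random.gauss(0, jitter_rms)   # clock edge lands slightly off
    # The stored sample value is bit perfect; emerging at the wrong instant is
    # equivalent to sampling the ideal waveform at the jittered time:
    errs.append(math.sin(2*math.pi*f*t_actual) - math.sin(2*math.pi*f*t_ideal))

err_rms = math.sqrt(sum(e * e for e in errs) / len(errs))
predicted = 2 * math.pi * f * jitter_rms / math.sqrt(2)   # slew-rate rule of thumb
# err_rms lands within a few percent of predicted, roughly -87 dB of full scale
```

The error scales with both the jitter and the signal's slew rate, which is why jitter shows up as sidebands around high frequency test tones in measurements.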
So what do you do? Why can't we just introduce another ultra low jitter master clock at the receiver end? You can if you want, but you still need the PLL. The trouble is that even if you use another of the same ultra low jitter oscillators used in your CD player, the two won't run at exactly the same speed. They aren't synchronised; they are close, but not identical.

This is where the asynchronous sample rate converter (ASRC) comes into play, in other words the 'jitter removal' part of what's included in ESS DACs. These take the bit clock, LR clock and data line generated by your S/PDIF receiver and process them. In doing so they (in basic terms) essentially generate a software-based version of what the analogue waveform of the input signal would look like. You then attach your ultra low jitter master clock to the output side of the ASRC. The output side samples that software waveform and creates a new set of data. This is output from the ASRC as a new bit clock, LR clock and data line, all timed to a new master clock of ultra low jitter, and Santa comes down your chimney. Jitter be gone.

So it's perfect, right? No. First of all it's not bit perfect, because the output side of the ASRC generated a brand new set of samples. Second is that (apparently) the jitter present at the input of the ASRC is simply shifted to the output in a different form. ASRCs weren't actually invented to remove jitter; they were invented so that two systems using dissimilar sampling frequencies could interface with one another. You might have an ADC and a DSP that you want to link together. The obvious way to do this is to run one as a master and the other as a slave, but the world happens and they might both be masters, or in entirely different boxes from one another, so an ASRC is used to let them talk to each other. It just so happens that ASRCs also attenuate jitter, so they found themselves being incorporated inside DACs.
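In code terms an ASRC looks roughly like this: estimate the continuous waveform implied by the input samples, then sample that estimate on the output clock's grid. Real ASRCs use long polyphase filters and continuously track the rate ratio; plain linear interpolation at a fixed 44.1k-to-48k ratio stands in here, purely to show why the output is a brand new set of samples.

```python
import math

fs_in, fs_out = 44_100, 48_000
# 10 ms of a 1 kHz tone on the input clock's grid:
src = [math.sin(2 * math.pi * 1000 * n / fs_in) for n in range(441)]

out = []
for m in range(int(len(src) * fs_out / fs_in) - 1):
    pos = m * fs_in / fs_out            # where this output tick falls in input time
    i, frac = int(pos), pos - int(pos)
    # Synthesise a brand-new sample between two input samples:
    out.append(src[i] + frac * (src[i + 1] - src[i]))

# out has a different length from src and its values are freshly computed,
# hence the result is no longer bit perfect, even though it sounds the same.
```

The linear interpolation here is audibly crude; the point survives regardless: whatever the reconstruction filter, the output samples are resynthesised, not copied.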
A very obvious way to circumvent all these jitter issues is to simply remove the S/PDIF receiver and transmit the I2S stream directly from box 1 to box 2. This isn't simple (like S/PDIF is), for a number of reasons, but if you do it correctly then it works very well. We can see this in the measurements above with the ESS chip's jitter reduction switched off: the direct I2S stream is much cleaner. It should also be bit perfect, because the on-board ASRC has been disabled. I've done I2S over network cables myself using LVDS with no problems, and you can also get transformer-based LVDS systems that offer isolation between one end and the other, a bit like TOSLINK optical cable, only not using the S/PDIF standard.
I've watched a number of Paul McGowan's videos and can't stand them, with his obvious peddling of audio nonsense, but in this case he isn't exactly wrong: a direct I2S connection between two boxes is better. If the system you're connecting together via I2S over cables has a data flow like the holy grail CD player described above, only with I2S over cables before the DAC, then you are literally still in holy grail CD player territory. Where I2S over cables makes very little sense is if you have ASRCs in use that you cannot turn off. Or if your DAC has a USB interface...
Which brings us on to the USB interface itself. This is just like the CD player transport above and isn't akin, at all, to S/PDIF or I2S. USB doesn't transmit audio data; even when used with a USB>Audio interface it doesn't transmit audio data, it transmits packets of data in the USB protocol. The USB silicon decodes the USB data packets and then creates an audio bit clock, LR clock and data stream from the data it receives. We're back in holy grail territory, because we can, theoretically, connect an ultra low jitter master clock to the USB chip, to which the I2S output is timed.
I say theoretically because USB's protocol for data transmission isn't inherently suited to the kind of timing that audio wants for low jitter. Remember how the CD player spun at the rate dictated by the oscillator you had connected to it, and everything else had to obey?... Yeah, not quite the same with USB; it's more like S/PDIF. With basic USB the data transmission is not dictated by the clock you've got connected to your USB>Audio chip. Apple had this sussed with FireWire, which slaved the computer to the device it was connected to. Sadly USB dances to the tune of the computer, which is less than ideal. Basic USB>Audio interfaces, like TI's PCM2902/2702, operate in this crap mode with the computer timing everything, and as a result they aren't particularly low jitter. For these you need an ASRC.
Modern USB>Audio devices, such as the XMOS products and the correctly configured C-Media stuff, operate in a different (asynchronous) mode where the clock on board the USB device dictates the rate of data transmission. As far as I know the data is still sent down the USB cable using the same basic USB transmission protocol; it's just that the USB>Audio device tells the PC when it needs more data, rather than the PC just sending it out whenever.
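For what it's worth, the mechanism by which the device "tells the PC" is a feedback value. As I understand the USB audio class, a full-speed asynchronous endpoint reports how many samples it wants per 1 ms frame as a 10.14 fixed-point number, derived from the device's own clock. A sketch of the arithmetic:

```python
def feedback_10_14(samples_per_frame: float) -> int:
    """Encode samples-per-frame as 10.14 fixed point, as used by
    full-speed asynchronous USB audio feedback endpoints."""
    return round(samples_per_frame * (1 << 14))

nominal = feedback_10_14(44.1)          # 44.1 kHz -> 44.1 samples per 1 ms frame
fast = feedback_10_14(44.1 * 1.0001)    # device crystal 100 ppm fast: ask for more

print(nominal)          # 722534 (0x0B0666)
print(fast - nominal)   # a handful of counts more per frame
```

The host trims its packet sizes to match, so the device's FIFO never runs dry or overflows and the DAC stays slaved to the device's own low jitter oscillator, which is the whole point of asynchronous mode.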
So basically speaking, if you have a modern, high quality USB>Audio device as the source inside a standalone DAC, then there is absolutely no reason to use anything else to get your data into the DAC. Using I2S over wires is better than S/PDIF if everything at both ends is configured correctly to allow it to work and to let you bypass any on-board ASRC. It also requires the thing sending out I2S to be very low jitter in the first place. If it's not, definitely use modern USB, or S/PDIF plus any on-board jitter reduction. It's all just a huge bunch of ifs and buts and isn't particularly useful.
Because ESS chips are used so regularly in products, a lot of this is almost redundant, because their on-board ASRCs aren't usually defeatable. Used with a decent quality oscillator you'll get excellent jitter performance whatever the source. It does kind of irritate me, though, because what I want to see is a modern USB-input DAC using very low jitter oscillators at the USB>Audio converter, followed by DACs that aren't using ASRCs, for bit perfect data going into and out of the DAC. It doesn't matter a jot if your USB>Audio converter is conveying bit perfect data from the PC if there's an ASRC somewhere in the chain after it. Yes ESS, I'm looking at you!