On delta-sigma DACs: There are a lot of subtle things about the beasts. Bear in mind I have not piddled with them for a while now, and I was more focused on RF applications (last one I piddled with targeted 5 - 10 GS/s modulator rates). But the basics are all there...
LF (i.e. audio and instrumentation) delta-sigma DACs may use multi-bit DACs to reduce the complexity of the digital modulator and reduce the required oversampling ratio (i.e. they don't need to be clocked so fast). At RF, cutting modulator complexity is an even bigger deal, as high-speed multipliers are a royal PITA and take a lot of power. I am not all that familiar with the audio variety (I just have not looked recently; it has been years since I designed one), so I do not claim to know what is currently in vogue. The catch with multi-bit DACs in a delta-sigma design is that, even if it is only, say, a 5-bit DAC, its steps and linearity must meet the full output requirements. That is, if you are building a 16-bit delta-sigma DAC using a 4-bit DAC at the output of the modulator, that 4-bit DAC must have 16-bit linearity. So even though it only has 16 steps (2^4), those steps must match to within 1/2^16 (1 part in 65,536) to create a 16-bit DAC. That usually means trimming, calibration and compensation, or some sort of dynamic element matching scheme (van de Plassche presented the latter in papers and in a book on data converters; I actually piddled with it for a while).
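To put rough numbers on the 4-bit/16-bit example, here is a minimal Python sketch of one dynamic element matching variant, data-weighted averaging (DWA). DWA is a swapped-in illustration, not necessarily the scheme van de Plassche described. The sketch assumes a hypothetical thermometer-coded 4-bit DAC built from 15 unit elements with 0.1% random Gaussian mismatch (an arbitrary figure). With a static mapping every code carries a fixed error; rotating through the elements uses each one equally often, so the mismatch averages out over time.

```python
import random

N_ELEMENTS = 15          # hypothetical 4-bit thermometer DAC: codes 0..15 -> 15 unit elements
MISMATCH_SIGMA = 1e-3    # assumed 0.1% random mismatch per element (illustrative only)
LSB_16 = 1 / 2**16       # 16-bit linearity target: 1 part in 65,536 of full scale

def make_elements(n=N_ELEMENTS, sigma=MISMATCH_SIGMA, seed=1):
    """Unit elements with nominal weight 1.0 plus random Gaussian mismatch."""
    rng = random.Random(seed)
    return [1.0 + rng.gauss(0.0, sigma) for _ in range(n)]

def static_dac(codes, elements):
    """No DEM: code c always turns on elements 0..c-1, so each code has a fixed error."""
    return [sum(elements[:c]) for c in codes]

def dwa_dac(codes, elements):
    """Data-weighted averaging: each code starts where the previous one stopped,
    rotating through the elements so all of them get used equally often."""
    n = len(elements)
    ptr = 0
    out = []
    for c in codes:
        out.append(sum(elements[(ptr + k) % n] for k in range(c)))
        ptr = (ptr + c) % n
    return out

def average_error(outputs, codes, elements):
    """Mean deviation from the ideal straight line (code * average element),
    as a fraction of full scale. Using the average element removes gain error."""
    unit = sum(elements) / len(elements)
    full_scale = sum(elements)
    err = sum(o - c * unit for o, c in zip(outputs, codes)) / len(codes)
    return abs(err) / full_scale

if __name__ == "__main__":
    elements = make_elements()
    codes = [5] * N_ELEMENTS  # hold code 5 for 15 samples = one full DWA rotation
    print(f"16-bit target:  {LSB_16:.2e} of full scale")
    print(f"static mapping: {average_error(static_dac(codes, elements), codes, elements):.2e}")
    print(f"DWA:            {average_error(dwa_dac(codes, elements), codes, elements):.2e}")
```

Holding one code for exactly one rotation is the best case for DWA. With real signals the mismatch error does not vanish, but it is largely pushed toward high frequencies, where the output filter deals with it along with the shaped quantization noise.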
Idle tones arise when the loop filter in a delta-sigma modulator (or demodulator) falls into a limit cycle, repeating the same sequence of values endlessly, typically with DC or slowly varying inputs. This was a big problem when folk started using DS ADCs for precision DC and LF measurements -- the cycling of filter values causes fairly large spurs in the output spectrum. Bad for audio, natch. Fortunately, higher-order loops, multibit designs, and adding dither pretty much get rid of them.
But there are other concerns, including keeping those very high-speed clock signals and switching spikes away from the output (fried tweeters, anyone? Not to mention driving an amplifier into instability with ultrasonics...). That is in addition to the basic issue of noise shaping: all the quantization noise the delta-sigma modulator pushes out of the baseband (audio band) ends up at HF, where it must be filtered out before it reaches the rest of the system. The scheme is to push the noise high enough that filtering isn't too hard, and the high oversampling ratio (clock frequency) means the images (mirrored about multiples of half the sampling frequency) land far from the audio band and are more easily filtered. In practice, when talking about 16-24 bits and more, it takes some pretty fancy filters.
A sneaky problem, and a common concern for audio DACs, is noise modulation. Essentially, the signal interacts with those digital filters to modulate the noise floor. With steady-state signals this creates "humps" in the noise floor; with time-varying signals (e.g. music) it shows up as modulation, or dynamic "pumping", of the noise floor. It may be audible; I do not know. And I mean that sincerely -- I do not know how audible the effect is. It can be large enough that I suspect it is, given the right signal and DAC.
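The idle-tone mechanism is easy to reproduce. Here is a minimal Python sketch assuming the simplest possible case: a first-order, 1-bit modulator (output ±1) with a small DC input. The input value of 1/16, the ±0.5 uniform dither added just ahead of the quantizer, and the run lengths are all arbitrary illustrative choices. Without dither the output settles into an exactly repeating pattern, so its spectrum is a set of discrete lines (tones) rather than noise. With dither the pattern stops repeating, while the average output still tracks the input. It only shows the dither fix, not higher-order or multibit loops.

```python
import random

def first_order_dsm(x, n_samples, dither=0.0, seed=0):
    """First-order, 1-bit delta-sigma modulator with a constant (DC) input x in (-1, 1).
    Optional uniform dither of +/- `dither` is added at the quantizer input."""
    rng = random.Random(seed)
    integ = 0.0    # loop filter state (a single integrator)
    y_prev = 0.0   # previous 1-bit output, fed back and subtracted from the input
    out = []
    for _ in range(n_samples):
        integ += x - y_prev                     # integrate input minus feedback
        d = rng.uniform(-dither, dither) if dither else 0.0
        y = 1.0 if integ + d >= 0.0 else -1.0   # 1-bit quantizer
        out.append(y)
        y_prev = y
    return out

def find_period(seq, max_period, settle=100):
    """Smallest period p <= max_period for which seq repeats exactly after
    `settle` samples, or None if there is no such period."""
    tail = seq[settle:]
    for p in range(1, max_period + 1):
        if all(tail[i] == tail[i + p] for i in range(len(tail) - p)):
            return p
    return None

if __name__ == "__main__":
    x = 1 / 16
    plain = first_order_dsm(x, 4000)
    dithered = first_order_dsm(x, 4000, dither=0.5)
    print("no dither:   period =", find_period(plain, 64), " mean =", sum(plain) / len(plain))
    print("with dither: period =", find_period(dithered, 64), " mean =", sum(dithered) / len(dithered))
```

With these settings the undithered output repeats every 32 samples. Smaller DC inputs give longer periods, so the spectral lines crowd closer together and extend further down in frequency.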
I'll stop there; it's hard to condense decades of research into a few paragraphs, and while I have designed and worked with these converters in the past, I'd have to bone up a bit to be a competent source. Most of my designs were more conventional DACs (segmented ladder types), and while I designed and built 16-bit DACs running at 5 to 100+ MS/s more than 10 years ago, I have not really done much with audio DACs.