I have done exactly what you're doing, but without the wider research you have done. I just blundered into a way of doing it and stuck with that, even though it may not be very elegant.
My solution involves the ALSA-supplied loopback driver, which acts as a virtual sound card and can sink and source audio streams. Applications such as REW and streaming music player software see it as a sound card, allowing me to intercept the stream and apply my own processing before I send it off to the multichannel DAC. However, I don't actually believe that the loopback driver can work properly as it is supplied. In the end I had to modify the code ever so slightly and re-compile it, then load it into my system as a runtime module using 'modprobe' - and believe me, that is not something I would ever tackle lightly!
The loopback driver effectively has its own sample clock, and I control that dynamically from my application using ALSA system commands, keeping it synchronised on average with my DAC's consumption rate. In other words, I regulate the rate at which the loopback driver consumes data from streaming apps etc. so that a software FIFO, which the DAC drains at its constant sample rate, neither over-fills nor empties.
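To make the idea of that regulation concrete, here is a minimal sketch in Python of one way such a loop could be closed. It is not my actual code: the PI controller, the gains, the FIFO target of 4096 frames, the 50 ppm DAC offset and all the names are assumptions chosen for illustration, and the step that actually applies the ratio to the loopback driver is left out.

```python
from dataclasses import dataclass


@dataclass
class FifoRateController:
    """PI controller that turns FIFO occupancy into a source-rate ratio."""
    target_fill: float       # desired FIFO occupancy in frames (assumed value)
    kp: float = 2e-6         # proportional gain (hypothetical tuning)
    ki: float = 2e-8         # integral gain (hypothetical tuning)
    max_dev: float = 1e-3    # never pull the rate by more than 0.1 %
    integral: float = 0.0    # running sum of the fill error

    def update(self, fill: float) -> float:
        """Return the ratio to apply to the nominal source rate."""
        error = fill - self.target_fill      # > 0 means the FIFO is too full
        self.integral += error
        correction = self.kp * error + self.ki * self.integral
        correction = max(-self.max_dev, min(self.max_dev, correction))
        return 1.0 - correction              # too full => slow the source


def simulate(steps=5000, dt=0.1, nominal=48000.0, dac_ppm=50.0,
             start_offset=200.0):
    """Toy model: the DAC drains slightly faster than the nominal rate."""
    dac_rate = nominal * (1.0 + dac_ppm * 1e-6)
    ctrl = FifoRateController(target_fill=4096.0)
    fill = ctrl.target_fill + start_offset
    ratio = 1.0
    for _ in range(steps):
        ratio = ctrl.update(fill)
        # Frames arriving from the source minus frames drained by the DAC.
        fill += (nominal * ratio - dac_rate) * dt
    return fill, ratio


if __name__ == "__main__":
    final_fill, final_ratio = simulate()
    print(f"fill = {final_fill:.2f} frames, ratio = {final_ratio:.8f}")
```

In this toy model the integral term is what lets the source settle at the DAC's true rate while the FIFO sits at its target; with a proportional term alone the FIFO would settle at a permanent offset.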
The main issue, it seems to me, is sample rate conversion: avoiding it if possible, and otherwise knowing when, where and how in the system it is being applied. For my purposes, no sample rate conversion is needed, because the sources I am using supply their streams as demanded by the loopback driver (whose sample rate I have control over). However, I have written my own sample rate conversion code for the time when, inevitably, such digital streams are taken away from us and I have to use an S/PDIF link that runs at a slightly different rate from my DAC and has no facility for closing the loop.
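For anyone wondering what sample rate conversion amounts to at its very simplest, here is a sketch in Python using plain linear interpolation. This is an assumption made for illustration and not the method in my own code; a converter meant for serious listening would normally use a properly filtered (e.g. polyphase or windowed-sinc) design. The function name and the ratio parameter are hypothetical.

```python
def resample_linear(samples, ratio):
    """Resample a mono block by linear interpolation.

    ratio is output_rate / input_rate, e.g. slightly above or below 1.0
    when two nominally equal clocks differ by a few ppm.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if len(samples) < 2:
        return list(samples)

    step = 1.0 / ratio          # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    n = 0
    while True:
        pos = n * step          # fractional read position in the input
        if pos > last:
            break
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        # Weighted average of the two neighbouring input samples.
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        n += 1
    return out
```

A real-time version would also have to carry the fractional position over from one block to the next, which this block-at-a-time sketch does not attempt.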
I agree with you that the 'infrastructure' and operating system are far more difficult to deal with and understand than the DSP stuff.