Thanks for replying!
I spent some time this weekend reading the S/PDIF spec (extremely simple and short) and my understanding is that there are multiple sources of delay:
1) Syncing to the preamble and then to the beginning of an audio block (which at 96 kHz means up to 2 ms, see the quick calculation after this list) after each stop/start action. After further investigation I learned that the miniDSP outputs a continuous data stream (with null data when nothing plays), keeping both sides in sync at all times.
2) Resync after a sample-rate (FS) change (up to 6 ms on the DAC3), but that is not a problem with the miniDSP as it is locked to 96/24.
3) Processing time on the DAC3 (specified as 0.82 ms at 96 kHz), which is a fixed value and can easily be compensated for using the miniDSP delay adjustments.
4) Clock skew/drift between the source (miniDSP) and target (DAC3): the S/PDIF spec allows a clock difference of up to 1000 ppm between the two sides, which translates to almost 1 second of delay after 15 minutes of music (what King Crimson would call a single track).
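To put a number on item 1: an S/PDIF (IEC 60958) audio block is 192 frames, so in the worst case the receiver waits almost a full block after a stop/start before it can realign. A quick back-of-the-envelope check in Python:

```python
# Worst-case wait for the start of the next audio block after (re)sync.
FRAMES_PER_BLOCK = 192    # S/PDIF / IEC 60958 audio block length in frames
SAMPLE_RATE = 96_000      # Hz, the miniDSP output is locked to 96 kHz

block_sync_delay_ms = FRAMES_PER_BLOCK / SAMPLE_RATE * 1e3
print(f"worst-case block sync delay: {block_sync_delay_ms:.2f} ms")  # -> 2.00 ms
```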
I assume that well-built modern products like the miniDSP and the DAC3 aim much higher than the 1000 ppm allowed by the standard, but even at 1 ppm drift we accumulate almost 1 ms of delay every 15 minutes of play time.
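The drift numbers above are just play time multiplied by the clock offset (ppm / 1,000,000). A quick sanity check, again in Python:

```python
# Accumulated timing drift between two free-running clocks.
def drift_seconds(play_time_s: float, clock_offset_ppm: float) -> float:
    return play_time_s * clock_offset_ppm / 1e6

fifteen_minutes = 15 * 60
print(drift_seconds(fifteen_minutes, 1000))  # 0.9 s   (worst case the spec allows)
print(drift_seconds(fifteen_minutes, 1))     # 0.0009 s = 0.9 ms (a very good 1 ppm source)
```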
The S/PDIF protocol doesn't define any feedback channel from the destination (DAC) to the source (miniDSP), so there is no way to slow down or speed up the source.
This seems to be the real problem.
I'm still trying to understand if and how this issue is solved -
A. A simple approach would buffer the incoming stream and reclock it for perfect timing. This would create unavoidable drift between the channels after the XO (assuming two standalone DACs).
B. An adaptive approach might measure the incoming stream at power-up and, if the drift seen over a 10-15 second measurement window proves to be very low (say 1 ppm), humbly accept the source clock and disable the reclocking (see the first sketch at the end of this post). This might be possible with the miniDSP being a high-quality source.
C. Use ASRC (asynchronous sample rate conversion) to keep both sides in sync, stretching time by removing or adding samples (after interpolation). This would eliminate the drift, but might cause *some* artifacts. Maybe if the clock difference is very low and the interpolation is done wisely this won't be detectable by the human ear (see the second sketch below).
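For approach B, here's a minimal sketch of the measurement itself. It assumes a hypothetical read_frame_count() callable that returns the running number of frames delivered by the S/PDIF receiver, and trusts the local monotonic clock; this is only to show the arithmetic, not any real API:

```python
import time

NOMINAL_RATE = 96_000  # Hz, expected frame rate of the incoming S/PDIF stream

def estimate_source_offset_ppm(read_frame_count, window_s: float = 15.0) -> float:
    """Estimate the source clock offset in ppm relative to the local clock."""
    frames_start = read_frame_count()   # hypothetical receiver frame counter
    t_start = time.monotonic()
    time.sleep(window_s)                # 10-15 s measurement window at power-up
    frames = read_frame_count() - frames_start
    elapsed = time.monotonic() - t_start

    measured_rate = frames / elapsed
    return (measured_rate / NOMINAL_RATE - 1.0) * 1e6

# If abs(offset) is tiny (say under 1 ppm), trust the source clock and skip
# reclocking; otherwise fall back to buffering/reclocking or ASRC.
```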
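And for approach C, a toy fractional resampler using plain linear interpolation: consuming input slightly faster or slower than 1:1 (the ratio nudged by the measured ppm offset) effectively drops or adds the occasional sample so the output tracks the local clock. A real ASRC uses far better (polyphase/sinc) interpolation, so treat this purely as an illustration of the principle:

```python
def resample_linear(samples: list[float], offset_ppm: float) -> list[float]:
    """Stretch/shrink the stream by offset_ppm using linear interpolation."""
    # A positive offset means the source clock runs fast, so we consume input
    # slightly faster than 1:1 (dropping the occasional sample); a negative
    # offset adds the occasional interpolated sample.
    step = 1.0 + offset_ppm / 1e6   # input samples consumed per output sample
    out, pos = [], 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += step
    return out

# At +1 ppm the correction amounts to one sample absorbed per ~million output
# samples (about every 10 s at 96 kHz), which is why, done carefully, the
# artifacts could plausibly stay below audibility.
```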