The device tested was a TASCAM US144 Mk.II. With that one (not hi-fi by any standard), the modulation is there. I tested this with two different units (I happen to have two of them, plus a US122 Mk.II) and in a series of experiments, to make sure that what I'm seeing is correct. It's real. But, mind you, it only becomes visible once you dig below the analog noise floor.
[side note]
The hard part was the manual nulling (Paul's DW software would have been of great help). The recordings used for the block averaging were made sequentially (not interleaved, as I would do it today), and because of that even the most minute changes in the average ref and clock between takes make the residual explode from linear (trivial) differences. The effect of a V.Ref change is easy to explain, the clock change less so: when the clock is, on average, marginally faster in the second take than in the first, all the analog filters shift their corner frequencies accordingly, ending up a bit low in effect (as seen from the clock rate). That causes minute phase differences even where the resulting magnitude difference in the passband is really low. The phase differences are most prominent at the outer edges of the passband, giving the residual a bathtub shape. I could model the apparent analog filter change in LTspice and get exactly the same shape of the residual.
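To illustrate the bathtub shape, here is a minimal sketch in Python. It is not the LTspice model mentioned above. It assumes one first-order high-pass and one first-order low-pass, and the corner frequencies (10 Hz, 40 kHz) and the 10 ppm clock difference are made-up illustration values, not measurements from the TASCAM or the RME.

```python
import math

def response(f, f_hp, f_lp):
    """Complex response of a first-order high-pass cascaded with a first-order low-pass."""
    hp = (1j * f / f_hp) / (1 + 1j * f / f_hp)
    lp = 1 / (1 + 1j * f / f_lp)
    return hp * lp

def residual_db(f, f_hp=10.0, f_lp=40e3, eps=1e-5):
    """Level of the null residual (dB re. full scale) at frequency f.

    eps is the assumed relative clock difference between the two takes.
    A faster clock makes the fixed analog corners look lower by 1/(1 + eps).
    """
    h_first = response(f, f_hp, f_lp)
    h_second = response(f, f_hp / (1 + eps), f_lp / (1 + eps))
    return 20 * math.log10(abs(h_first - h_second))

if __name__ == "__main__":
    for f in (20, 100, 1000, 5000, 20000):
        print(f"{f:>6} Hz: residual {residual_db(f):7.1f} dB")
```

With these assumed values the residual comes out highest near 20 Hz and 20 kHz and lowest in the midband, which is the bathtub. Moving the corners further away from the audio band lowers the whole curve.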
This effect is also seen with the RME, though to a much lesser extent, because the analog filter corners sit further out (lower for the high-passes, notably, and higher for the low-passes), plus the clock seems to have better absolute long-term stability. With this test I also see that the clock in the RME needs about 1 hour to settle, that is, until thermal equilibrium in the device is reached (it gets hot, that thing). This shows the extreme resolution that can be had from heavy time-domain block averaging.
That's why I had to develop the interleaving method in the first place: it largely removes the effects of clock and ref drift.
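Why interleaving helps can be shown with a toy simulation. This is only a sketch under stated assumptions, not the actual procedure: ref drift is modeled as a gain that creeps up linearly from take to take, clock drift is not modeled at all, and the function names, drift slope and noise level are hypothetical. Both conditions A and B record the same signal, so an ideal null would be zero and whatever is left is drift plus noise.

```python
import math
import random

def take(k, signal, slope, sigma, rng):
    """One recorded block: gain drifts linearly with take index k, plus Gaussian noise."""
    gain = 1.0 + slope * k
    return [gain * s + rng.gauss(0.0, sigma) for s in signal]

def block_average(blocks):
    """Sample-by-sample time-domain average over equally long blocks."""
    n = len(blocks)
    return [sum(col) / n for col in zip(*blocks)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def residual_rms(order, n_block=256, slope=1e-5, sigma=2e-5, seed=0):
    """RMS of avg(A) - avg(B) when takes are recorded in the given order, e.g. 'AABB'."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * 4 * i / n_block) for i in range(n_block)]
    takes = {"A": [], "B": []}
    for k, label in enumerate(order):
        takes[label].append(take(k, signal, slope, sigma, rng))
    avg_a = block_average(takes["A"])
    avg_b = block_average(takes["B"])
    return rms([a - b for a, b in zip(avg_a, avg_b)])

if __name__ == "__main__":
    n = 32
    print("sequential :", residual_rms("A" * n + "B" * n))
    print("interleaved:", residual_rms("AB" * n))
```

In the sequential order the average gain of the B takes differs from that of the A takes by n drift steps, in the interleaved order by only one step, so the drift-related residual shrinks by roughly a factor of n.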
[/side note]