I admit I haven't yet read all of the accompanying technical articles linked earlier. Honestly, I'm not really sure about metrics like this.
Similar methods have been around for quite a long time. I recall, for example, a thread on the Gearslutz site where someone collects AudioDiffmaker results from various AD/DA loops. I personally don't find it very useful; if anything, it's misleading. This seems to be similar, just with a defined set of test signals and statistics.
I commonly use residual analysis after subtraction for small, isolated comparisons (one changed component in an otherwise unchanged chain), for debugging issues, for quick transparency checks and so on, and it's indeed very useful for that.
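For anyone unfamiliar with the basic idea, here is a minimal sketch of a gain-matched subtraction. It assumes the two captures are already sample-aligned and equally long; real tools such as AudioDiffmaker also have to estimate delay and drift, which this sketch deliberately skips. The function name, test tone and "capture" are hypothetical illustrations only.

```python
import math
import random

def gain_matched_residual_db(reference, capture):
    """Scale `capture` by the least-squares gain that best matches `reference`,
    subtract it, and return (gain, residual RMS relative to reference in dB).
    Assumes both signals are already sample-aligned and of equal length."""
    g = sum(r * c for r, c in zip(reference, capture)) / sum(c * c for c in capture)
    residual = [r - g * c for r, c in zip(reference, capture)]
    ratio = sum(x * x for x in residual) / sum(r * r for r in reference)
    return g, 10 * math.log10(ratio)  # power ratio -> dB

random.seed(0)
FS, F0 = 48_000, 1_000  # assumed sample rate and test tone, Hz
ref = [math.sin(2 * math.pi * F0 * n / FS) for n in range(FS)]
# Hypothetical capture: 6 dB quieter, plus a little white noise
cap = [0.5 * x + random.gauss(0, 1e-4) for x in ref]

gain, depth = gain_matched_residual_db(ref, cap)
print(f"gain {gain:.3f}, residual {depth:.1f} dB")
```

The gain match matters because a plain level offset would otherwise dominate the residual and hide the small differences you actually want to listen to or inspect.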
However, I have always been a bit skeptical about a single all-encompassing figure, based on difference-signal level or correlation depth, for the general evaluation of audio devices. Or more precisely, about how such a single figure relates to perceptual differences.
The difference level is a rough indication of how similar the input and output signals are, and in general, of course, the smaller the difference, the better. But in practice it's not a very good indicator for overall comparisons IMO, because there can be a multitude of reasons why the signals differ (all kinds of distortion with very different characteristics, phase shift...), and not all of those reasons carry the same weight in terms of perceptible difference.
So two DUTs can have a very similar overall level of difference signal, yet one of them will sound more transparent, for example because its distortion has a different frequency distribution and different characteristics. Or, to take another example: one device may have a difference signal some 30 dB lower than another. At first glance that suggests the second device is vastly inferior to the first, but the cause may simply be a minimum-phase interpolation filter in its DAC, which produces a big difference in diff level yet is not necessarily perceived as problematic in a listening test.
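A toy model of the second example (my own simplification, not taken from the linked articles): a pure phase shift leaves the magnitude response untouched, yet nulls far worse than low-level distortion. It uses a single 1 kHz tone, a hypothetical 5° phase lag standing in for a minimum-phase filter's phase response at that one frequency, and a hypothetical third harmonic at -60 dB. A real minimum-phase filter has frequency-dependent phase, so treat this only as an illustration of the principle.

```python
import math

def null_depth_db(reference, output):
    """RMS of (output - reference) relative to RMS of reference, in dB."""
    diff = [o - r for o, r in zip(output, reference)]
    rms = lambda x: math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(rms(diff) / rms(reference))

FS = 48_000  # sample rate, Hz (assumed)
F0 = 1_000   # test tone, Hz (assumed)
N = FS       # one second, i.e. a whole number of tone cycles

w = 2 * math.pi * F0
t = [n / FS for n in range(N)]
reference = [math.sin(w * x) for x in t]

# Hypothetical DUT A: 3rd harmonic at -60 dB, no phase change
dut_a = [math.sin(w * x) + 1e-3 * math.sin(3 * w * x) for x in t]

# Hypothetical DUT B: no distortion at all, just a 5 degree phase lag
phi = math.radians(5)
dut_b = [math.sin(w * x - phi) for x in t]

print(f"DUT A (distortion): {null_depth_db(reference, dut_a):.1f} dB")  # -60.0 dB
print(f"DUT B (phase only): {null_depth_db(reference, dut_b):.1f} dB")  # -21.2 dB
```

With these assumed numbers the phase-only "device" nulls about 39 dB worse than the distorting one, because a phase offset φ leaves a residual of amplitude 2·sin(φ/2) regardless of whether anyone could hear it.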
There can also be issues with clock drift between the source and capture devices if they aren't synced. There are certainly ways to compensate for that in the measurement, with varying degrees of effectiveness, but in doing so you also alter the effects of clocking at the source device, which is still one of the important aspects to compare.
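To show why unsynced clocks are a problem for diffing at all, here is a rough sketch assuming a hypothetical 10 ppm mismatch between playback and capture clocks, modelled simply as the captured 1 kHz tone running slightly fast. No compensation is applied; the point is only that the phase error accumulates, so the null depth keeps degrading over the length of the capture.

```python
import math

FS = 48_000   # nominal sample rate of both devices, Hz (assumed)
F0 = 1_000    # test tone, Hz (assumed)
PPM = 10      # hypothetical clock mismatch, parts per million
SECONDS = 10  # capture length

w = 2 * math.pi * F0
eps = PPM * 1e-6

def block_null_db(k):
    """Null depth (dB) of the k-th one-second block of the capture."""
    err = ref = 0.0
    for n in range(k * FS, (k + 1) * FS):
        t = n / FS
        r = math.sin(w * t)              # what the source played
        c = math.sin(w * (1 + eps) * t)  # what the unsynced capture recorded
        err += (c - r) ** 2
        ref += r * r
    return 10 * math.log10(err / ref)  # power ratio -> dB

nulls = [block_null_db(k) for k in range(SECONDS)]
for k, db in enumerate(nulls):
    print(f"{k:2d}-{k + 1:2d} s: {db:6.1f} dB")
```

Under these assumptions the first second still nulls reasonably well, while by the last second the residual is only a few dB below the signal, even though nothing about the "device" itself has changed.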
I'll definitely check those papers, and please don't take my initial comment as sheer negativity towards your work or towards "diffing" in general. It's just that, in my previous experience, it has had several inherent limitations, which is why I personally wouldn't use it to build any general chart of various audio devices.
Michal