Firstly, thank you, Serge, for responding with those test results and your writeup.
Does this correspond to my conclusion (from df measurements) that discerning them "needs very careful listening"?
About as careful as I'd expected--especially via desktop PC speakers in a room with a noisy bunch of fans in a PC case and a fan heater...
ABX tests are "torture tests." It was definitely easier than ABX'ing, for instance, 320kbps AAC against an uncompressed version. (Which I have done, and yes, I could. I think one of the tests I did was over 12 trials; I have the results somewhere. I used IEMs for those tests.)
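For anyone wondering what a 12-trial ABX result means statistically, here is a minimal sketch of the usual one-sided binomial calculation: the chance of scoring at least that many correct by pure guessing. The function name `abx_p_value` and the example scores are my own illustration, not results from any test mentioned here.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by guessing.

    Each ABX trial is treated as an independent 50/50 guess, so this is the
    upper tail of a Binomial(trials, 0.5) distribution.
    """
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Hypothetical scores for a 12-trial session (illustration only).
    for correct in (9, 10, 11, 12):
        print(f"{correct}/12 correct: p = {abx_p_value(correct, 12):.4f}")
```

The smaller the value, the less plausible it is that the listener was just guessing; a common (but arbitrary) cutoff is 0.05.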
I'd want to also do the other pairs--particularly, the "Original" and "Softube Trident A-Range." Off the top of my head, I wouldn't necessarily bet on being able to tell the difference between those two in an ABX Test.
Here are the "Delta Waveform" plots from DeltaWave for the "Soft Clip" and "Softube Trident A-Range":
Once again, it can be seen that the "Soft Clip" process (top) has a much larger effect on transient peaks. That's what to listen out for--it's what I listened for when presented with "X."
At least now you have an idea how audible is the difference in 1.3dB of df levels in case of similar artifact signatures (distance = 0.18dB). This will help us to draw conclusions for your other test items:
[snip...]
Similarity of their artifact signatures:
I'm afraid the "8-Bit Dither and Noise Shaping" result is invalid--as I mentioned, this was performed by a Reaper "JS Script" plug-in, and the noise shaping loop produced an output signal exceeding 0dBFS, which causes the plug-in to turn off its noise shaper. Presumably the noise shaper was unstable at 8 bits? Anyway, I used a version of the script that I'd modified to turn noise shaping back on after an over, once a certain number of samples have passed; I wrote it so I could use it on the Master bus without having to keep resetting the noise shaper manually after overs. (That's usually with 24-bit output, so an over there would be "my" fault for going over on the Master bus rather than the noise shaper going off the rails...)
In a nutshell, the result was that the noise shaper was turned on and off throughout the file.
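To illustrate the kind of on/off behaviour I mean, here is a rough Python sketch. It is not the actual JS plug-in code: the first-order error-feedback shaper, the TPDF dither, and the names `quantize_with_holdoff` and `hold_samples` are all assumptions chosen for illustration.

```python
import random


def quantize_with_holdoff(samples, bits=8, hold_samples=64, seed=0):
    """Quantize to `bits` with TPDF dither and first-order error feedback.

    Hypothetical sketch: when the quantized value goes over full scale, the
    shaper is switched off and its state cleared. It is switched back on
    after `hold_samples` consecutive samples without an over.
    Returns (output samples, shaper on/off state after each sample).
    """
    rng = random.Random(seed)
    step = 2.0 / (2 ** bits)   # quantizer step for a -1..+1 range
    max_val = 1.0 - step       # largest representable positive value
    error = 0.0                # previous quantization error (shaper state)
    shaping_on = True
    clean_run = 0              # samples since the last over
    out, states = [], []
    for x in samples:
        # Error feedback: subtract the previous error before quantizing.
        shaped = x - error if shaping_on else x
        dither = (rng.random() - rng.random()) * step  # TPDF, +/- 1 LSB
        q = round((shaped + dither) / step) * step
        if q > max_val or q < -1.0:
            # Over: clip, disable the shaper, and clear its state.
            q = max(-1.0, min(max_val, q))
            shaping_on = False
            error = 0.0
            clean_run = 0
        elif shaping_on:
            error = q - shaped
        else:
            # Shaper is off: count clean samples until it may resume.
            clean_run += 1
            if clean_run >= hold_samples:
                shaping_on = True
        out.append(q)
        states.append(shaping_on)
    return out, states


if __name__ == "__main__":
    signal = [0.0] * 10 + [1.5] * 3 + [0.0] * 20
    _, states = quantize_with_holdoff(signal, hold_samples=5)
    print("".join("1" if s else "0" for s in states))
```

If overs keep recurring, the recorded state flips back and forth along the whole file, which is the situation described above.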
Perceived audio quality of eight DUTs (from bottom of the dendrogram) can be assessed by their df levels. Probably WaveArts Tube Saturator Classic can be also included into the cluster of close DUTs.
OK, before proceeding further, I think it would be helpful if you define/explain some of your terminology. (Please forgive me if you've already done so previously in this thread.)
Particularly--artifact signature, distance, what the dendrogram represents, and what the plots that you've posted for each file show.
Df levels of other DUTs are not indicative of their SQ.
Because...?
Now you can take df levels of DUTs from "good cluster" and compare them to perceived closeness to original. For example one can take samples from good cluster, listen them and sort according to closeness to original, then compare the found order with df levels.
What is "good" about the "cluster"? And what do you suggest using for "listening and sorting"?