Sigh. You (and earlier replies) have my proposals so completely backward that I don't know where to begin.
The trouble is that your proposal requires a vast amount of resources to probe the properties of a phenomenon that has yet to be well established. If there were no opportunity cost to such work, that'd be one thing, but we live in a finite world with finite means, so the most efficient way to determine whether something is worthy of further investigation wins out. If we can spend a moment to save an hour, or spend a dollar to save a twenty, surely it makes sense to quickly investigate whether the phenomenon under study persists once eyes close.
Regarding comments about moving the goalposts, I never said there was ANY value in listening to recordings of amps. I've tried a dozen or more ABX comparisons online and they are meaningless. Sometimes the producers even add a voiceover saying "It sounded different in the room." The problem appears to be that recordings insert too many intermediaries between the source (Amp #1 and Transducer #1) before the test participant hears the output. So, one is not hearing the tube amp or the solid state amp or the speaker. They are hearing some interaction of different systems and unknown transformation of the sound through a mic, another amp, and another transducer. I fully agree that you'll generally find nothing through ABX of recordings. It has little value so I ignore it.
This is your interpretation, but an at least equally - and in fact I would assert markedly more - valid interpretation is that the recordings reveal that the difference was in the eye, rather than the ear, of the beholder once the veil has gone up. It is quite trivial to make an ADC that is vastly more linear than any of the devices we're discussing, and @SIY has an extremely high quality one - there is no chance that his ADI2's inputs are in any way distorting the output of a given tube amplifier, so other than the effect of the speaker load (admittedly, a dynamic interaction, but one which could be replicated with a driven load if that's what we wanted to test - and SIY is talking about gain stages here), you really only functionally have the impact of the DUTs and your playback speaker (assuming you have a decent DAC).
What I'm proposing is to consider if and how tube amp patterns of noise and distortion may involve Shepard Tones or some similar characteristic that makes them sound very different to people than to machines. This would account for the difference between the poor measurements and that tube amps somehow "hang together."
While broadly speaking I'm in support of any investigation of subjective perceptions of audible nonlinearities - although of course, there have been quite a few - there's a fairly important chain to follow here in seeking effects. First, verify that the phenomenon in question is audible - what SIY is proposing with his tube gain stage comparison. Having done so, verify that the audible effect has some property worth being interested in (e.g. what you're proposing, that it might be preferable to some).
Then we go looking for why that might be, because that's going to be exponentially harder to do, and there's no sense going to all that trouble when an ABX could have saved us the time to begin with.
Edit: I'll note, if you think that software ABX tests specifically mask the phenomenon in question, that is itself a testable hypothesis - there are plenty of ways to skin a cat for blinding listeners, so if you can show that recorded software ABX doesn't let listeners reliably differentiate, but some other equivalently blind and error-free methodology (analog ABX with a switchbox, mayhaps?) does, then hey, you've got an interesting result, take it to peer review.
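
For anyone wanting to score such a comparison, here's a rough sketch of the usual arithmetic: the one-sided binomial probability of getting at least that many trials right by pure guessing. The function name `abx_p_value` and the trial counts in the usage lines are my own illustrative assumptions, not results from any actual test.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by coin-flip guessing.

    Assumes each ABX trial is independent and that a listener who hears no
    difference picks the right answer with probability 0.5.
    """
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    # Sum the upper tail of a Binomial(trials, 0.5) distribution.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


if __name__ == "__main__":
    # Hypothetical scores, purely for illustration.
    print(f"12/16 correct: p = {abx_p_value(12, 16):.3f}")
    print(f" 9/16 correct: p = {abx_p_value(9, 16):.3f}")
```

Run the same scoring on both the recorded software ABX and the switchbox ABX with the same listeners and trial counts, and a low p-value on one method but not the other is the kind of result that would support the masking hypothesis.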