I think the general and accurate conclusion, at least in this community, is that most tubes don't measure well, and at the same time bad or mediocre measurements might not be easy to hear under normal listening conditions. As far as I know, even Amir has had positive subjective impressions of certain badly measuring designs (under certain conditions).
My point is that we've had subjective impressions and conclusions alongside our pristine and accurate measurements for some time now, and frankly we even have rankings now. Secondly, they actually add a lot of value for people who visit this site. Not everyone is primarily interested in the final measurement; they also want to know how it translates to the listening experience (which might be the main reason we now have more elaborate subjective impressions than we did years ago? idk). So why not continue to do that? For example, if the distortion is completely inaudible, then great. But hey, if it's more or less inaudible as in the first case, yet you can hear it loud and clear with a pure 20 Hz sine wave or under some other specific conditions, then that's another great discovery. Readers who are interested in this piece of audio gear now have concrete information and can decide what to do with it as they see fit!
Even if we go with your cow sh!t analogy, I still think nothing is made worse by having more information about a product from different angles. That's why I think milosz has a good suggestion: it can further remove confirmation bias (if there is any).
And lastly, you say you don't see why you should do an experiment like that with a device most people buy to play music (or maybe as a decorative piece...), but you'd rather do an experiment with a 20 Hz sine wave? I mean, if we look at the facts, we already know it measures badly and is quite expensive. Nobody will try to deny that. It'd just be more helpful to know how it performs in the most realistic scenario, which is playing music.