There is a reason to do it: to find extraneous factors that lead to differences. Let me give an actual example.
Many years ago in another forum the topic of demagnetizing CDs and LPs came up. There were folks who swore by the latter using a specific DeMag device from Japan. A friend and high-end manufacturer volunteered to capture the output of a virgin LP (never played before), first as is and then after the de-mag process. I listened to the two captures and there was a clear difference, with the latter sounding more open. Objective analysis using an audio editing program showed clear differences as well. Quite surprising for those of us who thought there was nothing to demagnetize in an LP.
Fortunately, I thought of another possibility and asked my friend to capture a virgin (never played) LP twice without any de-mag process. Well, wouldn't you know it: the same difference was there! In other words, the sound of the LP improved on the second playback compared to the first. That would explain why people would hear a difference. But of course the association with the de-mag was faulty.
Now, this kind of thing is rare, so likely all such tests will fail, but that doesn't mean we shouldn't try.
As to not believing it, yes, we would be incredulous but as long as we can replicate the test, all is well. We could do that and do our own analysis of any audible difference. Or show that the test was wrong.
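For anyone who wants to repeat that kind of capture comparison, here is a minimal Python sketch of a null test with a control, along the lines of the LP example above. Everything in it is an assumption for illustration: the function names are made up, the "captures" are synthetic sine waves rather than real recordings, and real captures would first need to be time-aligned and level-matched, which this sketch does not attempt.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def null_depth_db(reference, other):
    """Level of the residual (other - reference) relative to reference, in dB.

    Assumes both captures are already time-aligned and equally long.
    """
    if len(reference) != len(other):
        raise ValueError("captures must be the same length and time-aligned")
    ref_level = rms(reference)
    if ref_level == 0.0:
        raise ValueError("reference capture is silent")
    residual = [b - a for a, b in zip(reference, other)]
    res_level = rms(residual)
    if res_level == 0.0:
        return float("-inf")  # perfect null: the captures are identical
    return 20.0 * math.log10(res_level / ref_level)


# Synthetic stand-ins for real captures (hypothetical data, 1 kHz at 48 kHz).
n = 4800
first_play = [math.sin(2 * math.pi * 1000 * i / 48000) for i in range(n)]
second_play = [0.999 * s for s in first_play]   # control: plain second playback
after_treatment = [0.999 * s for s in first_play]  # playback after the treatment

control = null_depth_db(first_play, second_play)
treatment = null_depth_db(first_play, after_treatment)

print(f"control residual:   {control:.1f} dB")
print(f"treatment residual: {treatment:.1f} dB")
```

The point of the control is the comparison between the two numbers: if the treated capture differs from the first playback by no more than a plain second playback does, the treatment cannot be credited with the difference.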
You are right. I think it is fair to say that a lot of interesting science came out of the pursuit of mythical phenomena, like the Aether, and from trying to talk stubborn old men out of their beliefs, like the cosmological constant. So I can see how there might be value in investigating some of these more outlandish claims, which I understand is also your point.
At the same time, if someone came to me and suggested we should be sending expeditions to the North Pole to look for Santa, I would not be inclined to agree that it is a good idea, despite the interesting discoveries such an expedition might yield. So maybe it boils down to this: how do you decide whether this is an Aether story or a Santa story?
For me, based on pkane's results of a -250 dB difference in the audible range, this is more like a Santa story than an Aether story. I would be very surprised if anything interesting came out of listening tests that is not about the listening tests themselves, because it seems very unlikely to me that such a test would detect any audible difference.
In the DeMag device example, if people already knew the LP sounds different between first and second playback, nothing was really accomplished other than convincing some people that there is no magic in their fancy toys, was it?