Only the blind test above will provide real proof.
The rest is anecdotal or at least suspect/questionable.
Oh, blind tests can be just as questionable. Blinding removes some bias; it doesn't automatically make a test a truth-teller. Real example:
We developed the WMA audio codec to be twice as good as MP3. The marketing department came and said they needed an independent study to prove that. Knowing that we could not do that across all people and all content, I was worried, but I could not push back with any good reason. So we hired an outside agency, to the tune of $25,000, to recruit some 100 people to come in and take a double-blind listening test. The testing company used ITU-R BS.1116 as the protocol, this being the gold standard for finding small impairments, especially in lossy audio codecs. I was worried in a pre-call with the company until I heard what music they had selected: "audiophile classical music." Classical music is highly harmonic and, as such, compresses far more easily than, say, rock music. The testing company proudly declared that since classical music was what audiophiles used to test gear, it surely made for the best test material. WRONG!
They proceeded to recruit the 100 people, and what was the outcome? Better than 90% of the listeners thought that our codec at 64 kbps sounded the same as MP3 at double the bit rate, 128 kbps. Marketing was happy, and the press release declared the same.
Of course, I had countless audio clips where the above was not true. I could easily tell that we had issues with transient response and the like at such a low bit rate of 64 kbps (21-to-1 compression!). But there it was: a full, standard-compliant, double-blind test saying otherwise!
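As a side note, the 21-to-1 figure can be sanity-checked against a CD-quality PCM source. A minimal sketch, assuming a 44.1 kHz / 16-bit / stereo reference (the post doesn't state which source format the ratio was computed against, so this is an assumption):

```python
# Sanity check of the quoted compression ratio.
# Assumed reference: CD-quality PCM (44.1 kHz sample rate, 16-bit samples, 2 channels).
pcm_bps = 44_100 * 16 * 2   # 1,411,200 bits per second of uncompressed audio
codec_bps = 64_000          # the 64 kbps codec rate from the test

ratio = pcm_bps / codec_bps
print(f"compression ratio: {ratio:.1f} to 1")
```

This comes out to roughly 22 to 1, in the same ballpark as the 21-to-1 cited, which may have been computed against a slightly different reference rate.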
Where the testing company went wrong was that they did not know how lossy compression worked and, as such, what kind of content would expose its problems. Nor did they understand the role of trained listeners in hearing artifacts that the general public cannot. They hired people from a local mall and gave them a few dollars to take the test, which assured few if any critical listeners.
I listened to their test clips myself and, despite them being easy to compress, I could tell we were not as good.
Blind tests are only good for controlling bias. That alone does not in any way assure that the truth is out, or that any real proof has been produced. As you know, Harman has used blind tests to conclude that we like exaggerated bass. I am sure I have read that you don't agree with that. And I don't either. Yet we have the research. We have the blind test. Clearly, then, no "proof" has been provided that picky audiophiles, at least, like exaggerated bass in headphones.
Conclusion
There is a key truth here: you must look at the specifics to know if a test -- blind or sighted -- is correct. The fact that a test is sighted doesn't automatically make the outcome wrong ("we are all human"). Nor does the fact that a test is blind and published mean it automatically supports the general conclusions drawn from it.
Sean Olive's papers always end with a long list of issues not addressed in the testing, for the above reason. Alas, people don't read those papers, or if they do, they ignore the qualifications.
I'll take a trained listener's sighted test over untrained and improper blind testing every day of the week, and twice on Sunday!