Your example is about cherry-picking, which is worth discussing.
"Trust" can be a reason to require higher test confidence than would normally be needed. If a person does consistently get 6 of 10 right, the odds are he is hearing something real. However, because this confidence is only slightly better than guessing, it is easy to cheat. 62.3% confidence means a 37.7% chance of scoring that well by guessing. Even if he isn't hearing a difference, he can just keep taking the test until he passes. At 37.7% that takes fewer than 3 tries on average. Then he can cherry-pick that one test and say he's hearing a difference. But this cherry-picking is only fooling himself. What is the point of that? It undermines the goal of education & fun.
So using 95% confidence has the benefit of making it harder for people to pass one test by accident and cherry-pick it. They'll have to take the test about 20 times, more or less. The drawback is that this high confidence produces false negatives: people who consistently outperform guessing and really are hearing something, but are scored as "fail" by the test threshold.
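To put rough numbers on the guessing odds above, here is a minimal Python sketch. It assumes each trial is a fair 50/50 guess and that whole tests are independent of each other; the function names are made up for illustration.

```python
from math import comb

def p_pass_by_guessing(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` of `trials` right by coin-flip guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

def expected_attempts(p_pass: float) -> float:
    """Average number of whole tests a pure guesser takes before the first pass."""
    return 1 / p_pass

def p_pass_within(attempts: int, p_pass: float) -> float:
    """Chance a pure guesser passes at least once in `attempts` whole tests."""
    return 1 - (1 - p_pass) ** attempts

p6 = p_pass_by_guessing(6, 10)
print(f"6 of 10 by guessing: {p6:.3f}")                        # 0.377
print(f"average tests until a pass: {expected_attempts(p6):.2f}")  # 2.65
print(f"pass at least once in 3 tests: {p_pass_within(3, p6):.3f}")  # 0.758
print(f"average tests at a 5% threshold: {expected_attempts(0.05):.0f}")  # 20
```

The same function covers other scores, for example `p_pass_by_guessing(4, 5)` for a 4/5 result.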
I am 100% fine with the side arguing in favor of impossible things cherry-picking their BEST evidence and scores. I want to see what their best-case scenario has to offer. If they cherry-pick, I don't mind in the slightest, as it still qualifies as some scientifically conducted evidence. But when I say cherry-pick, I don't mean running 100 trials and not publishing 98 of them, for example. I hope people pick studies that are widely regarded as fairly and rigorously conducted, and then we can go over the value of each portion of the study.
I have no problem with tiring people out until they collapse if needed, so long as that is accounted for in the study. There is nothing wrong with this in my book so long as each side can agree on the thresholds at which fatigue has an effect. Having people score 4/5 is also data if they can do it consistently. But then we have to run more targeted studies on the parameters that allowed them to achieve 4/5, for example.
The problem with people who claim "I can hear a difference" is that this sort of natural language has no formalized mathematical meaning. If they can specify what "I can hear a difference" actually means, then perhaps we can understand each other better. What ends up actually happening is that we assume "I can hear a difference" means "I can hear a difference in any practically conducted scientific or reality-based home-tested scenario". But then when the tests are done, the subjectivist has to clarify "The pressure got to me" or "I was hungry" and other such excuses. Which in my book are fine. But they should have told us up front what conditions apply when they say "I can hear the differences between cables". Don't agree that you are fine to take the test and then tell us after the fact that the test is invalid because you forgot that being hungry would have such an effect on you.
In the majority of cases, when both objectivist and subjectivist agree on terms for testing, it is nearly ALWAYS the subjectivist who has qualms about the event after the fact. It's fine if you want to say "oh but this can lead to false positives". Well then why did you agree to these aspects of the test, and why don't you simply propose your own parameters so we can hash out whether they will satisfy both sides?
I am very sorry for going on, and on, and on. But I cannot stress enough the massive miscommunication, due to the limits of natural language, that occurs when two parties are debating or testing the validity of their claims. This is why having as clear an understanding as possible of what each side means when they say things is far more important than virtually any test I can imagine. This is why many highly educated people prefer formalized ways of conveying ideas, and not just natural language (scientific tests are an extension of trying to remove natural-language interpretation from results, which is why things like numbers are so valuable, since all but the insane agree equally on what they mean with no room for interpretation).
95% confidence ratings need to be used if a strict deductive affirmation is going to be claimed. If we're going to make inductive statements, then 95% confidence is unneeded, as we're only testing the amount/frequency with which something is or isn't the case.
I can think of a test that accounts for fatigue and such really quickly. For people who claim they hear cable differences, I can instantly take away the only excuses subjectivists still have left and invoke the majority of the time (fatigue and pressure), and simply swap cables in their own homes any time they're out of the room, without them knowing a swap was even made. All they would have to do is say when they detect that the cables were changed. They wouldn't know when it happens, nor which cable is which (though at the start they get to choose the cables they claim they can hear differences between). So no fatigue, and no claims about being under pressure. Any time they don't say a cable was changed, it isn't counted against them. The only thing that will be graded is when they claim the cable was changed since their last listening session. Otherwise they can go about their daily lives unobstructed.
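The grading rule for this at-home swap test could be sketched roughly like this in Python. The session log, the function names, and the chance model are my own assumptions for illustration: I treat a listener whose claims have nothing to do with the actual swaps as being right with probability equal to the fraction of sessions in which a swap really happened.

```python
from math import comb

def grade_claims(sessions):
    """sessions: list of (cable_was_swapped, listener_claimed_change) booleans.
    Only sessions where a change was claimed are graded; silence is never penalised."""
    graded = [swapped for swapped, claimed in sessions if claimed]
    return sum(graded), len(graded)  # (correct claims, total claims)

def p_by_chance(hits, claims, swap_rate):
    """Chance of at least `hits` correct claims out of `claims`
    if the claims are unrelated to the swaps (assumed null model)."""
    return sum(
        comb(claims, k) * swap_rate ** k * (1 - swap_rate) ** (claims - k)
        for k in range(hits, claims + 1)
    )

# Hypothetical log of 8 listening sessions, made up for this example.
log = [
    (True, True), (False, False), (False, True), (True, True),
    (True, False), (False, False), (True, True), (False, False),
]
hits, claims = grade_claims(log)
swap_rate = sum(swapped for swapped, _ in log) / len(log)
print(hits, claims)                           # 3 4
print(p_by_chance(hits, claims, swap_rate))   # 0.3125
```

In this made-up log, 3 correct claims out of 4 would happen by luck almost a third of the time, so the test would need to run over many more sessions before the result says much either way.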
But even there, they will conjure new excuses and say "I felt pressure whenever I had to second-guess myself" or "my hearing might've gotten worse", totally unaware that we're not testing how much of a difference they can hear, simply whether what they hear was actually due to a cable change or not.
At some point, you hit the walls of practicality. The sheer number of excuses exhausts both sides, and a conclusion is finally reached simply because so much probability is stacked against one side that it's pointless to even entertain the topic anymore. (It gets about as unwieldy as the length of this post, and by that point only the most OCD, academically inclined are still left wondering about the truth.)