I just read the paper. It doesn't say much about sighted listening being any good. It does start with this telling graph which I have mentioned countless times:
View attachment 28294
Out of megabits/sec of audio data, the brain only captures a few bits/second. No way, no how does the brain capture minute details in music for the long term. You would go crazy in real life if you had to remember that much nuance, for example every inflection in my voice as we were talking, or the background noise in your home or environment.
So anyone who says they listened to something for weeks, then switched to something else for weeks, and reliably heard a difference that way is just wrong.
As to the thesis of the paper, the conclusion does it in: most of what they are talking about is good practice already summarized in ITU-R BS.1116, "Methods for the subjective assessment of small impairments in audio systems." That is, familiarity and training matter when it comes to testing.
And yes, "slow learning" is super important. But it has another name: training. It took me about 6 months to get trained to hear the smallest amount of lossy compression artifacts. This does NOT at all say that when conducting listening tests, I need 6 months, or even 6 minutes to hear differences. Indeed, switching times over 1 second severely limit my ability to detect small artifacts.
A long process may be needed to find the right artifacts to listen for. Sometimes it takes me half an hour of comparisons to figure out what to focus on. But once there, the actual comparisons occur in milliseconds, not "long term."
And again, there is nothing in there about sighted listening being more correct. Yes, you can do the training sighted, but the final test had better be blind to keep other factors from getting into the result.
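To make the "final test better be blind" part concrete, here is a rough sketch of how a blind forced-choice session could be randomized and scored. It is not from the paper or from BS.1116; the function names (`make_blind_schedule`, `score`, `p_value_guessing`) and the two-device "A"/"B" setup are assumptions made up for illustration. The idea is only that the listener does not know which device is playing, and that the score is compared against what pure guessing would produce.

```python
import random
from math import comb


def make_blind_schedule(n_trials, seed=None):
    """Pick at random which device ('A' or 'B') is played on each trial.

    The listener never sees this list; it is only used for scoring.
    """
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]


def score(schedule, answers):
    """Count how many of the listener's answers match the hidden schedule."""
    return sum(actual == guess for actual, guess in zip(schedule, answers))


def p_value_guessing(n_correct, n_trials):
    """One-sided binomial probability of getting at least n_correct
    right out of n_trials by coin-flip guessing (p = 0.5 per trial)."""
    favorable = sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1))
    return favorable / 2 ** n_trials


if __name__ == "__main__":
    schedule = make_blind_schedule(10, seed=1)
    # Hypothetical listener who gets every trial right.
    answers = list(schedule)
    correct = score(schedule, answers)
    print(correct, p_value_guessing(correct, len(schedule)))
```

For example, 9 correct out of 10 has roughly a 1.1% chance of happening by guessing alone (11/1024), while 6 out of 10 is entirely consistent with guessing.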