That is only one half of the evaluation. Let us say you have two speakers that don't sound terrible in mono. What can you say about how they sound when used in stereo, as they typically are? Will both sound the same, or very different in a way that cannot be inferred from a mono evaluation? This is the crux of the question.
So, it depends on the goal. Are you trying to detect "anomalies" in a speaker, or are you trying to objectively measure/quantify the audible qualities of a device? This is why I keep saying that you can take a review here as a spectator sport of relative grading (former goal) or as a buying guide based on measurements (latter goal). These are two entirely different goals that cannot be conflated, and one cannot be assumed to imply the other.
Moreover, I am not even sure the bolded portion above is necessarily true. There is no binary of "terrible" and "not terrible". There are various grades of deviations in multiple dimensions. So, the "stereo" effect may mask some degree of "terrible" but not some other degree of terrible. Unless you quantify and establish a threshold for the latter, you cannot assume any degree of terrible isn't masked by stereo.
(1) One can say measuring stereo is difficult or that there is no standardized way to set up and test a stereo speaker. That may be so, but it doesn't imply that testing in stereo isn't necessary or useful.
(2) One can say that two speakers that measure the same in mono (which is entirely theoretical since no two speakers will ever measure the same) won't sound any different in stereo. While there may be some justification for this in electronics, it becomes a lot more tenuous when evaluating speakers, since there is no such thing as a perfect/transparent speaker.
So the whole argument for the case of mono testing boils down to a very flawed and artificial constraint on the speaker's behavior:
A speaker so terrible that stereo won't save it (and so stereo isn't necessary) but not so terrible that only mono measurement can expose it (and so mono is necessary).
At best, that seems like a self-serving rationale for (1) above.
I think you raised a very good point about stereo testing: there is no standardized way to set up a test for stereo listening. This leads to the conclusion that current reviews in stereo are at best a crapshoot: some may be good, some may be inaccurate. This reduces the value of stereo reviews. It does not eliminate it completely, but you are now left wondering how much of the review is about the room interaction and how much is the pure speaker. I just believe the heuristic for speaker performance is much more predictive with mono testing than stereo testing. I'm not saying that stereo does not offer unique performance measures that should be investigated, but I do not believe it would contradict the conclusions of a mono review.