As long as there is a reference to compare to, single-speaker evaluations have the highest resolution. Evaluating without a reference means that you are in the "circle of confusion" dance - you are relying on quite large errors and/or your "memory" of good speakers. When the errors get small, it is difficult to judge them without an A-B comparison against the reference.
That said, it should be a goal to start with a linear on-axis response with an even dispersion, and a dispersion width according to personal preference. Finally, mixing and mastering do not follow any strict rules to adjust for the centre phantom image timbre. Some here say they do (perhaps some do), but others I've spoken to do not follow any "anti-Shirley" EQ curve at all. They use EQ to taste. IMO, the "linear response" only refers to what's in the final audio file. Some say the ideal is to have the same signal at the eardrum as the mixing/mastering engineers had. I don't know how to control for that, so I would rather have a sound that pleases me (which is very close to a linear response).