In my experience, mono is probably more revealing for evaluation, as your brain has less to focus on. Stereo starts to depend much more on the environment.
Speakers are all so bad that fancy automated behind-a-screen shuffling may be valid for voicing one speaker, but between different designs the differences are so gross that I do not believe it is needed. Accurate EQ is far more significant. Speakers are bad enough that I can get the "signature" just in a store, and when I find it in another, that memory holds. Electronics are far harder. Back in the old days, we had a remote-controlled relay board to swap out components in a crossover for "voice by group and wine" sessions. Back-end test equipment was out of reach of all but major corporations. I hope some day speakers will catch up, but materials science is not encouraging. The engineering going into large-membrane, multiple-driver designs is advancing by leaps and bounds due to the OLED market.
I never see much about which recordings are used for evaluations, or why. I have a select set of cuts that I know emphasize defects in the speakers and amps (and now DACs). I have other music where, on just about everything, you just chill and listen, ignoring the differences. My most revealing cuts tend to have very simple orchestration: single female voice, single classical guitar, single piano, dominant trumpet solo, etc. It is easier to get lost in a full symphony or good old rock and roll.