Specifically, some vocals in pop and rock recordings don’t sound very pleasing. Classical and jazz vocals, on the other hand, sound spectacular.
If that result is consistent over a vast number of tracks, there must be a reason for that.
I mean, what is the difference between pop/rock vocal recordings and classical/jazz? One might think that EQ or effects are more likely to be applied on the former, but that is not a consistent thing. One really consistent difference is the reverb on the recording. Pop and rock recordings tend to have artificial, decorrelated reverb, while acoustic recordings of classical and jazz capture the whole reverb of the concert hall, including the footprint of the room and the dispersion pattern of the voice in that room. The latter helps our brain to "construct a virtual image" of the whole scenario during stereo playback.
So I would look for the answer in the acoustic situation in the playback room and how it interacts with the speakers. My experience with electrostatic planar speakers in a treated/overdamped room is limited, but I encountered a similar scenario many years ago with some Quad ESLs. Like your MLs, these are dipoles with a very high directivity index, particularly in the upper mids and presence bands, which is unusual for home speakers (the KEFs you mentioned in particular are pretty broad-dispersion loudspeakers in that band).
So if the room suppresses early reflections from the sides as well as from behind the listener, and the speakers deliver a narrow beam, recordings without a natural reverb footprint tended to sound "like a laser beam flashlight", particularly vocals: overly thin in tonality, obtrusively direct, bright, pronounced, artificial. The reverb from the recording was there, but detached from the direct sound of the voice. One possible explanation is that our brain cannot bring the direct sound and the resulting indirect reflection pattern together.
The same happens when listening to dry rough mixes in a studio on constant-directivity monitors that are unforgiving regarding tonality (like the popular Kii Audio). Dry voices tend to sound overly bright, extremely narrow, very direct and "like mono birds on a wire", as a former professor of mine would put it. Maybe your ears are very sensitive to this scenario, so that you do not just perceive it as overly bright and direct, as I do, but as annoying.
The moment the recording contains a meaningful reverb pattern, as in classical music, this effect vanishes and everything falls back into place. That was the case with the Quads back then. In a livelier room with RT60 decreasing towards higher frequencies (hence "warm reverb"), this problem was basically non-existent.
Note: This is just a theory. To verify it, it might be a good idea to try an alternative scenario in which the room has a higher RT60 overall that still decreases towards higher frequencies.
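If you want a rough feel for what "higher and decreasing RT60" means in numbers before moving any absorbers, here is a minimal sketch using Sabine's formula (RT60 ≈ 0.161 · V / A, metric units). The room dimensions and absorption coefficients are made-up illustration values, not measurements of your room, and Sabine's formula gets less reliable in heavily damped rooms, so treat the result as a ballpark only.

```python
# Rough RT60 estimate per octave band using Sabine's formula.
# All room data below are hypothetical illustration values.

SABINE_CONSTANT = 0.161  # s/m, metric form of Sabine's formula


def rt60_sabine(volume_m3, absorption_area_m2):
    """Return the Sabine estimate of RT60 in seconds."""
    if absorption_area_m2 <= 0:
        raise ValueError("absorption area must be positive")
    return SABINE_CONSTANT * volume_m3 / absorption_area_m2


def rt60_by_band(volume_m3, surface_m2, alpha_by_band):
    """Map each band (Hz) to RT60, assuming one average absorption
    coefficient per band applied to the whole room surface."""
    return {
        band: rt60_sabine(volume_m3, surface_m2 * alpha)
        for band, alpha in alpha_by_band.items()
    }


def falls_with_frequency(rt60):
    """True if RT60 strictly decreases from the lowest to the highest band."""
    values = [rt60[band] for band in sorted(rt60)]
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Hypothetical 5 m x 4 m x 2.5 m room.
volume = 5 * 4 * 2.5                       # 50 m^3
surface = 2 * (5 * 4 + 5 * 2.5 + 4 * 2.5)  # 85 m^2

# Assumed average absorption coefficients per octave band (Hz).
lively = {250: 0.15, 500: 0.18, 1000: 0.20, 2000: 0.24, 4000: 0.28}
overdamped = {250: 0.50, 500: 0.50, 1000: 0.50, 2000: 0.50, 4000: 0.50}

rt60_lively = rt60_by_band(volume, surface, lively)
rt60_damped = rt60_by_band(volume, surface, overdamped)

for band in sorted(rt60_lively):
    print(f"{band:>5} Hz  lively {rt60_lively[band]:.2f} s"
          f"  overdamped {rt60_damped[band]:.2f} s")
```

The "lively" set is the kind of profile I mean by warm reverb: longer decay overall, getting shorter towards the treble. A real check would of course be a measurement at the listening position, not this estimate.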