I'm also using KH120s for my rear side speakers in a surround setup now (as of about ten days ago), and I've noticed a big difference from having them matched with my front pair. Sounds are much more cohesive and seem to establish a much more continuous space around me with surround mixes, even within the limitations of my room -- and even off-center. The speakers are almost all equidistant from me, too (the KH80 center is a bit closer to make up for the delay). Granted, my previous surrounds were Genelec 8010s -- now height speakers in a 5.1.2 setup -- which have a very different tonality, but I can also hear the difference between the KH80 and KH120, so I wouldn't discount a speaker mismatch as a factor.
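To put rough numbers on the distance/delay trade-off mentioned above, here's a minimal sketch. It assumes a speed of sound of about 343 m/s (dry air, roughly 20 °C); the 1 ms figure is a made-up example, not a measured value for any of these speakers, and the function names are just for illustration.

```python
# Rough conversion between a time delay and the equivalent speaker distance.
# Assumption: speed of sound ~343 m/s (dry air at about 20 degrees C).
SPEED_OF_SOUND_M_PER_S = 343.0


def distance_offset_m(delay_ms: float) -> float:
    """Distance (in meters) that sound travels in delay_ms milliseconds."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0


def delay_offset_ms(distance_m: float) -> float:
    """Time (in milliseconds) that sound needs to travel distance_m meters."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


if __name__ == "__main__":
    example_delay_ms = 1.0  # hypothetical delay, for illustration only
    offset = distance_offset_m(example_delay_ms)
    print(f"{example_delay_ms} ms of delay ~ {offset:.3f} m of distance")
```

So under that assumption, each millisecond of extra delay corresponds to moving a speaker roughly a third of a meter closer.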
The difference between the KH80 and KH310 must be even greater. You have KH120(s) in your gear list. Have you tried up-mixing with them, instead? I would bet that they're a closer match to the 310s than the KH80 is. I'm a bit out of my depth with this stuff, though, and it's possible that your 310s + 80s in a treated room would beat my KH120s very handily in a comparison, if one were possible.
As a filmmaker, I'm trying to think through the analogy between the stereo "image" and a real image. I think for effective up-mixing, you would need to have a properly calibrated quad or 5.1(+) system first. The up-mixed stereo image would ultimately begin somewhere to the left of the L speaker and end somewhere to the right of the R speaker, which means the coherence (?) of the two fronts would be altered. So your KH80s would actually be doing some of the work of representing the original recorded stereo image, not just duplicating or synthesizing part of it. And I'm afraid the KH80s are going to be badly outmatched even by the KH120 in that department, let alone the KH310.
There's an additional question about whether it makes sense to do this kind of up-mixing at all, instead of just increasing the distance between the front speakers. Because in the end, what you're getting by widening the image through up-mixing is something more akin to the soundstage presented by a pair of open-back headphones without much crossfeed, and with relatively uncontrolled tuning. Someone can correct me if I'm wrong, but if you're "blowing up" the stereo image (by analogy with visual images), then you would need a highly coherent sound field to properly represent it. You would also need to think about the vertical dimension.
If you think about an in-room stereo recording, you have two microphones that encode sound information for two point-source speakers. Mess with that, and you're likely to hear various kinds of weird effects depending on the space. I'm guessing that's why Atmos versions of older live jazz are generally just presenting room effects, with the soundstage squarely in front and represented by the front speaker pair. My theory, as I've said elsewhere, is that such recorded reflections could mask the reflections in the listening room, because they arrive earlier and with more energy. But that is very different from wrapping (smearing?) a stereo image that was meant to be perceived frontally around you in a rectangular room.
Even perceptually, our focus can't comfortably shift in the same ways when an image is too big. Think about sitting too close to the screen in a theater. [EDIT: And now think about wrapping the image not across a 180° curved screen, but projecting a third of it onto each wall.] I think we're often fatefully front-oriented with our senses, including hearing. So that may have something to do with your disappointment with some recordings. Maybe it's inevitable, to some extent. I would probably just calibrate a surround system and seek out recordings in Atmos. Atmos rendered to quad on a Mac can sound pretty good in my experience. And do the up-mixing when it works, for stereo recordings that you just want to sound "big".
By the way, I'm not even sure how you measure soundstage. Has anyone found a way?