IMO there is a solution.
Here is some background for what I'm about to suggest: The ear determines the direction of a sound source by two mechanisms: arrival time and intensity. If a sound arrives at the same time from both speakers but is louder from one of them, the image location will be pulled towards the louder speaker. If the loudness is the same from both speakers but the sound arrives earlier from one of them, the image location will be pulled towards the first-arrival speaker.
There is a clever (though somewhat counter-intuitive) way to exploit this characteristic of human hearing to increase the width of the listening area within which you get a good soundstage.
The technique is called "time/intensity trading". You'd use a pair of speakers with a fairly narrow and well-behaved radiation pattern and toe them in aggressively, such that their axes criss-cross in front of the normal central "sweet spot". For a listener well off to one side of the centerline, the situation is this: The listener is very far off-axis of the near speaker, but on-axis, or nearly so, of the far speaker. So the output of the near speaker arrives FIRST, but the output of the far speaker is LOUDER. The net result is that the two localization mechanisms (arrival time and intensity) approximately cancel one another out, so we still have a pretty good spread of the instruments in between the speakers, with the center vocalist roughly in the center. The image locations are not as precise as they are from listening locations up and down the centerline, but the imaging is still enjoyable.
The SECRET to time/intensity trading being successful is this: The output from the near speaker must fall off SMOOTHLY and RAPIDLY as the listener moves off-axis. In my experience time/intensity trading does not work well with conventional speakers; their radiation patterns are too wide and usually have too much variation, such that timbre is degraded at locations well off-axis.
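To put rough numbers on the geometry described above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 3 m speaker spacing, the listener positions, the function names, and especially the directivity model, which is a made-up smooth curve that is 6 dB down at 45 degrees off-axis (loosely standing in for a 90 degree pattern). It is a toy model, not a simulation of any real speaker, and it says nothing about how the ear actually weighs time against level.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def off_axis_db(theta_deg, half_width_deg=45.0):
    """Hypothetical smooth directivity: 0 dB on-axis, -6 dB at the half-width."""
    return -6.0 * (theta_deg / half_width_deg) ** 2


def speaker_at_listener(speaker_xy, axis_deg, listener_xy):
    """Return (arrival time in ms, relative level in dB) for one speaker.

    axis_deg is the aiming direction measured from straight ahead (+y),
    positive when the speaker is turned towards +x.
    """
    dx = listener_xy[0] - speaker_xy[0]
    dy = listener_xy[1] - speaker_xy[1]
    dist = math.hypot(dx, dy)
    bearing_deg = math.degrees(math.atan2(dx, dy))
    theta_deg = abs(bearing_deg - axis_deg)  # how far off-axis the listener is
    # Inverse-distance loss plus the assumed off-axis rolloff.
    level_db = -20.0 * math.log10(dist) + off_axis_db(theta_deg)
    return dist / SPEED_OF_SOUND * 1000.0, level_db


def compare(listener_xy, toe_in_deg, spacing_m=3.0):
    """Return (right minus left arrival time in ms, right minus left level in dB)."""
    left = (-spacing_m / 2.0, 0.0)
    right = (spacing_m / 2.0, 0.0)
    t_left, lv_left = speaker_at_listener(left, +toe_in_deg, listener_xy)
    t_right, lv_right = speaker_at_listener(right, -toe_in_deg, listener_xy)
    return t_right - t_left, lv_right - lv_left


if __name__ == "__main__":
    listener = (-1.0, 3.0)  # 1 m left of the centerline, 3 m back
    for toe_in in (0.0, 45.0):
        dt_ms, dl_db = compare(listener, toe_in)
        print(f"toe-in {toe_in:4.1f} deg: right arrives {dt_ms:+.2f} ms later, "
              f"right is {dl_db:+.2f} dB relative to left")
```

In this toy model, a listener sitting left of center hears the left (near) speaker first in both cases. With no toe-in the near speaker is also the louder one, so both cues pull the image the same way. With 45 degrees of toe-in the far speaker comes out louder, which is the opposing pair of cues that time/intensity trading relies on.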
Credit to Earl Geddes for teaching me the technique. I've been using it for more than twenty years.
If you'd like a visual, here's one such set-up. Underneath the grilles are waveguide-style horns with a 90-degree pattern width in the horizontal plane, crossed over to 12" woofers where their radiation patterns match (about 1.5 kHz). From the location where this photo was taken, with eyes closed the dialogue seems to come from the screen (there is no center-channel speaker behind the screen). Note how we're approximately on-axis of the far speaker, but well off-axis of the near speaker:
View attachment 426890