One straightforward method I would suggest for finding the optimal distance between loudspeakers, and also the optimal toe-in, is to use a transient such as this:
To elaborate, note that this kind of waveform contains a sine wave that sweeps down in pitch from about 17 kHz to about 44 Hz. However, if you look closely at the time shown at the mouse cursor, the signal drops from 17 kHz to about 90 Hz or so within 2-3 milliseconds. That is far too fast for us to perceive as anything but a "click", with a substantial weight of mid-bass.
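If you want to experiment with something similar, here is a rough Python sketch that synthesizes a comparable transient. It is not the waveform pictured above, only an approximation under my own assumptions: an exponential pitch glide with a time constant of about 0.42 ms (chosen so the pitch reaches roughly 90 Hz after 2.5 ms), an exponential amplitude decay of 0.15 s, a 48 kHz sample rate, and the same signal in both channels. The function names and parameters are made up for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 48000  # assumed sample rate, Hz


def sweep_frequency(t, f_start=17000.0, f_end=44.0, tau=0.00042):
    """Instantaneous frequency (Hz) of an assumed exponential glide at time t (s)."""
    return f_end + (f_start - f_end) * math.exp(-t / tau)


def make_transient(duration=0.5, f_start=17000.0, f_end=44.0,
                   tau=0.00042, decay=0.15, sample_rate=SAMPLE_RATE):
    """Return mono samples in [-1, 1] for a downward-sweeping sine with a soft decay."""
    samples = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate
        # Phase is 2*pi times the integral of sweep_frequency from 0 to t.
        phase = 2.0 * math.pi * (
            f_end * t + (f_start - f_end) * tau * (1.0 - math.exp(-t / tau))
        )
        envelope = math.exp(-t / decay)  # assumed amplitude decay
        samples.append(envelope * math.sin(phase))
    return samples


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write 16-bit stereo WAV with identical left and right channels."""
    frames = bytearray()
    for s in samples:
        value = int(max(-1.0, min(1.0, s)) * 32767 * 0.8)  # leave some headroom
        frames += struct.pack("<hh", value, value)
    with wave.open(target, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


if __name__ == "__main__":
    write_wav("transient_sketch.wav", make_transient())
```

Because both channels carry the same signal, the click and the kick should both image at the phantom center when played over two speakers. Adjust `tau` and `decay` by ear if the result sounds too sharp or too boomy.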
An optimal setup should render that with a very clear, pinpoint focus. The "click" should be very tiny, floating exactly in the middle of the phantom center, augmented by very focused and strong mid-bass. The rest of the waveform contains low frequencies, and on a capable enough system you should get a very distinct and focused "kick", followed by envelopment from the lowest frequencies and a soft decay. The low frequencies last much longer by comparison but should never mask the higher frequencies. Subjectively, it should all be perceived at once, but the high and mid frequencies should never be masked or lose focus. They should also be loud enough relative to the lows.
You can start with the speakers wider apart and no toe-in, then gradually decrease the distance and introduce a few degrees of toe-in. How much you need will depend on speaker directivity. The "sweet spot" is usually where the mid-bass is strongest at the same level of amplification.
It is hard to describe, but once you get it right, what you hear can be an "aha" moment. On headphones, the pinpoint click would obviously be right in the middle of your head. On a loudspeaker system, it should be even more distinct, right in the middle of the phantom center. To test the image stability, you can move out of the comfort zone of your MLP (main listening position) and see if you lose any focus.
To me, this method is an all-in-one way to show what your loudspeakers are doing and what your room is doing. It reveals the direct-to-reverberant ratio in a very distinct and simultaneous manner.
I find it funny that there's usually so much gear in between the speakers that they can't be set up any closer: form over function.