I think the quality of any attempt to extract more channels from a stereo mix will depend heavily on how the mix was made in the first place. Some stereo mixes have a lot of out-of-phase content and are simply not 100% mono-compatible. I think we run the risk of making a mess if we try to extract a center channel from a mix that has a lot of "stereo tricks"/phase differences going on.
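To make the mono-compatibility worry concrete, here is a rough Python sketch, not anything from an actual upmixer. It assumes the simplest possible center extraction, a plain (L+R)/2 sum, and the helper names (mid_side, correlation, rms) are made up for illustration. It shows how fully out-of-phase content vanishes from a summed center and lands entirely in the difference signal.

```python
import math

def mid_side(left, right):
    """Split stereo into mid (L+R)/2 and side (L-R)/2. Assumed simple sum/difference."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def correlation(left, right):
    """Normalized inter-channel correlation: +1 fully in phase, -1 fully out of phase."""
    dot = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return dot / energy if energy else 0.0

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

# Hypothetical test signal: a sine in the left channel, polarity-flipped in the right.
n = 480
tone = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
inverted = [-v for v in tone]

mid, side = mid_side(tone, inverted)
print("correlation:", correlation(tone, inverted))
print("center (mid) level:", rms(mid))
print("side level:", rms(side))
```

A correlation close to -1 on a real track would be a hint that a summed center channel will lose material, which is the "making a mess" case described above.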
And what changes will be heard if we play a stereo mix that has no distinct phantom channel content, like a recording made with just a pair of microphones in a stereo configuration? Will a recording like that even behave like a multi-mono recording if we extract a center channel from it?
Well, if the sound system can be switched between 3 and 2 channels at the push of a button, depending on which setting sounds better with the content, there will be no problem deciding what works best on a song-by-song basis.
I've done a lot of listening with my array, and as far as I can tell it makes nothing worse. In cases where there is not much phantom center it still sounds great. The one downside is that music without a strong phantom center actually sounds perfectly fine on regular 2-speaker setups, and works better for more positions in the room. I had a friend over yesterday listening to my array and he was sitting in the sweet spot, so I sat off to the side. I hear a solid mono center from the side, but the stereo effects are diminished and spatial separation is all but lost.
With my setup the signals that might be lost to phase issues are restored so long as the speakers are kept close together, like I do. Keeping the speakers close is the key. Any spaced speaker arrangement, whether 2-speaker stereo or more, will have coherency problems whenever sounds are not pinned to a particular speaker. Phantom images between speakers are bad news. Any phantom images should be to the outside of the speakers, never between them. It's actually not quite that simple. More accurately, any phantom images should be achieved by crosstalk reduction, so if someone is using something like BAACH or ambiophonics then it's ok to have a phantom image between speakers, because the crosstalk reduction will eliminate the comb filtering and the crosstalk-induced imaging confusion.
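For a rough feel of why speaker spacing matters for comb filtering, here is a back-of-the-envelope Python sketch. It assumes a deliberately crude model: a centered listener, a far-field source, an ear spacing of 0.17 m, and no head shadowing, so the crosstalk path is longer than the direct path by about ear_spacing * sin(angle). The function names and constants are mine, picked for illustration. When a signal is summed with an equal-level copy delayed by t seconds, the notches fall at odd multiples of 1/(2t).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, room temperature (assumed)
EAR_SPACING = 0.17      # m, assumed typical value

def crosstalk_delay(half_angle_deg, ear_spacing=EAR_SPACING):
    """Extra travel time (s) to the far ear for a speaker at the given angle off center."""
    extra_path = ear_spacing * math.sin(math.radians(half_angle_deg))
    return extra_path / SPEED_OF_SOUND

def comb_notches(delay_s, count=3):
    """First few notch frequencies (Hz) when a signal sums with an equal copy delayed by delay_s."""
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

# Compare a conventional +/-30 degree stereo triangle with a closely spaced +/-10 degree pair.
for half_angle in (30, 10):
    delay = crosstalk_delay(half_angle)
    notches = [round(f) for f in comb_notches(delay)]
    print(f"+/-{half_angle} deg: delay {delay * 1e6:.0f} us, notches near {notches} Hz")
```

Under these assumptions the first notch for the ±30 degree pair sits at roughly 2 kHz, and narrowing the pair pushes it higher in frequency. Real heads and rooms are messier, so treat the numbers as orders of magnitude only.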
I consider my system to be a low-tech alternative to BAACH, not an alternative to up-mixing algorithms like Dolby, which are intended to be used with widely spaced speaker arrays. I've played with Dolby Pro Logic and it does a great job of isolating sounds panned center, left, and right and depositing them in the correct speaker. I was surprised that I didn't like Dolby as much as my own channel mixing matrix. I'm pretty sure the reason is that my arrangement achieves some crosstalk reduction while Dolby is a virtual approach. I would have thought that real multi-channel mixes would be the ultimate, but now I'm more sold on well-made 2-channel recordings listened to with crosstalk reduction. I think the old mantra that we only have two ears, so ultimately 2 streams of sound are all we need, is true, so long as only 1 stream reaches each ear, which means the streams don't cross into the wrong ears. With my system they do cross a little bit, but in a much less harmful way. The crosstalk is reduced, and what does reach the wrong ears is out of phase across the head, which means it does not contain reverse location information to compete with the correct signal reaching each ear. Instead it just adds spaciousness.
Using more channels is similar to what Microsoft was experimenting with for VR goggles. Depth effects are difficult to really get right because the goggles have to track your eye convergence and adjust the scene to make foreground and background items go in and out of focus appropriately. Because of this difficulty, Microsoft proposed having multiple transparent displays at various distances from the eye. This is similar to more channels in multi-channel audio. It works better than not doing it. But what about objects at distances that are between screens? They're going to be blurry because they have to be partially projected on two displays at once. This is the same as the problem with sounds that are between speakers. Ultimately it's a flawed approach unless you're going to make curated material that doesn't produce anything not placed precisely at a particular screen depth, or at a particular speaker location.
In a nutshell, I think the ultimate solution is crosstalk reduction for audio, not more channels - although more channels is better for providing lower quality imaging effects over a larger area of the room. For VR headsets it's eye tracking to calculate convergence, then adjusting the image and mechanically refocusing the lens to match. Not easy.
One last thought - it should be possible to combine something like BAACH with multi-channel. You could have 7 speakers arrayed across the front of the room, and something like BAACH could allow phantom images between any two adjacent speakers to be free of crosstalk anomalies. It would basically require 6 times the processing power of BAACH, one instance per adjacent pair, and might sound slightly better.