I'm surprised you haven't looked at the underlying research. Preference studies are the same type of research done in all controlled studies in social science and psychology. Set up a situation carefully enough, and human behaviour falls into repeatable patterns. That repeatability makes the engineering simpler by defining the limits of perception and, accordingly, the design parameters.
This is a good example of the limits of perception. What you're capturing is not the space, but the pressure fluctuations at the points where the microphones are located, according to their polar sensitivity. You're then reproducing the pressure fluctuations without directional information through loudspeakers, which are coupled to the room they play in. And yet the end result still sounds good.
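To put rough numbers on that, here's a minimal sketch of an idealized first-order polar pattern. It's my own toy model, not any particular microphone: the pattern formula a + (1 - a)cos(theta) is the textbook one, while the names `polar_gain` and `mic_output` and the source levels are made up for illustration. It shows two different scenes producing the same microphone signal, which is why the direction can't be recovered from one capsule's output.

```python
import math

def polar_gain(pattern_a: float, angle_deg: float) -> float:
    """Idealized first-order pattern: a + (1 - a) * cos(theta).

    a = 1.0 is omni, a = 0.5 is cardioid, a = 0.0 is figure-8.
    """
    theta = math.radians(angle_deg)
    return pattern_a + (1.0 - pattern_a) * math.cos(theta)

def mic_output(pattern_a: float, sources) -> float:
    """Sum the pressure contributions at the capsule.

    sources: list of (pressure_amplitude, arrival_angle_deg) pairs.
    The result is a single scalar; arrival angles are not preserved.
    """
    return sum(p * polar_gain(pattern_a, ang) for p, ang in sources)

CARDIOID = 0.5

# Scene A: one source on axis at unit level.
front = mic_output(CARDIOID, [(1.0, 0.0)])
# Scene B: one source at 90 degrees, twice as loud.
side = mic_output(CARDIOID, [(2.0, 90.0)])

print(front, side)  # the two values agree to within floating-point error
```

Real capsules have frequency-dependent patterns, so treat this as the simplest possible case.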
Given the way that you're capturing or synthesizing the soundfield in the first place, using loudspeakers with restricted directivity or dead rooms doesn't simplify the situation. There are aspects of playback which you simply cannot control. You can, however, use their characteristics to your advantage.
This ambisonic setup, for example, isn't the definition of soundfield accuracy. It has two main advantages beyond the ones offered by other multichannel systems (no reliance on speaker location or discrete channel count, so you can input any format you like) and many disadvantages (small sweet spot, fuzzy or smeared imaging, timbral inaccuracy and spatial aliasing), even in anechoic circumstances.
In all other situations, reflections are integral to the playback system: they supplement the distribution of spatial energy through acoustic phantom sources, and they can be controlled through treatment.
The main caveat, or frustration, I guess, is that you really can't take full soundfield accuracy as your goal. None of the known system configs really allows for it. Even with that in mind, I'd really like to hear the WFS system at EMPAC one day.
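As a rough illustration of the "no reliance on speaker location or channel count" point for ambisonics, here's a small sketch of horizontal first-order encoding and a basic decode to two different speaker rings. The assumptions are mine: traditional B-format weighting (W scaled by 1/sqrt(2)), a simple projection decoder that is only reasonable for regular rings, and made-up function names `encode_bformat` and `decode_ring`. It isn't the decoder used in any particular setup.

```python
import math

def encode_bformat(sample: float, azimuth_deg: float):
    """Encode a mono sample into horizontal first-order B-format (W, X, Y)."""
    az = math.radians(azimuth_deg)
    w = sample / math.sqrt(2.0)   # omnidirectional component
    x = sample * math.cos(az)     # front-back figure-8 component
    y = sample * math.sin(az)     # left-right figure-8 component
    return w, x, y

def decode_ring(w: float, x: float, y: float, speaker_azimuths_deg):
    """Basic projection decode to a regular ring of speakers.

    Assumes evenly spaced speakers; irregular layouts need a more
    careful decoder design than this.
    """
    n = len(speaker_azimuths_deg)
    feeds = []
    for phi_deg in speaker_azimuths_deg:
        phi = math.radians(phi_deg)
        directional = x * math.cos(phi) + y * math.sin(phi)
        feeds.append((math.sqrt(2.0) * w + 2.0 * directional) / n)
    return feeds

# The same encoded signal feeds either layout; only the decode changes.
w, x, y = encode_bformat(1.0, 45.0)
square = decode_ring(w, x, y, [45, 135, 225, 315])
hexagon = decode_ring(w, x, y, [0, 60, 120, 180, 240, 300])

print(square)
print(hexagon)
```

Note that in the square case the speaker opposite the source gets a negative feed, and every speaker carries some signal. That wide spread across the ring is one way to see where the small sweet spot and smeared imaging come from.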