Low Q / High Q: not responsible for "boxy" sound. This is because, as @voodooless points out, at low frequencies the Q is dominated by the room, and at high frequencies there is minimal box interaction.
That might be true for most conventional speaker concepts, which combine a midwoofer with a larger enclosure volume. In this case, any resonance or decay issues can be more or less traced back to the air in the enclosure (such as a standing wave resonance) or to the cabinet.
There are exceptions to this rule, though, particularly with compact midrange drivers in a restricted volume as part of a 3-way concept. I have taken part in an experiment comparing different enclosure sizes for the midrange drivers, under anechoic conditions and corrected for on-axis frequency response. The difference is astonishingly large, but the high or low Q contributes to a more or less warm/full-bodied sound character, not necessarily a 'boxy' one.
Internal cabinet reflections: As @OCA points out, these are easily measurable. Although I should point out that these very early internal reflections are well within the Haas fusion zone and should be masked by the main signal. Anybody disagree with this?
It depends on how these internal reflections and resonance effects can leave the enclosure. If the reflex port is the main pathway, they might be easily measurable. If it is the cabinet walls or the diaphragms, not so much. We are talking about wavelengths that make it difficult to time-window the measurement or to separate the contributions of different sources, as crosstalk is an issue even with microphones positioned in the nearfield. Imagine a tower speaker measuring 1 m in internal height: the main vertical standing wave is expected at around 170 Hz, which is pretty difficult to measure if you do not have a waterfall plot with a sufficiently long time window.
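To show where the 170 Hz figure comes from, here is a minimal sketch, assuming the cabinet behaves like a simple air column between two rigid parallel walls and a speed of sound of 343 m/s. The function names are made up for illustration, and the "resolution is roughly 1 / window length" relation is a rule of thumb, not an exact measurement limit.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def axial_modes(length_m, count=3, c=SPEED_OF_SOUND):
    """First `count` axial standing-wave frequencies (Hz) between two rigid walls.

    Uses f_n = n * c / (2 * L), the idealised closed-closed air column model.
    """
    return [n * c / (2.0 * length_m) for n in range(1, count + 1)]


def window_resolution_hz(window_s):
    """Rough frequency resolution (Hz) of a time window: about 1 / window length."""
    return 1.0 / window_s


if __name__ == "__main__":
    # 1 m internal height, as in the tower speaker example
    print([round(f, 1) for f in axial_modes(1.0)])  # [171.5, 343.0, 514.5]
    # A 5 ms gate resolves only about 200 Hz, too coarse for a 170 Hz mode
    print(round(window_resolution_hz(0.005)))  # 200
```

Under these assumptions, a gate short enough to exclude room reflections is too coarse to resolve the first vertical mode of a 1 m cabinet.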
In a conventional monopole loudspeaker, to maintain a flat on-axis response, the sound power has to be tilted downwards, meaning that there is more bass energy in the room.
It is not only bass, which could be corrected by EQ, as our brain cannot really distinguish direct from reflected bass. It is mainly the lower midrange, roughly 300-800 Hz. If, due to an omnidirectional radiation pattern, this band dominates over higher frequency bands in the room, you cannot simply EQ it down, because our brain would notice the lack of SPL in the direct sound. That is why broad-baffle speakers, such as those with large midwoofers or similar concepts, sound 'thinner' in the lower midrange and much less 'boxy': their baffle step is around 300 Hz or even lower.
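For a rough idea of what baffle width goes with a 300 Hz baffle step, here is a small sketch using the common rule of thumb f ≈ 115 / W (W in metres). That approximation is my assumption, not something measured here; real baffles deviate depending on shape, edge treatment and driver position, and the function names are only illustrative.

```python
BAFFLE_STEP_CONSTANT = 115.0  # Hz * m, widely used rule-of-thumb constant (assumption)


def baffle_step_hz(width_m):
    """Approximate baffle step centre frequency (Hz) for a baffle of given width."""
    return BAFFLE_STEP_CONSTANT / width_m


def baffle_width_m(step_hz):
    """Approximate baffle width (m) that puts the baffle step at `step_hz`."""
    return BAFFLE_STEP_CONSTANT / step_hz


if __name__ == "__main__":
    # Slim tower with a 20 cm baffle versus a broad 40 cm baffle
    for width in (0.20, 0.40):
        print(f"{width:.2f} m baffle -> step near {baffle_step_hz(width):.0f} Hz")
    # Width needed for a 300 Hz baffle step
    print(f"300 Hz step -> about {baffle_width_m(300.0):.2f} m wide")
```

By this estimate, a baffle step at or below 300 Hz needs a baffle close to 40 cm wide or more, while a typical slim tower makes the transition towards omnidirectional radiation well inside the lower midrange.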
A dipole can have a flat on-axis response AND a flatter sound power curve.
Omni and dipole are the easiest ways (from a mathematical point of view) to achieve both, as the late S. Linkwitz correctly pointed out. I would argue that concepts like (hybrid) cardioids, line sources, curved planars or very sophisticated multi-way horns/waveguides could achieve the same objective with almost no downsides.