There is a crossover between sub(s) and mains. Unless you get the sub(s) phase to match mains phase around the XO frequency they will never sum correctly.
So we need to back up and define what "sums correctly" would look like in the context of human hearing perception at low frequencies, and the answer is highly counter-intuitive. Briefly, "sums correctly" in this context looks like "smooth frequency response."
The in-room sound field at low frequencies is complex, with loudspeaker outputs and reflections and reflections of reflections combining in and out of phase, and mostly in between, in ways which change with location throughout the room. Fortunately there is a nice neat way to get an excellent grasp of the net effect: Look at the in-room steady-state frequency response.
Floyd Toole indicates that speakers + room = a "minimum phase system" at low frequencies, which in turn implies that what we perceive is predicted by the in-room frequency response. From page 200 of the first edition of his book:
"At subwoofer frequencies the behavior of room resonances is essentially minimum phase (e.g., Craven and Gerzon, 1992; Genereux, 1992; Rubak and Johansen, 2000), especially for those with amplitude rising above the average spectrum level. This suggests that what we hear can substantially be predicted by steady-state frequency-response measurements if the measurements have adequate frequency resolution to reveal the true nature of the resonances." [emphasis Duke's]
Let's look at the bass region as a whole first, and then come back to the crossover region. The two are of course related, but understanding the former facilitates understanding the latter.
Recall that the ear has poor time-domain resolution at low frequencies, but heightened sensitivity in the SPL domain. (The latter is indicated by the way equal loudness curves bunch up south of 100 Hz. A 5 dB change in SPL at 40 Hz is perceptually comparable to a 10 dB change in SPL at 1 kHz. Arguably, this is huge.)
So what matters most to the ears is what's happening in the SPL domain at low frequencies, which in turn is dominated by room interaction. Thus when we have optimized the room interaction for smooth frequency response, we have solved the problem that matters most to the ears.
The phase of the individual subwoofers can play a role, but it's probably not the role you are thinking of. Let me introduce a term that describes a highly desirable but highly counter-intuitive property of an in-room low-frequency sound field:
Decorrelation.
What we want is for the in-room bass energy to be decorrelated; that is, for it to NOT be all neatly in-phase. When the in-room bass energy is highly correlated, we have huge modal peaks and dips. When it is highly decorrelated, we have smooth bass. Room size plays a role: The larger the room, the more room-interaction peaks and dips we have (i.e. the greater the decorrelation), and the closer together they are and therefore the more perceptually benign they are. This is why large rooms usually have more natural-sounding bass than small rooms.
The ear will sum peaks and dips which are close enough together, rather than hearing them separately. This is what happens at shorter wavelengths (higher frequencies) in normal home audio listening rooms: A measurement of the in-room response includes what looks like "grass" because of all the reflection-induced peaks and dips, but we don't hear the individual blades of grass. Instead we hear a continuum (with some weighting due to arrival times, which is beyond the scope here). Our home audio rooms are far too small to achieve "grass" at low frequencies, but a distributed multi-sub system can be a worthwhile step in that direction because the summation of multiple dissimilar peak-and-dip patterns results in more peaks and dips, as well as smaller peaks and dips. In other words, a distributed multi-sub system is a way to get a small room to behave sort of like a larger room in the bass region.
And the mechanism by which a distributed multi-sub system does this in a small room is decorrelation. It accomplishes decorrelation by spreading the bass sources around in the physical domain (locations). This spacing facilitates the outputs of the subs summing in semi-random phase throughout the room.
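For anyone who wants to poke at the idea numerically, here is a rough toy sketch. It is not a room simulation and not anyone's actual setup procedure. It assumes a one-dimensional, rigid-walled room in which the axial mode shapes are cos(n*pi*x), with x being the position along the room length as a fraction from 0 to 1. The function name and the four sub positions are made up for illustration. The sketch compares how strongly each of the first few modes is driven by one corner sub versus four equal-level subs spread along the room.

```python
import math

def mode_coupling(positions, n):
    """Net drive of 1-D axial mode n, averaged over equal-level subs.

    positions: fractional locations along the room length (0.0 to 1.0).
    Assumes idealized rigid-wall mode shapes cos(n * pi * x).
    """
    return sum(math.cos(n * math.pi * x) for x in positions) / len(positions)

single_corner = [0.0]             # one sub against the end wall
spread = [0.0, 0.3, 0.55, 0.85]   # hypothetical, deliberately asymmetric placements

for n in range(1, 5):
    one = mode_coupling(single_corner, n)
    many = mode_coupling(spread, n)
    print(f"mode {n}: one corner sub {one:+.2f}, four spread subs {many:+.2f}")
```

In this toy model the corner sub drives every mode at full strength, while the spread-out subs drive each mode with differing signs and strengths, so the net excitation of any one mode is much smaller. Real rooms are three-dimensional and lossy, so treat this as an illustration of the trend only.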
We can further increase decorrelation of the in-room bass energy by dialing in phase differences between the different subwoofers, which in turn can result in further smoothing of the in-room response.
At frequencies below the room's modal region (in the "pressure zone") the bass wavelengths are long enough for the outputs of the distributed subs to sum essentially in-phase, which typically results in a rise in the in-room response, which in turn can sound boomy. By introducing a significant phase variation between the different subs, we can extend semi-random phase summation to below the modal region as well, which can prevent that undesirable boominess.
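As a back-of-envelope illustration of that last point, the sketch below sums equal-level subs as unit phasors. It assumes the pressure-zone idealization in which path-length differences are negligible, so only the dialed-in phase offsets matter. The function name and the particular phase values are hypothetical, chosen only to show the arithmetic.

```python
import cmath
import math

def summed_gain_db(phases_deg):
    """Level of the combined output relative to ONE sub, in dB.

    Each sub is modeled as a unit-amplitude phasor at the given phase.
    Returns -inf if the phasors cancel completely.
    """
    total = sum(cmath.exp(1j * math.radians(p)) for p in phases_deg)
    magnitude = abs(total)
    if magnitude == 0.0:
        return float("-inf")
    return 20 * math.log10(magnitude)

in_phase = [0, 0, 0, 0]         # four subs summing coherently
staggered = [0, 60, 120, 180]   # hypothetical phase offsets between subs

print(f"in phase:  {summed_gain_db(in_phase):.2f} dB re one sub")
print(f"staggered: {summed_gain_db(staggered):.2f} dB re one sub")
print(f"power sum: {10 * math.log10(len(in_phase)):.2f} dB re one sub")
```

Four subs summing fully in phase come out about 12 dB above a single sub, whereas a random-phase (power) sum of four averages about 6 dB. Staggering the phases pulls the very-low-frequency sum down from the fully coherent figure, which is the mechanism described above for taming the rise below the modal region.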
Hopefully by now we have shown that the in-room frequency response is what matters most to the ears, and that decorrelation is a key to achieving smooth in-room frequency response (and I have nothing against also using EQ).
Now let's turn to the crossover between subs and mains. Typically that's at 80 Hz or less, so it's well within the region where what matters most is the in-room frequency response. I have nothing against getting a nice neat time-and-phase coherent transition in the crossover region, UNLESS doing so compromises what really matters: The frequency response.
On the other hand if we prioritize getting the in-room frequency response right, including in the crossover region between subs and mains, then we are solving the problem that matters:
"At subwoofer frequencies... what we hear can substantially be predicted by steady-state frequency-response measurements"
Phase aligning between drivers (in this case between sub woofer and main woofer) is simply a procedure that needs to be done around any XO point you create.
As you are a speaker designer, I'm sure we can agree on that.
If you had said, "Identifying and prioritizing the issues of greatest perceptual consequence (in this case between sub woofer and main woofer) is simply a procedure that needs to be done around any XO point you create", THEN I could agree with you.