<Please bear in mind that I am not a speaker designer, have not built my own in years, and those grad acoustics classes were decades ago.>
Another simple question (albeit a very good one!) with a complex answer. The basic idea is for all frequencies to arrive at the listener at the same time, i.e. in alignment, assuming they all started together at the source. Whatever crossover order is used, the crossover can be tuned and phase/delays adjusted electrically and/or mechanically to provide correct time alignment. A first-order crossover does not by itself ensure perfect time alignment: its high-pass and low-pass outputs are 90 degrees apart at all frequencies, and the drivers add phase shift of their own. With a second-order crossover the two outputs are 180 degrees apart (sometimes you will see drivers wired "out of phase" in the box and, assuming nothing is wrong, that is one reason why). The drivers and box design (sealed vs. vented, choice of size, etc.) also affect time alignment. Some designs stagger the physical positions of the drivers to improve time alignment, but that is also affected by the crossover, since the crossover (as well as the drivers and box) influences phase. There is only so much you can do with driver positioning, since a fixed physical offset is a fixed time delay and the phase shift it produces varies with frequency. Broadband correction these days usually means DSP, since filters and delays can be readily designed in the digital domain to correct crossover and mechanical influences.
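To put rough numbers on the driver-positioning point, here is a minimal Python sketch. It is only an illustration under simple assumptions: sound travels at about 343 m/s, the listener is on axis, and the crossover and drivers contribute no phase of their own. The function names and the 34.3 mm offset are made up for the example and do not come from any particular speaker.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value, roughly dry air at 20 degrees C


def offset_delay_s(offset_m):
    """Extra arrival time (seconds) for a driver set back by offset_m metres."""
    return offset_m / SPEED_OF_SOUND_M_S


def phase_shift_deg(offset_m, freq_hz):
    """Phase shift (degrees) that the fixed delay produces at freq_hz."""
    return 360.0 * freq_hz * offset_delay_s(offset_m)


if __name__ == "__main__":
    offset = 0.0343  # hypothetical 34.3 mm front-to-back offset between drivers
    print(f"delay: {offset_delay_s(offset) * 1e6:.1f} microseconds")
    for freq in (500, 1000, 2500, 5000):
        print(f"{freq:>5} Hz: {phase_shift_deg(offset, freq):6.1f} degrees")
```

With these made-up numbers the offset is about 100 microseconds, which works out to 36 degrees at 1 kHz but 180 degrees at 5 kHz. The same physical offset therefore cannot cancel a phase error that follows a different curve over frequency, which is where DSP filters and delays come in.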
How much time alignment matters, i.e. how audible it is, is debatable. I have leaned towards speakers with good time alignment, but they also have other attributes that I prefer, so I really could not say whether my preference, or how much of it, is due to time alignment.