What is stranger to the eye than to the ear is that even when the phase alignment of the harmonics is lost, the steady-state sound is the same.
This is because the waveform is still strictly symmetric: with odd harmonics only, the second half of every cycle is an inverted copy of the first half, whatever the phases of the harmonics. I ran tests with a fundamental plus a 2nd or a 3rd harmonic, varying the start-phase offset in 22.5° increments. Stepping through the full 360° range, the 3rd harmonic showed negligible change in timbre versus start phase (the position of the harmonic relative to the fundamental), whereas the set with the 2nd harmonic showed a slightly wandering timbre.
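A quick way to check the symmetry argument is to test whether x(t + T/2) = -x(t) holds. The sketch below assumes a unit fundamental with period 1 plus a single harmonic at a hypothetical amplitude of 0.5; the function names are illustrative only. With a 3rd harmonic the relation holds at every one of the 16 start-phase offsets, whereas with a 2nd harmonic it fails at all of them.

```python
import math

def wave(t, harmonic, phase_deg, amp=0.5):
    """Fundamental (period 1) plus one harmonic with a start-phase offset in degrees."""
    phi = math.radians(phase_deg)
    return math.sin(2 * math.pi * t) + amp * math.sin(2 * math.pi * harmonic * t + phi)

def is_half_wave_symmetric(harmonic, phase_deg, n=256, tol=1e-9):
    """True if x(t + T/2) == -x(t) at every sampled t (T = 1)."""
    for i in range(n):
        t = i / n
        if abs(wave(t + 0.5, harmonic, phase_deg) + wave(t, harmonic, phase_deg)) > tol:
            return False
    return True

offsets = [k * 22.5 for k in range(16)]  # 0°, 22.5°, ..., 337.5°
odd_ok = all(is_half_wave_symmetric(3, p) for p in offsets)
even_ok = any(is_half_wave_symmetric(2, p) for p in offsets)
print(odd_ok, even_ok)  # True False
```

Because polarity inversion of a half-wave-symmetric signal is just a half-period time shift, the steady-state spectrum and waveshape class do not change with phase in the odd-harmonic case.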
There are two offsets with maximum contrast in timbre, and they are 180° apart; there is a second pair where the difference is minimal, also 180° apart but rotated by 90° from the first. The base offset depends on your playback system: when the phase is flat (zero) across the frequencies involved, this offset is zero. Inspection shows that for the maximum pair the waveshape is maximally asymmetric, while for the minimum pair it is symmetric in the sense that flipping the wave (inverting its polarity) is identical to reversing it in time.
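The two pairs of offsets can be reproduced numerically for the 2nd-harmonic case. This sketch assumes zero-phase playback (so the base offset is 0°) and a harmonic amplitude of 0.5, and it uses the sum of the positive and negative peaks as one simple asymmetry measure; that measure and all names are assumptions for illustration, not taken from the listening tests. It reports zero peak asymmetry and flip == reverse at 0° and 180°, and the largest asymmetry at 90° and 270°.

```python
import math

N = 1024  # samples per period of the fundamental

def wave(n, phase_deg, amp=0.5):
    """Fundamental plus a 2nd harmonic with a start-phase offset, at sample index n."""
    t = n / N
    return math.sin(2 * math.pi * t) + amp * math.sin(4 * math.pi * t + math.radians(phase_deg))

def peak_asymmetry(phase_deg):
    """Positive peak plus negative peak: zero when both peaks have equal magnitude."""
    xs = [wave(n, phase_deg) for n in range(N)]
    return max(xs) + min(xs)

def flip_equals_reverse(phase_deg, tol=1e-9):
    """True if -x(t) == x(-t), i.e. polarity inversion equals time reversal."""
    return all(abs(-wave(n, phase_deg) - wave(-n, phase_deg)) < tol for n in range(N))

for k in range(16):
    p = k * 22.5
    print(f"{p:6.1f}°  asym={peak_asymmetry(p):+.3f}  flip==reverse: {flip_equals_reverse(p)}")
```

Shifting the harmonic by 180° yields the polarity-inverted waveform, which is why the "maximum contrast" offsets come in pairs 180° apart.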
With a linear-phase highpass (say, a closed box combined with a phase-inverted, i.e. time-reversed, 2nd-order highpass filter to form a 4th-order linear-phase alignment) and the fundamental close to or below the tuning frequency, the waveshape remains almost fully intact; the fundamental just has reduced amplitude (to the point of being cancelled).
In the waveform, this shows up as the flat tops of the square wave turning into a concave, sine-like shape as the fundamental is reduced, as shown by @UliBru in https://www.audiosciencereview.com/...ds/time-domain-measurements.12951/post-455827
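The effect of reducing only the fundamental can be sketched with a truncated Fourier series of a square wave. This assumes an idealised linear-phase highpass that scales the fundamental by a gain and leaves every other harmonic untouched (a real alignment would also attenuate the 3rd harmonic somewhat); the function name is hypothetical. With the gain below 1, the centre of the positive half-cycle ends up lower than the point halfway to the edge, which is the concave top described above.

```python
import math

def square_partial(t, fund_gain=1.0, n_max=999):
    """Fourier series of a unit square wave (period 1) with the fundamental scaled.

    Only the fundamental's amplitude changes; its phase is untouched, which is
    the idealised linear-phase case assumed here.
    """
    total = 0.0
    for n in range(1, n_max + 1, 2):  # a square wave has odd harmonics only
        gain = fund_gain if n == 1 else 1.0
        total += gain * (4 / math.pi) * math.sin(2 * math.pi * n * t) / n
    return total

for g in (1.0, 0.5, 0.0):
    centre = square_partial(0.25, g)  # middle of the positive half-cycle
    side = square_partial(0.125, g)   # halfway between the edge and the middle
    print(f"fundamental gain={g:.1f}  centre={centre:+.3f}  side={side:+.3f}")
```

With the fundamental fully removed, the centre of the "top" even dips below zero while the region near the edges stays positive.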
And when the content has asymmetric waveforms, a linear-phase highpass target for the acoustic transfer function fully preserves the time relationships between the components of the waveform.
The drawback is pre-ringing with an exponential rise, the mirror image of the post-ringing in the step response, which is certainly audible with some signals.
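The linear-phase construction from the alignment mentioned earlier, a 2nd-order highpass cascaded with its own time-reversed copy, can be sketched numerically. The sketch models the closed box as an RBJ Audio EQ Cookbook highpass biquad; the sample rate, corner frequency, Q of 0.707 and 512-sample truncation are illustrative assumptions. Convolving the impulse response with its reverse gives a response that is exactly symmetric about its centre, so the energy ahead of the peak (pre-ringing) equals the energy behind it (post-ringing), and the magnitude at the corner is squared (0.707 becomes 0.5, i.e. -6 dB), as expected when two 2nd-order sections form a 4th-order alignment.

```python
import cmath
import math

FS = 8000.0           # sample rate (illustrative assumption)
F0 = 100.0            # highpass corner, standing in for the box tuning (assumption)
Q = 1 / math.sqrt(2)  # assumed closed-box Qtc of 0.707
N = 512               # impulse-response length; the biquad tail is negligible by then

def highpass_coeffs(f0, fs, q):
    """2nd-order highpass biquad (RBJ Audio EQ Cookbook), normalised so a0 = 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cw = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cw) / 2 / a0, -(1 + cw) / a0, (1 + cw) / 2 / a0]
    a = [1.0, -2 * cw / a0, (1 - alpha) / a0]
    return b, a

def impulse_response(b, a, n):
    """Run a unit impulse through the biquad (direct form I)."""
    h, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for i in range(n):
        x0 = 1.0 if i == 0 else 0.0
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        h.append(y0)
        x2, x1, y2, y1 = x1, x0, y1, y0
    return h

def convolve(x, y):
    """Plain full linear convolution."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def magnitude_at(h, f, fs):
    """|DTFT| of an FIR sequence at frequency f."""
    w = 2 * math.pi * f / fs
    return abs(sum(v * cmath.exp(-1j * w * k) for k, v in enumerate(h)))

b, a = highpass_coeffs(F0, FS, Q)
h = impulse_response(b, a, N)  # minimum-phase: post-ringing only
c = convolve(h, h[::-1])       # cascade with the time-reversed copy
centre = N - 1                 # c is symmetric about this sample

pre = sum(v * v for v in c[:centre])       # energy before the peak (pre-ringing)
post = sum(v * v for v in c[centre + 1:])  # energy after the peak (post-ringing)
print(f"pre-ringing energy {pre:.6f}, post-ringing energy {post:.6f}")
print(f"|H(F0)| = {magnitude_at(h, F0, FS):.4f}, |C(F0)| = {magnitude_at(c, F0, FS):.4f}")
```

In practice the time-reversed section would be realised as an FIR filter with a bulk delay; the quadratic-time convolution here only keeps the sketch dependency-free.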
In a crossover, things are different. On-axis, we have a flat, zero-phase sum and the ringing cancels out; off-axis, it no longer cancels completely. The same holds for a minimum-phase crossover, except that its off-axis ringing is post-ringing only. The point is that we are more prone to perceive pre-ringing than post-ringing because masking differs (forward masking after a sound lasts much longer than backward masking before it). IMHO, linear-phase crossovers work best with coaxials, symmetrical W-M-T-M-W placement, or beamy transducers/horns.
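The on-axis cancellation and the off-axis pre-ringing of a linear-phase crossover can be illustrated with a complementary FIR pair: a windowed-sinc lowpass, plus a highpass equal to a pure delay minus that lowpass. That construction is one common way to get a linear-phase crossover, not necessarily the one meant here; the 101 taps, the crossover at 0.05 × fs, the Hann window and the 3-sample path difference used to mimic an off-axis position are all assumptions. On-axis the two branches sum to a delayed impulse; off-axis, residual ringing appears in samples well before either driver's arrival.

```python
import math

TAPS = 101         # odd length so the filter has an integer centre delay
CENTRE = TAPS // 2
FC = 0.05          # crossover frequency as a fraction of the sample rate (assumption)
OFF_AXIS_DELAY = 3 # path-length difference in samples, modelled as a pure delay (assumption)

def linear_phase_lowpass(taps, fc):
    """Hann-windowed sinc lowpass; symmetric taps give linear phase."""
    centre = taps // 2
    h = []
    for n in range(taps):
        m = n - centre
        sinc = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(sinc * window)
    return h

lp = linear_phase_lowpass(TAPS, FC)
# Complementary highpass: a pure delay minus the lowpass, so lp + hp is a delayed impulse.
hp = [(1.0 if n == CENTRE else 0.0) - v for n, v in enumerate(lp)]

def acoustic_sum(lp, hp, hp_delay):
    """Sum of both branches with the tweeter path delayed by hp_delay samples."""
    out = [0.0] * (len(lp) + hp_delay)
    for n, v in enumerate(lp):
        out[n] += v
    for n, v in enumerate(hp):
        out[n + hp_delay] += v
    return out

on_axis = acoustic_sum(lp, hp, 0)
off_axis = acoustic_sum(lp, hp, OFF_AXIS_DELAY)

on_dev = max(abs(v - (1.0 if n == CENTRE else 0.0)) for n, v in enumerate(on_axis))
peak = max(range(len(off_axis)), key=lambda n: abs(off_axis[n]))
pre_ring = max(abs(v) for v in off_axis[:CENTRE - 5])  # samples before either arrival
print(f"on-axis deviation from a delayed impulse: {on_dev:.2e}")
print(f"off-axis peak at sample {peak}, largest early sample {pre_ring:.4f}")
```

For comparison, a minimum-phase pair would leave nothing ahead of the first arrival, so its off-axis error would show up only as post-ringing.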