There is nothing wrong with the design of the Sierra-1V2. You should not be using a CSD waterfall plot to evaluate frequency response. CSD plots are for examining resonances. All of the many different frequency response measurements should be used to evaluate frequency response.
You see a dip in the CSD fundamental because the NFS is not capable of producing anechoic CSD plots unless the user is also using an additional NFS module known as ISC (In Situ Room Compensation), and even then it is not truly anechoic. This module did not exist when Amir first started measuring speakers, and it is also an expensive add-on. We own it but don't use it very often, so I am not sure it would be worth the expense for Amir.
As such, in order to minimize room effects in the CSD measurement, Amir takes this measurement at a mic distance of only 1/3 meter (about 1 foot), with the mic on the reference axis (the tweeter). At only 1 foot away, there isn't enough distance for the woofer and tweeter to properly blend, so you see that dip at the crossover frequency. With the Sierra-1V2, you need ~18 inches of distance for the tweeter and woofer to properly blend.
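If it helps to see the geometry, here is a rough sketch of why mic distance matters. With the mic on the tweeter axis, the woofer is slightly farther away and is heard off its own axis, and both effects shrink as the mic moves back. The driver spacing and crossover frequency below are placeholder assumptions for illustration, not Sierra-1V2 specifications, and this simple model ignores the crossover's own phase behavior, so it will not reproduce the actual dip.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature

# ASSUMED placeholder values for illustration only.
# These are NOT Sierra-1V2 specifications.
DRIVER_SPACING_M = 0.12   # hypothetical tweeter-to-woofer center spacing
CROSSOVER_HZ = 2000.0     # hypothetical crossover frequency


def path_difference_m(mic_distance_m, driver_spacing_m):
    """Extra distance from the woofer to a mic placed on the tweeter axis."""
    return math.hypot(mic_distance_m, driver_spacing_m) - mic_distance_m


def phase_offset_deg(mic_distance_m, driver_spacing_m, freq_hz):
    """Phase lag of the woofer relative to the tweeter from path length alone."""
    delay_s = path_difference_m(mic_distance_m, driver_spacing_m) / SPEED_OF_SOUND_M_S
    return 360.0 * freq_hz * delay_s


def woofer_angle_deg(mic_distance_m, driver_spacing_m):
    """Angle at which the mic sits off the woofer's own axis."""
    return math.degrees(math.atan2(driver_spacing_m, mic_distance_m))


if __name__ == "__main__":
    for label, distance_m in (("1/3 m", 1.0 / 3.0), ("18 in", 18 * 0.0254)):
        diff_mm = 1000.0 * path_difference_m(distance_m, DRIVER_SPACING_M)
        phase = phase_offset_deg(distance_m, DRIVER_SPACING_M, CROSSOVER_HZ)
        angle = woofer_angle_deg(distance_m, DRIVER_SPACING_M)
        print(f"{label}: path diff {diff_mm:.1f} mm, "
              f"phase offset {phase:.1f} deg, woofer off-axis {angle:.1f} deg")
```

Whatever spacing you plug in, the path difference and the woofer's off-axis angle both get smaller as the mic distance grows, which is the general reason a very close mic position can show a crossover dip that is not there at a normal measurement distance.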
Again, you should not be trying to evaluate frequency response from a CSD measurement's fundamental.
Hope this explains it for you!