Good on you for trying. I wanted to do this all week but things kept getting in the way. I have been thinking about it, though, and was going to start a thread to report my results. Looks like you beat me to it.
First things first - I realised that there is probably a lower frequency limit below which this method will not work. This limit is determined by the dimensions of your room and the position of the DUT and microphone, which together decide whether you can capture the entire envelope of the wavelet before the reflection arrives and contaminates it. I am ignoring floor, ceiling, and side wall bounce because those reflections are in phase and in the same direction as the direct sound from the subwoofer (in my room anyway!). I am only concerned about the rear wall. Although the reflection itself is in phase, the extra distance it has to travel might put it out of phase by the time it arrives at the observation point.
So I did a little calculation. In my case, x = 3m, and y = 9m. Using the speed of sound (c = 343m/s) and the equation (t = 1000 * d / c, with t in ms), the travel time for x is 8.75ms and for y is 26.2ms. This means I have (26.2ms - 8.75ms) = about 17.5ms of reflection-free window.
However, the wavelet is 6.5 cycles long, and the duration of the wavelet needs to be accounted for when we calculate the reflection-free time. Since my XO frequency is 50Hz, I need (number of cycles * 1000/frequency) = 130ms for the wavelet to fully emerge from the DUT. This is much longer than my reflection-free window, which means that I should see the rear reflection contaminate the wavelet after only about 17ms.
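If you want to plug in your own room, here is a rough Python sketch of the arithmetic above. The function names are my own, and it assumes x is the direct path length and y is the path length of the rear wall reflection, both in metres, with the speed of sound fixed at 343m/s.

```python
SPEED_OF_SOUND = 343.0  # m/s, same value as used in the post


def travel_time_ms(distance_m, c=SPEED_OF_SOUND):
    """Time in milliseconds for sound to cover distance_m metres."""
    return 1000.0 * distance_m / c


def reflection_free_window_ms(direct_m, reflected_m, c=SPEED_OF_SOUND):
    """Gap in ms between the direct arrival and the reflected arrival."""
    return travel_time_ms(reflected_m, c) - travel_time_ms(direct_m, c)


def wavelet_duration_ms(cycles, frequency_hz):
    """Length in ms of a wavelet with the given number of cycles."""
    return 1000.0 * cycles / frequency_hz


if __name__ == "__main__":
    # Assumed meaning: x = direct path, y = rear wall reflection path.
    window = reflection_free_window_ms(direct_m=3.0, reflected_m=9.0)
    duration = wavelet_duration_ms(cycles=6.5, frequency_hz=50.0)
    print(f"reflection-free window: {window:.1f} ms")
    print(f"wavelet duration:       {duration:.1f} ms")
    if duration <= window:
        print("wavelet fits inside the window")
    else:
        print("reflection arrives before the wavelet has finished")
```

Swap in your own distances, cycle count, and crossover frequency to see how far off you are.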
If you paid attention to that video, they used a free-field measurement for an elevated speaker array, and a ground plane measurement for the sub. That is why the measured impulses looked so clean - no reflections! If we want to translate this method to a domestic listening room, we MUST account for reflections. They will change the shape of our waveform, stretch it, and make the results very difficult to interpret.
I concluded that if I want this to work, I need a shorter wavelet (fewer cycles) and a higher frequency. And also a larger listening room to delay the reflections, but that ain't happening. I was starting to doubt whether this method would work at all in a listening room. It can't be used for polarity or phase if the waveform is distorted by the reflection. Neither can it be used for time alignment, because the slow energy build-up of the subwoofer gives the appearance of pre-ringing. All I needed was an experiment to confirm/deny it.
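To put rough numbers on "fewer cycles and a higher frequency", here is a small Python sketch that turns the same relationship around. The helper names are hypothetical, and it uses the simple rule that the whole wavelet must finish before the first rear wall reflection arrives. The window is recomputed from my distances (3m and 9m, 343m/s).

```python
def min_frequency_hz(cycles, window_ms):
    """Lowest frequency at which `cycles` cycles fit inside window_ms."""
    return 1000.0 * cycles / window_ms


def max_cycles(frequency_hz, window_ms):
    """Most cycles at frequency_hz that fit inside window_ms."""
    return frequency_hz * window_ms / 1000.0


if __name__ == "__main__":
    # Reflection-free window for a 3 m direct path and 9 m reflected path.
    window_ms = 1000.0 * (9.0 - 3.0) / 343.0
    print(f"window: {window_ms:.1f} ms")
    print(f"6.5 cycles need at least {min_frequency_hz(6.5, window_ms):.0f} Hz")
    print(f"at 50 Hz only {max_cycles(50.0, window_ms):.2f} cycles fit")
```

For my room this comes out at roughly 372Hz for a 6.5 cycle wavelet, or less than one full cycle at 50Hz, which is why I had my doubts.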
Now let's come to your graphs. I think they are heavily contaminated by reflections. Both your L and R graphs show a reflection: you can see the coke bottle shape of the measurement, obvious in L and subtler in R. But what disturbs me is that the amplitude of your left speaker is much lower than the right. Either the reflection on the right is arriving so early that it increases the apparent amplitude, or you have some kind of channel imbalance.
Anyway, thank you for doing this. You have saved me the trouble! Even before I started, I had concluded that the method was unlikely to work. It should be great if you are a pro and you want to time align speakers and subs in a stadium, but maybe not so useful for normal speakers and normal listening rooms. Of course, I haven't done anything except think about it a bit. I don't have my own experiment to show.