Okay, that's not what I thought you meant. (The confusion arises because you've been using "actual" to describe two different numbers.)
Now you seem to be saying the real, actual tape distance is 3.68m, but you've put an inflated 4.21m into MultEQ-X to try to make it send 3.68m to the AVR.
That undoes the correction MultEQ-X makes for the odd speed-of-sound constant.
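As a sketch of the correction being undone: assume, as the arithmetic below does, that the AVR converts entered distances to delays with a 300 m/s constant while sound really travels at about 343 m/s. Here r_i is the real distance to speaker i, D_i the distance the AVR holds, and t_i the arrival time at the listening position. These symbols are introduced only for this derivation.

```latex
t_i = \frac{r_i}{c} + \frac{D_{\max} - D_i}{c_{\mathrm{AVR}}},
\qquad c \approx 343~\mathrm{m/s},\quad c_{\mathrm{AVR}} = 300~\mathrm{m/s}

t_i = t_j \;\; \forall\, i,j
\iff \frac{D_i}{c_{\mathrm{AVR}}} - \frac{r_i}{c} = \text{const}
\;\Leftarrow\; D_i = k\, r_i, \quad k = \frac{c_{\mathrm{AVR}}}{c} = \frac{300}{343} \approx 0.8746 \approx 0.875

\text{Entering } \frac{r_{\mathrm{FL}}}{k} = \frac{3.68}{0.875} \approx 4.21~\mathrm{m}
\text{ in MultEQ-X gives } D_{\mathrm{FL}} = 0.875 \times 4.21 \approx 3.68~\mathrm{m} = r_{\mathrm{FL}}
```

So the AVR should hold roughly 0.875 times the real distance. Feeding MultEQ-X an inflated figure brings it back to k = 1 (the real distance), which overcompensates.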
You say you have REW, but I'm not clear what measurements you're making. If you are sending sweeps to individual channels, you should be able to see the relative timing of them.
If your real FL and SR measurements are 3.68m and 2.21m, then the real flight delay difference should be 4.3ms. ((3.68-2.21)/343).
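The same arithmetic as a tiny Python sketch. It assumes 343 m/s for the real speed of sound, as in the calculation above, and uses the tape-measured 3.68m and 2.21m. The function name is just for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # real speed of sound at room temperature (approx.)


def flight_time_ms(distance_m: float) -> float:
    """Time (ms) for sound to travel distance_m metres through air."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


fl_m, sr_m = 3.68, 2.21  # tape-measured FL and SR distances from this thread
diff_ms = flight_time_ms(fl_m) - flight_time_ms(sr_m)
print(f"FL arrives {diff_ms:.1f} ms after SR with no correction")
```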
If you set them to the same distance (e.g. 2m for both) in the AVR, so there's no relative correction delay, REW should show FL arriving at the mic 4.3ms later than SR.
With your original MultEQ-X uploaded measurements of 3.37m and 2.10m in the AVR, the AVR would have arranged a 4.2ms delay. ((3.37-2.10)/300). This is pretty close. With that correction, REW should show FL arriving 0.1ms later than SR. The basic granularity of the adjustment is only 0.1ms (=3cm), so it's hard to do better.
With your setup of putting real measurements 3.68m and 2.21m into the AVR, the AVR will arrange a 4.9ms delay. ((3.68-2.21)/300). This is an overcompensation. REW should show FL arriving 0.6ms earlier than SR. (Equivalent to an 18cm measurement error).
If you put your real measurements 3.68m and 2.21m into MultEQ-X, MultEQ-X will set the AVR to 3.22m and 1.93m. The AVR will arrange a 4.3ms delay ((3.22-1.93)/300), which is correct. REW should show FL and SR arriving simultaneously (or within 0.1ms). And that's what we're aiming for.
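The four scenarios above can be checked with a short Python sketch. It assumes a simplified AVR model: the AVR delays the closer speaker by the difference in entered distances divided by 300 m/s, sound really travels at 343 m/s, and the 0.1ms step quantization is ignored. Positive results mean FL arrives later than SR. The function and scenario names are hypothetical.

```python
C_REAL_M_S = 343.0  # actual speed of sound
C_AVR_M_S = 300.0   # constant the AVR's delay arithmetic uses (per the figures above)
REAL_FL_M, REAL_SR_M = 3.68, 2.21  # tape-measured distances


def fl_minus_sr_ms(avr_fl_m: float, avr_sr_m: float) -> float:
    """FL arrival time minus SR arrival time at the mic, in ms.

    Natural flight-time difference minus the relative delay the AVR adds
    based on the distances it was given. Ignores the 0.1 ms quantization.
    """
    flight_s = (REAL_FL_M - REAL_SR_M) / C_REAL_M_S
    correction_s = (avr_fl_m - avr_sr_m) / C_AVR_M_S
    return (flight_s - correction_s) * 1000.0


scenarios = {
    "same distance in AVR (no correction)": (2.0, 2.0),
    "original MultEQ-X upload": (3.37, 2.10),
    "real distances entered in AVR": (3.68, 2.21),
    "real distances via MultEQ-X (x0.875)": (3.22, 1.93),
}
for name, (fl, sr) in scenarios.items():
    print(f"{name:40s} {fl_minus_sr_ms(fl, sr):+.2f} ms")
```

With these inputs the offsets come out at about +4.29, +0.05, -0.61 and -0.01 ms, which round to the figures above at 0.1ms resolution.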
You should be using REW to confirm that speakers are aligned. Easley, who did most of the investigation on this, had these before and after REW measurements:
The top "before" graph is what happens when you put real distances into the AVR, and that's what I expect you to be getting now. (Yours should be a bit better, because your speaker distance spread is smaller: he gets an 800us spread, while you'd have 600us.)
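Here is a sketch of where the 600us figure comes from. If real distances are entered, each speaker's overcompensation grows in proportion to its distance, so the arrival spread is the distance range times (1/300 - 1/343). This assumes FL (3.68m) and SR (2.21m) are your farthest and closest speakers. The function name is hypothetical.

```python
C_REAL_M_S = 343.0  # actual speed of sound
C_AVR_M_S = 300.0   # constant the AVR uses to turn distance into delay


def overcomp_spread_us(farthest_m: float, closest_m: float) -> float:
    """Arrival-time spread (us) across speakers when real distances are entered.

    With entered distance D_i = r_i, speaker i arrives at
    D_max/C_AVR - r_i * (1/C_AVR - 1/C_REAL), so the spread between the
    farthest and closest speakers is their distance range times that factor.
    """
    per_metre_s = 1.0 / C_AVR_M_S - 1.0 / C_REAL_M_S
    return (farthest_m - closest_m) * per_metre_s * 1e6


print(f"{overcomp_spread_us(3.68, 2.21):.0f} us")
```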
If you put real distances multiplied by 0.875 into the AVR (or MultEQ-X does it for you because you give it real distances), I would expect to see results like the bottom graph. (The 100us grid lines match the resolution of the adjustment, so the bottom graph is pretty much as good as it gets: all speakers are within one step.)
I believe your current setup, if I've interpreted it correctly (it's possible I've still misunderstood), is worse than where you started, purely as measured against what we were trying to achieve: perfect time-alignment of the sounds arriving at the listening position.
If you prefer the misaligned sound arrival, then we're into psychoacoustics. There may be a reason misaligned speakers are preferred. The assumption has always been that we want the sound to arrive simultaneously from all speakers, with equal level. Hence the distance and level adjustments.
Your misalignment is systematic - it's based on distance, and it's an overcompensation. More distant speakers arrive earlier than closer ones.
Without any adjustment, closer speakers arrive earlier than distant ones, and the brain spots that, kind of tunes out the distant ones, and steers its judgement of image direction towards the closer ones.
Maybe with inverted distance cues (and balanced SPL), some other uncorrected-for spatial cue is overridden. The brain starts tuning in more on the more distant speakers that arrive earlier, and this compensates for something about them being more distant, on top of the SPL correction?
I've not heard of anyone really researching this. You're welcome to try.
When experimenting, the adjustment to make is the multiplicative one - multiplying all distances by a constant.
If entering directly into the AVR (which is easier), the cases are:
real distance * 1.something: more overcompensation
real distance: overcompensation (distant speakers earlier)
real distance * 0.875: perfect compensation (speakers simultaneous)
real distance * smaller fraction: undercompensation (closer speakers earlier)
all distances equal: no compensation (natural flight delay)
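The five cases can be sketched in Python by sweeping the multiplier k applied to real distances before entering them into the AVR. This reuses the same simplified model (300 m/s AVR constant, 343 m/s real, FL 3.68m vs SR 2.21m). It treats k = 0 as the "all distances equal" case, since both give zero relative correction, and uses the 0.1ms adjustment step as the tolerance for "simultaneous". The names are illustrative.

```python
C_REAL_M_S = 343.0  # actual speed of sound
C_AVR_M_S = 300.0   # constant the AVR uses to turn distance into delay
STEP_MS = 0.1       # adjustment granularity mentioned above
PERFECT_K = C_AVR_M_S / C_REAL_M_S  # ~0.875


def far_minus_near_ms(k: float, far_m: float = 3.68, near_m: float = 2.21) -> float:
    """Arrival of the farther speaker minus the closer one (ms) when every
    real distance is multiplied by k before being entered in the AVR."""
    flight_s = (far_m - near_m) / C_REAL_M_S
    correction_s = k * (far_m - near_m) / C_AVR_M_S
    return (flight_s - correction_s) * 1000.0


def describe(k: float) -> str:
    """Map a multiplier onto the cases listed above."""
    if k == 0:
        return "no compensation (natural flight delay)"
    offset = far_minus_near_ms(k)
    if abs(offset) < STEP_MS:
        return "perfect compensation (speakers simultaneous)"
    if offset < 0:
        return "overcompensation (distant speakers earlier)"
    return "undercompensation (closer speakers earlier)"


for k in (1.1, 1.0, 0.875, 0.7, 0.0):
    print(f"k={k:<6} {far_minus_near_ms(k):+.2f} ms  {describe(k)}")
```

The band of k values that lands within one step is narrow (roughly 0.86 to 0.89 for this pair), and it narrows further as the distance spread grows.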
I'm not convinced by the reports so far from people playing with the 0.875 constant, i.e. comparing the second and third cases. I've seen people say it's better, but there were also people doing early tests who made a change in the opposite direction (when there was confusion about which way to adjust), i.e. towards the first case, and who said that was better!
I suspect people are just hearing what they want to hear. I have no idea whether there's anything detectable in a blind test. I just know this improves the measurements, and achieves what we were trying to achieve: perfect time-alignment.
Even if something in spatial cues might be helped by misalignment, proper alignment should benefit a whole bunch of things, helping sub integration and avoiding weird frequency response behaviour on stereo images between distance-mismatched pairs.