I don't think I explained my idea properly. I'm not sure the idea itself is sound, but judging from your post I clearly didn't explain it well, so let me try again:
Measuring phase:
Let's assume the speakers play a single 500Hz tone. I see 2 things happening, and both of them involve both the room and the speaker. In fact, I don't think the two need to be separated; they can be treated as one system, since both affect what I hear sitting at my LP.
So, first: the tone that should produce 80dB SPL at my LP actually produces 77dB, as a result of the speaker's frequency response not being flat and of the room modes. Second: along with the 500Hz tone, a few additional tones are unintentionally created and played by the speaker (their combined effect is what we call THD). For the sake of this analysis let's assume there are only 2 of them, the 2nd and 3rd harmonics. Their amplitude depends on the SPL of the 500Hz base tone, and their frequency is always 2*freq of the base tone for the 2nd harmonic distortion component and 3*freq of the base tone for the 3rd harmonic distortion component.
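To make the arithmetic above concrete, here is a minimal Python sketch. The function names are my own, made up for illustration; the numbers are just the ones from the 500Hz example.

```python
def harmonic_frequencies(base_hz, orders=(2, 3)):
    """Frequency of each harmonic distortion component: order * base frequency."""
    return {order: order * base_hz for order in orders}


def level_error_db(target_db, measured_db):
    """How far the measured SPL at the LP is from the intended SPL."""
    return measured_db - target_db


# The 500Hz example: harmonics land at 1000Hz and 1500Hz,
# and the tone arrives 3dB lower than intended (77dB instead of 80dB).
harmonics = harmonic_frequencies(500.0)
error = level_error_db(80.0, 77.0)
```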
Processing phase:
Let's assume we have created a lookup table with 2 independent variables (frequency and SPL/amplitude of the base tone) and 2 dependent values (the measured SPL at 2*freq and at 3*freq of the base tone).
Let's assume our table has a resolution of 5dB for the base tone SPL variable.
Let's also assume our table has some appropriate freq resolution, probably in log scale for practical purposes.
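A rough sketch of what such a table could look like in Python. The frequency range, the one-point-per-octave spacing and the 60 to 90dB range are assumptions I picked only to keep the example small; the entries start out empty because they would have to come from real measurements.

```python
import math


def log_frequency_grid(f_min_hz, f_max_hz, points_per_octave):
    """Frequencies spaced evenly on a log scale between f_min_hz and f_max_hz."""
    n_octaves = math.log2(f_max_hz / f_min_hz)
    n_points = int(round(n_octaves * points_per_octave)) + 1
    return [f_min_hz * 2 ** (i / points_per_octave) for i in range(n_points)]


def spl_grid(spl_min_db, spl_max_db, step_db=5):
    """Base tone levels in 5dB steps (the resolution assumed above)."""
    return list(range(spl_min_db, spl_max_db + 1, step_db))


# Assumed ranges, chosen only for illustration.
freqs = log_frequency_grid(125.0, 4000.0, points_per_octave=1)
levels = spl_grid(60, 90)

# (base freq, base SPL) -> (SPL of 2nd harmonic, SPL of 3rd harmonic).
# None means "not measured yet".
table = {(f, spl): None for f in freqs for spl in levels}
```

A real table would want a much finer frequency grid (1/3 or 1/6 octave, say), but the structure stays the same.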
We feed that table into the convolution engine in the same manner we feed it a FIR filter. In one slice of time the convolution engine sees that it has to process 2 tones: 522Hz at 73dB and 1875Hz at 57dB. The engine calculates the amplitude correction for the 522Hz and 1875Hz base tones based on the FIR filter, as it normally does, and modifies the signal slice accordingly. In addition, the engine looks up the "distortion response" table at the closest points (say 500Hz/75dB for the first tone and 2000Hz/60dB for the second tone). The engine reads from the table the expected amplitudes of the distortion harmonics that the speaker/room will generate for those 2 base tones, and inserts into the signal slice tones matching those harmonic distortion components, but with the opposite phase (or applies some other, more clever cancelling mechanism). The engine moves on to the next slice of time. And so it goes...
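Here is a sketch of the lookup-and-cancel step in Python, just to show what I mean. Everything in it is an assumption: the table values are placeholders generated by a made-up formula (not measurements), the grid starts at 60dB (which is why 57dB snaps to 60dB), the calibration of 100dB SPL = digital full scale is invented, and the harmonics are assumed to be in phase with a sine at the base frequency. In reality the harmonic phase would have to be measured too.

```python
import math

FREQS_HZ = [500.0, 1000.0, 2000.0]
SPL_LEVELS_DB = [60, 65, 70, 75, 80, 85, 90]

# PLACEHOLDER values, not measurements:
# (base freq, base SPL) -> (2nd harmonic SPL, 3rd harmonic SPL)
TABLE = {(f, spl): (spl - 40.0, spl - 50.0)
         for f in FREQS_HZ for spl in SPL_LEVELS_DB}


def nearest_grid_point(freq_hz, spl_db):
    """Closest table point: log-scale distance for frequency, linear for dB."""
    f = min(FREQS_HZ, key=lambda g: abs(math.log2(freq_hz / g)))
    spl = min(SPL_LEVELS_DB, key=lambda g: abs(spl_db - g))
    return f, spl


def db_to_amplitude(level_db, full_scale_db=100.0):
    """Assumed calibration: full_scale_db SPL corresponds to amplitude 1.0."""
    return 10.0 ** ((level_db - full_scale_db) / 20.0)


def cancellation_samples(freq_hz, spl_db, n_samples, sample_rate_hz=48000):
    """Anti-phase 2nd and 3rd harmonics for one base tone in one time slice."""
    h2_db, h3_db = TABLE[nearest_grid_point(freq_hz, spl_db)]
    out = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        sample = 0.0
        for order, level_db in ((2, h2_db), (3, h3_db)):
            # Harmonics sit at multiples of the actual tone frequency,
            # not of the table grid frequency; the minus sign inverts phase.
            sample -= db_to_amplitude(level_db) * math.sin(
                2.0 * math.pi * order * freq_hz * t)
        out.append(sample)
    return out


# The two tones from the example slice.
slice_correction = [a + b for a, b in zip(
    cancellation_samples(522.0, 73.0, 480),
    cancellation_samples(1875.0, 57.0, 480))]
```

The engine would add slice_correction to the FIR-corrected slice before sending it to the speaker.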
So, do you think it would work?