It seems like a good approach: you can level match very precisely after the fact and use automated ABX software to avoid any sighted bias. Be sure to always use the same playback conditions (same sound level, same speaker, same mic position and settings, etc.) so as not to introduce any other variables.
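To illustrate what I mean by level matching after the fact, here is a minimal sketch that scales one recording so its RMS level equals that of the other. It assumes the two recordings are already loaded as lists of float samples and cover the same passage; the names `rms` and `level_match` and the synthetic 1 kHz test tones are mine, purely for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_match(reference, target):
    """Scale `target` so that its RMS equals that of `reference`.

    Returns the scaled samples and the gain that was applied, in dB.
    """
    target_rms = rms(target)
    if target_rms == 0.0:
        raise ValueError("target recording is silent, cannot level match")
    gain = rms(reference) / target_rms
    gain_db = 20.0 * math.log10(gain)
    return [s * gain for s in target], gain_db


# Synthetic stand-ins for the two recordings (assumption: 48 kHz, 1 s, 1 kHz tone).
sample_rate = 48000
rec_a = [0.50 * math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(sample_rate)]
rec_b = [0.45 * math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(sample_rate)]

rec_b_matched, gain_db = level_match(rec_a, rec_b)
print(f"gain applied to B: {gain_db:.3f} dB")
```

RMS over the whole clip is only one possible way to match levels; matching on a steady test tone recorded through each amplifier may be more robust than matching on music.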
I am, however, wondering how good the recording side (mic, ADC, etc.) has to be in order to capture the very small differences that the two amplifiers may introduce.
I guess that apart from corner cases (e.g., close to saturation) there should be little to no audible difference. I fear that such small differences will be difficult to capture.
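One way to get a rough feel for whether the recording chain is good enough might be a null test: record the same amplifier twice and subtract the two takes to see how repeatable the chain is, then subtract a take of amplifier A from a take of amplifier B. If the A-vs-B residual is not clearly above the A-vs-A residual, the chain is probably not resolving the difference. The sketch below assumes the takes are already time-aligned and level-matched; the signals, the noise level, and the 0.1% third-harmonic "difference" are made-up numbers for illustration only.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def residual_db(x, y):
    """Level of the difference (x - y) relative to x, in dB."""
    diff_rms = rms([a - b for a, b in zip(x, y)])
    if diff_rms == 0.0:
        return float("-inf")  # perfect null
    return 20.0 * math.log10(diff_rms / rms(x))


random.seed(0)  # fixed seed so the synthetic example is repeatable
sample_rate = 48000
n_samples = sample_rate  # 1 second


def tone(n, harmonic=1):
    """1 kHz test tone (or one of its harmonics) at sample index n."""
    return math.sin(2 * math.pi * 1000 * harmonic * n / sample_rate)


def record(third_harmonic=0.0, noise_sigma=1e-4):
    """Hypothetical 'recording': tone plus optional distortion plus chain noise."""
    return [
        0.5 * (tone(n) + third_harmonic * tone(n, 3)) + random.gauss(0.0, noise_sigma)
        for n in range(n_samples)
    ]


take_a1 = record()                      # amplifier A, first take
take_a2 = record()                      # amplifier A, second take
take_b = record(third_harmonic=0.001)   # amplifier B, slightly more distortion

floor_db = residual_db(take_a1, take_a2)  # repeatability of the recording chain
diff_db = residual_db(take_a1, take_b)    # A vs B through the same chain
print(f"A vs A residual: {floor_db:.1f} dB, A vs B residual: {diff_db:.1f} dB")
```

With real recordings the alignment step matters a lot: even a fraction of a sample of offset or a small clock drift will raise the residual, so a poor null does not by itself mean the amplifiers differ.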