Firstly, a difference of 0.3 dB is enough to make the louder one sound better, even though it won't sound louder to you. The caveat is that this assumes there is no other difference. If one really is brighter, as in having a different frequency response, that would be measurable without a listening test. You want to get within 0.1 dB to be sure. 0.2 dB might work, but anything more than that can confuse the result.
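To put these thresholds in perspective, 0.1 dB is only about a 1.2% difference in voltage. The Python sketch below converts between dB and voltage ratios and checks whether two readings fall within 0.1 dB. It assumes the levels are compared as RMS voltages at the speaker terminals with a multimeter while both amps play the same test tone; that method, the example readings, and the function names are illustrative assumptions, not something taken from the thread.

```python
import math

def db_difference(v_a: float, v_b: float) -> float:
    """Level difference in dB between two RMS voltages (assumed: read at the speaker terminals)."""
    return 20.0 * math.log10(v_a / v_b)

def voltage_ratio(db: float) -> float:
    """Voltage ratio that corresponds to a given level difference in dB."""
    return 10.0 ** (db / 20.0)

def is_matched(v_a: float, v_b: float, tolerance_db: float = 0.1) -> bool:
    """True if the two levels are within the tolerance (0.1 dB by default)."""
    return abs(db_difference(v_a, v_b)) <= tolerance_db

# How large the voltage differences behind 0.1, 0.2 and 0.3 dB actually are
for db in (0.1, 0.2, 0.3):
    print(f"{db:.1f} dB -> {100 * (voltage_ratio(db) - 1):.2f}% voltage difference")

# Hypothetical multimeter readings for two amps playing the same test tone
print(is_matched(2.020, 2.000))  # about 0.09 dB apart
print(is_matched(2.030, 2.000))  # about 0.13 dB apart
```

In other words, a meter that only resolves to the nearest 0.05 V on a 2 V reading is not precise enough for this.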
I don't see how a microphone would be better for comparing amplifiers, or anything else, as long as the same speakers are used. Phone apps, or even good SPL meters, make precise matching more difficult. If you have to use a microphone, such as your Umik, read the level from the FFT view in something like REW. If you just use the SPL function, other noise can make it hard to get precise readings. This is especially true at lower frequencies like 100 Hz: low-frequency noise, even from a truck a block away, could shift a raw SPL reading by a small amount. You could also have standing waves at 100 Hz in your room, which means you need to be extremely picky about keeping the microphone in exactly the same position while setting levels. Using filtered pink noise with a microphone or SPL meter is probably a better choice.
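To illustrate why an FFT readout is more robust than a raw SPL number, here is a minimal Python/NumPy sketch using synthetic signals (not real measurements, and not how REW works internally). It compares a 100 Hz tone on its own against the same tone plus a hypothetical 20 Hz rumble standing in for traffic noise. The broadband level, like an unfiltered SPL reading, rises by a few tenths of a dB, while the level read from the 100 Hz FFT bin does not change. For simplicity it uses a plain tone rather than filtered pink noise, and it does nothing about standing waves, so microphone position still matters.

```python
import numpy as np

FS = 48_000      # sample rate in Hz (assumed)
DURATION = 1.0   # seconds; gives 1 Hz FFT bin spacing
N = int(FS * DURATION)
t = np.arange(N) / FS

def broadband_db(x: np.ndarray) -> float:
    """Level of the whole signal, like an unfiltered SPL reading (dB re 1.0 RMS)."""
    return float(20 * np.log10(np.sqrt(np.mean(x ** 2))))

def tone_db(x: np.ndarray, freq_hz: float) -> float:
    """Level of a single FFT bin, like reading the peak in an FFT view (dB re 1.0 RMS)."""
    spectrum = np.fft.rfft(x)
    k = int(round(freq_hz * N / FS))        # bin index for the test frequency
    amplitude = 2 * np.abs(spectrum[k]) / N  # peak amplitude of that sine component
    return float(20 * np.log10(amplitude / np.sqrt(2)))

tone = 0.1 * np.sin(2 * np.pi * 100 * t)     # 100 Hz test tone
rumble = 0.03 * np.sin(2 * np.pi * 20 * t)   # hypothetical low-frequency background noise

for label, x in (("tone only", tone), ("tone + rumble", tone + rumble)):
    print(f"{label:14s} broadband {broadband_db(x):7.2f} dB   100 Hz bin {tone_db(x, 100):7.2f} dB")
```

Both signals complete a whole number of cycles in the one-second window, which keeps the example free of spectral leakage; real recordings would normally need a window function and some averaging.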
I've done some testing just to see how well levels can be matched with a sound level meter. Even with tones at 400 Hz, simply standing in a different position in the room (leaving the meter where it was) changed the reading by 0.3 to 0.4 dB, because where you stood altered the standing waves a bit.
Sorry for the extremely late response.
I am aware that volume level is crucial, so please don't take anything I write here as an attempt to say that volume level doesn't matter. I've "remastered" a lot of music by running it through an equalizer, so I've seen first-hand, on an almost daily basis, that matching the volume level is crucial when comparing two things.
Nevertheless, I don't think the volume level was the issue in my case, although yes, this is to some extent an assumption. When one is louder, unless it's too loud, it will sound better in the sense of fuller, richer, more highly resolved, more energetic, etc.
The three amplifiers I compared were a Naim NAC 202/NAP 200, an Arcam SR250, and a Nord 1ET400A. It seemed to me that the Arcam had a slight midrange boost around 300-800 Hz or so, whereas the Nord had a colder, steely sound, perhaps because of a smaller dip in the lower treble or less roll-off at the very top. That was the amp I ended up turning down a notch, but after turning it down it still sounded colder. The sound of the Naim was a bit more difficult to put into words.
All of this sounds like frequency response to me, although in the case of the Naim I would say that perhaps a bit of reverberation was added to make the sound more "lively". On one song the Naim sounded a bit less bright/shrill than the Arcam, but was still more fatiguing somehow, and it seemed to have more reverberation, perhaps. I know these are not very scientific terms, but on several songs I found the Arcam to sound more "calm" and "controlled" than the Naim, which was more "lively", "bouncy", "wild", etc.
Although it varied a lot from song to song, overall I liked the Arcam the most, so I kept that one, returned the Nord, and sold the Naim, which was my previous amplifier. I would, however, say that the differences in sound were all rather small. Still, with the Naim I had often felt a piercing, knife-like sensation in my ear, which went away when I switched to the Arcam. I once found out where the volume knob had to be for the Naim to clip, and I never played that loud anyway, so the "knife" probably wasn't distortion.
Of course, all of these differences in sound could just be imagined. Nevertheless, what I noticed was that I heard some small differences in all or almost all of the songs I listened to, and although I couldn't always decide which one I liked best, I could at least sense some small differences (with the Nord the difference was the greatest on essentially every song). But all this changed when I turned on room correction from my computer.
I only did the measurements for the room correction with one of the amplifiers, so in theory there should still have been audible differences when I used the same correction with the other two amps. But after turning on room correction I couldn't hear any differences between the three amps.
Previously, I had spent a couple of days listening back and forth, hearing small but noticeable differences. Now I couldn't hear any, and after switching between all three amps several times I just gave up. It seemed completely pointless to even switch.
Usually I find the "enormous" differences that people report between components grossly exaggerated, so I either hear no difference or I hear small differences. So I'm not a subjectivist telling a story about how a new, super-expensive purchase easily blew everything I had before out of the water. I actually ended up keeping the cheapest amp (but I'm not someone who says "€200 is enough to spend on any audio component" either).