Nothing was missed in the measurements. Measurements showed a difference. Question is how audible it is.
I'm not claiming you missed anything; I feel your tests capture what is important. I also believe anything that could be heard can be measured. If errors show up in the measurements, a subjective test cannot be used to confirm those measurements.
Going by the link you referenced, there is no question about how audible it is. There are charts for that. And what is audible to you may not be to someone else, or vice versa, making your subjective evaluation moot.
It would be like measuring the temperature of water at 10 deg C, and then telling me you confirmed that the water isn't frozen.