As a follow-up to my other post: an insidious thing about mismatched levels is that the listener really is perceiving a difference. How often do subjective listeners say they really heard a difference? Well, with a level mismatch, your perception of a difference is real. If you were blind tested under those same conditions, you'd score well. The same goes for frequency response differences: the listener is hearing something that would hold up under blind testing, because the difference they perceive is real. Once you match levels and correct any frequency response issues, only very rarely is anything left that sounds different.
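For anyone wondering what "lining up levels" looks like in practice, here is a minimal sketch of one simple approach: compare the RMS level of two versions of the same clip and scale one to match the other. The function names are hypothetical, and the sample data is a synthetic 1 kHz tone, not a real measurement. RMS matching over the whole clip assumes both versions have essentially the same frequency response; if they don't, it only gets you approximately matched.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """Level of b relative to a, in dB (positive means b is louder)."""
    return 20.0 * math.log10(rms(b) / rms(a))

def match_level(reference, candidate):
    """Scale candidate so its RMS level equals the reference's."""
    gain = rms(reference) / rms(candidate)
    return [s * gain for s in candidate]

# Hypothetical example: one second of a 1 kHz tone at 48 kHz,
# plus a copy with 10% more amplitude.
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(48000)]
louder = [1.1 * s for s in tone]

print(round(level_difference_db(tone, louder), 2))  # 0.83 (20*log10(1.1))
matched = match_level(tone, louder)
print(round(abs(level_difference_db(tone, matched)), 2))  # ~0 dB after matching
```

A 10% amplitude difference works out to under 1 dB, which is the kind of small, easily overlooked mismatch that can still produce a real, repeatable perceived difference.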
Now, given all the ways humans can be biased, you might still think you hear differences even with level and FR matched. But in my limited experience with a few friends, controlling those two things reduces the perceived differences even under sighted conditions. It doesn't eliminate false positives, but it reduces them.