How do you level match them if they don't have exactly the same frequency response? If you try taking some average, you will still end up being able to tell them apart. Even a small frequency response difference can be pretty audible. Better yet, if you actually EQ one to match the other, I bet you absolutely would not be able to tell them apart, unless one has a high-pass or low-pass filter and the other doesn't. FR, and I am going to go out on a limb here on ASR, has by far the largest impact on what we "hear" and "perceive", assuming distortion levels are reasonably comparable as well (as in, one doesn't have high levels of odd-order harmonics).
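To put rough numbers on the level-matching point, here is a minimal sketch in Python. The two responses are made-up values, not measurements of any real device, and the helper names are hypothetical. It compares matching at a single 1 kHz reference with matching a power-averaged level, and shows that a single gain offset leaves the same spread between the two responses either way.

```python
import math

# Hypothetical magnitude responses in dB at a few test frequencies (Hz).
# These values are invented for illustration only.
FREQS = [100, 1000, 5000, 10000]
DEVICE_A = [0.0, 0.0, 0.0, 0.0]
DEVICE_B = [1.0, 0.0, -0.5, -2.0]


def db_to_power(db):
    """Convert a level in dB to a linear power ratio."""
    return 10 ** (db / 10)


def average_level_db(response_db):
    """Power-average of a response across the test frequencies, in dB."""
    mean_power = sum(db_to_power(x) for x in response_db) / len(response_db)
    return 10 * math.log10(mean_power)


def residual_after_offset(a_db, b_db, offset_db):
    """Per-frequency difference (B - A) after adding offset_db to B."""
    return [b + offset_db - a for a, b in zip(a_db, b_db)]


# Option 1: match the two devices at a single reference frequency (1 kHz).
idx = FREQS.index(1000)
offset_1k = DEVICE_A[idx] - DEVICE_B[idx]

# Option 2: match the power-averaged level across all test frequencies.
offset_avg = average_level_db(DEVICE_A) - average_level_db(DEVICE_B)

res_1k = residual_after_offset(DEVICE_A, DEVICE_B, offset_1k)
res_avg = residual_after_offset(DEVICE_A, DEVICE_B, offset_avg)

# The spread (max minus min) of the residual is what a gain offset cannot fix.
spread_1k = max(res_1k) - min(res_1k)
spread_avg = max(res_avg) - min(res_avg)

print("offset at 1 kHz (dB):", round(offset_1k, 3), "spread:", round(spread_1k, 3))
print("offset by average (dB):", round(offset_avg, 3), "spread:", round(spread_avg, 3))
```

Whichever reference is chosen, the offset only slides the whole curve up or down; the shape difference stays, and only EQ can remove it.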
If the difference between them can be fixed with EQ, all the subjective terms you used before are pointless... buy the cheaper one and EQ it to your liking.
However... level matching digital content is "common" in audio production. Maybe I'm wrong, but it is easy to find how-tos on the internet (https://ledgernote.com/columns/mixi...-levels-in-home-recordings-for-even-playback/)
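For digital content, one simple way to do this is to scale one signal so that its RMS level equals the other's. The sketch below assumes plain lists of float samples and uses hypothetical helper names (`rms`, `match_rms`); it is not necessarily the method the linked how-to describes, and real tools often use perceptual loudness measures instead of raw RMS.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(reference, target):
    """Scale target so its RMS equals that of reference.

    Returns the scaled samples and the applied gain in dB.
    """
    target_rms = rms(target)
    if target_rms == 0:
        raise ValueError("target is silent; cannot compute a gain")
    gain = rms(reference) / target_rms
    return [s * gain for s in target], 20 * math.log10(gain)


# Two test tones with different frequencies and amplitudes (illustrative values).
N = 1000
reference = [0.5 * math.sin(2 * math.pi * 10 * n / N) for n in range(N)]
target = [0.2 * math.sin(2 * math.pi * 25 * n / N) for n in range(N)]

matched, gain_db = match_rms(reference, target)
print("gain applied (dB):", round(gain_db, 2))
print("RMS reference:", round(rms(reference), 4), "RMS matched:", round(rms(matched), 4))
```

This matches overall level only. Two signals with different spectra can have equal RMS and still sound different in loudness and tone.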