I'm late to the thread and I haven't read every post. However, I have seen enough to give me the impression that the thread starter is experiencing the Dunning-Kruger effect.
https://en.wikipedia.org/wiki/Dunning–Kruger_effect
I should just point out that a speaker can measure flat on-axis while having poor time alignment between drivers, box resonances, port huffing, rub & buzz, high IM distortion and a host of other maladies. There are many ways to measure frequency response. They may differ in speed, resolution and the other information they provide, as well as in their suitability for different measurement environments. Each technique has its strengths and weaknesses. For instance, the Farina chirp is very popular for its speed and its ability to provide THD information. However, one can easily get fooled by the results if one doesn't appreciate the nuances of the measurement's windowing function and gating. Even its rub & buzz detection is limited in the types of rubs it can find. And, if you run the chirp too fast, it may not find the buzzes either. This is why measurements have so many parameters you can adjust. The more you know, the more you realize that measurements don't tell you how a device sounds. They only provide insight as to why it sounds the way it does.
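
To put a number on the "flat on-axis but poor time alignment" point, here is a minimal sketch. It assumes a first-order digital all-pass filter as a stand-in for a timing error; that is a toy model of my choosing, not a speaker and not any particular measurement package. The function names and the coefficient a = 0.5 are made up for illustration. The magnitude comes out flat at every frequency, yet the group delay changes with frequency.

```python
import numpy as np

def allpass_response(a, n_points=512):
    """Frequency response of the all-pass H(z) = (a + z^-1) / (1 + a z^-1).

    `a` is a hypothetical coefficient chosen for illustration (|a| < 1).
    Returns the frequency grid in rad/sample and the complex response.
    """
    # Stay slightly inside (0, pi) so the numerical derivative behaves at the edges.
    w = np.linspace(0.01, np.pi - 0.01, n_points)
    z_inv = np.exp(-1j * w)
    h = (a + z_inv) / (1 + a * z_inv)
    return w, h

def group_delay(w, h):
    """Group delay in samples: negative derivative of unwrapped phase."""
    phase = np.unwrap(np.angle(h))
    return -np.gradient(phase, w)

w, h = allpass_response(a=0.5)
mag_db = 20 * np.log10(np.abs(h))
gd = group_delay(w, h)

print("largest deviation from flat (dB):", np.max(np.abs(mag_db)))
print("group delay range (samples):", gd.min(), "to", gd.max())
```

With a = 0.5 the magnitude deviation is at the level of floating-point noise, while the group delay runs from roughly a third of a sample at low frequencies to about three samples near Nyquist. A magnitude-only plot cannot tell this filter apart from a plain wire.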