Minimum phase means that the zeros and poles of the system's transfer function are all in the left half of the s-plane. That is basically saying the system and its inverse are both stable (for example, the output doesn’t increase to infinity) and causal (the output doesn’t depend on future input). The system's excess phase is close to 0 at every frequency. You can find excess phase measurements for a few headphones, and it's usually flat or close to flat (Oratory has some). The headphone as an isolated system is linear. As long as both headphones have the same leakage tolerance, if you EQ them to the same target at the eardrum of one individual, they'll sound the same (ignoring distortion here).
Leakage tolerance is how the FR of a headphone changes when its air seal is broken. In the EQ setup I described above, only a difference in leakage tolerance between the headphones can make a difference.
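To make the minimum-phase condition concrete, here is a minimal sketch in Python with NumPy. The two transfer functions are made-up examples, not headphone measurements: they share the same poles and differ only in which side of the s-plane the single zero sits on. The point is that both have the same magnitude response, but only one is minimum phase.

```python
import numpy as np

def is_minimum_phase(num, den):
    """True if every zero and pole lies strictly in the left half of the s-plane."""
    zeros = np.roots(num)  # roots of the numerator polynomial
    poles = np.roots(den)  # roots of the denominator polynomial
    return bool(np.all(zeros.real < 0) and np.all(poles.real < 0))

def freq_response(num, den, omega):
    """Evaluate H(s) = num(s) / den(s) on the imaginary axis s = j*omega."""
    s = 1j * omega
    return np.polyval(num, s) / np.polyval(den, s)

# Hypothetical example systems (coefficients in descending powers of s).
den = [1.0, 2.0, 5.0]      # s^2 + 2s + 5, poles at -1 +/- 2j
num_min = [1.0, 2.0]       # s + 2, zero at -2 (left half-plane)
num_nonmin = [1.0, -2.0]   # s - 2, zero at +2 (right half-plane)

omega = np.logspace(-1, 2, 50)  # rad/s
h_min = freq_response(num_min, den, omega)
h_nonmin = freq_response(num_nonmin, den, omega)

# Same magnitude curve, different phase: the difference is the excess phase.
same_magnitude = np.allclose(np.abs(h_min), np.abs(h_nonmin))
excess_phase = np.unwrap(np.angle(h_nonmin / h_min))

print(is_minimum_phase(num_min, den), is_minimum_phase(num_nonmin, den))
print(same_magnitude)
```

The magnitude curve alone cannot tell these two systems apart, which is why the excess phase measurement matters for the claim above.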
Using minimum phase and frequency response to describe the entire system assumes the system is linear and time invariant (LTI). This is what makes superposition hold. That is, if I input signal 1 and get signal A as output, and input signal 2 and get signal B as output, then if I input signal 3, which is just the sum of signals 1 and 2, the output will be the sum of signals A and B.
This is what makes the frequency response curve useful: you can break the input down into its frequency components, and the output will be the sum of those components, each multiplied by the frequency response at its frequency.
This superposition property is possible when the system is linear and time invariant.
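As a rough illustration of superposition and of why the frequency response curve is enough for a linear, time-invariant system, here is a small sketch. The impulse response `h` is a made-up three-tap filter standing in for "the system"; it is an assumption for the demo, not a model of any headphone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical impulse response of a toy LTI system.
h = np.array([0.5, 0.3, 0.2])

def lti_system(x):
    """Output of the toy LTI system: input convolved with h."""
    return np.convolve(x, h)

x1 = rng.standard_normal(256)  # "signal 1"
x2 = rng.standard_normal(256)  # "signal 2"

# Superposition: the response to the sum equals the sum of the responses.
out_of_sum = lti_system(x1 + x2)
sum_of_outs = lti_system(x1) + lti_system(x2)
superposition_holds = np.allclose(out_of_sum, sum_of_outs)

# Frequency view: output spectrum = input spectrum * frequency response.
n = len(x1) + len(h) - 1           # length of the full convolution
H = np.fft.rfft(h, n)              # frequency response of the system
Y_direct = np.fft.rfft(lti_system(x1))
Y_from_curve = np.fft.rfft(x1, n) * H
spectrum_matches = np.allclose(Y_direct, Y_from_curve)

print(superposition_holds, spectrum_matches)
```

Both checks pass only because the toy system is linear and time invariant by construction.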
Due to the complex physics of headphones, there are nonlinearities. The driver's voice coil or linear motor has an uneven force constant because the magnetic flux density varies across the air gap; the diaphragm takes on a complex shape as it vibrates; the physical construction of the headphone resonates; there is friction; the acoustics of how the sound wave travels from the driver to the ear play a role; class AB amplifiers have crossover distortion; and so on.
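To show what a nonlinearity does that EQ cannot, here is a toy sketch. The polynomial in `toy_nonlinear` is an arbitrary assumption chosen for illustration and is not fitted to any real driver. A pure 1 kHz tone goes in; the linear gain stage only changes its level, while the nonlinear stage creates new components at 2 kHz and 3 kHz that were never in the input.

```python
import numpy as np

fs = 48000                   # sample rate in Hz
n = 4800                     # 0.1 s, so 1 kHz fits exactly 100 cycles
t = np.arange(n) / fs
f0 = 1000.0
x = np.sin(2 * np.pi * f0 * t)

def linear_eq(signal, gain=2.0):
    """A linear stage: only scales what is already there."""
    return gain * signal

def toy_nonlinear(signal):
    """Hypothetical nonlinearity: small 2nd- and 3rd-order terms."""
    return signal + 0.1 * signal**2 + 0.05 * signal**3

def level_at(signal, freq):
    """Amplitude of the sinusoidal component at freq (exact FFT bin)."""
    spectrum = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)
    bin_index = int(round(freq * len(signal) / fs))
    return spectrum[bin_index]

second_linear = level_at(linear_eq(x), 2 * f0)         # essentially zero
second_nonlinear = level_at(toy_nonlinear(x), 2 * f0)  # about 0.05
third_nonlinear = level_at(toy_nonlinear(x), 3 * f0)   # about 0.0125

print(second_linear, second_nonlinear, third_nonlinear)
```

An EQ is itself a linear filter, so it can raise or lower those harmonics once they exist, but it cannot separate them from real signal content at the same frequencies.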
If it were LTI, then like you said, you could transform the output of one headphone to match another. So a simple experiment would be to choose one planar, one dynamic and one electrostatic headphone, just to make the difference very obvious, although I don’t think you need to go even to this length to get the result. Then, like you suggested, EQ them to make them sound the same. I’ve tried; it’s impossible.
Part of it is detail retrieval: some headphones have higher resolving capability, which is especially apparent in electrostatics. In an LTI system you should be able to EQ this detail in, for example by increasing the treble, and then hear the details. You can try it, but it’s very obviously impossible.
So if a frequency response curve does not fully describe a headphone because it’s not LTI, can we base our buying decision solely on it and on how closely it adheres to the Harman curve? We can’t.
And that’s why we get conclusions like the IEM sounds better than the Susvara. If you have listened to Susvara you’ll know how very unlikely this is.
Just to be transparent, do I think the Susvara is overpriced? Yes. I personally think there’s so much subjectivity in sound that we should not have these ultra-expensive headphones, even from DCA for that matter. But that doesn’t detract from the idea that evaluating headphones by relying only on these measurements is just not good enough.
The measurements are just not sufficient to predict how good a headphone sounds. There is more to it than the frequency response curve.