I essentially agree with you here.
What I disagree with is that the Harman target is the correct degree of tilt for a neutral (i.e. what you've been calling "flat") response. FYI, the most recent Harman target (2017) has a downward tilt of approximately 11 dB with respect to a flat diffuse-field response.
Just as importantly for the discussion of perceived neutrality, the Harman "tilt" is not linear (i.e. it is not a straight downward-sloping line). Rather, relative to a straight downward-sloping line, it includes a small dip in the lower midrange and a small peak in the mid-bass. FWIW, this dip/peak is a result of the Q of the filters used in the Harman study (if it's not clear what I mean by this, please let me know and I'll explain).
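To make the Q point a bit more concrete, here is a rough Python sketch of how a shelf filter's Q can produce that kind of peak/dip. It uses the low-shelf biquad formulas from the RBJ Audio EQ Cookbook. The 105 Hz corner, 6 dB boost, 48 kHz sample rate and the Q values are purely illustrative assumptions on my part, not the actual filter settings from the Harman research, and the function names are just ones I made up for this example.

```python
import cmath
import math


def low_shelf_coeffs(f0, gain_db, q, fs=48000.0):
    """Low-shelf biquad coefficients (b, a), per the RBJ Audio EQ Cookbook."""
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * math.pi * f0 / fs
    cosw = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    k = 2 * math.sqrt(A) * alpha
    b = (
        A * ((A + 1) - (A - 1) * cosw + k),
        2 * A * ((A - 1) - (A + 1) * cosw),
        A * ((A + 1) - (A - 1) * cosw - k),
    )
    a = (
        (A + 1) + (A - 1) * cosw + k,
        -2 * ((A - 1) + (A + 1) * cosw),
        (A + 1) + (A - 1) * cosw - k,
    )
    return b, a


def magnitude_db(b, a, f, fs=48000.0):
    """Magnitude of the biquad's frequency response at f Hz, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


def shelf_extremes(q, f0=105.0, gain_db=6.0, fs=48000.0, points=400):
    """Return (max, min) of the shelf response in dB over 20 Hz to 20 kHz.

    All default values are illustrative assumptions, not Harman's settings.
    """
    b, a = low_shelf_coeffs(f0, gain_db, q, fs)
    # Log-spaced frequency grid from 20 Hz to 20 kHz
    freqs = [20.0 * (1000.0 ** (i / (points - 1))) for i in range(points)]
    mags = [magnitude_db(b, a, f, fs) for f in freqs]
    return max(mags), min(mags)


if __name__ == "__main__":
    for q in (math.sqrt(0.5), 1.0, 1.4):
        peak, dip = shelf_extremes(q)
        print(f"Q={q:.2f}: max {peak:+.2f} dB, min {dip:+.2f} dB")
```

With Q at about 0.71 the shelf goes smoothly from the boosted level down to 0 dB. As Q goes above that, the response overshoots the nominal boost just below the corner frequency and undershoots 0 dB just above it, which is the same kind of bass peak plus lower-midrange dip I was describing.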
But anyway, I do agree with you that some degree of downward slope in a headphone target response is likely necessary for subjectively neutral sound. Where I disagree is with the idea that the Harman target is the most neutral. In particular, given that the Harman research investigated listener preference and not perceived neutrality, I'm not sure you could argue that the Harman target should even sound (on average) neutral to most people; but, as per the goals and methodology of the research, it is very likely to be (on average) among the most preferred.