The Harman study is only about finding the tonality the majority of people seem to prefer, in order to educate their own engineers and ensure better sales in the future.
It may be confusing, but the 'well known Harman target' you see used is fixture dependent, and it differs from the one Harman used because they used a different pinna.
The familiar 'Harman curve' (the one with the bump at 3kHz) actually consists of a correction for the fixture itself, with the 'EQ settings found in the Harman research' added on top of that correction curve for that specific fixture.
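To put that in numbers, here is a minimal Python sketch of how such a curve is assembled. All values and names (fixture_a_correction_db, preference_eq_db and so on) are made up for illustration and are not actual Harman or fixture data. The only point is that the same 'found EQ' added to two different fixture corrections gives two different 'Harman curves'.

```python
# Illustrative only: every number below is invented, not real Harman or fixture data.
freqs_hz = [100, 1000, 3000, 10000]

# Assumed correction curves (dB) for two hypothetical fixtures with different pinnae.
fixture_a_correction_db = [0.0, 1.0, 12.0, 4.0]
fixture_b_correction_db = [0.0, 2.0, 10.0, 7.0]

# Assumed averaged listener EQ (dB); this part is the same for every fixture.
preference_eq_db = [4.0, 0.0, -1.0, -3.0]

def curve_for_fixture(correction_db, eq_db):
    """Add the found EQ to one fixture's correction, per frequency (in dB)."""
    return [c + e for c, e in zip(correction_db, eq_db)]

curve_a_db = curve_for_fixture(fixture_a_correction_db, preference_eq_db)
curve_b_db = curve_for_fixture(fixture_b_correction_db, preference_eq_db)

for f, a, b in zip(freqs_hz, curve_a_db, curve_b_db):
    print(f"{f:>6} Hz  fixture A: {a:+.1f} dB  fixture B: {b:+.1f} dB")
```

The EQ part is identical in both cases, yet the two plotted curves differ by a few dB in the upper frequencies simply because the fixtures differ.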
As an engineer, I see the 'Harman curve' as the EQ settings they arrived at during the research - the actual response of the headphone used (which was measured on a non standard fixture with their own fake pinna). As not every listener arrived at the same settings (and the settings were probably song dependent as well), an 'average' was created.
I see it as measuring the output of a DAC (which is linear in response) using an ADC with a bunch of filters in front of it, filters whose response differs depending on which DAC you connect to it and also differs at different times of the day.
Read 'different response' as: different pinna activation, seal, positioning, product variance.
Now if we want to measure 10 different DACs then all of them would measure differently as someone is messing with the filters before the measurement (within a certain range for each filter).
If we were to look closely at, and even listen to, the DACs, the measurement squiggly of each DAC would differ. Now... if I were a manufacturer of said 'measurement device' I would want those filters standardized, and ideally not have a guy inside changing the settings depending on which DAC he sees is going to be measured.
The goal is to measure as closely as possible to equal amplitude for all frequencies. This, however, is not possible because of the filters in front of the otherwise linear ADC used.
One HAS to use filters that change here and there. The technical solution is to add a compensation for the (varying) filters.
The manufacturer does this by performing a shitload of (standard) measurements using a well characterized 'reference'.
They get a bunch of different squigglies and average them out. This is the 'average' of the filters used and is offered as 'compensation'.
The idea behind this is that all similarly performing DACs measured on that device have a very similar response, but not the same one. The average is all that is available.
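The averaging step can be sketched as follows. This is a minimal Python illustration with invented numbers. The variable names and the three 'runs' are assumptions, and real fixtures are characterized with far more measurements and frequency points.

```python
from statistics import mean

# Invented data: the same reference measured three times (dB at 100 Hz, 3 kHz, 8 kHz).
# Each run differs a bit because the 'filters' (seal, position, pinna) moved.
reference_runs_db = [
    [1.0, 10.0, 3.0],
    [2.0, 12.0, 5.0],
    [3.0, 14.0, 4.0],
]

# Transpose so that each entry holds all runs for one frequency.
per_frequency = list(zip(*reference_runs_db))

# The 'compensation' on offer is simply the per-frequency average.
compensation_db = [mean(values) for values in per_frequency]

# The spread shows how far apart the individual runs are at each frequency.
spread_db = [max(values) - min(values) for values in per_frequency]

print("compensation:", compensation_db)
print("spread:", spread_db)
```

The compensation is one tidy line, but the spread is a reminder that no single run actually looked like that line.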
Now... with speakers in a dead room at 1 single standard position this works fairly well and the guy altering the filters only adjusts the filters slightly.
No one likes to listen in a dead room, nor does anyone own one. Such a speaker sounds different in a 'standard' room (which differs from your own room with 100% certainty).
So people measure how the overall 'tonal balance' changes and correlate that with what they hear.
Then they do the same but with headphones measured on an 'uncertain' device.
The researchers give people some 'filters' that they think are the 'correct' ones needed to create the sound people like.
They find settings that differ within a range. They select an average (probably by excluding outliers and averaging the medians) and arrive at the EQ needed to mimic the flat measuring speaker in a room.
To me, that EQ relative to the sound of the speaker measured in the dead room is the 'Harman target'. That's what needs to be done to the 'flat in the dead room' device.
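How such an average might be formed is sketched below in Python. To be clear, this is only a guess at a generic 'drop the outliers, then average' step and not Harman's documented procedure. The listener settings, the 3 dB cutoff and the function name are all hypothetical.

```python
from statistics import mean, median

# Hypothetical bass-shelf gains (dB) chosen by six listeners; not real study data.
listener_gains_db = [3.0, 4.0, 5.0, 4.5, 3.5, 12.0]

def robust_average(gains_db, max_distance_db=3.0):
    """Drop settings far from the median, then average what is left.

    The 3 dB cutoff is an arbitrary assumption for this sketch.
    """
    centre = median(gains_db)
    kept = [g for g in gains_db if abs(g - centre) <= max_distance_db]
    return mean(kept), kept

average_gain_db, kept_gains_db = robust_average(listener_gains_db)
print("kept:", kept_gains_db)
print("average:", average_gain_db)
```

The listener who dialed in 12 dB is left out, and the remaining settings still range from 3 to 5 dB around the average that ends up in the plot.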
The 'flat in the dead room device' is the equivalent of the DAC measurement device with the changing filters + the 'average' compensation they found.
To make that sound good (preferred by the majority of people, not universally nor by everyone), that 'found and averaged EQ' needs to be added to the result of the not so trustworthy measurement device, in order to arrive at a squiggly that has a high correlation to how we would perceive a good speaker in a 'standard' room.
Now ... all DAC measurement device manufacturers use somewhat to substantially different filters and all also arrive at some 'average but not exact' compensations.
The 'Harman target' (the found average EQ) is always the same.
The result is that the 'raw measured squiggly' differs on every DAC measurement device, and each one adds the 'Harman target' to it in order to arrive at the 'most ideal sound'.
For this reason the manufacturers of DAC measurement devices, who all add the same 'Harman target', end up with different raw squigglies and each attempt to use the 'average' filter response they found.
All these manufacturers make something that uses a 'standard' DAC, and the manufacturers of the DAC measurement gear do their best to get the guy who sets the filters in a random way to set them in a specific way, so that when that DAC is used the result (DAC + alteration + compensation) shows a 'flat' response (an agreed upon (standard) speaker in the dead room at the same distance). They then add the 'Harman correction' to it and say... this is industry standard.
And indeed, when the measurement gear from manufacturer A goes to manufacturer B, the results are remarkably similar under the exact same conditions (DAC).
Yet... as soon as another DAC is measured (different HP, different conditions), the guy that comes with the DAC measurement gear is secretly allowed to play with the filter settings at will.
So while the 'standard' DAC measures a nearly perfect response (after the compensation), that won't be the case with other DACs. Some are closer than others, yet... we don't really know which ones, as the guy changing the settings keeps that to himself.
That's why the 'Harman target + correction curve' (often referred to as 'THE Harman curve') is a fine 'starting point', but there IS an unknown uncertainty that could be substantial (dBs) at different frequencies.
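The effect can be shown with a few made-up numbers. In this Python sketch the 'actual filter' stands for what the fixture really did during one measurement, which in practice nobody knows. All values and names are assumptions chosen to make the point, and both devices are assumed to be truly flat.

```python
# Invented values in dB at 100 Hz, 3 kHz and 8 kHz.
average_compensation_db = [0.0, 12.0, 6.0]

# What the fixture actually did in each measurement (unknown in real life).
actual_filter_reference_db = [0.0, 12.0, 6.0]  # matches the average by design
actual_filter_other_db = [-2.0, 10.0, 9.0]     # different seal, position, pinna interaction

true_response_db = [0.0, 0.0, 0.0]  # assumption: both devices are perfectly flat

def compensated_plot(true_db, actual_filter_db, compensation_db):
    """Raw measurement = true response + actual filter; the plot subtracts the average."""
    raw_db = [t + f for t, f in zip(true_db, actual_filter_db)]
    return [r - c for r, c in zip(raw_db, compensation_db)]

plot_reference_db = compensated_plot(
    true_response_db, actual_filter_reference_db, average_compensation_db
)
plot_other_db = compensated_plot(
    true_response_db, actual_filter_other_db, average_compensation_db
)

print("reference:", plot_reference_db)
print("other:", plot_other_db)
```

The second plot deviates by a few dB even though the device itself is flat. The error is entirely the difference between the actual filter and the average one, and from the plot alone there is no way to tell.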
So I agree with this:
I think it makes sense to EQ headphones accurately to the Harman Curve as a starting point in EQ
It does make sense. What I am trying to convey here is that the 'Harman curve' is not the exact curve to follow for each headphone when the goal is to get 'perfect sound quality', and that the 'average' line we see and love in all the plots of the 'raw squiggly' is not necessarily 'correct' as in 'exact'.
Trying to get 'close to that average' in general will bring the tonal balance closer to what has been found in the research (in this case Harman).
Now consider that, aside from the guy that secretly operates the filters and the 'Harman EQ' being added, there is a second guy playing with yet another set of filters (your ear canal, pinna) that have some resemblance to the ones used in the DAC measurement device. On top of that there is an automated device (your brain) inside your head 'undoing' that second guy's filtering, based on an ever (slowly) changing reference that is built from the perception of real sounds around you.
This means... headphone measurements and actual perception do have some correlation, but not an exact one (2 sets of filters, with one 'correction' being fixed and the other semi fixed). It is a mess, and the 'somewhat randomized and highly averaged measurement' squiggly in between the DAC and the brain is not an 'exact' one. It is merely an indication.
Some are more accurate than others within a certain (limited) frequency range.
So don't put too much faith in a highly averaged correction + EQ 'line' as being holy and final. It isn't, but understand that most people assume it is (has to be).
Headphone measurements are indicative at best. They are not as accurate as measuring a DAC without any filters in between that cannot be accurately compensated for.
So 'THE Harman curve' for a specific HATS or other fixture is also just an 'average, and far from exact' curve to follow.
That is the point I am trying to get across. Not to discredit standards, nor to question Harman's research. Just that 'following THE Harman curve' with an EQ based on that particular fixture, while at least giving an improvement over no EQ (in most cases), does not mean the result is 'exactly' correct. Most likely it isn't, even though a nice plot, based on potentially incorrect 'measurements', shows that it is.