Yes, I agree that the raw measurements you compare differ by up to 20 dB.
That's what I meant: you want to measure the sound coming from a driver (20 mm away from your head), but you route these sound waves through multiple resonating filters (ear canal and pinna) whose effect would have to be undone in order to get a good relation to perceived sound.
The reason it has to be undone is that we hear through the pinna and ear canal, so the brain already compensates for them based on daily 'calibration'.
Daily calibration must exist, because as we get older our hearing gets worse, yet we don't experience it that way.
We do not - and should not - measure the driver of a headphone (well, except when we do, but not for purposes of assessing consumer headphones). The headphone on a head is an acoustic system which begins at electroacoustic transduction and ends at our eardrum, and it is this system's net response which is significant to our perception of sound.
I must note that there is no single fixed function for our ears - our brains will filter out HRTF effects from timbre, but as Theile demonstrated, only as appropriate to the perceived sound source. If you lack an AES subscription to see the previous link, here is a separate hosting of Theile's seminal paper. Theile and Olive's work appears to imply that diffuse field HRTF inverse filtering gets closest (note that the "basis" of the Harman response, an in-room measurement in a fairly diffusive room, is quite close to a smoothed DF).
So you have a sound source, run it through a Pinna which reacts differently to each headphone (see Rtings research) and then through an ear canal.
Both of these are 'standards' which differ per manufacturer it seems.
I actually would argue that different pinnae do interact differently with different headphones, following Völk's paper here, but I'm somewhat confused that you've taken that away from anything on RTings.
Presuming you are making reference to RTings' PRTF test, wherein pinna-less and pinna-equipped versions of their HMSII.3's ear plate are compared, you can surely see how the data falls into alignment with what I am saying: when we compare the state of being without a pinna to having one, headphones vary in inconsistent ways. However, a pinna in the earpad is the normal state of operation for headphones, so I would argue that it makes far more sense to regard this as reflecting the variability of impacts of removing that key part of the acoustic system. Supporting that would be the fact that putting any pinna on the measurement system yields convergence of results with my HATS, even non-IEC ones like the EARS' - while there are still variations, they are smaller than what we observe on the earless flat plate.
When that compensation is an average one based on free field or whatever field, with speakers as point source(s) a few meters away, then you can be absolutely assured that the measurement is completely and irreparably flawed, simply because the compensation used does NOT belong to that headphone and thus the measurement result differs.
That's why different HATS also differ in raw response and need their own compensation.
That compensation is NEVER made with a headphone.
So... even when compensated the result may differ a LOT from what came out of the driver. And as a result we perceive the sound differently from what is shown by the output of a compensated plot from a HATS.
The compensation should not be a headphone unless the goal is to show the disparity in response between that headphone and some other source for the HATS in question. I use diffuse field as my target because, following from Theile's research, I feel that the data supports that a diffuse field flat headphone will be perceived as accurate (perhaps slightly modified with broad shelf filters for preference purposes) - people who prefer to compensate to Sean Olive's target likely feel the same, or prefer to target to the result which is most commonly preferred.
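As a minimal sketch of what compensating to a target means in practice: the target curve (here assumed to be the diffuse-field HRTF of the same HATS) is subtracted from the raw response in dB, so a headphone that matches the target comes out flat. All values and names below are hypothetical and chosen only for illustration; they are not measured data.

```python
# Hypothetical values for illustration only; not measured data.
freqs_hz = [100, 1000, 3000, 8000]
raw_db = [0.0, 1.0, 11.0, 4.0]        # assumed raw response of a headphone on a HATS
df_target_db = [0.0, 0.5, 10.0, 5.0]  # assumed diffuse-field HRTF of that same HATS


def compensate(raw, target):
    """Subtract the target curve from the raw response, point by point, in dB."""
    if len(raw) != len(target):
        raise ValueError("raw and target must share the same frequency grid")
    return [r - t for r, t in zip(raw, target)]


compensated_db = compensate(raw_db, df_target_db)
for f, c in zip(freqs_hz, compensated_db):
    print(f"{f:>5} Hz: {c:+.1f} dB relative to target")
```

The large raw peak near 3 kHz mostly disappears after compensation because the target contains a similar ear-resonance rise; what remains is the deviation of the headphone from the chosen target.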
While HATS do vary, the variations between them are far lower than the variation I show in the comparison of flat plate and HATS data (section 5.2 has overlays of all three major HATS models). There is definitely still room for interpersonal variation in interactions with headphones, and indeed I think this is a potential vector for the variations in frequency response perception that exist vs. some of the data we have, but the fact that work like Olive's can produce strong predictions of preference based on measurements of headphones on systems with pinnae strongly implies that whatever differences do exist are not massive.
I will also point to the various research of Dorte Hammershøi and Henrik Møller on HRTF and headphones, including most pertinently this particular paper. While sadly it does not compare individual diffuse field HRTF to headphone FR on head - which would answer many questions - it nonetheless shows relative response consistency, and notably the range of variation is similar to their other work on HRTF.
Let me invite you to take a look at the K712 and HD650 at Rtings.
I own both, and according to the (compensated) plots these headphones should have a similar tonal balance, right? There are some 5 dB differences locally, but the average tonal balance would be similar.
[...]
These irreparably flawed measurements of mine indicate the K7XX is noticeably less 'forward', is 'warmer' sounding and has less clarity. Also, the more ragged treble response of the K7XX indicates 'coarser' treble compared to the HD650.
[...]
Now the proof is in the pudding here. Compare these headphones directly (sort of level matched in the lower mids) and LISTEN to these headphones. I own both, so it is easy to do. I can assure you these headphones sound totally different tonally, and what I hear corresponds very closely with my irreparably damaged measurements and not with the ones from Rtings.
I was honestly prepared to suggest here that your ears were led by your gear - a problem which has befallen me many times before when I leaned too hard upon poorly-made measurements - but upon comparing the plots, I am uncertain of your issue with RTings' measurements (other than, of course, their rather strange and ill-grounded house curve). RTings' measurements reflect a drop in the center of the midrange for the K712 vs. the HD650, the iconic rise around 2 kHz which most of those AKGs feature, and a generally uneven response characteristic in the treble (which I would put at least some faith in as far as its internal details, as RTings made the commendable-albeit-weird choice to keep the shape of the DF HRTF past around 1.8 kHz, while merging it with a rise into ear resonance which was apparently made from whole cloth), which also seems to cohere with Innerfidelity's measurements of the K7XX and HD650.
I suspect that RTings' tendency to align headphones somewhat arbitrarily (has the alignment their tool uses ever been explained anywhere? I haven't seen it) gives you a misleading perception of how similarly they measure there - the two headphones' response looks different enough to be substantially subjectively different to me, and, indeed, in alignment with my memory of how the two compare for the most part. We are very sensitive to variations in FR in the midrange in particular, and the two diverge substantially there.
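A minimal sketch of why the choice of alignment point can change how similar two plots look, using made-up numbers. This is not RTings' actual alignment method, which is not described here; the curves, the helper names and the reference frequencies are all assumptions for illustration.

```python
# Hypothetical responses in dB at a few frequencies; illustration only.
freqs_hz = [100, 300, 1000, 2000, 6000]
hp_a_db = [2.0, 1.0, 0.0, 1.0, -2.0]
hp_b_db = [5.0, 3.0, -1.0, 3.0, -1.0]


def align_at(curve, freqs, ref_hz):
    """Shift a curve vertically so that it reads 0 dB at ref_hz."""
    offset = curve[freqs.index(ref_hz)]
    return [v - offset for v in curve]


def difference(a, b, freqs, ref_hz):
    """Point-by-point difference b - a after aligning both curves at ref_hz."""
    a0 = align_at(a, freqs, ref_hz)
    b0 = align_at(b, freqs, ref_hz)
    return [y - x for x, y in zip(a0, b0)]


diff_300 = difference(hp_a_db, hp_b_db, freqs_hz, 300)
diff_1k = difference(hp_a_db, hp_b_db, freqs_hz, 1000)

print("aligned at 300 Hz:", diff_300)
print("aligned at 1 kHz: ", diff_1k)
# The peak-to-peak span of the difference does not depend on the alignment point.
print("span:", max(diff_300) - min(diff_300), max(diff_1k) - min(diff_1k))
```

Aligned at 300 Hz, the two hypothetical curves overlap at two of the five points and differ mainly by a midrange dip; aligned at 1 kHz, one curve sits above the other almost everywhere. The difference in shape is identical in both cases, since alignment only adds a constant offset.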
Pick 2 headphones that have been measured both on a HATS and by me (please don't do this with plots from other flat plates, they don't measure the same as mine). Now apply EQ based on the different plots.
Different flat plates do have some differences - flush vs. protruding vs. inset microphone mounting being a major impact - but my flat plate which I used for this measurement is fairly analogous to your own, to my understanding: a flush-set microphone in a plate covered in closed-cell foam. Regardless, no difference of earless flat plate construction will resolve the problems we see here - we observe the same in IEC60318-1 couplers.
Then one can truly say which measurement method is closest to 'reality' and which is the most irreparably damaged.
Do these experiments... your ears can be trusted with such a test... and then you know.
Don't take my word on it !!! nor someone else's.
That's all I have to say about it. Graphs should have a relevance to the perceived sonic differences
If we are making our final statements here, I will offer a short summary for mine:
- Disparities in response exist between different systems for measuring headphones which vary based on the headphones measured - that is, cannot be consistently compensated.
- The systems without pinna display the greatest disparities from the systems which most closely resemble the acoustics of a living human's head.
- Measurements conducted on the systems which most closely resemble the acoustics of a living human's head can predict subjective perception with at least fairly good accuracy.
I feel that the implications for what measurement systems are suitable follow fairly clearly from here.
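The first point in the summary above can be sketched numerically. If the difference between two measurement systems were the same for every headphone, one fixed curve would remove it; when it varies by headphone, even the best fixed curve leaves a residual. The figures below are invented purely for illustration and do not come from any measurement.

```python
from statistics import mean

# Hypothetical flat-plate-minus-HATS differences in dB at three frequencies,
# for three made-up headphones. Illustration only, not measured data.
delta_db = {
    "headphone_1": [2.0, 8.0, -4.0],
    "headphone_2": [1.0, 2.0, 5.0],
    "headphone_3": [3.0, -1.0, -1.0],
}


def best_fixed_compensation(deltas):
    """Per-frequency mean difference: the single curve minimising squared residuals."""
    return [mean(column) for column in zip(*deltas.values())]


def residuals(deltas, comp):
    """What remains of each headphone's difference after the fixed compensation."""
    return {
        name: [d - c for d, c in zip(values, comp)]
        for name, values in deltas.items()
    }


comp_db = best_fixed_compensation(delta_db)
resid_db = residuals(delta_db, comp_db)
worst_db = max(abs(r) for values in resid_db.values() for r in values)

print("fixed compensation:", comp_db)
print("worst residual after compensation:", worst_db, "dB")
```

In this invented data set the averaged compensation still leaves errors of several dB for individual headphones, which is the sense in which a headphone-dependent disparity cannot be consistently compensated.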
As an aside, it is odd to me that we still have these sorts of dialogues in 2019.
In 1984, Floyd Toole already wrote of the measurement of headphones on anthropomorphic measurement fixtures as a necessity, including comparisons to in-ear measurements of humans. A recurring theme of the technical discussions that I've had in the world of headphones lately has been that I later find their exact subjects already addressed in literature from before my birth - it seems that in the world of headphones, at least, there is nothing new under the sun.