What body of science? There is none to correlate diffused field response of speakers to listener preference. Indeed, power response that is derived for that shows negative correlation with listening tests.
I think it's important to specifically delineate why this is - power response reflects only the radiated sound power, rather than the response at the listening position, which is a function of that power, the listening distance, and the room's energy retention. However, listening position response is also not sufficient to predict preference, which is a drum that Sean has been (very rightly) beating for a very long time. The average response at the listening position can approximate what has been found to be preferred by virtue of "two wrongs (not) making a right", when off-axis and on-axis response are both deficient.
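To make "a function of that, listening distance, and room energy retention" concrete, the classical Hopkins-Stryker approximation from statistical room acoustics is one textbook way to write it. This is offered as a rough illustration, not something the argument above depends on. It assumes an idealized diffuse reverberant field, which real listening rooms only loosely approximate. Here L_W is the sound power level, Q the directivity factor, r the listening distance in metres, S the total room surface area in square metres, and ᾱ the mean absorption coefficient. All of these except r and S vary with frequency.

```latex
% Hopkins-Stryker approximation, evaluated per frequency band
% Direct field term: Q / (4 pi r^2); reverberant field term: 4 / R
L_p(f, r) \approx L_W(f) + 10\log_{10}\!\left( \frac{Q(f)}{4\pi r^{2}} + \frac{4}{R(f)} \right),
\qquad
R(f) = \frac{S\,\bar{\alpha}(f)}{1 - \bar{\alpha}(f)}
```

Sound power appears only as the L_W term. The same power response can therefore yield quite different listening position responses depending on Q, r and the room constant R.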
This, of course, is why we care about directivity in speakers - it is not sufficient that the sum at a point in space match a given target; the directional reflections must also not be timbrally corrupted. Tests under CEA-2034 - including the exceptional data you provide from your Klippel - give us a window into a speaker's sound power, its directivity, its axial response, and how its response is likely to translate to a "typical" listening situation. We need all of those inputs (well, maybe less so power response, though it does matter for listening position response) to produce a good model of what sounds good.
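The "two wrongs" point can be shown with the derived curves CEA-2034 reports. Below is a minimal sketch with two hypothetical speakers described only by straight-line tilts. Their estimated in-room responses come out identical, while their listening windows and directivity indices differ a lot. The 12% / 44% / 44% weighting of listening window (LW), early reflections (ER) and sound power (SP) follows the CEA-2034-A estimated in-room response. Applying that weighting directly to the dB curves is an assumption of this sketch. The tilt values and the helper names `tilt`, `estimated_in_room` and `sound_power_di` are made up for illustration.

```python
import numpy as np

# Log-spaced frequency grid in Hz (arbitrary choice for illustration)
FREQS = np.geomspace(20.0, 20000.0, 200)


def tilt(db_per_octave, ref_hz=1000.0):
    """Straight-line response in dB, 0 dB at ref_hz, sloping by db_per_octave."""
    return db_per_octave * np.log2(FREQS / ref_hz)


def estimated_in_room(lw, er, sp):
    """CEA-2034-A style estimate: 12% LW + 44% ER + 44% SP.

    Assumption: the weighting is applied directly to the dB curves.
    """
    return 0.12 * lw + 0.44 * er + 0.44 * sp


def sound_power_di(lw, sp):
    """Sound power directivity index: listening window minus sound power, in dB."""
    return lw - sp


# Speaker A (hypothetical): flat listening window, smoothly falling off-axis curves
lw_a, er_a, sp_a = tilt(0.0), tilt(-0.5), tilt(-1.0)

# Speaker B (hypothetical): bright listening window, duller early reflections and power
lw_b, er_b, sp_b = tilt(+1.1), tilt(-0.7), tilt(-1.1)

pir_a = estimated_in_room(lw_a, er_a, sp_a)
pir_b = estimated_in_room(lw_b, er_b, sp_b)

# Both in-room estimates slope at -0.66 dB/octave, so they coincide ...
print("max in-room difference (dB):", np.max(np.abs(pir_a - pir_b)))
# ... while on-axis behaviour and directivity differ substantially.
print("max listening-window difference (dB):", np.max(np.abs(lw_a - lw_b)))
print("max DI difference (dB):",
      np.max(np.abs(sound_power_di(lw_a, sp_a) - sound_power_di(lw_b, sp_b))))
```

Speaker B's bright axis and dull off-axis sound "cancel" in the in-room estimate. That is exactly why the single-point sum cannot stand in for the full set of curves.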
This is because humans attribute source positions to sounds - they do not arise "out of nowhere", and many of the cues we use in localization are frequency response cues. You're unlikely to be convinced that a speaker in front of you, equalized with the difference between your 0 and 180 degree HRTFs (with zero elevation/on axis to the ear), is in fact behind you, but it will sound truly abhorrent.
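The equalization in that thought experiment can be written out for one ear and the magnitude response only. This ignores interaural cues and head movement, which is the point: those still say "front". X(f) is the source signal, H_0 and H_180 are the frontal and rear HRTFs at zero elevation, and E(f) is the hypothetical equalizer.

```latex
% Ear signal from the frontal speaker after equalization E(f):  X(f) E(f) H_0(f)
% Ear signal that a rear source would have produced:             X(f) H_{180}(f)
E(f) = \frac{H_{180}(f)}{H_{0}(f)}
\quad\Longleftrightarrow\quad
20\log_{10}\lvert E(f)\rvert = 20\log_{10}\lvert H_{180}(f)\rvert - 20\log_{10}\lvert H_{0}(f)\rvert
```

The spectrum at the eardrum then matches a rear source, but every other cue still places the speaker in front. What remains is a strong coloration rather than a convincing relocation.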
Headphone sound is by its nature nonlocalizable (in the absence of DSP or binaural recordings) - we have two sources closely coupled to our ears that track perfectly with our heads as they move, and with stereophonic content we have no directional cues from either interaural differences or our own HRTFs. This is the condition of a diffuse field: sound without direction, and it's why Theile's gestalt model calls for viewing headphone timbre in the context of the head's DFHRTF (diffuse-field HRTF). Analyzing the sound of a headphone with another compensation is as incorrect as analyzing the sound of a speaker with a frontal or 30 degree FFHRTF (free-field HRTF) - you would certainly get a result, but it would imply that a "good" speaker has a response very different from what we know listeners actually prefer.
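Here is a small sketch of why the choice of compensation matters. It subtracts two different reference HRTFs, in dB, from the same eardrum measurement. The four frequencies and every curve value are made-up placeholders, not measured HRTF data. The names `DF_HRTF_DB`, `FF30_HRTF_DB` and `compensate` are hypothetical. The sketch only shows the arithmetic: one headphone reads as "flat" under one reference and as colored under another.

```python
import numpy as np

# Frequencies (Hz) at which the placeholder curves below are sampled
FREQS_HZ = np.array([100.0, 1000.0, 3000.0, 10000.0])

# Made-up placeholder curves in dB, NOT measured data; they only stand in
# for "a diffuse-field HRTF" and "a 30 degree free-field HRTF".
DF_HRTF_DB = np.array([0.0, 1.0, 12.0, 4.0])
FF30_HRTF_DB = np.array([0.0, 2.0, 15.0, 1.0])


def compensate(eardrum_db, reference_db):
    """Eardrum response minus a reference HRTF, both in dB on the same grid."""
    return np.asarray(eardrum_db, dtype=float) - np.asarray(reference_db, dtype=float)


# Hypothetical headphone whose eardrum response equals the DF reference exactly
headphone_db = DF_HRTF_DB.copy()

vs_df = compensate(headphone_db, DF_HRTF_DB)    # flat: all zeros
vs_ff = compensate(headphone_db, FF30_HRTF_DB)  # not flat: nonzero deviations

for f, a, b in zip(FREQS_HZ, vs_df, vs_ff):
    print(f"{f:>8.0f} Hz  DF-comp {a:+5.1f} dB  FF30-comp {b:+5.1f} dB")
```

The subtraction is trivial. The substantive question, as argued above, is which reference reflects the listening condition, and for unlocalized headphone sound that points to the diffuse field.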
A few papers using diffused field for headphone research doesn't remotely come to same level of research we have from Harman. Diffused field testing of HATS was started when we knew much less about listener preference than we do today. Using it as an element for anything is just wrong. And certainly doesn't point to a "wider body of science." Nothing is remotely as researched as Harman's work across a dozen papers and countless
This really seems to do a disservice to the work that Theile, Hammershøi, Møller, and others have done. Their work on headphone targets and binaural sound reproduction wasn't aimed at producing the same output as @Sean Olive's work (a metric for "will this headphone typically be preferred"), so the structure and nature of their experiments were different. Sean and his team have done an immense amount of fantastic work assessing listener preferences, controlling for a pretty formidable range of variables in the process, and any future work will absolutely be indebted to them for it. The Harman papers are extremely good science, and the disrespect paid to them online is unconscionable.
But that doesn't mean they are the only interesting avenues of research on headphones, nor does it mean that we should simply discard other significant results because they don't intuitively align with the Harman results. In particular, I want to harp on that "intuitively" - there's a tendency to assume that past research that doesn't instantly fit the current research programme was erroneous, but highly heterogeneous results, from Sean and co, to Gaetan Lorho, to Theile, all actually align with a coherent model of how humans assess headphone sound. Treating them as inherently in opposition is, in my opinion, a really dire mistake.
To use that, you need to use their GRAS fixture. Going anywhere else means you are on your own with no body of science defending you other than a couple of papers...
This presumes that by using a different fixture, we are necessarily casting aside all of their work and all of its useful conclusions! This is by no means the case, and by this logic, the same would apply to fixtures not using Todd's modified pinna:
By necessity, and for the better, we extrapolate from results rather than treating them as gospel, and ideally we engage in open dialogue with our peers (I've certainly spent far too much time nattering at poor Sean and Todd at this point) to make sure those extrapolations are reasonable.