When the front volume isn't completely sealed in over-ears (usually the case in real use) like it is for IEMs, frequencies above 10-12 kHz are created by excursion instead of acceleration.
Looks like you have this the wrong way round. Sound pressure depends on driver excursion in pressure-chamber conditions, which occur in a sealed volume at lower frequencies, and on driver acceleration in free-field conditions, at higher frequencies. Also, not sure where you got the 10-12 kHz figure from. That corresponds to a wavelength of ~2.9-3.4 cm, which will be shorter than the total depth of the front volume (which includes the ear canal, typically ~2.5 cm long on its own), whereas pressure-chamber behaviour needs the wavelength to be much longer than the volume's dimensions. So the 'transition frequency' from lower-frequency excursion-dominated to higher-frequency acceleration-dominated driver behaviour would be much lower than 10 kHz.
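To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. The wavelength part is just c / f with c = 343 m/s. The second function is only a rule of thumb I'm assuming for illustration: it treats the pressure-chamber regime as ending once the front-volume depth reaches some fraction of a wavelength (1/10 here). Both that fraction and the 4.5 cm total depth (ear canal plus a guessed ~2 cm of cup depth) are assumptions, not measured values.

```python
SPEED_OF_SOUND_M_S = 343.0  # air at roughly 20 degrees C


def wavelength_cm(freq_hz: float) -> float:
    """Wavelength in centimetres for a given frequency in air."""
    return SPEED_OF_SOUND_M_S / freq_hz * 100.0


def max_pressure_chamber_freq_hz(depth_cm: float, fraction: float = 0.1) -> float:
    """Frequency at which `depth_cm` equals `fraction` of a wavelength.

    ASSUMPTION: `fraction` is an illustrative rule of thumb, not a hard limit.
    """
    return SPEED_OF_SOUND_M_S * fraction / (depth_cm / 100.0)


for f in (10_000, 12_000):
    print(f"{f / 1000:.0f} kHz -> wavelength {wavelength_cm(f):.1f} cm")

# ASSUMPTION: ~2.5 cm ear canal + ~2 cm cup depth (hypothetical figure).
assumed_depth_cm = 4.5
estimate = max_pressure_chamber_freq_hz(assumed_depth_cm)
print(f"Rough upper limit of pressure-chamber behaviour: {estimate:.0f} Hz")
```

With these assumed numbers the estimate comes out below 1 kHz, so even if you quibble with the fraction or the depth, it is hard to get anywhere near 10 kHz.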
My stance is that psychological factors play a much larger role in perceived spatial qualities over sonic properties.
I suspect both are significant factors. Along with open backs letting environmental sound in, large, deep earcups are psychologically closer (compared to smaller, shallower cups) to listening to speakers, since nothing deforms or touches the pinna. But they also allow more acoustic reflections off the entire pinna (which plays a part in the HRTF you're eponymously enthused by), contributing to what I'd call 'authentic soundstage', plus internal earcup reflections, which I'd call 'fake soundstage'. As an example, the HD600 has relatively shallow pads (especially when worn in), so the distance to the driver is quite short, and it is known for its narrow 'in your head' soundstage. The HD800 has larger, deeper earcups that fully encompass the ear and move the driver further away from it, and is, as we know, renowned for its expansive soundstage (all of this is also true, maybe to a lesser extent, of the Arya). Of course, all of this ultimately shows up in the frequency response measured (or heard) at the eardrum reference point (DRP).