There's a subtlety that you've missed.
Almost all directional microphones exhibit proximity effect. The select few that don't exhibit it have to pull off a lot of trickery to avoid it. See: EV RE20.
Directional microphones all sit somewhere on the line between pressure (omnidirectional pattern) and pressure-gradient (aka velocity, fig-8 pattern) transducers. Cardioid is the half-way point, and likely the most common directional pattern among microphones. Sub/wide-cardioids are closer to omnidirectional/pressure, while super/hypercardioids are closer to fig-8/velocity.
Omnidirectional mics do not exhibit proximity effect, but fig-8 mics do. As expected, all the mics in between on that scale exhibit proximity effect to a greater or lesser degree.
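To put rough numbers on that scale, here's a minimal sketch using the idealised textbook model of a first-order microphone close to a point source. The function name, the `pattern_a` parameter (1.0 = omni, 0.5 = cardioid, 0.0 = fig-8) and the 343 m/s speed of sound are my own assumptions for illustration; real microphones will deviate from this.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def proximity_boost_db(pattern_a, freq_hz, distance_m):
    """On-axis boost (dB) of an ideal first-order mic near a point source.

    pattern_a is the pressure (omni) share of the pattern
    a + (1 - a) * cos(theta): 1.0 = omni, 0.5 = cardioid, 0.0 = fig-8.
    Idealised model only; not a description of any specific microphone.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber, rad/m
    gradient_share = 1.0 - pattern_a              # pressure-gradient share
    # The gradient part of a spherical wave grows as 1/(k*r) close to the source.
    return 10.0 * math.log10(1.0 + (gradient_share / (k * distance_m)) ** 2)


# Compare the three reference patterns at 100 Hz, 5 cm from the source.
for name, a in [("omni", 1.0), ("cardioid", 0.5), ("fig-8", 0.0)]:
    boost = proximity_boost_db(a, 100.0, 0.05)
    print(f"{name:9s} {boost:5.1f} dB")
```

In this model the omni case comes out at exactly 0 dB regardless of distance, which is the point: a pure pressure transducer has no gradient term to boost.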
The very important thing you've missed in your reasoning is this: our ears act like pressure-based microphones, i.e. omnidirectional, i.e. they do NOT exhibit proximity effect.
Anyone caught up in the microphone-related-proximity-effect discussion, then, is automatically wrong. It's not happening here.
What IS happening here is this: baffle step.
"Baffle step" (and its associated treatment, Baffle Step Compensation) is a step-down in the frequency response that occurs when you take a loudspeaker driver with a flat response on a large baffle, and put it on a small baffle. Over a frequency range related to the dimensions of the baffle, the low-frequency output drops by up to 6dB. This is because the radiation has essentially gone from hemispherical to omnidirectional. Half the pressure going forwards means 6dB is lost on-axis.
This is why close-up loudspeaker measurements require additional processing before they're useful: when you put a microphone in the acoustic nearfield, you capture the signal before 6dB is lost by wrapping around the cabinet and going backwards. Pull the mic further away, and the change in frequency response is obvious.
So, when somebody whispers in your ear and it sounds all bassy, it's because their voice is no longer suffering from baffle-step losses.
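For a feel of the numbers behind the baffle step described above, here's a rough sketch. The 6dB figure is just the halving of on-axis pressure. The transition frequency uses the common rule of thumb f3 ≈ 115 / baffle width in metres, which is an approximation and not an exact result. The function names and the 0.25 m example width are my own assumptions.

```python
import math


def baffle_step_loss_db():
    """Level change when on-axis pressure halves (hemispherical to omnidirectional)."""
    return 20.0 * math.log10(0.5)


def baffle_step_f3_hz(baffle_width_m):
    """Approximate centre of the baffle-step transition.

    Uses the rule of thumb f3 ~= 115 / width (metres); an approximation only.
    """
    return 115.0 / baffle_width_m


print(f"Full baffle-step loss: {baffle_step_loss_db():.2f} dB")
# Hypothetical 0.25 m wide loudspeaker cabinet.
print(f"0.25 m baffle: step centred near {baffle_step_f3_hz(0.25):.0f} Hz")
```

The narrower the baffle, the higher the frequency at which the step sits, so a small source loses bass over a wider part of the audible range once you're out of its nearfield.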
Finally, we can go back to headphones.
Headphones have a few different regions of operation. At the mid-high range, you've got a relatively large source pointing at your outer ear, and that's going to cause all sorts of interesting effects. The desirable frequency response is certainly not flat.
We're concerned about bass here, though. Thankfully, a helpful forum member has pretty much solved this for us:
https://www.audiosciencereview.com/...or-people-wearing-glasses.24574/#post-1759664
When glasses are worn, the headphone driver is essentially still in the same place relative to the ear. The big difference is that the acoustic chamber containing the headphone driver, air space and our ear becomes leaky.
Below its modal region, a room acts as a pressure vessel, which will exhibit a 12dB/octave rising response towards the LF. Real-world rooms are leaky, so you won't get all twelve of those decibels, but you'll probably get some.
Headphones work in the same way, except obviously the pressure zone starts at a much higher frequency.
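As an order-of-magnitude check on "much higher frequency", here's a small sketch that estimates where the pressure zone begins by taking the lowest axial mode of a rigid enclosure, f = c / (2L). The 5 m room and 5 cm earcup cavity are assumed example dimensions, and an earcup is obviously not a rigid rectangular box, so treat the headphone figure as a ballpark only.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def lowest_mode_hz(longest_dimension_m):
    """Lowest axial mode of a rigid enclosure: f = c / (2 * L).

    Below roughly this frequency the enclosure behaves as a pressure zone.
    """
    return SPEED_OF_SOUND / (2.0 * longest_dimension_m)


# Assumed example dimensions, for illustration only.
room_hz = lowest_mode_hz(5.0)     # 5 m long listening room
earcup_hz = lowest_mode_hz(0.05)  # 5 cm earcup cavity

print(f"Room pressure zone below roughly {room_hz:.0f} Hz")
print(f"Earcup pressure zone below roughly {earcup_hz:.0f} Hz")
```

With those assumed sizes the two results differ by a factor of 100, simply because the cavity is 100 times smaller. That puts the whole bass range of a sealed headphone inside the pressure zone, which is why the quality of the seal matters so much.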
@solderdude's post shows this quite clearly.
Now, there are all sorts of caveats about how well-sealed our headphones will be, and even things like hair between the earcup and skin will cause some loss in the bass. Chances are that manufacturers have to do some averaging to work out what's going to work for most people, most of the time.
Hopefully this clears up a few things.
Chris