Mixing in two-channel stereo gives you precise control over side-to-side and front-to-back placement, but there's no knob or fader marked "up and down" - that dimension is outside the mixer's control.
Sidebar - occasionally you get random phase effects that combine with a speaker's vertical lobing to shift the image on the vertical axis, but they're unpredictable and accidental. There are complex EQ processes that produce a comb-filter effect to counter the natural comb filtering at the listener's head - QSound, for example - but again, the end result, while often intriguing, is fundamentally unpredictable from listener to listener.
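To put rough numbers on the comb filtering mentioned above, here's a minimal Python sketch of the simplest case: a signal summed with one delayed copy of itself. The function names and the 0.5 ms delay are my own illustrative assumptions (0.5 ms is roughly 17 cm of extra path at about 343 m/s), not measurements of any real head, speaker or product.

```python
import cmath
import math

def comb_notches(delay_ms, max_hz=20000.0):
    """Notch frequencies (Hz) when a signal is summed with an equal-level
    copy of itself delayed by delay_ms. Notches sit at odd multiples of
    1 / (2 * delay). Hypothetical helper, for illustration only."""
    tau = delay_ms / 1000.0  # delay in seconds
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * tau)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches

def comb_gain_db(freq_hz, delay_ms, mix=1.0):
    """Gain in dB at freq_hz for direct + mix * delayed copy."""
    tau = delay_ms / 1000.0
    h = 1 + mix * cmath.exp(-2j * math.pi * freq_hz * tau)
    # Floor the magnitude so a perfect null doesn't hit log10(0).
    return 20 * math.log10(max(abs(h), 1e-12))

if __name__ == "__main__":
    # Assumed 0.5 ms delay between the direct and reflected paths.
    for f in comb_notches(0.5):
        print(f"notch near {f:.0f} Hz")
```

The thing to notice is how sensitive the notch positions are to the delay: a small change in path length moves every notch, which is one way to see why an effect tuned for one listener's head and ears won't land the same way for the next person.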
In my experience, the effect you're asking about is simply great tracking - i.e. the microphone placement was right and the vocal was recorded really well, with chest, throat and head sounds all present and correct. The live illusion is preserved so well that the listener's brain thinks, yeah, that's a real singer, and pattern recognition and expectation then place the apparent mouth at a real-life height above the floor.
In other words, yes, it's an illusion - convincing, for sure, but not electronically created. It's your brain saying, "That sounds like a person, and most people are between 5' and 6' tall."