I've read that no speaker can be perfect: every speaker is optimized for a specific goal, with necessary compromises.
So if the goal were a speaker through which playback of a vocalist *alone* was indistinguishable, to a blinded listener, from a live, unamplified singer, what measurement characteristics would that speaker have, and how would it best be approximated with real components?
It's a damn good question. The human voice typically uses less than half the frequency spectrum that much music needs, and it typically requires only part of the dynamic range we tend to think we need. But there are some major issues with reproducing it, to wit:
Voice recordings are often badly distorted and falsely embellished when the microphone is very near the mouth. There is the proximity effect (the low-frequency boost directional mics show at close range), plus false emphasis on certain other frequencies and bands, and wide variations in transient reproduction.
To some degree (often a large one), this closeness makes the recording sound as if the singer were 3 inches from your ear, not several feet away at minimum, as you would usually hear them live without amplification. It's like having a singer sing into your ear while you wear hearing protection to keep the volume below the pain threshold. The typical mic does strange and wonderful things to an up-close singer. The result is unreal.
[It's why classical concert vocal soloists are generally miked at least 3-4 feet away, while pop concert singers are often holding a mic at 1-3 inches.]
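As a rough back-of-envelope illustration of how different those two distances are, the inverse-distance law (about 6 dB per doubling of distance) gives the level change of the direct sound between a close and a far mic. This is only a sketch assuming a point source in a free field, which a real singer in a real room is not; it ignores room reflections and the proximity effect entirely, and the function name is hypothetical.

```python
import math


def level_difference_db(near_in: float, far_in: float) -> float:
    """Free-field level difference (dB) of the direct sound from a point
    source heard at near_in vs. far_in inches (inverse-distance law).
    Assumption: no room reflections, no mic proximity effect."""
    return 20.0 * math.log10(far_in / near_in)


# Distances in inches, taken from the post: a pop mic at 1-3 in,
# a classical mic at 36-48 in (3-4 ft).
for near, far in [(3, 36), (3, 48), (1, 48)]:
    print(f"{near} in vs {far} in: {level_difference_db(near, far):.1f} dB")
```

Moving from 3 inches to 3 feet alone is roughly a 21.6 dB drop in direct sound, which is why the balance of direct voice to room sound, and not just loudness, differs so much between the two miking styles.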
Then there's Autotune. In classical concert recordings, we never use Autotune on singers. At pop concerts and in studio pop recordings, we always use Autotune. Always. When a singer is singing 3" from your ear, pitch accuracy becomes critical. Autotune actually becomes part of the singer's vocal character.
Then there are a ton of producer-friendly studio tricks available for pop recordings: compression (attack, hold, decay), limiting, reverb, EQ, "Pultecing", and a sea of other tricks up the producer's sleeve. When they succeed, the singer sounds "better" than life to most listeners, but at the cost of realism.
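To make the compression item concrete, here is a minimal sketch of a compressor's static gain curve: above a threshold, every `ratio` dB of input yields only 1 dB of output. The threshold, ratio, and function name are hypothetical, and this hard-knee curve ignores the attack/release timing, knee shaping, and makeup gain that real compressors add; it is not a model of any particular unit or plugin.

```python
def compressed_level_db(in_db: float, threshold_db: float = -20.0,
                        ratio: float = 4.0) -> float:
    """Static, hard-knee downward compression curve (levels in dB).
    Below threshold the level passes unchanged; above it, the excess
    over threshold is divided by `ratio`. Timing behavior is ignored."""
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / ratio


# Hypothetical quiet and loud vocal passages, in dB relative to full scale.
quiet, loud = -30.0, -6.0
print(loud - quiet, compressed_level_db(loud) - compressed_level_db(quiet))
```

With these assumed settings, a 24 dB swing between the quiet and loud passages shrinks to 13.5 dB, one reason a compressed vocal sounds denser and more "present" than the same voice heard live.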
And there's commonly a "singing booth" used in pop recording, typically a small, heavily damped room with its own coloration. It adds its own brand of jams and jellies to the bread and butter of the voice.
I understand what you're after. But given the vagaries of voice recording, finding a solution is a tough errand. I'll be interested in others' comments, especially from those who practice the art of vocal recording.
Sidebar: I once worked for a studio owner/producer/recordist who typically recorded female vocalists with a U87 (male vocalists with maybe a 44B or similar). Especially with the U87, he could "hear" the slightest off-axis position of the mic. I would regularly be told to "go in there and rotate the vocal mic 3 degrees to her left" or some such. Sure enough, the vocal character would change noticeably. Such is the perception of a musician. With a mic like the 44B, a figure-8 ribbon that picks up from the back as well as the front, his concern was more about the back side of the mic.