long time question here, apart from the famous "natural" line: "it sounds so natural" which can mean anything,
Well...the use of "natural" shouldn't be totally puzzling. If someone reports listening to a well-recorded vocal track on two speakers and simply says "the vocals sounded more natural on speaker A," you should at least understand what that means in a general sense: it means the way voices sound "in nature" rather than "artificially created" - in other words, more like the Real Thing.
The person may be wrong, but you should get the gist of what they mean.
Of course more detail always helps (and when I've seen the term "natural" in reviews, it's usually surrounded by more details about why something sounds more natural). So for instance one could say the vocals sounded more "natural" on speaker A because the sibilance was not exaggerated and hardened in an artificial way, as it was on speaker B.
the second vague thing is timbre, it gets the timbre so real, like real piano, but different pianos got different sounds/timbre in theory.
so, what timbre is good timbre? how to percept in a correct way
Timbre is intriguing.
Even on old phones you could recognize the voice of someone you know. There was enough of the timbre translated (along with recognizable articulations) for you to know "this is grandma's voice." And yet you'd never mistake that sound coming through the phone for the real thing - the person actually talking in front of you.
Likewise you can have a recording of a clarinet and play it through all types of playback systems, even crappy laptop speakers, a kid's Crosley turntable, a smartphone speaker...enough of the timbral/articulation signature comes through to identify it as a clarinet. And yet it doesn't actually sound like a real clarinet would sound. It's missing all sorts of richness, complexity, tonal color.
The challenge in reproducing something like real instrumental timbre was really brought home to me via an experience in the late '90s. I'd visited one of the big audio stores in NYC during a big speaker search. In one large demo room they had the gigantic flagship Genesis speakers (1.2, I believe). I was given a demo and I chose to play some orchestral pieces. It blew me away: the detail, the size and scale of the sound. This was the first time I'd encountered two-channel speakers that could produce something like the scale of a real orchestra, with amazing clarity.
And yet, when I closed my eyes to listen, as I often did when attending orchestral concerts, something was wrong. The fact that the system got closer in all other aspects made the one missing aspect stand out more: the timbral quality, complexity and variety of all the instruments weren't there. Just as there is enough information to identify voices I know through a telephone, there was enough information to identify all the instruments playing. But none actually sounded timbrally like the real thing.
It was like looking at a giant Ansel Adams black and white photo of an orchestra laid out before me: gobs of detail allowing you to see all the instruments, but missing the tonal colors of the real thing. Almost like all the instruments had been replaced by plastic replicas, homogenizing the tonal color...and homogenizing it to something more black and white.
I left wondering if it was just a fool's errand ever expecting a handful of driver materials to be able to replicate the harmonic complexity found in real life sound sources.
And to me most systems produce this same type of homogenization - like the Kii Audio speakers I listened to yesterday. All the detail was there to let me recognize instruments and easily hear all the production choices, reverbs etc. But timbrally, as with the Genesis speakers, it sounded colored, homogenized, reduced to black and white compared to the real thing. I think there is a reason "tonal color" is a synonym for "timbre." It's clear that, at least for a lot of people, sound induces something akin to a sensation of "color." When I close my eyes and listen, colors and hues arise in response to what I'm hearing. I get certain hues from live sound sources that I rarely get when closing my eyes and listening to sound systems.