I totally agree with this. As I said, I think the "musical stuff" is something at a higher level of abstraction, above "audio/sound" in general.
But then even a 100€ system can satisfy those higher levels. Otherwise, it means that the audio is part of it and is directly linked to it, so changing the audio changes the "music".
But let's examine some of your example "objects":
- a guitar and a singer
- a singer and an audience
- a singer and a church
- a sad voice and a happy voice
- a woodwind section and a string section
- a lone voice in a silent room
- a melody and a harmony
- a melody and percussion backing
- a sweet harmony and an angry voice
- a modern sound and an old-fashioned melody
None of them requires a 10k system or an advanced environment. The brain will process the sound and recognize every "object" you have listed.
I would call them "macro objects".
The interesting things happen when you are listening to (i.e. recognizing) "micro objects".
Fortunately you have listed one of them (but there are many, and their number grows the more you get into music and stuff):
- an increase in tension and a resolution
Tension (and dynamics in general) will change every time you play it, whether on different systems or on the same one with a different setup/values.
If you introduce this concept (as well as timbre and such), then the object itself is identified by "what audio you are processing".
Thus, the more you go into details and "low-level" objects, the more they are tied to the source.
And here starts the paradox of searching for the setup that minimizes distortion for a sort of "objective listening", when the simple fact of choosing it introduces differences (otherwise you wouldn't choose it at all). After all, if you can't discriminate anything, you won't choose it.
Simply put: you will reach a point where picking a different setup won't "improve" the objects, just change them.
Just think of the different frequency responses of two QUALITY speakers (take whichever you want, over 10k each if you like): the timbre you get will differ a bit.
Or take the speaker you think is "better" than the other: it still sounds different in different situations. ALWAYS.
Here's the trap I don't get... hehe