These are some of the very same questions I have asked myself. Keep in mind that I am not as well-versed in the technical aspects as many others here, but I do try to apply scientific thinking to the whole equation.
With that said, I keep going back to the origin. That is, defining what the source of "neutrality" is. To me that would be fidelity to the original source, not the engineer's idea of neutrality. If the engineer has manipulated the recording away from the natural sound of the source, then neutrality cannot exist.
This, however, leads to a circular argument, and once again we come back to "what is neutral?" Take a recording of orchestral music, for example. Already, any recording will sound nothing like sitting in the audience of a music hall. Instead, all of the sound comes from on the stage. Is that fidelity to our "neutral"? Probably not. Is it fidelity to the neutrality of the instruments? More so, I say. What if we then change venues? Recording an orchestra in one hall as opposed to another will lead to different timbre and ambiance, within a certain threshold. We can all identify what a violin or clarinet sounds like no matter the venue, but the reflections and positioning will change our perception of so much of what we hear. When recording, do we take the venue into consideration, and is that then a "neutral" to draw a baseline from?
These are not easy questions to answer, even for me, and I am the one who posed them. I come back to this consistently. It's interesting from a scientific standpoint, but sometimes I just want to enjoy the music. Whatever lets me do that has merit, in my opinion. But I want the sound as uncolored as it can possibly be. In that case, faithfulness to the recording becomes foremost.