I've wondered about the OP's question from time to time. When I was in grad school a long time ago, we were studying pattern recognition and we spent some time learning about some of the mechanics of human perception. My thesis advisor had published a theory that the brain takes what it gets from the optic nerves and essentially performs a 2D Fourier transform of the image, that is, the human visual system operates in Fourier space. This drastically reduces the storage requirements for visual information. This theory explained a lot of things about our vision: why optical illusions work the way they do, or why you can look at a bunch of vertical lines from a certain distance and determine that you are looking at vertical lines without actually "seeing" any individual line, and many other questions about how the brain does certain things so efficiently. (For those interested, this paper discusses the topic: Physiologically-Based Pattern Analysis - ScienceDirect.)
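To make the vertical-lines point concrete, here's a toy numpy sketch (just an illustration, not a model of actual neural processing): an image that is nothing but vertical lines collapses to a handful of Fourier coefficients.

```python
import numpy as np

# 64x64 image of vertical lines, one every 8 pixels
img = np.zeros((64, 64))
img[:, ::8] = 1.0

# 2D Fourier transform; count the coefficients that carry real energy
F = np.abs(np.fft.fft2(img))
significant = (F > 0.01 * F.max()).sum()
print(f"{img.size} pixels -> {significant} significant coefficients")
# Prints: 4096 pixels -> 8 significant coefficients. All of them sit on the
# horizontal-frequency axis: the transform says "vertical lines, spaced 8
# apart" without representing any individual line.
```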
Anyway, we used 2D Fourier transforms, plus some other algorithms (based on what we thought the brain would do), to come up with pattern recognition algorithms that could recognize things like cars, planes, ships, and tanks no matter the view angle, the size, or the color, much like the brain does.
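One reason the Fourier route helps with recognition: the magnitude spectrum doesn't change when the object moves around in the frame. (Invariance to size and view angle needs extra machinery beyond the plain FFT, e.g. log-polar resampling, so some of that presumably lived in the "other algorithms".) A minimal demo:

```python
import numpy as np

rng = np.random.default_rng(0)
obj = rng.random((64, 64))                           # stand-in for any image
shifted = np.roll(obj, shift=(17, -9), axis=(0, 1))  # same image, moved

# Translation changes only the phase of the 2D FFT, never the magnitude
print(np.allclose(np.abs(np.fft.fft2(obj)),
                  np.abs(np.fft.fft2(shifted))))     # True
```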
To me, this is a fascinating concept. Does the brain have similar algorithms for audio? How do we remember sounds? We certainly don't store a recording of everything we've heard in our brain; that would cost far too much storage. But we must store some encoded/transformed version of what we hear, similar to how vision works.
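For what it's worth, the obvious audio analog is something like a short-time Fourier transform: keep a few spectral coefficients per time slice instead of the waveform. Whether the brain does anything like this I can't say, though the cochlea does perform a mechanical frequency decomposition, so it's not a crazy guess. A sketch (the sample rate and frame sizes are arbitrary choices):

```python
import numpy as np

fs = 8000                                   # sample rate in Hz
t = np.arange(fs) / fs                      # one second of audio
x = np.sin(2 * np.pi * 440 * t)             # a 440 Hz tone

# Short-time Fourier transform: windowed frames, magnitude spectrum per frame
frame, hop = 256, 128
window = np.hanning(frame)
frames = [x[i:i + frame] * window for i in range(0, len(x) - frame, hop)]
spectra = np.abs(np.fft.rfft(frames, axis=1))

# 8000 raw samples, but nearly all the energy in every slice sits in one bin
peak_bin = spectra.mean(axis=0).argmax()
print(f"dominant frequency: {peak_bin * fs / frame:.0f} Hz")  # ~440 Hz
```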
So take the common optical illusion of two lines that are actually the same length, where one line has outward-pointing arrows and the other inward-pointing arrows (the Müller-Lyer illusion). You can absolutely measure them and determine the lines are the same length. But after you do a 2D Fourier transform on them, one is clearly longer than the other, and it is the same one that humans say is longer (the illusion) when they look at them.
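One way to make that concrete (this is my own toy reconstruction, not my advisor's actual method, and the cutoff and threshold are arbitrary choices): rasterize both figures, keep only the low spatial frequencies via the 2D FFT, and measure the extent of what survives as a crude proxy for "perceived length". The fins blend into the blurred shaft, and the fins-out figure comes out longer:

```python
import numpy as np

def mueller_lyer(fins_out, size=256, shaft=120, fin=24):
    """Rasterize a Mueller-Lyer figure: horizontal shaft, 45-degree fins."""
    img = np.zeros((size, size))
    cy, cx = size // 2, size // 2
    x0, x1 = cx - shaft // 2, cx + shaft // 2
    img[cy, x0:x1 + 1] = 1.0                 # the shaft
    d = 1 if fins_out else -1                # fins point outward or fold back
    for k in range(fin):
        for sy in (-1, 1):                   # upper and lower fin, both ends
            img[cy + sy * k, x0 - d * k] = 1.0
            img[cy + sy * k, x1 + d * k] = 1.0
    return img

def lowpass(img, cutoff=0.03):
    """Keep only low spatial frequencies (a crude stand-in for the brain)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    fy = np.fft.fftshift(np.fft.fftfreq(img.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(img.shape[1]))
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    F[FX ** 2 + FY ** 2 > cutoff ** 2] = 0
    return np.abs(np.fft.ifft2(np.fft.ifftshift(F)))

def apparent_length(img):
    """Half-max extent of the blurred row through the shaft (a crude proxy)."""
    row = img[img.shape[0] // 2]
    on = np.where(row > 0.5 * row.max())[0]
    return int(on[-1] - on[0])

print(apparent_length(lowpass(mueller_lyer(fins_out=True))),   # longer...
      apparent_length(lowpass(mueller_lyer(fins_out=False))))  # ...than this
```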
So it could be that human hearing, like vision, is not just about what frequencies can be perceived and at what levels, but also about how the signal is perceived after the brain does its transform on it, which it seems that it does. If we knew exactly what the brain was doing, we could measure what happens when we simulate that transform; of course, we don't. We do know some kinds of noise/distortion are more audible than others, perhaps related to what happens in that transformation. And perhaps there are combinations of things that are not well handled by our physiological encoders, for good or bad.
Another phenomenon that has occurred to me: a sound coming from dead center in a stereo image (phantom center) will not sound the same as the same sound played through a center channel speaker, even assuming the speakers and gear are all perfect. They might measure the same using a normal mic, which does not have the gain pattern of a pair of ears; the difference comes from the actual angle at which the sound enters each ear and that ear's gain pattern. This is why you just can't get a phantom center to sound like a center channel speaker, no matter what. Simply rotating our heads in any direction, even slightly, alters the sound, as do the sounds reflecting around the room.
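The delay part of this is easy to put numbers on. With speakers at roughly ±30°, the far speaker's signal reaches each ear a fraction of a millisecond after the near one, and the two copies sum into a comb filter that a real center speaker simply doesn't produce. A sketch (the 0.25 ms delay is an assumed ballpark figure, and this ignores head shadowing, which makes the real picture even messier):

```python
import numpy as np

tau = 0.25e-3                      # assumed extra path delay at one ear (s)
f = np.linspace(20, 20000, 20000)  # audio band in Hz

# Two identical signals summing with a relative delay: |1 + e^{-j*2*pi*f*tau}|
mag = np.abs(1 + np.exp(-2j * np.pi * f * tau))

first_null = f[np.argmin(mag[f < 4000])]   # find the first deep notch
print(f"first comb-filter null near {first_null:.0f} Hz")  # ~2 kHz = 1/(2*tau)
```

Tilt your head and tau changes, so the notches slide around, which lines up with the head-rotation point above.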
So, yeah, I think there are things that cannot be (or are not) measured in practical terms that affect what we think we hear.