A point about measurements: we could run hundreds of audio tests if we wanted to. And if someone identified something we are not measuring, we could trivially create a measurement for that as well. I generally run only half a dozen tests because adding more does little to further quantify how linear a device is. Beyond that point we are at the very end of the diminishing-returns tail.
So it is not that "we don't measure everything" but that "we don't need to measure everything" to know how linear and transparent a device is. Yes, we leave 0.0001% to chance. Show us a real test case where that loss matters and I will add a test for it. Indeed, I have done that over the years I have been testing, growing the set to what you see.
Ultimately, we want to get an idea of how well designed and engineered an audio device is. A product that has gone through a rigorous process to achieve extremely low noise and highly transparent operation is going to do similarly well across many tests. The converse holds for a poorly designed one.
Now, there is a class of audio technology where measurements are rendered mostly useless. The aforementioned lossy codecs are one example. Such a pipeline makes decisions to degrade audio on a frame-by-frame basis (a frame is usually from 0.5K to 8K samples), so we can't characterize it with a simple sine wave. The system has "intelligence" to vary its compression ratio and logic dynamically. So we rely almost exclusively on listening tests for evaluation and development of lossy codecs.
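To make that concrete, here is a toy sketch of a frame-adaptive process. It is not a real codec: `toy_frame_codec`, `keep_bins` and the 512-sample frame are all made up for illustration, and the only "decision" it makes is to keep the few strongest frequency bins in each frame. A bin-centered sine wave sails through essentially untouched, while a broadband signal fed to the very same process is badly degraded, so the sine measurement tells you next to nothing.

```python
import cmath
import math
import random

FRAME = 512  # samples per frame (low end of the 0.5K to 8K range)

# Twiddle factors for a naive DFT: slow, but needs no third-party library.
_TW = [cmath.exp(-2j * math.pi * k / FRAME) for k in range(FRAME)]

def dft(frame):
    return [sum(x * _TW[(k * n) % FRAME] for n, x in enumerate(frame))
            for k in range(FRAME)]

def idft(spectrum):
    return [sum(c * _TW[(k * n) % FRAME].conjugate()
                for k, c in enumerate(spectrum)).real / FRAME
            for n in range(FRAME)]

def toy_frame_codec(signal, keep_bins=4):
    """Hypothetical content-adaptive process: per frame, keep only the
    strongest bins (plus their mirror images) and discard everything else."""
    decoded = []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        spectrum = dft(signal[start:start + FRAME])
        half = range(FRAME // 2 + 1)
        strongest = sorted(half, key=lambda k: abs(spectrum[k]))[-keep_bins:]
        kept = [0j] * FRAME
        for k in strongest:
            kept[k] = spectrum[k]
            kept[-k % FRAME] = spectrum[-k % FRAME]  # conjugate-mirror bin
        decoded.extend(idft(kept))
    return decoded

def error_db(original, decoded):
    """Error energy relative to signal energy, in dB (lower is better)."""
    err = sum((a - b) ** 2 for a, b in zip(original, decoded))
    ref = sum(a * a for a in original)
    return 10 * math.log10(max(err, 1e-30) / ref)

n_samples = 2 * FRAME
# Sine with exactly 16 cycles per frame, so it lands in a single bin.
sine = [math.sin(2 * math.pi * 16 * n / FRAME) for n in range(n_samples)]
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(n_samples)]  # stand-in for busy content

sine_err = error_db(sine, toy_frame_codec(sine))
noise_err = error_db(noise, toy_frame_codec(noise))
print(f"sine error:  {sine_err:.1f} dB")
print(f"noise error: {noise_err:.1f} dB")
```

Real codecs use psychoacoustic models and far more elaborate logic than this, but the issue is the same: what the system does depends on what you feed it.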
Audio devices, in contrast, are totally dumb. They keep no past history other than, say, in a filter. And that filter performs its function statically, not varying its operation based on any logic. There can be some memory effect if you drive a system beyond its bandwidth with things like artificial impulse functions. Such functions, though, are "illegal" audio signals as they require infinite bandwidth. Net, net, there is too little mystery in the operation of audio devices to justify dispensing with measurements and requiring mandatory listening tests.
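As a rough illustration of why a "dumb" device is easy to measure, here is a small sketch. `toy_device` is a hypothetical static nonlinearity with made-up coefficients, not a model of any real product. Because it has no logic and no history, the distortion measured with a 1 kHz sine comes out the same no matter which stretch of the output you analyze.

```python
import math

FS = 48000   # sample rate in Hz (assumed for this example)
F0 = 1000    # test tone frequency in Hz
N = 4800     # analysis length: 100 ms, a whole number of 1 kHz cycles

def toy_device(x):
    """Hypothetical static transfer curve: output depends only on the
    current input sample. Coefficients are invented for illustration."""
    return x + 0.001 * x * x + 0.0005 * x ** 3

def bin_amplitude(samples, freq):
    """Amplitude of one frequency component, found by correlating the
    samples against a cosine and a sine at that frequency."""
    re = sum(s * math.cos(2 * math.pi * freq * n / FS)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / FS)
             for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

def thd_percent(samples):
    """Harmonics 2 through 5 relative to the fundamental, in percent."""
    fundamental = bin_amplitude(samples, F0)
    harmonics = math.sqrt(sum(bin_amplitude(samples, h * F0) ** 2
                              for h in range(2, 6)))
    return 100 * harmonics / fundamental

tone = [math.sin(2 * math.pi * F0 * n / FS) for n in range(3 * N)]
output = [toy_device(x) for x in tone]

thd_early = thd_percent(output[:N])
thd_late = thd_percent(output[1234:1234 + N])  # arbitrary later stretch
print(f"THD, first segment: {thd_early:.4f}%")
print(f"THD, later segment: {thd_late:.4f}%")
```

Contrast this with a content-adaptive system, where the result of such a test would depend on what signal you happened to feed it.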
But even if listening tests were needed, you had better make them controlled tests. This is especially true for developers of audio technology. They can easily get wedded to their own solutions. For this reason, when my codec team at Microsoft developed new encoders or algorithms, they would ask me to listen to them to verify they were not fooling themselves. And for the development of standardized codecs, blind testing was mandatory.