It shouldn't be surprising that individual measurements don't tell the whole story for real-world performance. I have done various kinds of computer benchmarking over the years, and I assure you that unless what you are measuring is a good simulation of how you actually intend to use something, the numbers can be very deceiving.
As a specific example, measurements of how a computer performs when reading a single very large file may tell you almost nothing about how it performs when reading a large number of teeny-tiny files that add up to the same size, or how it performs when writing files instead of reading files.
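To make the file example concrete, here is a rough sketch of how one might compare the two cases. The function names, file sizes, and file counts are made up for illustration, and this is only a toy: the operating system's page cache will usually serve freshly written files from memory, so a serious benchmark would need to control for caching and repeat the runs.

```python
import os
import tempfile
import time


def write_files(directory, count, size_each):
    """Create `count` files of `size_each` random bytes; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"part_{i}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size_each))
        paths.append(path)
    return paths


def timed_read(paths):
    """Read every file fully; return (total bytes read, elapsed seconds)."""
    start = time.perf_counter()
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())
    return total, time.perf_counter() - start


def compare(total_bytes=4 * 1024 * 1024, small_count=1024):
    """Time one large file against many small files of the same total size.

    Both numbers are illustrative assumptions, not a recommended workload.
    """
    with tempfile.TemporaryDirectory() as big_dir, \
            tempfile.TemporaryDirectory() as small_dir:
        big = write_files(big_dir, 1, total_bytes)
        small = write_files(small_dir, small_count, total_bytes // small_count)
        big_bytes, big_secs = timed_read(big)
        small_bytes, small_secs = timed_read(small)
    return {"big": (big_bytes, big_secs), "small": (small_bytes, small_secs)}


if __name__ == "__main__":
    for name, (nbytes, secs) in compare().items():
        print(f"{name}: {nbytes} bytes in {secs:.4f} s")
```

Both runs move the same number of bytes, so any difference in timing comes from per-file overhead (opening, closing, metadata lookups) rather than raw throughput, which is exactly what a single-large-file test never exercises.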
In the realm of something physical like bicycles, there has been some controversy over the last few years because laboratory procedures for testing the rolling efficiency of bicycle tires favor skinny, highly inflated tires - the ultimate efficiency would be from a hard steel hoop because it loses no energy to rolling resistance. Yet such a tire would prove to be horribly inefficient for a real human riding on a surface that isn't perfectly smooth, because a rigid tire tends to cause hysteresis in the whole bike/rider system - rattling the rider around, and wasting energy by transforming forward momentum into vertical displacement every time there is a bump or dip in the road. A softer, wider tire absorbs the irregularities in the road surface as deformation of the tire and actually conserves forward momentum much better - even if it performs poorly on a laboratory test rig.
Amir's tests are measurements of specific things, and they are valuable in sussing out shabby engineering, but they measure simple values: the music we listen to doesn't consist entirely of 1 kHz tones and sine waves.
While I can agree with the opening statement, I disagree wholeheartedly with the elaboration. There are benchmarks for those "teeny-tiny" files. Unlike people doing computer benchmarking, though, audiophiles say that measurements don't paint the full picture because benchmarks for the things they're hearing don't exist now (and some insinuate they never could, since there is no way of replicating the brain, and other such nonsense).
Also, not every outlet that does testing is obliged to run every single benchmark in existence in order to prove a point. If you have 25% THD, for example, why would anyone need to worry about the frequency response of an amp or DAC at that point? Likewise with computers: no one cares about the 4K random write speeds of an NVMe SSD if the system is running a 500 MHz processor.
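For anyone unsure what a THD figure like that means, here is a minimal sketch using the common definition: the RMS sum of the harmonic amplitudes divided by the amplitude of the fundamental. The function name and the amplitude values are hypothetical, not taken from any real measurement.

```python
import math


def thd(fundamental, harmonics):
    """Total harmonic distortion as a ratio.

    fundamental: amplitude of the test tone (e.g. the 1 kHz component).
    harmonics: amplitudes of the 2nd, 3rd, ... harmonics, in the same units.
    """
    if fundamental <= 0:
        raise ValueError("fundamental amplitude must be positive")
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental


# Made-up amplitudes, chosen only to show the arithmetic.
clean = thd(1.0, [0.0003, 0.0004])
broken = thd(1.0, [0.2, 0.15])
print(f"clean device: {clean:.4%}")
print(f"badly distorting device: {broken:.0%}")
```

With the second set of made-up values, the harmonics add up to a quarter of the fundamental, which is the kind of result that makes further testing of that device pointless.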
The same goes for bike manufacturers testing efficiency. They're trying to measure one specific thing and apply it consistently across their testing. Tests can always be subpar and deserve better, more sensible standardization, with benchmarks that translate into a better real-world grasp of the bike overall. But that is a far cry from saying individual tests can't tell a complete enough story for certain consumers. Also, no one is looking at just individual measurements. This is why THD isn't the only spec people should be looking at. Things like power ratings also have their place, which is why they're included in any sensible spec sheet.
The error people in this hobby make is a disastrous lapse in critical thinking. It goes something like this:
"Measuring the THD doesn't paint the full picture of how our ears perceive sound, so forget all this measurement nonsense."
Basically, that translates to: because this doesn't serve me as much as I wanted, or it contradicts my subjective experience, I am going to ignore most or all measurements like this, since they might all be deceptive to me now. Oh, and I'm also going to ignore all the proponents of this approach, because they base their whole outlook on this stuff, so even if they say something I have no opinion on, I will ignore that by default as well, just as a precaution.
As if having less information could ever paint a clearer picture of whatever you're inquiring about. This is a massive logical fallacy.