It shouldn't be surprising that individual measurements don't tell the whole story about real-world performance. I have done various kinds of computer benchmarking over the years, and I assure you that unless what you are measuring is a good simulation of how you intend to actually use something, the numbers can be very deceiving.
As a specific example, measurements of how a computer performs when reading a single very large file may tell you almost nothing about how it performs when reading a large number of teeny-tiny files that add up to the same size, or how it performs when writing files instead of reading files.
In the realm of something physical like bicycles, there has been some controversy over the last few years because laboratory procedures for testing the efficiency of bicycle tires favor skinny, highly inflated tires for measuring rolling efficiency - the ultimate efficiency would come from a hard steel hoop, because it loses essentially no energy to rolling resistance. Yet such a tire would prove to be horribly inefficient for a real human riding on a surface that isn't perfectly smooth, because a rigid tire tends to cause hysteresis in the whole bike/rider system - rattling the rider around and wasting energy by transforming forward momentum into vertical displacement every time there is a bump or dip in the road. A softer, wider tire absorbs the irregularities in the road surface as deformation of the tire and actually conserves forward momentum much better - even if it performs poorly on a laboratory test rig.
Amir's tests are measurements of specific things, and they are valuable for sussing out shabby engineering, but they measure simple values: the music we listen to doesn't consist entirely of 1 kHz tones and sine waves.