EXACTLY!! BINGO!!! And THIS is why the demonstrated relationship between measurements and listener preferences may not be fully generalizable to different types of music. As you point out, the type of playback music influences listener preferences of the same speaker. Scroll up to see my rap/hip-hop illustration.
I don't think anyone here (including me) disagrees that measurements can give you an idea of what a speaker sounds like. For instance, you can easily figure out "no bass" or "bright" from FR charts, particularly if they are obvious. But more precisely, what we ultimately want to do is to predict perceived sound quality from measurements. Harman chose to target sound quality by means of blinded listener ratings (or "preferences"). I agree with this. Perhaps you know better than Harman?
Obviously. Bottom line is that Harman research demonstrated that computerized analysis of a series of spinorama measurements was only able to account for 74% of the variation in listener preference scores. If anyone is claiming to be able to predict sound quality with even greater confidence while only eyeballing a single FR chart, then they are making a claim that is way beyond what Harman suggests is possible. Such an astronomical claim "should" give people pause, yet it doesn't. And that, in my opinion, represents a measurement bias here, supporting the OP's original assertion.
Why are you making a strawman argument that ASR folks are "eyeballing a single FR chart"?
@amirm's speaker reviews show on-axis and off-axis FR, multiple horizontal and vertical directivity measures, and distortion performance at various SPLs. All of that might not be sufficient - but no one has to establish that it is sufficient in order to demonstrate that your critique of "eyeballing a single FR chart" is irrelevant, since that's not what anyone's doing.
As for "the demonstrated relationship between measurements and listener preference may not be fully generalizable to different types of music," that is indeed possible. But I also think it misses the point. Bass performance very well might be more decisive with hip-hop and other bass-heavy genres. But that doesn't mean it's useful or particularly meaningful to call a speaker whose F3 is 35Hz, but which has a broad suckout from 1-3kHz and poor directivity, "a better sound quality performer for hip-hop" than a linear, well-behaved speaker whose F3 is 50Hz. (Not to mention, saying a speaker is better for hip-hop because of its bass extension is precisely "predicting perceived sound quality from measurements.")
It's not about listener preference and its relationship to measurements. It's about how we philosophically or conceptually want to approach the question of evaluating speakers. That poorly performing F3-35Hz speaker offers "better sound quality for hip-hop" than the F3-50Hz speaker only if our only two choices are those speakers.
And the entire point of the kinds of speaker measurements that Amir (and Erin, and other measurement-oriented reviewers) performs is to help everyone see what the options are out there in the market, so folks can find an F3-35Hz speaker for similar money that is more linear and has better directivity. Or so they can determine whether or not it's possible to obtain such a speaker at a given price point, or in a given size, or if they need to spend more or use a subwoofer.
The way you are using listener preference and musical genre in this argument seems to ignore this fundamental issue of performance benchmarks that - as all reviews and measurements must do - take into account the possibility of the end user listening to all manner of different genres, not to mention different individual recordings within each genre since those vary widely in production style and spectral content.
So we can, in fact, understand quite a lot about a speaker's quality by examining the measurements. The link between those measurements and listener preference might very well be weaker - but contrary to what you have been repeatedly asserting, that does not mean that our claims about speaker quality are foolish, biased, or misguided. Speaker quality, like any audio-gear benchmark, is about establishing a reference. If we cannot achieve that reference then sure, we can talk about how different listeners who prefer different genres might make different trade-offs depending on personal preference or genre tendencies. But they are trade-offs from the ideal of that quality reference point.
So my view is that many of your points are quite valid in and of themselves - but I don't see them as persuasive evidence against the utility and validity of the kinds of measurements that comprise the speaker reviews here.