If you haven't done so, you need to read my book. I just dipped into this thread after a long absence and this popped up. The resolution of your dilemma is that listeners in these tests show more evidence of responding negatively to flaws than of responding positively to virtues. As you correctly point out, the listeners have no knowledge of how a recording "should" sound. But it turns out that most listeners have an instinct for recognizing how a recording "should not" sound: they respond to characteristics of reproduced sound that are not "normal" for live sounds.
In the multiple-comparison tests I started in 1966 at the NRCC and that have continued since at Harman (A vs B vs C vs D, not just A vs B) it is easy to recognize and separate the timbral characters of different loudspeakers as distinct from the common factor, the recording, whatever it is. This is why the most revealing program material exhibits wide bandwidth and a dense spectrum (complex instrumentation vs. simplicity; wide bandwidth vs simple spectra like voice or solo instruments).
So, as I say in the book, the evidence is that listeners tell us the highest-rated loudspeaker is the least flawed, not the most virtuous, although that is precisely what is meant. Looking at decades of listener response sheets yields enormous volumes of critical comments, some quite colorful, and slim volumes of compliments, mostly versions of "sounds good". Of course, subjective reviewers have added to the verbiage with terms that are often meaningless, but poetic. "High resolution" loudspeakers turn out to be the ones with the fewest timbral distractions; resolution is not an independent variable.
Further analysis showed that the dominant flaw has been resonances, which alter the timbral signature of whatever sound is being reproduced.
The Harman listener training (which you can download and experience yourself) has ONLY to do with recognizing and describing resonances so that useful information can be fed back to the designers, helping them to find and fix audible problems. So, if there is a bias introduced by such training, it is that those trained listeners are very adept at hearing and describing loudspeakers that are not timbrally neutral. Is this a problem? I think not.
Hi Floyd, thanks for your comment. I just ordered your book; I've been meaning to read it for months.
I find it pretty strange that people prefer "neutral" speakers, and that they can identify flaws in speaker reproduction without having access to some kind of original reference.
Three possibilities come to mind:
1- The reference point is the human voice: we have an innate perspective on how it should sound, and can recognize deviations.
2- The reference point is the body of recorded music with which we have cultural familiarity.
3- The reference point is that recording engineers/producers have an innate sense of what sounds good and the ability to craft a signal that sounds the best when played on a neutral system.
I'm a producer/musician/engineer and have worked on many recordings over the years (nothing famous). There is a long, painful, learning curve on crafting mixes that "translate" well. I've gotten much better at this, and the improvement has both conscious and unconscious elements.
It's a very bizarre experience to be in the studio and compare your mix to a famous mix that you know sounds killer everywhere. Both the famous mix and your mix sound great on the studio monitors, but yours sounds great only on the studio monitors!
I experience it as having a sense of "how something should sound." The first half of the battle is getting it that way on the local monitors, the second is getting it to translate. I sometimes think that mixers, the best of them anyway, gain a collective sense of how to make mixes that translate well, based on the world of playback systems that are out there. So the goal of the mix is a kind of "meta object", a reference point that is not dependent on any particular playback system.
One of the main elements of getting a mix that translates well is to make sure the resonances, especially peaks, in the mix are well controlled, which often requires extensive dynamics processing, equalization, and dynamic equalization. The problem with having a resonant peak in the mix's frequency response is that it may sound good on the studio monitors, which have "high resolution" and are usually pretty well built, but when the mix is played on a system that has a resonant peak in the same place as your mix does, it will "blow out" the speaker and become obnoxious.
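To put rough numbers on the "peak on top of a peak" problem, here is a minimal sketch in Python. It assumes that both the boost in the mix and the speaker's resonance can be modeled as a standard peaking-EQ biquad (the widely used Audio EQ Cookbook form). The 4 dB gains, the Q of 4, and the 48 kHz sample rate are made-up illustration values, not measurements of any real mix or speaker.

```python
import cmath
import math


def peaking_gain_db(freq_hz, center_hz, boost_db, q=4.0, sample_rate=48000.0):
    """Gain in dB at freq_hz of one peaking-EQ biquad (Audio EQ Cookbook form)."""
    a = 10.0 ** (boost_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    # Numerator (b) and denominator (a) coefficients of the biquad.
    b0, b1, b2 = 1.0 + alpha * a, -2.0 * math.cos(w0), 1.0 - alpha * a
    a0, a1, a2 = 1.0 + alpha / a, -2.0 * math.cos(w0), 1.0 - alpha / a
    # Evaluate the transfer function on the unit circle; z1 is z^-1.
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (a0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


def chain_gain_db(freq_hz, stages):
    """Total gain of cascaded stages: gains in dB simply add."""
    return sum(peaking_gain_db(freq_hz, center, boost) for center, boost in stages)


# Hypothetical values: a 4 dB boost in the mix and a 4 dB speaker resonance,
# both centered at 5 kHz.
mix_boost = (5000.0, 4.0)
speaker_resonance = (5000.0, 4.0)

for f in (100.0, 5000.0):
    mix_only = chain_gain_db(f, [mix_boost])
    combined = chain_gain_db(f, [mix_boost, speaker_resonance])
    print(f"{f:7.0f} Hz  mix only: {mix_only:5.2f} dB  mix + speaker: {combined:5.2f} dB")
```

Under these assumptions the two 4 dB peaks stack to 8 dB at 5 kHz, while a frequency far from the peak (100 Hz here) is left nearly untouched. That is the sense in which a boost that sounds fine on a neutral monitor can become obnoxious on a speaker with a resonance in the same place.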
So having a mix with a relatively smooth frequency spectrum is a safe bet. But it does not work to put your mix on a spectrum analyzer and process it until it has some desired frequency spectrum. The frequency shape of the mix cannot be separated from the musical ideas embodied in it.
So it might be that there is this kind of "ideal" version of the mix that is crafted with the goal of having a well controlled frequency response, and having this specific frequency "shape" reproduced well is required to hear the best version of the mix.
(For anyone who wonders what I'm babbling on about, the question is this. If I'm listening to a speaker that has a resonant peak at 5 kHz, why would this be a negative? 5 kHz is a very common frequency to boost in mixes. So why would it be bad if the speaker introduced it versus having an engineer introduce it? How would a listener unfamiliar with the material know that the resonant peak shouldn't be there?)
So maybe when the listener hears such a mix (and the reference material I've seen you have used is pretty well recorded) the listener can tell that what they hear, while not problematic or bad necessarily, deviates from the "idealized" meta-mix intended by the artists.
One reason I was so surprised by these results is that in pop music, with its relatively dense mixes, almost nothing is left without extensive EQ, including the voice.
So just the idea that we know what a natural voice should sound like and can hear deviations doesn't seem sufficient to account for the listener preferences.
Of course, it could be a combination of the three possible explanations I gave above, passed through a complicated chain from production to playback.
That is, the engineer knows what sounds good based on innate knowledge, so they can craft a signal that retains this essential quality despite extensive processing, and they know how to craft it in a way that passes all the way to the listener on the other end.
Anyhow, my mind has been relatively blown since coming across your research, because I do not like the way the vast majority of studio monitors sound, and I know at least some of the monitors I have used "measure well" relatively speaking. But they sound "wrong" to me in comparison to my favorite "hi-fi" speakers. The handful of studio monitors that I've used that I think "sound good" have not tended to be such great monitors!
I became frustrated by having to work for hours on end on speakers that "sound bad."
I have not been able to account for my perceptions here.