But, of course, both as manufacturers, and as consumers, there's a good reason to see what people prefer on the sound side of things when all else is excluded - and that's precisely what we get from research like Toole and Olive's. It won't perfectly predict consumer behavior, because sound isn't the only thing people buy for (and consumers don't have perfect information, sadly), but it's a good guide both for designers and for people, like us, who are really interested in the sound (perhaps to the exclusion of all else).
I think this sentiment is largely correct...if one is designing a speaker as a commercial project, then all else equal (roughly), favoring designs with relatively flat frequency response and consistent off-axis response is a good idea.
I watched the video of Toole's lecture above. He has some excellent arguments for what makes a "good" speaker, including the evidence from the listening tests. But there are some assumptions and matters of opinion he injects into the discussion along the way. (Look at medical research for contrast: researchers are careful not to extrapolate the implications of the research beyond what can be reasonably concluded. So if a researcher is testing a cancer drug in vitro, they report the results of that experiment. They can't jump to the efficacy of the drug in humans...)
I would be very interested to know how the objective aspects of sound, size, cost, and visual design each correlate with sales!
Some considerations:
- Nobody listens to speakers blinded. You can only draw limited conclusions about a product if you don't test in relatively real-world conditions. Blinded tests cannot be used to fully test phenomena that involve visual elements. Visual input has an effect on sound, and this is not just placebo. (Prove this to yourself by sitting in front of your speakers listening closely, then close your eyes.) I'm pretty confident that visual elements of subjective speaker quality could be ascertained to some level of repeatability. As Toole points out, the effect is so strong that it can't be overcome by the intention of the listener.
- Blinded tests have limited meaning when testing subjective phenomena. For example, if you are testing medication, there will be an objective measurement that will prove efficacy or not. There is no objective way to prove that a person had a better subjective experience (maybe you could do an MRI). You can only rely on the subject's reported experience. In the case of the blinded speaker tests, the repeatability of the results is a proxy for an objective result. This cannot invalidate a given person's subjective experience in the way a blinded speaker wire test could. The differences in speaker sounds are real, and easily determined.
- The classic "audiophool" objection to ABX testing is that differences can't be picked up in short-term tests. This is indeed a ridiculous objection, when subjective reports of sonic improvement are often superlative, implying a substantiality that should be discernible quickly.
When considering speakers, I don't think you can as confidently extrapolate from short-term listening tests. (Harman may have done long-term listening tests, and the results might correlate exactly, I don't know.) I mention this because when evaluating speakers I can usually make a quick determination of what I like that comes close, but I have found that my impression changes over time.
- There's a big element left out of simply asking what makes a speaker "good," which is: good for what? Different kinds of music, from different eras, are flattered by different types of speakers. So if someone listens to a wide range of styles, then a relatively neutral speaker might be the best. In my own case, for fun I listen primarily to rock music from the 60s, 70s, 80s, and 90s.
As Toole mentions, studio monitoring deviated from some objective "norm" of quality for most of this time. The most common monitors would be large soffit-mounted monitors or wooden boxes with two- or three-way systems. The engineers and producers of the time also mixed to the speakers that would be found in the "real world," which were largely wooden boxes! It is my opinion that the rock music of these eras sounds the best on speakers that have these characteristics.
- There is a subtle reason for this, but essentially it goes back to the idea of "transparency." Toole is "old school" and the concept of "hi fi" is rooted in the ability of a playback system to represent the original musical event faithfully. So he talks about an ideal studio monitor being one that would be "transparent" to...what? If you record a violin or voice it's pretty "clear." We know what these sound like, so we can evaluate whether it is being faithfully reproduced. But most modern music exists as a complete acoustic event only at the output of the studio monitors. So if you wanted to be "accurate" in your playback representation, then your best hope is having similar speakers.
Because this is not a realistic possibility, having relatively neutrally voiced speakers is a good strategy for having a speaker that on average does a good job across a lot of different recordings. But many people have a definite taste, and I think different styles of speakers lend themselves to different styles of music. As a simple example, if you have a speaker that can reproduce acoustic instruments with exquisite accuracy, that speaker will likely sound abrasively bright on recordings that have an excessive amount of high-frequency information in them. Which is a lot of recordings.
- I know rock music the best, and one of the huge issues with this is that rock music is produced by loud instruments, often highly distorted. Reproducing this accurately is not possible unless playback volume approaches a similar level (which is loud as fuck). Recording engineers and producers work to create mixes that translate well across the playback systems that exist in the real world. But this is an illusion.
Even with rock, because of the need for mixes to translate, it is preferable to have a relatively neutral speaker as a monitor. But that is not because it is "accurate." It's because it gives you a chance to make a reasonably balanced recording that will translate well. I have a sense of how rock music should sound, and it is shockingly hard to get this effect in a well-controlled, modern, flat speaker. On playback, a speaker cabinet with resonance helps embody the sound and can create the illusion of "rocking" more convincingly. I think part of this is that when a speaker is turned up enough to resonate the cabinet, that translates as "loud" on a psychoacoustic level.
- Toole also makes a comment that if you have a "good speaker" and the playback "sounds bad" you can be confident that it's a "bad recording." Well, bad recordings are the rule of the day. Some speakers can have a distinct voice that can sort of homogenize recordings, and mask deficiencies. This could be a very desirable quality. It's also the kind of thing that would require more extensive, long-term testing, with specific inclusions of bad recordings to figure out what objective characteristics contribute to this.
I believe in science, and am in favor of improving the overall sound experience of the world in general. But when it comes to my personal listening, I have taste, and I hate listening to "accurate" speakers. The Neumann KH120s were so horrific to my ear that, even though I think I could have made them work as studio monitors, there is no reason to subject myself to this sound if I don't have to. I sold those suckers.
At our studio we also have a set of Genelec 8030s with a sub in our main recording room. I'm not sure if these measure well, but this system is strikingly accurate in that the sound in the recording space is represented in the control room to an uncanny level. These are very easy speakers to work on; they are relatively uncolored, so I can kind of stop worrying as much about whether what I'm hearing is what I'm getting. They are also pleasant sounding.
But I would never choose such speakers to listen to for fun. They have no discernible character. The sound is boring. The cabinets are so completely damped that you can barely hear them at all. This leads to a kind of "disembodied" sound which is unnerving. My pet theory is that we are just not evolved to hear disembodied sound. Without the sense of a resonant body, there is no "medium" there. We don't hear sounds, we hear things. If the speaker "disappears," what we are left with is often incoherent.
As far as the future of recorded and reproduced "music" goes, the horses are out of the barn. The notion of "hi fi" reproduction is ever less relevant. Producers of pop, rap, and EDM mix to target specific playback systems which have no relationship to accuracy.
Music itself has crossed a singularity, in which it is decreasingly the product of muscles making movements, in real time, translating energy into physical mediums. Until we can jack the bitstream directly into the cerebral cortex, the need to translate these bits into acoustic energy will remain relevant. Which makes for an interesting world.