What are your thoughts on some facts which imply that a flat/smooth frequency/sound power response is not always ideal for a good loudspeaker?
With conventional forward-firing loudspeakers, one cannot simultaneously have both a flat frequency response (meaning on-axis) and a flat sound power response, because their directivity increases with frequency. In terms of what is important to listeners in normal small rooms, the direct sound is dominant, and this is well described by the on-axis response. Fifty-seven years of double-blind listening tests in a variety of rooms, using hundreds of listeners and loudspeakers, confirm it.
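A minimal numerical sketch of why both curves cannot be flat at once, assuming the usual definition of directivity index (DI) as the on-axis level minus the sound power level, both in dB. The frequencies and DI values are hypothetical, chosen only to mimic a forward-firing box that is nearly omnidirectional in the bass and increasingly directional toward high frequencies.

```python
# Sketch: relation between on-axis response, directivity index (DI) and sound power.
# Assumption: DI(f) = on_axis_db(f) - sound_power_db(f), all levels in dB.
# The DI values are hypothetical, not measurements of any real loudspeaker.

FREQS_HZ = [100, 300, 1000, 3000, 10000]
HYPOTHETICAL_DI_DB = [1.0, 3.0, 5.0, 7.0, 9.0]  # DI rising with frequency


def sound_power_from_on_axis(on_axis_db, di_db):
    """Sound power level implied by an on-axis level and a DI (all in dB)."""
    return [on - di for on, di in zip(on_axis_db, di_db)]


def on_axis_from_sound_power(sound_power_db, di_db):
    """On-axis level implied by a sound power level and a DI (all in dB)."""
    return [sp + di for sp, di in zip(sound_power_db, di_db)]


flat = [0.0] * len(FREQS_HZ)

# Case 1: flat on-axis response, so the sound power tilts downward by the DI.
sp_case1 = sound_power_from_on_axis(flat, HYPOTHETICAL_DI_DB)

# Case 2: flat sound power, so the on-axis response tilts upward by the DI.
on_case2 = on_axis_from_sound_power(flat, HYPOTHETICAL_DI_DB)

for f, sp, on in zip(FREQS_HZ, sp_case1, on_case2):
    print(f"{f:>6} Hz  flat on-axis -> SP {sp:+.1f} dB | flat SP -> on-axis {on:+.1f} dB")
```

Both curves can be flat only if the DI is constant with frequency, which conventional forward-firing designs do not achieve.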
You asked for my thoughts on other things as well, so I revised an old summary of the scientific work that I initiated 57 years ago (that makes me really old!). It is probably overkill, but you might find it interesting.
As the author of “the book” (Sound Reproduction, 3rd edition), it seems clear to me that not everyone in this discussion has read or understood it. I don’t wish to be repetitive, and apologies to those who already know what follows. The topic is not new to this forum, but some clarification and repetition might help. Humans are complex creatures, and several results from our investigations came as surprises to me and my colleagues. They may surprise you as well.
The first surprise, beginning with the initial crude blind listening tests in 1966, was that most people, most of the time, agreed on the loudspeakers they preferred. The sound quality scores were highly repeatable in successive randomized presentations. Where was the much-touted “personal preference”? People who volunteered their loudspeakers for these early tests sometimes ended up disliking the high-priced “audiophile” products they had been living with. They had clearly adapted to the colorations well enough to enjoy music through them. Having heard loudspeakers that were more appealing, they ended up changing their own. As documented in JAES papers and in “the book”, test methodology has evolved to the point where subjective ratings from properly conducted listening tests can be treated as useful data to be correlated with technical measurements.
The second surprise was that the choice of program material mattered much less than anticipated. Initial thoughts were that “natural”, mostly classical, recordings would be essential for listeners to be able to recognize excellence. It turned out that popular studio creations were as good as, or better than, such recordings in allowing listeners to express consistent ratings. What were they listening for? They could not recognize excellence in something they had never heard before, and which was created with close miking, equalization, and effects, all monitored, mixed and mastered using loudspeakers and rooms of unknown properties.
The clue came from a visual inspection of the associated anechoic measurements. All the highly rated loudspeakers exhibited flattish, smooth frequency responses on and off axis. Meaning what? Meaning an absence of resonances. Resonances are the building blocks of musical sounds, including voices: the timbre of voices and instruments is fundamentally determined by their resonant structure. When resonances in a loudspeaker alter this structure, they become monotonous colorations added to every sound that passes through it. Once revealed, they can be very hard to ignore, and they are heard in programs of many different kinds, even in unfamiliar sounds like pink noise, which is the most revealing of all sounds. Resonances are easily identified in comprehensive anechoic measurements, like spinoramas, but most are invisible in steady-state room curves, which is a source of much confusion. Room curves are a result of, not a target for, loudspeaker performance.
So, the important surprise was that listeners were rating loudspeakers according to the absence of audible problems more than the recognition of virtues. Eliminate the distracting colorations and distortions and the sound quality improves: “realistic”, “high-resolution”, “air”, and many other flattering adjectives apply. Listeners would write short essays, often including profanities, describing what they did not like about low-rated loudspeakers, and offer only brief compliments about the good ones. Program material with wide bandwidth (bass can account for 30% of an overall sound quality rating) and a dense spectrum (complex orchestration and reverberation) turned out to be most revealing of resonances. Solo voices and instruments, not so much. See Section 3.5.1.7 in the 3rd edition for more info. There is music for demonstration and music for examination; they are different, but often confused. “Audiophile” music tends towards the “demo” category, as it sounds good through many loudspeakers without revealing their problems.
As an aside, it is worth noting that such “neutral” loudspeakers tended to “disappear” behind the visually opaque screen used in the double-blind tests. Resonances were identified with the loudspeaker, not the program, and when they were absent, the program was more clearly revealed: there was depth. Tests were mostly done in mono, where the effect was quite apparent. When the tests were repeated in stereo, the high sound quality ratings remained, but were diluted by the fundamental flaws of stereo itself, which have been much discussed elsewhere in this forum.
Luck played a role in this, because in my very first blind listening test I used an equal-loudness, randomly switched, four-loudspeaker experimental method. Loudspeaker positions were randomized between repeated sessions to attenuate room effects. This multiple-loudspeaker method allows listeners to quickly separate the timbre of the program (constant) from that added by the loudspeaker (variable). As we have learned since, listeners quickly learn to “listen through” rooms to a significant extent, as we do in everyday listening and at live performances. More adaptation. The “take it home and listen to it” approach used by consumers and reviewers cannot compete: they often attribute their adaptation to the product “breaking in”, more rubbish from unscientific methodology. But doing proper tests requires work and apparatus, as exemplified by the OP of this thread. Again, congratulations!
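The multiple-loudspeaker method described above can be sketched as a presentation schedule. This is a minimal illustration under stated assumptions, not the software used in those tests: the loudspeaker labels, measured levels and target level are hypothetical, loudness matching is simplified to a single broadband dB level per loudspeaker, and random switching is modeled as a shuffled play order in which each loudspeaker appears equally often.

```python
import random

# Hypothetical broadband levels (dB) measured at the listening position before
# level matching; labels and numbers are illustrative only.
MEASURED_LEVEL_DB = {"A": 84.2, "B": 86.0, "C": 83.5, "D": 85.1}
TARGET_LEVEL_DB = 83.0  # hypothetical common playback level


def level_match_gains(measured, target):
    """Gain (dB) that brings each loudspeaker to the target level.

    Simplification: one broadband level stands in for true loudness matching.
    """
    return {name: target - level for name, level in measured.items()}


def session_plan(names, repeats, rng):
    """One session: random position assignment plus a balanced random play order."""
    positions = list(names)
    rng.shuffle(positions)  # which loudspeaker sits in positions 1..4 this session
    order = list(names) * repeats  # each loudspeaker heard equally often
    rng.shuffle(order)  # ...in a random sequence
    return positions, order


rng = random.Random(0)  # fixed seed so a schedule can be reproduced
gains = level_match_gains(MEASURED_LEVEL_DB, TARGET_LEVEL_DB)
names = sorted(MEASURED_LEVEL_DB)

for session in (1, 2):
    positions, order = session_plan(names, repeats=2, rng=rng)
    print(f"session {session}: positions {positions}, order {order}")
print({name: round(g, 1) for name, g in gains.items()})
```

Re-drawing the position assignment every session spreads each loudspeaker across several room locations, so no single position's room interaction follows one product through the whole test.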
The next logical step was to investigate the audibility of resonances, which itself was a learning experience:
Toole, F. E. and Olive, S. E. (1988). “The modification of timbre by resonances: perception and measurement”, J. Audio Eng. Soc., 36, pp. 122-142. See also Section 4.2 in the 3rd edition.
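To make the idea of a resonance concrete, the sketch below models one as a second-order peaking filter, using the widely circulated Audio EQ Cookbook biquad formulas (R. Bristow-Johnson). This is an illustrative stand-in, not the set of resonances used in the 1988 study; the center frequency, Q, gain and sample rate are hypothetical. It evaluates the filter's magnitude response at a few frequencies: a bump reaching the chosen gain at the center frequency and returning to 0 dB away from it.

```python
import cmath
import math


def peaking_biquad(f0_hz, q, gain_db, fs_hz):
    """Peaking-EQ biquad coefficients (Audio EQ Cookbook), normalized so a0 = 1."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a]
    den = [1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a]
    return [x / den[0] for x in b], [x / den[0] for x in den]


def magnitude_db(b, a, f_hz, fs_hz):
    """Magnitude response (dB) of a biquad evaluated at frequency f_hz."""
    z1 = cmath.exp(-1j * 2 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20 * math.log10(abs(num / den))


# Hypothetical resonance: +6 dB at 1 kHz, Q = 10, 48 kHz sample rate.
FS = 48000
b, a = peaking_biquad(f0_hz=1000, q=10, gain_db=6.0, fs_hz=FS)
for f in (100, 500, 900, 1000, 1100, 2000, 10000):
    print(f"{f:>6} Hz: {magnitude_db(b, a, f, FS):+.2f} dB")
```

Raising Q narrows the bump and lowering it broadens the bump; either way, every signal passed through the filter acquires the same added peak, which is the sense in which a loudspeaker resonance is a coloration imposed on all program material.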
Those who participated in the experiments found themselves hearing resonances in daily life that had previously been ignored. This enhanced sensitivity faded, fortunately, but it emphasized just how important it was to eliminate resonances in loudspeakers. Later Sean Olive developed a training program for listeners that improved their ability to recognize and identify resonances. It is (or was, I don’t know now) available for download. Such listeners became the “trained listeners” in Harman listening tests. They arrived at their opinion of sound quality quickly, and those opinions agreed with those of untrained listeners who simply took longer to form consistent opinions. All listeners were screened for normal hearing.
What about listeners’ life experiences? Are musicians and recording engineers better able to offer definitive opinions than the great unwashed? I discuss this in detail in “the book” and in the 1985/86 JAES papers. None of this is new.
Toole, F. E. (1985). “Subjective measurements of loudspeaker sound quality and listener preferences”, J. Audio Eng. Soc., 33, pp. 2-31.
Toole, F. E. (1986). “Loudspeaker measurements and their relationship to listener preferences”, J. Audio Eng. Soc., 34, pt. 1, pp. 227-235; pt. 2, pp. 323-348.
The answer is maybe, maybe not. Musicians? Perhaps, if they are also audiophiles; if not, all bets are off. Audio professionals/recording engineers? Another surprise, described in detail in the 1986 paper and in the earlier editions of “the book”, was to see measurable effects of hearing loss in the subjective sound quality ratings. The work involved an elaborate collaboration with the Canadian Broadcasting Corporation (CBC/Radio-Canada) to select a family of monitor loudspeakers (small, medium and large) for use across the national network. The CBC provided most of the listeners from its staff, some of whom exhibited high variability in their repeated sound quality ratings. It turned out that they had hearing loss, an occupational hazard in the audio industry, especially when loud monitoring sessions are combined with recreational or professional musical activities. Those professionals with normal hearing all preferred the same loudspeakers as the audiophiles in the same series of tests.
An interesting observation: a couple of the recording engineers stated that they had never heard such good sound before, and it was coming from both pro and consumer products, all having similar-looking, good anechoic measurements. Knowing what they were listening to in their professional lives explained it. Back in the 80s there were some truly dreadful pro monitor loudspeakers in common use; see Sections 12.5.1 and 18.3 in the 3rd edition for some examples. Nowadays there is less difference between the domains (a good loudspeaker is a good loudspeaker), but a professional loudspeaker must not break. “Dead air” is to be avoided, so there is an extra challenge in designing pro speakers, which makes it even more impressive when one finds pro monitors that compete with the best consumer loudspeakers in terms of timbral neutrality and overall sound quality. It can be done. It is a relevant fact that mainstream loudspeakers, including some little wireless “smart” devices, increasingly exhibit quite neutral performance. Active loudspeakers have a huge advantage over their passive equivalents. Notions that recordings need to be “detuned” for mass consumption are misguided. Headphone listeners (the majority?) can find superb sound quality at modest cost these days.
Chapter 17 in the 3rd edition describes some of what is now known about hearing loss, and it is not good news. In the context of listening tests, we lose the ability to form consistent opinions: liking things and then, later, disliking exactly the same sounds. I am not immune to such effects. In my youth I was an excellent listener, delivering sound quality ratings with small standard deviations. With age things changed, and around age 60 I realized that judgements were not coming with the same ease, and this was confirmed by my sound quality rating statistics. I retired from the listening tests. Figure 3.6 shows my hearing thresholds as I have aged, along with examples from the CBC test population. These people simply are not hearing all the sound. I still have opinions, articulately described, but they are now relevant only to me, not for public consumption. Fortunately, we now have spinoramas, from which a neutral loudspeaker can be recognized.
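The consistency measure mentioned here, the spread of one listener's ratings of the same loudspeaker across repeated randomized presentations, can be sketched as a sample standard deviation. The ratings below are invented purely for illustration (a 0 to 10 scale is assumed) and are not data from any of the tests described above.

```python
from statistics import mean, stdev

# Invented example: ratings (0 to 10 scale assumed) given by two hypothetical
# listeners to the same loudspeaker in four repeated, randomized presentations.
ratings = {
    "listener_consistent": [7.0, 7.5, 7.0, 7.5],
    "listener_variable": [4.0, 8.0, 5.5, 8.5],
}


def rating_spread(scores):
    """Mean and sample standard deviation of repeated ratings of one loudspeaker."""
    return mean(scores), stdev(scores)


for name, scores in ratings.items():
    m, s = rating_spread(scores)
    print(f"{name}: mean {m:.2f}, standard deviation {s:.2f}")
```

A small standard deviation means the listener gives the same loudspeaker much the same score each time it appears; a large one is the kind of variability described above.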
The remaining factor is spectral balance: the broadband frequency response trends that are easily heard, especially at low frequencies, where the equal-loudness contours crowd together. See Sections 4.4 and 4.4.1 for elaboration on the meaning of the contours and a discussion of “loudness controls”. Spectral balance is very important to listening satisfaction, and this is a situation where tone controls or easily accessible equalization are essential for fussy listeners. Different programs, for many reasons, exhibit different spectral balances. Most often the differences are in the bass region, which is also affected by playback sound level (the equal-loudness contours again). It is unrealistic to think that one setting, one “calibration”, will sound equally good with all programs at all playback sound levels.
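A bass tone control is commonly implemented as a low-shelving filter. As a minimal sketch, assuming the Audio EQ Cookbook low-shelf biquad (R. Bristow-Johnson) with arbitrary, hypothetical settings (100 Hz corner, shelf slope S = 1, +6 dB boost, 48 kHz sample rate), the code computes how much bass lift such a control applies at a few frequencies. It does not model the equal-loudness contours themselves; it only shows the kind of simple, program-by-program adjustment argued for above.

```python
import cmath
import math


def low_shelf_biquad(f0_hz, gain_db, fs_hz, slope=1.0):
    """Low-shelf biquad coefficients (Audio EQ Cookbook), normalized so a0 = 1."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0_hz / fs_hz
    cw, sa = math.cos(w0), math.sqrt(a)
    alpha = math.sin(w0) / 2 * math.sqrt((a + 1 / a) * (1 / slope - 1) + 2)
    b = [
        a * ((a + 1) - (a - 1) * cw + 2 * sa * alpha),
        2 * a * ((a - 1) - (a + 1) * cw),
        a * ((a + 1) - (a - 1) * cw - 2 * sa * alpha),
    ]
    den = [
        (a + 1) + (a - 1) * cw + 2 * sa * alpha,
        -2 * ((a - 1) + (a + 1) * cw),
        (a + 1) + (a - 1) * cw - 2 * sa * alpha,
    ]
    return [x / den[0] for x in b], [x / den[0] for x in den]


def magnitude_db(b, a, f_hz, fs_hz):
    """Magnitude response (dB) of a biquad evaluated at frequency f_hz."""
    z1 = cmath.exp(-1j * 2 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20 * math.log10(abs(num / den))


# Hypothetical bass tone-control setting: +6 dB shelf, 100 Hz corner, 48 kHz.
FS = 48000
b, a = low_shelf_biquad(f0_hz=100, gain_db=6.0, fs_hz=FS)
for f in (20, 50, 100, 200, 500, 1000, 5000):
    print(f"{f:>5} Hz: {magnitude_db(b, a, f, FS):+.2f} dB")
```

Because the boost is a single parameter, a listener can change it from one program, or one playback level, to the next, rather than relying on one fixed calibration.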
However, if it is all to come together to provide state-of-the-art sound reproduction in our homes, the process must begin with “neutral”, resonance-free loudspeakers. They cannot be reliably identified in steady-state room curves; only anechoic data (or elaborate double-blind listening) can tell the tale.