Blind Listening Test 2: Neumann KH 80 vs JBL 305p MkII vs Edifier R1280T vs RCF Arya Pro5

thecheapseats · Apr 14, 2023

Postlan said:
Interesting, but I don't know him. Contact amir

as you do this for a living (which you mentioned) questioning or asking for clarifications of his statements about mixing and mastering while offering your own real world experienced, accurate definitions were obviously made with precision... I was initially confused by what he said - after reading it again a few times it was clearer - but not necessarily correct - only sorta' correct...

Floyd Toole · Apr 14, 2023

test1223 said:
What are your thoughts on some facts which imply that a flat/smooth frequency/sound power response is not always ideal for a good loudspeaker?

With conventional forward-firing loudspeakers one cannot simultaneously have both flat frequency response (meaning on axis) and flat sound power response. In terms of what is important to listeners in normal small rooms, the direct sound is dominant, and this is well described by the on-axis response. 57 years of double-blind listening tests in a variety of rooms using hundreds of listeners and loudspeakers confirms it.
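To make the trade-off concrete, here is a minimal Python sketch. It assumes the usual definition of directivity index (DI) as on-axis level minus sound power level, in dB. The frequencies and DI values are invented for illustration and do not describe any real loudspeaker.

```python
# Hypothetical directivity index (dB) of a forward-firing two-way speaker.
# DI = on-axis level - sound power level, so it rises as the speaker beams.
ILLUSTRATIVE_DI_DB = {100: 0.5, 1000: 4.0, 10000: 9.0}  # invented values


def sound_power_db(on_axis_level_db, di_db):
    """Sound power level implied by an on-axis level and a DI, all in dB."""
    return on_axis_level_db - di_db


def on_axis_db(sound_power_level_db, di_db):
    """On-axis level implied by a sound power level and a DI, all in dB."""
    return sound_power_level_db + di_db


# Case 1: flat on-axis response (0 dB everywhere) -> sound power tilts down.
flat_on_axis = {f: sound_power_db(0.0, di) for f, di in ILLUSTRATIVE_DI_DB.items()}

# Case 2: flat sound power (0 dB everywhere) -> on-axis response tilts up.
flat_power = {f: on_axis_db(0.0, di) for f, di in ILLUSTRATIVE_DI_DB.items()}

for f in ILLUSTRATIVE_DI_DB:
    print(f"{f:>6} Hz  power with flat on-axis: {flat_on_axis[f]:+.1f} dB   "
          f"on-axis with flat power: {flat_power[f]:+.1f} dB")
```

Unless the DI is constant with frequency, only one of the two curves can be flat, which is the point made above about conventional forward-firing designs.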

You asked for my thoughts on other things as well, so I revised an old summary of the scientific work that I initiated 57 years ago - that makes me really old! It probably is overkill, but you might find it interesting.

As the author of “the book” (Sound Reproduction, 3rd edition) it seems clear to me that not everyone in this discussion has read or understood it. I don’t wish to be repetitive, and apologies to those who already know what follows. The topic is not new to this forum, but some clarification and repetition might help. Humans are complex creatures, and several results from our investigations came as surprises to me and my colleagues. They may to you as well.

The first surprise, beginning with the initial crude blind listening tests in 1966, was that most people, most of the time, agreed on the loudspeakers they preferred. The sound quality scores were highly repeatable in successive randomized presentations. Where was the much touted “personal preference”? People who volunteered their loudspeakers for these early tests sometimes ended up disliking the high-priced “audiophile” products they had been living with. They had clearly adapted to the colorations sufficiently that music was enjoyed. They ended up changing their loudspeakers, having heard ones that were more appealing. As documented in JAES papers and “the book”, test methodology has evolved to the point where subjective ratings from properly conducted listening tests can be treated as useful data to be correlated with technical measurements.

The second surprise was that the choice of program material mattered much less than was anticipated. Initial thoughts were that “natural”, mostly classical, recordings would be essential for listeners to be able to recognize excellence. It turned out that popular, studio creations, were as good or better in allowing listeners to express consistent ratings. What were they listening for? They could not recognize excellence in something they had never heard before, and which was created with close miking, equalization, and effects, all monitored, mixed and mastered using loudspeakers and rooms of unknown properties.

The clue came in a visual inspection of the associated anechoic measurements. All the highly rated loudspeakers exhibited flattish, smooth frequency responses on and off axis, meaning what? Meaning an absence of resonances. Resonances are the building blocks of musical sounds, including voices. The timbre of voices and instruments is fundamentally determined by the resonant structure. If this is changed by resonances in loudspeakers, they become monotonous colorations that are added to every sound that passes through them. Once they are revealed, they can become very hard to ignore, and they are heard in program of many different kinds, even in unfamiliar sounds – like pink noise, which is the most revealing of all sounds. Resonances are easily identified in comprehensive anechoic measurements, like spinoramas, but most are invisible in steady-state room curves, so much confusion exists. Room curves are a result of, not a target for, loudspeaker performance.

So, the important surprise was that listeners were rating loudspeakers according to the absence of audible problems more than the recognition of virtues. Eliminate the distracting colorations and distortions and the sound quality improves: “realistic”, “high-resolution”, “air”, and many other flattering adjectives apply. They would write short essays, often including profanities, to describe what they did not like about lowly-rated loudspeakers, and offer only brief compliments about the good ones. Program material that had wide bandwidth (bass can be 30% of an overall sound quality rating) and a dense spectrum (complex orchestration and reverberation) turned out to be most revealing of resonances. Solo voices and instruments, not so much. See Section 3.5.1.7 in the 3rd edition for more info. There is music for demonstration and music for examination – they are different, but often get confused. “Audiophile” music tends towards the “demo” category, as it sounds good through many loudspeakers – not revealing their problems.

As an aside it is worth noting that such “neutral” loudspeakers had a tendency to “disappear” behind the visually opaque screen used in the double-blind tests. Resonances were identified with the loudspeaker, not the program and when absent, the program was more clearly revealed – there was depth. Tests were mostly done in mono, where the effect was quite apparent. When repeated in stereo, the high sound quality ratings remained, but diluted by the fundamental flaws of stereo itself – much discussed elsewhere in this forum.

Luck played a role in this, because in my very first blind listening test I used an equal-loudness four-loudspeaker, randomly switched experimental method. Loudspeaker positions were randomized between repeated sessions to attenuate room effects. This multiple-loudspeaker method allows listeners to quickly separate the timbre of the program (constant) from that added by the loudspeaker (variable). As we have learned since, listeners quickly learn to “listen through” rooms to a significant extent, as we do in everyday listening and in live performances. More adaptation. The “take it home and listen to it” approach used by consumers and reviewers cannot compete – they often attribute their adaptation to the product “breaking in” – more rubbish from unscientific methodology. But doing proper tests requires work and apparatus, as exemplified by the OP of this thread. Again, congratulations!
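For anyone planning a similar test, the randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the speaker labels, position names, number of switches, and the rule of never repeating the speaker just heard are all hypothetical, and level matching has to be handled separately.

```python
import random

SPEAKERS = ["A", "B", "C", "D"]               # blind labels, hypothetical
POSITIONS = ["pos1", "pos2", "pos3", "pos4"]  # floor positions, hypothetical


def make_session(rng, n_switches=12):
    """Return (placement, switch_order) for one listening session."""
    # Re-deal which loudspeaker sits at which position for this session.
    shuffled = POSITIONS[:]
    rng.shuffle(shuffled)
    placement = dict(zip(SPEAKERS, shuffled))

    # Random switching order, never repeating the speaker just heard
    # (an assumption of this sketch, not something stated in the post).
    order = []
    while len(order) < n_switches:
        choice = rng.choice(SPEAKERS)
        if not order or choice != order[-1]:
            order.append(choice)
    return placement, order


rng = random.Random(1966)  # fixed seed so the schedule is reproducible
sessions = [make_session(rng) for _ in range(3)]
for number, (placement, order) in enumerate(sessions, start=1):
    print(f"session {number}: placement={placement} order={''.join(order)}")
```

Seeding the generator keeps the schedule reproducible, so the operator can log it in advance while the listeners stay blind.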

The next logical step was to investigate the audibility of resonances, which itself was a learning experience. Toole, F. E. and Olive, S.E. (1988). “The modification of timbre by resonances: perception and measurement”, J. Audio Eng. Soc., 36, pp. 122-142. Section 4.2 in the 3rd edition.

Those who participated in the experiments found themselves hearing resonances in daily life that had previously been ignored. This enhanced sensitivity faded, fortunately, but it emphasized just how important it was to eliminate resonances in loudspeakers. Later Sean Olive developed a training program for listeners that improved their ability to recognize and identify resonances. It is (or was, I don’t know now) available for download. Such listeners became the “trained listeners” in Harman listening tests. They arrived at their opinion of sound quality quickly, and those opinions agreed with those of untrained listeners who simply took longer to form consistent opinions. All listeners were screened for normal hearing.

What about listeners’ life experiences? Are musicians and recording engineers better able to offer definitive opinions than the great unwashed? I discuss this in detail in “the book” and in the 1985/86 JAES papers. None of this is new.

Toole, F. E. (1985). “Subjective measurements of loudspeaker sound quality and listener preferences”, J. Audio Eng. Soc., 33. pp. 2-31.

Toole, F. E. (1986). “Loudspeaker measurements and their relationship to listener preferences”, J. Audio Eng. Soc., 34, pt.1, pp. 227-235, pt. 2, pp. 323-348.

The answer is maybe, maybe not. Musicians? Perhaps, if they are also audiophiles; if not, all bets are off. Audio professionals/recording engineers? Another surprise, as described in detail in the 1986 paper, and also in the earlier editions of “the book”, was to see measurable effects of hearing loss in the subjective sound quality ratings. It involved an elaborate collaboration with the Canadian Broadcasting Corporation (CBC/Radio Canada) to select a family of monitor loudspeakers, small, medium and large, for use across the national network. They provided most of the listeners from their staff, some of whom exhibited high variability in their repeated sound quality ratings. It turned out that they had hearing loss, an occupational hazard in the audio industry, especially when loud monitoring sessions are combined with recreational or professional musician activities. Those professionals with normal hearing all preferred the same loudspeakers as the audiophiles in the same series of tests.

An interesting observation: a couple of the recording engineers stated that they had never heard such good sound before – and it was coming from both pro and consumer products, all having similar looking, good, anechoic measurements. Knowing what they were listening to in their professional lives explained it. Back in the 80s there were some truly dreadful pro monitor loudspeakers in common use – see Sections 12.5.1 and 18.3 in the 3rd edition for some examples. Nowadays, there is less difference between the domains – a good loudspeaker is a good loudspeaker – but a professional loudspeaker must not break. “Dead air” is to be avoided, so there is an extra challenge in designing pro speakers, which makes it even more impressive when one finds pro monitors that compete with the best consumer loudspeakers in terms of timbral neutrality and overall sound quality. It can be done. It is a relevant fact that mainstream loudspeakers, including some little wireless “smart” devices, increasingly exhibit quite neutral performance. Active loudspeakers have a huge advantage over their passive equivalents. Notions that recordings need to be “detuned” for mass consumption are misguided. Headphone listeners – the majority? – can find superb sound quality at modest cost these days.

Chapter 17 in the 3rd edition describes some of what is now known about hearing loss and it is not good news. In the context of listening tests, we lose the ability to form consistent opinions – liking things and then, later, disliking exactly the same sounds. I am not immune to such effects. In my youth I was an excellent listener, delivering sound quality ratings with small standard deviations. With age things changed, and around age 60 I realized that judgements were not coming with the same ease, and this was confirmed in my sound rating statistics. I retired from the listening tests. Figure 3.6 shows my hearing thresholds as I age, along with examples from the CBC test population. These people simply are not hearing all the sound. I still have opinions, articulately described, but they are now relevant only to me, not for public consumption. Fortunately, we now have spinoramas, from which a neutral loudspeaker can be recognized.

The remaining factor is spectral balance, the broadband frequency response trends that are easily heard, especially at low frequencies, where the equal-loudness contours crowd together. See Sections 4.4 and 4.4.1 for elaboration on the meaning of the contours and a discussion of “loudness controls”. Spectral balance is very important to listening satisfaction and this is a situation where tone controls or easily accessible equalization are essential for fussy listeners. Different programs, for many reasons, exhibit different spectral balances. Most often this is in the bass region, which is also affected by playback sound level - the equal-loudness curves. It is unrealistic to think that one setting, one “calibration” will sound similarly good with all programs at all playback sound levels.

However, if it is all to come together to provide state-of-the-art sound reproduction in our homes, the process must begin with “neutral”, resonance-free loudspeakers. They cannot be reliably identified in steady-state room curves; only anechoic data - or elaborate double-blind listening - can tell the tale.

Thomas_A · Apr 14, 2023

Floyd Toole said:
First of all, with conventional forward-firing loudspeakers one cannot simultaneously have both flat frequency response (meaning on axis) and flat sound power response. In terms of what is important to listeners in normal small rooms, the direct sound is dominant, and this is well described by the on-axis response.

I have looked around for some controlled tests of omni vs conventional speakers and where the on-axis is the same (linear). In anechoic rooms, there should be no obvious difference, but in normal rooms, is the omni perceived brighter or is there no difference (given the direct sound dominance)?

Newman · Apr 14, 2023

Floyd Toole said:
As the author of “the book” (Sound Reproduction, 3rd edition) it seems clear to me that not everyone in this discussion has read or understood it. I don’t wish to be repetitive, and apologies to those who already know what follows.

I promise that very few, of those who have read and DO understand your book, will object to you being repetitive in this forum in presenting the information, given its importance to proper understanding of preference between loudspeakers!

Floyd Toole · Apr 14, 2023

Thomas_A said:
I have looked around for some controlled tests of omni vs conventional speakers and where the on-axis is the same (linear). In anechoic rooms, there should be no obvious difference, but in normal rooms, is the omni perceived brighter or is there no difference (given the direct sound dominance)?

The closest I can come to answering this involves a personal experience. At the National Research Council we rented our facilities and services to industry, including in this case magazines. We conducted double-blind listening tests and performed anechoic measurements on products being reviewed by two Canadian audio magazines, now defunct (RIP). Among the products evaluated was the Mirage M1, possibly the first well-designed bipole loudspeaker. Over much of its frequency range it was close to being omnidirectional. When compared directly with conventional forward-firing loudspeakers in the (acoustically typical) listening room, it competed very well. Well enough that I bought a pair. My experience is described, with anechoic and room measurements in Section 7.4.6 of the 3rd edition. In my home (long gone) I had two listening rooms, a home theater and a large, irregularly shaped, high ceilinged classical music "concert hall". The omni M1s served well as a substitute orchestra in that space with me listening 22 ft away. I liked that space and that experience, but now I have an elaborate multichannel system that I would say is even better and much more flexible for stereo upmixes.

MatthewS · Apr 14, 2023

@Floyd Toole thank you for continuing to weigh in on this discussion (as well as all the research over the years!)

I was wondering if you know if tests that only contained very well performing speakers were conducted? If this was covered in the book and I've forgotten, I apologize. I'm imagining a mix of items like the Neumann and the JBL 305p, etc. I'd consider everyone that participated as untrained (I have a copy of the software Sean Olive developed but I have hesitated to use it simply because I was worried I'd lose the ability to listen without dissecting flaws. It sounds like maybe this isn't as big of a concern.) We found that picking out the flaws or expressing our preference between the very well measuring studio monitors was quite difficult. In our first test, we had some real dogs: https://www.audiosciencereview.com/...stening-test-results-kef-jbl-revel-osd.25818/ and they stood out like a sore thumb. Listening to a good speaker and then a very poor speaker makes the flaws in the poor speaker much more obvious.

Were tests conducted that only contained excellent speakers? Were trained listeners able to pick out the slight imperfections in the various loudspeakers? My guess is that yes, after enough listening.

Floyd Toole · Apr 15, 2023

You anticipate correctly. At Harman we spent many sessions comparing the best loudspeakers we could lay our hands on - which was many, but obviously not all - for competitive analysis. Winning was the only goal of the design engineers. However, the top performing loudspeakers were, as might be expected, caught in a "photo-finish" race. There were small fluctuations in the ratings, correlated with program material, listener attentiveness, and who knows what else, but in some instances, at the end of the exercise the average scores of the best loudspeakers were too close to declare an absolute winner - a statistical tie. Is this a "win"? Not really, because perfection, as indicated in measurements, might need more work. Is it good enough? Probably, because if in a multiple-loudspeaker double-blind a clear winner cannot be declared, it is improbable that a hidden flaw or virtue would be discovered in real-world listening in a private home.
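A rough way to picture a "statistical tie" is to compare confidence intervals of the mean ratings. The Python sketch below is an illustration only: the ratings are invented, the 1.96 multiplier is a normal approximation, and the overlap rule is a simple conservative check rather than the analysis actually used in the tests described.

```python
from math import sqrt
from statistics import mean, stdev

# Invented ratings on a 0-10 scale, for illustration only.
RATINGS = {
    "speaker_X": [7.8, 8.1, 7.5, 8.0, 7.9, 7.6, 8.2, 7.7],
    "speaker_Y": [7.9, 7.7, 8.0, 7.6, 8.1, 7.8, 7.5, 8.0],
    "speaker_Z": [5.1, 4.8, 5.5, 5.0, 4.6, 5.3, 4.9, 5.2],
}

Z_95 = 1.96  # normal approximation; a t-value would be wider for small n


def confidence_interval(scores):
    """Approximate 95% confidence interval of the mean rating."""
    half_width = Z_95 * stdev(scores) / sqrt(len(scores))
    centre = mean(scores)
    return centre - half_width, centre + half_width


def is_tie(scores_a, scores_b):
    """True when the two intervals overlap, so no clear winner is declared."""
    low_a, high_a = confidence_interval(scores_a)
    low_b, high_b = confidence_interval(scores_b)
    return low_a <= high_b and low_b <= high_a


print("X vs Y tie:", is_tie(RATINGS["speaker_X"], RATINGS["speaker_Y"]))
print("X vs Z tie:", is_tie(RATINGS["speaker_X"], RATINGS["speaker_Z"]))
```

With these made-up numbers the two closely rated speakers cannot be separated, while the poorly rated one clearly can.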

As we progress to being able to design and build (consistently!) more neutral loudspeakers the differences will progressively disappear, as they have in well designed electronics. The room resonance effects at low frequencies become the dominant factor in real-world installations, and these can only be dealt with in-situ. Another topic, which is covered in Chapter 8. Overall room acoustics will always be an audible factor, but as discussed in Section 7.6.2 it is clear that humans adapt to "normalize" much of the room effect. To a substantial extent we are able to separate the sound source from the room sound. It is why we enjoy live music in many different venues - the room sounds are distinctive, but the Steinway, the Stradivarius and the vocalists are still timbrally intact.

Ralph_Cramden · Apr 15, 2023

Floyd Toole said:
Among the products evaluated was the Mirage M1, possibly the first well-designed dipole loudspeaker.

Uh-oh. Peter J Walker's ESL-63 wasn't a well-designed dipole?

Blumlein 88 · Apr 15, 2023

Ralph_Cramden said:
Uh-oh. Peter J Walker's ESL-63 wasn't a well-designed dipole?

Well Mr. Walker's wonderful design didn't do well in the tests vs some other speakers. BTW, the Mirage speakers were bipoles. Meaning the sound from front and back were in phase. Panels are dipoles where the front and rear sound is out of phase. That is why the M1s were almost omni-directional over some of the frequency range.

And in case you think me anti-panel or Quad, I owned ESL-63s for a decade as well as a few other ESL panel speakers.

Floyd Toole · Apr 15, 2023

Ralph_Cramden said:
Uh-oh. Peter J Walker's ESL-63 wasn't a well-designed dipole?

Uh-oh, I goofed - actually I just discovered that spell check does not like what I typed: "bipole". The M1 was a bipole - equal sound, same polarity, radiated front and back. Thanks for noticing, I corrected it.

The Quad was a good design. I knew Peter Walker; a clever man and a true gentleman with whom I shared a few Scotches in his living room.

MediumRare · Apr 15, 2023

@Floyd Toole said: However, if it is all to come together to provide state-of-the-art sound reproduction in our homes, the process must begin with “neutral”, resonance-free loudspeakers. They cannot be reliably identified in steady-state room curves, only anechoic data - or elaborate double-blind listening - can tell the tale.

Could you please expand on this point? I was under the impression that we could use REW, close-miked, to approximate the response of a speaker, down to the individual drivers if one mikes closely enough. From a medium distance (say, 1 meter) can we not get enough information (whether with a sweep or with pink noise) to identify a speaker’s problems? Or at least, at the primary listening position, see the interaction of the speaker and the room to identify FR issues, modes and anti-modes, to correct? I have clearly heard (as @amirm also reports) resonances during sweeps. Presumably differences between speakers would be readily apparent, no?

Floyd Toole · Apr 15, 2023

MediumRare said:
@Floyd Toole said: However, if it is all to come together to provide state-of-the-art sound reproduction in our homes, the process must begin with “neutral”, resonance-free loudspeakers. They cannot be reliably identified in steady-state room curves, only anechoic data - or elaborate double-blind listening - can tell the tale.

Could you please expand on this point? I was under the impression that we could use REW, close-miked, to approximate the response of a speaker, down to the individual drivers if one mikes closely enough. From a medium distance (say, 1 meter) can we not get enough information (whether with a sweep or with pink noise) to identify a speaker’s problems? Or at least, at the primary listening position, see the interaction of the speaker and the room to identify FR issues, modes and anti-modes, to correct? I have clearly heard (as @amirm also reports) resonances during sweeps. Presumably differences between speakers would be readily apparent, no?

In previous posts (same thread?) I have discussed the merits and problems of conventional "room curves" measured at the listening position, and environs. They are reliable only below about 500 Hz; in fact they are definitive data at the lowest frequencies. Above that frequency they are a complex sum of direct and reflected sounds, and what an omni microphone reveals is not reliably related to what two ears and a brain perceive. That information is in comprehensive on and off axis anechoic data. As I have said a few times, the steady state room curve is a result, not a target.

Your suggestion of using near-field measurements definitely gets one closer to the anechoic truths, but then we get to the matter: is it close enough? Much data shows that one needs about 1/20-octave resolution to reliably identify and evaluate the presence and audibility of resonances. Such measurements need to be made in the far-field of the source to be reliable - for consumer loudspeakers not less than about 2 m. Many resonances show up in the crossover regions where multiple drivers are active so being in the far field is necessary to see the true acoustical summation of the driver outputs. For this reason a better measurement alternative is to time-window the measurement to avoid corruption by room reflections. This will compromise the measurement resolution at middle and lower frequencies, depending on the size of the room and the placement of loudspeaker and microphone, but the data are "quasi-anechoic" and therefore more trustworthy.
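As a back-of-the-envelope illustration of that compromise, the Python sketch below uses the common rule of thumb that a gate of length T gives a frequency resolution of roughly 1/T, and asks where that resolution becomes as fine as a 1/20-octave band. The gate lengths are hypothetical and the rule ignores window shape, so treat the numbers as rough guidance only.

```python
def gate_resolution_hz(window_ms):
    """Approximate frequency resolution of a gated measurement (1 / T)."""
    return 1000.0 / window_ms


def fractional_octave_bandwidth_hz(centre_hz, fraction=20):
    """Width in Hz of a 1/fraction-octave band centred on centre_hz."""
    half_step = 1.0 / (2 * fraction)
    return centre_hz * (2 ** half_step - 2 ** -half_step)


def lowest_resolved_hz(window_ms, fraction=20):
    """Lowest frequency where the gate resolves 1/fraction-octave detail."""
    relative_width = fractional_octave_bandwidth_hz(1.0, fraction)
    return gate_resolution_hz(window_ms) / relative_width


for window_ms in (3.0, 5.0, 10.0, 20.0):  # hypothetical reflection-free gates
    print(f"{window_ms:5.1f} ms gate: resolution ~{gate_resolution_hz(window_ms):4.0f} Hz, "
          f"1/20-octave detail only above ~{lowest_resolved_hz(window_ms):5.0f} Hz")
```

A 1/20-octave band is only about 3.5% of its centre frequency wide, so short gates in small rooms reach that resolution only in the upper part of the audio band.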

Then there is the matter of off-axis performance which is necessary to separate evidence of resonances from evidence of acoustical interference. Interference ripples often occur in the on-axis response, only to disappear off axis; this is one virtue of using the "listening window" data in the spinorama in an assessment of the direct sound. Resonances show up as similar humps or bumps in all or many curves, while acoustical interference effects change with angle and are therefore attenuated in a spatial average.
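The effect of a spatial average can be shown with a toy model in Python. Everything here is invented for illustration: a +3 dB bump at 2 kHz that is identical at every angle stands in for a resonance, a +/-2 dB ripple whose phase shifts with angle stands in for interference, and the curves are averaged on a power basis over a hypothetical set of angles (not the exact spinorama listening window).

```python
import math

FREQS_HZ = [500 * 2 ** (i / 24) for i in range(97)]  # 500 Hz to 8 kHz
ANGLES_DEG = [-30, -20, -10, 0, 10, 20, 30]          # hypothetical window


def resonance_db(freq_hz):
    """+3 dB bump at 2 kHz that is identical at every angle (invented)."""
    octaves_from_peak = math.log2(freq_hz / 2000.0)
    return 3.0 * math.exp(-(octaves_from_peak / 0.15) ** 2)


def interference_db(freq_hz, angle_deg):
    """+/-2 dB ripple whose phase moves with angle (invented)."""
    phase = math.radians(angle_deg) * 6
    return 2.0 * math.cos(2 * math.pi * freq_hz / 1500.0 + phase)


def curve_db(angle_deg):
    """Toy frequency response in dB at one measurement angle."""
    return [resonance_db(f) + interference_db(f, angle_deg) for f in FREQS_HZ]


def power_average_db(curves):
    """Average several dB curves on a power basis, frequency by frequency."""
    averaged = []
    for levels in zip(*curves):
        mean_power = sum(10 ** (level / 10) for level in levels) / len(levels)
        averaged.append(10 * math.log10(mean_power))
    return averaged


def span_away_from_resonance_db(curve):
    """Max minus min of a curve, ignoring the region around 2 kHz."""
    kept = [level for f, level in zip(FREQS_HZ, curve)
            if abs(math.log2(f / 2000.0)) > 0.6]
    return max(kept) - min(kept)


on_axis = curve_db(0)
window = power_average_db([curve_db(a) for a in ANGLES_DEG])
peak_index = min(range(len(FREQS_HZ)), key=lambda i: abs(FREQS_HZ[i] - 2000.0))

print(f"ripple span on axis:          {span_away_from_resonance_db(on_axis):.2f} dB")
print(f"ripple span after averaging:  {span_away_from_resonance_db(window):.2f} dB")
print(f"level at 2 kHz after average: {window[peak_index]:.2f} dB")
```

In this toy case the angle-dependent ripple largely averages out while the bump common to all angles survives, which is the behaviour described above.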

In the end, "some" data is better than none. In the absence of spinorama or similar anechoic data one can definitely learn useful things about a loudspeaker using steady-state measurements, and much more using gated measurements, but it is necessary to guard against artifacts and the always attractive smoothed, limited resolution, curves at the lower end of gated measurements. A family of on- and off-axis curves can be very revealing, as was shown in my very early 1985/86 JAES publications, samples from which are shown in Figure 5.2 in the 3rd edition.

OK?

MediumRare · Apr 15, 2023

Yes, thanks for that very helpful explanation. One last question about the above: Can the REW Waterfall display be used to identify resonances, especially if two different speakers are compared? (My thought is that would enable the identification of common room resonances, if any, so they can be removed from the speaker comparison.)

Floyd Toole · Apr 15, 2023

MediumRare said:
Yes, thanks for that very helpful explanation. One last question about the above: Can the REW Waterfall display be used to identify resonances, especially if two different speakers are compared? (My thought is that would enable the identification of common room resonances, if any, so they can be removed from the speaker comparison.)

Classic waterfalls - showing amplitude vs. frequency vs. time - simply confirm what is shown in the frequency response, because transducers are minimum-phase devices. That is why smooth and flat curves are desirable - on- and off-axis. If there is a bump in the frequency response that is not acoustical interference it will show ringing. However, it is not the ringing that is reliable evidence of audibility. See Sections 4.6.2, 4.6.3 and 4.6.4 in the 3rd edition for a full explanation.
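The link between a bump and its ringing can be illustrated with the textbook second-order resonance, whose impulse response envelope decays as exp(-pi * f0 * t / Q). The Python sketch below is an idealized model with hypothetical Q values, not a loudspeaker measurement, and it says nothing about audibility.

```python
import math


def decay_time_s(centre_hz, q, drop_db=60.0):
    """Time for a second-order resonance's ringing to fall by drop_db."""
    # Envelope of the impulse response: exp(-pi * f0 * t / Q).
    decay_rate_db_per_s = 20 * math.log10(math.e) * math.pi * centre_hz / q
    return drop_db / decay_rate_db_per_s


for q in (1, 5, 20):  # hypothetical resonances, all centred on 1 kHz
    t_ms = 1000 * decay_time_s(1000.0, q)
    print(f"Q = {q:>2}: ringing falls 60 dB in about {t_ms:.1f} ms")
```

The higher the Q, the narrower the bump in the frequency response and the longer the ringing, so in this model the waterfall and the frequency response carry the same information.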

To get closer to anechoic data you can go outdoors, aim the loudspeaker at the sky and make measurements at points on a 2 m radius. This will be useful at middle and high frequencies. To get lower, one needs a tower for the speaker and mic to get away from the ground.
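To see why a tower helps at low frequencies, here is a small Python sketch of the ground-bounce geometry. It assumes loudspeaker and microphone at the same height above flat, fully reflective ground, a 2 m spacing, a speed of sound of 343 m/s, and the rule of thumb that a measurement window ending just before the reflection is usable down to about 1/T. The heights are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 degrees C


def ground_reflection_delay_ms(height_m, distance_m=2.0):
    """Extra travel time of the ground bounce versus the direct sound.

    Assumes loudspeaker and microphone at the same height above flat,
    reflective ground (image-source geometry).
    """
    direct_m = distance_m
    reflected_m = math.hypot(distance_m, 2 * height_m)
    return 1000 * (reflected_m - direct_m) / SPEED_OF_SOUND_M_S


def lowest_usable_hz(height_m, distance_m=2.0):
    """Rule-of-thumb lower limit of a window that ends before the bounce."""
    return 1000.0 / ground_reflection_delay_ms(height_m, distance_m)


for height_m in (1.0, 3.0, 6.0):  # hypothetical tower heights
    print(f"{height_m:.0f} m up: bounce arrives {ground_reflection_delay_ms(height_m):5.2f} ms "
          f"after the direct sound, usable down to ~{lowest_usable_hz(height_m):4.0f} Hz")
```

Under these assumptions, raising the speaker and microphone from 1 m to 6 m moves the lower limit from several hundred hertz to a few tens of hertz.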

thecheapseats · Apr 15, 2023

fascinating read - specifically for me, regarding three points... First, was your comment that persons' predisposition to finding faults or virtues of a speaker, once they qualified as well-trained listeners, was subtractive...

Floyd Toole said:
...So, the important surprise was that listeners were rating loudspeakers according to the absence of audible problems more than the recognition of virtues... ...Eliminate the distracting colorations and distortions and the sound quality improves... ...They would write short essays... ...to describe what they did not like... and offer only brief compliments about the good ones.

having had more than my share of sales reps and speaker designers asking to demo monitors in my studio control room in my first ten years (yes it was forty+ years ago - and your comment regarding the period's awful monitors is duly noted) it was usually a very quick listen to say "no, they don't sound right"... only occasionally did I agree to keep a pair for a few days so I might listen more closely...

Second point you made...

Floyd Toole said:
Are musicians and recording engineers better able to offer definitive opinions than the great unwashed?... ...None of this is new... ...The answer is maybe, maybe not.... Musicians? If they are also audiophiles, if not all bets are off. Audio professionals/recording engineers? Another surprise... ...was to see measurable effects of hearing loss...

I thoroughly agree - starting a career as a studio musician in L.A. (almost fifty years ago) - I certainly didn't know how to listen... and I didn't learn how to listen - what to listen for - and more importantly 'why' (at times) listening at lower volumes to hear details more clearly, actually worked - until spending a lot of time on the 'other side' of the glass in recording studios with great engineers and notable mastering engineers - who also insisted upon not using 'bad speakers'...

as well, many of them were adamant about not monitoring at absurdly loud levels... to this day it's a physical reflex to adjust volume when working and then pulling the fader down - just because... the House Ear Institute doesn't need any more business - especially mine...

Third - very quickly...

Floyd Toole said:
...Notions that recordings need to be “detuned” for mass consumption are misguided...

just thank you... seriously, man - thank you... you may (I'm guessing) have an idea how great that comment is to hear...

be well...

MattHooper · Apr 15, 2023

Floyd Toole said:
With conventional forward-firing loudspeakers one cannot simultaneously have both flat frequency response (meaning on axis) and flat sound power response. In terms of what is important to listeners in normal small rooms, the direct sound is dominant, and this is well described by the on-axis response. 57 years of double-blind listening tests in a variety of rooms using hundreds of listeners and loudspeakers confirms it.

You asked for my thoughts on other things as well, so I revised an old summary of the scientific work that I initiated 57 years ago - that makes me really old! It probably is overkill, but you might find it interesting.

As the author of “the book” (Sound Reproduction, 3rd edition) it seems clear to me that not everyone in this discussion has read or understood it. I don’t wish to be repetitive, and apologies to those who already know what follows. The topic is not new to this forum, but some clarification and repetition might help. Humans are complex creatures, and several results from our investigations came as surprises to me and my colleagues. They may to you as well.

The first surprise, beginning with the initial crude blind listening tests in 1966, was that most people, most of the time, agreed on the loudspeakers they preferred. The sound quality scores were highly repeatable in successive randomized presentations. Where was the much touted “personal preference”? People who volunteered their loudspeakers for these early tests sometimes ended up disliking the high-priced “audiophile” products they had been living with. They had clearly adapted to the colorations sufficiently that music was enjoyed. They ended up changing their loudspeakers, having heard ones that were more appealing. As documented in JAES papers and “the book” test methodology has evolved to the point where subjective ratings from properly conducted listening tests can be treated as useful data to be correlated with technical measurements.

The second surprise was that the choice of program material mattered much less than was anticipated. Initial thoughts were that “natural”, mostly classical, recordings would be essential for listeners to be able to recognize excellence. It turned out that popular, studio creations, were as good or better in allowing listeners to express consistent ratings. What were they listening for? They could not recognize excellence in something they had never heard before, and which was created with close miking, equalization, and effects, all monitored, mixed and mastered using loudspeakers and rooms of unknown properties.

The clue came in a visual inspection of the associated anechoic measurements. All the highly rated loudspeakers exhibited flattish, smooth frequency responses on and off axis, meaning what? Meaning an absence of resonances. Resonances are the building blocks of musical sounds, including voices. The timbre of voices and instruments is fundamentally determined by the resonant structure. If this is changed by resonances in loudspeakers, they become monotonous colorations that are added to every sound that passes through them. Once they are revealed, they can become very hard to ignore, and they are heard in program of many different kinds, even in unfamiliar sounds – like pink noise, which is the most revealing of all sounds. Resonances are easily identified in comprehensive anechoic measurements, like spinoramas, but most are invisible in steady-state room curves, so much confusion exists. Room curves are a result of, not a target for, loudspeaker performance.

So, the important surprise was that listeners were rating loudspeakers according to the absence of audible problems more than the recognition of virtues. Eliminate the distracting colorations and distortions and the sound quality improves: “realistic”, “high-resolution”, “air”, and many other flattering adjectives apply. They would write short essays, often including profanities, to describe what they did not like about lowly-rated loudspeakers, and offer only brief compliments about the good ones. Program material that had wide bandwidth (bass can be 30% of an overall sound quality rating) and a dense spectrum (complex orchestration and reverberation) turned out to be most revealing of resonances. Solo voices and instruments, not so much. See Section 3.5.1.7 in the 3rd edition for more info. There is music for demonstration and music for examination – they are different, but often get confused. “Audiophile” music tends towards the “demo” category, as it sounds good through many loudspeakers – not revealing their problems.

As an aside it is worth noting that such “neutral” loudspeakers had a tendency to “disappear” behind the visually opaque screen used in the double-blind tests. Resonances were identified with the loudspeaker, not the program and when absent, the program was more clearly revealed – there was depth. Tests were mostly done in mono, where the effect was quite apparent. When repeated in stereo, the high sound quality ratings remained, but diluted by the fundamental flaws of stereo itself – much discussed elsewhere in this forum.

Luck played a role in this, because in my very first blind listening test I used an equal-loudness four-loudspeaker, randomly switched experimental method. Loudspeaker positions were randomized between repeated sessions to attenuate room effects. This multiple-loudspeaker method allows listeners to quickly separate the timbre of the program (constant) from that added by the loudspeaker (variable). As we have learned since, listeners quickly learn to “listen through” rooms to a significant extent, as we do in everyday listening and in live performances. More adaptation. The “take it home and listen to it” approach used by consumers and reviewers cannot compete – they often attribute their adaptation to the product “breaking in” – more rubbish from unscientific methodology. But doing proper tests requires work and apparatus, as exemplified by the OP of this thread. Again, congratulations!

The next logical step was to investigate the audibility of resonances, which itself was a learning experience. Toole, F. E. and Olive, S.E. (1988). “The modification of timbre by resonances: perception and measurement”, J. Audio Eng. Soc., 36, pp. 122-142. Section 4.2 in the 3rd edition.

Those who participated in the experiments found themselves hearing resonances in daily life that had previously been ignored. This enhanced sensitivity faded, fortunately, but it emphasized just how important it was to eliminate resonances in loudspeakers. Later Sean Olive developed a training program for listeners that improved their ability to recognize and identify resonances. It is (or was, I don’t know now) available for download. Such listeners became the “trained listeners” in Harman listening tests. They arrived at their opinion of sound quality quickly, and those opinions agreed with those of untrained listeners who simply took longer to form consistent opinions. All listeners were screened for normal hearing.

What about listeners’ life experiences? Are musicians and recording engineers better able to offer definitive opinions than the great unwashed? I discuss this in detail in “the book” and in the 1985/86 JAES papers. None of this is new.

Toole, F. E. (1985). “Subjective measurements of loudspeaker sound quality and listener preferences”, J. Audio Eng. Soc., 33. pp. 2-31.

Toole, F. E. (1986). “Loudspeaker measurements and their relationship to listener preferences”, J. Audio Eng. Soc., 34, pt.1, pp. 227-235, pt. 2, pp. 323-348.

The answer is maybe, maybe not. Musicians? Perhaps, if they are also audiophiles; if not, all bets are off. Audio professionals/recording engineers? Another surprise, as described in detail in the 1986 paper, and also in the earlier editions of “the book”, was to see measurable effects of hearing loss in the subjective sound quality ratings. It involved an elaborate collaboration with the Canadian Broadcasting Corporation (CBC/Radio Canada) to select a family of monitor loudspeakers (small, medium and large) for use across the national network. They provided most of the listeners from their staff, some of whom exhibited high variability in their repeated sound quality ratings. It turned out that they had hearing loss, an occupational hazard in the audio industry, especially when loud monitoring sessions are combined with recreational or professional musician activities. Those professionals with normal hearing all preferred the same loudspeakers as the audiophiles in the same series of tests.

An interesting observation: a couple of the recording engineers stated that they had never heard such good sound before – and it was coming from both pro and consumer products, all having similar looking, good, anechoic measurements. Knowing what they were listening to in their professional lives explained it. Back in the 80s there were some truly dreadful pro monitor loudspeakers in common use – see Sections 12.5.1 and 18.3 in the 3rd edition for some examples. Nowadays, there is less difference between the domains – a good loudspeaker is a good loudspeaker – but a professional loudspeaker must not break. “Dead air” is to be avoided, so there is an extra challenge in designing pro speakers, which makes it even more impressive when one finds pro monitors that compete with the best consumer loudspeakers in terms of timbral neutrality and overall sound quality. It can be done. It is a relevant fact that mainstream loudspeakers, including some little wireless “smart” devices, increasingly exhibit quite neutral performance. Active loudspeakers have a huge advantage over their passive equivalents. Notions that recordings need to be “detuned” for mass consumption are misguided. Headphone listeners – the majority? – can find superb sound quality at modest cost these days.

Chapter 17 in the 3rd edition describes some of what is now known about hearing loss and it is not good news. In the context of listening tests, we lose the ability to form consistent opinions – liking things and then, later, disliking exactly the same sounds. I am not immune to such effects. In my youth I was an excellent listener, delivering sound quality ratings with small standard deviations. With age things changed, and around age 60 I realized that judgements were not coming with the same ease, and this was confirmed in my sound rating statistics. I retired from the listening tests. Figure 3.6 shows my hearing thresholds as I age, along with examples from the CBC test population. These people simply are not hearing all the sound. I still have opinions, articulately described, but they are now relevant only to me, not for public consumption. Fortunately, we now have spinoramas, from which a neutral loudspeaker can be recognized.

The remaining factor is spectral balance, the broadband frequency response trends that are easily heard, especially at low frequencies, where the equal-loudness contours crowd together. See Sections 4.4, and 4.4.1 for elaboration on the meaning of the contours and a discussion of “loudness controls”. Spectral balance is very important to listening satisfaction and this is a situation where tone controls or easily accessible equalization are essential for fussy listeners. Different programs, for many reasons, exhibit different spectral balances. Most often this is in the bass region, which is also affected by playback sound level - the equal-loudness curves. It is unrealistic to think that one setting, one “calibration” will sound similarly good with all programs at all playback sound levels.

However, if it is all to come together to provide state-of-the-art sound reproduction in our homes, the process must begin with “neutral”, resonance-free loudspeakers. They cannot be reliably identified in steady-state room curves; only anechoic data - or elaborate double-blind listening - can tell the tale.

Wow, was that a great read! So much information packed into a precise summary and narrative. Thanks again!

fineMen · Apr 17, 2023

Floyd Toole said:
You asked for my thoughts on other things as well, so I revised an old summary ....

Thank you so much for coming back to this. At least it clarified for me that e/q for the PIR or the LW ain't that promising. The reasoning behind the listeners' preference appears to me personally a bit far-fetched. How on earth could the ear detect resonances, while a 24bit/96kHz apparatus cannot?

Do we need some new method to evaluate the sound detected by a microphone, replicating the data compilation done by the ear? No more Fourier, but bands and all?

Anyway, I never found the reverse to happily adjusting a speaker by equalization to better ratings. What is necessary to make a 'good' speaker a 'bad' one? I'm especially curious because I would rate the winner in the OP's competition not that high. What if the second best was e/q'ed to mimic the first? Now that spinorama data is routinely available it may still appear as a waste of time to 'destroy', but for science's sake it could be worthwhile. I did it already with my new ones, involuntarily. A Q=7 peak @ 2 kHz, only 3 dB high, and what a mess!
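
For anyone who wants to see what such a deliberately 'spoiled' response looks like on paper, here is a minimal sketch. It is only an illustration under stated assumptions: the post does not say how the peak was produced, so this assumes the common RBJ "Audio EQ Cookbook" peaking biquad and a 48 kHz sample rate, and the function names are made up for the example.

```python
import cmath
import math

def peaking_biquad(f0_hz, gain_db, q, fs_hz):
    """RBJ 'Audio EQ Cookbook' peaking-EQ coefficients, normalized so a[0] == 1."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def gain_db_at(b, a, f_hz, fs_hz):
    """Magnitude of the filter's frequency response at f_hz, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))

FS = 48_000  # assumed sample rate, not stated in the post
b, a = peaking_biquad(2000, 3.0, 7.0, FS)  # the Q=7, +3 dB peak at 2 kHz
for f in (500, 1000, 1800, 2000, 2200, 4000):
    print(f"{f:>5} Hz: {gain_db_at(b, a, f, FS):+.2f} dB")
```

The printout shows how narrow the affected band is: the full 3 dB is reached only at 2 kHz, and the response is essentially back to flat a couple of octaves away.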

Thanks again for the summary.

MatthewS · Apr 17, 2023

fineMen said:
Anyway, I never found the reverse to happily adjusting a speaker by equalization to better ratings. What is necessary to make a 'good' speaker a 'bad' one? I'm especially curious because I would rate the winner in the OP's competition not that high. What if the second best was e/q'ed to mimic the first?

I'll start off with mentioning again that we did not have a large enough sample size to achieve statistical significance.

What I'm reading from this comment is that you don't personally like the JBL 305p MkII and so this result is causing you cognitive dissonance. In the video content, I clearly put forth a possible reason for the JBL slightly edging out the Neumann. The Neumann starts rolling off at 70 Hz. The JBL starts rolling off at about 55 Hz. As @Floyd Toole mentioned earlier, bass accounts for about 30% of the preference rating indicated by listeners.

I'll reproduce an image from the video here showing the estimated in-room response; the JBL advantage in the low frequencies is obvious.

For what it's worth, I scored the Neumann over the JBL in 2/3 songs, but picked the JBL in the other instance. (Turntable broke before I finished the rest.) Looking at the Spinorama data, we can likely predict order of preference quite reliably if we collect a proper sample size.
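
To make the sample-size point concrete, here is a rough sketch using an exact one-sided sign (binomial) test. This is a simplification and an assumption on my part: it treats each listener as casting a single A-versus-B vote rather than giving ratings, and the counts are hypothetical, not the results of this test.

```python
from math import comb

def sign_test_p_value(prefer_a, n_listeners):
    """One-sided exact binomial p-value: the chance of at least `prefer_a` of
    `n_listeners` preferring A if every listener were really flipping a fair coin."""
    tail = sum(comb(n_listeners, k) for k in range(prefer_a, n_listeners + 1))
    return tail / 2 ** n_listeners

# Hypothetical counts (NOT this test's data): the same 70% split at two panel sizes.
for prefer_a, n in ((7, 10), (70, 100)):
    print(f"{prefer_a}/{n} prefer A: p = {sign_test_p_value(prefer_a, n):.4g}")
```

With 7 of 10 listeners the p-value is about 0.17, so a coin flip could easily produce that split; the same 70% preference from 100 listeners would be very hard to explain by chance.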

No. 5 · Apr 17, 2023

fineMen said:
How on earth could the ear detect resonances, while a 24bit/96KHz apparatus cannot?

Well, for one thing, the 24bit/96kHz apparatus is spatially ignorant but your ears are not. Or consider temporal windowing: if the aforementioned apparatus is lumping all acoustic events within 500 ms together, it won’t have a chance to “see” what the ear is perceiving.

Thomas_A · Apr 17, 2023

MatthewS said:
I'll start off with mentioning again that we did not have a large enough sample size to achieve statistical significance.

What I'm reading from this comment is that you don't personally like the JBL 305p MkII and so this result is causing you cognitive dissonance. In the video content, I clearly put forth a possible reason for the JBL slightly edging out the Neumann. The Neumann starts rolling off at 70 Hz. The JBL starts rolling off at about 55 Hz. As @Floyd Toole mentioned earlier, bass accounts for about 30% of the preference rating indicated by listeners.

I'll reproduce an image from the video here showing the estimated in-room response; the JBL advantage in the low frequencies is obvious.

[Attachment 279911: estimated in-room response plot]

For what it's worth, I scored the Neumann over the JBL in 2/3 songs, but picked the JBL in the other instance. (Turntable broke before I finished the rest.) Looking at the Spinorama data, we can likely predict order of preference quite reliably if we collect a proper sample size.

Normalizing them to 1 kHz would also show the JBL as having a bit more energy in the 100-200 Hz range, with increased clarity due to the higher energy in the >2 kHz range relative to the fundamentals of the voice range. That said, if the music content had a lot of 50 Hz energy, you would probably also pick the JBL as better.
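
In case "normalizing to 1 kHz" is unclear, it just means shifting each curve so it reads 0 dB at 1 kHz before comparing. A minimal sketch follows; the numbers are invented purely for illustration and are not measurements of the Neumann or the JBL, and the helper names are hypothetical.

```python
import math

def level_at(freqs_hz, levels_db, f_hz):
    """Level at f_hz, interpolated linearly on a log-frequency axis."""
    for i in range(len(freqs_hz) - 1):
        f_lo, f_hi = freqs_hz[i], freqs_hz[i + 1]
        if f_lo <= f_hz <= f_hi:
            t = math.log10(f_hz / f_lo) / math.log10(f_hi / f_lo)
            return levels_db[i] + t * (levels_db[i + 1] - levels_db[i])
    raise ValueError("frequency outside the measured range")

def normalize(freqs_hz, levels_db, ref_hz=1000.0):
    """Shift the whole curve so that it reads 0 dB at ref_hz."""
    ref = level_at(freqs_hz, levels_db, ref_hz)
    return [level - ref for level in levels_db]

# Invented numbers for illustration only; NOT measurements of either speaker.
freqs = [50, 100, 200, 500, 1000, 2000, 5000]
speaker_a = [80.0, 84.0, 85.0, 85.0, 85.0, 85.0, 84.0]
speaker_b = [72.0, 80.0, 82.0, 83.0, 83.0, 82.5, 82.0]

norm_a = normalize(freqs, speaker_a)
norm_b = normalize(freqs, speaker_b)
for f, a_db, b_db in zip(freqs, norm_a, norm_b):
    print(f"{f:>5} Hz: A {a_db:+.1f} dB, B {b_db:+.1f} dB, A-B {a_db - b_db:+.1f} dB")
```

Once both curves share the same 1 kHz reference, the A-B column shows the relative tilt directly, independent of any overall level offset between the two measurements.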
