
The frailty of Sighted Listening Tests


Deleted member 17820

Guest
Well said. Engineers plow with the horses they've got, because otherwise nothing gets done. And it's the engineers, not their critics, who are making actual contributions.
Well, it's the critics who are promoting products to buyers, and it's the buying of products that pays the bills for engineers to design, so how can you say creating market demand for the industry is not a contribution?
 

Duke

Major Contributor
Audio Company
Forum Donor
Joined
Apr 22, 2016
Messages
1,555
Likes
3,860
Location
Princeton, Texas
Engineers plow with the horses they've got, because otherwise nothing gets done. And it's the engineers, not their critics, who are making actual contributions.

Well, it's the critics who are promoting products to buyers, and it's the buying of products that pays the bills for engineers to design, so how can you say creating market demand for the industry is not a contribution?

I don't think you and I are using the word "critic" the same way.
 

Floyd Toole

Senior Member
Audio Luminary
Technical Expert
Industry Insider
Forum Donor
Joined
Mar 12, 2018
Messages
367
Likes
3,905
Hello all. After a long forum abstinence I dipped in for a peek. I have not read this entire thread, but I fully understand the uncertainties about the ultimate predictive capabilities of the Olive model vs. real life preferences. Numbers and acoustical measurements are stable and repeatable, but may not be complete, or in the optimum raw or processed form to maximize correlations. Subjective ratings, the parallel "universe", are not stable. Add to that the reality that subjective ratings include the enormous variations in program material and it becomes clear that beyond a certain point (of physical differences between products in this case) there can be no clear "absolute" preference - no "best" loudspeaker.

What was learned with great certainty was that an absence of audible resonances is a critical requirement for good sound. Sean and I examined the detection thresholds in great detail, and there is a point below which listeners, statistically, are not aware of resonances - i.e. the curves need not be ruler flat, although that clearly is the ideal. Complicating this is the reality that the ability to hear disruptive resonances is program dependent. Some highly entertaining vocal and instrumental combinations are simply very forgiving, but at the same time very enjoyable. A lot of high-end demo material is actually not very revealing of these problems, which may be why it has been selected (smile).

Once resonances have been tamed, the next level of interest is broadband spectral balance. Here it is clear that bass level is very influential (partly because of the closer spacing of the equal loudness contours). Because of upward masking, bass level affects what is heard for octaves above the bass region. Many listeners over the years would complain about too much treble when the real problem was not enough bass (or the reverse) - it is a balancing act. This is why I keep on harping about the need for old fashioned tone controls. "Room EQ" can do it if one restricts its capabilities, but once set it is permanent, unable to address changes in program, playback level or personal taste. Beyond a certain point one needs to get a speaker into the listening room, and listen to what one has. What the spinorama, double-blind listening and Olive's model have done is make it possible for manufacturers to consistently make fundamentally neutral sounding products, which are then amenable to spectral balance adjustments. That this is now possible at eminently affordable prices is the worthy reward. Most guesswork has been eliminated.
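
A minimal sketch of what such old fashioned tone controls can look like in software - bass and treble as low/high shelving filters, here built from the RBJ Audio-EQ-Cookbook formulas (the corner frequencies, gains and 48 kHz sample rate are arbitrary example values, not recommendations):

```python
# Hedged sketch: RBJ cookbook shelving biquads as "bass" and "treble" controls.
# All numeric settings below are illustrative, not recommended values.
import numpy as np
from scipy.signal import lfilter

def shelf_biquad(fs, f0, gain_db, kind="low"):
    """RBJ Audio-EQ-Cookbook shelving filter (shelf slope S = 1)."""
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / 2 * np.sqrt(2.0)          # S = 1
    cosw, k = np.cos(w0), 2 * np.sqrt(A) * alpha
    if kind == "low":
        b = A * np.array([(A + 1) - (A - 1) * cosw + k,
                          2 * ((A - 1) - (A + 1) * cosw),
                          (A + 1) - (A - 1) * cosw - k])
        a = np.array([(A + 1) + (A - 1) * cosw + k,
                      -2 * ((A - 1) + (A + 1) * cosw),
                      (A + 1) + (A - 1) * cosw - k])
    else:  # high shelf
        b = A * np.array([(A + 1) + (A - 1) * cosw + k,
                          -2 * ((A - 1) + (A + 1) * cosw),
                          (A + 1) + (A - 1) * cosw - k])
        a = np.array([(A + 1) - (A - 1) * cosw + k,
                      2 * ((A - 1) - (A + 1) * cosw),
                      (A + 1) - (A - 1) * cosw - k])
    return b / a[0], a / a[0]

fs = 48_000
x = np.random.randn(fs)                       # stand-in for programme material
b, a = shelf_biquad(fs, 200, +3.0, "low")     # gentle bass lift
x = lfilter(b, a, x)
b, a = shelf_biquad(fs, 4_000, -2.0, "high")  # slight treble cut
x = lfilter(b, a, x)
```

Unlike a fixed room-EQ target, the two gain values above are the kind of thing a listener could turn in real time to suit the program, the playback level or personal taste.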

Product reviewers, for the most part, don't take these factors into consideration. It is assumed that the loudspeaker out of the box is what it is, with little regard for the physics of room resonances and the very real necessity for room mode controls of some kind. Tone controls are assumed to be the work of the devil. The parallel assumption is that recordings are flawless revealers of problems and virtues. Clearly they lack both knowledge and facilities to do what they do thoroughly. Technically mediocre loudspeakers still manage to get some rave reviews and well designed loudspeakers suffer criticism for the wrong reasons. Sadly, this lack of rigor and facilities also applied to many loudspeaker manufacturers over the years, but nowadays there is no excuse.

Forum discussions of this calibre are a useful antidote . . .
 

Duke

Major Contributor
Audio Company
Forum Donor
Joined
Apr 22, 2016
Messages
1,555
Likes
3,860
Location
Princeton, Texas
Apologies for going tangential, but Dr. Toole brings up something I'd like to ask about:

... bass level is very influential (partly because of the closer spacing of the equal loudness contours).


It seems to me this closer spacing of the equal loudness contours implies that in-room peaks in the bass region may be subjectively even worse than they appear at first glance. And conversely the subjective improvements from smoothing the in-room bass response may be greater than eyeballing the "before" and "after" curves implies.
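
To put rough numbers on that intuition (ballpark figures only, not ISO 226 data): at 1 kHz the equal loudness contours are spaced 10 dB per 10 phon by definition, while in the low bass they bunch up to something like 5-6 dB per 10 phon, so the same dB-sized peak corresponds to a larger change in perceived loudness:

```python
# Back-of-envelope sketch only; the contour spacings are illustrative
# ballpark values, not figures taken from ISO 226.
def loudness_change_phon(peak_db, spacing_db_per_10_phon):
    """Approximate loudness change for an SPL peak, given the local
    spacing of the equal-loudness contours at that frequency."""
    return peak_db * 10.0 / spacing_db_per_10_phon

peak_db = 6.0  # hypothetical in-room response peak
print(loudness_change_phon(peak_db, 10.0))  # ~6 phon at 1 kHz (10 dB per 10 phon by definition)
print(loudness_change_phon(peak_db, 5.0))   # ~12 phon in the low bass if contours sit ~5 dB apart
```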
 

QMuse

Major Contributor
Joined
Feb 20, 2020
Messages
3,124
Likes
2,785
Hello all. After a long forum abstinence I dipped in for a peek. I have not read this entire thread, but I fully understand the uncertainties about the ultimate predictive capabilities of the Olive model vs. real life preferences. Numbers and acoustical measurements are stable and repeatable, but may not be complete, or in the optimum raw or processed form to maximize correlations. Subjective ratings, the parallel "universe", are not stable. Add to that the reality that subjective ratings include the enormous variations in program material and it becomes clear that beyond a certain point (of physical differences between products in this case) there can be no clear "absolute" preference - no "best" loudspeaker.

What was learned with great certainty was that an absence of audible resonances is a critical requirement for good sound. Sean and I examined the detection thresholds in great detail, and there is a point below which listeners, statistically, are not aware of resonances - i.e. the curves need not be ruler flat, although that clearly is the ideal. Complicating this is the reality that the ability to hear disruptive resonances is program dependent. Some highly entertaining vocal and instrumental combinations are simply very forgiving, but at the same time very enjoyable. A lot of high-end demo material is actually not very revealing of these problems, which may be why it has been selected (smile).

Once resonances have been tamed, the next level of interest is broadband spectral balance. Here it is clear that bass level is very influential (partly because of the closer spacing of the equal loudness contours). Because of upward masking, bass level affects what is heard for octaves above the bass region. Many listeners over the years would complain about too much treble when the real problem was not enough bass (or the reverse) - it is a balancing act. This is why I keep on harping about the need for old fashioned tone controls. "Room EQ" can do it if one restricts its capabilities, but once set it is permanent, unable to address changes in program, playback level or personal taste. Beyond a certain point one needs to get a speaker into the listening room, and listen to what one has. What the spinorama, double-blind listening and Olive's model have done is make it possible for manufacturers to consistently make fundamentally neutral sounding products, which are then amenable to spectral balance adjustments. That this is now possible at eminently affordable prices is the worthy reward. Most guesswork has been eliminated.

Product reviewers, for the most part, don't take these factors into consideration. It is assumed that the loudspeaker out of the box is what it is, with little regard for the physics of room resonances and the very real necessity for room mode controls of some kind. Tone controls are assumed to be the work of the devil. The parallel assumption is that recordings are flawless revealers of problems and virtues. Clearly they lack both knowledge and facilities to do what they do thoroughly. Technically mediocre loudspeakers still manage to get some rave reviews and well designed loudspeakers suffer criticism for the wrong reasons. Sadly, this lack of rigor and facilities also applied to many loudspeaker manufacturers over the years, but nowadays there is no excuse.

Forum discussions of this calibre are a useful antidote . . .

Good to see you back Dr. Toole! :)

What would in your opinion be the ideal solution for tone controls? Would that be some modern software version of the old equalizers, which would apply gain at a predetermined set of frequencies using PEQ filters with some appropriate Q?

Do I understand correctly that you vouch for using them to modify playback of recorded material to your personal taste, and/or to adjust the overall balance of the response to the playback volume, on top of standard room EQ filters applied below the Schroeder frequency?
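
For concreteness, one plausible software version of that idea is a fixed bank of peaking (PEQ) biquads at preset centre frequencies - effectively a graphic equalizer in code. A sketch (band centres, Q and gains are arbitrary examples):

```python
# Sketch of a fixed PEQ bank; all band centres, gains and the Q value are
# arbitrary illustrative choices.
import numpy as np
from scipy.signal import lfilter

def peaking_biquad(fs, f0, gain_db, q):
    """RBJ Audio-EQ-Cookbook peaking EQ filter."""
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

fs = 48_000
bands_hz = [63, 125, 250, 500, 1_000, 2_000, 4_000, 8_000]  # preset bands
gains_db = [3, 2, 0, 0, 0, -1, -2, 0]                       # user "slider" settings

x = np.random.randn(fs)            # stand-in for programme material
for f0, g in zip(bands_hz, gains_db):
    if g:                          # skip flat bands
        b, a = peaking_biquad(fs, f0, g, q=1.4)
        x = lfilter(b, a, x)
```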
 

Floyd Toole

Senior Member
Audio Luminary
Technical Expert
Industry Insider
Forum Donor
Joined
Mar 12, 2018
Messages
367
Likes
3,905
It is very likely true that we are more sensitive to bumps at low frequencies, although I know of no definitive study of it. Chapter 8 in the 3rd edition of my book has much discussion on dealing with this problem, including section 8.3 "Do we hear the spectral bump, the temporal ringing, or both". The results of some significant research efforts may be surprising - we seem not to pay much attention to physical ringing, although our perceptions suggest otherwise.

As for the optimum implementation of tone controls, it is likely that the old analog Baxandall bass/treble tilts are not optimum, albeit probably better than nothing. More recent work by my colleagues at Harman shed some light on this (e.g. Figure 12.7 in my book). It is not easy to do these subjective tests because there is a strong interaction between overall loudness and spectral variations that operate over a significant part of the frequency range. In other words, boosting the bass may be interpreted as turning up the volume, which of course it is. Youthful listeners are sometimes easily impressed and may prefer louder reproduction than mature (experienced?) listeners - in Figure 12.7 the untrained listeners boosted both treble and bass.
 

Duke

Major Contributor
Audio Company
Forum Donor
Joined
Apr 22, 2016
Messages
1,555
Likes
3,860
Location
Princeton, Texas
It is very likely true that we are more sensitive to bumps at low frequencies, although I know of no definitive study of it. Chapter 8 in the 3rd edition of my book has much discussion on dealing with this problem, including section 8.3 "Do we hear the spectral bump, the temporal ringing, or both". The results of some significant research efforts may be surprising - we seem not to pay much attention to physical ringing, although our perceptions suggest otherwise.

Thank you for replying. Yours is the only book of which I own multiple editions.

In promoting my subwoofer product (four small subs, called the "Swarm"), my marketing department claims that "smooth bass" = "fast bass", both subjectively and objectively, the latter based on the assumption that subs + room = a "minimum phase system" at low frequencies. However if this is incorrect, please let me know. My marketing department has been known to get carried away.
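
For what it's worth, if the subs-plus-room response really is minimum phase at low frequencies (the assumption the claim rests on), then flattening a magnitude peak does also remove its ringing - "smooth" and "fast" become the same fix. A small sketch of that mechanism with a single biquad resonance and its exact inverse (the 45 Hz / +8 dB / Q = 5 values are purely illustrative, not a model of any actual room or product):

```python
# Illustrative sketch: a minimum-phase "room mode" modelled as a peaking
# biquad boost; the matching cut flattens the magnitude AND collapses the
# time-domain ringing. All numeric values are arbitrary examples.
import numpy as np
from scipy.signal import lfilter

def peaking_biquad(fs, f0, gain_db, q):
    # RBJ Audio-EQ-Cookbook peaking filter (minimum phase)
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

fs = 48_000
impulse = np.zeros(fs)
impulse[0] = 1.0

# "Room mode": +8 dB, Q = 5 resonance at 45 Hz -> long ringing tail
b_room, a_room = peaking_biquad(fs, 45, +8.0, 5.0)
ringing = lfilter(b_room, a_room, impulse)

# For this filter family the -8 dB filter is the exact inverse of the +8 dB
# boost, so the cascade is flat in magnitude and the tail disappears too.
b_eq, a_eq = peaking_biquad(fs, 45, -8.0, 5.0)
corrected = lfilter(b_eq, a_eq, ringing)

for name, y in [("uncorrected", ringing), ("EQed", corrected)]:
    tail = np.max(np.abs(y[int(0.1 * fs):]))  # largest sample 100 ms after the impulse
    print(f"{name}: residual after 100 ms = {tail:.2e}")
```

Whether a given multi-sub setup actually behaves as minimum phase at the listening position is a separate question that the sketch does not answer.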
 

MattHooper

Master Contributor
Forum Donor
Joined
Jan 27, 2019
Messages
7,288
Likes
12,193
Some highly entertaining vocal and instrumental combinations are simply very forgiving, but at the same time very enjoyable. A lot of high-end demo material is actually not very revealing of these problems, which may be why it has been selected (smile).

Yeah, there are some recordings that have the tendency to "sound impressive on any speaker system" (or on any remotely competent speaker system). I tend to be suspicious when such tracks are chosen for demos.

I like to include (like most here I'm sure) a good variety of familiar content. I also include some recordings that are "fairly good" but have some problems (e.g. very mild distortion of vocals) to see if a system exacerbates or hides the issues.
 

richard12511

Major Contributor
Forum Donor
Joined
Jan 23, 2020
Messages
4,335
Likes
6,702
Yeah, there are some recordings that have the tendency to "sound impressive on any speaker system" (or on any remotely competent speaker system). I tend to be suspicious when such tracks are chosen for demos.

I just wish they'd play more normal music that non-audiophiles listen to. Just going by what you hear at an average audio show, you'd think that Rebecca Pidgeon is more popular than The Beatles, or Queen.
 

preload

Major Contributor
Forum Donor
Joined
May 19, 2020
Messages
1,559
Likes
1,703
Location
California
I just wish they'd play more normal music that non-audiophiles listen to. Just going by what you hear at an average audio show, you'd think that Rebecca Pidgeon is more popular than The Beatles, or Queen.

I only listen to Hotel California on repeat, thank you very much.
 

bobbooo

Major Contributor
Joined
Aug 30, 2019
Messages
1,479
Likes
2,079
As for the optimum implementation of tone controls, it is likely that the old analog Baxandall bass/treble tilts are not optimum, albeit probably better than nothing. More recent work by my colleagues at Harman shed some light on this (e.g. Figure 12.7 in my book). It is not easy to do these subjective tests because there is a strong interaction between overall loudness and spectral variations that operate over a significant part of the frequency range. In other words, boosting the bass may be interpreted as turning up the volume, which of course it is. Youthful listeners are sometimes easily impressed and may prefer louder reproduction than mature (experienced?) listeners - in Figure 12.7 the untrained listeners boosted both treble and bass.

Hi Dr. Toole, maybe this is a question more appropriate for Dr. @Sean Olive who conducted the study, but as I understand, this means there was no loudness compensation to stay at a constant overall perceived loudness when the listeners increased or decreased tone controls? If not, are you aware of any study on spectral preference using tone controls that does control for overall loudness? Maybe an additional reason untrained listeners preferred boosted bass and treble more than trained listeners could be that the former may tend to listen at higher average volumes (outside of the listening tests), and so may be used to relatively boosted perceived bass and treble due to the equal loudness contours, hearing anything less than this as not 'correct'. Conversely, maybe trained listeners are more likely to be aware of the dangers of NIHL and so tend to and are used to lower volumes, and so less perceived bass/treble boost, which could (partially) explain their preference for lower boosts in the listening tests.

In your book you mentioned the untrained listeners simply chose 'more of everything', and this could be due to loudness issues, or actual preference. I think a plausible third factor could be a (subconscious) 'more is better' bias, in terms of simply turning the controls up more, not necessarily connected to loudness or actual spectral sound preference. I also see a potential inverse bias with trained listeners - they may be biased against boosting bass and treble too much, as liking this kind of 'boom and tizz' response is regarded by some as an 'unrefined' or even 'immature' preference, which could result in them being overly cautious in increasing either tone control (subconsciously or otherwise). I believe these effects were partially accounted for by randomizing the initial bass/treble level and using 'infinite turning' controls to obfuscate where the extreme values were, but I'm not sure this would fully eliminate these biases, as a simple extended turning down and up of the control and listening to the response would reveal the position of the extremes to the listeners' ears (if not their eyes).

The only way I see to obviate these potential biases and the loudness issues as much as possible would be to play several differing predefined, loudness-compensated EQed samples of a track to the listeners, and ask them to rate/rank them 'fully blind' i.e. not even knowing what kind of differences there are between the samples (or a step further, even if the differences are digital, analogue or acoustic in origin by e.g. performing the test in front of a covered 'speaker shuffler'). The preferred target could then be narrowed down by adding new EQ profiles within the bounds of, say, the top three rated targets in round one, and iterate the procedure until the difference between the EQ profiles approaches the lower limit of audibility.
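
A rough sketch of how one round of that procedure might be scripted is below. Plain RMS matching is only a crude stand-in for a proper loudness model such as ITU-R BS.1770, and the EQ-target labels are hypothetical placeholders for real pre-rendered stimuli:

```python
# Hedged sketch of a loudness-matched, blind rating round; RMS matching is a
# crude proxy for a true loudness model, and the labels are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def match_rms(x, target_rms=0.1):
    """Scale a rendering so every stimulus is presented at the same RMS level."""
    return x * (target_rms / (np.sqrt(np.mean(x ** 2)) + 1e-12))

# Stand-ins for one excerpt rendered through several candidate EQ targets.
renderings = {label: match_rms(rng.standard_normal(48_000))
              for label in ["flat", "downward_tilt", "bass_shelf_+3dB"]}

# Blind presentation: listeners only ever see anonymous codes.
order = rng.permutation(list(renderings))
codes = {f"sample_{i + 1}": label for i, label in enumerate(order)}

ratings = {}
for code in codes:
    # play(renderings[codes[code]])  # playback hook, deliberately left abstract
    ratings[code] = float(input(f"Rate {code} on a 0-10 scale: "))

# Decode afterwards; the top-rated targets would seed a finer-grained next round.
ranked = sorted(ratings, key=ratings.get, reverse=True)
print([(codes[c], ratings[c]) for c in ranked])
```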

Similarly though, there's the question of how you overcome these issues when tone controls are used in real-world listening. I can see how it would be possible to link some kind of automatic loudness compensation system to tone controls, but I don't see how the potential 'more (or less) is better' biases could be eliminated here, which just adds to the many, many reasons why standardization between the audio production and reproduction sides is paramount, in order not to need tone controls to correct for discrepancies between them in the first place, and finally break the circle of confusion for good.
 

Floyd Toole

Senior Member
Audio Luminary
Technical Expert
Industry Insider
Forum Donor
Joined
Mar 12, 2018
Messages
367
Likes
3,905
bobbooo asked: "are you aware of any study on spectral preference using tone controls that does control for overall loudness?" Not that I can recall. If you look at Section 14.2 in my book you will see that simply finding a reliable metric for loudness has been a significant problem. So, we are looking at finding a generalizable spectral preference in experiments employing metrics that are unreliable.

Why not let listeners have the means to easily adjust spectral balance depending on the music they are playing, the mood they are in, the playback level and how anally retentive they are about having such controls in their systems. I don't have it. I wish I did. So, I spend time making knowing adaptations for things I wish sounded different, and if I had the means I could and would improve. When reviewers encounter such situations they begin imagining things to blame. It is simple reality, for which there seems to be no cure. "Regulation" (a.k.a. standardization) is a bad word in the lexicon of many people. But even with perfectly standardized monitoring and playback circumstances there remains the variable of the recording & mastering engineers and their preferences and hearing abilities. Remember, hearing loss is an occupational hazard in the professional audio industry.
 

Putter

Senior Member
Forum Donor
Joined
Sep 23, 2019
Messages
497
Likes
778
Location
Albany, NY USA
Yes, I think you were one of the members I was thinking of in writing that ;-)





Possible, of course. As we've all repeated here until we are red in the face: blind testing is the gold standard when you *really* want to be sure.

And, of course, absent blind testing you also can't presume the situations I'm describing were due to that selection bias. You have no reason to simply take my word for it. But from my perspective, I've seen quite accurate subjective descriptions of pretty much every speaker I own from reviewers and other audiophiles. It also reminds me that I did a long thread on Audiogon describing my more recent speaker hunt, where I auditioned a ton of different well-known speakers and gave my subjective descriptions of their character. Many people replied to the thread and there was almost no dissent from my descriptions, with most saying essentially "I know X or Y speakers and you've described what I hear."

And as I have said before, I've also been led to speakers I adore by the subjective reports of others. I kept encountering from reviewers and audiophiles a consensus on the general sound character of DeVore O-series speakers, and the characteristics described were very much what I was looking for. I was also aware of the reasons why Revel speakers were highly regarded and generally understood what type of sound they produced through both measured and subjective descriptions. When I heard the Revels it was "Yup, no surprises there, competent in all the ways I expected." When I heard the DeVores they exhibited just the slightly eccentric characteristics I'd read about, and I found myself loving my music through them more than the Revels.

So in both cases the subjective reviews by others seemed to really capture the character of each speaker brand. And I was actually led to a speaker I really liked by subjective reviews. Though, for various reasons, I ended up with another speaker for which, again, there was a high level of consensus on its sonic characteristics - exactly what I hear from them at home. So... yes, I personally find carefully parsing reviews and reports from other audiophiles to be somewhat helpful.

Are there possible biases operating here? Of course. That's possible. But whatever may happen under blinded conditions, it remains the case that strictly using sighted listening, and exchanging notes with others using the same method, has led to a sufficient level of consistency to be of use to me. Whatever may change under blind conditions, under the sighted conditions in which I listen, the speakers I get continue to sound "the same" as when I first heard them, and "the same" as others have described them, making for satisfying purchases.




Really? A year and a half on this site, and here we are 26 pages into this thread and... I had no idea. ;)




Yes, just like pretty much everyone here.

Sighted listening is less reliable than blind trials.

But "less" reliable" is not the same as "wholly unreliable."

Sighted listening, even given the ever present possibility of bias, is not necessarily useless. There is an ever present possibility of bias in your every perception, your every inference, in judging your every action. Yet, somehow, without blinded protocols to vet every inference, you seem to navigate the world in predictable-enough manner.

While I don't completely disagree that there can be commonalities in sighted listening tests, I also think of reading astrology predictions, where whichever sign you read, you'll see something that you think applies to you. If a reviewer says electric guitars sound good on this speaker, you're primed to hear that. This can be somewhat accounted for if the characteristics are defined more specifically, such as sibilance on certain recordings or too much chestiness on male singers.

Another thought on subjective testing is that frequencies below 300 Hz or so are room dependent, so descriptions of that range are probably less useful unless you have specific knowledge of the reviewer's setup. I seem to recall that Len Feldman of Stereo Review did his sighted evaluations of speakers with music above these frequencies.
 
OP

patate91

Active Member
Joined
Apr 14, 2019
Messages
253
Likes
137
While I don't completely disagree that there can be commonalities in sighted listening tests, I also think of reading astrology predictions, where whichever sign you read, you'll see something that you think applies to you. If a reviewer says electric guitars sound good on this speaker, you're primed to hear that. This can be somewhat accounted for if the characteristics are defined more specifically, such as sibilance on certain recordings or too much chestiness on male singers.

Another thought on subjective testing is that frequencies below 300 Hz or so are room dependent, so descriptions of that range are probably less useful unless you have specific knowledge of the reviewer's setup. I seem to recall that Len Feldman of Stereo Review did his sighted evaluations of speakers with music above these frequencies.

I recently asked a reviewer about adding room measurements taken with his day-to-day speakers, to give an idea of what he's used to or prefers. I think Dr. Toole said something similar: room information is missing from reviews.

At this point that is something really easy for reviewers to do.
 

bobbooo

Major Contributor
Joined
Aug 30, 2019
Messages
1,479
Likes
2,079
bobbooo asked: "are you aware of any study on spectral preference using tone controls that does control for overall loudness?" Not that I can recall. If you look at Section 14.2 in my book you will see that simply finding a reliable metric for loudness has been a significant problem. So, we are looking at finding a generalizable spectral preference in experiments employing metrics that are unreliable.

Why not let listeners have the means to easily adjust spectral balance depending on the music they are playing, the mood they are in, the playback level and how anally retentive they are about having such controls in their systems. I don't have it. I wish I did. So, I spend time making knowing adaptations for things I wish sounded different, and if I had the means I could and would improve. When reviewers encounter such situations they begin imagining things to blame. It is simple reality, for which there seems to be no cure. "Regulation" (a.k.a. standardization) is a bad word in the lexicon of many people. But even with perfectly standardized monitoring and playback circumstances there remains the variable of the recording & mastering engineers and their preferences and hearing abilities. Remember, hearing loss is an occupational hazard in the professional audio industry.

Thanks very much for the reply. Yes, it seems loudness compensation is not easy to get right. I've just remembered @Sean Olive did investigate its effect on bass preference for in-ear headphones as part of this paper, which did not show a statistically significant change in bass preference with and without loudness compensation, if I understand correctly. (I'm not sure how applicable this would be to speaker bass preference though, especially given the absence with IEMs of tactile bass, which can however be felt with well-extended speakers/subwoofers and has been shown to influence bass preference in a couple of studies, again involving Sean Olive.) On a related note, I have personally found 'dynamic EQ' (e.g. as found in AVRs), effectively the opposite of loudness compensation systems, to sound somewhat unnatural. My suspected reason for this is that, due to the equal loudness contours, the brain might actually expect sounds with lower SPL detected by the ear to subjectively have less bass and treble, so artificially increasing these creates mixed messages - recognizable sounds that have the low SPL of a quiet sound, yet the frequency profile of a loud one. It seems a lot of these dynamic EQ systems don't just adjust EQ according to volume though, but also increase the relative surround channel level at lower volumes, and can change dynamically with the content being played as well, so they are not great for isolating any specific correlation between loudness and frequency response preference.
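
As a deliberately oversimplified picture of the volume-linked EQ being discussed: the compensator applies a growing low-frequency shelf as the playback level drops below a calibrated reference. The 0.4 "excess slope" below is an arbitrary stand-in for how much faster perceived loudness falls at low frequencies, not a measured value:

```python
# Illustrative only: the 0.4 slope is an arbitrary stand-in motivated by the
# bunching of equal-loudness contours at low frequencies; a real "dynamic EQ"
# would use measured contour data and a calibrated reference level.
def bass_boost_db(volume_db_re_reference, lf_excess_slope=0.4):
    """Low-frequency shelf gain applied by a simple loudness compensator."""
    return max(0.0, -volume_db_re_reference) * lf_excess_slope

for vol in (0, -10, -20, -30):  # playback level relative to reference, in dB
    print(f"volume {vol:>4} dB -> bass shelf +{bass_boost_db(vol):.1f} dB")
```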

I totally agree on tone/EQ controls: they are useful at the listening end, but the more the differences between the monitoring and playback circumstances can be minimized, the less frequently the tone control knobs would need to be fiddled with (which can become annoying when listening to a varied playlist, for example). To maintain standards, perhaps recording and mastering engineers should be periodically required to have a hearing test, and pass level 8 on Harman's How to Listen software :) Of course, any remaining discrepancies such as hearing loss of the listener could then be adjusted for using tone controls, as well as any mismatch in preference between the listener and the intention of the artist and recording/mastering engineers.
 

prerich

Senior Member
Joined
Apr 27, 2016
Messages
320
Likes
240
Klippel is not science. It is a set of measurements. Those measurements are very difficult to interpret as "buy / don't buy" against countless other speakers with similar looking measurements. A 1 dB peak at 600 Hz is not the same as a 1 dB peak at 1.5 kHz. Yet the score may be identical. We need to bridge that gap so people can purchase speakers without listening to them, which is the norm today.

If we had a scoring system we could all stand behind so much that if it said speaker A is better than speaker B, that would be the "truth," then sure, I would not need to do listening tests. But we are not there. The scoring system is like a compass that shows you north. It is not a turn-by-turn navigation system for driving in the city.

Also, when I first started to do measurements, people kept asking me what I recommend. I refused to say. We had a bunch of debate threads about it. Eventually I got tired of answering those questions in private and in public and added the recommendations. That has proven to be hugely popular and rarely controversial. Today I cannot, without listening to a speaker, give such recommendations. So, as much work and aggravation as it has turned out to be, I listen and provide this as a factor in my recommendations.

And no, not all "human beings" are the same. Which one of you has been exposed to nearly 80 speakers in the last 7 months where you could compare and correlate measurements to what you could hear? The answer is none. In other words, I am not situated like any of you. There are many things that apply to you that don't apply to me and vice versa. We rely on informed opinion of experts in real life all the time. Not sure why it is such a big deal to follow the same in audio.
Wow!!!! I'm glad I found this link in the PS Audio speaker post. You've just articulated everything that goes on in my head! I'm also glad that you add your listening perceptions as well as your data. Bravo Zulu
 