
A Broad Discussion of Speakers with Major Audio Luminaries

Recording engineers are not as good as people specifically trained to seek out artefacts.

It depends on which artifacts you are referring to and what you mean by 'trained to seek out artifacts'. Which particular education do you mean?

I agree that judging recordings and judging lossy codecs are pretty different things, for example. My experience with listening tests when it comes to speakers and rooms is that recording engineers with a solid education can be pretty good at that. FYI, I happened to be with an institution doing research in all the aforementioned fields.
 
Recording engineers are not as good as people specifically trained to seek out artefacts.

Recording engineers are not trained specifically in artefact identification. They pick some of it up in their training and experience on the job, but not all of it. Their knowledge and experience are focused on the artefacts created during media capture. These are important, but there is a whole bunch of artefacts they never have to deal with, such as those introduced in distribution, reception, reconstruction, etc.
Couldn't agree more. The recording as a whole is an artifact created in the process. At that point in the chain all relevant properties are scrutinized. Is it, as a whole, any good?

If you think of artifacts as defects: if the sound engineer and all his fellows think they are irrelevant (once actually listening to the music), why should I bother? Laymen are easily set on the track to 'perfect', but what is the engineer's 'good enough'? The latter is the harder nut to crack. My original question: are the properties revealed when listening in mono relevant for the intended use case? I mean, doesn't listening in stereo invalidate a lot of the subjective caveats raised with mono?

OK, lots of music is consumed 'mono', but using two speakers outside the stereo triangle. Or with the (in)famous Bluetooth cash-and-carry devices. But in both cases ultimate hifi isn't the goal either.
 
Great, then I never have to consider buying better speakers, since the difference is inaudible without them being side-by-side.

no.
 
Recording engineers are not as good as people specifically trained to seek out artefacts.
Critical listening skills are now being taught in schools.

 
Critical listening skills are now being taught in schools.
Nice quote, thank you. Only, if it isn't that proverbial 'publish or perish', the author is with the 'School of Music, Theatre & Dance', Michigan. What do musicians listen for? Intonation, modulation, and all that, to the most delicate degree. Now they are going to be trained in playback degradation? In my book, sorry, that's not sensible. Not trusting the playback system is the BIGGEST obstacle to enjoying (in the case of musicians: musically analysing) recorded music, don't you think?
 
Nice quote, thank you. Only, if it isn't that proverbial 'publish or perish', the author is with the 'School of Music, Theatre & Dance', Michigan. What do musicians listen for? Intonation, modulation, and all that, to the most delicate degree. Now they are going to be trained in playback degradation? In my book, sorry, that's not sensible. Not trusting the playback system is the BIGGEST obstacle to enjoying (in the case of musicians: musically analysing) recorded music, don't you think?
University music schools are involved in more than educating "traditional" musicians; they also teach music production.
 
University music schools are involved in more than educating "traditional" musicians; they also teach music production.
:D Oh! Why is "traditional" in quotes? All musicians have a quite unique focus when listening to music. That's only logical, but often forgotten, and so some hifi folks say they don't care. They do care, but about different things. That was my point: listening is sometimes an active activity. The 'critical listener', a less attractive role model in hifi, eagerly tries to be unsatisfied with the playback. The musician, the 'analytical listener', would do something about it, practice more for instance. Who am I in this context? Halfway between the chairs, I long for musical modulation plus a well-crafted (artsy) recording, and admittedly speakers come last.

You can apply this differentiation to mono versus stereo speaker evaluation. The test panel is not less discriminating, but it switches focus to more relevant things, which somehow invalidates the effort. Why force people into confirming the engineer's better speaker? Better to find the parameters that describe quality in stereo, something other than a perfect frequency response, for instance.
 
OK, so then it must be possible to correctly discern meaningful differences when listening to a pair of speakers, sighted, without real-time switching to another pair.

Because if this is not the case, then I have no reason to ever buy "better" speakers than the ones I currently have. I won't hear the difference (and neither will you, nor anyone else) under normal listening conditions (no rapid switching to compare other speakers).
 
Quoting Dr Toole (emphasis mine):
...
I have a peculiar perspective on the topic, because it was blind tests that revealed the audible flaws with sufficient reliability and repeatability that it was possible to trace them to measurable characteristics. It is not that sighted tests are useless, it is simply that blind tests yield substantially more repeatable (i.e. statistically useful) judgements from a wide population of listeners (most people with “normal” hearing). Nuisance variables had been attenuated. When I began in 1966, loudspeaker sound quality was all over the map, nothing I encountered was neutral. A popular demonstration of stereo “soundstage” was a train running from left channel to right channel. The “hole-in-the-middle” was a popular discussion topic. Primitive stuff.
...
Comparing sighted, you'll likely, eventually, after a long time and a large number of test runs, come to the same conclusions as what blind and fast switching tests will reveal much more quickly.
 
There’s obviously some truth in that, but I also find the liabilities can be a bit exaggerated. I’ve heard many of the same loudspeakers in different rooms and they maintained the same essential sonic characteristics.
I think you are vastly underestimating the role your brain plays in these listening sessions, and vastly overestimating the role your ears play. I truly think our brains contribute more to what we “hear” than our ears.

I’m sorry to have even brought the divisive issue up again, but it’s good to have someone on my side :)
Don’t worry about it, I think that most (me included) enjoy your long-winded, well-worded thoughts.
 
Quoting Dr Toole (emphasis mine):

Comparing sighted, you'll likely, eventually, after a long time and a large number of test runs, come to the same conclusions as what blind and fast switching tests will reveal much more quickly.

Yes I think that's basically what's going on. The case for both the OG question (evaluation by listening to a single loudspeaker) and this multi-speaker blind comparison is based on efficiency and repeatability.

The former is easy to do. We don't (or everyone doesn't) always do the latter because the logistics are considerably more difficult.

Misinterpreting this by assuming that we can't somewhat reliably hear or remember real divergences between loudspeakers at all is nonsense, of course. If we couldn't hear differences and remember things, we wouldn't have music at all. Not that we can't also imagine not-so-real differences, so we can't say listening is always reliable. But when people say 'zero value' this is idiomatic/rhetorical, not accurate/literal. And we can understand this.
 
The people who say it do seem to believe that it's accurate/literal.

Well, I said 'we' can understand this. I didn't say 'they' could. :p
 
I’ve been following the thread for a while, and although it might be a somewhat different topic, including the Circle of Confusion, I’ve often felt a similar vibe in the industry I work in (I work as a commercial photo retoucher).
So, a while ago, I received this kind of question a few times from someone relatively new in the same industry. They said they wanted to reflect their intent in everything—from basic monitor calibration to even the slightest color shifts during post-production (very subtle distortions in saturation and hue).
Of course, that’s important. But consumers’ displays are all different, and smartphones vary as well. Their TVs are all different, and printing materials and qualities differ, too. The lighting environment in which they view the content is also different.
So, I told him (or them) this:
The people consuming your content will be captivated by the feeling of it. Even if the colors are off from what you intended, most of them won’t notice or care. When people look at magazine photos, they focus mainly on the composition and the overall tone and manner.
Even if the image is color-shifted, a good image is still good.
A bad image won’t be good just because there is no color distortion.
That’s what I told them.
Of course, I’m just a deep hobbyist when it comes to audio. However, when evaluating speakers—considering that neither creators nor consumers can fully know or control everything—I cautiously think that saying “this is how it is from the creator’s perspective” or “this is how it is from the consumer’s perspective” might just add more individual variables (or confusion) into the mix.
There may be limitations or drawbacks to evaluating in mono, but how should the standard to replace this be established?
I’m still following the thread.
 
The correlations between predicted ratings and real double-blind listening scores were very high: 0.996 for bookshelf loudspeakers of similar size and bass extension, and 0.86 for 70 loudspeakers of many sizes, bass extensions and prices.
I know I'm a bit late to this thread, but this is a misrepresentation of what happened. You didn't predict listening scores with a correlation of 0.996. Rather, you produced a model that consumed the preference scores and the measurement information, which was able to optimize a set of weights to be applied to the measurement data that could RE-CREATE the preference scores with a correlation of 0.996. It's not at all the same thing. You fit a curve. You didn't test its predictive capabilities, which would require you to optimize the weights on one set of data, and then apply the weights to a new set of data, not included when calculating the weights. As far as I know, the only time you published information about scores produced by your model on the tuning data, and later on novel data, the novel correlations were significantly less.
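The fitting-versus-predicting distinction is easy to demonstrate with a toy sketch (entirely made-up data, not Harman's; numpy only). Weights optimized by least squares on one set of "preference scores" re-create those same scores with a high correlation almost by construction; the honest figure of merit is the correlation on data the fit never saw:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 60 "loudspeakers", 15 measurement-derived features each,
# and noisy "preference scores" driven by hidden true weights.
n, p = 60, 15
X = rng.normal(size=(n, p))
true_w = rng.normal(size=p)
scores = X @ true_w + rng.normal(scale=3.0, size=n)

# Tune weights on the first 30 speakers; hold the rest out.
X_fit, y_fit = X[:30], scores[:30]
X_new, y_new = X[30:], scores[30:]

w, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)  # least-squares weights

r_in = np.corrcoef(X_fit @ w, y_fit)[0, 1]   # correlation on the tuning data
r_out = np.corrcoef(X_new @ w, y_new)[0, 1]  # correlation on novel data

print(f"re-created (in-sample) r = {r_in:.3f}")
print(f"predicted (out-of-sample) r = {r_out:.3f}")
```

With enough free weights relative to the number of speakers, the in-sample correlation creeps toward 1.0 regardless of how well the model generalizes; only the out-of-sample number says anything about predictive capability.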
 
OK, so then it must be possible to correctly discern meaningful differences when listening to a pair of speakers, sighted, without real-time switching to another pair.

Because if this is not the case, then I have no reason to ever buy "better" speakers than the ones I currently have. I won't hear the difference (and neither will you, nor anyone else) under normal listening conditions (no rapid switching to compare other speakers).
There will be actual and perceived differences 100% of the time (with whatever the reference is: my speakers back home if we are somewhere else, or, if we are at home, the speakers we heard at [take your pick]).

100% of the time your brain will perceive the differences to be meaningful - for a number of reasons.

Science, through blind testing, has established that the perceived differences are rarely meaningful.

Measurements will confirm the differences 100% of the time.
 
I’ve been following the thread for a while, and although it might be a somewhat different topic, including the Circle of Confusion, I’ve often felt a similar vibe in the industry I work in (I work as a commercial photo retoucher).
So, a while ago, I received this kind of question a few times from someone relatively new in the same industry. They said they wanted to reflect their intent in everything—from basic monitor calibration to even the slightest color shifts during post-production (very subtle distortions in saturation and hue).
Of course, that’s important. But consumers’ displays are all different, and smartphones vary as well. Their TVs are all different, and printing materials and qualities differ, too. The lighting environment in which they view the content is also different.
So, I told him (or them) this:
The people consuming your content will be captivated by the feeling of it. Even if the colors are off from what you intended, most of them won’t notice or care. When people look at magazine photos, they focus mainly on the composition and the overall tone and manner.
Even if the image is color-shifted, a good image is still good.
A bad image won’t be good just because there is no color distortion.
That’s what I told them.
Of course, I’m just a deep hobbyist when it comes to audio. However, when evaluating speakers—considering that neither creators nor consumers can fully know or control everything—I cautiously think that saying “this is how it is from the creator’s perspective” or “this is how it is from the consumer’s perspective” might just add more individual variables (or confusion) into the mix.
There may be limitations or drawbacks to evaluating in mono, but how should the standard to replace this be established?
I’m still following the thread.
The research and literature on audio perception I have run across is pretty clear that audio perception is much more complex than vision/color perception. A standard for photo/video/movie editing monitors is much easier to come up with and apply than a standard for sound in the movie theaters where the movie will be shown. There have been almost no industry standards since the late '40s; they are set by that industry's production companies/studios (THX) or by the movie sound recording/mixing software vendors (Dolby).
 
The research and literature on audio perception I have run across is pretty clear that audio perception is much more complex than vision/color perception. A standard for photo/video/movie editing monitors is much easier to come up with and apply than a standard for sound in the movie theaters where the movie will be shown. There have been almost no industry standards since the late '40s; they are set by that industry's production companies/studios (THX) or by the movie sound recording/mixing software vendors (Dolby).
Yes. It’s a relatively less complex situation, but still not easy.
This is also a somewhat different topic, but as I became interested in binaural audio, I thought it would be great to have clearer sample examples for sound (although I actually gave up trying to create such examples). For instance, specific wall reflections, their intensity, timing, and so on.
It’s kind of like a product webpage. When we buy clothes or a refrigerator, we can immediately understand how they work.

So, while personalization is a big challenge due to the vast amount of data and the complexity of processing and modeling it, the most intuitive approach would be to allow users to directly set the layout of mono, stereo, or multichannel just like on a product webpage. They could also have options for the shape and size of the room, the reflectivity of each wall, the toe-in angle of each speaker, and even the listening distance and height, so they can listen for themselves. This would reduce the disconnect that ordinary people often feel when they listen in a hi-fi demo room and then listen in their own rooms. (Of course, their in-room problem factors themselves don’t change....)
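For the reflection part of that wish list, the timing of a first sidewall reflection at the listening position falls out of simple image-source geometry. A minimal sketch with made-up positions (2-D, one perfectly reflective wall at x = 0):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def first_reflection_delay_ms(src, listener, wall_x=0.0):
    """Extra delay (ms) of a sidewall reflection relative to the direct
    sound, via the image-source method: mirror the source in the wall."""
    direct = math.dist(src, listener)
    image = (2 * wall_x - src[0], src[1])   # source mirrored across x = wall_x
    reflected = math.dist(image, listener)
    return (reflected - direct) / SPEED_OF_SOUND * 1000.0

# Made-up layout: speaker 1 m out from the wall, listener about 3 m back.
delay_ms = first_reflection_delay_ms(src=(1.0, 0.0), listener=(2.5, 3.0))
print(f"first sidewall reflection lags the direct sound by {delay_ms:.2f} ms")
```

The wall's reflectivity would then scale the reflection's level, and a full simulator repeats this mirroring for every wall (and for higher-order images), which is exactly where the data and modeling burden explodes.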

# But... Who could do this? :facepalm:
 
First of all, thanks very much for your comment. And “long-winded” will certainly follow me to my gravestone :)

I’m just going to reply to this, and if Rick finds it too off topic he can end up deleting it.

Yes, this is a particularly long one, and I certainly understand anybody giving a FRAT response. But I feel that laying out my case requires a number of examples, so I’m just putting it out there… as a summation of my position on this.

I think you are vastly underestimating the role your brain plays in these listening sessions, and vastly overestimating the role your ears play. I truly think our brains contribute more to what we “hear” than our ears.

I don’t think I’m working with the level of naïveté you are implying.

I have propounded as much as anybody here the nature and relevance of bias effects. Bias effects in audio are well demonstrated by the type of research cited here all the time. I’ve experienced my own bias effects vanishing under blind testing. (In fact, my own work in sound involves exploiting bias effects.)

So the conclusions I’m drawing are not in the context of ignoring bias effects, including my own; rather, they try to keep a coherent picture of what type of reasonable inferences we can still make in informal listening while considering the possibility of bias effects. Because most of the time we are operating in conditions where we cannot control for bias effects, we have to arrive at practical, even if not fully certain, conclusions.


So in my view, the following is reasonable:

1. If you are a scientist looking for scientific levels of certainty, then you may as well disregard any uncontrolled listening. You can even just throw it in the “bias” bin because it simply doesn’t offer the reliability for the confidence levels you are seeking in order to understand what’s going on.

2. An ASR member can simply disregard any uncontrolled listening reports from reviewers or other audiophiles, and when choosing gear can even disregard his own impressions as untrustworthy. “I’m looking for the most reliable information on which to make my decisions - uncontrolled listening won’t cut it, and I’m looking towards measurements or at the very least controlled listening tests” is a perfectly reasonable approach for somebody who just looks to measurements.

So “Listening in uncontrolled conditions, especially without supporting measurements, is unsuitable for gaining the level of confidence I’m seeking” is entirely reasonable.

What goes too far is the idea that listening in uncontrolled settings is always uninformative and that nobody can be justified in making any inferences in such situations.

THAT is what I push back against. And this level of scepticism - “it’s probably just your imagination, so I don’t have to take what you’re saying seriously” - is often enough thrown around here when people don’t feel like accepting a claim, even for the sake of argument.

We can and do come to reasonable conclusions with lower-than-scientific confidence levels all the time. Otherwise, we couldn’t get through the day. I’m bringing up instances of my own experience and asking what explanations make the most sense of those impressions.
And unfortunately, I find that sometimes the default move to “it’s likely just sighted bias effects / imagination” starts to look fairly hand-wavy when it gets to brass tacks.

So when it comes to putting any stock in my or some other audiophile’s informal listening impressions, I work on basic heuristics:

Extraordinary claims require extraordinary evidence. Is the impression of the gear implausible? Does it suggest something technically implausible? Or does it go against any known measurements of that gear?

And:

What is the best or most reasonable hypothesis or explanation for a given subjective impression?

So sticking just to evaluating loudspeakers:

If the hypothesis about somebody’s subjective impression is that it is a “bias effect”, then that suggests “the loudspeaker doesn’t really sound like you think it does; it sounds different than what you perceive.”

Well, if that’s going to be one hypothesis, its explanatory power should be put against another possible hypothesis: “The sonic impressions are to some relevant degree accurate, because that really is what the speakers sound like!”

So let’s see these different hypotheses in action:

I auditioned the Revel Performa speakers a few times. I went in knowing how Revel speakers generally measure, so that could have biased my perception. Except the first audition didn’t go well - I was surprised by the substandard performance, in which the speakers didn’t sound smooth but somewhat rough in the highs and uneven in the bass. But I quickly realized this was due to a poor setup, as the speakers were placed too close to the back wall and one was near a large reflective wall of glass. The poor setup seemed to overpower any bias I might’ve had that the speakers would sound better.

However, in a different store the speakers were set up much better.

And I evaluated them in my normal way: listening to some of the same tracks I’ve used on countless different loudspeakers. I evaluate the sound from farther seating distances and from middle and nearer-field distances, to see how the sound holds up - not every loudspeaker sounds coherent up close. I evaluate the sound in the vertical domain, from kneeling down, sitting, and standing, to see if there’s any “venetian blind” effect or obvious changes, roll-off in the highs or whatever. I walk around the speaker, listening from different angles to check off-axis performance, both in terms of changes in tonality and changes in imaging - does the sound glom into one speaker as we’re moving off axis, or maintain a sense of spaciousness and imaging between the speakers? Etc.

What I perceived from the Revels was a beautifully balanced sound. Well controlled and even sounding from top to bottom. No obvious colorations. Even if there were room nodes they were not intrusive. Smooth off axis performance. Very neutral while being smooth and easy to listen to.

And all of this is predicted by and consonant with the way those speakers measure.

So what’s the best explanation for my impression? Was it just that I was biased to hear them that way and they didn’t really sound that way? And if they didn’t sound the way I perceive them how did they actually sound? What objective evidence disputes my impressions? It seems to be objective evidence in terms of measurements actually support my impressions.

It seems to me at least as reasonable, Occam’s-razor style, that the reason I had those impressions is that this is how the speakers actually sounded. Very much as their measurements help predict.

The same could be said for the number of times I listened to the B&W 804 D4 (and D3) loudspeakers.

What I heard was a very open, very detailed, spacious sound, with generally well-controlled and not over-rich bass, quite “free of the box” sounding from top to bottom, but also a lack of warmth in the midrange and some peakiness in the upper mids and treble region. It clearly wasn’t neutral, but instead that modern sculpted B&W sound. This was especially obvious when I heard them the same day that I heard the same tracks earlier on the more neutral Kii Audio Three speakers.

So what’s the best explanation for my impressions? Is it better explained as a bias effect because I do know how B&W tends to measure? Could be.

On the other hand… the measurements DO generally describe how the speaker will sound, and also can comport with my listening impressions.

So is the best explanation for why I heard the B&Ws as less neutral and more peaky-sounding in the upper frequencies than the Kiis a bias effect, or… that I was actually hearing what they sound like, which is predicted by the measurements? Put that together with the fact that John Atkinson also reported the same impressions about their lack of neutrality.

Again, it seems to me entirely reasonable to provisionally conclude I was perceiving the essential characteristics of that loudspeaker.

But then there’s the many other examples of where I listened to speakers before I was aware of any measurements.

I auditioned the Paradigm Personas when they were brand new in a local high-end store.

I found plenty to admire in terms of their amazing clarity, and they seemed generally very well balanced, similar to the Revels.

With the exception that I kept noticing a sharp peak somewhere in the highs that, over time, was wearing me out. It didn’t seem to be showing up in vocal sibilance so much as somewhere maybe higher up. In the end I found I wanted to keep turning the volume down and ultimately found my ears fatiguing, so I gave up on those speakers.

And later Kal reviewed those speakers in Stereophile. His description was almost word for word in terms of the qualities I heard in those speakers, INCLUDING his noting a peak in the highs that he guessed would be around 10 kHz. And sure enough, in the measurements there it was: a sharp peak right at 10 kHz!

So what’s the best hypothesis for both my and Kal’s sonic impressions both made before seeing the measurements?

We just happen to have the same bias that produced precisely the same sonic impressions, including a peak in the highs… and all the sonic impressions just happen to line up with the measurements?

Seems to me something like Occam's razor allows the practical inference that we were simply actually hearing what that speaker really sounds like.

That happened again with the PMC Fact 8 speakers. My friend had those in for review and I listened to them. I was impressed by their open spacious sound, which sounded very clean and detailed.

But I was largely turned off overall because they sounded, to use that old term, “too hi-fi” - in the sense of exaggerating the artifice in recordings in the highs and not really sounding as natural as I like. In particular they sounded very “cool” and reductive, lacking warmth somewhere, maybe in the lower mids or upper bass, I didn’t know, but they lacked body and warmth for male vocals and anything that usually has more body on other speakers I’m used to, like my own.

After that I saw Kal’s review of the same speakers in Stereophile and… again… his descriptions matched what I heard, all the way down to him mentioning the same lack of warmth “in the upper and mid bass.”

And there it was in the Stereophile measurements: a bit of a roller coaster in the on-axis frequency response. And of the in-room response he measured, JA commented: “….the PMC fact.8s' in-room response is shelved down in the lower midrange and bass and has significantly less presence-region energy.“

Again… what’s the best explanation for why Kal and I perceived the same characteristics and flaws in the speakers, which turned out to be consonant with the measurements?

Seems reasonable that our perception was relatively accurate to what the speaker actually sounds like.

I can keep going with all sorts of examples. The YG floor standing speakers that my friend reviewed, where he asked me over to listen without telling me what he thought.

These were sizeable floor-standing speakers, and on my test tracks I was expecting to hear extension down to 35 Hz or so, a presentation similar to what I’ve heard from similar-sized speakers like my own. And yet I was shocked at the lack of bass, and also at the consequent sense of emphasis in the upper mids and higher frequencies. These matched my friend’s own impressions exactly, which is why he was having so much trouble with the speaker.

And when they were measured, sure enough, the bass was very overdamped - they actually start sloping down around 200 Hz and fall off fairly steeply below 50 Hz.

And this was ameliorated by placing them near the corners.

Again… just a bias effect, or just some form of coincidence that my friend and I perceived the same characteristics, which were surprising to both of us given the size of the speaker, but which happened to be consonant with the measurements?

Seems entirely reasonable we simply heard what the speaker really sounded like in that room.

What about my own Joseph Perspective speakers? I didn’t even know those speakers existed, let alone having read reviews, before I heard them for the first time at the dealer’s. And my impressions have remained the same from that first time all the way to my owning them for years now. And they are consistent with what JA heard and measured in Stereophile (with the exception that the highs did not bother me in the first version of the model as much as they did JA, even though I could hear that they were tipped up).

Once I had my Joseph speakers dialled in at home and had experimented with some acoustic diffusors, I was blown away by the complete disappearing act of the speakers, the massive, enveloping soundstage (on appropriate recordings), the rich yet tight quality of the bass, the beautiful clarity of the mids, and the incredible smoothness and airiness of the highs. And especially with the diffusors, an amazing solidity and palpability to the images appearing in the vast soundstage.

Once I had that set up, I invited my reviewer pal over and just sat him down to listen and give me his own impressions before I told him my own. We do this kind of thing when we have new speakers, double checking our impressions with what the other guy hears.

He was completely shocked and basically said, “How the fuck did you do this?” When I asked him to describe the sound, he described what I hear to a “T” - his first comments were about the crazy soundstaging and imaging; he noted that on tracks with the lowest bass the bass was a little bit rich but that he loved it anyway, because it was still really tight and rhythmic. And he commented on the general clarity and especially on the incredibly relaxed quality of the highs: “I can just keep turning up the sound and it feels like I could just listen to this all night without any fatigue… the highs are buttery smooth.”

Exactly what I perceived in the same recordings. Just coincidence? My friend listens to plenty of speakers that sound great, is there some reason that these speakers cause a particular bias effect where we both perceive the sound characteristics the same way?

And in his Stereophile review of the Perspective 2s, JA noted the wide, full-range sweep of sound as well as the clarity and smoothness in the highs even during complex passages, and also noted a slightly rich yet still tight and punchy bass quality.

None of this seems disputed by the Stereophile measurements.

So are we all suffering the same bias effect?

It seems at least as reasonable to conclude, provisionally, that we are largely perceiving the actual characteristics of this loudspeaker.

So those are just a few of many examples along the same lines, where my and other listeners’ impressions seem to converge fairly well on the general character of a loudspeaker, and where, when measurements are available, the impressions are very often consonant with the measurements, at least in certain characteristics.

I’m not talking about perfection every time here. But it seems to happen often enough. So from my own experience, I conclude that even in the context of its known liabilities, informal listening CAN provide some useful information where the impressions are technically plausible. It can be at least reasonable to draw some conclusions, with lower confidence levels and caveats, even if such conditions won’t suffice for scientific levels of confirmation and insight. Sighted listening to loudspeakers is significantly less reliable than blind listening, but not necessarily wholly unreliable or wholly uninformative.

Cheers.

(I should print this on a scroll and be buried in this one…)
 
I know I'm a bit late to this thread, but this is a misrepresentation of what happened. You didn't predict listening scores with a correlation of 0.996. Rather, you produced a model that consumed the preference scores and the measurement information, which was able to optimize a set of weights to be applied to the measurement data that could RE-CREATE the preference scores with a correlation of 0.996. It's not at all the same thing. You fit a curve. You didn't test its predictive capabilities, which would require you to optimize the weights on one set of data, and then apply the weights to a new set of data, not included when calculating the weights. As far as I know, the only time you published information about scores produced by your model on the tuning data, and later on novel data, the novel correlations were significantly less.
Are you familiar with the concept of i.i.d. in statistics? Your statement is very likely true for any statistical model when it is used on samples outside the population/distribution used to build it.
 