Sean Olive on Predicting Loudspeaker Sound Quality and Listener Preference

richard12511 · Sep 24, 2021

PierreV said:
While Dr Olive makes good points about the speakers being different and the controls not being equivalent between studies, a .99 correlation on a sample size of 13, with 5 parameters screams overfitting.

Overfitting Regression Models: Problems, Detection, and Avoidance

Overfitting regression models produces misleading coefficients, R-squared, and p-values. Learn how to detect and avoid overfit models.

statisticsbyjim.com

Which is why I was asking @preload originally about it. I wanted to know how much the smaller sample size affected things. He's good with stats, but he didn't really believe me about the .99 correlation thing (understandably so). I'm especially interested in how well the model can predict monopole speaker comparisons where bass extension is of little importance (i.e., using subs), and I know he's really good at that kinda stuff.

13 samples for 4 variables does seem quite small, but it's important to point out that it's not 13 comparisons, but 13 speakers that are all compared against each other many times with many different combinations.

It may be that the sample size of the first study doesn't need to be as large, since it's much more controlled, both in terms of bass extension and the type of speaker (only monopoles, except one?). The larger study does bring in a larger sample size, but it also introduces a couple of extra, really important variables, namely widely different bass extensions and speakers of completely different types (panels, open-baffle dipoles, etc.).

While the sample size of the larger study is nice, I have a hunch that it's a mistake to try to come up with a single model to predict preferences for all types of speakers. My hunch is that trying to do so is lowering the correlation, and it might be easier and/or better to try separate models for separate speaker types.

amirm · Sep 24, 2021

PierreV said:
While Dr Olive makes good points about the speakers being different and the controls not being equivalent between studies, a .99 correlation on a sample size of 13, with 5 parameters screams overfitting.

They are well aware of this and performed a Mallows's Cp analysis. Here are the results:

It is less than the number of independent variables, so the probability of overfitting is low.
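For anyone who wants to see what that statistic actually computes, here is a minimal sketch, assuming the textbook definition Cp = SSE_p / s^2 - n + 2p, with s^2 estimated from the full model. The data are synthetic stand-ins (13 rows, 6 made-up candidate metrics), not Olive's measurements, and the helper names `sse` and `mallows_cp` are hypothetical. The usual rule of thumb is that a Cp near or below p points to little bias in the submodel.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def mallows_cp(X_full, y, cols):
    """Cp = SSE_p / s^2 - n + 2p, with s^2 taken from the full model."""
    n, k = X_full.shape
    s2 = sse(X_full, y) / (n - k - 1)  # error variance from the full model
    p = len(cols) + 1                  # submodel parameters, incl. intercept
    return sse(X_full[:, cols], y) / s2 - n + 2 * p

# Synthetic stand-in data: 13 "speakers", 6 candidate metrics (NOT Olive's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(13, 6))
y = X[:, :4] @ np.array([1.0, 0.8, -0.6, 0.4]) + rng.normal(scale=0.3, size=13)

cp_subset = mallows_cp(X, y, [0, 1, 2, 3])  # the four columns that drive y
cp_full = mallows_cp(X, y, list(range(6)))  # by construction equals k + 1 = 7
print(f"Cp (4 predictors) = {cp_subset:.2f}, Cp (all 6) = {cp_full:.2f}")
```

Note that Cp of the full model is always exactly its parameter count, so the statistic only says something when comparing submodels against it.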

preload · Sep 24, 2021

richard12511 said:
Which is why I was asking @preload originally about it. I wanted to know how much the smaller sample size affected things. He's good with stats, but he didn't really believe me about the .99 correlation thing (understandably so). I'm especially interested in how well the model can predict monopole speaker comparisons where bass extension is of little importance (i.e., using subs), and I know he's really good at that kinda stuff.

13 samples for 4 variables does seem quite small, but it's important to point out that it's not 13 comparisons, but 13 speakers that are all compared against each other many times with many different combinations.

To be crystal clear, I have no doubt that Olive was able to generate a regression line with r=0.99 to fit the 13 speakers selected to be part of the preliminary sample. It was an elegant "proof of concept."

However, the issue is with generalizability.

If you're using the regression formula to predict listener preferences based on analysis of loudspeaker measurements, and said loudspeakers belong to that sample of 13 models (or ones with essentially identical characteristics), then the formula would have an extremely high predictive value.

However, what if you're trying to predict listener preferences for speakers that are not part of that original sample of 13? Well, then we could look at how predictive the measurements were when Olive expanded his pool to 70 loudspeakers. This time, r^2 came out to 74% (r ≈ 0.86). Not nearly as good.

And so the NEXT question is what if you're trying to use measurements to predict preference scores for a speaker that wasn't part of that sample of 70? Well, then it follows that the predictive value should drop below 74%, and that's assuming you still used computerized analysis on that speaker's measurements.

And so the question after that is what if you decided that you would eyeball those measurements instead of running a computerized analysis? Well then you're way below 74% now. Which is why when people provide their "eyeballed" interpretations of spin charts (or worse, non-spin FR charts), and declare with great confidence how that loudspeaker will sound, I call rubbish. And anyone who thinks they can reliably eyeball a spin chart for a speaker outside that sample of 70 and do better than Olive's formula should call up Dr. Olive right now and tell him he needn't bother with his regression formula nonsense.
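The drop from in-sample fit to out-of-sample prediction can be illustrated with a small simulation. This is only a sketch under stated assumptions: Gaussian synthetic predictors, a made-up linear "preference" with noise, 13 training points and 4 predictors. None of it is Olive's data, and the function names are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def simulate(n_train=13, n_test=1000, p=4, noise_sd=2.0, trials=200, seed=0):
    """Average R^2 on the fitted sample vs. on fresh, unseen samples."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p)  # assumed true coefficients
    r2_in, r2_out = [], []
    for _ in range(trials):
        X = rng.normal(size=(n_train + n_test, p))
        y = X @ beta + rng.normal(scale=noise_sd, size=n_train + n_test)
        A = np.column_stack([np.ones(len(y)), X])
        # Fit on the first n_train rows only.
        coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
        y_hat = A @ coef
        r2_in.append(r_squared(y[:n_train], y_hat[:n_train]))
        r2_out.append(r_squared(y[n_train:], y_hat[n_train:]))
    return float(np.mean(r2_in)), float(np.mean(r2_out))

in_sample, out_of_sample = simulate()
print(f"mean in-sample R^2 = {in_sample:.2f}, mean out-of-sample R^2 = {out_of_sample:.2f}")
```

The gap shrinks as `n_train` grows, which is one way to read the difference between the 13-speaker and 70-speaker results.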

HooStat · Sep 24, 2021

I agree with @preload. The correlation is a measure of internal consistency. It doesn't directly address a model's ability to predict samples it hasn't seen.

In other posts on this forum, I have listed other limitations of the analyses. The score is a tool -- it is a descriptive model. It describes how the inputs (measurements) can be used to explain the outputs (preference scores). And that is extremely valuable because it gives us an idea of the things that make speakers "better" (as defined by subjective impressions). Just because the model has limitations doesn't mean it isn't impressive or useful. But it also shouldn't be used as a gold standard. It is a descriptive model, and not a predictive one.
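One common way to probe how a fitted model handles samples it hasn't seen is leave-one-out cross-validation. The sketch below assumes synthetic data with 13 rows and 4 predictors (not the study's data), and `loo_predictions` is a hypothetical helper. Each point is predicted by a model fitted to the other 12, and the resulting "predicted R^2" is compared with the ordinary fitted R^2.

```python
import numpy as np

def loo_predictions(X, y):
    """Predict each observation from an OLS fit on all the other observations."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return preds

# Synthetic stand-in for a 13-speaker, 4-variable study.
rng = np.random.default_rng(1)
X = rng.normal(size=(13, 4))
y = X @ np.array([1.0, 0.5, -0.5, 0.25]) + rng.normal(scale=0.5, size=13)

A = np.column_stack([np.ones(13), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
sst = np.sum((y - y.mean()) ** 2)
r2_fit = 1.0 - np.sum((y - A @ coef) ** 2) / sst              # internal consistency
r2_loo = 1.0 - np.sum((y - loo_predictions(X, y)) ** 2) / sst  # held-out prediction
print(f"fitted R^2 = {r2_fit:.3f}, leave-one-out R^2 = {r2_loo:.3f}")
```

For ordinary least squares the leave-one-out figure can never exceed the fitted one, so the interesting part is how large the gap is.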

HooStat · Sep 24, 2021

amirm said:
They are well aware of this and performed a Mallows's Cp analysis. Here are the results:

View attachment 155302

It is less than the number of independent variables, so the probability of overfitting is low.

Mallows's Cp requires large sample sizes, so it doesn't hold much water here.

aac · Sep 24, 2021

ferrellms said:
Yes, why is listener preference more important than accuracy? How many of these listeners have actually heard an accurate system?

Because "accuracy" does not sell?
You have to get through years of training potential customers that data doesn't matter, all you need is to listen, etc.
Finding it really funny that Harman doesn't provide measurements for their products (at least for most of them) with all that talk about science and research.
The only company I've seen providing data consistently is Neumann.

Sancus · Sep 24, 2021

Which is all fine, cause the score has been hugely useful as it is. Go to the speaker index and try to narrow down to 4 or 5 loudspeakers to try in a given price range, avoiding any obvious junk. Easily done, right?

Now try to do the same with the headphone index. So much fun, right? Now imagine both of these indexes have 1000 products in them after years. Good luck with the headphone one.

thewas · Sep 24, 2021

richard12511 said:
For the quote I'm thinking of, he basically lists wider dispersion as either the second or third (memory is leaning more towards second) most important factor for predicting listener preference. He also uses that as the reason why he predicted the Salon2 to beat the M2 prior to the shootout.

That shootout was (imho unfortunately) done in mono. It would be interesting to see what would have happened in stereo, where sparse data such as that from Toole's 1986 paper shows that preferences change drastically.
I hope more research will be done in that direction.

Cadguy · Sep 24, 2021

richard12511 said:
I wish I could find where it was on AVS forums where Dr. Toole also talks about the importance of wider dispersion. It's somewhere in one of those 2-3 hundred page forum threads on AVS. I read through the M2 vs Salon2 shootout thread entirely again(looking for it specifically), and it wasn't in there.

For the quote I'm thinking of, he basically lists wider dispersion as either the second or third (memory is leaning more towards second) most important factor for predicting listener preference. He also uses that as the reason why he predicted the Salon2 to beat the M2 prior to the shootout.

That said, if there is an optimal(meaning most preferred on average) dispersion width, I have a hunch that it's going to depend on the number of channels. I'm thinking optimal dispersion width will be much wider in mono than it will be for 11 channels. If that's true, you almost have to test with the configuration you're trying to predict for, at least with speakers that excellent otherwise. That's problematic though, given how much less discriminating we are as the number of channels increases. I guess you'd just need a much larger sample size? Otherwise, you risk getting an answer that doesn't match for the configuration you're trying to predict. Does that make sense?: Three speakers - A, B, C - that are equally flat on-axis, equally smooth off(lack of resonances), but with different dispersion widths. Speaker A (160°)wins the mono test, B(110°) wins the stereo test, and C(80°) wins the 11 channel test.

Last paragraph is pure speculation on my part, but I'd love to be proven wrong or right with further research. First hard part is finding speakers that have equally flat direct sound and lack of resonances, but very different dispersion widths. Perhaps something like the Beolab90? Second hard part is dealing with the fact that we get much less accurate with our ratings as the number of channels increase. Maybe with a large enough sample size, the wisdom of crowds might prevail?

I think Dr Toole's comment on the importance of wide dispersion is in the thread "How to choose a loudspeaker - what the science shows".
I'm not sure where in that 286 page thread it occurs.

https://www.avsforum.com/threads/how-to-choose-a-loudspeaker-what-the-science-shows.3038828/page-4

youngho · Sep 24, 2021

napilopez said:
Because there is no such thing as true 'accuracy' in traditional stereo speakers. You just can't fully recreate a 3D soundfield with two speakers, and unless your setup is identical to the recording engineer's, you won't replicate that sound either. The best we can do is a convincing illusion. And in listening tests, the speakers that are the most preferred are the ones that create the greatest impression of realism and accuracy.

The D&D's designer has explicitly said the 8C was inspired by the ideas in Dr Toole's book. Genelec and Kii speakers follow the same principles too. In fact, all the best studio monitors are doing precisely what is suggested by the Harman Listening tests: flattish on-axis and smoothish directivity. The principles for perceived accuracy are mostly the same at home and in the studio. Harman also has a whole pro line of studio speakers which are not at all intended for the home market. So I'm not sure your reasoning holds water.

Toole discusses "relatively constant, or at least smoothly changing" directivity, which suggests two slightly different goals to me, and different Harman designs seem to illustrate different approaches. For the relatively constant designs, like the large Revel speakers or JBL M2, there is a rather stair-step approach with relatively constant directivity from several hundred Hz to pretty close to 1 kHz or so, then another relative DI plateau until nearly 10 kHz, then rising beyond that. The D&D design extends the first plateau lower and the second plateau higher.

From what I can tell, Genelec (and Kef) speakers, on the other hand, seem to have a DI curve that is more like a nearly straight angled diagonal line with directivity steadily increasing with each octave. I don't have any proof, but I suspect that different personal preferences might align with one approach over another, but the audible effects are likely to depend heavily on set-up and listening environments. In any case, I wonder if this should add a level of complexity to the "narrow vs wide dispersion" discussion, e.g. "narrow dispersion with steadily but smoothly rising directivity" vs "narrow dispersion with relatively constant directivity."

napilopez · Sep 24, 2021

richard12511 said:
I wish I could find where it was on AVS forums where Dr. Toole also talks about the importance of wider dispersion. It's somewhere in one of those 2-3 hundred page forum threads on AVS. I read through the M2 vs Salon2 shootout thread entirely again(looking for it specifically), and it wasn't in there.

For the quote I'm thinking of, he basically lists wider dispersion as either the second or third (memory is leaning more towards second) most important factor for predicting listener preference. He also uses that as the reason why he predicted the Salon2 to beat the M2 prior to the shootout.

That said, if there is an optimal(meaning most preferred on average) dispersion width, I have a hunch that it's going to depend on the number of channels. I'm thinking optimal dispersion width will be much wider in mono than it will be for 11 channels. If that's true, you almost have to test with the configuration you're trying to predict for, at least with speakers that excellent otherwise. That's problematic though, given how much less discriminating we are as the number of channels increases. I guess you'd just need a much larger sample size? Otherwise, you risk getting an answer that doesn't match for the configuration you're trying to predict. Does that make sense?: Three speakers - A, B, C - that are equally flat on-axis, equally smooth off(lack of resonances), but with different dispersion widths. Speaker A (160°)wins the mono test, B(110°) wins the stereo test, and C(80°) wins the 11 channel test.

Last paragraph is pure speculation on my part, but I'd love to be proven wrong or right with further research. First hard part is finding speakers that have equally flat direct sound and lack of resonances, but very different dispersion widths. Perhaps something like the Beolab90? Second hard part is dealing with the fact that we get much less accurate with our ratings as the number of channels increase. Maybe with a large enough sample size, the wisdom of crowds might prevail?

Hahah yes I believe I read the same post(s) at some point too. It makes sense. At some point in his book Dr. Toole talks about increased apparent source width generally being perceived as a good thing, and not just with speakers -- this happens in large acoustic venues like a symphony hall too.

youngho said:
Toole discusses "relatively constant, or at least smoothly changing" directivity, which suggests two slightly different goals to me, and different Harman designs seem to illustrate different approaches. For the relatively constant designs, like the large Revel speakers or JBL M2, there is a rather stair-step approach with relatively constant directivity from several hundred Hz to pretty close to 1 kHz or so, then another relative DI plateau until nearly 10 kHz, then rising beyond that. The D&D design extends the first plateau lower and the second plateau higher.

From what I can tell, Genelec (and Kef) speakers, on the other hand, seem to have a DI curve that is more like a nearly straight angled diagonal line with directivity steadily increasing with each octave. I don't have any proof, but I suspect that different personal preferences might align with one approach over another, but the audible effects are likely to depend heavily on set-up and listening environments. In any case, I wonder if this should add a level of complexity to the "narrow vs wide dispersion" discussion, e.g. "narrow dispersion with steadily but smoothly rising directivity" vs "narrow dispersion with relatively constant directivity."

Good observations, that's something I've noticed as well. I've mentioned in the past that perhaps my preferences are not so much for wide directivity but for constant directivity -- and it just so happens that wider speakers also trend towards constant directivity during the soundstage-critical 1-10khzish region. It would explain why the D&D 8C is arguably the only 'narrow' speaker I've ever truly loved. I personally do not tend to get along quite as well with the KEF/Genelec type of directivity. Not that I don't like them, but having heard the spatial presentation of other speakers, I just tend to prefer others.

That said I do still think I personally lean towards wider, but yes I do agree that it will depend on the setup and person.

phoenixdogfan · Sep 24, 2021

hardisj said:
For those who might be interested, @Sean Olive was gracious enough to join me for an informal live stream last night where he discussed his research into loudspeaker preference prediction. Some real good nuggets of info in here. I also gave an ASR shoutout and specifically to @pierre for his work in compiling the data and generating scores from mine and Amir’s data (specifically time stamp 48:20).

I gotta say, this was my favorite chat so far. I can’t thank Sean enough for coming on and giving some more context and also taking time to answer viewer questions at the end. A couple key ones:
@1:03:10 - someone asked specifically about directivity wrt wide vs narrow dispersion. I immediately thought of @napilopez and his opinion on this.
@1:05:50 - significance of an Olive score of 5.0 and 6.0

If you want them, you can find the link to his slides in my video description on YouTube.

Also, here is a link to Pierre’s site for those who may not be aware of it:

A collection of loudspeakers measurements

I don't see this interview embedded on your website. I really think it should be included if YouTube allows.

hardisj · Sep 24, 2021

Something Sean brought up (and I’m glad he did) is the limiting of active speakers and how it alters response at different outputs. I took the opportunity to bring up response variation of speakers with volume, in general (powered and passive speakers). This is something I am very adamant about testing because I have seen the response deviate a fair bit (1-2 dB is common) between lower SPL and higher SPL. We know how to read a response curve. We know that lesser bass or altered response yields different takeaways from the objective evaluation. This is why I test for this, sweeping on-axis at 76, 86, 96, and 102 dB. From this we can see how the response deviates with output and better determine the speaker’s dynamic range. SoundStage does something similar but I think they test at 76 and 86 dB. Usually, I find the differences to be more noticeable when you get to the 96 dB region and definitely so when you get to 102 dB.
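As a rough sketch of how such a level comparison could be reduced to numbers: normalize each sweep by its nominal level and subtract the lowest-level sweep. Everything here is an assumption for illustration. The frequencies, the response shape, and the compression amounts are made up, and `level_deviation` is a hypothetical helper, not the actual test procedure.

```python
import numpy as np

def level_deviation(responses, reference_level):
    """Deviation (dB) of each level-normalized sweep from the reference sweep."""
    ref = responses[reference_level] - reference_level
    return {level: (resp - level) - ref for level, resp in responses.items()}

freqs = np.array([50.0, 100.0, 1000.0, 10000.0])  # Hz, illustrative only
shape = np.array([-3.0, -1.0, 0.0, 0.0])           # made-up response shape, dB

# Made-up sweeps: the bass compresses as the drive level rises.
responses = {
    76: 76 + shape,
    86: 86 + shape,
    96: 96 + shape + np.array([-1.0, 0.0, 0.0, 0.0]),
    102: 102 + shape + np.array([-2.0, -1.0, 0.0, 0.0]),
}

deviations = level_deviation(responses, 76)
for level, dev in deviations.items():
    worst = freqs[np.argmax(np.abs(dev))]
    print(f"{level} dB: max deviation {np.abs(dev).max():.1f} dB (largest at {worst:.0f} Hz)")
```

With real measurements the sweeps would need to share the same frequency grid before subtracting.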

Sean brings it up at 1:09:41 which I’ve time stamped below.

pierre · Sep 24, 2021

@hardisj thanks for the callout.

This "score" has many limitations, but it has been pretty good at discriminating between speakers. I have yet to see a better proposition.

Beyond that, it is only one parameter that describes a speaker.

Duke · Sep 24, 2021

richard12511 said:
That said, if there is an optimal(meaning most preferred on average) dispersion width, I have a hunch that it's going to depend on the number of channels. I'm thinking optimal dispersion width will be much wider in mono than it will be for 11 channels. If that's true, you almost have to test with the configuration you're trying to predict for, at least with speakers that excellent otherwise. That's problematic though, given how much less discriminating we are as the number of channels increases. I guess you'd just need a much larger sample size? Otherwise, you risk getting an answer that doesn't match for the configuration you're trying to predict. Does that make sense?: Three speakers - A, B, C - that are equally flat on-axis, equally smooth off(lack of resonances), but with different dispersion widths. Speaker A (160°)wins the mono test, B(110°) wins the stereo test, and C(80°) wins the 11 channel test.

I agree with the trend you describe, but don't have a solid explanation for it. I think the ear tends to like a lot of reflected energy (assuming it's spectrally correct), but not too much.

richard12511 said:
Last paragraph is pure speculation on my part, but I'd love to be proven wrong or right with further research. First hard part is finding speakers that have equally flat direct sound and lack of resonances, but very different dispersion widths.

For a while I manufactured monopolar and bipolar versions of essentially the same (controlled-pattern) speaker, using the same drivers and same crossover (with component values scaled for impedance), but with very different enclosures. So they had the same direct sound but very different amounts of reflected sound. I don't recall anyone who heard both preferring the monopolar version.

youngho said:
From what I can tell, Genelec (and Kef) speakers, on the other hand, seem to have a DI curve that is more like a nearly straight angled diagonal line with directivity steadily increasing with each octave... I wonder if this should add a level of complexity to the "narrow vs wide dispersion" discussion, e.g. "narrow dispersion with steadily but smoothly rising directivity" vs "narrow dispersion with relatively constant directivity."

One possible theoretical advantage of the latter is this: As those reflections which originate from the off-axis energy fade away, they probably remain recognizable as such for longer. On the other hand reflections which start out with a downward-sloping spectral balance will cease to be "signal" and become "noise" as the upper harmonics fade into inaudibility first. Ime one place where this difference shows up is in spatial quality, where the speaker which more closely approaches "constant directivity" does a better job of creating a credible "you are there" presentation (as opposed to the more common "they are here" presentation), and I think this is because the reverberant tails which are on the recording are better preserved by the more spectrally-correct reflections.

napilopez said:
Good observations, that's something I've noticed as well. I've mentioned in the past that perhaps my preferences are not so much for wide directivity but for constant directivity -- and it just so happens that wider speakers also trend towards constant directivity during the soundstage-critical 1-10khzish region. It would explain why the D&D 8C is arguably the only 'narrow' speaker I've ever truly loved. I personally do not tend to get along quite as well with the KEF/Genelec type of directivity. Not that I don't like them, but having heard the spatial presentation of other speakers, I just tend to prefer others. [emphasis Duke's]

See my "possible theoretical advantage" paragraph above for speculation on why the spatial presentation of constant-directivity types is superior.

napilopez said:
At some point in his book Dr. Toole talks about increased apparent source width generally being perceived as a good thing, and not just with speakers -- this happens in large acoustic venues like a symphony hall too.

The first ipsilateral reflections in a normal home audio listening room arrive early enough that they can degrade image precision and depth, so they are arguably a two-edged sword, BUT the research seems to say that the benefits of image broadening and increased spaciousness generally outweigh the downsides.

My impression is that the ear likes a fairly well-energized reverberant field (and yes I know this term is not correct for small rooms). Not too much reverberant energy, but also not too little (see the first richard12511 quote at the top of this post). A wider-pattern speaker will result in a higher reverberant-to-direct sound ratio, and presumably a more generally-preferred one, at least up to a point. So in addition to the increase in ASW, a good wider pattern speaker may be delivering a preferred reverberant-to-direct balance.

MattHooper · Sep 24, 2021

napilopez said:
Hahah yes I believe I read the same post(s) at some point too. It makes sense. At some point in his book Dr. Toole talks about increased apparent source width generally being perceived as a good thing, and not just with speakers -- this happens in large acoustic venues like a symphony hall too.

Good observations, that's something I've noticed as well. I've mentioned in the past that perhaps my preferences are not so much for wide directivity but for constant directivity -- and it just so happens that wider speakers also trend towards constant directivity during the soundstage-critical 1-10khzish region. It would explain why the D&D 8C is arguably the only 'narrow' speaker I've ever truly loved. I personally do not tend to get along quite as well with the KEF/Genelec type of directivity. Not that I don't like them, but having heard the spatial presentation of other speakers, I just tend to prefer others.

That said I do still think I personally lean towards wider, but yes I do agree that it will depend on the setup and person.

Lots of great input into this thread!

napilopez, you've probably mentioned them before, but what are some of the "other speakers" that you find you like?

youngho · Sep 24, 2021

Duke said:
One possible theoretical advantage of the latter is this: As those reflections which originate from the off-axis energy fade away, they probably remain recognizable as such for longer. On the other hand reflections which start out with a downward-sloping spectral balance will cease to be "signal" and become "noise" as the upper harmonics fade into inaudibility first. Ime one place where this difference shows up is in spatial quality, where the speaker which more closely approaches "constant directivity" does a better job of creating a credible "you are there" presentation (as opposed to the more common "they are here" presentation), and I think this is because the reverberant tails which are on the recording are better preserved by the more spectrally-correct reflections.
...
The first ipsilateral reflections in a normal home audio listening room arrive early enough that they can degrade image precision and depth, so they are arguably a two-edged sword, BUT the research seems to say that the benefits of image broadening and increased spaciousness generally outweigh the downsides.
...
My impression is that the ear likes a fairly well-energized reverberant field (and yes I know this term is not correct for small rooms). Not too much reverberant energy, but also not too little (see the first richard12511 quote at the top of this post). A wider-pattern speaker will result in a higher reverberant-to-direct sound ratio, and presumably a more generally-preferred one, at least up to a point. So in addition to the increase in ASW, a good wider pattern speaker may be delivering a preferred reverberant-to-direct balance.

Yes, I had previously commented on my vague theory that audio reproduction should start with identification of a listener's audio preferences here with #4 being bass extension. On further reflection, it seems likely to me that proximity in the 2019 paper likely correlates highly with clarity in the 2016 one, also that #1 should have been divided into "Reverberation" and "Width and envelopment," as in the 2019 paper. Probably the easiest way to test the directivity shape hypothesis would be with simulated environments like in the 2019 paper where the identical direct signal could be paired with different simulated directivity curves.

To complicate things further, a listener's musical preferences might add another dimension entirely. See Toole's comments at https://gearspace.com/board/studio-building-acoustics/1329749-how-much-diffusion-small-room-3.html

Ultimately, perhaps we should consider having multiple setups--"horses for courses" and all that, haha.

Young-Ho

MaxRockbin · Sep 24, 2021

amirm said:
They are well aware of this and performed a Mallows's Cp analysis. Here are the results:

View attachment 155302

It is less than the number of independent variables, so the probability of overfitting is low.

I'm no stat guru, so I will probably regret butting in, but... at least in the slides it seems like Dr Olive glossed quickly over the covariance of his variables. Wouldn't you expect LFX and LFQ to have significant correlation, for example? Just from my own experience, a little covariance can lead to some wacky results. And, as he pointed out himself, if your variables aren't orthogonal, it really messes things up. Also, how much does sample size impact the usefulness of Cp? Wikipedia, under "Limitations":

"The Cp criterion suffers from two main limitations[5]: the Cp approximation is only valid for large sample size..."

Intuitively, that makes sense, I think, because a very simple model might fit a small sample set perfectly just by chance.
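That intuition can be checked with a quick simulation. This is a sketch under assumptions: the response is pure Gaussian noise with no relation at all to 4 random predictors, the sample has 13 points, and `r_squared_of_noise` is a hypothetical helper. For this setup theory gives an expected R^2 of p/(n-1) = 4/12, so a fair amount of "fit" appears from chance alone.

```python
import numpy as np

def r_squared_of_noise(n=13, p=4, trials=2000, seed=0):
    """R^2 of OLS fits where y is unrelated to the predictors by construction."""
    rng = np.random.default_rng(seed)
    values = np.empty(trials)
    for t in range(trials):
        X = rng.normal(size=(n, p))
        y = rng.normal(size=n)  # pure noise, independent of X
        A = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        values[t] = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return values

r2 = r_squared_of_noise()
print(f"mean R^2 from noise = {r2.mean():.3f} (theory: {4 / 12:.3f})")
print(f"share of runs with R^2 > 0.9: {(r2 > 0.9).mean():.4f}")
```

Raising `n` to 70 with the same `p` drops the chance-level R^2 to roughly 4/69, which is part of why larger samples are more convincing.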

HooStat · Sep 24, 2021

MaxRockbin said:
I'm no stat guru, so I will probably regret butting in, but... at least in the slides it seems like Dr Olive glossed quickly over the covariance of his variables. Wouldn't you expect LFX and LFQ to have significant correlation, for example? Just from my own experience, a little covariance can lead to some wacky results. And, as he pointed out himself, if your variables aren't orthogonal, it really messes things up. Also, how much does sample size impact the usefulness of Cp? Wikipedia, under "Limitations":

"The Cp criterion suffers from two main limitations[5]: the Cp approximation is only valid for large sample size..."

Intuitively, that makes sense, I think, because a very simple model might fit a small sample set perfectly just by chance.

Correlated variables are usually fine. Nothing is truly orthogonal (independent). When there is a problem with correlated covariates ("independent variables"), the variances of the coefficient estimates are what tend to get inflated. One would typically look at variance inflation factors to determine whether that was the case. But the coefficients themselves tend to be unaffected.
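For reference, a variance inflation factor is just 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. Here is a minimal sketch with made-up data (two strongly correlated columns and one independent column); the `vif` helper is hypothetical and written out by hand rather than taken from any stats package.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column: 1 / (1 - R_j^2)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + 0.3 * rng.normal(size=200)  # strongly correlated with x1
x3 = rng.normal(size=200)             # independent of both
factors = vif(np.column_stack([x1, x2, x3]))
print(np.round(factors, 2))
```

A factor near 1 means the coefficient's variance is not inflated by the other predictors; values of 5 to 10 or more are the usual warning thresholds.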

But in a situation with 13 data points, it is all about testing whether your view of the world (i.e., the variables that you think are meaningful) holds any water. It is a pilot study, used to justify a larger study. The larger study was also successful.

What is interesting is that we have detailed data, and estimated scores, on over 100 speakers. What would be useful are some "preference scores" to go with all of these objective measurements.

pozz · Sep 24, 2021

Duke said:
One possible theoretical advantage of the latter is this: As those reflections which originate from the off-axis energy fade away, they probably remain recognizable as such for longer. On the other hand reflections which start out with a downward-sloping spectral balance will cease to be "signal" and become "noise" as the upper harmonics fade into inaudibility first. Ime one place where this difference shows up is in spatial quality, where the speaker which more closely approaches "constant directivity" does a better job of creating a credible "you are there" presentation (as opposed to the more common "they are here" presentation), and I think this is because the reverberant tails which are on the recording are better preserved by the more spectrally-correct reflections.

This is pretty much how spatial upmixing works by calculating, separating and sending correlated and uncorrelated sound to different channels. I'd say your theory is right on.
