Is Spinorama Valid?

preload · Jul 7, 2020

edechamps said:
There are a few technical reasons why someone eyeballing at the spinorama charts might be able to come up with a better estimate of listener preference than the Olive model's 74% correlation. It has to do with known shortcomings of the input variables that Olive defined. For example, the deviation variables (AAD, NBD) are arguably too simplistic because they do not take into account the fact that some frequency ranges are more critical than others (it weighs the entire 100-12kHz range equally). They do not take into account the fact that peaks tend to be more audible than dips. NBD splits the frequency range into fixed, discontinuous, 1/2-octave bands, which is also not great from a perceptual perspective.

Edechamps, I agree with you that the Olive model isn't perfect in the ways you mention. So how would an individual go about considering critical freq ranges? Do the spinorama analysts here memorize which bands are more important and place more "mental" weight when looking at the curves? And do they also memorize what Q's and amplitudes are more audible than others (and at what frequency) and also mentally weight the peaks higher? I guess I'm not clear on how one could use this additional information to then eyeball a FR curve and decide which one is "smoother" in the right ways. And for the sake of argument, I'm talking about visually inspecting the curves taken from several subjectively well-regarded loudspeakers and making firm predictions without listening to them (not a maldesigned speaker with a wild curve vs. a studio monitor).

Another example is the SM variable whose definition is incredibly weird, to the point where it looks more like a mistake than something deliberate.

Curious what's weird about it - it's simply the r^2 correlation of the FR response compared to a flat response? It should "correlate"with the NBD statistic, except it's calculated continuously instead of via discrete 1/2-octave bands.

It is quite possible that, if a new model was designed without these shortcomings, then we might end up with a correlation higher than 74%, from the same raw spinorama data. Or we might not. It's impossible to know without doing more research, which sadly requires a lot of expensive and time-consuming blind testing.

This answers my original question, was whether there existed evidence of a formula/technique using spinorama charts that outperforms the Olive model. In the absence of evidence, I'm wondering why people think their formula/technique might be more accurate/predictive, and how they know that it is? (i.e. have people done their own controlled listening tests to validate their preferences?)

The other reason why people might prefer looking at the curves themselves rather than trusting the Olive preference rating model is because they might want to use the speaker in a different way than the listening setup used to calibrate the model. For example, they might want to use a subwoofer (all subjective tests for the Olive model were done without a subwoofer). Or they intend to EQ the speaker, so they are looking for EQability rather than out-of-the-box performance.

Sure, that's fair - and there are likely many other reasons why individuals might learn more by looking at individual spinorama charts. And individual circumstances might make the Olive formula less applicable, to your point. Don't disagree.

preload · Jul 7, 2020

Ivanovich said:
I think I now understand the original question.

I think people here have come to accept the findings from the research that certain speaker attributes are (always) preferred.

To disprove the theory, I guess you could try to find a speaker that has a low preference rating and is still objectively the preferred speaker.

Right. And I too agree that certain speaker attributes tend to be preferred by listeners, that's a given. Rather, my question is how some individuals believe they can evaluate those speaker attributes via "eyeballing" a series of spinorama curves better than applying mathematical analysis (i.e. Olive's formula). To give an example, we know that listeners prefer a -1.8 (for the sake of argument) downward slope of the in-room response curve. Can anyone here eyeball a FR curve and determine it's slope? If not, how do you know if the speaker will sound "bright" vs. "dull?" [/quote]

I suppose one could quibble with the relative waiting of the bands and low frequency extension to tweak within the general formula, which could make it more tuned to a particular listener’s sonic priorities.

Yes, and is an individual capable of mentally "weighting" the bands using visual inspection and using that information to factor into whether Speaker A will sound better than Speaker B?

edechamps · Jul 7, 2020

preload said:
So how would an individual go about considering critical freq ranges? Do the spinorama analysts here memorize which bands are more important and place more "mental" weight when looking at the curves? And do they also memorize what Q's and amplitudes are more audible than others (and at what frequency) and also mentally weight the peaks higher? I guess I'm not clear on how one could use this additional information to then eyeball a FR curve and decide which one is "smoother" in the right ways.

Well, "spinorama analyst" is not really a thing. These "analysts" are just people with opinions. As you said, it's easy to "eyeball" spinorama charts and determine which speaker is best when the differences are large (really bad speaker vs. really good speaker). But once you get into small differences, the research is not there yet, and you leave the realm of science to enter the realm of speculation.

preload said:
In the absence of evidence, I'm wondering why people think their formula/technique might be more accurate/predictive, and how they know that it is? (i.e. have people done their own controlled listening tests to validate their preferences?)

I think you're putting too much faith in people

Controlled, rigorous speaker listening tests are hard, expensive and time-consuming to conduct. I would never assume someone has done such testing unless they mention it explicitly.

preload said:
Curious what's weird about it - it's simply the r^2 correlation of the FR response compared to a flat response? It should "correlate"with the NBD statistic, except it's calculated continuously instead of via discrete 1/2-octave bands.

Ah, if only it was that simple. Sadly, the coefficient of determination (r²) doesn't work like that. It's way more confusing and weird. I'll just copy-paste the standard blurb I use in response to this question, and you can decide for yourself if you want to go down that rabbit hole (headache painkillers recommended):

CAUTION: interpreting SM is very tricky and fraught with peril. The Olive paper describes SM as the "smoothness" of the response, which is misleading at best. The real meaning of SM as mathematically defined in the paper is way more subtle and hard to describe. In reality, SM describes how much of the curve deviation from a flat, horizontal line can be explained by the overall slope (as opposed to just jagginess). This leads to some counter-intuitive results - for example, a roughly horizontal curve with negligible deviations can have an SM of zero! (For more debate on this topic, see this, this, this, this and especially this and this.)

GelbeMusik · Jul 7, 2020

edechamps said:
CAUTION: interpreting SM is very tricky and fraught with peril. The Olive paper describes …

Again and again, there is no use in discussing some formula, which is not easily accessible to the public. It is hidden behind the curtains of AES' wizardry, asking for a considerable amount of cash from each person as an entry fee.

Anyway, as You claim, the formula doesn't describe what is said. So consider it to be a fail, that simple. It could be helpful to set up a thread, in which the formula is cited correctly and fully, discuss the case and then after refer to that link, in case. Again, with the formula explicitly cited. Trust, e/g my mathematical skills ;-) with confidence, no in-detail explanations needed.

Somehow the usual course of discussions here reminds me of catholic belief. In medieval ages the "book" was available in Latin only, so common folks wasn't able to test the weekly sermon against what is actually written (if it would matter, of course). Then Martin Luther came and ended the many frauds by just translating that "book" to common language. Warfare and eventual changes after ...

It is in particular un-scientific to construct a meaning out of somebodies writing (Olive's). It has to be clear in the first place. In doubt ask him. Or not, but then discard the attempt as fail.

edechamps · Jul 7, 2020

GelbeMusik said:
Again and again, there is no use in discussing some formula, which is not easily accessible to the public. It is hidden behind the curtains of AES' wizardry, asking for a considerable amount of cash from each person as an entry fee.

I agree that paywalls can be frustrating. I, for one, believe scientific research should be freely available to all as a matter of principle. But, sadly, it is what it is. If you want to dismiss all AES research under the grounds that it's behind a paywall, well, that's your prerogative, but I'm afraid you won't get very far that way as you would be deprived of large amounts of information.

In any case, this is a moot point in this case because most of the contents of the paper are also available, mostly verbatim, in a freely available patent.

GelbeMusik said:
Anyway, as You claim, the formula doesn't describe what is said. So consider it to be a fail, that simple.

In the specific case of that particular paper, with the methodology used (designing a model using PCA), it doesn't really matter that the description doesn't match the formula. In fact, none of the text matters, really. All that matters is that the formula in the paper is the formula that was actually used to calibrate the model. And so far we have no reason to believe that wasn't the case. (In particular, internal consistency seems fine.)

Thomas savage · Jul 7, 2020

For a company trying to make speakers with the broadest appeal yes , to individuals looking for what they like less so.

GelbeMusik · Jul 7, 2020

edechamps said:
... patent.

In fact, none of the text matters, really. All that matters is that the formula in the paper is the formula ...

I'm happy to disagree. First, I was asking You to focus the discussion of said formula. It should happen at a one and only place--formula included for consistency reasons.

Second I don't think that words are unnecessary. They connect the data to the real world. Incidentally Your trial to explain the formula says it all, doesn't it?

edechamps · Jul 7, 2020

GelbeMusik said:
First, I was asking You to focus the discussion of said formula. It should happen at a one and only place--formula included for consistency reasons.

Sure, why not. Usually that happens in this thread (or sometimes this one), feel free to continue the discussion there if you'd like. The only reason why I'm having this discussion here is because that's where @preload is asking questions.

As for the formula itself, I've linked to the freely available patent where anyone can read it for themselves.

GelbeMusik said:
Second I don't think that words are unnecessary. They connect the data to the real world.

You responded to a specific point I was making regarding Olive's paper with a vague, sweeping statement. I'm not quite sure how I'm supposed to respond to that. Is there a specific point you want to make around the metrics we are discussing?

GelbeMusik said:
Incidentally Your trial to explain the formula says it all, doesn't it?

Depends on what you mean by "explain". I'm trying to make sure people don't arrive at an incorrect mathematical understanding of how SM behaves relative to the input curve. The only reason I brought up Olive's text is because it tends to confuse people into such a mistaken understanding. I am merely trying to clarify how SM is computed; I am not trying to argue for any particular interpretation of what SM means. That would be a difficult thing to do, in any case, precisely because of its contrived definition. More importantly, trying to ascribe meaning to Olive's variables is fraught this peril because this is a purely statistical model, and as such it only deals with correlation, not causation.

Wes · Jul 7, 2020

GelbeMusik said:
1. spinorama does not really help in any way to assess a concrete situation from the outset.

2. The variables are too diverse and widely scattered. ...

can you explain the basis for #1 and #2 above?

have you conducted experiments that led you to those conclusions?

or what?

tmtomh · Jul 7, 2020

It would appear that Spinorama might be a limited tool, but certainly still a valid one. It can show us poor designs and it can make at least some useful predictions about how certain characteristics of a speaker are likely to cause or not cause certain problems in most listening spaces.

But I don't know if we'll ever get to a place where speakers can be measured with the same level of correspondence between measured fidelity and perceived performance as digital sources and amplifiers. One reason is that I think transducers are by their nature always going to be an order of magnitude more variable and less accurate than upstream components. Another reason is that even many objective-leaning folks - including myself - question whether there really is a singular standard of fidelity when it comes to speakers, and so for better or worse I don't know that a critical mass will ever coalesce around the idea that speakers can be arranged along a linear hierarchy of measured performance like upstream components can.

I think Spinorama is incredibly valuable at illustrating important tradeoffs made in speaker design - how crossover points impact consistency (or lack thereof) of directivity across frequency ranges; how ports improve bass extension but with some costs in bass quality; how DSP can improve bass linearity but potentially at the cost of higher bass distortion; and perhaps most importantly, how it is possible to design two different speakers that produce similar in-room responses but that rely to very different degrees on reflected sound to do so.

It seems to me that listening space size and reflectivity can have a major impact on how important directivity, bass port location, and reflected sound actually is in any given individual situation, and on whether design choices made in those areas play out well or poorly in a given space.

GelbeMusik · Jul 8, 2020

edechamps said:
Sure, why not. Usually that happens in this thread (or sometimes this one), feel free to continue the discussion there ...

It seems I should check the case on my own, and present my results in the mentioned thread.

Otherwise, I may have confused your commitment to education with a rather uncritical defense of the Olive rating. I was still quite biased by my own criticism of smoothness. Like many things about the spinorama standard, it always has big gaps. I can't say often enough how good the standardization of already known best practices is in itself - only the theoretical justification should not be questioned too sharply. It is often defended for the sake of the standard, but not for its own sake, I hope.

Translated with www.DeepL.com/Translator (free version)

preload · Jul 10, 2020

edechamps said:
I agree that paywalls can be frustrating. I, for one, believe scientific research should be freely available to all as a matter of principle. But, sadly, it is what it is. If you want to dismiss all AES research under the grounds that it's behind a paywall, well, that's your prerogative, but I'm afraid you won't get very far that way as you would be deprived of large amounts of information.

In any case, this is a moot point in this case because most of the contents of the paper are also available, mostly verbatim, in a freely available patent.

I hope that folks, in general, aren't debating the merits of Olive's paper without actually reading it!

preload · Jul 10, 2020

edechamps said:
Well, "spinorama analyst" is not really a thing. These "analysts" are just people with opinions. As you said, it's easy to "eyeball" spinorama charts and determine which speaker is best when the differences are large (really bad speaker vs. really good speaker). But once you get into small differences, the research is not there yet, and you leave the realm of science to enter the realm of speculation.

Thank you!! This is was what I was leaning towards as well, but just wanted to make sure I wasn't missing something.

I think you're putting too much faith in people Controlled, rigorous speaker listening tests are hard, expensive and time-consuming to conduct. I would never assume someone has done such testing unless they mention it explicitly.

I've learned to give people the benefit of the doubt...at least initially.

Ah, if only it was that simple. Sadly, the coefficient of determination (r²) doesn't work like that. It's way more confusing and weird. I'll just copy-paste the standard blurb I use in response to this question, and you can decide for yourself if you want to go down that rabbit hole (headache painkillers recommended):

CAUTION: interpreting SM is very tricky and fraught with peril. The Olive paper describes SM as the "smoothness" of the response, which is misleading at best. The real meaning of SM as mathematically defined in the paper is way more subtle and hard to describe. In reality, SM describes how much of the curve deviation from a flat, horizontal line can be explained by the overall slope (as opposed to just jagginess). This leads to some counter-intuitive results - for example, a roughly horizontal curve with negligible deviations can have an SM of zero! (For more debate on this topic, see this, this, this, this and especially this and this.)

Thanks, I took a look at your links, and well, I'm still confused about what the confusion is all about!

In the Olive formula, my understanding is that first, a regression line is "drawn" through the predicted in-room response (PIR) curve between 100Hz and 16kHz. The r^2 is then determined to measure the degree of correlation between that regression line and the PIR curve. PIR curves that tend to coalesce around a straight regression line (i.e. are "smooth"), regardless of slope, should achieve r^2 values close to 1. PIR curves that have lots of peaks and dips will have lower r^2 values (which makes sense because PIR curves with lots of peaks/dips are not smooth).

I do not believe the SM metric calculates the r^2 based on a FLAT horizontal line. Here is the text from the paper:
"For each of the 7 frequency response curves, the overall smoothness (SM) and slope (SL) of the curve was determined by estimating the line that best fits the frequency curve over the range of 100 Hz-16 kHz. This was done using a regression based on least square error. SM is the Pearson correlation coefficient of determination (r 2 ) that describes the goodness of fit of the regression line defined by equation 4."

Equation 4 is just the standard r^2 formula. Also, this appears to be different from the NBD and AAD statistics, which ARE based on deviation from a flat horizontal line.

Let me know if this sounds right to you.

MZKM · Jul 10, 2020

preload said:
Thank you!! This is was what I was leaning towards as well, but just wanted to make sure I wasn't missing something.

I've learned to give people the benefit of the doubt...at least initially.

Thanks, I took a look at your links, and well, I'm still confused about what the confusion is all about!

In the Olive formula, my understanding is that first, a regression line is "drawn" through the predicted in-room response (PIR) curve between 100Hz and 16kHz. The r^2 is then determined to measure the degree of correlation between that regression line and the PIR curve. PIR curves that tend to coalesce around a straight regression line (i.e. are "smooth"), regardless of slope, should achieve r^2 values close to 1. PIR curves that have lots of peaks and dips will have lower r^2 values (which makes sense because PIR curves with lots of peaks/dips are not smooth).

I do not believe the SM metric calculates the r^2 based on a FLAT horizontal line. Here is the text from the paper:
"For each of the 7 frequency response curves, the overall smoothness (SM) and slope (SL) of the curve was determined by estimating the line that best fits the frequency curve over the range of 100 Hz-16 kHz. This was done using a regression based on least square error. SM is the Pearson correlation coefficient of determination (r 2 ) that describes the goodness of fit of the regression line defined by equation 4."

Equation 4 is just the standard r^2 formula. Also, this appears to be different from the NBD and AAD statistics, which ARE based on deviation from a flat horizontal line.

Let me know if this sounds right to you.

I have messed with the SM formula, and changing the regression line/slope has 0 impact on the score, it changes the numerator and denominator in a direct relationship that cancels out.

As such, it’s just the formula for correlation, in which a steeper slope improves the score as the change in the y-axis is highly correlated with its position on the x-axis (SPL decreases with increasing frequency)

What Olive should have tried, and what I’ve been planing on doing, is “normalize“ the PIR curve, likely to its own slope and not the target slope which he even states in the paper likely is altered with directivity, and then run a AAD/NBD on it.

SM currently is pretty sensitive; for an upcoming speaker, I changed the upper range from 16000Hz to 16001Hz (the data file only had ~15450Hz before 16000Hz, but it had 16000.5Hz that I decided to experiment by including), and it changed the rounded preference rating by 0.1 points (I think it was a ~0.03 point change). It‘s only 1 extra point to the ~145 current points, I did t think it would have that large of an impact.

preload · Jul 10, 2020

MZKM said:
I have messed with the SM formula, and changing the regression line/slope has 0 impact on the score, it changes the numerator and denominator in a direct relationship that cancels out.

I'm sorry, but if you change the slope of the regression line without changing the PIR curve, it WILL change the r or r^2 value. There's no way around it mathematically. Perhaps I'm misunderstanding and you can show graphically what you did and what the r^2 values were.

Edit: It's possible I'm not understanding. The regression line is determined by the data points. So are you saying that if you "rotate" the points on the PIR curve, and thereby change the slope of the regression line, the SM formula doesn't change? If so, yes, that makes sense.

edechamps · Jul 10, 2020

preload said:
PIR curves that tend to coalesce around a straight regression line (i.e. are "smooth"), regardless of slope, should achieve r^2 values close to 1. PIR curves that have lots of peaks and dips will have lower r^2 values (which makes sense because PIR curves with lots of peaks/dips are not smooth).

Nope, that's not how it works. r² is counter-intuitive. You might want to stop using the term "smooth" for a moment - it's not part of the definition of r² and it just makes things confusing. Olive should never have used that term.

If you don't believe me, try calculating a SM (i.e. r²) for a PIR curve that only has negligible deviations perfectly centred around an horizontal straight line. You will find SM≈0, in direct contradiction of what you just said ("curves that tend to coalesce around a straight regression line should achieve r^2 values close to 1"). Then, introduce a tilt into that PIR curve, and you will find SM≈1, again in contradiction with what you just said ("regardless of slope").

preload said:
I do not believe the SM metric calculates the r^2 based on a FLAT horizontal line.

Depends on what you mean by "based". r² compares the variance around the regression line to the variance around the mean. The mean is a flat horizontal line. Hence, r² compares the variance around the regression line to the variance around a flat horizontal line - it is NOT just about the variance around the regression line. (This is what confused everyone, including Olive, apparently.) This is why r² is also described as the proportion of variance explained by the regression line. If the regression line is the same as the mean (i.e. the curve is roughly horizontal), then the variance around the regression line is the same as the variance around the mean ("the regression line does not explain any of the variance") and r²=0. On the other hand, if the curve is tilted, then the regression line will be a better fit compared to the mean, and r² will increase.

If you're still confused, it might help to look at a step-by-step example of an SM calculation on real speakers.

preload · Jul 10, 2020

edechamps said:
Nope, that's not how it works. r² is counter-intuitive.

I respectfully disagree that r^2 is "counterintuitive." Pearson r (and r^2) is a pretty basic statistical test, and I interpret it all the time when it is used in biostatistics.

If you don't believe me, try calculating a SM (i.e. r²) for a PIR curve that only has negligible deviations perfectly centred around an horizontal straight line. You will find SM≈0, in direct contradiction of what you just said ("curves that tend to coalesce around a straight regression line should achieve r^2 values close to 1").

The r^2 for a PIR curve that has negligible deviations from a horizontal line (i.e. the points of the curve "follow" that line), should have an r^2 that approaches 1.0. If you're calculating the r^2 for that scenario and you're not getting near 1.0, there was a miscalculation. How are you calculating your r^2? Are you using a statistical software package?

Depends on what you mean by "based". r² compares the variance around the regression line to the variance around the mean. The mean is a flat horizontal line. Hence, r² compares the variance around the regression line to the variance around a flat horizontal line - it is NOT just about the variance around the regression line. (This is what confused everyone, including Olive, apparently.)

I don't think this is correct. The Pearson r is examining the variation of the data points (i.e. the PIR curve) around a regression line that goes through it. I'm not sure where you're getting the need to factor in the mean value (i.e. a flat horizontal line).

If you're still confused, it might help to look at a step-by-step example of an SM calculation on real speakers.

I tried to follow what you were doing with the SM calculation in the link, but after the regression line graphs, the next steps don't seem to match up with how one would typically calculate the Pearson r.

edechamps · Jul 10, 2020

preload said:
I respectfully disagree that r^2 is "counterintuitive." Pearson r (and r^2) is a pretty basic statistical test, and I interpret it all the time when it is used in biostatistics.

Sure. Let me rephrase: it's the way r² is used in Olive's SM that's counter-intuitive.

preload said:
The r^2 for a PIR curve that has negligible deviations from a horizontal line (i.e. the points of the curve "follow" that line), should have an r^2 that approaches 1.0. If you're calculating the r^2 for that scenario and you're not getting near 1.0, there was a miscalculation. How are you calculating your r^2? Are you using a statistical software package?

You don't need to do any calculations to come to this conclusion. You can deduce it directly from Wikipedia's definition of R², which is:

R² = 1 - (SSres / SStot)

In my previous post I took the example of a curve that is deviating around a horizontal, straight line. Thus, the regression line is the same as a horizontal line through the mean. Therefore, the variance around the regression line (SSres) is the same as the variance around the mean (SStot). SSres = SStot, thus, according to the above definition, R²=0. QED.

preload said:
I don't think this is correct. The Pearson r is examining the variation of the data points (i.e. the PIR curve) around a regression line that goes through it.

…and comparing it to the variation of the original data (i.e. the variance around the mean). Off the top of my head I can't think of a straightforward way to demonstrate this for the Person correlation coefficient (r), which might be because the relationship between R² and r is not completely trivial. I prefer to stick to the coefficient of determination (R²) - it's easier.

(One thing that might help dispel the confusion: in my posts, I'm using the terms r² - the Person correlation coefficient squared - and R² - the coefficient of determination - interchangeably. I'm allowed to do that because they are the same thing when least-squares linear regression is used, as it is in the case of SM. Arguably I should have mentioned that up front. I'll try to be more careful with that distinction from now on.)

preload said:
I'm not sure where you're getting the need to factor in the mean value (i.e. a flat horizontal line).

I don't know what to tell you. It's literally right there in the Wikipedia definition of R² (SStot). There's even a figure on the right that illustrates how the mean is used, showing the flat horizontal line and everything. The legend on that figure states: "The better the linear regression (on the right) fits the data in comparison to the simple average (on the left graph), the closer the value of R² is to 1". I don't know how I can make this any clearer.

preload said:
I tried to follow what you were doing with the SM calculation in the link, but after the regression line graphs, the next steps don't seem to match up with how one would typically calculate the Pearson r.

The example steps I show in the post I linked use the explained variance relation, an equivalent definition of R², which can be used in this case because the conditions are met (least-squares regression). This alternative definition is R²=SSreg/SStot, or, in the terms I used in that post, R²=RSS/TSS. The reason why I used this definition in my illustration is because it's easier to visualize.

Tangband · Jul 10, 2020

March Audio said:
They certainly will have different effects on the sound, however Tooles research indicates that the room, whilst changing sound, will not really affect speaker preference. The same speakers are generally favoured regardless.

Thats not entirely true in a small room with walls and a stereo setup. You have something called baffle step compensation thats very dependent of the placement of the loudspeakers in a room.

March Audio · Jul 11, 2020

Tangband said:
Thats not entirely true in a small room with walls and a stereo setup. You have something called baffle step compensation thats very dependent of the placement of the loudspeakers in a room.

Sorry Tangband but the Toole research shows the same speakers are favoured regardless of this. Yes I am fully aware of what baffle step is.

Is Spinorama Valid?

Major Contributor

Major Contributor

Addicted to Fun and Learning

Senior Member

Addicted to Fun and Learning

Grand Contributor

Senior Member

Addicted to Fun and Learning

Major Contributor

Major Contributor

Senior Member

Major Contributor

Major Contributor

Major Contributor

Major Contributor

Addicted to Fun and Learning

Major Contributor

Addicted to Fun and Learning

Major Contributor

Master Contributor

Similar threads