NB! This post got a bit long, again. The twitters among us are advised to ignore this post.
@Wombat, if I'm wrong on any point, please point it out.
@Floyd Toole has been very forthcoming, even benevolent, in trying to replace confusion with facts and insights. And I'm truly very grateful for that!
Having said that, I think the "room discussion" has been fruitful for many of us. At least for me... We have had a good discussion on steady-state vs direct sound. And who knew, beforehand, that Genelec's room compensation software GLM can be described thus:
"The red and green GLM curves are estimates of the perceived direct sound frequency response, after measurements performed at one or more locations, before and after compensation".
The important words here, and this may be where Genelec deviates from other providers of room software, are "perceived direct sound". And if we look at Genelec's in-room product performance guide (https://www.genelec.com/sites/default/files/media/Studio monitors/Catalogues/genelec_monitors_in-room_performance.pdf), Genelec seems to be more focussed on letting the direct sound dominate at the listening position than you'd expect from a seller in a hifi shop. So there seems to be a steady-state (SS) vs direct sound (DS) theme that deserves more factual discussion to improve our understanding.
Further, @Thomas Lund stated that "The program is furthermore designed and updated based on research and data from professional applications, which may be different from home requirements".
This means that GLM as of 2010 may not be exactly the same as GLM as of 2020, which makes it a bit hard to understand GLM fully from an outsider's position.
If I were to try and sum up a broad categorization of views on speakers and room, I'd do it like this:
1) Only the speaker matters, because we listen through the room. See to it that you use a speaker that is smooth and of low distortion in an anechoic chamber.
2) The speaker's integration in the room can be improved in the lower frequencies, up to ca. 300 Hz. Use steady-state (SS) measurements to make compensation filters for the lower frequencies.
3) The speaker's integration in the room can be improved at all frequencies. Use SS measurements to make compensation filters for all frequencies.
4) The speaker's integration in the room can be improved in the lower frequencies, up to ca. 300 Hz. Use measurements of perceived direct sound to make compensation filters for the lower frequencies.
5) The speaker's integration in the room can be improved at all frequencies. Use measurements of perceived direct sound to make compensation filters for all frequencies.
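To make the difference between "up to ca. 300 Hz" and "all frequencies" a bit more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the function name, the boost and cut limits, the flat target and the made-up measurement numbers. It is not how GLM, Audiolense or Acourate actually work. The only point is that the same measured curve gives different filters depending on where you stop correcting. Whether the input curve is a steady-state measurement or an estimate of perceived direct sound is a separate question that the code doesn't answer.

```python
def correction_gains_db(freqs_hz, measured_db, target_db,
                        max_freq_hz=None, max_boost_db=6.0, max_cut_db=12.0):
    """Per-band correction gains in dB (hypothetical helper, illustration only).

    gain = target - measured, limited to [-max_cut_db, +max_boost_db].
    Bands above max_freq_hz are left untouched (0 dB) when a limit is given.
    """
    gains = []
    for f, measured, target in zip(freqs_hz, measured_db, target_db):
        if max_freq_hz is not None and f > max_freq_hz:
            gains.append(0.0)  # no correction above the chosen limit
            continue
        gain = target - measured
        gain = max(-max_cut_db, min(max_boost_db, gain))  # clamp boost and cut
        gains.append(gain)
    return gains


# Made-up in-room deviations from a flat target, in dB (not real data).
freqs = [50, 100, 200, 400, 1000, 4000]
measured = [6.0, -3.0, 2.0, -1.5, 1.0, -2.0]
target = [0.0] * len(freqs)

# Correct only up to ca. 300 Hz, as in the low-frequency views.
low_only = correction_gains_db(freqs, measured, target, max_freq_hz=300)

# Correct at all frequencies, as in the full-range views.
full = correction_gains_db(freqs, measured, target)

print(low_only)
print(full)
```

In the low-frequency case the bands at 400 Hz and above get 0 dB of correction, so whatever the speaker does there is left alone. In the full-range case every band is pushed towards the target.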
I guess @Cosmik is a proponent of (1).
Further, I have the impression that view (2) is a sort of consensus on ASR. And (2) is the view which is most supported by Toole's book, if I'm not mistaken.
However, I believe view (3) is pervasive outside of ASR. Audiolense and Acourate offer this kind of all-frequency correction, right?
Views (4) and (5) are "novel" and based on input from Genelec. I believe Genelec in practice is somewhere between (4) and (5), maybe closer to (4), because the compensation GLM applies at higher frequencies is small.
From a research program point of view, this separation of ways to approach audio has some interesting implications. View (1) is the approach where measurements and theory are most easily combined, because you can control for the room 100 percent. A research program that can control for all factors is an attractive one because of its simplicity, where the beautiful math is the final judge. My experience tells me to be wary of such research programs because they often fail when confronting a reality where the human factor plays a role. However, in the case of audio, we do know - thanks to Toole et al. - that view (1) gives us sound of very high quality.
So deviating from (1), to go further down the list, means that we're trying to improve something that's pretty good already. It's always risky to leave a robust model - where we understand "everything" - to replace it with something that is more complicated and potentially confusing.
Once we leave (1), two new factors arise: (a) the room and (b) the human factor (perception). Both (a) and (b) represent sources of error in the empirical data that researchers collect and study. It's naturally easier to control for (a) the room than for (b) the human factor; (a) is quantified by microphones, while (b) has to be quantified by measurements (mostly surveys, but in some research programs they also measure what happens in the brain) that try to capture the complexity of human perception.
For the sake of simplicity it's tempting to settle for research program (1), characterized by beautiful mathematics. However, this research program was more or less solved before the turn of the millennium; any improvements since then have been small, evolutionary steps, and researchers have been aware of most of the compromises made in speaker production.
If you don't feel comfortable choosing the uncomplicated way of (1) - maybe because you have data and experiences that tell you the world is a bit more complicated - you need to venture below (1). And as soon as you leave the simple world of aesthetically attractive mathematics, disagreement will flower among scientists. Sometimes, this disagreement is based on one party being confused; on other occasions there's real room for disagreement and "confusion" is not the correct way to characterize a rival scientist.
Looking at speakers from a factor-based point of view, we may think of the following factors:
i) Speaker
ii) Room compensation
iii) Human
If we had full control over all factors, we could make sound that would optimize any individual's utility at any point in time. In reality, however, there is an emerging consensus on (i), i.e. what constitutes a good speaker. Isn't this consensus seen more easily in the world of professional monitors, where there is less deviation in design than in the "hifi" and "high-end" world?
But there is no overall consensus on (ii), i.e. what constitutes good room compensation, as far as I understand. Olive's survey from 2009 found that room software was the source of differences in perception, but he couldn't drill down to the exact factors that gave rise to differences in perception related to room DSP, could he? And I am not sure we will ever be able to understand the individual human at all times, because your preferences change according to time of day, season, emotional state and many other things. So factor (iii) may always have to be approached by averaging (e.g. taking the median of) many individuals' preferences over time. In other words, an aesthetically unattractive compromise may be awaiting us at the end of the road, after all. But is that a reason to settle for the mathematical beauty of option (1)?
Confusingly yours,