There is a rational (a.k.a. 'philosophical' [spit emoji]) argument that says that humans have evolved to focus on what a source sounds like, even in a room. They use two ears, head movement, frequency response, phase and timing of transients to hear through the room and to register the room sound separately from the speaker.
@Floyd Toole would agree with this, based on his book, but only above the transition frequency. At low frequencies, the room dominates over the speaker because of our inability to distinguish between direct sound and reflections at these frequencies, combined with the huge impact of modal resonances.
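For a rough idea of where that transition sits in a given room, one common rule of thumb is the Schroeder frequency, f ≈ 2000 · sqrt(RT60 / V). Treating it as a stand-in for the transition frequency is my assumption here, and the room dimensions and decay time below are made up for illustration. It's a sketch, not a precise boundary.

```python
import math


def schroeder_frequency(rt60_s: float, volume_m3: float) -> float:
    """Rule-of-thumb Schroeder frequency in Hz: 2000 * sqrt(RT60 / V).

    rt60_s    -- reverberation time in seconds
    volume_m3 -- room volume in cubic metres
    """
    if rt60_s <= 0 or volume_m3 <= 0:
        raise ValueError("rt60_s and volume_m3 must be positive")
    return 2000.0 * math.sqrt(rt60_s / volume_m3)


# Hypothetical living room: 5 m x 4 m x 2.5 m = 50 m^3, RT60 of 0.4 s.
f_t = schroeder_frequency(0.4, 50.0)
print(f"Estimated transition region: around {f_t:.0f} Hz")
```

With those made-up numbers the estimate lands a little under 200 Hz; below that region the modes dominate, above it the speaker's own response matters more.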
Most audiophiles don't believe this; it is not possible to devise an experiment to demonstrate it directly and objectively so it cannot be real; it is mere philosophy [spit emoji].
Such experiments have, in fact, been done. The conclusion is that a human listener is quite capable of separating the room from the speaker. See also section 7.6.2 of Toole's book (Third Edition) in which this study and others are discussed.
The opposing argument says that speaker and room are a system with a composite frequency response at the listener's ears which can be modified to some desired result by changing the EQ of the speaker. This is the basis of target curves and room correction. It assumes no ability for a human to hear through the room. It is easy to measure with a laptop and mic. Coincidentally, this view has arisen over recent years at the same time as laptops and cheap measurement mics have become available.
Again, this is not true above the transition frequency.
If a truly neutral speaker existed then if (1) was true it would sound the same in any room OR for (2) it would still need its EQ playing with.
You missed a third option: (3) it would not necessarily sound the same in all rooms, but it would still be preferred to any other speaker in most rooms.
Many audiophiles accept research that says that smooth dispersion patterns are preferred over lumpy ones (even if they don't act directly on it). But they don't seem to worry about the overall depth of variation of dispersion. Roughly, a wider baffle would result in less need to tweak the EQ in (6) and consequently less of a compromise, but this is not known to most audiophiles.
Again, the research does not quite support the idea that speakers that are less directional sound the same as speakers that are more directional. However, I think one could make the case that, as long as their on- and off-axis responses are flat and consistent, these speakers will be given similar preference ratings.
Drivers beam as frequency increases - simple, objective physics. A large, beaming driver crossing over to a small driver produces an abrupt change in directivity. Adding an extra midrange driver in between them maintains wider dispersion and smoother directivity overall, plus other benefits related to power handling etc. However, traditional crossover filters have sonic side effects resulting in many speaker designers stretching the frequency ranges over which drivers have to work - even for supposedly up-market speakers.
Good speakers (JBL LSR30x, M2, Revel, Genelec, Neumann etc.) use techniques to mitigate this problem (in particular, waveguides), and judging from their off-axis measurements, these techniques seem to be effective. (Note: as far as I know, loudspeaker waveguides did not exist back in the 1970s. Horns did exist, but they're not quite the same thing.)
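To put rough numbers on the beaming point, here is a small sketch that treats each driver as an idealized rigid circular piston and looks at ka = 2πfa/c, where a is the radiating radius. Taking ka ≈ 1 as the onset of beaming is a common rule of thumb, not a hard limit, and the driver radii and crossover frequency are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C


def ka(frequency_hz: float, radius_m: float) -> float:
    """Dimensionless ka = 2*pi*f*a/c for a piston of radius a."""
    return 2.0 * math.pi * frequency_hz * radius_m / SPEED_OF_SOUND


def beaming_onset_hz(radius_m: float, ka_threshold: float = 1.0) -> float:
    """Frequency at which ka reaches the chosen threshold (assumed ~1)."""
    return ka_threshold * SPEED_OF_SOUND / (2.0 * math.pi * radius_m)


# Hypothetical drivers: a woofer with 80 mm radiating radius
# and a dome tweeter with 12.5 mm radius, crossed over at 2.5 kHz.
woofer_radius = 0.080
tweeter_radius = 0.0125
crossover_hz = 2500.0

print(f"Woofer starts beaming near {beaming_onset_hz(woofer_radius):.0f} Hz")
print(f"Tweeter starts beaming near {beaming_onset_hz(tweeter_radius):.0f} Hz")
print(f"ka at crossover: woofer {ka(crossover_hz, woofer_radius):.2f}, "
      f"tweeter {ka(crossover_hz, tweeter_radius):.2f}")
```

The gap between the two ka values at the crossover is the abrupt directivity change being described: the woofer is already well into beaming while the tweeter is still close to omnidirectional. A waveguide on the tweeter or an added midrange driver are two ways of narrowing that gap.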
A neutral or non-neutral speaker with flat on-axis FR will give a non-flat FR in a real room if the measurement includes the room sound. An average speaker and average room will yield an average FR. A solely empirical view of this based on real speakers in real rooms results in the belief that listeners prefer an average FR and hence the notion of the target curve. This is then even extended to headphones. In the philosophical realm this can be anticipated and debunked but only in a way that is incompatible with the empiricist world view.
I'm not quite sure what you mean by "In the philosophical realm this can be anticipated and debunked". I believe you are describing Toole's famous "circle of confusion". The key here is that recordings are produced on monitoring systems made up of speakers such as Genelecs, which have a flat on-axis response, positioned optimally in a well-treated room. These recordings are therefore calibrated to sound their best on neutral speakers in good rooms. So if you want the best possible sound, you need to replicate this setup: a neutral speaker in a good room. And indeed this is precisely what listeners prefer, according to the research results. This makes a lot of sense, IMHO.
Belief that frequency response is everything results in odd appendages to speakers such as bass reflex that provide the right frequency response results at the expense of timing performance.
"Timing performance"? Where is this coming from? Can you cite studies that show that this "timing performance", as you say, is relevant in subjective loudspeaker evaluation?
It also excuses any kind of crossover as long as it results in the right FR at the measurement mic.
If by "measurement mics" you mean the on- and off-axis responses of the speaker as measured in anechoic conditions, then sure. If you mean the response measured "in-situ" at the listening position, then no, absolutely not. Two ears and a brain can distinguish between the speaker's response and the room contribution; a measurement microphone can't (or at least, not easily).
1970s designers used expensive anechoic chambers, eschewing the much cheaper and easier (and naive?) option of in-room FR measurements.
Today's serious loudspeaker manufacturers, such as Harman, Genelec, or Neumann, use anechoic chambers to design their speakers. Not in-room measurements.