Thank you to everyone posting here. There are some very interesting insights and perspectives being expressed.
I just wanted to introduce a related concept that I haven't heard discussed.
Siegfried Linkwitz (RIP) has some very definite ideas about what speakers should do. His ultimate designs are open baffle. (I've never heard such speakers, but really want to.) His website presents some very detailed explanations about sound presentation that are over my head, but I'm trying to chew through them. A lot of food for thought here...
http://www.linkwitzlab.com/speakers.htm
But part of his ideal is that the speakers should "disappear," leaving an experience of a kind of three-dimensional illusion for the listener, in the space. I am dubious.
He does seem to hold that the best speakers should reproduce the illusion of a "live performance." I think this is a very narrow view as soon as you step outside the realm of naturalistic recordings of acoustic instruments. (IMO, these types of recordings are easier to create and reproduce in general.)
I've never shared this perspective on what a playback experience should be, because to me recorded music is music. Most of my relationship with music is of the recorded type. Live performances are hit or miss, at best. (Well, records are too, but you can just turn off a bad record.)
Speakers that "image" well can produce an illusion of, for example, a violin in a room. In this case, if the speakers reach this "ideal" and disappear, the illusion is that a complex three-dimensional object, made of wood and metal, is actually the resonating object in the room, not the speakers.
This approach falls apart with most rock music or multitracked music. The problem is that the actual "resonating object," the medium upon which the artists construct their product, is the monitor speakers in the control room!
One of the weird aspects of multitrack recording is the ability to superimpose instruments that naturally have very different SPLs in a single recording. For example, you can combine a mandolin with a Marshall guitar amp.
Then think about it: in a typical "pop" recording, we have a lead vocal floating on the top of instruments that generate far more sound energy!
To combine these disparate sounds into a convincing illusion of a performance (broadly speaking) requires a bunch of "tricks." Otherwise the result is "uncanny" and unconvincing.
In this scenario, the actual speaker boxes play a role in "integrating" the sound. Especially if you look back to the classic rock era, the monitor speakers were either soffit mounted or wooden boxes. Mixes were then checked in different environments, but I think the wooden box played a foremost role, as it was considered the best playback system.
It is inevitable that the monitors themselves are integral to the work product. In the end, the listener must hear sound waves created by a resonating object. In that case, the speakers are not intended to "disappear"; they are fundamental in creating the effect the artists intend.
One of my pet theories about rock music is based on the aesthetic quality that it should be loud. It is possible to make recordings that sound loud when played back quietly. This is a tricky illusion.
One of the main carriers of such information is the fact that objects generating a lot of sound energy are physical, and are usually recorded in a physical space. High sound pressure levels generate signature sonic characteristics in the way they interact with material objects. This sonic signature can be captured, and then played back at a lower level. The illusion works because we have an evolved auditory sense of when sound is loud, and when it is not. Quiet sounds just don't produce the same timbres as loud sounds.
(This is one reason that extreme dynamic range in a recording and playback system is not required to communicate the dynamics of a performance.)
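To make the "loud signature survives quiet playback" idea a bit more concrete, here is a rough sketch. It is only a toy: I'm assuming that a physical object driven hard behaves something like a soft saturation (a tanh curve), which is my stand-in and not a measured model of drums, rooms, or speaker boxes. The function names and the drive values are made up for illustration. It compares how much third harmonic a pure tone picks up at low drive versus high drive, and then checks that turning the "loud" capture down afterwards leaves that harmonic balance unchanged.

```python
import math

def harmonic_amplitude(samples, k):
    """Amplitude of the k-th DFT bin of a real signal (direct correlation)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def third_harmonic_ratio(drive, playback_gain=1.0, n=1024, cycles=8):
    """Third-harmonic amplitude relative to the fundamental.

    drive         -- how hard the (assumed) tanh nonlinearity is pushed
    playback_gain -- linear volume change applied after the "capture"
    """
    samples = [
        playback_gain * math.tanh(drive * math.sin(2 * math.pi * cycles * i / n))
        for i in range(n)
    ]
    return harmonic_amplitude(samples, 3 * cycles) / harmonic_amplitude(samples, cycles)

quiet = third_harmonic_ratio(drive=0.1)            # object barely excited
loud = third_harmonic_ratio(drive=3.0)             # object driven hard
loud_played_quietly = third_harmonic_ratio(drive=3.0, playback_gain=0.05)

print(f"quiet source:        {quiet:.5f}")
print(f"loud source:         {loud:.5f}")
print(f"loud, played softly: {loud_played_quietly:.5f}")
```

In this toy, the hard-driven tone carries far more third harmonic than the gently driven one, and a plain volume change afterwards does not alter the ratio. That is the sense in which the timbre can carry the "it was loud" information on its own, at least under the assumption above.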
Overall, this is a tough illusion to pull off. To help, the mixer can use the fact that the speaker itself is a resonating object: combining these sounds of different loudness characteristics in one resonating object helps create the illusion that the sounds actually coexisted at roughly the same SPL. This might not be the best approach in terms of a recording that translates well, but everyone works at the limit of possibility. Producers can get desperate.
It works like this:
Let's say you capture a drummer playing loud in a room. The resonating and reflecting objects generate the "loud" sound signature.
If you had a singer singing at a medium volume at the same time, in the same room, the drumset would drown out the singer. The singer does not generate enough SPL to make the walls reflect or the drums resonate enough for the singer's own sound signature to be captured, because the drums are overpowering it.
But if you capture the "loud" drums, then mix in electronically (or digitally) the vocal, recorded separately, to make it as loud as the drums, the illusion that the voice is as loud as the drums is enhanced by effects that cause the voice signal to modulate the drum signal, in some way.
This can be done with signal processing, but it's hard, because our ears are very sensitive to the sound signatures of physical objects creating sound. It's hard to fake!
So, as a "cheat" if you will, the mixer can rely to a degree on the fact that both the voice and the drums, if mixed to the same level, will actually modulate each other through the effects that are imparted to the physical object of the speaker.
The drum signal is encoded with sound level information that tells us approximately how loud it was. Likewise the vocals. But if these two signals resonate a speaker box, it creates a rich, complex sound signature that helps create the illusion that the voice was "right there with the drums," rattling the walls! In this example, the walls are the walls of the speaker cabinet, because turning up the whole playback system to rattle the actual walls of the room is um...a special case.
If you consider a physical acoustic instrument, it is a three dimensional object, that generates a very complex three dimensional sound in space. The full timbre of the instrument can't be captured with a single mic. (It's much easier, IMO, to get a realistic recording of a violin with more than one mic).
If you were standing in a room with a violin player, and you moved your head, you would experience a different sound, because the direct sound is coming from a different angle out of the violin. This information cannot be captured in a stereo (or mono) signal. This causes trouble for the illusion of creating an actual three dimensional object in space. The illusion is unstable. Speakers that "disappear" are vulnerable to this collapse of the illusion.
On the other hand, if you have a "boxy" speaker, it actually is a physical, three-dimensional object that exists right in the space where you are listening. If you move around the room, you will hear a different timbre of sound, because the direct sound will be different. If the box is resonating, it will color the sound more from the side than from a front listening position. Because of this, our mind can integrate the output signal in a way that does not require a narrow listening position, and is stable as the listener moves. (Because we know, instinctively, that when we move around a sound generating object, the sound will change. But our brain adjusts, so that it knows we are moving, and the speaker is standing still.)
Off the bat, I propose that this is a closer representation of the actual artistic work, if it was created in a studio, on speakers.
I haven't actually heard speakers like the open baffle designs of Linkwitz. Those might be killer, I don't know. He does discuss on his website the problem of maintaining the image illusion, and claims that his dipole speaker designs help stabilize the image.
But on speakers that have overly damped cabinets, music can tend to sound "uncanny" because the box is not completing the illusion. It sounds "fake".
On electrostatic speakers, which can generate almost a "holographic" sound image, floating in space, music created in a traditional studio environment can fall apart, with the instruments "floating" in random positions, sounding less related.
Floyd Toole in this lecture paints a picture of an ideal sound production system, as a whole, where the music producers rely on very neutral speakers to craft the product, and then the listeners listen on similar playback systems.
One can see that this might indeed be a kind of ideal, but at this point it seems only that, and cannot be relevant for legacy recordings.
One irony is that the point of view espoused by audiophile types, where speakers should represent acoustic events with some accuracy, applies to an ever decreasing share of the music business. Where it's going is crazy, but that's another topic.
From my understanding of the research Toole has done, listener preferences were the same across different genres of music. I find this surprising, but accept the results as presented.
But for the speakers that I am familiar with, those that come closer to the "ideal" measured speakers produce a very unsatisfying listening experience, for myself. This paradox is currently the subject of some obsessive personal research.