Everything you are saying makes perfect sense. The marketing problem here is that active circuitry is impure in the eyes of many of their customers. Imagine doing a digital active crossover? End of days!!!! And a much better, less expensive speaker.
The only thing wrong with that is that using a digital active crossover on a speaker like that will uncover all the inherent problems with that type of crossover. Having tried a number of digital crossover/equalizer DSPs myself, I found they all have issues you'll never consider until you're struggling to use them. Then, when the readings on your plot show you're close to dialed in, you play some music for the first time and it sounds awful!
There's no end to the people who praise DSPs, but from what I've gathered, most of them either use speakers that simply can't reproduce sound with any realistic clarity (so they can't hear what is being lost) or don't know what the DSP is losing in the digital domain before it's converted back to analog. I noticed a significant loss of fine detail, mostly in the quieter parts. The trailing edges of sounds were missing pieces, and once enough was removed the sound was deleted altogether by the way the A/D and D/A conversion is set up to work.
Until they take the next step up from 24/96 to a 48/192 system (imagine the price tag on that), there will be audible loss in sound. To show the loss, you need to understand how conversion works and then look at the processor used in typical DSPs like the miniDSP or Behringer rack-mount units. These tout numerous equalizers and appear to let you use them without any limitations in EQ or crossover settings (plus time-alignment delay and even analysis features).
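A rough sketch of how a quiet tail can vanish in conversion, assuming an ideal converter with plain rounding and no dither (a simplification; real A/D and D/A chips and a DSP's internal math are more complicated). The function names and the test signal are made up for illustration.

```python
import math


def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0) to the nearest step of an ideal signed converter."""
    steps = 2 ** (bits - 1)          # number of steps on each side of zero
    return round(sample * steps) / steps


def dynamic_range_db(bits):
    """Textbook dynamic range of an ideal N-bit quantizer for a full-scale sine."""
    return 6.02 * bits + 1.76


def decaying_tail(n, start_level, decay_per_sample):
    """Hypothetical fading tail: each sample is a fixed fraction of the previous one."""
    return [start_level * (decay_per_sample ** i) for i in range(n)]


def surviving_samples(tail, bits):
    """Count the samples that do not round down to digital silence."""
    return sum(1 for s in tail if quantize(s, bits) != 0.0)


if __name__ == "__main__":
    tail = decaying_tail(20, 0.001, 0.5)   # assumed test signal, not a measurement
    for bits in (16, 24):
        print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB,",
              surviving_samples(tail, bits), "samples survive")
```

With these made-up numbers the same fading tail keeps 7 nonzero samples at 16 bits and 15 at 24 bits before it rounds down to silence.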
The reality is, everything is dependent on how much memory is available and how good the processor is.
Instead of being able to pick precise EQ points and Q widths, you find you can't always do that, depending on how much memory is left after choosing crossover points (again, you can't pick an exact point; you're given choices near what you want) and slope types. You'll rarely have enough power left to do much else. There's a running tally of how much processor and memory you have used as you work with them.
Then, after a week or two of sitting there hitting tiny buttons or constantly going back to your computer to make a simple setting change (which can be done in a split second with active analog crossovers and EQs), once you have settled on "either or" and the trade-offs you've made along the way, listening to music has just lost its magic.
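To put the running tally in concrete terms, here is a back-of-the-envelope sketch. Every figure in it (clock speed, instructions per filter, overhead) is a made-up assumption for illustration, not the spec of any real unit, and the function names are hypothetical.

```python
def instructions_per_sample(clock_hz, sample_rate_hz):
    """How many processor instructions fit between two audio samples."""
    return clock_hz // sample_rate_hz


def max_biquads(clock_hz, sample_rate_hz, channels,
                instr_per_biquad, overhead_per_sample=0):
    """Rough count of second-order filter sections available per channel.

    Assumes every EQ band or crossover stage is built from biquads and that
    each biquad costs a fixed number of instructions (both are assumptions).
    """
    budget = instructions_per_sample(clock_hz, sample_rate_hz) - overhead_per_sample
    if budget <= 0:
        return 0
    return budget // (channels * instr_per_biquad)


if __name__ == "__main__":
    # Hypothetical 96 MHz DSP, 4 output channels, 10 instructions per biquad,
    # 200 instructions per sample spent on I/O, delay and metering.
    for rate in (48_000, 96_000, 192_000):
        print(rate, "Hz:", max_biquads(96_000_000, rate, 4, 10, 200),
              "biquads per channel")
```

The point of the sketch is only the trend: doubling the sample rate halves the instructions available per sample, so steeper slopes and extra EQ bands run out sooner.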
The intricate sounds of complex analog signals have been lost in the shuffle of conversion due to memory and processing power limitations.
Now I'll explain my idea of what the future of high end sound reproduction will be after DSP has been perfected.
We all know sound is actually mono at its source, and we use 2 ears to decode it so our brain can determine its location. It's a 3D, delay-based sensory function that positions the source. But sound travels in every direction as long as nothing blocks it. If you set off a firecracker suspended inside a sphere, the blast would send a wave outward in all directions (not exactly evenly, depending on how the firecracker was made, but with a perfectly made casing it would be close to even). The wave would continue in that spherical pattern until it hit a denser object and reflected in a new direction determined by the angle and density of that surface. As long as the atmosphere remains constant, the pattern continues at the pace it started until the wave has stretched to the point where it can no longer move the air enough to propagate itself, and the sound becomes quieter.
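The stretching of that spherical wave can be put in numbers with the inverse square law. This is a minimal sketch assuming an ideal point source in open air with no reflections or air absorption; the function names are just for illustration.

```python
import math


def relative_intensity(r_near, r_far):
    """Intensity at r_far as a fraction of the intensity at r_near.

    The same energy is spread over a sphere whose area grows with r squared.
    """
    return (r_near / r_far) ** 2


def level_drop_db(r_near, r_far):
    """Drop in sound pressure level, in dB, going from r_near to r_far."""
    return 20.0 * math.log10(r_far / r_near)


if __name__ == "__main__":
    for r in (2.0, 4.0, 10.0):
        print("1 m ->", r, "m:", round(level_drop_db(1.0, r), 2), "dB quieter")
```

Each doubling of distance costs about 6 dB, which is the wave "running out" as described above.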
So the future speaker must be able to put out sound in all directions, whether it hangs from the ceiling, sits on a thin pole, is held up by magnets, or whatever. It must be able to control the dispersion pattern and manipulate the length and volume of each wave constantly, as the input directs. That means multiple small drivers must cover the surface (let's say 360 1" drivers aimed evenly in all directions). Each driver has its own amp and DSP controller. Each DSP also has a mic, placed at the edges of the wall surfaces and aimed at its driver: 360 tiny mics aimed at the hanging ball.
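One way to aim 360 drivers "evenly in all directions" is a Fibonacci (golden angle) spiral on the sphere. This is only a sketch of one possible layout; it is a swapped-in technique chosen for illustration, not part of the idea above, and it gives a nearly even spread rather than a perfectly even one.

```python
import math


def fibonacci_sphere(n):
    """Return n unit vectors spread almost evenly over a sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))   # about 2.4 radians
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n      # evenly spaced heights from top to bottom
        radius = math.sqrt(1.0 - z * z)    # radius of the circle at that height
        theta = golden_angle * i           # rotate by the golden angle each step
        points.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return points


if __name__ == "__main__":
    drivers = fibonacci_sphere(360)
    print(len(drivers), "driver directions, first:", drivers[0])
```

Each returned vector is the direction one driver would face; multiplying by the ball's radius gives its position on the surface.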
When the input tells the speaker that the sound is footsteps coming from around the corner, the ball will create sound in all directions, but it will use some drivers to deaden or cancel the sound in places where reflections should not be heard. It will also direct which sound and volume each driver produces, and it will add reflections into the playback to make the sound come from any point in the room. The main limitation is that the apparent distance of the sound cannot be too far outside the room boundary. That means a lifelike recreation of a band, a play, or anything else with a fixed-size environment could be scaled to fit within the boundaries, and the scale could change as the recording mic moves relative to the original scene. Of course, for movies this becomes quite a difficult task. For a fixed mic, though, the system would be able to reproduce sounds exactly as they were originally, but scaled to fit the room, so you aren't losing any sounds; they are simply adjusted to seem as if they originally came from a space exactly the same size as your room.
Making recordings based on room size might be a way of simplifying the task, or basing the system on a set, required room size might be more realistic. The memory and processing needed will be enormous, but not impossible by any means.
It's just a matter of how important it is to us as time progresses. The cost might end up being more suited to a theater-type setup, used for movies that were recorded and based directly on this device, to ensure there are no situations where the DSP cannot function properly or fast enough to keep the 3D sound image placed properly. Moving cameras would make reproduction complex. A simple concert would be ideal for this type of technology to shine. The only step further for home use would be to have each person wear a positioning locator, so the sound adjusts as if you were actually moving within the environment. This would only be practical if the sound was accompanied by actual images projected into the space you're standing in. A wall would be shown but not felt. It would be perfect for computer games that use virtual reality and could incorporate the entire room where the sphere is positioned. There again, basing the games and the sound field on a set room size would make the reproduction much easier. Scaling would be used to adjust to changes in surroundings as you move about and enter new areas.
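As a very simplified sketch of how each driver could be told what to do to place a sound at a point, the code below gives every driver a delay and a gain based on its distance from the virtual source. It assumes free-field propagation at about 343 m/s and ignores the cancellation and added reflections described above; the function name and the example coordinates are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, roughly room-temperature air


def driver_feeds(virtual_source, driver_positions):
    """Return a (delay_seconds, gain) pair for each driver.

    Drivers farther from the virtual source play later and quieter, which
    mimics a wavefront spreading out from that point.
    """
    distances = [math.dist(virtual_source, p) for p in driver_positions]
    nearest = min(distances)
    feeds = []
    for d in distances:
        delay = (d - nearest) / SPEED_OF_SOUND
        gain = 1.0 if d == 0.0 else nearest / d   # 1/r falloff, nearest driver at full level
        feeds.append((delay, gain))
    return feeds


if __name__ == "__main__":
    # Two drivers on opposite sides of a 1 m ball, source 2 m out on the x axis.
    for delay, gain in driver_feeds((2.0, 0.0, 0.0),
                                    [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]):
        print(round(delay * 1000, 3), "ms delay, gain", round(gain, 3))
```

A real system would need far more than this, but it shows why every driver needs its own DSP channel: each one gets a different delay and level for every sound source.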
This seems to be the only realistic way that sound can be accurately reproduced. Using 2 speakers and expecting miracles has not worked. We have finally got to a point where we understand what's going on, but we are stuck on this idea that we need 2 full-range speakers to reproduce the original mono signal!? It appears that we have gone off course along the way because it's easier to fake a real sound with 2 speakers than with 1. But of course the design of the speaker enclosure has been problematic, and we continue to try to force that design to work by eliminating as many of the unwanted inaccuracies as we can.
Reflections, refraction, bounce, combing, cancellations, summing, tonal deviations, phase shift, positioning, dispersion, driver limitations, loss in path, interference, conversion, and the ability to capture the source signal accurately are just some of the problems we have been trying to eliminate. So far we have not gained much ground, other than setting up a fixed stage that positions sounds to a certain degree. I think we have evolved to see the limitations of the design we have been trying to force into what it simply cannot be.
DSP is the pathway to controlling sound and manipulating it so it can be placed at any point within a room. Stopping reflections or adding them in as needed is the next step. There will not be a need for room mode corrections using physical objects. Hard surfaces will not be a problem when DSP can cancel them out entirely. Off-axis response will be something that future people will never discuss. You can buy a home and use any room that's large enough to set the speaker up, and it can also contain the sound in that room by setting up borders and cancelling the sounds that would travel outside that room.
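For a sense of how precise that cancelling has to be, here is a small sketch of what is left over when an inverted copy of a pure tone arrives slightly late. It assumes a single sine wave and a perfectly matched level, which is a big simplification, and the function names are made up.

```python
import math


def residual_amplitude(freq_hz, timing_error_s):
    """Amplitude left after adding a unit sine to its inverted, delayed copy.

    sin(wt) - sin(w(t - dt)) has amplitude 2 * |sin(pi * f * dt)|.
    """
    return 2.0 * abs(math.sin(math.pi * freq_hz * timing_error_s))


def residual_db(freq_hz, timing_error_s):
    """Same residual expressed in dB relative to the original tone."""
    amp = residual_amplitude(freq_hz, timing_error_s)
    if amp == 0.0:
        return float("-inf")   # perfect cancellation
    return 20.0 * math.log10(amp)


if __name__ == "__main__":
    for error_us in (1, 10, 100, 500):
        print(error_us, "us late at 1 kHz:",
              round(residual_db(1000.0, error_us * 1e-6), 1), "dB")
```

With these assumptions, a 10 microsecond timing error at 1 kHz leaves a residual about 24 dB below the original tone, and an error of half a period doubles the sound instead of cancelling it.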