To answer the question in the OP seriously, it seems fairly obvious to me that the rational ratio these days is to spend everything on an active DSP-based speaker system, preferably one with digital input, an on-board volume control and basic tone controls or EQ. Then set aside a few bucks for a Google Chromecast Audio, using its digital out as the source, or for any CD or DVD player with digital output from a flea market. The question is basically whether one can find an active speaker that suits one's needs and aesthetics. This gives much more "sound per dollar" than any other solution.
Then one can take it from there, when funds, time, space and spouses allow:
- Introduce some box or preamp with room correction into the system, or buy Roon and use the Roon equalizer for room correction
- Commit to the hassle of getting multiple subwoofers to work
- Go multichannel, which requires more speakers and a processor of some kind, preferably one with a good algorithm for upmixing from stereo
All of these things will represent improvements over a straight stereo setup, but they add cost and time and take up space etc.
(On room treatment: my opinion is that domestic rooms sound fine as they are for sound reproduction, without major treatment, as long as the walls are not bare and there are some bookshelves, curtains, carpets etc. in place. My view is based both on psychoacoustic studies - the brain adapts to the room - and on my own anecdotal listening impressions over the years.)
If one insists on passive speakers - and particularly for floor standers there still aren't many actives to choose from, at least in the budget range - I would say that one should spend everything on getting the best speakers one can afford, and then just find an old AVR with a reasonable amount of power at a flea market or a garage sale.
So my suggested ratio is as follows:
Speakers - 99 percent
Everything else - 1 percent