Thanks for the insight, Don, but that raises more questions for me, and I think my main question is still not answered (?)
What do you mean by peak-to-power ratio? I understand that dB is on a logarithmic scale and thanks to your post, I now understand that more power means less clipping. However, what do you mean by the "midrange" rather than the range as a whole?
I think though that the crux of my initial question is still not answered (unless I'm mistaken). Why do more powerful amps give a better sound even when compared at the same volume to weaker amps (e.g. a desktop amp vs. a phone)? Supposedly, in certain situations it can help add more detail to the bass or make the highs a bit more crisp, etc.
On some level, I kinda get it. Like if you have a diaphragm that is really stiff compared to one that is more pliable, more power is needed to "push" the stiffer diaphragm properly. But if you can get the same volume off a phone vs. a good amp, what contributes to the better sound? From my understanding, volume comes from the amp's ability to power the drivers. If my phone has a weaker amp, I just need to turn it up until there's enough voltage (or watts?) to reach the same volume as a more powerful amp, correct? So what makes the difference in "better sound" if all you need to do is turn up the volume? Provided there's no clipping or that kind of technical fault.
You listen at an average level, but there are peaks much louder (higher) than that. Drums in music, gunshots or explosions in movies, and so forth can have peaks much larger than the average listening level. Even a single tone has a peak value higher than its average (RMS) level; look up "crest factor".
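To put a number on the crest factor idea, here is a minimal sketch that computes peak-over-RMS in dB for two synthetic signals. The function name `crest_factor_db` and the sample count are just illustrative choices, and real music would need actual sample data rather than these idealized waveforms.

```python
import math

def crest_factor_db(samples):
    """Crest factor in dB: peak absolute value divided by the RMS value."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

# One full cycle of each waveform, 1000 points (an arbitrary illustrative count).
n = 1000
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]
square = [1.0 if k < n // 2 else -1.0 for k in range(n)]

sine_cf = crest_factor_db(sine)      # about 3 dB: peak is sqrt(2) times RMS
square_cf = crest_factor_db(square)  # 0 dB: peak equals RMS

print(f"sine:   {sine_cf:.2f} dB")
print(f"square: {square_cf:.2f} dB")
```

Music and movie soundtracks generally have much higher crest factors than a steady sine, which is why the peaks matter when sizing an amplifier.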
Midrange is typically defined as the voice band, 300 Hz to 3 kHz, the frequency range typical of the average voice: a deep male voice down to 300 Hz, a baby's cry up around 3 kHz. Our hearing peaks (is most sensitive) at around 3 kHz. Below that range is bass or low frequencies (LF), and above it is treble or high frequencies (HF).
I don't know about the rest of "better". In some cases a lower-powered amp that has greater SNR (lower noise) will sound better than a high-powered amp. As your first post said, more powerful amplifiers offer things other than higher wattage. Lower output impedance may better control the speakers. Better power supplies may sustain higher wattage for longer (yes, that is also part of "higher wattage"). If you are clipping due to the high peaks in music and movies, then a higher-power amplifier may help. But I tend to think a lot of the difference is from expectation bias; you buy a new high-power amp and expect it to sound better. Many, many times I have heard "something new" when I changed a component, only to go back to the old one and discover it was there all along; I had either forgotten or not focused on it before. Folk tend to listen more critically when something changes, with the expectation of sonic differences from that change, and that extra focus helps identify "new" things, like tighter bass or silkier highs, that were often there all along. Blind testing can be very revealing and vexingly humbling at the same time. At 1 W, a 10 W amp and a 100 W amp will sound the same most of the time, except that the 100 W amp is likely to be a little noisier (more gain is needed to generate those extra watts from the same input signal).
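The 1 W versus 10 W versus 100 W comparison can be put in dB terms with a rough sketch like the one below. It treats the rated power as the clipping point and ignores speaker impedance variation and power-supply behavior, so it is a simplification. The 6 dB and 15 dB peak figures are assumed values for illustration only, and `headroom_db` and `clips` are hypothetical helper names.

```python
import math

def headroom_db(rated_watts, average_watts):
    """Headroom in dB between the average power in use and the rated power."""
    return 10 * math.log10(rated_watts / average_watts)

def clips(rated_watts, average_watts, peak_over_average_db):
    """True if peaks this far above the average level exceed the rated power."""
    return peak_over_average_db > headroom_db(rated_watts, average_watts)

average_w = 1.0  # average listening power from the post's example

small_headroom = headroom_db(10, average_w)   # 10 dB for the 10 W amp
large_headroom = headroom_db(100, average_w)  # 20 dB for the 100 W amp

# Assumed peak levels, chosen only to show both outcomes.
modest_peaks_db = 6.0
large_peaks_db = 15.0

for rated in (10, 100):
    print(rated, "W amp:",
          "modest peaks clip" if clips(rated, average_w, modest_peaks_db) else "modest peaks fit",
          "/",
          "large peaks clip" if clips(rated, average_w, large_peaks_db) else "large peaks fit")
```

With modest peaks both amps stay clean and should sound alike; only when the peaks exceed the smaller amp's 10 dB of headroom does the extra power start to matter.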
Volume is a sneaky thing. If you just bump the volume 1 dB whilst watching a movie you'll hardly notice. If you compare two components and one is just a fraction of a dB louder, then the louder one invariably sounds "better". That is one of the great sales gimmicks; make the more expensive component just a hair louder, and it will sound better every time, whether it really is or not.
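For a sense of how small these level differences are, here is a quick sketch converting dB to voltage and power ratios. The 0.2 dB figure is just an assumed stand-in for "a fraction of a dB", and the function names are illustrative.

```python
def db_to_voltage_ratio(db):
    """Voltage (amplitude) ratio corresponding to a level change in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power ratio corresponding to a level change in dB."""
    return 10 ** (db / 10)

for level_db in (1.0, 0.2):  # 0.2 dB is an assumed "fraction of a dB"
    v = db_to_voltage_ratio(level_db)
    p = db_to_power_ratio(level_db)
    print(f"{level_db} dB -> voltage x{v:.4f}, power x{p:.4f}")
```

A 0.2 dB offset is only about 2% more voltage, which is why level matching for a fair comparison is usually done with a meter rather than by ear.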
I consider loudness curves a separate issue; read the equal-loudness curves article on Wikipedia for an intro. Yes, at most levels much more power is needed by bass (LF) signals than by the midrange, but that goes back to how much power you need to listen in your room, to your speakers, at the level you want. And, as Amir implied, the power needed also depends upon the characteristics of the speakers. It's complicated.
HTH - Don