I did a bit of digging on Spotify’s output before with DigiCheck and the TotalMix level meters on my Babyface Pro and am pretty sure I have the explanation for what was seen in the OP.
So, as per usual, the source files submitted to Spotify are 16/44.1 WAVs, lossless and equivalent to CD quality. A large amount of modern music is boosted in level and limited in the mastering stage to -0.5, -0.3, -0.1, or even 0.0 dBFS, then exported to WAV. This is all well and good, until lossy encoding takes place.
Spotify takes these files and encodes them into Ogg Vorbis. In the process, the waveform gets altered slightly, so peaks that were at -0.5 dBFS or whatever in the WAV could now be at +2.0 dBFS in the Ogg copy.
This would not be a problem in itself if Spotify’s Ogg decoder were to format its output in floating point. Any digital volume control that comes after Spotify, but before the output is converted to integer, could then reduce the gain, and those +2 dBFS peaks wouldn’t get chopped off.
However, the Ogg decoder in Spotify is set to output 16-bit integer at all times. This means that when normalization is turned off, there is no headroom to accommodate any peaks above 0 dBFS that arose from the Ogg encoding process, so they simply get chopped off.
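To put a number on that, here is a rough Python sketch of what happens to a single +2 dBFS sample when it is forced into 16-bit integer. The helper names and the simple round-and-clamp conversion are my own assumptions for illustration, not Spotify's actual decoder code.

```python
import math

INT16_MAX = 32767
INT16_MIN = -32768

def db_to_linear(db):
    """Convert a level in dBFS to a linear sample value (1.0 = 0 dBFS)."""
    return 10 ** (db / 20)

def linear_to_db(x):
    """Convert a positive linear sample value back to dBFS."""
    return 20 * math.log10(x)

def to_int16(sample):
    """Hypothetical float-to-int16 conversion: scale, round, hard clip."""
    scaled = round(sample * INT16_MAX)
    return max(INT16_MIN, min(INT16_MAX, scaled))

# A peak that came out of the lossy decode at +2 dBFS (about 1.259 in float).
decoded_peak = db_to_linear(2.0)

# Float output could carry this value as-is; 16-bit integer cannot.
stored = to_int16(decoded_peak)
stored_db = linear_to_db(stored / INT16_MAX)

print(f"float peak: {decoded_peak:.3f} (+2.0 dBFS)")
print(f"int16 peak: {stored} ({stored_db:.1f} dBFS)")
```

The integer sample sits at full scale, so the top 2 dB of that peak is simply gone.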
When normalization is turned on, it is done inside the decoder. So the resulting output is still 16-bit integer, but the normalization (if the song is loud enough that it gets turned down, not up) gives the decoder headroom to accommodate those >0 dBFS peaks. Let’s say the normalization turns the song down by 6 dB. Now those +2 dBFS peaks are represented as -4 dBFS, which fits in the integer output and also clears the -1 dBFS limiter that Spotify puts on normalized output.
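The arithmetic for that 6 dB example can be checked with a few lines of Python. The variable names are mine, and the -1 dBFS threshold is just the limiter figure mentioned above; this is a sketch of the numbers, not of how Spotify implements it.

```python
def db_to_linear(db):
    """Convert a level in dBFS to a linear sample value (1.0 = 0 dBFS)."""
    return 10 ** (db / 20)

INT16_MAX = 32767

peak_db = 2.0                  # peak created by the lossy encode
normalization_gain_db = -6.0   # example gain applied inside the decoder
limiter_threshold_db = -1.0    # limiter on normalized output, per the post

# Gains in dB simply add.
normalized_peak_db = peak_db + normalization_gain_db

# The same peak expressed as a 16-bit integer sample value.
normalized_sample = round(db_to_linear(normalized_peak_db) * INT16_MAX)

fits_in_int16 = normalized_sample <= INT16_MAX
below_limiter = normalized_peak_db < limiter_threshold_db

print(f"normalized peak: {normalized_peak_db:.1f} dBFS")
print(f"fits in int16: {fits_in_int16}, below limiter: {below_limiter}")
```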
Comparing these two scenarios, it is evident that normalization off chops off the peaks generated by the encoding process, while normalization on tends to let them through, so hopefully it can be seen why the normalized version has a greater apparent dynamic range as a result.
As an aside, one might wonder if having normalization off but using Spotify’s volume control would prevent this clipping. The answer is no: the volume control acts after the decoder’s output, so the data the volume control is working with has already been smashed to 16-bit integer. At least the volume control outputs in 24-bit...
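The order of operations is the whole story here, and a small Python sketch shows it. It compares "clip to 16-bit, then apply volume" (the order described above) with a hypothetical "apply volume in float, then convert" order. The function names and the -6 dB volume setting are assumptions picked for illustration.

```python
import math

INT16_MAX = 32767
INT16_MIN = -32768

def db_to_linear(db):
    return 10 ** (db / 20)

def linear_to_db(x):
    return 20 * math.log10(x)

def clip_int16(sample):
    """Hypothetical float-to-int16 conversion with hard clipping."""
    return max(INT16_MIN, min(INT16_MAX, round(sample * INT16_MAX)))

peak = db_to_linear(2.0)      # +2 dBFS peak from the decode
volume = db_to_linear(-6.0)   # volume control set to -6 dB

# Order described in the post: integer conversion first, volume afterwards.
clip_then_volume = clip_int16(peak) / INT16_MAX * volume

# Hypothetical float pipeline: volume first, integer conversion last.
volume_then_clip = clip_int16(peak * volume) / INT16_MAX

print(f"clip then volume: {linear_to_db(clip_then_volume):.2f} dBFS")
print(f"volume then clip: {linear_to_db(volume_then_clip):.2f} dBFS")
```

In the first case the peak ends up around -6 dBFS instead of around -4 dBFS, because the 2 dB that was clipped before the volume control cannot be recovered by turning things down afterwards.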
Also, it’s worth considering how much resolution you lose when setting the normalization to quiet, considering that it’s done in 16-bit. If it turns down the level by 18 dB then Spotify’s output is effectively only 13 bits at best.
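The 13-bit figure comes from the usual rule of thumb that each bit is worth about 6.02 dB. A quick Python check (the function name is made up for this post):

```python
import math

# One bit doubles the amplitude range: 20 * log10(2) is about 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)

def effective_bits(bit_depth, attenuation_db):
    """Bits of resolution left after attenuating within a fixed-point format."""
    return bit_depth - attenuation_db / DB_PER_BIT

remaining = effective_bits(16, 18.0)
print(f"{remaining:.2f} bits left after an 18 dB cut in 16-bit")
```

An 18 dB cut costs just under 3 bits, which leaves roughly 13 of the original 16.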
In my opinion, the way Spotify’s audio pipeline is set up is kind of dumb. The Ogg decoder and normalization should be outputting floating point, so at least any clipping can be avoided by turning down the volume control. It would also prevent the loss of bit resolution that currently occurs from normalization.