
Spotify and loudness normalization - how can a normalized track have higher DR than a non-normalized one?

danadam

Addicted to Fun and Learning
Joined
Jan 20, 2017
Messages
993
Likes
1,543
So Spotify provides specs of their normalisation process?

Can you provide a reference?
I edited the post to clarify: And to the extent that I'm able to. I don't have access to the masters they are using, so all I can compare is Spotify's client output with normalization on and off.

The link I already posted is the only thing I have. On that page they also say that they use ReplayGain to calculate loudness.
 

Wombat

Master Contributor
Joined
Nov 5, 2017
Messages
6,722
Likes
6,464
Location
Australia
I edited the post to clarify: And to the extent that I'm able to. I don't have access to the masters they are using, so all I can compare is Spotify's client output with normalization on and off.

The link I already posted is the only thing I have. On that page they also say that they use ReplayGain to calculate loudness.

Is their normalisation just a plain and simple level adjustment or does it have compensatory psychoacoustic/other treatment?

Just a curious question, nothing more. I would like to know more about the audio processing that Streamers employ. The commercial temptation is to one-up the competition.
 

danadam

Addicted to Fun and Learning
Joined
Jan 20, 2017
Messages
993
Likes
1,543
Is their normalisation just a plain and simple level adjustment or does it have compensatory psychoacoustic/other treatment?

Just a curious question, nothing more.
I thought I already answered this question, but ok, let's try again :)

First, I assume that we don't count the limiter engaging at -1 dBFS as "compensatory psychoacoustic/other treatment".

With that, as far as I can tell, no, they don't do any compensatory psychoacoustic/other treatment. This is my conclusion from the following process applied to a few tracks:
  • Capture a digital output of Spotify client when playing a track with normalization disabled. Let's call the resulting track "normOff".
  • Capture a digital output of Spotify client when playing the track with normalization enabled. Let's call the resulting track "normOn".
  • Align those tracks.
  • Find a suitable sample in those tracks to calculate the level difference between them. Suitable means:
    • not too quiet, in order to have good precision,
    • and not too loud, in order to avoid samples that could be affected by clipping in "normOff" or by limiting in "normOn".
  • Calculate the level difference and simply reduce the volume of the louder track by that amount using SoX and its "vol" command:
    Code:
    sox input.wav output.wav vol 0.xxx
  • Generate a null of the tracks using SoX:
    Code:
    sox -m -v 1 input1.wav -v -1 input2.wav null.wav
  • Check the level of the null and generate its spectrogram, again using SoX:
    Code:
    sox null.wav -n stats
    sox null.wav -n spectrogram -X 4 -y 513 -o null.png
From that process, if "normOff" was louder than "normOn", the null is mostly perfect*, sometimes with a more or less occasional bleep indicating the clipping in "normOff" discussed earlier. Btw, Spotify actually recommends mastering tracks below -1 dBTP (true peak) to avoid those bleeps.

On the other hand, if "normOn" was louder than "normOff", the null is sometimes perfect and sometimes not. When it is not, the difference is only in parts where the track is loud, indicating the use of the limiter in "normOn". And usually changing the normalization level to "Quiet" and repeating the process results in perfect null.

*) perfect null means RMS below -90 dBFS, peak below -80 dBFS. The output of the client is 16 bit, so those values indicate only a difference in dither that was applied.
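The whole procedure can be sketched as one small shell script. This is only a sketch: the filenames normOff.wav / normOn.wav and the 3.2 dB level difference are hypothetical stand-ins, while the sox invocations are the ones from the steps above.

```shell
#!/bin/sh
# Sketch of the null test described above. normOff.wav / normOn.wav are
# hypothetical names for the aligned captures of the Spotify client with
# normalization off and on; 3.2 dB is an assumed measured level difference.
db_diff=3.2

# sox's "vol" effect takes a linear factor, so convert dB -> amplitude.
vol=$(awk -v db="$db_diff" 'BEGIN { printf "%.4f", 10 ^ (-db / 20) }')
echo "vol factor for ${db_diff} dB: $vol"

# The sox steps themselves (skipped if sox or the captures are missing):
if command -v sox >/dev/null 2>&1 && [ -f normOff.wav ] && [ -f normOn.wav ]; then
    sox normOff.wav matched.wav vol "$vol"               # attenuate the louder capture
    sox -m -v 1 matched.wav -v -1 normOn.wav null.wav    # subtract to get the null
    sox null.wav -n stats                                # RMS/peak of the residual
    sox null.wav -n spectrogram -X 4 -y 513 -o null.png  # visualize it
fi
```

If the resulting stats show RMS below about -90 dBFS and peak below about -80 dBFS, the only remaining difference is dither, i.e. a "perfect" null in the sense used above.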

I would like to know more about the audio processing that Streamers employ. The commercial temptation is to one-up the competition.
In my case, if they want to catch up to the competition, they should use album normalization everywhere and drop the limiter: just play the album enough quieter that it does not clip.
 

Music1969

Major Contributor
Joined
Feb 19, 2018
Messages
4,674
Likes
2,849
So in this case, if you are bothered by the limiting but still want to have normalization, you should change the normalization level to "Quiet".

I've set to "Quiet" permanently now.

I think there's enough analysis now to say it's the only Spotify option that both preserves dynamic range and avoids limiting?
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
Keep in mind that the TT DR measure is a very rough guide to the dynamic range in a track. It calculates the RMS values of short sections using a sliding window and then reports the value of the track’s peak divided by the RMS value representing 80% of the maximum. It’s really just looking at the crest factor rather than the full dynamic range.

If you look at the numbers from the article, it appears that Spotify’s manipulation is effectively expanding the crest by boosting the peaks, as shown by @danadam. The Dua Lipa track goes from -6.8 LUFS in the original to -14.1 LUFS, a difference of 7.3 dB. If they were simply reducing the gain, you’d expect the peak to drop by roughly 7.3 dB to match, but it’s only 5.4 dB lower. Since the original track is clipped (true peak > 0), this is probably the result of an attempt to reconstruct the waveform in the clipped sections.

If you’re really interested in the dynamic range of a track, then the EBU Loudness Range is a better guide.
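Checking that arithmetic with the numbers quoted above (awk used here purely as a calculator):

```shell
# Values from the post: the track goes from -6.8 to -14.1 LUFS, while the
# peak only drops by 5.4 dB. A pure gain change would move both equally.
awk 'BEGIN {
    loud_drop = -6.8 - (-14.1)            # 7.3 dB of loudness reduction
    peak_drop = 5.4                       # observed peak reduction
    printf "expected peak drop: %.1f dB\n", loud_drop
    printf "peak excess vs pure gain: %.1f dB\n", loud_drop - peak_drop
}'
```

That leftover 1.9 dB of peak level is what suggests something beyond a plain gain change is happening to the peaks.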
 

Feelas

Senior Member
Joined
Nov 20, 2020
Messages
390
Likes
316
You could theoretically squeeze the quiet parts to be even quieter and the loud parts to be louder - look up expanders, which are just the inverse idea of compression. Some music might be mastered short of the theoretical limits, and Spotify might just expand it so it hits the expected peaks. Actually, expanding the dynamics could be a good idea in, e.g., Europe: DAPs there have a software-enforced limit so that, at least in theory (disregarding efficiency differences), they don't have enough power to exceed 85 dB SPL. So squeezing out the last bit of digital dynamics would make sense, because past some point you can't just "turn it up" by hand - the device acts as if it has no more juice, even if it does.

Still, why would one assume that "normalization" must only mean "dynamic compression"? Expansion is also a well-known process in pro audio. I know the bias here is against any processing, but seriously, there's more to processing than merely killing the music... And actually, underprocessed albums from the 70s are just as annoying as overly compressed ones - sometimes to the point of not having the necessary SNR to keep the noise down.
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
Someone on Gearslutz did a good investigation.

More evidence of clipping with normalization OFF.

More evidence turning normalization ON is a good idea.

https://www.gearslutz.com/board/showpost.php?p=15248911&postcount=34
The original has been clipped in mastering. Spotify is trying to reconstruct the unclipped waveform, but the original data is irretrievably lost (unless they somehow have access to the stems used for the mix). Algorithms for waveform reconstruction are inherently imperfect, but since the distortion caused by clipping is often blatantly obvious reconstruction may produce a more pleasing result.
 

Music1969

Major Contributor
Joined
Feb 19, 2018
Messages
4,674
Likes
2,849
The original has been clipped in mastering. Spotify is trying to reconstruct the unclipped waveform, but the original data is irretrievably lost (unless they somehow have access to the stems used for the mix). Algorithms for waveform reconstruction are inherently imperfect, but since the distortion caused by clipping is often blatantly obvious reconstruction may produce a more pleasing result.

Qobuz version also clips :oops:
 

hyperplanar

Senior Member
Joined
Jan 31, 2020
Messages
301
Likes
581
Location
Los Angeles
I did a bit of digging on Spotify’s output before with DigiCheck and the TotalMix level meters on my Babyface Pro and am pretty sure I have the explanation for what was seen in the OP.

So, as per usual, the source files submitted to Spotify are 16/44.1 WAVs, lossless and equivalent to CD quality. A large amount of modern music is boosted in level and limited in the mastering stage to -0.5, -0.3, -0.1, or even -0.0 dBFS, then exported to WAV. This is all well and good, until lossy encoding takes place.

Spotify takes these files and encodes them into Ogg Vorbis. In the process, the waveform of the files gets altered slightly. Now those peaks that were previously at -0.5 dBFS or whatever in the WAV could be +2.0 dBFS in the Ogg copy.

This is not a problem in itself, if Spotify’s Ogg decoder were to format its output in floating point. Any digital volume control that comes after Spotify, before the output is converted into integer, could reduce the gain, and those +2 dBFS peaks wouldn’t get chopped off.

However, the Ogg decoder in Spotify is set to output in 16-bit, and in integer format at all times. This means that when normalization is turned off, there is simply no headroom to accommodate for any peaks above 0 dBFS that arose from the Ogg encoding process, so they simply get chopped off.

When normalization is turned on, it is done inside the decoder. So the resulting output is still 16-bit integer, but the normalization (if the song is loud enough where it gets turned down, not up) gives the decoder headroom to accommodate those >0 dBFS peaks. Let’s say the normalization turns the song down by 6 dB—now those +2 dBFS peaks are represented as -4 dBFS, which is able to fit in the integer output, as well as clear the -1 dBFS limiter that Spotify puts on normalized output.
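As a worked example of the paragraph above (the +2 dBFS peak and -6 dB of normalization gain are the hypothetical numbers from the post, not measured values):

```shell
# A +2.0 dBTP decoded peak minus 6 dB of normalization gain lands at
# -4.0 dBFS, which fits in integer output and clears the -1 dBFS limiter.
awk 'BEGIN {
    peak_db = 2.0
    gain_db = -6.0
    out_db  = peak_db + gain_db
    printf "peak after gain: %.1f dBFS (%s)\n", out_db,
           (out_db <= -1.0 ? "clears the limiter" : "limiter engages")
}'
```

With normalization off, the same +2.0 dBFS peak has nowhere to go in a 16-bit integer stream and is simply chopped off at 0 dBFS.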

Comparing these two scenarios, it is evident that normalization off is chopping off those peaks generated from the encoding process, while normalization tends to let them through, so hopefully it can be seen why the normalized version has a greater apparent dynamic range as a result.

As an aside, one might wonder if having normalization off but using Spotify’s volume control would prevent this clipping. The answer is no: the volume control acts after the decoder’s output, so the data the volume control is working with has already been smashed to 16-bit integer. At least the volume control outputs in 24-bit...

Also, it’s worth considering how much resolution you lose when setting normalization to "Quiet", given that it’s done in 16-bit. If it turns down the level by 18 dB, then Spotify’s output is effectively only 13 bits at best.
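The "13 bits" figure follows from the rule of thumb that each ~6.02 dB of fixed-point attenuation discards one bit (a back-of-envelope check, not a statement about Spotify's actual pipeline):

```shell
# 18 dB of attenuation applied in 16-bit integer: 18 / 6.02 ~= 3 bits lost.
awk 'BEGIN {
    atten_db  = 18
    bits_lost = atten_db / 6.02             # 20*log10(2) ~= 6.02 dB per bit
    printf "bits lost: %.2f -> about %d effective bits\n",
           bits_lost, 16 - int(bits_lost + 0.5)
}'
```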

In my opinion, the way Spotify’s audio pipeline is set up is kind of dumb. The Ogg decoder and normalization should be outputting floating point, so at least any clipping can be avoided by turning down the volume control. It would also prevent the loss of bit resolution that currently occurs from normalization.
 

bennetng

Major Contributor
Joined
Nov 15, 2017
Messages
1,634
Likes
1,693
^ The very reason I mentioned the foobar Spotify plugin. I myself don't use Spotify so I haven't tried it.
https://hydrogenaud.io/index.php?topic=119972

More details:
https://archimago.blogspot.com/2019/06/guest-post-why-we-should-use-software.html
[Attachment: Capture.PNG]
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
Qobuz version also clips :oops:
Unfortunately a lot of the overpaid mastering engineers working in the industry are simply incompetent. There are lots of ways to goose the crest factor using EQ and multi-band compression, but that is fiddly, takes time, and involves a degree of trial-and-error. So they just slap on a hard limiter and crank the threshold ... then claim this is what the customer wants and they’re blameless.

As @hyperplanar notes, the encoding process inherently attempts to reconstruct the true peaks that have been clipped off (this is true of any frequency-domain encoder). And it may be that this is all they’re doing, rather than relying on anything more sophisticated.
 

dasdoing

Major Contributor
Joined
May 20, 2020
Messages
4,297
Likes
2,765
Location
Salvador-Bahia-Brasil
The original has been clipped in mastering. Spotify is trying to reconstruct the unclipped waveform, but the original data is irretrievably lost (unless they somehow have access to the stems used for the mix). Algorithms for waveform reconstruction are inherently imperfect, but since the distortion caused by clipping is often blatantly obvious reconstruction may produce a more pleasing result.

sorry, but where did you see that in the PDF?
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
I mean the waveform reconstruction
As I explained in my first post, simply turning down the gain will scale both the integrated loudness value and the peak by the same amount. This is not the case, therefore the waveform has been reconstructed. But hyperplanar is right in pointing out that this may simply be a side-effect of the encoding process, you can see the same thing happening when encoding to aac or mp3.
 

hyperplanar

Senior Member
Joined
Jan 31, 2020
Messages
301
Likes
581
Location
Los Angeles
But hyperplanar is right in pointing out that this may simply be a side-effect of the encoding process, you can see the same thing happening when encoding to aac or mp3
I can confirm this is indeed the case: the only signal processing that occurs when normalization is enabled is a reduction in gain plus the limiter (which sounds like a compressor with instant attack and a slow release) that only touches the signal at -1 dBFS. The difference in apparent dynamic range is simply due to Spotify’s output clipping when normalization is off (which sounds quite bad and may be why some people think Spotify has bad sound quality).

None of this would be an issue if Spotify would just do floating point output. It’s a much easier ask than getting every single person and label submitting stuff to limit their files to -2 dBTP instead.
 