
Spotify and loudness normalization - how can a normalized track have higher DR than a non-normalized one?

danadam

Addicted to Fun and Learning
Joined
Jan 20, 2017
Messages
993
Likes
1,543
So Spotify provides specs of their normalisation process?

Can you provide a reference?
I edited the post to clarify: And to the extent that I'm able to. I don't have access to the masters they are using, so all I can compare is Spotify's client output with normalization on and off.

The link I already posted is the only thing I have. On that page they also say that they use ReplayGain to calculate loudness.
 

Wombat

Master Contributor
Joined
Nov 5, 2017
Messages
6,722
Likes
6,464
Location
Australia
I edited the post to clarify: And to the extent that I'm able to. I don't have access to the masters they are using, so all I can compare is Spotify's client output with normalization on and off.

The link I already posted is the only thing I have. On that page they also say that they use ReplayGain to calculate loudness.

Is their normalisation just a plain and simple level adjustment or does it have compensatory psychoacoustic/other treatment?

Just a curious question, nothing more. I would like to know more about the audio processing that Streamers employ. The commercial temptation is to one-up the competition.
 

danadam

Addicted to Fun and Learning
Joined
Jan 20, 2017
Messages
993
Likes
1,543
Is their normalisation just a plain and simple level adjustment or does it have compensatory psychoacoustic/other treatment?

Just a curious question, nothing more.
I thought I already answered this question, but ok, let's try again :)

First, I assume that we don't count the limiter engaging at -1 dBFS as "compensatory psychoacoustic/other treatment".

With that, as far as I can tell, no, they don't do any compensatory psychoacoustic/other treatment. This is my conclusion from the following process applied to a few tracks:
  • Capture a digital output of Spotify client when playing a track with normalization disabled. Let's call the resulting track "normOff".
  • Capture a digital output of Spotify client when playing the track with normalization enabled. Let's call the resulting track "normOn".
  • Align those tracks.
  • Find a suitable sample in those tracks to calculate the level difference between them. Suitable means:
    • not too quiet, in order to have good precision,
    • and not too loud, in order to avoid samples that could be affected by clipping in "normOff" or by limiting in "normOn".
  • Calculate the level difference and simply reduce the volume of the louder track by that amount using SoX and its "vol" command:
    Code:
    sox input.wav output.wav vol 0.xxx
  • Generate a null of the tracks using SoX:
    Code:
    sox -m -v 1 input1.wav -v -1 input2.wav null.wav
  • Check the level of the null and generate its spectrogram, again using SoX:
    Code:
    sox null.wav -n stats
    sox null.wav -n spectrogram -X 4 -y 513 -o null.png
From that process, if "normOff" was louder than "normOn", the null is mostly perfect*, sometimes with a more or less occasional bleep indicating the clipping in "normOff" discussed earlier. Btw, Spotify actually recommends mastering tracks below -1 dBTP (true peak) to avoid those bleeps.

On the other hand, if "normOn" was louder than "normOff", the null is sometimes perfect and sometimes not. When it is not, the difference is only in parts where the track is loud, indicating the use of the limiter in "normOn". And usually changing the normalization level to "Quiet" and repeating the process results in perfect null.

*) perfect null means RMS below -90 dBFS, peak below -80 dBFS. The output of the client is 16 bit, so those values indicate only a difference in dither that was applied.
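The whole procedure can be sketched as one small shell script. This is only a sketch: the filenames normOff.wav / normOn.wav and the 3.2 dB level difference are hypothetical stand-ins, while the sox invocations are the ones from the steps above.

```shell
#!/bin/sh
# Sketch of the null test described above. normOff.wav / normOn.wav are
# hypothetical names for the aligned captures of the Spotify client with
# normalization off and on; 3.2 dB is an assumed measured level difference.
db_diff=3.2

# sox's "vol" effect takes a linear factor, so convert dB -> amplitude.
vol=$(awk -v db="$db_diff" 'BEGIN { printf "%.4f", 10 ^ (-db / 20) }')
echo "vol factor for ${db_diff} dB: $vol"

# The sox steps themselves (skipped if sox or the captures are missing):
if command -v sox >/dev/null 2>&1 && [ -f normOff.wav ] && [ -f normOn.wav ]; then
    sox normOff.wav matched.wav vol "$vol"               # attenuate the louder capture
    sox -m -v 1 matched.wav -v -1 normOn.wav null.wav    # subtract to get the null
    sox null.wav -n stats                                # RMS/peak of the residual
    sox null.wav -n spectrogram -X 4 -y 513 -o null.png  # visualize it
fi
```

If the resulting stats show RMS below about -90 dBFS and peak below about -80 dBFS, the only remaining difference is dither, i.e. a "perfect" null in the sense used above.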

I would like to know more about the audio processing that Streamers employ. The commercial temptation is to one-up the competition.
In my case, if they want to catch up to the competition, they should use album normalization everywhere and drop the limiter: just play the album enough quieter that it does not clip.
 

Music1969

Major Contributor
Joined
Feb 19, 2018
Messages
4,674
Likes
2,849
So in this case, if you are bothered by the limiting but still want to have normalization, you should change the normalization level to "Quiet".

I've set to "Quiet" permanently now.

I think there's enough analysis now to say it's the only Spotify option that both preserves dynamic range and avoids limiting?
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
Keep in mind that the TT DR measure is a very rough guide to the dynamic range in a track. It calculates the RMS values of short sections using a sliding window and then reports the value of the track’s peak divided by the RMS value representing 80% of the maximum. It’s really just looking at the crest factor rather than the full dynamic range.

If you look at the numbers from the article, it appears that Spotify’s manipulation is effectively expanding the crest by boosting the peaks, as shown by @danadam. The Dua Lipa track goes from -6.8 LUFS in the original to -14.1 LUFS, a difference of 7.3 dB. If they were simply reducing the gain, you’d expect the peak to drop by roughly 7.3 dB to match, but it’s only 5.4 dB lower. Since the original track is clipped (true peak > 0), this is probably the result of an attempt to reconstruct the waveform in the clipped sections.

If you’re really interested in the dynamic range of a track, then the EBU Loudness Range is a better guide.
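Checking that arithmetic with the numbers quoted above (awk used here purely as a calculator):

```shell
# Values from the post: the track goes from -6.8 to -14.1 LUFS, while the
# peak only drops by 5.4 dB. A pure gain change would move both equally.
awk 'BEGIN {
    loud_drop = -6.8 - (-14.1)            # 7.3 dB of loudness reduction
    peak_drop = 5.4                       # observed peak reduction
    printf "expected peak drop: %.1f dB\n", loud_drop
    printf "peak excess vs pure gain: %.1f dB\n", loud_drop - peak_drop
}'
```

That leftover 1.9 dB of peak level is what suggests something beyond a plain gain change is happening to the peaks.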
 

Feelas

Senior Member
Joined
Nov 20, 2020
Messages
390
Likes
316
You could theoretically squeeze the quiet parts to be even quieter and the loud parts to be louder - look up expanders, which are just the inverse idea of compression. Some music might be mastered short of the theoretical limits, and Spotify might just expand it so it hits the expected peaks. Actually, expanding the dynamics could be a good idea in, e.g., Europe: DAPs there have a software-enforced limit so that, at least in theory (disregarding efficiency differences), they don't have enough power to exceed 85 dB SPL. So squeezing out the last bit of digital dynamics would make sense, because past some point you can't just "turn it up" by hand - the device acts as if it has no more juice, even if it does.

Still, why would one assume that "normalization" must only mean "dynamic compression"? Expansion is also a well-known process in pro audio. I know the bias here is against any processing, but seriously, there's more to processing than merely killing the music... And actually, underprocessed albums from the 70s are just as annoying as overly compressed ones - sometimes to the point of not having the necessary SNR to keep the noise down.
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
Someone on Gearslutz did a good investigation.

More evidence of clipping with normalization OFF.

More evidence turning normalization ON is a good idea.

https://www.gearslutz.com/board/showpost.php?p=15248911&postcount=34
The original has been clipped in mastering. Spotify is trying to reconstruct the unclipped waveform, but the original data is irretrievably lost (unless they somehow have access to the stems used for the mix). Algorithms for waveform reconstruction are inherently imperfect, but since the distortion caused by clipping is often blatantly obvious reconstruction may produce a more pleasing result.
 

Music1969

Major Contributor
Joined
Feb 19, 2018
Messages
4,674
Likes
2,849
The original has been clipped in mastering. Spotify is trying to reconstruct the unclipped waveform, but the original data is irretrievably lost (unless they somehow have access to the stems used for the mix). Algorithms for waveform reconstruction are inherently imperfect, but since the distortion caused by clipping is often blatantly obvious reconstruction may produce a more pleasing result.

Qobuz version also clips :oops:
 

hyperplanar

Senior Member
Joined
Jan 31, 2020
Messages
301
Likes
581
Location
Los Angeles
I did a bit of digging on Spotify’s output before with DigiCheck and the TotalMix level meters on my Babyface Pro and am pretty sure I have the explanation for what was seen in the OP.

So, as per usual, the source files submitted to Spotify are 16/44.1 WAVs, lossless and equivalent to CD quality. A large amount of modern music is boosted in level and limited in the mastering stage to -0.5, -0.3, -0.1, or even -0.0 dBFS, then exported to WAV. This is all well and good, until lossy encoding takes place.

Spotify takes these files and encodes them into Ogg Vorbis. In the process, the waveform of the files gets altered slightly. Now those peaks that were previously at -0.5 dBFS or whatever in the WAV could be +2.0 dBFS in the Ogg copy.

This is not a problem in itself, if Spotify’s Ogg decoder were to format its output in floating point. Any digital volume control that comes after Spotify, before the output is converted into integer, could reduce the gain, and those +2 dBFS peaks wouldn’t get chopped off.

However, the Ogg decoder in Spotify is set to output in 16-bit, and in integer format at all times. This means that when normalization is turned off, there is simply no headroom to accommodate for any peaks above 0 dBFS that arose from the Ogg encoding process, so they simply get chopped off.

When normalization is turned on, it is done inside the decoder. So the resulting output is still 16-bit integer, but the normalization (if the song is loud enough where it gets turned down, not up) gives the decoder headroom to accommodate those >0 dBFS peaks. Let’s say the normalization turns the song down by 6 dB—now those +2 dBFS peaks are represented as -4 dBFS, which is able to fit in the integer output, as well as clear the -1 dBFS limiter that Spotify puts on normalized output.
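As a worked example of the paragraph above (the +2 dBFS peak and -6 dB of normalization gain are the hypothetical numbers from the post, not measured values):

```shell
# A +2.0 dBTP decoded peak minus 6 dB of normalization gain lands at
# -4.0 dBFS, which fits in integer output and clears the -1 dBFS limiter.
awk 'BEGIN {
    peak_db = 2.0
    gain_db = -6.0
    out_db  = peak_db + gain_db
    printf "peak after gain: %.1f dBFS (%s)\n", out_db,
           (out_db <= -1.0 ? "clears the limiter" : "limiter engages")
}'
```

With normalization off, the same +2.0 dBFS peak has nowhere to go in a 16-bit integer stream and is simply chopped off at 0 dBFS.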

Comparing these two scenarios, it is evident that normalization off is chopping off those peaks generated from the encoding process, while normalization tends to let them through, so hopefully it can be seen why the normalized version has a greater apparent dynamic range as a result.

As an aside, one might wonder if having normalization off but using Spotify’s volume control would prevent this clipping. The answer is no: the volume control acts after the decoder’s output, so the data the volume control is working with has already been smashed to 16-bit integer. At least the volume control outputs in 24-bit...

Also, it’s worth considering how much resolution you lose when setting normalization to "Quiet", given that it’s done in 16-bit. If it turns down the level by 18 dB, then Spotify’s output is effectively only 13 bits at best.
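The "13 bits" figure follows from the rule of thumb that each ~6.02 dB of fixed-point attenuation discards one bit (a back-of-envelope check, not a statement about Spotify's actual pipeline):

```shell
# 18 dB of attenuation applied in 16-bit integer: 18 / 6.02 ~= 3 bits lost.
awk 'BEGIN {
    atten_db  = 18
    bits_lost = atten_db / 6.02             # 20*log10(2) ~= 6.02 dB per bit
    printf "bits lost: %.2f -> about %d effective bits\n",
           bits_lost, 16 - int(bits_lost + 0.5)
}'
```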

In my opinion, the way Spotify’s audio pipeline is set up is kind of dumb. The Ogg decoder and normalization should be outputting floating point, so at least any clipping can be avoided by turning down the volume control. It would also prevent the loss of bit resolution that currently occurs from normalization.
 

bennetng

Major Contributor
Joined
Nov 15, 2017
Messages
1,634
Likes
1,693
^ The very reason I mentioned the foobar Spotify plugin. I myself don't use Spotify so I haven't tried it.
https://hydrogenaud.io/index.php?topic=119972

More details:
https://archimago.blogspot.com/2019/06/guest-post-why-we-should-use-software.html
[Attachment: Capture.PNG]
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
Qobuz version also clips :oops:
Unfortunately a lot of the overpaid mastering engineers working in the industry are simply incompetent. There are lots of ways to goose the crest factor using EQ and multi-band compression, but that is fiddly, takes time, and involves a degree of trial-and-error. So they just slap on a hard limiter and crank the threshold ... then claim this is what the customer wants and they’re blameless.

As @hyperplanar notes, the encoding process inherently attempts to reconstruct the true peaks that have been clipped off (this is true of any frequency-domain encoder). And it may be that this is all they’re doing, rather than relying on anything more sophisticated.
 

dasdoing

Major Contributor
Joined
May 20, 2020
Messages
4,297
Likes
2,765
Location
Salvador-Bahia-Brasil
The original has been clipped in mastering. Spotify is trying to reconstruct the unclipped waveform, but the original data is irretrievably lost (unless they somehow have access to the stems used for the mix). Algorithms for waveform reconstruction are inherently imperfect, but since the distortion caused by clipping is often blatantly obvious reconstruction may produce a more pleasing result.

sorry, but where did you see that in the PDF?
 

charleski

Major Contributor
Joined
Dec 15, 2019
Messages
1,098
Likes
2,240
Location
Manchester UK
I mean the waveform reconstruction
As I explained in my first post, simply turning down the gain will scale both the integrated loudness value and the peak by the same amount. This is not the case, therefore the waveform has been reconstructed. But hyperplanar is right in pointing out that this may simply be a side-effect of the encoding process, you can see the same thing happening when encoding to aac or mp3.
 

hyperplanar

Senior Member
Joined
Jan 31, 2020
Messages
301
Likes
581
Location
Los Angeles
But hyperplanar is right in pointing out that this may simply be a side-effect of the encoding process, you can see the same thing happening when encoding to aac or mp3
I can confirm this is indeed the case: the only signal processing that occurs when normalization is enabled is a reduction in gain plus the limiter (which sounds like a compressor with instant attack and a slow release) that only touches the signal at -1 dBFS. The difference in apparent dynamic range is simply due to Spotify’s output clipping when normalization is off (which sounds quite bad and may be why some people think Spotify has bad sound quality).

None of this would be an issue if Spotify would just do floating point output. It’s a much easier ask than getting every single person and label submitting stuff to limit their files to -2 dBTP instead.
 