
Is audio file compression just decreasing dynamic range?

Soria Moria

Senior Member
Joined
Mar 17, 2022
Messages
403
Likes
822
Location
Norway
I've been wanting to learn how audio compression specifically decreases audio quality. Every article I can find is an over-simplified audio-compression-101 piece that just says it discards unimportant data, which is vague. From listening to compressed audio myself I've come to the conclusion that it just decreases the dynamic range. Is that true? If you've ever played the game Skyrim, you might have noticed that its audio bitrate is very low, at 48 kbit/s. When people are doing nothing but speaking I barely hear a difference and it sounds okay, but when music is playing and gets loud it's extremely noticeable. Example video:
Have I understood the effects of audio compression or is there more to it?
Thanks.
 

somebodyelse

Major Contributor
Joined
Dec 5, 2018
Messages
3,745
Likes
3,032
Confusingly the same word is used to cover two entirely different things. One is to reduce dynamic range as in the compressor/limiter effect commonly used in music production - arguably overused. The other is to reduce the file size or bit rate needed to store the data (audio or video) which should not alter the dynamic range. Within that group you have lossless and lossy compression. With lossless you get out exactly what you put in - the compression is achieved by a more efficient representation of the same data. Lossy audio compression exploits knowledge of what we can hear and what we can't to reduce the amount of information needed to reproduce something that sounds close to the original, for some definition of close. Precisely how you do this is heavy on the mathematics and deeply unintuitive for most people - maybe someone knows a good explanation of the Fourier Transform?

One example of the 'what we can hear' part would be that if you have a loud sound and a quiet sound at a similar frequency, beyond a certain difference in volume we only hear the louder one. Because of that we don't need to reproduce the quieter one. Another would be that for low frequencies the left and right signals are very similar most of the time, and we can't tell the direction very easily anyway. Rather than store left and right separately we can just store one side, and maybe the much smaller difference between left and right.
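The mid/side idea in that last point can be sketched in a few lines of Python. This is a toy illustration, not any particular codec's implementation: store the sum and difference of the channels instead of left and right, and note that for near-mono material the difference channel is tiny (and so compresses well), while the round trip back to left/right is exact.

```python
def ms_encode(left, right):
    """Convert L/R sample pairs to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover the original L/R samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Nearly identical channels, as is typical for bass content
left = [0.50, 0.80, -0.30]
right = [0.48, 0.79, -0.31]
mid, side = ms_encode(left, right)
l2, r2 = ms_decode(mid, side)
# side is near zero throughout, yet l2/r2 match the originals
```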
 
OP
Soria Moria

Soria Moria

Senior Member
Joined
Mar 17, 2022
Messages
403
Likes
822
Location
Norway
Very helpful. Thanks!
 

tmtomh

Major Contributor
Forum Donor
Joined
Aug 14, 2018
Messages
2,731
Likes
7,995
@somebodyelse explained it really well - great post!

I would only add one point, and it's more of a footnote. Lossy file-size compression, for example mp3 or AAC, can technically have a very small impact on dynamic range. If the original digital source is highly compressed, having been subject to a brickwall limiter to create a "buzzcut" waveform with tons of peaks at exactly the same level, then lossy compression to mp3, AAC, and so on can somewhat randomize the levels of those peaks (because lossy compression, unlike lossless compression, alters the original data).

If you run a lossless original of such a source through a tool like the DR Meter, and then run an mp3 made from that original through the DR Meter, you are likely to find that the mp3 version actually shows as having slightly greater dynamic range than the original on one or more tracks, usually 1dB more.

This is just an artifact of the lossy compression process, and I very much doubt that the minimal increased dynamic range based on slightly altered peak levels is actually audible in most cases. But I've always found it an interesting little side-effect of lossy file-size compression when applied to dynamically compressed masterings.
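The peak-scattering effect described above can be illustrated with a toy simulation (this is not a real codec; a small random perturbation stands in for lossy coding error). A brickwall-limited signal has many samples pinned at exactly the ceiling level; after any process that slightly alters sample values, almost none of them sit at exactly that level any more.

```python
import math
import random

random.seed(0)

# A sine wave limited to +/-0.5: many samples sit exactly at the ceiling
limited = [max(-0.5, min(0.5, math.sin(2 * math.pi * i / 32)))
           for i in range(1024)]
pinned = sum(1 for x in limited if abs(x) == 0.5)

# Stand-in for lossy coding error: tiny perturbation of every sample
decoded = [x + random.uniform(-1e-3, 1e-3) for x in limited]
still_pinned = sum(1 for x in decoded if abs(x) == 0.5)
# pinned is in the hundreds; still_pinned is essentially zero
```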
 
Last edited:

MaxwellsEq

Major Contributor
Joined
Aug 18, 2020
Messages
1,721
Likes
2,572
Technology that changes dynamic range, originally based on non-linear tube designs, predates modern computers. These devices are known as compressors (or, in special cases, limiters). This type of compression makes quieter sounds louder relative to the loud ones and is useful when the dynamic range is too large for "normal" consumption, e.g. broadcasters use it so music remains comprehensible to people driving.

You can Zip a spreadsheet to make it smaller (e.g. to send it over a slow connection). When you un-Zip it, the result must be identical to the original to prevent errors. In computing this is known as "lossless file compression". You could do this to a WAV audio file to reduce its size a bit. For audio, a better lossless tool is FLAC, because it's designed around the sorts of patterns audio contains.
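The lossless round trip is easy to demonstrate with Python's standard-library zlib module (the Deflate algorithm used in Zip files): repetitive data shrinks a lot, and decompression gives back bit-identical input.

```python
import zlib

# A repetitive "spreadsheet" stands in for typical compressible data
data = b"timestamp,level\n" + b"0.00,-12.3\n" * 1000
packed = zlib.compress(data)
restored = zlib.decompress(packed)

print(len(data), len(packed))   # packed is far smaller
print(restored == data)         # True: the round trip is exact
```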

To make the file or stream even smaller, you have to throw stuff away. This would be a disaster for the spreadsheet! This is "lossy media compression". It's used in video and audio.

What do you throw away? It's actually very complicated and based on fooling your brain into believing nothing is missing. For example, the sound is broken into frequency bands. If a certain frequency is loud it masks nearby frequencies as far as your brain is concerned so you can get rid of them.
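The masking idea above can be sketched as a toy in Python. Real psychoacoustic models are far more sophisticated; this just takes a spectrum and discards any bin that is more than 40 dB (a factor of 100, an arbitrary threshold chosen for the demo) quieter than a strong bin at a nearby frequency.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for a 64-sample demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 64
# Loud tone in bin 8 plus a much quieter tone right next to it in bin 9
signal = [math.sin(2 * math.pi * 8 * t / n)
          + 0.001 * math.sin(2 * math.pi * 9 * t / n)
          for t in range(n)]
spectrum = dft(signal)
mags = [abs(c) for c in spectrum]

# Discard bins >40 dB (factor 100) below the loudest bin within +/-2 bins
kept = [c if abs(c) >= max(mags[max(0, k - 2):k + 3]) * 1e-2 else 0
        for k, c in enumerate(spectrum)]
# The loud tone in bin 8 survives; the masked tone in bin 9 is dropped
```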
 

AnalogSteph

Major Contributor
Joined
Nov 6, 2018
Messages
3,381
Likes
3,329
Location
.de
The way in which (many) audio data reduction algorithms operate has one slightly surprising side effect: They can cover a total dynamic range that's almost arbitrarily large. The algorithms will encode a signal at -100 dBFS just as diligently as the same at -20 dBFS. Now obviously, if there is one at -20 dBFS and another at -100 dBFS at the same time, it won't take much for the much weaker signal to be disregarded, but you bet that in quiet sections the algorithm will try its hardest to encode the noise floor in a faithful manner.

This means that you can basically throw 24-bit or even 32-bit samples at an MP3, AAC or ATRAC encoder, and total dynamic range will be limited only by the decoder's computational accuracy and output data format. Versus straight 16-bit input, VBR data rates will actually tend to drop slightly, as the white noise of a 16-bit noise floor is not easy to encode. By contrast, 24-bit FLACs tend to be about 50% larger.
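For comparison with the point above, the theoretical dynamic range of fixed-point PCM follows directly from the word length: each bit adds about 6.02 dB, which is where the familiar ~96 dB (16-bit) and ~144 dB (24-bit) figures come from. A lossy bitstream has no fixed sample word length, so it is not bound by these figures.

```python
import math

def pcm_dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, in dB (~6.02 dB/bit)."""
    return 20 * math.log10(2 ** bits)

print(round(pcm_dynamic_range_db(16), 1))  # 96.3
print(round(pcm_dynamic_range_db(24), 1))  # 144.5
```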
 