Lossless Audio Compression
By Amir Majidimehr
This article is about the basics of lossless audio compression. Before we get into this topic, let’s first look at what uncompressed audio looks like. Here, we are talking about digital samples that are N number of bits wide, X number of channels, running at a certain sampling frequency. For CD audio, each sample is 16 bits or two bytes, stereo (two channels), and 44,100 samples/sec. Multiply all of this and you get 1.4 Megabits per second. Converted to bytes by dividing by 8, we get about 176 Kilobytes/sec. So a typical 3-minute song takes up about 32 Megabytes of storage without any form of compression. This may not seem like much, but multiply this by three to get six channels and make it a movie soundtrack at 90 minutes and you require an entire DVD just for a single audio track! So some form of compression is handy to have.
Our first tool for dealing with such large amounts of data is lossy compression. “Perceptual coding” is the technique used in lossy compression: we model the human ear and, based on its characteristics, are able to sharply reduce the file size with comparatively little loss of fidelity. For example, 128 kbps MP3/AAC/WMA represents an 11:1 compression (1.4/0.128). Inverted, only 9% of the data is kept and the rest is discarded! No matter how much you may be put off by the notion of lossy compression, you have to admire how well it operates given so little data.
I am hoping that you are already familiar with the concept of lossless compression of data. Just about any program you download from the Internet uses lossless compression to reduce its size and hence speed delivery to your computer. The most common utility that performs this function is “zip.” How does it work? Simply put, instead of having every piece of data represented as a fixed number of bits (e.g. 8 bits for each one of the characters in this article), fewer bits are allocated to the most common values (e.g. vowels in the English language) than to less often used values (e.g. the letter X in English). Since by definition there are more of the common values, representing them with fewer bits gains us a significant level of compression. Actual algorithms are more complex than this but you get the idea.
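The idea can be sketched with a tiny Huffman coder, the classic algorithm behind this kind of variable-length coding. This is a simplified illustration of the principle, not literally what zip does (zip’s DEFLATE combines Huffman coding with other techniques):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tie-breaker, node).
    # A node is either a symbol (leaf) or a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # merge the two rarest nodes;
        f2, _, right = heapq.heappop(heap)  # rare symbols sink deeper in the tree
        heapq.heappush(heap, (f1 + f2, i, (left, right)))
        i += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("lossless audio compression")
# The frequent letter 's' receives a code no longer than the rare 'd'.
```

Because each symbol’s code ends where another can begin (no code is a prefix of another), the decoder can split the bit stream back into symbols without any separators.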
Alas, zipping techniques do not work with audio. Indeed, if you attempt to zip an audio file, its size may even grow rather than shrink! The reason is that audio, at first blush, is a highly incompressible type of content. The digital samples from a single instrument represent a complex waveform. Combine multiple instruments and vocals and you have something that appears to be a totally random set of numbers with little to no redundancy to remove.
Fortunately, by applying some simple mathematics we can extract redundancy that is hidden in the samples. Take the following audio samples for example: 3, 7, 11, 15. If you feed this to the zip program, it sees that the numbers are all different and gives up on compressing them. But if we look carefully, we see that each sample is made up of the previous sample plus 4. So instead of storing four numbers, we could simply store the first one and tell the decoder to keep adding 4 to each sample to get the next one in the sequence. In this sense, we would need only two numbers: the initial number “3” and the differential of “4.” The decoder can synthesize the rest of the numbers, giving us a 2:1 compression ratio (four numbers becoming two).
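In code, this differential trick looks like the following toy sketch (real codecs work on blocks of thousands of samples, but the principle is the same):

```python
def delta_encode(samples):
    """Store the first sample, then only the change from one sample to the next."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Rebuild the original samples by accumulating the stored differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

print(delta_encode([3, 7, 11, 15]))  # [3, 4, 4, 4]
```

Notice that the output is the initial “3” followed by a run of identical “4”s, which a simple coder can then squeeze down to almost nothing.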
The above is a trivial example of what we call “Linear Predictive Coding” or LPC for short. A pretty fancy signal processing term to be sure but fortunately for us, it has its roots in simple algebra. Linear means a set of samples that follow a line. Prediction means that we use the past samples to predict the future ones. You can see how I used both of these aspects in my above example. I assumed the samples were on a line and that the only thing that separated them was an offset. If you remember your college math, this is a form of “curve fitting.” I am trying to find a curve (a line in this scenario) that matches the sequence of numbers.
Of course I cheated in my example by assuming the decoder already knew the shape of the line and that the samples kept going that way forever. Real life is much more complex than that. Numbers may follow a line more or less but not precisely. In the above example, the samples could be 3, 9, 13, and 15. In this case, the lossless compressor still pretends that they line up perfectly. But it also keeps track of the “error” from the perfect line. It would still generate the initial value “3” and increment of “4” but it also has to transmit the error that would result from following the straight line. The “residual error” in this example would be “2” for the second sample, “2” for the third, and “0” for the last.
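Here is the same idea as a sketch: predict along the line, keep the leftover errors, and reconstruct the original exactly (the function names are mine for illustration, not from any real codec):

```python
def lpc_residuals(samples, start, step):
    """Residual = actual sample minus the straight-line prediction."""
    predicted = [start + step * n for n in range(len(samples))]
    return [s - p for s, p in zip(samples, predicted)]

def lpc_reconstruct(residuals, start, step):
    """Decoder side: walk the line and add each stored residual back in."""
    return [start + step * n + r for n, r in enumerate(residuals)]

res = lpc_residuals([3, 9, 13, 15], start=3, step=4)
print(res)  # [0, 2, 2, 0]
# The round trip is bit-exact -- that is what makes it lossless.
assert lpc_reconstruct(res, start=3, step=4) == [3, 9, 13, 15]
```

The residuals are much smaller than the samples themselves, and small numbers take fewer bits to transmit.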
The residual error values must also be transmitted to the receiver efficiently. Fortunately, a technique called “Rice coding” (honest, that is what it is called!) is used to compress those error values. The reason it works is that we can show mathematically that the error values have a favorable distribution, with small values far more common than large ones, and hence they can be coded in very few bits.
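A minimal sketch of the scheme follows. The zigzag step first maps signed residuals to non-negative integers; Rice coding then writes the quotient in unary and the remainder in k binary bits. Real codecs choose the parameter k per block of samples; here it is fixed at 1 purely for illustration:

```python
def zigzag(n):
    # Interleave signed values so small magnitudes become small non-negatives:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    return 2 * n if n >= 0 else -2 * n - 1

def rice_encode(n, k):
    """Encode non-negative n: quotient in unary ('1's then a '0'), then k remainder bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")

residuals = [0, 2, 2, 0]
bits = "".join(rice_encode(zigzag(r), k=1) for r in residuals)
print(bits)  # '001100110000' -- 12 bits for four residuals
```

Four 16-bit samples would have cost 64 bits raw; the four residuals above fit in 12. Small residuals produce short codes, which is exactly the distribution LPC hands us.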
Predicting the shape of the line (i.e. the formula that tells us the approximate value of the next sample) is more complex yet. Different lossless schemes use varying techniques for arriving at what formula best represents future samples. The encoder may try multiple permutations, analyzing each iteration to see if it was more or less efficient. This makes the encoder slower but fortunately, computers have gotten so fast these days that the computational complexity is not a major concern. And outside of live broadcast, we can encode once and be done with it.
But wait, there is more! We have another powerful tool to apply in cases where there is more than one channel of audio. Listen to a typical audio track and you notice the same frequencies often coming out of both speakers (1960s Beatles music excepted). A lossless encoder can divide the spectrum into two or more bands and isolate the mid and low frequency components that tend to be shared between the two channels. After this division, it can apply different techniques to reduce the data rate, such as subtracting the common signal from both channels.
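A simple form of this channel decorrelation is mid/side coding, sketched below. This keeps the full sum so the round trip is exact; real codecs such as FLAC use a slightly different integer-exact variant, so treat this as an illustration of the idea only:

```python
def mid_side_encode(left, right):
    """Store the sum (mid) and difference (side). When the channels are
    correlated, the side signal is small and compresses very well."""
    mid  = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def mid_side_decode(mid, side):
    """Invert exactly: (mid + side) is always even, so // 2 loses nothing."""
    left  = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right

# Two nearly identical channels: the side signal hovers near zero.
L = [100, 102, 98, 101]
R = [101, 103, 97, 101]
mid, side = mid_side_encode(L, R)
print(side)  # [-1, -1, 1, 0]
```

The side channel is now a stream of tiny numbers, which the same Rice coding described above can shrink dramatically.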
Movie tracks provide even more opportunity for elimination of redundancy as the rear channels are often quiet without much sound in them. For this reason, lossless 5.1 channel codecs can achieve compression efficiencies that are much higher than stereo. For example, Dolby TrueHD, used in the Blu-ray Disc format, can achieve better than 3:1 compression in 6-channel, 16-bit/48 kHz mode, whereas the best 2-channel lossless codec can rarely exceed 2:1 compression.
Of note, while there are a number of different lossless codecs, what separates them is only a single digit difference in compression efficiency. The best may shrink an audio file to 55% of its original size while the worst leaves it at 60%. Unfortunately, that extra 5% may require more work on the part of the encoder, making it slower. Again, on a PC this is not material as computers are plenty fast at either task. So your choice is more determined by the hardware or software that supports the specific codec than by which codec compresses best (unlike lossy codecs, which do sound different from each other).
By the way, despite folklore on the Internet, lossless audio codecs do not change the sound as they are proven to mathematically reproduce the original data stream. Nothing is gained or lost. That said, playing a lossless track can sound different on a PC than the original. Why? Well, that is the topic for another article.
Last but not least, note that the data rate for a lossless audio (or video) codec is NEVER fixed. A lossy codec like MP3 or Dolby AC-3 can force the data rate to be fixed by varying quality. A lossless codec by definition cannot vary quality. So as a result, it has no choice but to let the data rate spike as it wants when it sees a complex waveform it cannot shrink. As a rule, the spikes never exceed the rate of the uncompressed stream, as the codec can simply choose to pass the original data through rather than “compress” it into a larger data set (this is how the zip problem of files growing is avoided). So if you are streaming audio around your home and are using lossless compression, you need to plan for the full data rate of the source even though you are benefitting from lossless compression in the actual amount of data transferred. For CD audio, for example, this will always be 1.4 Mbits/sec.