Bits and Bytes of Audio/Video Formats

amirm · Mar 4, 2016

This is an article I wrote in 2011. It is basic information but very important to know as it is at the heart of digital audio and video.

Understanding Audio/Video Rates
By Amir Majidimehr

Get ready to learn the most fundamental concept in audio/video, one that sadly most people either don’t understand or get badly wrong. I can’t blame them, as properly understanding data rates requires a pretty deep understanding of how these content streams are captured, encoded and transmitted. This is useful knowledge to have, as you would want to know how many songs or movies you can store on your 16 Gigabyte tablet or 4 Terabyte video server. It turns out that with some simple math and basic concepts we can learn everything we need here.

Let’s start at the top. My assumption in this article is that most of you know that there are eight bits in one byte. Okay, if you didn’t, don’t feel bad, as even the people who are supposed to know such as magazine writers and technical people in the field often confuse bits and bytes. As you can imagine, with almost an order of magnitude difference between these two terms, it is super important to get them right.

To avoid the above confusion I rarely abbreviate “b” for bit and “B” for byte, as is often done, and instead will spell them out as bits and Bytes. Since we usually deal with larger numbers of these, the prefixes Kilo, Mega, and Giga are added to represent thousands, millions, and billions, respectively.

Since these are computer concepts, the units here are not decimal but rather, binary. This means that “kilo” is actually 1024, not 1000. Throughout this article I will be using the familiar decimal versions of these numbers. The world won’t come to an end if we are off by 2.4 percent.

Let’s dive right in and talk about audio. To record audio on CDs, we have to convert the analog audio to digital. The “sampling rate” is 44.1 KHz, which means that we digitize the analog value 44,100 times per second (frequency is measured in Hz, which means one “cycle” per second; in this case, one sample per second). At the risk of sounding pedantic, CD is stereo, which comprises two independent channels. Each audio sample in turn has 16 bits of resolution, or two bytes.

If we multiply 44.1 KHz (the sampling rate) by two (number of channels), and then by 16 (bits of resolution), we get the data rate of CD in bits per second. The result is 1,411 Kbits/sec (“kbps”) or roughly 1.4 Mbit/sec (Mbps). Converting this to bytes, we get about 176 KBytes/sec.
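The multiplication above can be sketched in a few lines of Python. The function name `pcm_bit_rate` is just an illustrative choice, and decimal prefixes are used as in the rest of the article.

```python
def pcm_bit_rate(sample_rate_hz, channels, bits_per_sample):
    """Data rate of uncompressed audio in bits per second."""
    return sample_rate_hz * channels * bits_per_sample

cd_bits_per_sec = pcm_bit_rate(44_100, 2, 16)
cd_bytes_per_sec = cd_bits_per_sec / 8  # eight bits in one byte

print(f"{cd_bits_per_sec / 1000:.1f} Kbits/sec")    # 1411.2 Kbits/sec
print(f"{cd_bytes_per_sec / 1000:.1f} KBytes/sec")  # 176.4 KBytes/sec
```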

When a CD drive is used for data there are other overheads leading the industry to use 150 Kbytes/sec as the reference baseline speed for a CD drive. This has become a marketing spec for optical media where the speed of the drive is specified by a number followed by “X” with X representing this 150 Kbytes/sec. So if you see a “20X” drive, it means that it can read at 20 * 150, or 3 MBytes/sec. I said this was a marketing spec because in reality the drive speed is variable depending on the track being read so average speed across the entire media (e.g. when you rip a CD) is lower than this number.
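As a quick illustration of the “X” rating, here is a small sketch that assumes only the 150 KBytes/sec baseline quoted above. The helper name is hypothetical, and the result is the nominal marketing figure, not the average speed a real drive achieves.

```python
CD_1X_KBYTES_PER_SEC = 150  # reference "1X" speed for a CD drive

def drive_speed_mbytes_per_sec(x_rating):
    """Nominal read speed in MBytes/sec for a drive rated at x_rating 'X'."""
    return x_rating * CD_1X_KBYTES_PER_SEC / 1000

print(drive_speed_mbytes_per_sec(20))  # 3.0
```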

So far we are in the traditional A/V realm. Let’s jump into the new era. Here, we are talking about things like “128 kbps MP3.” This audio stream is exactly what it says it is. That is, the music file is represented as 128 Kbit/sec, compressed in MP3 format. This compares to our original CD source at 1.4 Mbit/sec.

To figure out how much the file is compressed, we simply divide the MP3’s data rate (128 Kbits/sec or 0.128 Mbits/sec) by the CD’s data rate (1.4 Mbits/sec) and arrive at 0.09. In other words, the MP3 file represents only 9 percent of the data of the original source. Perhaps more interesting is the inverse, which tells us that a whopping 91 percent of the original information has been thrown out and you are hearing what is left! While as audiophiles we want our music to be of higher fidelity than 128 kbps MP3, it is remarkable how much quality is preserved relative to how few bits are contained in that file.
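The same division, written out as a minimal sketch. The variable names are mine, and the CD rate is the unrounded 1,411.2 Kbits/sec figure.

```python
mp3_kbps = 128
cd_kbps = 44.1 * 2 * 16  # sampling rate (KHz) * channels * bits per sample

fraction_kept = mp3_kbps / cd_kbps
fraction_discarded = 1 - fraction_kept

print(f"kept: {fraction_kept:.0%}, discarded: {fraction_discarded:.0%}")
# kept: 9%, discarded: 91%
```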

Keep in mind that sampling rate and bit rates of compressed files are two entirely different things. A 128 Kbits/sec MP3 has the same sampling rate as the uncompressed music. Same is true of 256 Kbit/sec MP3 and 384 Kbits/sec. They are compressed versions of the same 44.1 KHz audio stream. So don’t make the common mistake of talking about the bit rate of the file as sampling rate.

Back to our original uncompressed CD, if we multiply its 176 Kbytes/sec data rate by 3,600 (seconds in one hour), we get the total space consumed for one hour of music, which is 630 MBytes (rounded to 650 MBytes to include overhead).

Now let’s apply the same math to the MP3. The 0.128 Mbit/sec must be divided by eight to convert it to bytes and then multiplied by 3,600 to get the same capacity requirement. This adds up to 57.6 MBytes/hour, showing the remarkable saving in storage capacity when using “lossy” audio compression. This is the reason that solid-state “flash”-based music players can hold so much compressed music. For a typical three-minute song, a 128 kbps MP3 would take up 2.88 MBytes of space. So if you have a 4-Gigabyte flash memory player, it can hold 4,000 / 2.88 or 1,388 songs. The same player would only hold 125 songs in the original uncompressed format that is on a CD.
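Here is a rough sketch of the storage math, assuming a constant bit rate and decimal MBytes. The helper `mbytes_for` is a made-up name for illustration. The CD song count uses the unrounded 1,411,200 bits/sec rate.

```python
def mbytes_for(bits_per_sec, seconds):
    """Storage in decimal MBytes for a stream at a constant bit rate."""
    return bits_per_sec / 8 * seconds / 1_000_000

mp3_hour = mbytes_for(128_000, 3600)     # one hour of 128 kbps MP3
mp3_song = mbytes_for(128_000, 3 * 60)   # a three-minute song
cd_song = mbytes_for(1_411_200, 3 * 60)  # the same song, uncompressed

player_mbytes = 4000  # a 4-Gigabyte player

print(f"{mp3_hour:.1f} MBytes/hour, {mp3_song:.2f} MBytes/song")
print(int(player_mbytes // mp3_song), "MP3 songs")
print(int(player_mbytes // cd_song), "uncompressed songs")
```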

Let’s put this in the context of audio for DVD and Blu-ray Disc. Here, we are using “5.1” channels of content to create a surround experience. The notation means we have five full-frequency channels, and a sixth low-bandwidth channel indicated by the “.1.” Note that we don’t really have 5.1 channels as a mathematical figure because the low-frequency channel does not equate to 10 percent of a full-bandwidth channel. But for the sake of simplifying our life, let’s pretend that it does use 10 percent as many bits to do its job and use the number 5.1 just like we used 2.0 for stereo computations.

For sampling rate of surround music, the standard in the industry is to deliver 48 KHz as opposed to 44.1 used in CDs. The sample resolution can be 16, 20, or 24 bits.

To arrive at the data rate of the uncompressed 5.1 source, and assuming 20-bit samples, we just need to multiply all of this together as we did with the CD: 5.1 (channels) * 48 (sampling rate) * 20 (bits of resolution) = 4.9 Mbits/sec. To figure it out for 16 and 24 bits, simply swap out the 20 for those numbers.

If we take the uncompressed data rate of 4.9 Mbit/sec, divide it by 8 to get bytes/sec, and then multiply by 3,600, we get a capacity requirement of 2.2 GBytes/hour. A two-hour movie would then need 4.4 Gigabytes just for the audio, or nearly half the capacity of a standard DVD!
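The surround calculation, including the swap to 16 and 24 bits, can be sketched as follows. It keeps the article’s simplification of counting the “.1” channel as a tenth of a full channel. The function names are illustrative only.

```python
def surround_rate_mbps(bits_per_sample, channels=5.1, sample_rate_khz=48):
    """Uncompressed data rate in Mbits/sec."""
    return channels * sample_rate_khz * bits_per_sample / 1000

def gbytes_per_hour(rate_mbps):
    """Storage in decimal GBytes for one hour at the given rate."""
    return rate_mbps / 8 * 3600 / 1000

for bits in (16, 20, 24):
    rate = surround_rate_mbps(bits)
    print(f"{bits}-bit: {rate:.1f} Mbits/sec, {gbytes_per_hour(rate):.1f} GBytes/hour")
```

For the 20-bit case this gives the 4.9 Mbits/sec and 2.2 GBytes/hour figures used in the text.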

Movies on DVD therefore are compressed using Dolby® Digital (AC-3) compression at a typical data rate of 448 kbps. So let’s compute the compression ratio as we did with MP3. We simply repeat the same math by dividing 0.448 (data rate of Dolby Digital) by 4.9 (data rate of the uncompressed audio) and get 0.09. So as in the case of MP3, quite a bit (91 percent) is thrown out in the process of compressing the multichannel audio. DTS® Digital Surround™, in contrast, at 1.5 Mbit/sec, would represent 30 percent of the original, or a very mild 3:1 compression ratio (although “half-rate” DTS at 750 Kbit/sec is also applying a reasonable amount of compression at 6:1).

Let’s put things in perspective in a different way. If we divide 448 Kbit/sec by 5.1 channels, we get 88 Kbits/sec allocated to each channel on average. If we had a stereo track at the same rate, it would be at 2 (channels) * 88 (data rate), or 176 Kbits/sec. So, this Dolby Digital encoding has a data rate roughly 38 percent higher than the 128 Kbits/sec MP3 example from before. While the compression techniques are different between the two formats, one can still see that the 448 kbps Dolby Digital is able to “breathe more,” as far as data rate is concerned, compared to 128 kbps MP3. So in theory, this encoding is more transparent to the source.
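The per-channel figure is a one-line division. Here is a small sketch using the same 5.1 simplification as before, with variable names of my own choosing.

```python
dolby_kbps = 448
channels = 5.1  # ".1" counted as a tenth of a full channel, as in the text

per_channel_kbps = dolby_kbps / channels
stereo_equivalent_kbps = 2 * per_channel_kbps

print(f"{per_channel_kbps:.0f} Kbits/sec per channel")         # 88
print(f"{stereo_equivalent_kbps:.0f} Kbits/sec for a stereo pair")  # 176
```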

When it comes to Blu-ray disc, we have a third option: lossless encoding. This is a process by which the audio data rate is reduced but the full fidelity is still maintained. Think of it as compressing your files on your computer and how you can get them back intact after decompression. Dolby TrueHD and DTS-HD™ Master Audio are both lossless surround audio formats supported optionally in Blu-ray Disc.

The price we pay here is that lossless compression is far less efficient than lossy techniques like MP3. Typical compression ratios are about 2:1 for music, reaching up to 3:1 for multichannel movie sound. The efficiency becomes higher with more channels and (non-intuitively) lower at higher bit resolutions (e.g. 24 bits compared to 16 bits). Using a rough figure of 2.5:1, we save a whopping 2.6 GBytes of space from our two-hour movie.
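To see where the saving comes from, here is a short sketch that takes the 4.4 GByte two-hour soundtrack from earlier and the rough 2.5:1 ratio as given. Both are the article’s estimates, not measured values.

```python
uncompressed_gbytes = 4.4  # two-hour 5.1 soundtrack, 20-bit, from the earlier estimate
lossless_ratio = 2.5       # rough lossless compression ratio

compressed_gbytes = uncompressed_gbytes / lossless_ratio
saved_gbytes = uncompressed_gbytes - compressed_gbytes

print(f"{compressed_gbytes:.2f} GBytes stored, {saved_gbytes:.2f} GBytes saved")
# 1.76 GBytes stored, 2.64 GBytes saved
```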

Okay, so maybe you are not interested in audio and want to instead learn about video. Like the CD example we first need to understand how the source is captured and encoded.

The resolution of broadcast-quality standard-definition (SD) video, which is also used for DVD, calls for 720 horizontal pixels by 486 vertical pixels, which is often rounded to 720 x 480. Multiply the two numbers and we arrive at 345,600 pixels in each frame of video. Converting this to millions and rounding, we get 0.3 million pixels per frame. Yes, what you are thinking is right: the resolution of the 6 megapixel camera in your phone is 20 times higher than SD video! But wait, you are watching that DVD image with the same resolution on a 50-inch display or an 8-foot projection screen? Hmmm.

Movies are recorded at 24 frames per second. So in every second we have 24 * 345,600 = 8,294,400 pixels. What does that mean in terms of bits and bytes? The trick is to know the sample resolution in bits. Computer users are comfortable with the concept of RGB color pixels used in PCs. Here, we have 24 bits for each pixel, with 8 bits allocated to Red, Green, and Blue sub-pixels. But that is not how we capture and transmit video.

By taking advantage of the fact that your eye is less sensitive to color resolution than to black and white, we represent pixels as a pair of data: a black and white component called “Luminance” and a color component called “Chrominance.” These are often abbreviated to Luma and Chroma. We still use a triplet notation like RGB, but here one is the Luma and the other two represent the Chroma. By way of example, if you see a notation of 4:4:4, it means that color and black and white samples have the same resolution as they do in RGB. It may sound strange to you, but this is not a very common format, even in the broadcast world. The most often used format is 4:2:2, which means that we use half as much bandwidth for color. The format used for distribution of content to consumers, whether it is over digital broadcast, optical disc (SD or HD), or the Internet/IPTV, is even lower at 4:2:0. This means that we have a quarter of the resolution for color, as compared to black and white.

With me so far? Good, because we are not done yet. We still don’t know how many bits each color or black and white sample occupies. In the computer world, we use 8 bits for each color (although higher bit depth is used for professional photography). In broadcast/professional video, we use either 10 or 8 bits, with the former being preferred. For delivery to consumers the 8-bit format is the only one used. In the case of 4:2:0 video encoding used in DVD, Blu-ray and Internet delivery then, each video sample takes 12 bits: 8 bits for Luma and an average of 4 bits for Chroma (samples in reality are 8 bits each, but since they are spaced out, they average to 4 bits).
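The averaging behind that 12-bit figure can be sketched like this. The assumption is one Luma sample per pixel plus two Chroma components, each stored at a fraction of the Luma resolution (1 for 4:4:4, 1/2 for 4:2:2, 1/4 for 4:2:0). The function name is hypothetical.

```python
def bits_per_pixel(bit_depth, chroma_fraction):
    """Average bits per pixel: one luma sample plus two subsampled chroma samples."""
    return bit_depth * (1 + 2 * chroma_fraction)

for name, fraction in [("4:4:4", 1.0), ("4:2:2", 0.5), ("4:2:0", 0.25)]:
    print(name, bits_per_pixel(8, fraction))
# 4:4:4 24.0
# 4:2:2 16.0
# 4:2:0 12.0
```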

Now we are ready to compute the data rate of our movie stream. Multiplying 8,294,400 (pixels per second of SD video) by 12 (bits of resolution) and rounding, we get 100 Mbits/sec. To put this in context, the United States ATSC digital broadcast standard for high-definition provides for 19 Mbits/sec, yet we just learned that standard-definition video in the uncompressed domain takes up more than five times as much to transmit! At the risk of stating the obvious, we are talking about much bigger numbers than audio.
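Putting the SD pieces together, here is a minimal sketch that assumes 720 x 480 frames, 24 frames per second, and 12 bits per pixel for 8-bit 4:2:0. The function name is my own.

```python
def video_bit_rate(width, height, frames_per_sec, bits_per_pixel):
    """Uncompressed video data rate in bits per second."""
    return width * height * frames_per_sec * bits_per_pixel

sd_bits_per_sec = video_bit_rate(720, 480, 24, 12)
atsc_bits_per_sec = 19_000_000  # ATSC broadcast rate quoted in the text

print(sd_bits_per_sec)  # 99532800
print(f"{sd_bits_per_sec / atsc_bits_per_sec:.1f} times the ATSC rate")  # 5.2
```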

Speaking of high-definition video, let’s compute its rate. The highest approved resolution for today’s High Definition broadcasts is 1920 x 1080, which is called “1080.” Apply the newly learned math, and we get 2,073,600 pixels per frame or about 2 million. Comparing that to SD video, we see that we have six times more resolution, so it’s a pretty big step up, but at two megapixels it still pales in comparison to even the cheapest digital still camera.

Continuing on with our math homework we arrive at 597 Mbits/sec for the total data rate of this 1080, 4:2:0 source at 24 frames/second. Converting this to bytes gets us 75 Megabytes/sec. Therefore, a two-hour movie (7,200 seconds) takes about 540 GBytes of storage when uncompressed! Now contrast this with the 9 Gigabytes available in DVD, and up to 50 Gigabytes in Blu-ray Disc, and you see that compression has to be your friend. And unfortunately lossless video compression need not apply.
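The same steps for 1080 video, as a sketch with decimal units throughout. Running the exact figures gives about 537 GBytes for two hours. Multiplying the rounded 75 MBytes/sec by 7,200 seconds gives 540 GBytes.

```python
hd_bits_per_sec = 1920 * 1080 * 24 * 12  # pixels * frames/sec * bits per pixel (4:2:0)
hd_mbytes_per_sec = hd_bits_per_sec / 8 / 1_000_000
two_hour_gbytes = hd_mbytes_per_sec * 2 * 3600 / 1000

print(f"{hd_bits_per_sec / 1_000_000:.0f} Mbits/sec")   # 597
print(f"{hd_mbytes_per_sec:.0f} MBytes/sec")            # 75
print(f"{two_hour_gbytes:.0f} GBytes for two hours")    # 537
```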

A typical movie encoded for DVD is compressed to an average rate of about 5 Mbits/sec using MPEG-2 compression. Dividing this by the source data rate of 100 Mbits/sec we get 20:1 compression or 5 percent of the source data rate is represented in the final output!

Internet delivery of content uses more advanced video compression technology, such as VC-1/WMV-9 or MPEG-4 H.264 AVC. Data rates there may be around 2 Mbits/sec for standard definition, representing 50:1 compression. In this case, an incredible 98 percent of the source is thrown away! Remarkable performance relative to what is left, don’t you think?

For high definition video on Blu-ray Disc, there is wide variation in data rate and movie sizes due to much larger disc capacity and the amount of auxiliary content that may exist on the movie title. As a decent guess let’s use an average data rate of 20 Mbits/sec. This translates to 3% of the uncompressed 1080p/24 video source (20/600) remaining.
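The three video compression examples can be compared side by side with a small sketch. The rates are the rounded figures from the text (100 and 600 Mbits/sec for the uncompressed sources), and the helper name is illustrative.

```python
def compression_summary(compressed_mbps, source_mbps):
    """Return (ratio, fraction of the source data rate that remains)."""
    return source_mbps / compressed_mbps, compressed_mbps / source_mbps

cases = [
    ("DVD, MPEG-2", 5, 100),
    ("Internet SD", 2, 100),
    ("Blu-ray 1080p/24", 20, 600),
]
for label, compressed, source in cases:
    ratio, kept = compression_summary(compressed, source)
    print(f"{label}: {ratio:.0f}:1, {kept:.0%} of the source rate remains")
```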

We are fortunate that video has so much redundancy allowing us to achieve good picture quality despite extreme levels of compression.

Now let’s look at some communication speeds in order to figure out what quality we can fit in those channels. Networking people like to use bits/second because that makes the numbers bigger than they really are. A 1.5 Mbit/sec DSL broadband Internet connection has a maximum data rate of 1.5 Mbits/sec. Recall that uncompressed SD video has a data rate of 100 Mbits/sec, or 66 times higher. This means that a two-hour movie would take 132 hours to download without compression. At the 5 Mbits/sec compressed data rate used for DVD video, the time drops like a rock to under seven hours. Using advanced video compression, the stream could be downloaded in real-time (i.e. two hours in our example) if encoded at 1.5 Mbits/sec.

A cable broadband customer getting 8 Mbit/sec can download the video encoded at DVD rate at speeds approaching twice real-time and can start playing the movie before it fully downloads because the network can run faster than the movie’s playback rate. If encoded using more efficient compression at 1 to 2 Mbit/sec, we now have four to eight times real-time performance.
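Download time is just the movie length scaled by the ratio of stream rate to link rate. Here is a sketch that assumes the link runs at its full rated speed the whole time, which real connections rarely do. The function name is hypothetical.

```python
def download_hours(stream_mbps, link_mbps, movie_hours=2):
    """Hours needed to download a movie encoded at stream_mbps over a link_mbps connection."""
    return movie_hours * stream_mbps / link_mbps

print(f"{download_hours(100, 1.5):.0f} hours")  # uncompressed SD over 1.5 Mbit/sec DSL
print(f"{download_hours(5, 1.5):.1f} hours")    # DVD-rate video over the same DSL line
print(f"{download_hours(1.5, 1.5):.1f} hours")  # real-time: stream rate equals link rate
print(f"{download_hours(5, 8):.2f} hours")      # DVD-rate video over 8 Mbit/sec cable
```

The first figure comes out slightly above 132 hours because the text rounds the 100/1.5 ratio down to 66 before multiplying.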

1080 high definition video as encoded on Blu-ray Disc at 20 Mbit/sec average would overwhelm most broadband connections today. The reduced rates on the Internet are the sobering consequence of this.

So, there you have it. Amazing what you can do with “elementary math,” no?

luft262 · Apr 16, 2021

Great post! That was fun and enlightening. I would be enthralled to see you make a video about the various audio formats (CD, MP3, AAC, FLAC) and how they impact audio quality to help consumers make better music and streaming service purchasing decisions. Thank you for your hard work!
