This is an article I wrote for Widescreen Review in 2015 to show the progress we had made in audio/video since the inception of that magazine. It also covers the part of my career that related to this evolution. Hope you don't hold that against it.
----
History of Audio and Video Streaming and Digital Distribution
In one of my favorite movies, Good Will Hunting, Matt Damon’s character tells a story of being on an airplane. The captain, not knowing the microphone is on, says he would love to have a cup of coffee and something else that is not fit for print. After he finishes the punch line, Robin Williams’ character, who is playing his therapist, challenges him as to whether he has ever been on an airplane. Matt’s character answers that the joke works better when he tells it in first person. So that is how this article is going to unfold. I will tell you the story of the evolution of audio and video with part of my own career intertwined in it.
The year is 1995 or so. I am working for a company called Abekas Video Systems. That name probably isn’t familiar to you unless you have worked in television broadcast or post production. If you have, you would have known them as one of the leading hardware companies in video effects and processing. They were to the video industry what Dell and HP are to the computer field. Our customers were the highest of the high-end: the network broadcasters and major post production (editing) houses. Equipment had to perform to the highest specification of fidelity, way beyond what we do in calibrating our equipment and such. The equipment was large and rack mounted, with an average retail price of $20,000 or more. Despite its massive scale and complexity, I had a dream that someday personal computing horsepower would be sufficient to replace it all. I didn’t know how or when, but thought about the possibility all the time.
The parent company of Abekas (Carlton Television, a major owner of networks in the UK) wanted it sold and we proceeded to do exactly that. Just before that happened, I heard about Netscape and the initial explosion of interest in what was becoming the “World Wide Web.” I had been fortunate to have been part of the Internet revolution back in the early 1980s, working on the UNIX (parent of Linux) operating system and one of the first Ethernet implementations, running the TCP/IP protocol. That was yesterday and this was now. The revolution was continuing with the birth of the web and browsers, but I was no longer in it. I felt a sense of regret and was looking for a way to get back into mainstream computing in general, and the web revolution in particular.
Back to the sale of Abekas, as soon as that went through, I got an offer to run engineering at another video company called Pinnacle Systems (now part of Avid). I accepted the position as their products were PC based and I thought it brought me a step closer to the dream that computers could be powering high-end video one day. Alas, Pinnacle was only a tiny step toward that as it still employed tons of hardware in the form of dense cards that plugged into the PC. The computer was just running the user interface and no more. I was looking for the computer to play the central role, not a side job.
One day I decided to connect with my old boss, the CEO of Abekas, and we got together over lunch. I ask him what he is doing and he says he is running a start-up that is doing video on the web. Video on the web? Was he kidding? Surely that was not possible. Why? Because access to the Internet at that time was over dial-up modems. Broadband was still in trials and people were dubious about the prospects of it becoming mainstream. Heck, even on dial-up we had 28 Kbits/sec modems, and “new” 56 Kbits/sec modems were just coming to market.
Let’s do a bit of math to see why I was so incredulous that anyone would attempt to push video through the web at that time. Uncompressed broadcast quality video has a resolution of 720x480. That means we have 345,600 pixels per frame of progressive video. If we wanted to send 30 frames per second, this number would balloon to 10,368,000 pixels per second. In the highest-end video encoding mode of 4:4:4, each pixel would require 24 to 30 bits (8 and 10 bits per component respectively). The common mode though is 4:2:2, which cuts the color bandwidth in half. If we assume 8 bits per component, we get 16 bits. Consumer video broadcast halves this yet again to 4:2:0, so we are now left with 8 bits for black and white (luma) and 4 bits on average per pixel for color (chroma). This means each pixel needs 12 bits to describe it. Multiplying this by our total number of pixels gives us 124,416 Kbits/second.
Traffic on the web uses the TCP/IP protocol, which adds its own layer of overhead. As a result our “28 Kbit/sec” modem speed shrinks down to 22 to 24 Kbits/second. And we are trying to do what with that? Push 124,416 Kbits/second through it??? Taking the ratio of those two, our channel has a capacity of just 0.02% of what we need. And that is forgetting about audio, which is much peskier in the amount of compression it can withstand. Yes, we can apply video compression, but this degree of shrinkage requires miracles and prayer.
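For those who want to check my arithmetic, the whole back-of-the-envelope calculation fits in a few lines of Python (the 24 Kbits/sec figure is the effective modem throughput after protocol overhead):

```python
# Sanity-check the bandwidth math: uncompressed 720x480 video at 30 fps
# with 4:2:0 sampling, versus a dial-up modem after TCP/IP overhead.

width, height = 720, 480      # broadcast-resolution frame
fps = 30                      # progressive frames per second
bits_per_pixel = 12           # 4:2:0 sampling: 8 bits luma + 4 bits chroma

pixels_per_sec = width * height * fps                  # 10,368,000 pixels/sec
video_kbps = pixels_per_sec * bits_per_pixel / 1000    # 124,416 Kbits/sec

modem_kbps = 24               # a "28K" modem after protocol overhead
ratio_pct = modem_kbps / video_kbps * 100              # channel capacity, in %

print(f"Uncompressed video: {video_kbps:,.0f} Kbits/sec")
print(f"Modem capacity: {ratio_pct:.2f}% of what we need")
```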
The “video” demo that my ex-boss showed me did not have a resolution of 720x480, or even one fourth of that at 180x120. No sir. It was a tiny little postage stamp window that updated a few frames per second here and there. To any lay person the experience was totally underwhelming. And to a broadcaster, a joke. But not for me. I stood there in awe as I watched a video of a Stanford University class lecture. The video was blurry and not much more than a slide show. But it provided an experience that was simply not possible before. Clicking on a link in a browser and instantly watching the lecture gave you a sense of being in that classroom, the poor quality notwithstanding. For me, this was it. It was my way to get back into the computer world while still utilizing my hard-earned video experience. Two months later I was there running engineering. Then 18 months later I was at Microsoft, which acquired the company.
When I got to Microsoft my first question was why they had acquired us. I mean, this technology did not remotely make money so why would a major corporation have any interest in it? The answer I got back was this: “Bill Gates thinks one day television will be software.” Software? All of it? OK, I thought some of it would be software but all of it? Oh well, I am getting paid to work on something I absolutely loved so why question it.
While video was struggling to establish roots on the Internet, the audio revolution there was in full swing. Not as a streaming solution, but for downloading and exchanging music. Any music file could be downloaded as an MP3 and played within a few minutes on your PC. Most everyone thinks the key enabler for this was the MP3 codec. But in reality it was another unsung hero. The MP3 codec, just like its MPEG-2 video counterpart, was designed for implementation in hardware. With no hardware at the time to play it, no one was going to use it to make this transformation occur.
Music compression works in the “frequency domain,” meaning we take the audio samples in time and decompose them into the fundamental frequency bands that represent them. Using the science of psychoacoustics, we take out data which has minimal impact on audibility. This is the lossy portion of compression. A lossless step then compresses these so-called “frequency coefficients” into a compact “bit stream.” To play the file, the losslessly compressed bit stream is expanded, followed by conversion back to the time domain, i.e. PCM audio samples ready to play. No, there is no exam and you don’t have to understand this detail. The key to note is that a lot of mathematical operations are required. This is why I mentioned the original expectation of the standard being a hardware implementation.
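To make those stages concrete, here is a toy sketch in Python. It is emphatically not the MP3 algorithm (real MP3 uses a polyphase filter bank, an MDCT, psychoacoustic models and Huffman coding); it only illustrates the three steps above: transform to the frequency domain, a lossy step that discards small coefficients, and a lossless step that packs what remains. The 8-sample block and the threshold are arbitrary assumptions for illustration.

```python
import json
import math
import zlib

def dct(block):
    """Naive DCT-II: time-domain samples -> frequency coefficients."""
    N = len(block)
    return [sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n, x in enumerate(block))
            for k in range(N)]

def idct(coeffs):
    """Naive inverse (DCT-III, scaled): coefficients -> time-domain samples."""
    N = len(coeffs)
    return [(coeffs[0] / 2
             + sum(c * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                   for k, c in enumerate(coeffs[1:], start=1))) * 2 / N
            for n in range(N)]

def encode(samples, threshold=1.0):
    coeffs = dct(samples)
    # Lossy step: a crude stand-in for psychoacoustics -- round the
    # coefficients and zero out the ones too small to matter audibly.
    quantized = [round(c) if abs(c) >= threshold else 0 for c in coeffs]
    # Lossless step: pack the surviving coefficients into a compact bit stream.
    return zlib.compress(json.dumps(quantized).encode())

def decode(bitstream):
    coeffs = json.loads(zlib.decompress(bitstream))  # expand the lossless layer
    return idct(coeffs)                              # back to "PCM" samples

# Round-trip a tiny block of "PCM" samples:
samples = [10.0, 20.0, 15.0, 5.0, 0.0, -5.0, -10.0, -2.0]
restored = decode(encode(samples))   # close to, but not exactly, the input
```

Notice how many multiplies and cosines even this toy version needs per block; the real codec does far more, which is why a hardware decoder was the original expectation.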
Something changed unexpectedly. Computers, namely their CPUs, suddenly became fast enough to decode MP3 at 128 Kbps. The moment this happened, we had millions of computers becoming music players. Without this having happened, there would have been no Napster, and quite possibly no iPod or iPhone!
The hardware version of MP3 did play a pivotal role. In 1998 Diamond Multimedia, which was a major supplier of computer graphics cards, decided to build a portable music player around the MP3 codec. It used flash memory, so it could only hold about a dozen tracks of music (32 Megabytes). At the heart of the player was a piece of silicon from a German company by the name of Micronas. The chip was actually developed for deployment of European digital FM broadcasting. It was repurposed by Diamond Multimedia to decode MP3s from flash memory. And with this development, a new category of consumer electronics was born: the small portable music player.
Both Micronas and Diamond eventually disappeared from the scene due to competition. One of those competitors was a little-known start-up called Portal Player. They had built a programmable single-chip music player. We worked with them while I was at Microsoft to implement our WMA audio codec on their chip. These programmable parts could handle multiple codecs, hence our work with them to add that functionality in addition to MP3.
Another unsung hero was the small hard disks being produced at the time by companies such as IBM, Hitachi and Toshiba. These were much smaller and lighter than laptop drives. Unfortunately the capacity was also sharply reduced, which heavily limited their application. They were on the way out when a remarkable thing happened. Apple chose to build a portable music player by combining the Portal Player chip and the smallest of these drives from Toshiba. Translation? The Apple iPod was born in 2001 and, with a massive marketing budget, became an overnight success.
A key aspect of Apple’s success was an exclusive on the Toshiba drive. No one else was making drives in such a small form factor and as a result, Apple succeeded in keeping competitors at bay quite effectively. This is not to undermine all the great things that Apple did in user interface and industrial design. But without that exclusive use of the Toshiba drive, it is entirely possible that competitors could have caught up with Apple, much as they have done with Android phones and tablets.
In case you are curious why we developed WMA, let me explain the motivation behind it. Just like video, one way to reduce the bandwidth of audio is to chop it down. Convert it to mono, keep bringing down the sampling rate and the bandwidth it provides, and eventually you can stuff it through a dial-up modem. Those degradations are noticeable of course. On a 56 Kbits/sec modem, you had audio fidelity that was a bit above AM: mono and muffled. The goal with the WMA codec was to produce FM quality on dial-up modems. It also halved the required storage for “CD quality” on portable music players, which used expensive flash memory. We achieved both of those goals, more or less.
One of the major challenges for delivery of audio and video on the Internet is its variable throughput. Even with high-bandwidth broadband links, I am sure you all have hit a YouTube link only to see it buffer. The Internet is a “best effort” type of channel, meaning it provides no guarantee of throughput whatsoever. Your ISP oversells its capacity, hoping that not everyone demands peak traffic at the same time. But should that happen, things get slow. Video and audio, on the other hand, are “real-time” events. You can’t pause the audio constantly or the user gets frustrated. So somehow we have to marry these two opposing systems.
At the start-up I worked at, VXtreme, we came up with a clever solution to this problem. Instead of encoding at just one bit rate, we encoded audio and video at multiple rates. The player would make an educated guess as to the starting bit rate based on past history. One of two things would then happen. If the link ran too slow, say 200 Kbits/sec versus an encoding rate of 300 Kbits/sec, the player would switch down to a lower fidelity layer at, for example, 150 Kbits/sec. The audio and video quality would go down but the stream would keep playing. Conversely, if the link speed was faster, it would select a higher fidelity layer. Both of these led to a much more satisfying experience than pausing and buffering.
To avoid oscillation, the system doesn’t instantly switch up and down. You don’t want a situation where you jump to a higher fidelity layer only to find insufficient bandwidth and have to switch back down to a lower fidelity a second or two later. This is why you may see your streaming movie service switch to a lower fidelity and stay there for a while.
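As a rough illustration of this switching logic, here is a sketch in Python. The layer ladder, headroom factor and hold-off period are made-up numbers purely for illustration; the heuristics in a real player are considerably more involved.

```python
# Multi-bitrate layer selection with hysteresis: switch down immediately
# when the link can't keep up, switch up cautiously and only after a delay.

LAYERS_KBPS = [150, 300, 500, 1000]   # bit rates the content was encoded at
HEADROOM = 1.2                        # require 20% spare capacity to step up
HOLD_OFF_SECS = 10                    # minimum time between upward switches

def pick_layer(current_kbps, measured_kbps, secs_since_switch):
    """Return the encoded layer (Kbits/sec) to stream next."""
    if measured_kbps < current_kbps:
        # Switch down right away: highest layer the link can sustain.
        fits = [r for r in LAYERS_KBPS if r <= measured_kbps]
        return fits[-1] if fits else LAYERS_KBPS[0]
    if secs_since_switch >= HOLD_OFF_SECS:
        # Switch up only after the hold-off, and only with headroom to spare.
        ups = [r for r in LAYERS_KBPS
               if current_kbps < r <= measured_kbps / HEADROOM]
        if ups:
            return ups[0]   # one notch at a time, to avoid oscillation
    return current_kbps
```

With the example from above: streaming at 300 Kbits/sec on a link measured at 200 Kbits/sec, `pick_layer(300, 200, 0)` drops to the 150 Kbits/sec layer and keeps the stream playing.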
This technology was called MBR or multi-bitrate, but its common name today is “smart” or “intelligent” streaming. It is in play whether you watch YouTube or Netflix. Yes, you can still see buffering messages. If the lowest encoded bit rate is still too high for the actual throughput you have to the server, there is no choice but to pause playback and read ahead. Once there is enough read into memory (the “buffer”), the system starts to play again. If there is still insufficient bandwidth, you will keep getting these alternating periods of pauses and playback.
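That pause-and-read-ahead behavior can be sketched the same way. The buffer threshold and the one-second time step below are illustrative assumptions, not how any real player is tuned.

```python
# Simulate a player that pauses when its buffer runs dry and resumes
# once it has read ahead enough media.

def simulate(stream_kbps, link_kbps, duration_secs, startup_buffer_secs=5):
    """Play a stream over a fixed-rate link; return the number of stalls."""
    buffered_secs = 0.0   # seconds of media currently held in the buffer
    playing = False
    stalls = 0
    for _ in range(duration_secs):
        # Each wall-clock second, the link delivers link_kbps/stream_kbps
        # seconds' worth of media into the buffer.
        buffered_secs += link_kbps / stream_kbps
        if playing:
            buffered_secs -= 1.0          # playback drains a second of media
            if buffered_secs <= 0:        # ran dry: pause and read ahead
                buffered_secs = 0.0
                playing = False
                stalls += 1
        elif buffered_secs >= startup_buffer_secs:
            playing = True                # enough buffered: (re)start playback
    return stalls
```

A 300 Kbits/sec stream over a 300 Kbits/sec link plays without a hiccup; push the same stream over a 150 Kbits/sec link and you get exactly the alternating pauses and playback described above.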
Another interesting aspect of streaming was that for years and years, playback occurred through extensions to the browser and stand-alone players. Implementation in the browser in the form of “HTML 5,” which is very common today, came much later. Audio/video on the web simply was not taken seriously for quite a long time. I remember running into my pro video colleagues and having them ask me why I was wasting my time with Internet video and wouldn’t go back to working for video companies. The tide changed when Google paid $1.65 billion for YouTube. Now it was no longer a curiosity on the side of the web. All of a sudden I am getting email notification that we are nominated for, and eventually win, a (technical) Emmy award from no less than the National Academy of Television Arts and Sciences. The industry was finally seeing Internet streaming as real.
From left to right: Will Poole, myself and Anthony Bay (my bosses during my time at Microsoft), celebrating our Emmy Award graciously granted to us by the National Academy of Television Arts and Sciences in 2006 for innovation in streaming technologies.
When I stood up to take the Emmy award at the ceremony, the story I told was what I heard when I got to Microsoft: “Bill Gates said one day television would become software.” And boy, has he been right. The entire system of video delivery, end to end, is now software.
Please excuse me now as I go to get a cup of coffee with this story and five dollars...
Amir Majidimehr is the founder of Madrona Digital (www.madronadigital.com) which specializes in custom home electronics. He started Madrona after he left Microsoft where he was the Vice President in charge of the division developing audio/video technologies. With more than 30 years in the technology industry, he brings a fresh perspective to the world of home electronics.