The Truth About HiFi Network Devices

Do you want to get to the right answer here, or are you giving up?

I’ve been in the trenches of digital media my entire career - plenty of right answers have already been given.
 
OK, you gave up. Sad.
 
I suppose the fuel that audiophools have is that, no, streamed, downloaded or really any "remote" audio is asynchronous and... it's not perfect.
My reason for listening to CDs and watching Blu-rays is mostly that I get a better compression rate, lossless compression or no compression at all.
(This no longer applies to my DVD collection, which is mostly in PAL quality with AC3 sound; streaming is usually of better or at least equal quality.)

In this day and age I can legally download FLAC audio files, while my choice of amplifier and speakers (NAD d3020v2, Elac DBR62) is aimed at CD quality.

Streaming services usually have to apply lossy compression formats. How much the different compression methods change the original data would be an interesting thing to study, but in general I watch and listen to streaming services anyway. If there is something I want in better quality, in most cases it's readily available to purchase.
 
What exactly am I giving up on?

A reaction to this:

Let's be clear: it's not about the transport of the data.
The fundamental problem is in the fixed data rate of a live radio station playing out a digital version of its program.
This rate is equal for ALL listeners.
The listeners are all listening at a slightly different rate (their local D/A clock).
Then there are two choices:
1: Sample rate conversion to bridge the difference
2: Buffer/sample management that once in a while inserts zeros or skips samples/buffers.
What's wrong with this reasoning?
 
1. You're implying that SRC slaves to a clock when there is no clock to slave it to.
2. Same as above. The local clock is the only clock in the universe from the DAC's perspective, so the error can't be known. The buffer won't overflow, as the application isn't going to fetch segments it doesn't need, and if it underflows, playback will pause until it's recovered.

Before you attempt to add underflow to your list of non-existent problems to solve, do the math on how long it'd take to consume 1s of buffer with the clock tolerances we're dealing with today.
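For anyone who wants to actually run that math, here is a minimal sketch. It assumes the playback clock runs fast by a constant error and that nothing refills the buffer in the meantime. The function name and the 20/50/100 ppm figures are illustrative assumptions, not values anyone in this thread measured.

```python
def seconds_to_drain(buffer_seconds: float, clock_error_ppm: float) -> float:
    """Wall-clock seconds until a playback clock that runs fast by
    clock_error_ppm has eaten through buffer_seconds of audio.

    Assumes a constant clock error and no refilling of the buffer.
    """
    if clock_error_ppm <= 0:
        raise ValueError("clock_error_ppm must be positive")
    # A clock that is N ppm fast gains N microseconds per second of playback.
    return buffer_seconds / (clock_error_ppm * 1e-6)


if __name__ == "__main__":
    # Illustrative tolerances only (assumed, not measured).
    for ppm in (20, 50, 100):
        hours = seconds_to_drain(1.0, ppm) / 3600
        print(f"{ppm:>3} ppm: 1 s of buffer lasts about {hours:.1f} hours")
```

Under those assumptions even a 100 ppm error needs hours to use up a single second of buffer, and in practice the player keeps fetching more data long before that.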

All of this has been explained to you multiple times.
 
OK, let's forget about 1 :)

>>2. Same as above. Local clock is the only clock in the universe from the DAC's perspective so the error can't be known. The buffer won't overflow as the application isn't going to fetch segments it doesn't need, and if it underflows playback will pause until it's recovered.

Ok, so we agree... pfff
 
No, not really. What you've been saying is quite different.
No, you say both clocks differ and at some point in time you have to insert zeros ('pause') to get things right again. That's what I said.
 
Pausing and inserting zeros in the context you claimed is quite different, but sure, you can't play what's not there. That however has nothing to do with the base claims you made which are false:

Great write-up! The only case I would like to add is broadcasting of live streams (internet radio/TV). In this case the rendering device needs to reconstruct the sample clock with some kind of (digital) PLL or do sample rate conversion. This process is well understood but could potentially be degraded by packet jitter/delays if not implemented perfectly.
With a live internet radio this is not the case, the source 'pushes' the data at a fixed rate.
 
>> With a live internet radio this is not the case, the source 'pushes' the data at a fixed rate.

So how does this work then? There is a live stream of a DJ/radio station.
At some point in time the stream is available at some sample rate and encoded into packets.
The sample rate is set at the sending side; a fixed number of samples is grouped into an audio frame and encoded per frame.
So this generates Fs / (samples per frame) frames per second.
So what's wrong with saying that this data at that point is pushed at a fixed rate?
That these frames are fetched by N listeners does not change the fact that these frames become available at that fixed rate.
What am I seeing wrong here?
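The frame-rate arithmetic in the post above can be sketched as follows. The 48 kHz / 1024-sample and 44.1 kHz / 1152-sample pairs are only example values chosen for illustration; the thread does not say which codec or frame size any particular station uses.

```python
def frames_per_second(sample_rate_hz: int, samples_per_frame: int) -> float:
    """Number of encoded frames the sending side produces each second."""
    return sample_rate_hz / samples_per_frame


def frame_duration_ms(sample_rate_hz: int, samples_per_frame: int) -> float:
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


if __name__ == "__main__":
    # Example values only (assumed for illustration).
    for fs, n in ((48_000, 1024), (44_100, 1152)):
        print(f"Fs={fs} Hz, {n} samples/frame: "
              f"{frames_per_second(fs, n):.3f} frames/s, "
              f"{frame_duration_ms(fs, n):.2f} ms per frame")
```

This only describes how fast frames become available at the encoder. It says nothing about when each listener's player chooses to fetch them.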
 
In terms of "Live" as in, a person in a room with a mic, online, with listeners.

The "Live" audio is recorded to a file. A period of time later, the HTTP service is primed to begin accepting connections at the start of the file.

So, the "Live" studio is ahead of the listeners by maybe 10 seconds, but a minute is probably more likely.

When anything goes wrong, or someone drops the F bomb on the stream, they press a button on the server and everyone's stream immediately switches to an "Alt" stream, like ads, jingles or just back to the looping playlist.

So the "live" listeners are probably receiving up to a minute of audio as a local buffer, while being served from a longer buffer.

I think one minor but missed bit of information thus far on "how all this works" might be to remind folks that the VAST majority of internet traffic is "point to point", unicast. There are extremely few "broadcast" or even "multicast" services on the internet, and those that do exist are usually things like Netflix backbone routing agreements that allow them to send wide-spanning multicast packets on IPv6 instead of sending a separate stream for every user.

In internet radio station terms... unicast. If 1000 people are listening, there are 1000 open individual HTTP connections and each of them is at a different position in the stream. The internet radio software is nothing more than an HTTP server, a playlist manager and a little bit of logic to keep the clients from running off the end of the buffer.
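To make the "each connection is at a different position" idea concrete, here is a rough sketch of a shared rolling buffer with one read cursor per listener. It is a toy model under my own assumptions, not how any real streaming server is written; `StreamBuffer`, `connect`, `read` and `start_behind` are hypothetical names.

```python
from collections import deque
from typing import Dict, Optional


class StreamBuffer:
    """Rolling window of encoded chunks shared by all listeners (toy model)."""

    def __init__(self, max_chunks: int) -> None:
        self._chunks = deque(maxlen=max_chunks)  # oldest chunk falls off the left
        self._next_seq = 0                       # sequence number of the next appended chunk
        self._cursors: Dict[str, int] = {}       # listener id -> next sequence number to send

    def _oldest_seq(self) -> int:
        return self._next_seq - len(self._chunks)

    def append(self, chunk: bytes) -> None:
        """Called by the encoder side each time a new chunk is ready."""
        self._chunks.append(chunk)
        self._next_seq += 1

    def connect(self, listener_id: str, start_behind: int) -> None:
        """Start a listener start_behind chunks behind the newest chunk."""
        start = max(self._oldest_seq(), self._next_seq - start_behind)
        self._cursors[listener_id] = start

    def read(self, listener_id: str) -> Optional[bytes]:
        """Return the listener's next chunk, or None if it has caught up."""
        # If the listener fell out of the window, skip forward to the oldest chunk.
        seq = max(self._cursors[listener_id], self._oldest_seq())
        if seq >= self._next_seq:
            return None  # nothing new yet: never run off the end of the buffer
        self._cursors[listener_id] = seq + 1
        return self._chunks[seq - self._oldest_seq()]


if __name__ == "__main__":
    buf = StreamBuffer(max_chunks=4)
    for i in range(6):
        buf.append(bytes([i]))
    buf.connect("alice", start_behind=3)
    buf.connect("bob", start_behind=1)
    print(buf.read("alice"), buf.read("bob"), buf.read("bob"))
```

Each listener simply pulls from its own position, and a `None` result is the point where a real player would wait rather than play silence.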
 
If 'live' indeed means a delay of at least 10 seconds, then these xx ppm clock differences do not create problems anymore. It would take days before such a buffer could run out :)
 
Oh. And let's be real here. On PCs, smartphones or anything else these days, a few megabytes of buffer is perfectly fine. With most services offering compressed 48 kHz/16-bit audio, you would need to have a 'really' screwed up clock to run off the end of the buffer. That is unlikely "today", because it's pretty hard for a modern PC running at 4.8 GHz to screw up a 48 kHz clock badly enough that you run out of a 1 megabyte buffer.
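As a rough feel for how much audio a 1 megabyte buffer holds, here is a small sketch. The 256 kbit/s compressed bitrate and the stereo channel count are assumptions made for the example; the post above does not give a bitrate.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def buffered_seconds(buffer_bytes: int, bitrate_bits_per_second: int) -> float:
    """Seconds of audio that fit in buffer_bytes at the given bitrate."""
    return buffer_bytes * 8 / bitrate_bits_per_second


if __name__ == "__main__":
    one_mib = 1024 * 1024
    uncompressed = pcm_bitrate(48_000, 16, 2)  # stereo is an assumption
    compressed = 256_000                       # assumed example bitrate
    print(f"uncompressed 48 kHz/16-bit stereo: {buffered_seconds(one_mib, uncompressed):.1f} s")
    print(f"compressed at 256 kbit/s:          {buffered_seconds(one_mib, compressed):.1f} s")
```

Either way the buffer holds seconds of audio, which is a lot of headroom compared with clock errors measured in parts per million.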

If you are trying to create a low-latency USB audio interface, you might have more trouble dealing with 192-byte buffers than with 1024-kilobyte buffers.
 
I'm not unsympathetic to what you are trying to get at.

I have dealt with that stuff in very real and sometimes (literally) painful ways, trying to work with low-latency, low-level DSP-style audio. 1 ms buffers sound great until your input stops, because low-level hardware (MCUs, DSPs) has a tendency to encourage the use of "circular buffers" as somehow a good idea. Then you get a 1 kHz primary harmonic of the last millisecond of audio on repeat. That hurts if the volume is up.
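A small simulation of that failure mode, as a sketch: a 1 ms ring buffer whose writer has stopped while the reader keeps looping. The 48 kHz rate and the 440 Hz tone used as "the last millisecond of input" are assumptions for the demo.

```python
import math

SAMPLE_RATE = 48_000
BUFFER_LEN = SAMPLE_RATE // 1000  # 48 samples = 1 ms at 48 kHz


def play_after_input_stops(ring, n_samples):
    """Output produced when the reader keeps looping over a ring buffer
    that nothing is writing to any more."""
    return [ring[i % len(ring)] for i in range(n_samples)]


def repeat_frequency_hz(buffer_len: int, sample_rate: int) -> float:
    """Fundamental frequency of the looped buffer contents."""
    return sample_rate / buffer_len


if __name__ == "__main__":
    # Last millisecond of input before the source went away (assumed 440 Hz tone).
    ring = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(BUFFER_LEN)]
    out = play_after_input_stops(ring, 5 * BUFFER_LEN)
    # The output is periodic with the buffer length, whatever the content was.
    periodic = all(out[i] == out[i + BUFFER_LEN] for i in range(len(out) - BUFFER_LEN))
    print("periodic:", periodic, "fundamental:", repeat_frequency_hz(BUFFER_LEN, SAMPLE_RATE), "Hz")
```

Whatever was in the buffer, the loop repeats every 48 samples, so the output is dominated by a 1 kHz fundamental and its harmonics.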

Similarly, although less painful and somewhat amusing until it gets old: having the output catch up with the input and then overtake it into the past. So for a while you get the music interleaved between "now" and "1 millisecond ago". It starts with just a crackle, then slowly progresses into some pretty hilarious "Sci-Fi" effects, especially on vocals, and eventually "phases" back out again and you catch up with the present. Kinda mind-boggling and... completely wrong. Start again.

But this is all "synchronous" audio. Trying to do async audio with 1 ms buffers... it's actually better to have only a single-sample buffer; the 1 ms just makes your problem bigger.
 
We run several "live" streams out of the station that I work for. By the time the stream has gone through encoding, ad replacement and broadcast (expletive) delay, the listeners are getting the "live" program with a delay of around 40-50 seconds. The stream/s we generate are forwarded to a service provider that handles the unicast connections.
 
When you get into those kinds of service providers, you would be amazed at the kind of high-scale hardware they can roll out for things like this. Hardware that most people wouldn't even consider exists, or that there is a need for.

Where the real magic happens in today's era is in how the big international media giants manage "inrush" and the release of large titles. For the next big series they drop an Ep. 1 of, you could have maybe 5 million people in the UK alone streaming it. Maybe 100 million worldwide isn't completely out of scope.

How do you manage to keep all of those uni-cast connections all reading the same file?

You don't. You pre-ship the high-demand items, once, to hundreds of "edge nodes" around the world. When a stream is requested, a "mirror" is selected which is closer to the client. When an edge node reaches a limit, another one is spun up; when a datacentre reaches the company's bandwidth limit, they spin an edge node up in another datacentre.
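A very rough sketch of that selection logic, purely as an illustration: pick the nearest edge node that still has room, and signal when none does. Real CDNs use far richer signals than one distance number; `EdgeNode`, `assign_stream` and the capacity figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeNode:
    name: str
    distance_km: float   # distance to the requesting client (hypothetical metric)
    capacity: int        # maximum concurrent streams this node will serve
    active: int = 0      # streams currently being served

    def has_room(self) -> bool:
        return self.active < self.capacity


def assign_stream(nodes: List[EdgeNode]) -> Optional[EdgeNode]:
    """Assign one new stream to the nearest node with spare capacity.

    Returns None when every node is full, which is the point where an
    operator would spin up another node (not modelled here).
    """
    candidates = [n for n in nodes if n.has_room()]
    if not candidates:
        return None
    best = min(candidates, key=lambda n: n.distance_km)
    best.active += 1
    return best


if __name__ == "__main__":
    nodes = [EdgeNode("london", 50, capacity=1), EdgeNode("paris", 400, capacity=2)]
    for _ in range(4):
        chosen = assign_stream(nodes)
        print(chosen.name if chosen else "all full: spin up another edge node")
```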

The same challenge, but slightly more fragile (which is why they are still fraught with tech troubles to this day), is actual LIVE multi-million viewer streams. For example, I was watching a NASA LIVE broadcast on YouTube that claimed to have over 5 million concurrent viewers. The principle is the same: "fan out". But this time it's a tree of stream forwarders. The master streams it to a multiplexer, which sends it to 100 main servers, which each stream it on to 100 localised servers.
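The fan-out numbers from that example work out like this. It is a back-of-the-envelope sketch that assumes viewers spread perfectly evenly over the leaf servers, which real traffic never does.

```python
def leaf_servers(fan_out: int, levels: int) -> int:
    """Servers at the bottom of a forwarding tree with the same fan-out at each level."""
    return fan_out ** levels


def viewers_per_leaf(viewers: int, fan_out: int, levels: int) -> float:
    """Average unicast connections per leaf server, assuming an even spread."""
    return viewers / leaf_servers(fan_out, levels)


if __name__ == "__main__":
    # 100 main servers, each feeding 100 localised servers, 5 million viewers.
    print(leaf_servers(100, 2), "leaf servers")
    print(viewers_per_leaf(5_000_000, 100, 2), "viewers per leaf on average")
```

So two levels of 100-way fan-out turn 5 million unicast connections into a few hundred per leaf server.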

These servers don't need to be "real" either. Ultimately they WILL have physical hardware assigned, but media companies can literally "spawn and spin up" a new edge node in any datacentre (they have their stuff in) in the world in seconds and destroy it again when it's not needed.

It's off topic, but those "virtualisation" layers, these days, go very, very deep. To the point that most networks beyond the home are not what they appear to be at all and are just a "logical" layer over or within something else. The 'actual' hardware has been desperately abstracted away behind "cloud" and IaaS (Infrastructure as a Service).

Without it, big media giants could not scale to the size they need for high demand and they couldn't afford to keep it all running when it's quiet.
 
That's an impressive way of working! But is it also the end of 'real' live streams?
Or will the performance of these systems evolve to support at least lower latencies, so you don't find out about the winning goal from your neighbor, who is watching on cable, cheering? :)
 
That's an ongoing significant issue for which there is no current fix when using scaled unicast. The higher the quality, the deeper the buffer, so UHD watchers saw the goal nearly 60 seconds after those watching on DTT.

Multicast is an obvious fix, but a) it's a lot more complicated than unicast, especially if you cross ISP BGP boundaries, b) ISPs are not interested in deploying it (they probably don't want the service calls).
 