Audio/Video distortion experiment: 1000x "I am sitting in a room"

lashto · Nov 24, 2022

Found an interesting experiment: record a video, upload to youtube and then download/upload it 1000x to see the effects of video and audio compression.

The initial experiment was done ~2010 with a webcam. Big kudos to that guy: considering the youtube speed at that time, it probably took many months.
A redux from 2018 with a better camera & improved youtube algorithms.
And the best quality trial from 2019 with an 8K camera. Still lots of artifacts after a while.
More details about the original idea (1969) & an early recording (1981) showing the very audible effects of rooms/microphones/etc.

I know that a few members did similar experiments with audio tracks going through multiple AD/DA runs. I wonder what will happen if someone does 1000x rounds with a pair of last generation ADC/DAC devices at 120+dB SINAD.
Will it really sound the same?! Or are these things still not exactly perfect/inaudible, so that the track will begin to sound 'creepy' after a while?!

MaxwellsEq · Nov 24, 2022

Video compression is a great deal more aggressive than audio compression because the brain is easier to fool with video than with sound. Film is sampled 24 times every second; CD audio is sampled 44,100 times each second!

Video inside a TV studio runs at 270 Mbit/s for Standard Definition and 3 Gbit/s for HD. But you watch it on YouTube or via your STB at a handful of Mbit/s; these are compression ratios of several hundredfold.

An uncompressed AD to DA copy in audio has NO compression, so a 1000x will have almost no impact if done properly.

kchap · Nov 24, 2022

This predates Youtube. Years ago, I heard a piece of atonal music** on the radio. The piece consisted of a human voice making a short statement that played endlessly from a speaker, to a microphone and then via a tape loop back to the speaker until it sounded like white noise. I have no idea of the composer or the title.

** Atonal music is probably the wrong term. It's the only term I could think of that came close to describing the piece.

lashto · Nov 24, 2022

MaxwellsEq said:
Video compression is a great deal more aggressive than audio compression because the brain is easier to fool with video than with sound. Film is sampled 24 times every second; CD audio is sampled 44,100 times each second!

Video inside a TV studio runs at 270 Mbit/s for Standard Definition and 3 Gbit/s for HD. But you watch it on YouTube or via your STB at a handful of Mbit/s; these are compression ratios of several hundredfold.

video and audio compression are surely way "more aggressive" than AD/DA. And nobody can say exactly what youtube does .. or did in 2010. The improvements between 2010-2019 are easily visible/audible though.

MaxwellsEq said:
An uncompressed AD to DA copy in audio has NO compression, so a 1000x will have almost no impact if done properly.

IIRC, the previous AD/DA experiments done by ASR members were deemed 'inaudible' after 5-10 runs. No idea which ADC/DACs were used but probably not the latest and greatest. However, even 2010 youtube still looks very good after 5-10 runs so those 'short' AD/DA trials are not very relevant.

Your "almost no impact" might be a good guess for the latest AD/DA devices. But it's just a guess and we still have an open question until somebody does the 1000x rounds.

Killingbeans · Nov 24, 2022

lashto said:
I wonder what will happen if someone does 1000x rounds with a pair of last generation ADC/DAC devices at 120+dB SINAD.
Will it really sound the same?! Or are these things still not exactly perfect/inaudible, so that the track will begin to sound 'creepy' after a while?!

I'd imagine that the noise floor slowly creeping upwards would be the first thing to become noticeable. It would possibly overpower the music long before any distortion products had a chance to become dominant. Assuming you keep going into the 1000000x times.

A better analogy to the YouTube experiment would be to do FLAC -> MP3 -> FLAC 1000x times.

lashto · Nov 24, 2022

Killingbeans said:
I'd imagine that the noise floor slowly creeping upwards would be the first thing to become noticeable. It would possibly overpower the music long before any distortion products had a chance to become dominant. Assuming you keep going into the 1000000x times.

hm, you might be right about the noise floor being the first to "creep upwards". But the HD is still there, I will not write it off so easily. And it will probably depend a lot more on the ADC/DAC choices. To be seen, I guess.

Killingbeans said:
A better analogy to the YouTube experiment would be to do FLAC -> MP3 -> FLAC 1000x times.

The FLAC/MP3 cycle is an almost perfect analogy and it could also be an interesting experiment. Way easier/faster, too. But not sure about the practical value of that, to me it looks like just another variation of the original "distortion art" experiment.

I'd be much more interested in a test that shows if those latest and greatest ADCs/DACs are really 1000x 'perfect'

Killingbeans · Nov 24, 2022

lashto said:
hm, you might be right about the noise floor being the first to "creep upwards". But the HD is still there, I will not write it off so easily. And it will probably depend a lot more on the ADC/DAC choices. To be seen, I guess.

The thing is that distortion doesn't "stack" the way normal intuition tells you. At some point the contribution of the subsequent conversions will become practically non-existent.

But yes, it would be an interesting experiment.

lashto · Nov 24, 2022

kchap said:
This predates Youtube. Years ago, I heard a piece of atonal music** on the radio. The piece consisted of a human voice making a short statement that played endlessly from a speaker, to a microphone and then via a tape loop back to the speaker until it sounded like white noise. I have no idea of the composer or the title.

If you look at the OP, I updated the links to the 'original' with the dates/years. And yes, it was way before youtube. I assume many similar experiments were done in the meantime, just post some links if you find interesting variations

incoherent · Nov 24, 2022

lashto said:
hm, you might be right about the noise floor being the first to "creep upwards". But the HD is still there, I will not write it off so easily. And it will probably depend a lot more on the ADC/DAC choices. To be seen, I guess.

The FLAC/MP3 cycle is an almost perfect analogy and it could also be an interesting experiment. Way easier/faster, too. But not sure about the practical value of that, to me it looks like just another variation of the original "distortion art" experiment.

I'd be much more interested in a test that shows if those latest and greatest ADCs/DACs are really 1000x 'perfect'

I thought this FLAC -> MP3 -> FLAC idea was an interesting question, so I analyzed it in MATLAB today. Here was my approach:

I used a source FLAC of the song Hallelujah performed by Ryan Adams from https://archive.org/details/ryanadams2006-10-17.sbd.flac16

This was read into MATLAB, converted to an MP4 with AAC encoding at 192 kbit/s, and written to disk. I used this encoding because I could not find a built-in MP3 writer for MATLAB. This file was read back into MATLAB and rewritten to disk as a 24-bit FLAC file. This would be one "iteration" of conversion. I repeated this 100 times.

The power- and cross-spectral densities are shown in the figure "specDensTrim.png". The densities were estimated using half-second records with 50% overlap and a Hann window. The original FLAC retains its power as the frequency approaches 20 kHz, while the AAC encoding reduces it. It is strange to me that the AAC-encoded MP4 has added power below 10 Hz. Not sure what's going on there.

Given these spectral densities, it is possible to calculate the coherence between the signals to get a quantitative measure of their similarity. The coherence is shown in "coh100Trim.png" for the original FLAC and 100 iterations of encoding. I also attached "cohComparisonLogTrim.png" comparing the coherence as the number of encoding iterations varies between 1, 10, and 100. The change from 0->1 is large. The change from 1->10 is smaller. The change from 10->100 is smaller still. I think this supports Killingbeans' idea that:

The thing is that distortion doesn't "stack" the way normal intuition tells you. At some point the contribution of the subsequent conversions will become practically non-existent.
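
The coherence estimate described in this post can be sketched in plain Python. This is a minimal sketch and not the MATLAB code behind the attached figures: the function names, the 64-sample segment length and the synthetic noise signals are all assumptions picked for illustration. It follows the same recipe (Hann window, 50% overlap, averaged spectra) and spells out the estimator with a naive DFT; library routines such as `scipy.signal.coherence` do the same job far faster.

```python
import cmath
import math
import random


def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    # Naive O(n^2) DFT returning bins 0..n/2; fine for short frames.
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, nperseg=64):
    # Welch-style magnitude-squared coherence:
    # |Pxy|^2 / (Pxx * Pyy), averaged over 50%-overlapping Hann segments.
    win = hann(nperseg)
    step = nperseg // 2
    nbins = nperseg // 2 + 1
    pxx = [0.0] * nbins
    pyy = [0.0] * nbins
    pxy = [0j] * nbins
    for start in range(0, min(len(x), len(y)) - nperseg + 1, step):
        fx = dft([x[start + i] * win[i] for i in range(nperseg)])
        fy = dft([y[start + i] * win[i] for i in range(nperseg)])
        for k in range(nbins):
            pxx[k] += abs(fx[k]) ** 2
            pyy[k] += abs(fy[k]) ** 2
            pxy[k] += fx[k] * fy[k].conjugate()
    return [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) for k in range(nbins)]


def mean(values):
    return sum(values) / len(values)


# Synthetic stand-ins (assumptions): white noise as the "original",
# the same noise plus an independent error as the "encoded copy".
rng = random.Random(0)
original = [rng.gauss(0.0, 1.0) for _ in range(2048)]
degraded = [s + rng.gauss(0.0, 0.5) for s in original]
unrelated = [rng.gauss(0.0, 1.0) for _ in range(2048)]

coh_same = coherence(original, original)
coh_degraded = coherence(original, degraded)
coh_unrelated = coherence(original, unrelated)

print(f"mean coherence, identical signals: {mean(coh_same):.3f}")
print(f"mean coherence, degraded copy:     {mean(coh_degraded):.3f}")
print(f"mean coherence, unrelated signals: {mean(coh_unrelated):.3f}")
```

A coherence near 1 at a given frequency means the two signals are linearly related there; values falling toward 0 indicate that the copy contains content the original cannot explain.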

MaxwellsEq · Nov 24, 2022

MaxwellsEq said:
Video compression is a great deal more aggressive than audio compression because the brain is easier to fool with video than with sound. Film is sampled 24 times every second; CD audio is sampled 44,100 times each second!

Video inside a TV studio runs at 270 Mbit/s for Standard Definition and 3 Gbit/s for HD. But you watch it on YouTube or via your STB at a handful of Mbit/s; these are compression ratios of several hundredfold.

lashto said:
video and audio compression are surely way "more aggressive" than AD/DA. And nobody can say exactly what youtube does .. or did in 2010. The improvements between 2010-2019 are easily visible/audible though.

I have excellent technical knowledge of what YouTube was doing in 2010 and now. I have the advantage of being an engineer in both broadcasting and streaming worlds.

Please don't make this into a mystery, it's not. It's engineering, and engineers at YouTube, Netflix etc. frequently publish technical articles on how they do things and share this knowledge with their peers. There are only so many technologies available and only so many ways to build the engineering.

I will unpack the numbers.

1. Video
10 bits needed for each colour R G B = 30 bits per pixel.
HD has 1920 x 1080 for a frame = 2,073,600 pixels multiplied by 30 = 62,208,000 bits for a frame
For Progressive 50 frames per second = 50 x 62,208,000 = 3,110,400,000 bits every second
This is 3.11Gbit/s for uncompressed HD video. UHD is c 12Gbit/s!

Over the last few years, video from YouTube, Netflix et al. has used manifests to offer different qualities based on what your device and network can sustain. As a rule of thumb, assume most non-UHD streams are between 3 and 5 Mbit/s for talking heads and 5 to 15 or 20 Mbit/s for more action. Let's pick 10 Mbit/s.

So the 3,110,400,000 bit/s is converted to 10,000,000 bit/s. This is a compression ratio of about 311.

Perhaps it helps if I explain that this means 3.10 Gbit/s of information has been discarded from 3.11 Gbit/s of source data! And that is just one pass!
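
The arithmetic above can be checked with a few lines of Python. This is only a sketch that uses the figures quoted in this post (10 bits per colour, 1920 x 1080, 50 progressive frames per second, and the 10 Mbit/s example stream); the variable names are made up for illustration.

```python
# Figures taken from the post; names are illustrative only.
BITS_PER_COLOUR = 10
COLOURS = 3                      # R, G, B
WIDTH, HEIGHT = 1920, 1080
FRAMES_PER_SECOND = 50
STREAM_BITRATE = 10_000_000      # the 10 Mbit/s example stream

bits_per_pixel = BITS_PER_COLOUR * COLOURS
bits_per_frame = WIDTH * HEIGHT * bits_per_pixel
uncompressed_bitrate = bits_per_frame * FRAMES_PER_SECOND

compression_ratio = uncompressed_bitrate / STREAM_BITRATE
discarded_bitrate = uncompressed_bitrate - STREAM_BITRATE

print(f"bits per frame:       {bits_per_frame:,}")
print(f"uncompressed bitrate: {uncompressed_bitrate / 1e9:.2f} Gbit/s")
print(f"compression ratio:    {compression_ratio:.0f}")
print(f"discarded per second: {discarded_bitrate / 1e9:.2f} Gbit")
```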

It's been known for decades that you should never run a lossy compression algorithm on an already-compressed video signal. The algorithms above work by throwing away stuff your brain doesn't notice (for example, we don't need very high colour resolution, so a huge amount of colour resolution can be thrown away). BUT, the second time, the algorithm "assumes" there is information that can be thrown away, and that simply is no longer true. This time, stuff gets thrown away that you cannot do without. So looping a 3 Gbit/s stream into a compressed stream at 10 Mbit/s, then doing it just once more, will produce massive artefacts.

2. Sound
In sound, the same thing happens. A Red Book CD produces 44,100 samples per second, each with a bit depth of 16 (notice that pictures only have 10 bits per colour). There are 2 channels, so you need 1.411 Mbit/s. It's generally accepted that a 256 kbit/s MP4 stream is a good comparator with an uncompressed audio stream. This is only about a 5.5x compression!
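
The same check for the audio figures, again as a small Python sketch built only from the numbers in this post (44,100 samples per second, 16 bits, 2 channels, 256 kbit/s compressed); the variable names are illustrative.

```python
# Figures taken from the post; names are illustrative only.
SAMPLE_RATE = 44_100         # samples per second, per channel
BIT_DEPTH = 16
AUDIO_CHANNELS = 2
COMPRESSED_BITRATE = 256_000  # the 256 kbit/s comparison stream

cd_bitrate = SAMPLE_RATE * BIT_DEPTH * AUDIO_CHANNELS
audio_ratio = cd_bitrate / COMPRESSED_BITRATE

print(f"CD bitrate:        {cd_bitrate / 1e6:.3f} Mbit/s")
print(f"compression ratio: {audio_ratio:.2f}")
```

Set against the roughly 311x ratio for the video example, this shows how much less is discarded per pass on the audio side.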

Again - it's been known for years that you should never apply lossy compression to audio twice. For example, MP3s used in studios (rather than CDs or WAVs) sound wrong via compressed DAB or via compressed Internet streams!

SO, I would expect a couple of loops of YouTube serial compression to be heavily artefact-ed, because so much gets chucked away on the first compression, a second compression has nothing to work with so damages the image and sound.

BUT, if you are going uncompressed - to - uncompressed - to - uncompressed etc. I would expect almost no impact.
DA to AD to DA to AD should be fine for many 100s of times assuming that both the DAC and ADC are linear to 20 bits or so. Gradually the result may sound different from the original, but the difference would NOT be digital artefacts as above, but variations on the interpretation of the least significant bits. This might appear as changes to the noise floor.

lashto · Nov 24, 2022

incoherent said:
I thought this FLAC -> MP3 -> FLAC idea was an interesting question, so I analyzed it in MATLAB today. Here was my approach:

I used a source FLAC of the song Hallelujah performed by Ryan Adams from https://archive.org/details/ryanadams2006-10-17.sbd.flac16

This was read into MATLAB, converted to an MP4 with AAC encoding at 192 kbit/s, and written to disk. I used this encoding because I could not find a built-in MP3 writer for MATLAB. This file was read back into MATLAB and rewritten to disk as a 24-bit FLAC file. This would be one "iteration" of conversion. I repeated this 100 times.

great, many thanks! AAC/192 should be just as good as any other compressed/lossy format

incoherent said:
The original FLAC retains its power as the frequency approaches 20 kHz, while the AAC encoding reduces it. It is strange to me that the AAC-encoded MP4 has added power below 10 Hz. Not sure what's going on there.

strange, the only sum-effect should be a noise-floor increase. Otherwise, only loss-effects should happen

incoherent said:
The change from 0->1 is large. The change from 1->10 is smaller. The change from 10->100 is smaller still.

That sounds like the expected result. At least for mp3, a large part of the compression is the elimination of the inaudible sounds (i.e. masked sounds and anything outside the audible range). Most of that is done during step1 (ideally, all of that). If you use a high rate like 320kbps mp3, that might be all that is done in terms of compression/loss.
Even with low kbps rates, the first step should be waaay more 'compressive' than the subsequent ones. Actually, my expectation was/is that the compression/loss will just stop after several steps and I was quite surprised that the youtube experiment ended up with completely garbled and unrecognizable audio
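
One way to see why the loss might or might not "just stop" is a toy model. The sketch below is not AAC, MP3 or anything YouTube does; the step size, the 3-tap filter and all function names are assumptions for illustration. A bare quantizer is idempotent, so repeating it changes nothing after the first pass. As soon as each pass also does some filtering before quantizing, every generation starts from slightly different data and the error keeps growing.

```python
import math

STEP = 0.001  # hypothetical quantizer step size


def quantize(samples, step=STEP):
    # Uniform quantizer: the only lossy operation in this toy "codec".
    return [round(s / step) * step for s in samples]


def smooth(samples):
    # Mild 3-tap low-pass with circular edges, standing in for any
    # per-pass filtering or resampling a real pipeline might apply.
    n = len(samples)
    return [
        0.25 * samples[i - 1] + 0.5 * samples[i] + 0.25 * samples[(i + 1) % n]
        for i in range(n)
    ]


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def run(original, passes, prefilter):
    # Returns the RMS error against the original after each pass.
    current, errors = list(original), []
    for _ in range(passes):
        if prefilter:
            current = smooth(current)
        current = quantize(current)
        errors.append(rms_error(original, current))
    return errors


# Test tone: 10 cycles of a sine over 200 samples.
original = [math.sin(2 * math.pi * 10 * i / 200) for i in range(200)]
fixed = run(original, 50, prefilter=False)
filtered = run(original, 50, prefilter=True)

print(f"quantizer only:     pass 1 = {fixed[0]:.5f}, pass 50 = {fixed[-1]:.5f}")
print(f"filter + quantizer: pass 1 = {filtered[0]:.5f}, pass 50 = {filtered[-1]:.5f}")
```

In this model the first case matches the "loss stops after step 1" expectation, while the second is closer to the garbled result of the upload/download loop. Which of the two a real codec chain resembles depends on details the toy does not capture.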

incoherent said:
I think this supports killingbeans idea that

@Killingbeans was talking about the accumulated effects of HarmonicDistortion in an AD/DA cycle and you are testing the effects of lossy compression. Don't think there are many (if any) similarities between those scenarios. E.g. the effects of every AD/DA step should be of the approx same magnitude regardless of the step number.

If you like to do more experiments, highly welcome. E.g. how about a low rate like 32 or 64kbps (probably the youtube rate). Does it really get completely garbled like in the youtube experiments or some other 'things' are happening there?!

Also highly welcome to post some results so we can also have some fun. E.g. the AAC files for steps 10, 20, 50, 100. And I guess you can just skip the FLAC steps and simply do AAC->AAC->AAC...

lashto · Nov 25, 2022

MaxwellsEq said:
...
Please don't make this into a mystery, it's not. It's engineering, and engineers at YouTube, Netflix etc. frequently publish technical articles on how they do things and share this knowledge with their peers. There are only so many technologies available and only so many ways to build the engineering.
...

not sure what was not clear or what you are explaining.

Yes the few audio/video compression algorithms are (generally) well documented. But how do you know exactly what options were available in 2010? How do you know exactly which algorithms did the user choose for each one of his 1000x upload/download ops? Or was there any choice at the time? And since the experiments took many months, how many times did youtube change that code/config during the experiment?
Yes, youtube&co do publish a lot but they'll never tell 'everything'. Yes, one can get a pretty good 'picture' about what happened but there's no way to know exactly. Most probably, even google won't be able to tell what happened 10+ years ago.
It looks to me that there will always be some mystery left in there. That's just fine, it was an art experiment. And I do not see anyone trying to increase that mystery, just attempts to clarify it (as much as possible...)

Anyway, nevermind all that..
About AD/DA it was also ~agreed that the noise floor should be the biggest change/effect. How about the accumulated HD, do you have some ideas on how to estimate that effect?

MaxwellsEq · Nov 25, 2022

lashto said:
Yes, youtube&co do publish a lot but they'll never tell 'everything'. Yes, one can get a pretty good 'picture' about what happened but there's no way to know exactly. Most probably, even google won't be able to tell exactly what happened 10+ years ago.
It looks to me that there will always be some mystery left in there. That's just fine, it was an art experiment. And I do not see anyone trying to increase that mystery, just attempts to clarify it (as much as possible...)

Some of us are older than 30. I started doing very complex, millisecond-resolution analysis of media movement over IP in the 1990s and have published and given talks on the subject. I've analysed what YouTube have been doing many times over the last years. I also work with many domain experts in this field. None of what happened in 2010 is a mystery. It's all engineering.

I can tell you are determined to make a "thing" out of this. You don't want to increase your knowledge, which is a pity since it is at the heart of scientific endeavour. I'm now ignoring you.

lashto · Nov 25, 2022

MaxwellsEq said:
I can tell you are determined to make a "thing" out of this.

Hm, I pretty clearly wrote "nevermind all that". (also 'spoilered' that now, probably should've done it from the start)

The compression experiment is just not very interesting to me. I already know all I want to know about the subject, but everyone is highly welcome to explore/educate/etc on that part. Same as highly welcome to act 'offended' and ignore...

I only asked for very specific AD/DA insights. That is what I want to know/learn.

Killingbeans · Nov 25, 2022

lashto said:
How about the accumulated HD, do you have some ideas on how to estimate that effect?

I think I might have been talking out of my a##.

It's possibly both noise and distortion that will experience the diminishing contribution effect(?):

The Signal Chain: How do noise and distortion propagate through my system?

This article is to explore how noise and distortion change through our system, and how different components can influence the final output. A common question is “If I have DAC A, preamp B, and amp C, where should I set the volume and gain controls for the best performance?” Usually this...

www.audiosciencereview.com

lashto · Nov 25, 2022

Killingbeans said:
I think I might have been talking out of my a##.

I do not think anyone has a clear/sure formula (at least not anyone in this thread) so that's just fine. Let's just say that we are all 'estimating'

Killingbeans said:
It's possibly both noise and distortion that will experience the diminishing contribution effect(?):

The Signal Chain: How do noise and distortion propagate through my system?

This article is to explore how noise and distortion change through our system, and how different components can influence the final output. A common question is “If I have DAC A, preamp B, and amp C, where should I set the volume and gain controls for the best performance?” Usually this...

www.audiosciencereview.com

interesting link, thank you.

A lot of that should apply but with some changes.
The noise-sum formula would be simpler in our case since there is (hopefully) no gain in the AD/DA chain.
A 'simple' root-sum-square formula should be good enough to approximate the Noise-sum.

The HD in our case is also "independent" between the ADC and the DAC but should be ~the same distortion for each full AD/DA step. However, we might still have "a little delay and phase shift" in each step and it won't be a 100% linear sum.
All in all, another 'simple' root-sum-square formula may be good enough to approximate the HD-sum too.

So, if anyone has the right tool (MATLAB?), he can calculate the 1000x root-sum-square with noise=-120dB and HD=-120dB for both the DAC and the ADC.
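
The 1000x calculation does not need MATLAB; a rough Python sketch is below. The assumptions are mine and not measured: every ADC and every DAC contributes -120 dB relative to the signal, the chain has unity gain, and 1000 round trips therefore mean 2000 stages. Root-sum-square applies to contributions that are uncorrelated, which is a reasonable model for noise. For harmonic distortion from identical devices the contributions may be partly correlated, so the sketch also prints the fully in-phase (linear) sum as a pessimistic bound; the real figure presumably lies somewhere between the two.

```python
import math


def db_to_amplitude(level_db):
    return 10 ** (level_db / 20)


def amplitude_to_db(amplitude):
    return 20 * math.log10(amplitude)


def rss_sum_db(level_db, stages):
    # Uncorrelated contributions: powers add, amplitude grows with sqrt(stages).
    return amplitude_to_db(math.sqrt(stages) * db_to_amplitude(level_db))


def coherent_sum_db(level_db, stages):
    # Fully correlated, in-phase contributions: amplitudes add linearly.
    return amplitude_to_db(stages * db_to_amplitude(level_db))


PER_STAGE_DB = -120      # assumed level per ADC or DAC stage
ROUND_TRIPS = 1000
STAGES = 2 * ROUND_TRIPS  # one DAC plus one ADC per round trip

rss_total = rss_sum_db(PER_STAGE_DB, STAGES)
coherent_total = coherent_sum_db(PER_STAGE_DB, STAGES)

print(f"root-sum-square after {ROUND_TRIPS} round trips: {rss_total:.1f} dB")
print(f"in-phase worst case after {ROUND_TRIPS} round trips: {coherent_total:.1f} dB")
```

Under these assumptions the root-sum-square total comes out near -87 dB and the in-phase bound near -54 dB, i.e. roughly 33 dB and 66 dB above a single stage.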

This is waaay too much audio-math for me ...
Still hope that someone will actually do the experiment. It may take a lot of time but should still be easier than calculating. And more precise/sure.
