
Beyond FLAC - AI big data compression?

Graham849

Active Member
Forum Donor
Joined
Jul 14, 2022
Messages
233
Likes
204
Location
Australia

AnalogSteph

Major Contributor
Joined
Nov 6, 2018
Messages
3,395
Likes
3,343
Location
.de
It better be. I have no idea where they are getting their 43.4% for ImageNet and 16.4% for LibriSpeech from when the table says 48.0% and 21.0% at best, respectively. Still better, but anyway.

It also takes a 140 GB model size at that point, and who knows how much in terms of computing resources / time. I bet you could also make a conventional algorithm do better if you throw that much at it.
 

AudiOhm

Senior Member
Joined
Oct 27, 2020
Messages
411
Likes
410
Location
London, Ontario, Canada
In this day and age of incredible internet/CPU speeds and massive storage capacity, why compress anything?

Ohms
 

Vladetz

Active Member
Joined
Jul 20, 2020
Messages
132
Likes
85
Location
Russia
In this day and age of incredible internet/CPU speeds and massive storage capacity, why compress anything?

Ohms
To get better quality with same size
 

CapMan

Major Contributor
Joined
Mar 18, 2022
Messages
1,109
Likes
1,889
Location
London
Because of hosting and transit charges, this costs streaming service companies a lot.


JSmith
Genuine question - from a purely environmental standpoint, is it better to send big files around the internet and server farms, or to use CPU grunt to compress them and send / manage smaller files?

Frankly, whichever causes the least environmental harm is the right answer (IMHO).
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,059
Likes
36,460
Location
The Neitherlands
Frankly, whichever causes the least environmental harm is the right answer (IMHO).
That may well be true from an environmental viewpoint. However, I think the majority of these decisions are made for economic benefit.
 

JPA

Active Member
Forum Donor
Joined
Mar 21, 2021
Messages
157
Likes
266
Location
Burque
Genuine question - from a purely environmental standpoint, is it better to send big files around the internet and server farms, or to use CPU grunt to compress them and send / manage smaller files?

Frankly, whichever causes the least environmental harm is the right answer (IMHO).
It costs more in power (energy) to move data around than to compress/decompress it.
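To make that concrete, here's a rough back-of-envelope sketch in Python - every constant in it is an assumed, illustrative figure, not a measurement:

# Energy to ship an uncompressed album vs. FLAC-compressing it first.
# All constants below are assumptions for illustration only.
ALBUM_GB       = 0.6     # roughly 1 hour of 16/44.1 stereo PCM
FLAC_RATIO     = 0.6     # assumed FLAC size relative to raw PCM
NET_KWH_PER_GB = 0.05    # assumed end-to-end network + datacenter energy per GB
ENCODE_KWH     = 0.001   # assumed energy for one FLAC encode
DECODE_KWH     = 0.0005  # assumed energy for one playback-side decode

ship_raw  = ALBUM_GB * NET_KWH_PER_GB
ship_flac = ALBUM_GB * FLAC_RATIO * NET_KWH_PER_GB + ENCODE_KWH + DECODE_KWH

print(f"ship raw PCM : {ship_raw * 1000:.1f} Wh")   # 30.0 Wh with these assumptions
print(f"FLAC + ship  : {ship_flac * 1000:.1f} Wh")  # 19.5 Wh with these assumptions

With numbers in that ballpark the transfer savings dominate the codec cost, which is the point.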
 

palm

Member
Joined
Mar 4, 2023
Messages
74
Likes
62
The tradeoff is different when compressing material for your own use than when it is streamed at very large scale.
 

stunta

Major Contributor
Forum Donor
Joined
Jan 1, 2018
Messages
1,156
Likes
1,403
Location
Boston, MA
And there is a cost to decompressing too... it's just distributed.
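To put a rough number on "distributed" (again with purely assumed figures): the encode happens once on the server, but a decode happens on every client for every play.

# One encode vs. millions of client-side decodes; all figures are assumptions.
ENCODE_KWH = 0.001      # assumed one-off server-side encode of an album
DECODE_KWH = 0.0005     # assumed client-side decode per playback
PLAYS      = 10_000_000

total_kwh = ENCODE_KWH + DECODE_KWH * PLAYS
print(f"aggregate codec energy: {total_kwh:,.0f} kWh")  # ~5,000 kWh, almost all on the clients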
 

TonyJZX

Major Contributor
Joined
Aug 20, 2021
Messages
2,010
Likes
1,957
Doesn't this run up against the basic laws of entropy?

A data file, be it a FLAC or an Excel spreadsheet, can only be compressed down so much IF you want to maintain the integrity of the data.

For an Excel sheet you cannot keep any less than 100% of the data.

For FLAC it's the same thing.

For MP3 it is NOT the same thing.

For JPG it is NOT the same thing, for obvious reasons.

For JPG and MP3 you can remove 'data' as long as human senses don't detect the change.

For an Excel sheet you will detect the change even if it's off by a fraction of a cent (for example).

i.e. at some point there's no redundancy left to throw out.
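You can put a number on "how much is left to throw out" with a quick zero-order entropy estimate - a crude lower-bound sketch only, since real codecs also exploit structure between bytes (the file name is just a placeholder):

# Zero-order Shannon entropy of a file's bytes: a crude floor for lossless compression.
# FLAC/zip exploit correlations between samples/bytes too, so they can beat this
# estimate, but no lossless codec can beat the data's true entropy.
import math
from collections import Counter

def byte_entropy_bits(data: bytes) -> float:
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

with open("example.wav", "rb") as f:   # placeholder path - point it at any file
    data = f.read()

h = byte_entropy_bits(data)
print(f"{h:.2f} bits/byte -> at best ~{h / 8:.0%} of original size under this crude model")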
 

Zoomer

Senior Member
Joined
Oct 5, 2019
Messages
323
Likes
469
The research paper is titled "Language Modeling Is Compression," and if I understand the Ars article correctly it explores the idea that "the ability to compress data effectively is akin to a form of general intelligence... So theoretically, if a machine can compress this data extremely well, it might indicate a form of general intelligence—or at least a step in that direction."

A much more interesting question than "will we have smaller FLAC files next year."
 

CapMan

Major Contributor
Joined
Mar 18, 2022
Messages
1,109
Likes
1,889
Location
London
It costs more in power (energy) to move data around than to compress/decompress it.
I guess an analogy is soft drinks manufacturers who produce and ship the concentrate which is rehydrated and bottled in market for resale?
 

RandomEar

Senior Member
Joined
Feb 14, 2022
Messages
335
Likes
776
The research paper is titled "Language Modeling Is Compression," and if I understand the Ars article correctly it explores the idea that "the ability to compress data effectively is akin to a form of general intelligence... So theoretically, if a machine can compress this data extremely well, it might indicate a form of general intelligence—or at least a step in that direction."

A much more interesting question than "will we have smaller FLAC files next year."
I'm sorry, but I think that's just arstechnica dreaming. Current LLMs are trained pattern generators. Very advanced pattern generators, admittedly, but there's nothing "intelligent" in the fact that they can replicate patterns in audio or image data despite being trained on text. It's just somewhat different patterns. LLMs have no understanding of what they are doing; they lack the ability to reflect or make logical deductions, and they don't recognize their own mistakes.

Concerning the article: It's not surprising at all that an LLM over a hundred GB in size can compress data better than a PNG implementation which is typically below 1 MB. I know I may sound disillusioned saying this, but in the end, it's a trivial "cheat": They simply took some of the file's entropy and packed it into the compression algorithm a.k.a. the LLM. If you send a file turbo-compressed by the LLM to anyone else, they still need to acquire that LLM to decompress anything. Typical compression algorithms have implementations which are a couple of MB and their size is usually irrelevant considering the amount of data that is being processed. The opposite is true for the LLM-approach presented here. The LLM approach "cheats" by requiring you to effectively pre-load fractions of billions of files' entropy before even starting to work. That's why it compresses well.

Compression is always a trade-off between algorithm complexity, runtime computational effort and file size. According to this article, LLMs don't change that at all. They trade insane algorithm complexity and significant runtime computational effort for somewhat smaller files.
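As a toy illustration of that trade-off (the 140 GB model size is from the article; every other figure below is assumed), the giant model only pays for itself once enough data is pushed through it to amortize shipping the model in the first place:

# When does a huge learned compressor beat a small conventional one on total bytes?
# 140 GB model size comes from the article; all other figures are assumptions.
MODEL_GB   = 140.0
LLM_RATIO  = 0.16    # assumed LLM-based compressed size / original size
CONV_RATIO = 0.30    # assumed conventional lossless ratio on the same data
CODEC_GB   = 0.001   # assumed size of a conventional codec binary

def total_gb(corpus_gb, ratio, tool_gb):
    # bytes to store/ship: compressed corpus plus one copy of the (de)compressor
    return corpus_gb * ratio + tool_gb

for corpus_gb in (10, 100, 1_000, 10_000):
    llm  = total_gb(corpus_gb, LLM_RATIO, MODEL_GB)
    conv = total_gb(corpus_gb, CONV_RATIO, CODEC_GB)
    print(f"{corpus_gb:>6} GB corpus: LLM {llm:>7.1f} GB vs conventional {conv:>7.1f} GB")

Break-even with these assumptions is around 1 TB of data per copy of the model - and that still ignores the compute per file, which is the other half of the trade-off.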
 

JPA

Active Member
Forum Donor
Joined
Mar 21, 2021
Messages
157
Likes
266
Location
Burque
I guess an analogy is soft drinks manufacturers who produce and ship the concentrate which is rehydrated and bottled in market for resale?
Yes, that's not a bad analogy.
 

Timcognito

Major Contributor
Forum Donor
Joined
Jun 28, 2021
Messages
3,566
Likes
13,368
Location
NorCal
Not sure where to put this but $400 of AI, Python, Excel, and Machine Learning software (42 apps) for $25. Not a scam. I have bought many bundles from these guys, mostly CAD stuff. Just picked the most recent AI post on ASR. FYI
 