
Why CPU's fail - Video

JSmith

Major Contributor
Joined
Feb 8, 2021
Messages
3,466
Likes
7,983
Location
Algol Perseus
Quite a good roundup on why CPUs fail and why failure is more frequent now:

[Embedded video]



JSmith
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
17,043
Likes
29,643
I don't know that I've experienced anything in the video. Each bit of info is true and possible. I don't experience 3 year old phones getting slower because of a dying CPU. More likely causes are an upgraded OS that is more demanding, and battery failure. Never had a CPU fail in a PC. Maybe the smaller production processes are reaching that point where it matters, but I felt everything about the video was more FUD than useful. I've had almost everything in a PC fail at some point in time except a CPU. I did have a CPU dead out of the box once.

I saw one commenter said it really isn't an issue until you get into CPUs below the 22 nanometer process. The silicon oxide layer is about 2 atoms thick at that point. For Intel, that would make Ivy Bridge (2012) and Haswell the last 22 nm process CPUs. Anyone seeing CPU failures uptick in recent gen CPUs?
 

JSmith

Blumlein 88 said: "Anyone seeing CPU failures uptick in recent gen CPU's?"

[Attached image: 1669860831004.png]



I've not had a CPU failure in a PC... but servers, yes.


JSmith
 

JSmith

Blumlein 88 said: "everything about the video was more FUD than useful"
That's unfortunate as @AnastasiInTech tends to really know her subject areas quite well.

It is quite factual to state that logic gates are now physically smaller, which makes them less durable, so they can reach EOL sooner.


JSmith
 

dlaloum

Major Contributor
Joined
Oct 4, 2021
Messages
1,809
Likes
1,251
I have a collection of vintage CPUs that have been replaced due to performance upgrades... but over the last 40+ years (starting with Z80s and 6502s) I have very rarely seen CPU failures.

Obviously you have to cool them properly... and "properly" varies depending on the TDP of the CPU. I can run a Ryzen 5700G in a heatsink case, with heatpipes connecting it to the case walls, which are heatsinks... constrained to around 75W it runs perfectly well. In the same setup, I could NOT run a CPU with 120W+ TDP... it would cook itself (or shut itself down as it self-protected).
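The 75W versus 120W point can be illustrated with the usual first-order thermal model, where steady-state die temperature is ambient temperature plus power times the thermal resistance of the cooling path. This is only a rough sketch: the 0.6 °C/W thermal resistance, the 25 °C ambient and the 95 °C limit are assumed numbers chosen for illustration, not measurements of the heatsink case described above.

```python
# First-order steady-state thermal model: T_die = T_ambient + P * R_theta.
# All constants below are illustrative assumptions, not measured values.

ASSUMED_THETA_C_PER_W = 0.6   # assumed thermal resistance of a passive case
ASSUMED_AMBIENT_C = 25.0      # assumed room temperature
ASSUMED_LIMIT_C = 95.0        # assumed maximum die temperature


def junction_temp_c(power_w: float,
                    theta_c_per_w: float = ASSUMED_THETA_C_PER_W,
                    ambient_c: float = ASSUMED_AMBIENT_C) -> float:
    """Estimate steady-state die temperature for a given sustained power."""
    return ambient_c + power_w * theta_c_per_w


if __name__ == "__main__":
    for power_w in (75, 120):
        temp_c = junction_temp_c(power_w)
        verdict = "within" if temp_c <= ASSUMED_LIMIT_C else "over"
        print(f"{power_w} W -> about {temp_c:.0f} C ({verdict} the assumed limit)")
```

With these assumed numbers the 75W case stays comfortably under the limit while the 120W case exceeds it, which is the situation where a CPU would throttle or shut itself down.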

I also run my own server at home - to keep all my audio and video files.... it runs around the clock, and I have the fans turned down to minimise noise - but still with a decent heatsink mounted, it has been running 24/7 for 4 years without any issues... (it also acts as a VM host for various experiments)

Having said that - these are environments in which I can engineer my own cooling solutions.

My Onkyo AVR toasted its HDMI board years ago - it ran seriously hot, and had no heatsinking on it... average life on that generation of HDMI boards was 3 to 5 years... with 5 years being fairly rare.

My subsequent Integra AVR had issues with its DSP - again, it runs seriously hot, the chip warps due to heat, and the Ball Grid Array (BGA) connection points then disconnect... effectively bricking the AVR.

In both of these AVR examples - decent Heatsinking would probably have avoided the problem - and could not be done without voiding the warranty... (I should have done it regardless...)

As a result, my first question when looking at my latest AVR purchase was "how hot does it run?" (Those previous AVRs ran so hot that you could not keep your hand on the top of the case without serious discomfort.) My current Integra 3.4 runs very cool - when running, the top of the case barely warms up - and that is a good sign for its potential longevity.
 

dlaloum

JSmith said: "That's unfortunate as @AnastasiInTech tends to really know her subject areas quite well. It is quite factual to state logic gates are physically smaller in size and it makes them less durable so they can reach EOL sooner."
Yes - they are more sensitive to overheat/overvoltage issues - but if used properly, and coddled as they should be, there is no reason they should not have a long life.

The problem is that often they are run too close to their extreme margins ... as designers try to eke out higher performance, overclocking etc...

The key to long life is good cooling, and undervolting - keeping well away from the maxima of the design.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
39,929
Likes
184,945
Location
Seattle Area
I had not heard of this channel before. I watched the video and while it hits on some topics, it seems to be more a collection of Google searches than a real investigation. Yes, the feature size (transistor) is far smaller than before. But that also comes with operation at much lower voltage and lower dissipation. We would not have been able to scale CPUs if we had not found countermeasures for all the problems she mentions. She says the 386 CPU is highly reliable. Well, it had far more transistors than the 8088 CPU. So how come it didn't suffer?

By far the highest density semiconductors are memories. If density is the problem, why isn't she talking about them dying instead of CPUs?

Towards the end she gets more realistic, realizing that heat dissipation is the main source of failures. So many devices are running without adequate cooling, and that is what drives their shorter life. But even that can be mitigated with automatic power throttling. More than once I have found my Samsung phone so hot I could not touch it due to wireless charging not working with it. It has happily survived all of this.

So no, I don't buy that we have gone backward because the density has increased.
 

Blumlein 88

JSmith said: "That's unfortunate as @AnastasiInTech tends to really know her subject areas quite well. It is quite factual to state logic gates are physically smaller in size and it makes them less durable so they can reach EOL sooner."
As I said, everything is true, as far as that goes. You provided info that servers are seeing higher failure rates. I view her site from time to time. It seemed like an effort to make someone worry about their CPU having limited life. It doesn't appear to me that it is a worry yet, but hey, maybe in a few years we'll see lots of failures. I'm typing this on a home server with a Haswell CPU which is not so far from that 100,000 hrs of use she mentioned. If I upgrade, which I think I will do soon, should I not expect 10 years of life? Obviously many devices aren't on 24/7, but this home server is and has been since it was new.

After years of getting cheaper and faster, PCs are now getting faster but, if anything, going up in price. If they also cannot be counted on for so many years, then that is a double whammy on the value.

I also notice the 11th gen Intels have a higher failure rate than the AMD units, but the AMD units are a smaller process. So it is more than just process size. Doesn't make me feel good about the 2020 11th gen i7 MacBook Pro I have though.
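For scale, the 100,000 hours figure mentioned above converts to a little over 11 years of round-the-clock operation. A minimal sketch of that conversion follows; the 9 years of 24/7 use in the example is a hypothetical input for illustration, not a figure stated in the post.

```python
# Convert between powered-on hours and years of use.
# A year is taken as 365 days; leap days are ignored for simplicity.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def years_of_continuous_use(hours: float) -> float:
    """Years of 24/7 operation needed to accumulate the given hours."""
    return hours / HOURS_PER_YEAR


def powered_on_hours(years: float, hours_per_day: float = 24.0) -> float:
    """Hours accumulated over a number of years at a given daily usage."""
    return years * 365 * hours_per_day


if __name__ == "__main__":
    # 100,000 hours is the lifetime figure quoted from the video.
    print(f"{years_of_continuous_use(100_000):.1f} years of 24/7 use")
    # Hypothetical example: a server left on around the clock for 9 years.
    print(f"{powered_on_hours(9):,.0f} hours after 9 years of 24/7 use")
    # Hypothetical example: a desktop used 8 hours a day for 9 years.
    print(f"{powered_on_hours(9, hours_per_day=8):,.0f} hours at 8 h/day")
```

A machine that is only on for part of the day accumulates hours far more slowly, which is why the concern applies mainly to always-on servers.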
 

AnalogSteph

Major Contributor
Joined
Nov 6, 2018
Messages
2,505
Likes
2,386
Location
.de
Blumlein 88 said: "I also notice the 11th gen Intels have a higher failure rate than the AMD units, but the AMD units are a smaller process. So more than just process size. Doesn't make me feel good about the 2020 11th gen I7 Macbook Pro I have though."
Your CPU is an entirely different beast than Rocket Lake for the desktop (different process, package and everything). I'd have more to worry about, being the proud owner of an i7-11700.

The most commonly failing desktop CPU in recent times may be the AMD Ryzen 5 3600. A guy who commonly fixes bent pins on these also said this Ryzen generation has noticeably poorer PCB material than those preceding and following it. So there is a lot more to it than just feature size.
 

Sokel

Major Contributor
Joined
Sep 8, 2021
Messages
2,190
Likes
1,712
I have an i5 680 running at 4.4 GHz (stock 3.6 GHz) almost 24/7 in one of my PCs for 11 years using a generic Corsair watercooler at lowest speed.
I think I'll die and this bastard will be still running.
 

nebunebu

Member
Joined
Sep 20, 2022
Messages
14
Likes
8
I have never experienced a CPU going totally dead on me. However, I've experienced multiple times where AMD CPUs have been the cause of application failures.
Probably the best CPUs I have operated are the IBM Power CPUs, running at 90% load for years and years without any problems.
Though our cooling system is quite impressive: we have a pipe 100 meters down into the sea giving a consistent temperature of approx. 8 °C. The sea water cools our fresh water systems, which is then pumped to datacenter cubes, sending cold air at the front of the servers and trapping the heat at the back.
 

dlaloum

nebunebu said: "I have never experienced a CPU gone totally dead on me. However, I've experienced multiple times where AMD CPUs have been the cause for applications failure. Probably the best CPU I have been operating are the IBM Power CPUs, 90% load running for years and years without any problems. Tho - our cooling system is quite impressive, we have a pipe 100 meters down into the sea giving a consistent temperature of approx 8 Celsius. The sea water is cooling our fresh water systems which is then pumped to datacenter cubes, sending cold air at the front of the servers, trapping the heat at the back."
Very... Cool...
 

voodooless

Master Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
6,085
Likes
10,084
Location
Netherlands
amirm said: "By far the highest density semiconductors are memories. If density is the problem, why isn't she talking about them dying instead of CPUs?"
To be fair, I think memory fails much more easily than any CPU. An 8 GB DDR4 stick has about 70 billion transistors. An i9-12900K has "only" about 3 billion transistors. That is a massive difference indeed. But density-wise, memory is usually not made with the smallest process: 10 nm, while the i9 is 7 nm. More impressive is the 57 billion transistors of the Apple M1 Max at 5 nm. The M2 Max will probably add another 20%. This is excluding memory. The latest 4 nm Nvidia GPUs exceed the DDR4 stick with 76 billion transistors! Talk about heat-death...
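The "about 70 billion" figure for an 8 GB stick follows from counting one access transistor per stored bit, which is how a standard DRAM cell (one transistor plus one capacitor) is built. A quick sketch of that arithmetic is below; it assumes 8 GB means 8 GiB and ignores peripheral circuitry, so the real count would be somewhat higher.

```python
# Estimate DRAM cell transistors, assuming one transistor per bit (1T1C cell).
# Sense amplifiers, decoders and other peripheral logic are ignored.

BITS_PER_BYTE = 8
BYTES_PER_GIB = 2**30


def dram_cell_transistors(capacity_gib: int) -> int:
    """Number of cell transistors for a module of the given capacity in GiB."""
    return capacity_gib * BYTES_PER_GIB * BITS_PER_BYTE


if __name__ == "__main__":
    count = dram_cell_transistors(8)
    print(f"8 GiB module: {count:,} cell transistors")
    print(f"That is roughly {count / 1e9:.1f} billion")
```

For an 8 GiB module this comes to 68,719,476,736 cell transistors, or roughly 68.7 billion.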

I couldn't watch that video to the end, that voice is just horrible o_O
 

jae

Addicted to Fun and Learning
Joined
Dec 2, 2019
Messages
977
Likes
1,171
I've only had maybe one fail, after a lot of abuse with liquid helium/nitrogen
 