
Reliability Science in Audio Equipment

EERecordist

Some time ago I worked in semiconductor reliability for a Fortune 50 company. I have also worked on software reliability.

Here is a simplified discussion of quality and reliability for audio equipment. I would define quality as pre-ship and at-ship quality, and reliability as quality over time.

The short version is:

1. Buy equipment with the longest warranty you can find.
2. Buy equipment that can be returned under a money-back guarantee of at least a month.
3. When you get the equipment, run it for one week straight (168 hours) at 110% of the supply voltage and 140 °F. Most people are not set up at home to also test the equipment for a few days at high pressure and humidity, or on a shake table.
4. If it fails, return it for your money back, then buy a different manufacturer's product.
5. Extended software/firmware reliability is essentially impossible.

Hardware reliability

Electronics are made from all the resistors, capacitors, inductors, discrete semiconductors, chips, etc., which the audio system maker buys from component vendors.

They are assembled on a PCB with solder. Wires and connectors are added. It is put in a chassis.

Every manufacturer does some kind of test on the PCB and on the finished product. It is much cheaper to discard a bad PCB before it becomes a system.

Every part of the above has one or more failure modes, i.e., reasons for failure. Statistically, failures are high at the beginning of life, low in the middle of life, and high at the end of life. That shape is called the bathtub curve and is commonly modeled with the Weibull distribution.
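As a rough illustration, the bathtub shape can be sketched as the sum of three Weibull hazard rates: shape k < 1 gives infant mortality, k = 1 a constant random-failure rate, and k > 1 wear-out. A minimal Python sketch with made-up parameters, not real component data:

```python
# Illustrative only: bathtub curve as a superposition of three Weibull hazards.
# All parameter values are invented, not taken from any real component.

def weibull_hazard(t, k, lam):
    """Instantaneous failure rate h(t) of a Weibull(shape=k, scale=lam) lifetime."""
    return (k / lam) * (t / lam) ** (k - 1)

def bathtub_hazard(t):
    infant = weibull_hazard(t, k=0.5, lam=2.0)    # early-life defects, decreasing
    random = weibull_hazard(t, k=1.0, lam=50.0)   # constant mid-life rate
    wearout = weibull_hazard(t, k=5.0, lam=25.0)  # end-of-life wear-out, increasing
    return infant + random + wearout

for years in (0.1, 1, 5, 10, 20, 25):
    print(f"t = {years:5.1f} y  h(t) = {bathtub_hazard(years):.3f} failures/year")
```

The printed rates start high, dip through mid-life, and climb again at the end, which is the bathtub.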

So how do you build electronics that will last for, say, 20 years, without testing them for 20 years?

For that, reliability engineers use statistics to calculate acceleration factors for failure modes. The math allows long-term failure probability to be calculated from accelerated testing.

The accelerated testing can include greater-than-normal temperature, humidity, voltage, current, vibration, radiation, etc.
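For temperature, the classic model is the Arrhenius equation. A minimal sketch, assuming a placeholder activation energy of 0.7 eV (the real value is failure-mode specific):

```python
# Arrhenius acceleration factor for temperature-accelerated life testing.
# Ea = 0.7 eV is a commonly assumed placeholder here, not a measured value.
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(t_use_c, t_stress_c, ea_ev=0.7):
    """Acceleration factor between use and stress temperatures (Celsius in)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / K_B) * (1.0 / t_use_k - 1.0 / t_stress_k))

af = arrhenius_af(t_use_c=40, t_stress_c=125)
print(f"AF ~ {af:.0f}: 1000 h of 125 C stress ~ {1000 * af / 8760:.0f} years at 40 C")
```

With these assumed numbers the factor comes out around 250, which is how a few weeks in an oven can stand in for decades on a shelf.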

The chip vendors (are supposed to) do all of that and produce datasheet grades at different prices. Commercial grade is priced lower than military, automotive, aviation, or space grade.

Since the component vendor has done the reliability engineering, the system builder should not need to test incoming components, which is just as well, since the test equipment is usually not cheap.

Today the PCB assembly is contracted out. The assembly vendor or the systems vendor tests the PCB, which will hopefully uncover soldering problems. Final assembly and test are also likely to be contracted out, as is packaging.

So you want your audio equipment maker to choose good-quality component vendors.

Japanese, American, Taiwanese, Korean, and German chip makers, foundries, and systems makers have been doing quality and reliability work for a long time. Where they perform a step in another country, they are capable of, and responsible for, managing that quality.

There are many chip failure modes: material impurities/contamination; chip layer defects; electromigration, where current flow moves metal atoms, creating a narrow spot that accelerates toward an open circuit; moisture reaching the chip and corroding it; inadequate heat removal through the package; bonding opens and shorts; design flaws resulting in inadequate electrical margins; and more.

A favorite, which a friend of mine discovered, is radioisotopes in the packaging emitting alpha particles that cause bit flips in some memory circuits. That problem was solved; you wouldn't encounter it in audio gear.

Different capacitor technologies have different failure modes. So when restoring old audio equipment, it is common to replace capacitors with better-specified present-day capacitors.

Electronic components have a manufacturing economic lifecycle. So getting drop-in replacements becomes harder over time. This also intersects with the right-to-repair movement and regulation.

Warranty

Equipment makers (should) know their calculated failure rates over time, so they set the purchase price to include the cost of replacements over the warranty period. Usually the labor cost of customer communications, receiving, and shipping is much greater than the cost of the replacement hardware.
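As a back-of-envelope illustration of how a calculated failure rate can feed a warranty reserve (all numbers below are invented):

```python
# Invented numbers throughout; a sketch of pricing in expected warranty cost
# assuming a constant (exponential) failure rate over the warranty period.
import math

failures_per_year = 0.02   # assumed constant failure rate
warranty_years = 3
hardware_cost = 40.0       # replacement unit cost, assumed
handling_cost = 120.0      # support, receiving, and shipping labor, assumed

p_fail = 1.0 - math.exp(-failures_per_year * warranty_years)
reserve = p_fail * (hardware_cost + handling_cost)
print(f"P(fail within warranty) = {p_fail:.1%}, reserve ~ ${reserve:.2f} per unit")
```

Note how the assumed handling labor dominates the hardware cost, as described above.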

If the company goes out of business, there is no way to support the warranty. If it is acquired, the fate of the warranty will vary.

Software Reliability

This brings us to software reliability, the short version. Most software is designed only to get through the release's outgoing testing.

The software is only as good as the testing. At another time I was an engineering manager for an Internet backbone company, where the equipment is expected to get every bit through unaltered, every time, for years. One of my employees in our vendor qualification lab was known as Dr. Death. They were very good at finding software issues the equipment maker never designed for, but that could happen with the operations staff. An analogy in audio equipment: if the device has 4 input selector buttons, what happens if you push all 4 at the same time? Another friend is an academic working on provably correct software for radiation therapy machines.
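To make the button example concrete, here is a toy sketch (hypothetical Python, invented button names) of how a tester might enumerate every simultaneous-press combination rather than only the ones the designer intended:

```python
# Hypothetical sketch: exhaustively enumerate which input-selector buttons
# could be held down at once. 4 buttons -> 2^4 = 16 states, most of which
# the designer may never have considered.
from itertools import product

BUTTONS = ["in1", "in2", "in3", "in4"]  # invented names for illustration

for state in product([False, True], repeat=len(BUTTONS)):
    pressed = [b for b, down in zip(BUTTONS, state) if down]
    print(pressed or ["none"])  # each line is one test case to throw at the device
```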

The software is developed through a toolchain of other software.

The software target is the audio equipment. The brains of digital audio equipment is very often a small computer: a microprocessor, microcontroller, etc. The computer runs a (close enough to) real-time operating system and one or more stacks of network communication software. In a modern microkernel operating system there are all kinds of libraries, plus code to couple to the external world ("drivers," in Windows-speak). The operating system, network stacks, and libraries each have their own parts of the toolchain.

Then hackers will try to penetrate the entire software system, usually reaching the equipment over the Internet. That has resulted in botnets of home routers and security cameras. Anyone in the profession knows there is no true air gap; witness Stuxnet. Hackers can find new flaws, and old flaws in the libraries. In theory, if your device is never networked, it can run reliably for years on its original software.

Very few system companies have the resources to maintain the security of their systems against hackers for any length of time, since that also requires maintaining all the associated toolchains and a way to remotely update the customers' software throughout the world.

All of the above is reflected in the end-of-life or end-of-support date the equipment maker sets for the customer.

The Philosophy of Quality

Finally, there are many good books written on quality, many readable by non-engineers. There is a contrast in philosophy between the Six Sigma approach and the Toyota Way, and the idea that you simply take the factory output and bin it from high to low quality at corresponding prices.
 
Great summary... I think the thread discussion is about "Reliability Engineering", not 'science' per se!

MIL-HDBK-217, MIL-STD-756, and MIL-STD-810 set the original standards for hardware 'reliability' engineering.
MIL-STD-498 and DOD-STD-216x are the basic standards for software 'quality' assurance programs.
Notice that even the notion of "software reliability" does not rate a single military standard, handbook, or even regulation. ;)

Subjectively speaking, software sucks... the life out of audio hardware and music. And hoping the next (umpteenth) update will finally make the software good enough is like watching the same movie over and over, hoping the ending won't bring tears to your eyes.
 
Great summary... I think the thread discussion is about "Reliability Engineering", not 'science' per se!
Agree. I just used science in the title because of ASR. Useful, readable, short, and entertaining was my objective.
Notice that even the notion of "software reliability" does not rate a single military standard, handbook, or even regulation. ;)
Many friends and colleagues worked on the Ada programming language, a spectacular program failure.

We do have aviation and medical software standards; even so, those fields have had spectacular life-safety failures. But I shudder to think what would happen in them without those standards.

Thanks for reading it!
 
Well done presentation on quality. Decades ago, working for hp, there were component and hardware quality control standards that maintained hp's renowned long instrument life without problems. But times have changed, and there is no need to build equipment that will last 50 years, because of fast technology changes. My 40-year-old oscilloscopes, voltmeters, and signal generators still work like they did on day one. Even in those days, more and more was software controlled. Programming software reliably is hard work and needs clever architectural design. Complex, big software programs with dependencies on software libraries can be a nightmare. Myself, I prefer audio hardware without a microprocessor. But modern DACs have software in them, so I live with it.
 
3. When you get the equipment, run it for one week straight (168 hours) at 110% of the supply voltage and 140 °F. Most people are not set up at home to also test the equipment for a few days at high pressure and humidity, or on a shake table.
Well, that is certainly a rigorous accelerated stress test. In reality, the specifications of audio equipment are not necessarily that high (beyond them you would be deliberately damaging it).

4. If it fails, return it for your money back, then buy a different manufacturer's product.
I would return it even if it does not fail (money-back guarantee), then get another one (of the same model).
 
Hey there, I'd like to tack on my personal experience. I've worked as a reliability engineer (RE) and later formed and ran reliability engineering teams. I did this in operations, improving existing equipment, not in the design phase where you have large influence over the design.

There are databases with collected failure data from all sorts of industries. This is used, combined with in-house testing and (for example) a Weibull analysis, to design for reliability.

However, if you look at electronic equipment as a whole, the data shows mostly random failure rates. An amplifier without obvious flaws, such as running out of spec, is almost as likely to die in the 1st year of its life as in its 40th. (Graph E below)

That is opposed to mechanical failure rates, which for example show a 'bathtub' curve (Graph A): a lot of parts failing in the beginning and at some end-of-life point, but not in between.
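A quick sketch of why a constant ('random') rate behaves that way: under an exponential lifetime model, the chance of failing in the next year is the same for a new unit and a 40-year-old one. The rate below is invented:

```python
# Memorylessness of the exponential distribution, with an invented failure rate.
import math

RATE = 0.01  # assumed failures per year

def p_fail_next_year(age_years):
    """P(fail before age+1 | survived to age) for an exponential lifetime."""
    survive_to_age = math.exp(-RATE * age_years)
    survive_one_more = math.exp(-RATE * (age_years + 1))
    return (survive_to_age - survive_one_more) / survive_to_age

print(p_fail_next_year(0))   # brand-new unit
print(p_fail_next_year(40))  # 40-year-old unit; prints the identical probability
```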

[Attached image: failure-rate curve shapes (Graphs A-F) from reliability studies]



The sheer number of parts on a PCB makes failures numerically more likely to look random, and some parts simply don't exhibit time- or use-based wear.

Some parts, like capacitors, are likely to leak after some time, so it's not black and white.

But from our analysis, electrical components fit that description of being unpredictable very well, unless parts were misused (mounted outside in a non-waterproof housing, stresses on cables, etc.).

Edit: ignore the percentages in the image; it's just about the shapes of the failure rate graphs. The percentages are how many of the parts researched in those studies fit each graph. They differ from study to study because one focused on the Navy (lots of salt corrosion) while another focused on production, etc.
 
Working at hp in the late 1970s in the production of desktop computers, we tried to measure early failures, especially of integrated TTL circuits on the PC boards. Using Weibull and Deming methods, there was no real clue that helped; indeed, the failures were random. And in my experience, my measurement gear (hp, Keithley, and other good brands) and audio gear seem to fail randomly. Of course, as you stated, some capacitors have limited life but can last very long. There are almost no failures of metal film resistors when not overloaded. Carbon composite resistors are a nightmare, though they are not used much anymore. Carbon film resistors last pretty well, as my ancient Rohde & Schwarz RF generator from around 1960 shows.
 
I briefly worked at a company called Software Quality Engineering, basically a clearinghouse for software testing consultants.

The big product was conventions.
 
However, if you look at electronic equipment as a whole, the data shows mostly random failure rates. An amplifier without obvious flaws, such as running out of spec, is almost as likely to die in the 1st year of its life as in its 40th. (Graph E below)
I did this in operations, improving existing equipment, not in the design phase where you have large influence over the design.
The sheer number of parts on a PCB makes failures numerically more likely to look random, and some parts simply don't exhibit time- or use-based wear.
...we tried to measure early failures, especially of integrated TTL circuits on the PC boards.
I am getting a feeling from the statements above that they suggest a certain impossibility of determining the MTBF of electronics hardware, and nothing could be further from the truth.
Reliability, quality assurance, and serviceability are all technical engineering disciplines and aren't black sciences. To many they may look elusive, though they are achievable.
During World War II, the V-1 missile team, led by Dr. Wernher von Braun, developed what was probably the first reliability model. The model was based on a theory advanced by Erich Pieruschka that if the probability of survival of an element is 1/x, then the probability that a set of n identical elements will survive is (1/x)^n. The formula derived from this theory is sometimes called Lusser's law (Robert Lusser is considered a pioneer of reliability) but is more frequently known as the formula for the reliability of a series system: Rs = R1 x R2 x ... x Rn.
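The series formula is easy to play with. A minimal sketch (part count and per-part reliability invented) of why systems with many parts need extremely reliable parts:

```python
# Lusser's series-system formula: Rs = R1 * R2 * ... * Rn.
# The part count and per-part reliability below are invented for illustration.
from math import prod

def series_reliability(reliabilities):
    """System reliability when any single part failure fails the system."""
    return prod(reliabilities)

# 500 identical parts, each 99.9% reliable over the mission:
print(f"Rs = {series_reliability([0.999] * 500):.3f}")  # ~0.61
```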
Those were the old war days, but after WWII the military learned the hard way that operational readiness/reliability needed standardization and must never be handled as an afterthought. This necessity for standardization was applied to all Defense Department procurements. Fortified, additional standards were later mandated for NASA programs. Not a single one of NASA's Shuttles would have been launched without years of exhaustive MTBF calculations in which every piece-part was accounted for. No black sciences involved here either.
MIL-HDBK-217 provides a mechanism for calculating MTBF based on a variety of factors that include solder joint counts, IC complexity, and passive component counts of various types. A thorough calculation will also take into account the stress levels on each component (e.g., percentage of rated power, ambient conditions) and is referred to as a stress-based calculation. Calculations include numbers for mechanical components such as fans or connectors. There is also a simpler method of calculation based on parts counts for each component type, which generally produces much higher failure rates; this is contained in Appendix A of MIL-HDBK-217F Notice 2.
Relatively milder reliability and quality requirements also became necessary for commercial aircraft and vehicles. Consumer hardware and electronics manufacturers were not immune either, if they wanted to compete for consumer dollars.
Then some weird inversion of the cart and the ox happened: in the past 2 decades or so, COTS (commercial off-the-shelf) equipment has been driving defense acquisitions.

I did not mean this to be so blabbery, but if you really like this topic, check out all 1000+ pages of MIL-HDBK-338B; or if you dig statistics and want the TL;DR, search for "Bayesian Statistics in Reliability Analysis".
 
I did not mean this to be so blabbery, but if you really like this topic, check out all 1000+ pages of MIL-HDBK-338B; or if you dig statistics and want the TL;DR, search for "Bayesian Statistics in Reliability Analysis".
For pictures that are easier on the eyes and related to our interests:
NASA Workmanship Standards

"Workmanship is defined as the control of design features, materials and assembly processes to achieve the desired durability and reliability for subassembly interconnections, specifically those in printed wiring assemblies and cable harnesses, and the use of inspection techniques and criteria to assure interconnect quality." (nepp.nasa.gov)
 
Our joke for telephone central office software, NEC Switching Systems Division, 1980s:

"Software is never finished.

You just have to tell the customer it is."

[Attached photo]


An actor pretending to push buttons on the Master CPU of a multi-processor switch that could handle 100,000 phones when maxed out.
 
pseudoid said:
I am getting a feeling from the statements above that they suggest a certain impossibility of determining the MTBF of electronics hardware, and nothing could be further from the truth.
Reliability, quality assurance, and serviceability are all technical engineering disciplines and aren't black sciences. To many they may look elusive, though they are achievable.

comment:
Of course we knew MTBF and failure mechanisms, as well as the fact that the more components a PC board had, the higher the probability of failure. But if the failure rate is low and affects only a few components of different types, an MTBF calculation can be done, though I don't think it will be truly predictive.
 
MTBF is the arithmetic mean of failure times, so not meant to be more than an estimate anyway.
My understanding is that it is a relative, statistical method of determining the predictive reliability of a system based on the predictive 'mortality' of its subsystems and components.
R&D engineers (in defense, space, airline, vehicular, and medical electronics) used to think of reliability engineering (RE) as a nuisance, just another little nit that hinders programs' designs/schedules/costs/etc., until it was proven otherwise and RE efforts were required to be interleaved within designs, proposals, and budgets.
Sony had a 9-volume set of quality design criteria based on such predictive MTBF of each component.
The contention used to be "You could pay me now, or you can pay me later!" << in this case "me" being RE.;)
 
R&D engineers (in defense, space, airline, vehicular, and medical electronics) used to think of reliability engineering (RE) as a nuisance
I was one of those EEs who realized that REs actually made our jobs easier and we could offload some of our work on them.:cool:
 
I was one of those EEs who realized that REs actually made our jobs easier and we could offload some of our work on them.:cool:
My project out of school was killed over quality. In the process, I drew on the expertise of our reliability analysis lab's new senior director, hired from a government lab. For my project, a network of advanced contract labs was used. That became fuel to build up the internal lab and bring those tools in house.

Subsequently I worked further in reliability and met Deming. I built a global manufacturing reliability database that allowed every released product to be traced back to every reliability input measurement. The REs were partners, with trust in both directions. No one wants mass production of defects.

I'm in the reliability and quality religion. I bring it to every project.
 
The MTBF of males here is 78 years and of females 84 years, statistically.
That's the human aspect.

As for MTBF in technical gear, my personal experience matches Graph D: if the DUT survives the first 4-8 weeks, it will work until it's no longer needed.
 
I am getting a feeling from the statements above that they suggest a certain impossibility of determining the MTBF of electronics hardware, and nothing could be further from the truth.
Reliability, quality assurance, and serviceability are all technical engineering disciplines and aren't black sciences. To many they may look elusive, though they are achievable.
...

I don't disagree with your post. I will clarify, however, that 99.9% of all electrical equipment in the world is not made to NASA spec or even MIL spec. In the industrial world (chemical plants, factories, trains, elevators), electronics do behave more or less randomly. If that is unacceptable, it gets improved with fail-safes and redundancies rather than with NASA-spec design, manufacturing, and testing.

Bringing it back to the topic of audio equipment, I assume no manufacturer comes close to either of these standards, but I can't say. Edit: I missed your remark on Sony.

Edit: a small critique of MTBF:
In operations, MTBF is just a mean, which is IMO good for management summaries on reliability and general monitoring of asset function. But if you want your Space Shuttle not to fail, or to understand the behavior of your fleet of trains, distributions are where it's at. I never liked MTBF because a couple of 'hero' parts can mask reliability issues.
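A tiny illustration of that 'hero' problem, with invented lifetimes: two fleets with identical MTBF but very different behavior:

```python
# Two invented fleets with equal MTBF; the median exposes the difference.
from statistics import mean, median

fleet_a = [9, 10, 10, 11, 10, 10]  # consistent ~10-year lifetimes
fleet_b = [1, 1, 2, 1, 1, 54]      # chronic early failures plus one 'hero'

print(mean(fleet_a), median(fleet_a))  # 10.0 and 10.0
print(mean(fleet_b), median(fleet_b))  # 10.0 and 1.0 -> same MTBF, masked problem
```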
 