G'day everyone. So thankful to Amir and the staff and everyone here for the value in the great info on this forum. I like the humour too. The humour also comes out on Amir's YouTube channel, which has that same awesome standard of valuable but also enjoyable content that I found here. So I'm new, and in trying to give back to you all and contribute, maybe the yarn I have to spin here might be useful or interesting to some.
TL;DR: No time for a yarn? Skip past the summary bullet points and go straight to the Conclusion.
Hope it helps, thanks folks
A Yarn on Digital Risks With Audio
There are many reasons why digital audio, like music or duplex voice communications or many other types of audio, might be important. Important to a family because they have late grandma's song, a song she sang that now lives on, captured in an encoded digital bitstream. Important to a sovereign nation state protecting operations with their strategic ballistic nuclear submarine fleet. Important to the young man full of hopes and dreams that he will be able to build a recording business and move out of his mum's home. There are risks. More than once, I've been deeply saddened when it's gone wrong and good people are looking for help with dealing with the consequences of those risks being realised. We're talking losing money, losing human lives, losing property. Sometimes painfully so, at tremendous consequence.
Let's chat about Mr Murphy and Murphy's law. It's an air force thing so it's a bit painful to give it recognition hahaha. The might of the navy may navigate by the stars, a grand army might sleep under the stars, and here we have the soft ass Air Force rating their hotels by stars.
All's fair in love and war so yeah - what can happen is probably gunna happen at some point. Best to have a plan. Best to treat the risks up front and avoid the consequences of those risks being realised.
One of those digital risks is about having integrity in data processing architectures, as well as having robust data storage architectures. Let's say there is a sensitive goth emo type in a family, who has been entrusted to look after late grandma's audio, has been stressed in the past with all this, and being a bit of a precious snowflake is passively combative on treating risks. "How dare you Bathrone! Grandma's song, I'll have you know, is totally ok. We've got grandfather-father-son method backups, including being distributed across multiple sites, plus like we have a multi disk storage array from this famous company who are a bunch of experts and cost a fair bit at the store, that comes with inherent redundancy through RAID there too. If I hear another word from anyone in the family on this, and something happens to Grandma's song, well they can bloody well take my left nut!"
Placing bets on the left nut is an Aussie tradition. We're all descended from convicts, so don't ask haha. Importantly however, if any of you ever come to Australia, never ever never get it wrong and say the right testicle. For you see, that's what Hitler had, and we're the good guys. It's the equivalent of directing artillery fire, having audio problems that need clarification, and instead of saying "say again" on the radio, asking by mistake to "repeat", and getting a drop short horror show of allied limbs and guts flying everywhere. So, it's say again not repeat, and it's left nut.
So. Right. Mr Grasshopper thinks he's sweet with grandma's stuff cos he's all high tech and all hi fi and stuff. In helping him not be a turkey to his late grandmother, we do the right thing and help out. Here I'll spare you the bizarre nuances of Australian culture between what is a grasshopper and what the definition of a turkey is. For it is enshrined biblically in a movie we made called Mad Max, and apart from watching that film, the nebulous vagaries of Aussies can only be understood in person over many beers if any of you visit. Countless beers are required to grasp the subtlety of, for example, why calling yourself or your mates a phallus head for doing something useful or good is a badge of honour, and how the subtle variance of being a sphincter head needs careful application in select circumstances only.
Digital signals that are subject to processing can and do go wrong for many reasons. Things like clocks, like buffers, like even errors in the circuit logic itself causing processing errors that have nothing to do with problems in clock timing or in buffers. How can a bunch of experts who are market leaders globally in, for example, CPUs have processing errors? Can that be right? Most certainly it is, and the case in point is the famous Intel Pentium bug: https://en.wikipedia.org/wiki/Pentium_FDIV_bug
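That bug had people worried globally in so many areas that lean on floating point maths. Medicine, Civil Engineering, Vehicle/Aeronautical/Space engineering, Defence/Intelligence and many many others. Intel ended up offering to replace the affected chips and took a charge of around $475 million for it.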
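For the curious, here is a minimal sketch of the classic sanity check people passed around at the time: divide two particular numbers, multiply back, and see what is left over. The two constants are the commonly quoted test values, and the variable names are just mine for illustration. On any sound FPU the leftover is zero or a tiny rounding error.

```python
# Classic FDIV sanity check: divide, multiply back, look at the leftover.
x = 4195835.0
y = 3145727.0

residual = x - (x / y) * y

# A correct FPU leaves zero or a tiny rounding error here.
# The flawed Pentiums were widely reported to leave 256 instead.
fpu_looks_ok = abs(residual) < 1.0

print("residual:", residual)
print("FPU looks ok:", fpu_looks_ok)
```

The point isn't this one check, it's that hardware from the experts can be wrong in ways you only see if you go looking.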
What's necessary is to understand the entire architecture, end to end, of how digital signal streams are processed, to identify risk and either treat those risks effectively, or worst case assume the risks and have plans for handling the consequences of them. It takes a deep technical understanding of the architecture being analysed. There are cases of defensive techniques put in place within the design that need to be assessed as to effectiveness. A common one is defending by using Error Correction Code (ECC) within the architecture. Another is the amount of buffering used in the stream, sized within the sum context of clock jitter, temperature differences, maximums in latency, and different RF noise/inductance/capacitance in different signal traces when pushing higher bus frequencies etc. Basically, understanding what could possibly go wrong and defending against it.
Another part of this isn't about digital stream processing, but about the architecture of digital storage. The same method can be applied. It's helpful to take these distinct aspects and analyse each one, rather than trying to chew the elephant across the enormity of the whole DAW, or whole storage array, or whole backup architecture. Doing the whole elephant is prone to missing important details in the analysis.
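To make the ECC idea a bit less abstract, here's a toy sketch of a Hamming(7,4) code: 4 data bits get 3 check bits, and any single flipped bit in the 7 can be located and put right. This is only the principle. Real ECC memory typically protects much wider words with more check bits, and the function names below are my own for illustration, not any vendor's implementation.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4 (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code_bits):
    """Return (data_bits, position_fixed); position_fixed is 0 when nothing was wrong."""
    c = list(code_bits)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # equals the 1-based position of a single bad bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
stored = hamming74_encode(data)

# Simulate a single bit flip at position 5 (index 4) while the word sits in memory.
damaged = list(stored)
damaged[4] ^= 1

recovered, fixed_position = hamming74_decode(damaged)
print("recovered:", recovered, "fixed position:", fixed_position)
```

Note this toy only handles one flipped bit per word. Two flips in the same word would be "corrected" into the wrong answer, which is why assessing the effectiveness of the defence matters as much as having one.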
What if Mr Grasshopper responds with "Well I just buy from expert companies, I buy hardware that's used globally and is proven and known, therefore everyone knows the real story and you know nothing, so on your bike mate".
I'd be like "LOL yeah mate", which is yet another Aussie nuance, and if through the laughter I actually cared and wanted to help the bloke not unknowingly mess up grandma's precious stuff, I'd say, well one, thinking the experts have it covered cos they're experts is an appeal to authority, which is a fallacy. And two, thinking along the lines of global sale, global use and the notion of "everyone" is an argumentum ad populum, meaning an appeal to the idea that the number of people who believe something alone makes it true, which is another obvious fallacy in thought. Thirdly, it's kinda a strawman fallacy tactic to suggest that significant parts of the global population all "somehow know and agree" that there is no possibility of processing or storage errors in those architectures commonly found for sale in stores, things like laptops, desktops, small NAS storage etc, as are common in say DAW setups. Strawman is about falsely portraying a statement by misrepresenting or weakening someone's position in an illogical attempt to "refute" it. Fourth, I'd say even the most casual observation is going to conclude that an argument based on "you know nothing" or you're "dumb" and concluding "therefore I'm right" is another fallacy: the ad hominem fallacy.
Let's say all hope in humanity isn't lost on good ol Mr Grasshopper and he's able to demonstrate due rationalism, cos after all he's a convict like the rest of us bastards down under. So on reflection he says "Well struth mate! What ya saying?! Christ"
I'm saying, even if you're not a designer engineering a system, not a commander running operations in a strategic nuclear ballistic submarine, not a test and evaluation laboratory expert being paid to identify and fix a megabuck critical problem, let's say you simply want to do cool stuff and be awesome in use cases like buying audio communication equipment for your part time fireman crew, like buying storage for granny, like buying a backup system for granny, like living your dreams and hopes of making money in a home recording studio to get out of mum's house into your own place. With those use cases, I'd say this: know what you're buying and buy appropriately to your needs with a full understanding of the risks, then consider costs/features/performance etc, cos Mr Murphy is painful.
"Ummm yeah cool but that's as nebulous as the subtleties of Aussie lingo". Yep. I'm sympathetic. The problem is the world is a messy and complex place. The field of digital processing and storage is very much like that. People in the industry like to build models and build architectures to more rapidly do cool things, more rapidly focus on delivering solutions they care about, and I think over time the industry has made progress on using techniques like abstraction layers for example. Windows uses a hardware abstraction layer, and many other layers in doing cool DAW stuff.
"Awww cmon mate me left nut is bloody on the line! mmmmaaaaatttteeeeee it's granny's song!!" This harmony of the influentially intoxicating "mmmmaaaaatttteeeeee" is a super power all Aussies have. It is a Siren Call, from an ancient power source, that elicits sympathy and empathy from all true convicts within earshot. It compels them to render immediate assistance to ol mate. Now, ol mate mind you can refer to a complete and utter stranger, who did nothing more than demonstrate the artistry of sounding the mate Siren call.
Here's a specific one to help out ol mate. I'm yet to see anyone properly look at globally used common hardware in, I dunno, I'll randomly pick two use cases:
- DAW on laptops, desktop, tablet etc
- Online (not in a cloud sense but online as in allocated LUNs available with formatted space) storage using redundant multiple disks (RAID) on say a consumer NAS etc
So you're busy ticking boxes in this approach of looking at each small element of the architecture one at a time, even if you're simply trying to buy something, not design something. It's looking pretty cool for an Intel based laptop/desktop/tablet DAW and an Intel based NAS running RAID in software on a mini OS using an Intel processor. Then you realise that many of Intel's consumer grade platforms do not have ECC implemented, even at the most basic foundation level, on the random access memory system. Dammit. Even the silly little BIOS code for POST has checksums and measures for defending against corruption. Dammit, what are the consequences of not having any such mechanism on RAM, for both DAW and NAS use cases?
Now Mr Grasshopper has progressed into ol mate, and the noodle has very much turned for him. His realisation, which never seems to be mentioned on YouTube or the web, is gunna protect granny's treasures because he knows a risk, and is able to assess that risk and explore treatments.
Summary of Specific Risk
NOTE: I'm using the absence of ECC RAM architectures in Intel consumer grade processors as just an example to demonstrate. There is no commercial or emotional bent behind it, just the facts of the situation.
* In memory chips, just like many other parts of digital processing and digital storage, things go wrong
* Things going wrong isn't necessarily of significance, and effective systems have robust treatments to deal with failure modes without compromising system integrity or indeed output in some cases
* One type of failure mode is the "bit flip": a memory cell changing its stored bit state. In binary it can only be one of two states, hence it flips
* This error mode has been studied, and our model is reasonably solid on how often the failure mode will occur, what factors influence its rate, and what treatments exist to defend against it, from very basic ones to more complex ones with multiple levels of strategy that in many circumstances correct the error without upsetting output integrity
* Modern lithography is under intense and continued development effort to shrink circuit details such as gate lengths, both to improve performance and to reduce the physical size needed. Thus the smaller modern process node means the rate of failure tends to be significantly higher than in legacy process nodes with larger gate lengths
* Compounding that, because memory chips are so simple in comparison to the design of other types of chips, they are almost always the first type of chip to be moved to a new process node once we have a viable one with yields good enough to warrant shifting. Memory chips are so simple and cheap to make that even if it means more duds on a new process node until we optimise it, that's cool, and once the process node is more mature we can then do the harder types of chips
* Compounding again, in an ever ongoing pursuit of awesome we're trying to engineer architectures running on lower wattage, with more performance for every watt, less waste heat and less power consumption. So we drive the chips with lower and lower voltages, and the smaller gate lengths in the smaller process nodes mean we have to be super careful around currents, with things like leakage being lost outside the circuit into the substrate
* Compounding again, our pursuit means we keep defining new standards in things like signalling frequency, in the protocols used and so on. We keep switching the memory chip state at an ever faster rate
* It is intuitive that the more memory cells you have within each chip, and the more chips on each memory module, along with all the other compounding factors above, the more bit flips you can expect. The rate for a bank of 8 modules each of 16GB in size is very different from that of a CPU cache of say 1MB, or the 256MB of cache within a HDD
* There's a pretty good job done in a general sense on things like verification and validation, test and evaluation, having reliable science in things like specifications, boundaries to keep operational states within and so on. Where it becomes a nuisance is that the environment causes problems through things like environmental radiation sources. In most applications, keeping man made radiation sources shielded from these chips where it must be done isn't super tricky, cos those use cases are very narrow and have massive budgets to play with in treating it. However, the natural sources of radiation are a pretty big hassle to deal with because they are everywhere and ongoing for every memory chip. As much as the sun helps us live, it's annoying when it emits solar bursts, for example, that mess things up. And it gets worse when the chip isn't at ground level but at flight levels in aeronautics, or even worse in space, where being outside the atmosphere and exposed to the radiation in space is a total nightmare
* There are specific standards some industries have adopted for specific uses, like in space and to a lesser extent aeronautics. At a practical level it means anything going into space is a pain in the butt to be involved with, because it's gunna be hot, large, power hungry and laughably slow, because anything reliable up there is ancient in every way
* Some natural causes are especially a pain in the butt because the radiation can go through buildings at ground level and is damn well pesky in its determination to mess up your wonderful digital stuff. Even if you're sitting in a basement nice and quiet for your DAW recording session, or you think it's safer to store your NAS in the basement rather than on floor 30 where your apartment is, it's a pain because the rate of environmental problems is so uncontrolled - the sun could behave for say five days and then it has an aggro fit and does a coronal mass ejection, blowing away our nice little models on rates of occurrence and risk. It's kinda like the weather in our atmosphere - variable
* It's also intuitive to appreciate that these memory architectures are only susceptible during operation. Memory chips are not architectures for permanent storage, apart from some specialised types which are kinda permanent. If we accept that generally it's when the device is on that the failure mode occurs, that's fair
* Bit flips left unmanaged for data that matters, like Granny's singing, are especially nasty, cos if you fail to ensure every aspect of your system treats these risks, what happens is the dreaded "cancer rot of corruption" starts getting into not just the RAM, but like a nasty dreaded cancer it then gets into the online storage in your NAS LUN for example, then it gets into your son backup, then your father, then your grandfather backup, at which point, if you're super unlucky and it just happens to be granny's song, well, you've got a corruption problem ol mate. A nasty one, which cannot be easily remediated from either RAID NAS or backups even at three levels
* It remains entirely possible for sensible people to do things like DAW and NAS for granny safely, even if that person does so every minute of every hour of every day for years continuously. It's done for mission critical tasks, for human critical tasks
* There is a "cost" in defending against bit flips. It is bounded by two things: one, performance, and two, cost. The cost is very very minor and I doubt anyone sane would try to suggest that, given the risk, the cost is a problem. The performance impact does exist. However, multiple solutions are present. One, the latency impact and so forth is stuff all, and in many circumstances processing tasks on modern x64 CPU and memory architectures are not typically memory constrained. In the rare use cases where it really matters, faster RAM certified to lower latency, plus the ability within the architecture to play around with memory timings, means a given architecture can be memory tuned to not just remove the cost of ECC, but actually make the ECC solution with higher spec memory faster than the standard gig with no ECC anyway. The other thing is, when you're free of x64 and into say ASIC or FPGA or GPGPU or whatever, there are many different memory types, and many of them are simply ridiculous in terms of low latency, bandwidth per clock and the clock rate they operate at. For all practical measures, given the risk, it's exceptionally dumb for a company selling devices where the data being used on the device is important to not have ECC on RAM, especially when, for example in the case of x64 architectures, ECC is already used elsewhere, like in the CPU, HDD, GPU etc
* With Intel for example, bit flip risk depends on your purchase choice, as the different architectures you get when you buy a given Intel chip do not all universally have even the most basic level of ECC defence. This is epically, epically bad. I'm sweet with my microwave having checksums in its firmware, and if it bit flips out, I'll unplug it and repower it, I'm good. How a company can sell a NAS device on shelves that operates 24/7 for important data use cases, commonly featuring capabilities like RAID to make it look robust, and with a straight face sell it without ECC RAM while presenting it as fit for purpose, blows my engineering propeller head. Where for me it crosses into being culpable is when the company has another model on the same shelf for sale at 5x the price, and this one says it too has an Intel CPU and memory architecture in the device, but this time it has ECC RAM, mentioned in six point font on page 70 of the user manual. Well, words fail me
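Here's a rough sketch of the "cancer rot" point from the bullets above: one flipped bit in a 16-bit audio sample, then that damaged copy faithfully carried into son, father and grandfather backups. The sample values are made up, and the idea of recording a SHA-256 digest while the data was known good is my own assumption for showing the damage, not something a typical NAS or backup tool is claimed to do for you.

```python
import hashlib
import struct


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted, like a bit flip in RAM."""
    buf = bytearray(data)
    buf[byte_index] ^= 1 << bit
    return bytes(buf)


# Four made-up 16-bit little-endian PCM samples standing in for granny's song.
samples = (1000, -2000, 3000, -4000)
original = struct.pack("<4h", *samples)
good_digest = hashlib.sha256(original).hexdigest()  # recorded while known good

# One bit flips in the high byte of the third sample before it is written out.
corrupted = flip_bit(original, byte_index=5, bit=6)
decoded = struct.unpack("<4h", corrupted)

# The backups copy whatever they are given, perfectly, including the damage.
son = bytes(corrupted)
father = bytes(son)
grandfather = bytes(father)
backups_match_good = [
    hashlib.sha256(copy).hexdigest() == good_digest
    for copy in (son, father, grandfather)
]

print("samples after flip:", decoded)
print("backups match known good digest:", backups_match_good)
```

One bit turned a sample of 3000 into 19384, which is a loud click in the audio, and every backup generation is a perfect copy of the damage. RAID and backups did their job exactly as designed. They just protected the wrong bits.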
Conclusion
* If you're doing cool stuff with audio processing (playing, recording, editing etc) under use cases where the audio data being processed is important, I recommend choosing platforms that have effective defence against bit flips, because no other method, including putting it in a building basement, will prevent natural environmental radiation sources messing up your day. Everything about modern memory cells in our chips makes them weaker to bit flips with every new memory generation, and we're using more and more RAM all the time too. The natural world changes a lot, like how the weather in our atmosphere changes, so some days of doing cool stuff with audio will be far worse than others. And avoid at all costs doing cool stuff with audio at high altitude in planes. Not being on the ground means a whole new approach.
* In a similar context, you are exposed to much higher risk not with processing but with storage. Storage tends to be on 24/7. If you're really budget focused, try to buy second hand gear for storage from servers and the like; don't buy a new low end NAS without ECC. If you're faced with having to beg/borrow/steal a non-ECC portable array drive or NAS etc, turn it on, put your storage on it, turn it off, and assume the risk.
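To put some rough numbers on why "turn it on, put your storage on it, turn it off" helps, here's a back of envelope sketch. It assumes, purely for illustration, that exposure to bit flips scales linearly with how much RAM there is and how many hours it is powered on. No real failure rate is used, only ratios, and the one hour a week figure is a made-up example.

```python
def relative_exposure(capacity_mb: float, powered_hours_per_week: float) -> float:
    """Exposure in MB-hours per week, assuming flips scale with RAM size and time on."""
    return capacity_mb * powered_hours_per_week


HOURS_PER_WEEK = 24 * 7

# Same box, always on versus powered up one hour a week to take a copy.
always_on = relative_exposure(capacity_mb=1, powered_hours_per_week=HOURS_PER_WEEK)
one_hour = relative_exposure(capacity_mb=1, powered_hours_per_week=1)
time_ratio = always_on / one_hour

# The summary's comparison: 8 modules of 16GB against a 1MB CPU cache, same hours.
ram_bank_mb = 8 * 16 * 1024
size_ratio = relative_exposure(ram_bank_mb, HOURS_PER_WEEK) / relative_exposure(1, HOURS_PER_WEEK)

print("always on vs one hour a week:", time_ratio)
print("8 x 16GB bank vs 1MB cache:", size_ratio)
```

Under those assumptions the always on box carries 168 times the exposure of the one you only power up for an hour a week, and the big RAM bank carries 131072 times the exposure of the little cache. The real world rate moves around like the weather, but the ratios show which way to lean.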
TL\DR: No time for a yarn, just skip down to not the summary bullet points but Conclusion
Hope it helps, thanks folks
A Yarn on Digital Risks With Audio
There is many reasons why digital audio like music or duplex voice communications or many other types of audio might be important. Important to a family because they have late grandmas song, a song she sung and now lasts ongoing captured in an encoded digital bitstream. Important to a sovereign nation state protecting operations with their strategic ballistic nuclear submarine fleet. Important to the young man full of hopes and dreams that he will be able to build a recording business and move out of his moms home. There are risks. More than once, I've been deeply saddened when its gone wrong and good people are looking for help with dealing with the consequences of those risks being realised. Were talking losing money, loosing human lives, loosing property. Sometimes, painfully so at tremendous consequence.
Let's chat about Mr Murphy and the laws of murphy. It's an air force thing so it's a bit painful to give it recognition hahaha. The might of the navy may navigate by the stars, a grand army might sleep under the stars, and here we have the soft ass Air Force rating their hotels by stars
One of those digital risks is about having integrity in data processing architectures, as well as having robust data storage architectures. Let's say there is a sensitive goth emo type in a family, who has been entrusted to look after late grandmas audio, and has been stressed in the past with all this and being a bit of a precious snowflake is passively combative on treating risks. "How dare you Bathrone! Grandmas song Ill have you know is totally ok. We've got grandfather father son method backups, including being distributed across multiple sites, plus like we have a multi disk storage array from this famous company who are a bunch of experts and cost a fair bit at the store, that comes with inherent redundancy through RAID there too. If I hear another word from anyone in the family on this, and something happens to Grandma's song, well they can bloody well take my left nut!"
Placing bets on the left nut is an Aussie tradition. Were all descended from convicts, so don't ask haha. Importantly however, if any of you ever come to Australia, never ever never get it wrong and say the right testicle. For you see, that's what Hitler had, and were the good guys. It's the equivalent of directing artillery fire and in having audio problems needing clarification, failing to indicate say again on the radio, by mistake asking to repeat, and having a drop short horror show of allied limbs and guts flying everywhere. So, its say again not repeat, and its left nut.
So. Right. Mr grasshopper thinks he's sweet with grandams stuff cos he's all high tech and all hi fi and stuff. In helping him not be a turkey to his late grandmother, we do the right thing and help out. Here I'll spare you the bizarre nuances of Australian culture between what is a grasshopper and what the definition of a turkey is. For it is enshrined biblically in a movie we made called Mad Max and apart from watching that film, the nebulous vagaries of Aussies can only be understood in person over many beers if any of you visit. Countless beers are required to grasp the subtlety of for example why calling yourself or your mates a phallus head for doing something useful or good is a badge of honour, and how the subtle variance of being a sphincter head needs careful application in select circumstances only.
Digital signals that are subject to processing can and do go wrong for many reasons. Things like clocks, like buffers, like even errors in the circuit logic itself causing processing errors beyond anything to do with problems in clock timing or in buffers. How can a bunch of experts who are market leaders globally in for example CPU's have processing errors? Can that be right? Most certainly it is, case in point reference is the famous Intel Pentium bug: https://en.wikipedia.org/wiki/Pentium_FDIV_bug
That bug caused massive consequences globally in so many areas. Medicine, Civil Engineering, Vehicle/Aeronautical/Space engineering, Defence/Intelligence and many many others.
What's necessary, is to understand the entire architecture end to end of how digital signal streams are processed, to identify risk and either treat those risks effectively, or worse case assume the risks and have plans for handling the consequences of them. It takes a deep technical understanding of the architecture being analysed. There is cases of defensive techniques put in place within the design, that need to be assessed as to effectiveness. A common one is defending by using Error Correction Code (ECC) within the architecture. The amount of buffering used in the stream within the sum context of clock jitter, temperature differences, maximums in latency, different RF noise/inductance/capacitance in different signal traces when pushing higher bus frequencies etcetc. Basically, understanding what could possibly go wrong and defending against it.
Another part of this, isn't about digital stream processing, but in the architecture of digital storage. The same method can be applied. It's helpful to take these distinct aspects and analyse those, rather than trying to chew the elephant across the enormity of the whole DAW, or whole storage array or whole backup architecture. Doing the whole elephant is prone to missing important details in the analysis.
What if Mr Grasshopper responds with "Well I just buy from expert companies, I buy hardware used globally and is proven and known, therefore everyone knows the real story and you know nothing so on your bike mate"
I'd be like "LOL yeah mate" which is yet another Aussie nuance, and if in laughter I actually cared and wanted to help the bloke not unknowing mess up grandma's precious stuff, I'd say, well one, things like thinking the experts have it covered cos their experts alone is an appeal to authority which is a fallacy. And two, is thinking along the lines of global sale, global use and the notion of "everyone" is an argumentum ad populum; meaning an appeal to the number of people who believe` something alone makes it therefore true which is another obvious fallacy in thought. Thirdly, it's kinda strawman fallacy tactic, to suggest that significant parts of the global population all "somehow know and agree" that there is no possibility of processing or storage errors in those architectures commonly found for sale in stores with things like laptops, desktops, small NAS storage etcetc as is common architectures in say DAW setups. Strawman is about trying to falsely portray a statement by misrepresenting or weakening someone's position in an illogical attempt to "refute" it. Fourth, I'd say even the most casual observation is going to conclude that an argument based on "you know nothing" or your "dumb" and concluding "therefore I'm right" is another fallacy; the ad hominem fallacy.
Let's say all hope in humanity isnt lost on good ol Mr Grasshopper and he's able to demonstrate due rationalism, cos aftercall he's a convict like the rest of us bastards down under. So on reflection he says "Well struth mate! What ya saying?! Christ"
I'm saying, even if your not a designer engineering a system, not a commander running operations in a strategic nuclear ballistic submarine, not a test and evaluation laboratory expert being paid to identify and fix a megabuck critical problem, let's say you simply want to do cool stuff and be awesome in use cases like buy audio communication equipment for your part time fireman crew, like buy storage for granny, like buy a backup system for granny, like live your dreams and hopes of making money in a home studio recording to get out of mums house into your own place. With those use cases, I'd say this, know what your buying and buy appropriately to your needs having full understanding of risks, then consider costs/features/performance etctec cos Mr Murphy is painful.
"Ummm yeah cool but that's as nebulous as the subtleties of Aussie lingo". Yep. I'm sympathetic. The problem is the world is a messy and complex place. The field of digital processing and storage is very much like that. People in the industry like to build models and build architectures to more rapidly do cool things, more rapidly focus on delivering solutions they care about, and I think over time the industry has made progress on using techniques like abstraction layers for example. Windows uses a hardware abstraction layer, and many other layers in doing cool DAW stuff.
"Awww cmon mate me left nut is bloody on the line! mmmmaaaaatttteeeeee its grannys song!!" This harmony of the influentially intoxicating "mmmmaaaaatttteeeeee" is a super power all Aussie have. It is a Siren Call, from an ancient power source that illicits sympathy and empathy to all true convicts within audible hearing. It compels to render immediate assistance, to ol mate. Now, ol mate mind you can refer to a complete and utter stranger, who did nothing more than demonstrate the artistry of elucidating the mate Siren call.
Here's a specific one to help out ol mate. I'm yet to see anyone properly look at globally used common hardware in I dunno, I'll randomly pick two use cases:
- DAW on laptops, desktop, tablet etc
- Online (not in a cloud sense but online as in allocated LUNs available with formatted space) storage using redundant multiple disks (RAID) on say a consumer NAS etcetc
So your busy ticking boxes in this approach of looking at each small element of the architecture at a time, even if your just simply trying to buy something not to design something. Its looking pretty cool for Intel based laptop/desktop/tablet DAW and Intel based NAS storage running RAID in software on a mini OS using an Intel processor. Then, you realise Intel do not have ECC implemented even at the most basic foundation level, on the architecture of the random access memory storage system. Dammit. Even storing the silly little BIOS code for POST has checksums and measures for defending against corruption. Dammit, what are the consequences of not having any mechanism on RAM, for both DAW and NAS use cases?
Now Mr Grasshopper, has progressed into ol mate, and the noodle has very much turned for him. His realisation, never seemed to be mentioned on youtube or the web, is gunna protect granny's treasures because he knows a risk, and is able to assess that risk and explore treatments.
Summary of Specific Risk
NOTE: I'm using the absence of ECC RAM architectures in Intel consumer grade processes as just an example to demonstrate. There is no commercial or emotional bent beyond it, just the facts of the situation
* In memory chips, just like many other parts of digital processing and digital storage, things go wrong
* Things going wrong isn't necessarily significant. Effective systems have robust treatments to deal with failure modes without compromising system integrity, or indeed output in some cases
* One type of failure mode is the "bit flip": a bit stored in a memory chip changes state. Obviously in binary it can only be one of two states, hence it flips
* This failure mode has been studied, and our model is reasonably solid for how often it will occur, what factors influence its rate, and what treatments exist to defend against it. Those range from very basic ones to more complex ones with multiple levels of strategy, which in many circumstances correct the error without upsetting output integrity
* Modern lithography is under intense and continued development effort to reduce circuit details such as gate lengths, to both improve performance and reduce the physical size needed. The catch is that a smaller modern process node means the rate of failure is significantly higher than in legacy process nodes with larger gate lengths
* Compounding that, memory chips are so simple in design compared to other types of chips that they are almost always the first to be moved to a new process node, once it's good enough in yields to warrant the shift. Memory chips are so simple and cheap to make that even if it means more duds on a new process node until we optimise it, that's cool, and once the process node is more mature we can then do the harder types of chips
* Compounding again, in an ever ongoing pursuit of awesome we're trying to engineer architectures running on lower wattage: more performance for every watt, less waste heat, less power consumption. So we drive the chips with lower and lower voltages, and the smaller gate lengths in the smaller process nodes mean we have to be super careful around things like leakage, where current is lost outside the circuit into the substrate
* Compounding again, our pursuit means we keep defining new standards in things like signalling frequency, in the protocols used and so on. We keep switching the memory chip state faster with every standard
* It is intuitive that the more memory cells you have within each chip, and the more chips on each memory module, along with all the other compounding factors above, the more the rate of bit flips goes up. So the rate is very different for a bank of 8 modules each 16GB in size, against say the 1MB of memory in a CPU cache, or the 256MB of cache within a HDD.
* There's a pretty good job done in a general sense on things like verification and validation, test and evaluation, having reliable science behind specifications, boundaries to keep operational states within and so on. Where it becomes a nuisance is the environment, with things like environmental radiation sources. In most applications, keeping man made radiation sources shielded from these chips where it must be done isn't super tricky, cos those use cases are very narrow and have massive budgets to play with in treating it. However the natural sources of radiation are a pretty big hassle to deal with, because they are everywhere and ongoing for every memory chip. As much as the sun helps us live, it's annoying when it emits solar bursts, for example, that mess things up. And it gets worse when the chip isn't at ground level but at flight levels in aeronautics, or even worse in space, where being outside the atmosphere and exposed to the radiation there is a total nightmare.
* There are specific standards some industries have adopted for specific uses, like in space and to a lesser extent aeronautics. At a practical level it means anything going into space is a pain in the butt to be involved with, because it's gunna be hot, large, power hungry and laughably slow, because anything reliable up there is ancient in every way
* Some natural sources are especially a pain in the butt because the radiation can go through buildings at ground level and is damn well pesky in its determination to mess up your wonderful digital stuff. Even if you're sitting in a basement nice and quiet for your DAW recording session, or you think it's safer to store your NAS in the basement rather than on floor 30 where your apartment is, it's still a pain because the rate of environmental problems is so uncontrolled - the sun could behave for say five days and then it has an aggro fit and does a coronal mass ejection, blowing away our nice little models on rates of occurrence and risk. It's kinda like the weather in our atmosphere - variable.
* It's also intuitive to appreciate that these memory architectures are only susceptible during operation. Memory chips are not architectures for permanent storage, apart from some specialised types which are kinda permanent. So it's fair to accept that, generally, the failure mode occurs in these devices when they're switched on
* Bit flips left unmanaged for data that matters, like Granny's singing, are especially nasty cos if you fail to ensure every aspect of your system treats these risks, the dreaded "cancer rot of corruption" starts getting into not just the RAM. Like a nasty dreaded cancer it then gets into the online storage in your NAS LUN for example, then into your son backup, then your father, then your grandfather backup. At which point, if you're super unlucky and it just happens to be granny's song, well, you've got a corruption problem ol mate. A nasty one, which cannot be easily remediated from either the RAID NAS or the backups, even at three levels.
* It remains entirely possible for sensible people to do things like DAW and NAS for granny, even if that person does so every minute of every hour of every day for years continuously. It's done for mission critical tasks, for human critical tasks.
* There is a "cost" in defending against bit flips. It is bounded by two things: one performance, two cost. The cost is very very minor, and I doubt anyone sane would try to suggest that given the risk the cost is a problem. The performance impact does exist. However, multiple solutions are present. One, the latency impact and so forth is stuff all, and in many circumstances processing tasks using modern x64 cpu and memory architectures are not typically memory constrained. In the rare use cases where it really matters, faster RAM certified to lower latency, and the ability within the architecture to play around with memory timings, means a given architecture can be memory tuned to not just remove the performance cost of ECC, but actually make the ECC solution with higher spec memory faster than the standard gig with no ECC anyway. The other thing is, when you're free of x64 and into say ASIC or FPGA or GPGPU or whatever, there are many different memory types, and many of them are simply ridiculous in terms of low latency, bandwidth per clock and the clock rate they operate at. For all practical measures, given the risk, it's exceptionally dumb for a company selling devices where the data being used on the device is important to not have ECC on RAM, especially when, in the case of x64 architectures for example, ECC is everywhere else like CPU, HDD, GPU etcetc
* With Intel for example, bit flip risk depends on your purchase choice, as the different architectures you get when you buy a given Intel chip do not all have even the most basic level of ECC defence. This is epically, epically bad. I'm sweet with my microwave having checksums in its firmware, and if it bit flips out I'll unplug it and repower it, I'm good. How a company can, with a straight face, sell a NAS device on shelves that operates 24/7 for important data use cases without ECC RAM, while featuring capabilities like RAID to make it look robust and presenting it as fit for purpose, blows my engineering propeller head. Where for me it crosses into being culpable is when the company has another model on the same shelf at 5x the price, and this one also has an Intel cpu and memory architecture but this time with ECC RAM, mentioned in six point font on page 70 of the user manual. Well, words fail me.
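To make the bit flip bullets a bit more concrete, here's a toy sketch in Python. It flips one bit in a 16-bit audio sample to show how far off an unprotected value can land, then runs four data bits through a Hamming(7,4) code, which can correct any single flipped bit. Hamming(7,4) is only my stand-in for illustration: real ECC memory uses a wider code across each memory word and does it in hardware. All the function names here are made up for the sketch.

```python
def flip_bit(value, bit):
    """Return value with one bit inverted, which is all a bit flip is."""
    return value ^ (1 << bit)


def hamming74_encode(nibble):
    """Encode 4 data bits (0..15) as a list of 7 bits, positions 1..7."""
    data = [(nibble >> i) & 1 for i in range(4)]  # data[0] is the low bit
    code = [0] * 8  # index 0 unused so indices match positions 1..7
    code[3], code[5], code[6], code[7] = data
    # Each parity bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome). Syndrome 0 means no error was seen,
    otherwise it is the 1-based position of the bit that was corrected."""
    code = [0] + list(bits)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    if syndrome:
        code[syndrome] ^= 1  # flip the damaged bit back
    data = (code[3], code[5], code[6], code[7])
    nibble = sum(bit << i for i, bit in enumerate(data))
    return nibble, syndrome


if __name__ == "__main__":
    sample = 1000  # a quiet 16-bit audio sample
    print("unprotected:", sample, "->", flip_bit(sample, 14))

    stored = hamming74_encode(0b1011)
    stored[4] ^= 1  # simulate a single bit flip while "in RAM"
    print("protected:", hamming74_decode(stored))
```

The unprotected sample just silently becomes a different number. The protected one comes back intact, along with the position of the bit that got repaired. A toy code like this only handles one flipped bit per codeword, two flips at once would be miscorrected.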
Conclusion
* If you're doing cool stuff with audio processing (playing, recording, editing etcetc) under use cases where the audio data being processed is important, I recommend choosing platforms that have effective defence against bit flips, because no other method, including putting it in a building basement, will prevent natural environmental radiation sources messing up your day. Everything about modern memory cells in our chips makes them weaker to bit flips with every new memory generation, and we're using more and more RAM all the time too. The natural world changes a lot, like how the weather in our atmosphere changes, so some days of doing cool stuff with audio will be far worse than others. And avoid at all costs doing cool stuff with audio at high altitude in planes. Not being on the ground means a whole new approach.
* In a similar context, you are exposed to much higher risk with storage than with processing, because storage tends to be on 24/7. If you're really budget focused, try to buy second hand storage gear from servers and the like rather than a new low end NAS without ECC. If you're faced with having to beg/borrow/steal a non-ECC portable array drive or NAS etc, turn it on, put your storage on it, turn it off, assume the risk.
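If you do end up stuck on non-ECC gear, one thing that may help is at least noticing when a file has changed underneath you before it rolls into the son, father and grandfather backups. Here's a minimal sketch using SHA-256 checksums from Python's standard hashlib module. The function names and the idea of keeping a manifest are my own illustration, not a feature of any particular NAS. It detects corruption, it does not prevent or repair it, and a checksum calculated on a machine without ECC can itself be calculated from data that was already damaged in RAM.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so big audio files don't need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file under folder (by relative path) to its SHA-256."""
    folder = Path(folder)
    return {
        str(path.relative_to(folder)): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def find_changed(folder, manifest):
    """List files that are missing or no longer match the manifest."""
    folder = Path(folder)
    changed = []
    for relative, expected in manifest.items():
        target = folder / relative
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(relative)
    return changed
```

The idea is to call build_manifest once when the files are known good, keep the result somewhere safe, and run find_changed before each backup rotation. Anything it lists that you didn't deliberately edit deserves a hard look before it overwrites an older copy.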