
Good hard drive for streaming lossless??

Svperstar

Senior Member
Joined
Jul 4, 2018
Messages
342
Likes
222
MTBF numbers are mostly meaningless. The relevant parameter for SSDs is write endurance, which I can't find anywhere on the Sabrent website. Not a good sign.

Sorry, that is what I meant to say. For the 2TB model, from Amazon:

Hello, thank you for your question. The answer is 3,115 TBW for endurance. Hope this helps; if you have any other question or concern about any of our products, please contact our technical support department at 323-266-0911 or email [email protected]. Best regards.
By Sabrent Support on June 18, 2019
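For anyone wondering what a 3,115 TBW rating actually buys you, here's a quick back-of-the-envelope calculation. The daily write volume is an assumption for illustration, not a measured figure:

```python
# Back-of-the-envelope SSD lifetime from a TBW endurance rating.
# 3115 TBW is the figure Sabrent support quoted above; the daily
# write volume is a hypothetical assumption.

TBW_RATING_TB = 3115      # rated terabytes written
DAILY_WRITES_GB = 50      # assumed write volume per day

days = TBW_RATING_TB * 1000 / DAILY_WRITES_GB
print(f"Estimated endurance: {days:,.0f} days (~{days / 365:.0f} years)")
# -> Estimated endurance: 62,300 days (~171 years)
```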
 

Berwhale

Major Contributor
Forum Donor
Joined
Aug 29, 2019
Messages
3,935
Likes
4,925
Location
UK
I don't see what RAID has to do with anything. It doesn't alter the overall access pattern seen by the drive.

It depends on the RAID level. With RAID5/6, where data is striped across all drives in the array, a single SMR drive exhausting its write buffers can take down the whole array. This is more likely to happen during a rebuild or array expansion, when all data is re-written to all drives.
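For anyone unfamiliar with how RAID5 parity ties the drives together, here's a minimal sketch (plain Python, illustrative block sizes): the parity block is the XOR of the data blocks, which is why a rebuild has to read every surviving member and write the replacement in full.

```python
# Minimal sketch of RAID5-style XOR parity. The parity block is the XOR of
# the data blocks, so any single lost block can be rebuilt from the rest;
# this means a rebuild must read every surviving drive end to end.
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR across equally sized blocks.
    return bytes(reduce(lambda a, b: a ^ b, t) for t in zip(*blocks))

d0 = b"\x01\x02\x03\x04"        # data block on drive 0
d1 = b"\x10\x20\x30\x40"        # data block on drive 1
parity = xor_blocks([d0, d1])   # parity block on drive 2

# Drive 1 fails: reconstruct its block from the survivors.
recovered = xor_blocks([d0, parity])
assert recovered == d1
```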

Personally, I buy the cheapest possible disks for my server (usually shucked from USB enclosures); these drives are invariably SMR now. So I stopped running RAID at home (after nearly 20 years). Instead, I run identical data disks in a virtual Synology NAS and a physical one, and back up the former to the latter (with occasional off-line copies to a USB JBOD enclosure).
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
Don't think I see your point. Have you read the entire story over at The Register? The problem concerns the WD Red range, which is specifically sold as the drive of choice for RAID use. When used outside a RAID environment* the worst case is that writing will be a lot slower than reading. If that suits you, fine. However, if you use a new low capacity “Red” (I believe <8TB) as a theoretically identical replacement for a failed drive within a RAID (“resilvering”), the process of reconstructing the RAID may fail as the write performance of the new drive is far worse than that of its older RAID partners.

This particular issue is RAID-related and RAID is the intended and advertised role of the Red family. Got it now?
No, I still don't get what would make a drive specifically suitable for RAID applications regardless of what the WD Red is or isn't. In the context of this debacle, rebuilding a RAID after a drive replacement should be one long, continuous write. SMR drives like long, continuous writes. They do not like short, random writes. Once up and running, the access pattern seen by individual drives should be similar to that of the RAID as a whole. Where you may have a point is that the typical use cases for RAID and for SMR don't really overlap that much.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
It depends on the RAID level. With RAID5/6, where data is striped across all drives in the array, a single SMR drive exhausting its write buffers can take down the whole array. This is more likely to happen during a rebuild or array expansion, when all data is re-written to all drives.
Sounds like spectacularly poor design to me. If the buffers fill up, the drive should delay accepting further write commands, not start throwing errors. I guess I should have learned by now, but I'm still surprised by the level of utter crap being put into consumer products.
 

somebodyelse

Major Contributor
Joined
Dec 5, 2018
Messages
3,682
Likes
2,962
No, I still don't get what would make a drive specifically suitable for RAID applications regardless of what the WD Red is or isn't. In the context of this debacle, rebuilding a RAID after a drive replacement should be one long, continuous write. SMR drives like long, continuous writes. They do not like short, random writes. Once up and running, the access pattern seen by individual drives should be similar to that of the RAID as a whole. Where you may have a point is that the typical use cases for RAID and for SMR don't really overlap that much.
The problem is a specific behaviour of the drive's firmware causing long (>1 minute) stalls during sustained writes, and the RAID software interpreting those stalls as drive failure. Details here
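For context, on Linux the kernel's per-command timeout for SATA/SCSI disks defaults to 30 seconds, so a stall of over a minute looks like a dead drive. A minimal sketch of inspecting (and, as a workaround, raising) that value via sysfs; the device name is a placeholder:

```python
# Sketch (Linux, SATA/SCSI disks): each command gets a kernel timeout,
# 30 s by default, exposed via sysfs. A firmware stall longer than that
# looks like a dead drive to the RAID layer above. "sda" is a placeholder;
# changing the value needs root.
from pathlib import Path

timeout_file = Path("/sys/block/sda/device/timeout")
print("current command timeout:", timeout_file.read_text().strip(), "seconds")

# Raising it (e.g. to 180 s) is a workaround, not a fix: the array will
# simply wait longer before declaring the stalled drive failed.
# timeout_file.write_text("180\n")
```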

EDIT: the other part of the problem is that WD obfuscated until presented with irrefutable proof that they were selling shingled drives without telling anyone, and that this was causing problems in one of the use cases the drives were advertised as being suitable for.
 

Pluto

Addicted to Fun and Learning
Forum Donor
Joined
Sep 2, 2018
Messages
990
Likes
1,631
Location
Harrow, UK
No, I still don't get what would make a drive specifically suitable for RAID applications
  • 24/7 operation. The drive manufacturers now seem to have so much statistical information on their products that they are confident enough to specify (without any promises, of course), within a surprisingly tight margin, the amount of data throughput a given drive will manage before it goes terminal.
  • Ability to cope with the (presumably) higher temperature of a small RAID enclosure.
  • TLER (Time Limited Error Recovery). A RAID5 configuration is inherently capable of error checking without assistance from the drives' internal error management, as the mathematics of RAID5 essentially ensure that (at least) one member of the array can be in error without loss of data. For this reason it is undesirable for one member of the array to ‘waste’ time attempting to correct an error that the array itself is inherently capable of managing. This is obviously in contrast to a non-RAID installation where, in all likelihood, you would like a drive to do everything it can (and take as long as it needs) to achieve accurate data recovery. (See the smartmontools sketch below.)
Most RAID hardware assumes (quite reasonably, in my view) that each member of the array has similar performance. When a new member is introduced that, quite unexpectedly, has far slower write capability than its siblings – despite being sold as a “replacement for…” – we have a problem…
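If you want to see where your own drives stand, the SCT Error Recovery Control timeouts (TLER is WD's name for the feature) can be queried, and on supporting drives set, with smartmontools. A sketch, with the device path as a placeholder:

```python
# Sketch: query and set a drive's SCT Error Recovery Control timeouts
# (the generic SMART feature behind WD's TLER) with smartmontools.
# Values are in units of 100 ms, so 70 = 7.0 seconds. /dev/sda is a
# placeholder; this needs root and a drive that supports SCT ERC.
import subprocess

DEVICE = "/dev/sda"  # assumed device path

# Show the current read/write recovery timeouts.
subprocess.run(["smartctl", "-l", "scterc", DEVICE], check=True)

# Cap recovery at 7 s so the RAID layer doesn't drop the drive while it
# retries internally (a typical setting for drives in an array).
subprocess.run(["smartctl", "-l", "scterc,70,70", DEVICE], check=True)
```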
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
24/7 operation
I wouldn't even consider buying a drive not suitable for that. What exactly is degraded in the other ones anyway?

TLER (Time Limited Error Recovery). A RAID5 configuration is inherently capable of error checking without assistance from the drives' internal error management, as the mathematics of RAID5 essentially ensure that (at least) one member of the array can be in error without loss of data. For this reason it is undesirable for one member of the array to ‘waste’ time attempting to correct an error that the array itself is inherently capable of managing. This is obviously in contrast to a non-RAID installation where, in all likelihood, you would like a drive to do everything it can (and take as long as it needs) to achieve accurate data recovery.
Well, you don't want the array switching to degraded mode from errors that could have been corrected by the drive either.

I'm sticking to my enterprise drives. Hopefully these 4TB Seagate Constellation ES.3 drives have some life in them yet.
 

Pluto

Addicted to Fun and Learning
Forum Donor
Joined
Sep 2, 2018
Messages
990
Likes
1,631
Location
Harrow, UK
you don't want the array switching to degraded mode from errors that could have been corrected by the drive
And that is the reasoning underlying Time Limited Error Recovery. The drives themselves are given only so long to present the data. That timeout period is far shorter on a drive specified for RAID use than on one designed for normal, free-standing operation. I believe that addresses your earlier point…
I still don't get what would make a drive specifically suitable for RAID applications
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
And that is the reasoning underlying Time Limited Error Recovery. The drives themselves are given only so long to present the data. That timeout period is far shorter on a drive specified for RAID use than on one designed for normal, free-standing operation. I believe that addresses your earlier point…
No, it doesn't. You're describing a drive that is decidedly _unsuitable_ for anything but RAID. That's very different from making it _good_ for RAID. The reliability of the RAID is a combination of redundancy and the reliability of the individual drives. Better drives give a more reliable storage solution, RAID or not.
 

QMuse

Major Contributor
Joined
Feb 20, 2020
Messages
3,124
Likes
2,785
Sounds like spectacularly poor design to me. If the buffers fill up, the drive should delay accepting further write commands, not start throwing errors.

And that is what happens. The problem with that is the rebuild can take a really long time.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
And that is what happens. Problem with that is rebuild can last reaaally long.
Is the sustained write rate not published? Not tested by review sites? As a buyer, how would you not know about that limitation, even if the use of SMR specifically wasn't disclosed?
 

Pluto

Addicted to Fun and Learning
Forum Donor
Joined
Sep 2, 2018
Messages
990
Likes
1,631
Location
Harrow, UK
:eek: this is going round in ever-decreasing circles.

It started as a point of interest about the possibly questionable integrity of WD which will (eventually) be examined, and answered, by the Court.

It was not intended to become a masterclass in the design subtleties of drives for RAID use.

I’m out.
 

QMuse

Major Contributor
Joined
Feb 20, 2020
Messages
3,124
Likes
2,785
Is the sustained write rate not published? Not tested by review sites? As a buyer, how would you not know about that limitation, even if the use of SMR specifically wasn't disclosed?

It is not published by WD; the specs say only "Transfer rate up to xxx MB/s". I don't know about review sites.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
It is not published by WD; the specs say only "Transfer rate up to xxx MB/s". I don't know about review sites.
Well, then that's a drive I wouldn't consider buying. Any parameter not published can safely be assumed poor.
 

Berwhale

Major Contributor
Forum Donor
Joined
Aug 29, 2019
Messages
3,935
Likes
4,925
Location
UK
Sounds like spectacularly poor design to me. If the buffers fill up, the drive should delay accepting further write commands, not start throwing errors. I guess I should have learned by now, but I'm still being surprised by the level of utter crap being put into consumer products.

It's not crap; it's engineering to a set of parameters. SMR is a pretty clever way of getting more data onto a given area of magnetic substrate (known as areal density). The downside is that it takes longer to lay down each piece of information (because the magnetic domains are 'shingled' like the tiles on a roof). Caches are used to mitigate this delay by allowing writes to happen in the background. However, caches are of a finite size because cache is expensive. If you hammer the drive with writes, you will fill up the cache and end up writing data at a rate determined by the shingled write process rather than the cache speed. This is no different from many other computer systems that employ a cache to boost performance; it's especially true of microprocessors, where three levels of cache are now the norm (and where concepts such as speculative execution and branch prediction have led to their own problems).
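A toy model makes the cliff obvious. All the numbers below are illustrative assumptions, not measurements of any particular drive:

```python
# Toy model of a drive-managed SMR disk under sustained writes: incoming
# data lands in a fast CMR cache region and drains to shingled zones in the
# background. Once the cache fills, throughput collapses to the drain rate.
# All numbers are illustrative assumptions, not measured figures.

CACHE_GB = 20       # assumed CMR cache region size
FAST_RATE = 180     # MB/s accepted while the cache has room
DRAIN_RATE = 25     # MB/s sustained shingled rewrite rate
STEP = 60           # seconds per simulation step

cache_mb = 0.0
for t in range(0, 600, STEP):
    rate = FAST_RATE if cache_mb < CACHE_GB * 1000 else DRAIN_RATE
    cache_mb = min(cache_mb + (rate - DRAIN_RATE) * STEP, CACHE_GB * 1000)
    print(f"t={t:3d}s  accepting {rate:3d} MB/s  cache {cache_mb / 1000:5.1f} GB")
# The cache fills after a couple of minutes of hammering; from then on the
# drive only accepts data at the slow shingled drain rate.
```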

BTW, SMR is not limited to consumer drives; it's also used in enterprise-grade drives marketed for archive purposes. If you think about most people's home use of storage, it's much closer to this use case (write a little, read a lot) than to your typical data centre read/write profile.

My recommendation: Don't worry about who makes the drives or their published MTBF. Buy the cheapest ones you can ($/GB) and back them up.
 

Berwhale

Major Contributor
Forum Donor
Joined
Aug 29, 2019
Messages
3,935
Likes
4,925
Location
UK
Here's the runtime of the four 3TB drives I pulled from my home RAID array early last year...

[Attached screenshot: power-on hours for the four 3TB drives]


At the time I purchased the top 3 drives, they were the cheapest available from a £/GB perspective; they were all shucked from USB enclosures. The P300 was purchased 3 years later when one of the original drives (not shown) failed. This wasn't the cheapest drive (£/GB), but I needed another 3TB drive to complete the array (no point in buying a bigger drive with better £/GB, then leaving a chunk of it unused).

As some of these drives were getting on for 5 years old, I replaced this array with non-RAIDed 4TB and 6TB drives in both my main server and backup server. I didn't buy the drives all at the same time - I waited for Black Friday and other deals to keep costs low. There's also a mix of manufacturers, and I think only one of the 6TB drives is SMR. The four 3TB drives are still in service; they sit in a 4-way USB JBOD enclosure and hold the tertiary off-line backups I mentioned above.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
It's not crap; it's engineering to a set of parameters. SMR is a pretty clever way of getting more data onto a given area of magnetic substrate (known as areal density). The downside is that it takes longer to lay down each piece of information (because the magnetic domains are 'shingled' like the tiles on a roof). Caches are used to mitigate this delay by allowing writes to happen in the background. However, caches are of a finite size because cache is expensive. If you hammer the drive with writes, you will fill up the cache and end up writing data at a rate determined by the shingled write process rather than the cache speed.
I'm well aware of how caches work, thanks. This is no different in principle from RAM caches used on every hard disk since the dawn of time. What struck me as odd was when you said the slow writes would "take down the whole array." That sounds, to me, like something much worse than the rebuild merely taking a long time.

SMR drives are divided into 256MB zones that must be written sequentially. As long as this rule is followed, write speed should be comparable to a normal drive. If you update a smaller block within a zone, the entire 256MB must be rewritten, and this gets very slow. If the RAID rebuild somehow triggers this mode of operation, it will obviously slow to a crawl. What I still don't understand is how this happens. The rebuild ought to be a single continuous stream from start to end onto a pristine disk. An SMR drive should love this.
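The arithmetic behind "very slow" is brutal. Assuming the 256MB zone size above and a 4KB update that forces a full zone rewrite:

```python
# Worst-case write amplification for an in-place update inside a shingled
# zone, using the 256MB zone size mentioned above and an assumed 4KB write.

ZONE_MB = 256
UPDATE_KB = 4

amplification = ZONE_MB * 1024 / UPDATE_KB
print(f"worst-case write amplification: {amplification:,.0f}x")
# -> worst-case write amplification: 65,536x
```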
 

Berwhale

Major Contributor
Forum Donor
Joined
Aug 29, 2019
Messages
3,935
Likes
4,925
Location
UK
What I still don't understand is how this happens. The rebuilding ought to be a single continuous stream from start to end onto a pristine disk. An SMR drive should love this.

Good question. My understanding is that the disks within the array are independent, i.e. each disk is free to service each write request in whatever way it deems best. Even if the stripe data is written in a nice continuous stream to one drive, it does not mean that the other disks respond in the same way; i.e. an array of disks is not equivalent to an array of platters within a drive, where the writes to each platter would be synchronous because all the heads move together.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
Good question. My understanding is that the disks within the array are independent, i.e. each disk is free to service each write request in whatever way it deems best. Even if the stripe data is written in a nice continuous stream to one drive, it does not mean that the other disks respond in the same way; i.e. an array of disks is not equivalent to an array of platters within a drive, where the writes to each platter would be synchronous because all the heads move together.
Consider a RAID-1 with two disks. One of these disks has just been replaced. The rebuild entails copying all the data from the other disk to the new one. This is best done in linear fashion from start to end, and there is no reason whatsoever anyone would do it differently. Linux software RAID, which is what most of those NAS boxes use, certainly does it this way. With RAID-5 there's a more complicated calculation to determine what to write to the new drive, but the rebuild still progresses linearly from start to finish.
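In pseudocode terms, the RAID-1 rebuild is nothing more than a big sequential copy. A minimal sketch (the device paths are placeholders, and a real rebuild is done by md in the kernel, not a script):

```python
# Minimal sketch of a linear RAID-1 rebuild: copy the surviving member to
# the replacement in large sequential chunks, start to finish. This is
# exactly the access pattern an SMR drive handles well. Device paths are
# placeholders; real rebuilds are done by md in the kernel, not a script.

CHUNK = 8 * 1024 * 1024  # 8 MiB sequential chunks (arbitrary choice)

with open("/dev/good_disk", "rb") as src, open("/dev/new_disk", "r+b") as dst:
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        dst.write(block)
```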

Something that would ruin performance is if the new disk had a few scattered blocks written before the rebuild. Then any shingled zone containing one of those blocks could end up being rewritten multiple times. I don't know how Linux RAID handles writes to unfinished areas during a rebuild. If these go through to the incomplete disk, it would likely cause slowdowns with SMR drives. If no writes happen during the rebuild, I don't see how anything would go bad.
 