
RAID's Days May Be Numbered

kdawson posted more than 4 years ago | from the time-to-try-flit dept.

Data Storage 444

storagedude sends in an article claiming that RAID is nearing the end of the line because of soaring rebuild times and the growing risk of data loss. "The concept of parity-based RAID (levels 3, 5 and 6) is now pretty old in technological terms, and the technology's limitations will become pretty clear in the not-too-distant future — and are probably obvious to some users already. In my opinion, RAID-6 is a reliability Band-Aid for RAID-5, and going from one parity drive to two is simply delaying the inevitable. The bottom line is this: disk density has increased far more than performance, and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss. In short, it's a scenario that will eventually require a solution, if not a whole new way of storing and protecting data."
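
A rough back-of-envelope sketch of the submitter's point, assuming the commonly quoted spec-sheet unrecoverable read error (URE) rate of one error per 10^14 bits and an example 7+1 RAID-5 of 2TB drives (both figures are assumptions for illustration, not from the article):

```python
# Probability of hitting at least one unrecoverable read error (URE) while
# rebuilding a parity RAID, i.e. while reading every surviving drive end to end.
URE_RATE = 1e-14          # assumed: one error per 1e14 bits read

def rebuild_ure_probability(drive_tb: float, surviving_drives: int) -> float:
    bits_read = surviving_drives * drive_tb * 1e12 * 8   # total bits scanned
    return 1 - (1 - URE_RATE) ** bits_read

# A 7+1 RAID-5 of 2 TB drives must read ~14 TB cleanly to rebuild one drive.
print(f"{rebuild_ure_probability(2.0, 7):.0%}")   # roughly 2 in 3
```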

444 comments

simple idea (2, Interesting)

shentino (1139071) | more than 4 years ago | (#29463913)

Don't consider an entire drive dead if you get a piddly one-sector error.

Just mark it read only and keep chugging.

reallocate on write (2, Informative)

Spazmania (174582) | more than 4 years ago | (#29463933)

Or just regenerate and write the one sector from the parity data since all modern hard disks reallocate bad sectors on write.
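
A minimal sketch of the idea, assuming straightforward single-parity (RAID-5 style) XOR; it is illustrative only, not any particular controller's firmware:

```python
# Regenerate one lost sector by XORing the corresponding sector from every
# surviving member (data + parity), then write it back to the failing drive
# so its firmware reallocates the bad sector on write.
from functools import reduce

def regenerate_sector(surviving_sectors: list[bytes]) -> bytes:
    """XOR of all surviving members' sectors equals the missing one."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                  surviving_sectors)

# Toy stripe: three data sectors and their parity.
d = [bytes([i] * 512) for i in (1, 2, 3)]
parity = regenerate_sector(d)                        # parity = d0 ^ d1 ^ d2
rebuilt = regenerate_sector([d[0], d[2], parity])    # pretend d[1] went bad
assert rebuilt == d[1]                               # write `rebuilt` back to the bad LBA
```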

Re:reallocate on write (4, Informative)

Erik Hensema (12898) | more than 4 years ago | (#29464191)

That's what any RAID controller worth its salt does. I've seen 3ware and Areca controllers do this, and those aren't the most expensive controllers on the market by far.

Re:simple idea (0)

Anonymous Coward | more than 4 years ago | (#29463935)

Do you have any idea what a RAID is? By your post, I assume not.

Re:simple idea (4, Informative)

paulhar (652995) | more than 4 years ago | (#29463985)

Enterprise arrays copy all the good data off the drive to a spare drive, use RAID to recover the failed sector(s), then fail the broken disk.

Re:simple idea (5, Insightful)

Anonymous Coward | more than 4 years ago | (#29464767)

Enterprise arrays are also very VERY different from what most people know as RAID. Smart controllers, smart drive cages, drives that are an order of magnitude better than the consumer-grade garbage.

The summary talks about how speed has not kept up with capacity. Yes, that is correct for low-grade consumer junk; enterprise server-class RAID drives are a different story. The 15,000 RPM drives I have in my RAID 50 array here on the database server are insanely fast. Plus, server-class drives don't come in silly unstable capacities like 1TB or 1.5TB; they are "OMG small" 300GB drives but are stable as a rock.

So I guess the question is: is the summary talking about RAID on junk drives or RAID on real drives?

Re:simple idea (4, Insightful)

Eric Smith (4379) | more than 4 years ago | (#29464321)

The drives already do that internally. By the time they're reporting errors, bad things are happening, and it really IS time to replace the drive. Anyhow, drives are inexpensive. It's more cost effective to replace them than to spend a lot of time screwing around with them.

Re:simple idea (3, Informative)

paulhar (652995) | more than 4 years ago | (#29464405)

They do, to varying degrees of success, but just because a disk can't read a particular sector doesn't mean that the drive is faulty - it could be a simple error on the onboard controller that is causing the issue.

FC/SAS drives mostly leave error handling up to the array rather than doing it themselves, because the array can typically make better decisions about how to deal with the problem, which helps cope with time-sensitive applications. The array can choose to issue additional retries, reboot the drive while continuing to use RAID to serve the data, etc.

Consumer SATA drives, on the other hand, try really hard to recover from the problem - for example, retrying again and again with different methods to get the sector - and while admirable, that leads to the behaviour we see in consumer land where the PC just "locks up". The assumption here is that there is no RAID available, so reporting an error back to the host is "a bad thing". The enterprise SAS drives we're seeing on the market are starting to disable this automatic functionality to make them behave correctly when inserted into RAID arrays.

Usually ;-)

Solved a Long Time Ago (4, Informative)

BBCWatcher (900486) | more than 4 years ago | (#29463963)

Honestly, there really aren't that many unsolved problems in computing if you are sufficiently aware to include mainframes and mainframe operating disciplines in your consideration. The basic way the mainframe community solved this particular problem long ago was to, first, take a holistic view of mitigating data loss. Double concurrent spindle failures are just one possible risk element. What about, for example, an entire data center exploding in a spectacular fireball? (Or whatever.) IBM, for example, came up with several different flavors of GDPS [ibm.com] and continues to refine them, and they include multiple approaches to data storage tiering across geographies, depending on what you're trying to achieve. Data loss, whether physical or otherwise (such as security breaches), is not a particular problem with this class of technology and associated IT discipline, nor do there seem to be any signs of a growing problem in this particular technology class.

Re:Solved a Long Time Ago (0)

Anonymous Coward | more than 4 years ago | (#29463973)

A great variety of solutions fit the "geocluster" scheme, it's certainly not a mainframe exclusive.

Re:Worked-around a Long Time Ago (5, Interesting)

Anonymous Coward | more than 4 years ago | (#29464213)

But really none of that should be necessary for the general case. Storing data in different physical locations is good, but it's an entirely unrelated issue - the main problem of disk reliability is still very much in need of a solution. That's pretty much the point of the article: you can come up with various solutions which move the problem around and give multiple fallbacks for when something goes wrong, but there's still the problem of things going wrong in the first place. I shouldn't need to use 12 separate disks spread across the globe just for basic reliability / redundancy.

Re:Worked-around a Long Time Ago (5, Funny)

Fred_A (10934) | more than 4 years ago | (#29464303)

I shouldn't need to use 12 separate disks spread across the globe just for basic reliability / redundancy

You're trying to weasel out of paying IBM protection money!

Re:Worked-around a Long Time Ago (4, Interesting)

plover (150551) | more than 4 years ago | (#29464609)

Actually, storing data in a multiple data center / high availability environment is a completely related issue. The summary above talks of "entirely different paradigms." Cloud storage would be multiple-data-center based, which is entirely different from keeping the only copy on your local drives. In this concept, your machine would have enough OS to boot and enough hard drive space to download the current version of whatever software you are leasing. Your personal info would always be maintained in the data centers, and only mirrored locally. Have a home failure? Drop in a new part or even a new PC (possibly with an entirely different operating system, such as Chrome), connect to the service, and you're 100% back.

It's no longer a novel concept for the home market. Consider Google Docs. It's not even being sold as "safer than RAID", it's being touted as "get it from anywhere" or "share with your friends". Safer than RAID is just a bonus.

So are we ready to move all our personal information to clouds? I certainly am not, but Google Docs are wildly popular and a lot of people are. I long ago learned that I can't look to myself to judge what the mainstream attitudes are in many things.

Re:Solved a Long Time Ago (2, Insightful)

Odinlake (1057938) | more than 4 years ago | (#29464721)

As /.:ers so eagerly point out whenever RAID is mentioned: it's not for backup. It's for reducing downtime when HDs fail. So I assume that's the issue the original poster was thinking of. Not that I know what the solution would possibly be, but at least that's the correct question.

Re:Solved a Long Time Ago (2, Informative)

DarkOx (621550) | more than 4 years ago | (#29464943)

Well, the point of the article is that if it takes your array 6 hours to rebuild instead of 4 because capacities have gone up but the failure rate of the hardware is unchanged, you have a problem. The problem is that you are more likely to experience another failure before the first one has been mitigated. If you have that additional failure on most RAIDs (unless you are doing 5-5 or 1-5 or some other RAID-over-RAID scheme) you get downtime. The volume is offline and must be restored from some other location.

The solution is usually a cluster or remote hot site or something like that. It would be nice to have fast rebuild times back. There are lots of situations where five nines is not a requirement but downtime should still be avoided; shorter exposure windows for array rebuilds are a good thing.
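
As a rough illustration of the exposure-window point, here is a sketch that assumes independent drive failures with exponential lifetimes and a 1,000,000-hour MTBF (both assumptions for illustration only):

```python
# Chance that a second drive in the set fails while the first rebuild is
# still running; a longer rebuild window means proportionally more exposure.
import math

def second_failure_probability(remaining_drives: int,
                               rebuild_hours: float,
                               mtbf_hours: float = 1_000_000) -> float:
    per_drive = 1 - math.exp(-rebuild_hours / mtbf_hours)
    return 1 - (1 - per_drive) ** remaining_drives

print(second_failure_probability(9, 4))   # ~3.6e-05 for a 4-hour rebuild
print(second_failure_probability(9, 6))   # ~5.4e-05 for a 6-hour rebuild
```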

Bogus outdated thinking (5, Interesting)

twisteddk (201366) | more than 4 years ago | (#29463967)

The author says it himself in the article:

"And running software RAID-5 or RAID-6 equivalent does not address the underlying issues with the drive. Yes, you could mirror to get out of the disk reliability penalty box, but that does not address the cost issue."

but he hasn't addressed the fact that today you get 100 times as much disk space for the same cost as you did 10 years ago, when cost was a factor. In real life, cost isn't a factor when it comes to data storage, simply because it's really low compared to the other costs in a project requiring storage. So if you want the reliability, you go get a mirror. Drive space is dirt cheap.

As for the rebuild times, fine, go buy FASTER drives. I don't see the problem. HP and many other vendors have long been trying to sell combined RAID solutions (like the EVA) where you mix high-capacity with high-performance drives (like SSD vs. SATA).

The only real argument for the validity of this article is the personal use of drives/storage. And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.

Re:Bogus outdated thinking (5, Insightful)

TechnoFrood (1292478) | more than 4 years ago | (#29464365)

I admit I haven't RTFA, but I don't quite get your statement of "And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive." I can't see how an SSD is a replacement for a RAID-5 array. Everyone I know who uses RAID-5 uses it for large amounts of storage with a basic level of protection against data loss. I could justify replacing a RAID-0 setup with an SSD.

That said, I definitely couldn't afford an SSD setup that would be able to replace the RAID-5 in my PC (4x500GB, usable space of 1.34TB). The largest SSDs listed on ebuyer.com are 250GB @ £360 each; I would need 8 to match my RAID-5 setup, which is £2,880 - probably enough to build 2 reasonable machines, both with a 1.34TB RAID-5 using normal HDDs.

Re:Bogus outdated thinking (1)

c6gunner (950153) | more than 4 years ago | (#29464437)

That said, I definitely couldn't afford an SSD setup that would be able to replace the RAID-5 in my PC (4x500GB, usable space of 1.34TB). The largest SSDs listed on ebuyer.com are 250GB @ £360 each; I would need 8 to match my RAID-5 setup, which is £2,880 - probably enough to build 2 reasonable machines, both with a 1.34TB RAID-5 using normal HDDs.

In today's prices, it'd be enough to build 2 machines with MUCH higher capacity (5TB+). Remember that a 1 TB drive today costs less than what you probably spent per 500 gig drive.

Re:Bogus outdated thinking (2, Insightful)

tg123 (1409503) | more than 4 years ago | (#29464541)

I admit I haven't RTFA, but I don't quite get your statement of "And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.", I can't see how an SSD is a replacement for a raid-5 array. Everyone I know who uses a raid-5 uses it for large amounts of storage with a basic level of protection against data loss........

I hope you're not mixing up RAID with a backup.

RAID, when used for protecting your computer, will not protect your data; it just makes your system able to tolerate hard drive failure.

Re:Bogus outdated thinking (1)

lorenlal (164133) | more than 4 years ago | (#29464619)

I think the point is that SSDs are one way to address the problem of rebuilds and thus reliability. The limiting factor in disk bandwidth is the mechanical process of spinning the disk and getting the head over the right part and reading. A 6 Gb/s SATA interface was approved this year, but that was mostly due to the emergence of these SSDs. Yes, it's great that you can have a huge RAID-5 setup at home, and it's probably of very little consequence if the array rebuilds for a couple of days.

It's a whole different matter if you're a large business and you're sweating it out for those 2 days waiting for the rebuild to finish. Not because you'd lose your data... that's why you have backups. But if you have a second failure during that rebuild, you're looking at having to start the rebuild over again, and then you have to restore from tape. The solution for that is to have a second server with the same setup... which, depending on the cost of the first, may end up costing a lot more than those SSDs, especially if you factor in energy usage.

Since SSDs are still relatively new, yes, they're expensive. I also remember how 10 years ago a 10 GB mechanical drive was about $100... The deal is, it's all about needs. You don't need SSDs at home... you don't need 99.999% uptime... but there are businesses and people that do. They'll be plenty happy to pay.

Re:Bogus outdated thinking (0)

Anonymous Coward | more than 4 years ago | (#29464871)

But there are businesses and people that do. They'll be plenty happy to pay.

And if those businesses have their critical data on some lousy Windows servers, they were not happy enough to pay the cost for a system designed to do the job. There are so many SAN manufacturers out there creating decent solutions that the best move for an enterprise waiting 2 days for a disk rebuild is to find the people responsible for the decision not to go with a SAN and hang them by their balls.

Re:Bogus outdated thinking (5, Insightful)

drsmithy (35869) | more than 4 years ago | (#29464383)

And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.

Huh ? That's like saying show me 3 people who have a nice pair of running shoes and I'll show you 3 guys who can't afford a car.

Re:Bogus outdated thinking (3, Funny)

Shakrai (717556) | more than 4 years ago | (#29464849)

Huh ? That's like saying show me 3 people who have a nice pair of running shoes and I'll show you 3 guys who can't afford a car.

We need a +1 car analogy mod.... ;)

Re:Bogus outdated thinking (1)

FlyingBishop (1293238) | more than 4 years ago | (#29464875)

It's more like saying show me 3 guys that own an SUV and I'll show you 3 guys who can't afford a hybrid car. They're different engines, for different tasks.

Re:Bogus outdated thinking (0)

Anonymous Coward | more than 4 years ago | (#29464963)

I have a nice pair of running shoes and can't afford a car. That's one.

Re:Bogus outdated thinking (3, Informative)

daybot (911557) | more than 4 years ago | (#29464399)

And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.

Yeah, every time an article on storage catches my eye, I have to check laptop SSD prices. So far, each time I do this, for the cost of a drive the size I need, I could buy a new snowboard, or a laptop, bike, half a holiday, room full of beer... etc. I really want one, but so far I haven't been able to look at that list and say "I'd rather have an SSD!"

Re:Bogus outdated thinking (3, Funny)

plover (150551) | more than 4 years ago | (#29464637)

Half a holiday is overrated. Buy the SSD! :-)

Re:Bogus outdated thinking (1, Informative)

Anonymous Coward | more than 4 years ago | (#29464461)

Actually, I run a RAID 5 array off of a server in my home: eight 147GB SAS 15,000 RPM drives. I'm a photographer and have tens of thousands of images that I need an affordable storage solution for, and RAID 5 does the trick, along with off-site backup.

For one, SSDs are simply not practical nor cost-efficient yet, and certainly not in the size and quantity I would require. For two, your argument simply doesn't wash: even if you use SSDs, it doesn't eliminate the need for a RAID array, not for someone who truly needs the fault tolerance and redundancy which is the reason for having an array in the first place. Your argument is simply to build an array at four times the cost. Sure, I can afford to spend 6 grand building an SSD RAID, but the real question is why would I when I can have an enterprise-class solution for $1,500? Your summation is just ridiculous.

On a side note, if someone has a true need for RAID and they're using a software RAID solution then they're asking for problems. A hardware solution should be the ONLY consideration for a real RAID setup.

Re:Bogus outdated thinking (0)

Anonymous Coward | more than 4 years ago | (#29464557)

And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.

Why don't you name three people you know who have 2TB (or more) of SSD storage on their personal PCs?

Good luck with that.

Re:Bogus outdated thinking (1)

J4 (449) | more than 4 years ago | (#29464653)

Argument sounds good on paper yet faster drives don't have a lot of impact because improvements in capacity/$ outpace improvements in speed. 2 x not fast enough=still not fast enough.

When morguefile.com went down, they spent 3 days in limbo waiting for a RAID-6 array to rebuild, and when it finally finished it was garbage. The site was down for 2 weeks due to extenuating circumstances, but what ate the time was the sheer amount of data that had to be processed. One day I'll get around to writing down what really happened.

Re:Bogus outdated thinking (4, Informative)

Lumpy (12016) | more than 4 years ago | (#29464955)

The problem is IT guys and PHBs who think RAID = backup.

It's not and it never has been a backup solution. RAID is high availability and nothing more.

RAID does its job perfectly for high availability and will continue to do so for decades. Sorry, but I have yet to see any other technology deliver the capacity we use for the small 30TB database we have at work. Our RAID 50 array works great. We also mirror that in real time to the backup SQL server (not for backup of the data, but backup of the entire server, so that when SQL1 goes offline SQL2 picks up the work).

SQL2 is backed up to a SDAT tape magazine nightly.

RAID does what it's supposed to do perfectly; its days are not numbered, because no other technology can provide high availability the way RAID does.

Enlighten me (3, Insightful)

El_Muerte_TDS (592157) | more than 4 years ago | (#29463971)

(Certain) RAID (levels) address the issue of potential data loss due to hardware malfunction. How does moving to an Object-Based Storage Device address this issue better? Actually, I don't see how RAID and OSD are mutually exclusive.

Harddisks, not RAID (5, Insightful)

Anonymous Coward | more than 4 years ago | (#29463995)

Now that's a stupid article.

It basically says you can't read a hard disk more than X times before you get an error on some sector, so RAID is dead. That's a logical non sequitur. RAID is a generic technology that also applies to flash memory cards, USB sticks, anything you can store data on, basically. The base technique says "given this reliability, you can up the reliability if you add some redundancy". There's no link to hard disks other than that being what the technique is used with right now.

Re:Harddisks, not RAID (0)

Anonymous Coward | more than 4 years ago | (#29464595)

The solution is simple. Just ask Jane to take care of it. She can do anything.

Oh wait, we don't have the ansible communication networks up yet. Nevermind....

Re:Harddisks, not RAID (2, Insightful)

J4 (449) | more than 4 years ago | (#29464907)

RAID is here to stay for a while, no doubt, but it's a response to a series of problems that has problems of its own. You can take 5+1 drives and make an array where one bad chassis slot can indeed take the whole thing out, or you can make a bunch of mirrors at the expense of capacity, or you can stripe one scary large fragile volume. In production it's about performance & availability. Realize that the whole data integrity thing is relative and merely an illusion. It's kinda like on Futurama when they had the tanker with 1k hulls. The only solution to the first case is doubling the hardware, which is a major investment and recurring cost (rack space/electricity, stamps). Murphy's law tells us that indeed "shit happens", so there are no guarantees.

Although I didn't read the article I suspect it's promoting the cloud paradigm, which is the current ultimate expression of redundancy.

There are always more solutions... (1, Interesting)

Anonymous Coward | more than 4 years ago | (#29464009)

Probably the next meta solution after RAID 6 will be something like ZFS, where the filesystem works not just on the fs-specific layer but also on the LVM layer, so it can log CRCs of files and immediately be able to tell if a file got corrupted (and perhaps fix it with some ECC records). One can see a filesystem not just writing a RAID layer, but taking recovery data and storing that away as filesystem metadata.
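
A tiny sketch of the per-block checksumming idea described above (a hypothetical layout for illustration, not ZFS's actual on-disk format):

```python
# Store a CRC alongside each block so corruption is detected on read and can
# be repaired from a mirror or parity copy instead of being silently returned.
import zlib

def write_block(data: bytes) -> dict:
    return {"data": data, "crc": zlib.crc32(data)}

def read_block(block: dict) -> bytes:
    if zlib.crc32(block["data"]) != block["crc"]:
        raise IOError("checksum mismatch: repair this block from parity/mirror")
    return block["data"]

blk = write_block(b"some file contents")
blk["data"] = b"bit-rotted contents!"     # simulate silent corruption
try:
    read_block(blk)
except IOError as err:
    print(err)                            # detected; a good replica could repair it
```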

Of course, there is always doing redundant arrays of RAID clusters, say three groups, two data, one parity, or mirroring RAID 5 volumes. You have the usual tradeoffs: The more fancy the RAID scheme, the more disks you need, and the more computing you have to do for every bit thrown at and read off the array.

Long term solution? A move to something other than magnetic storage. This could be optical, it could be SSD if some advance allows very large density increases, or something unknown. The technology would have to have a failure rate orders of magnitude better than magnetic, as well as a cost on par with magnetic, for it to completely work. Holographic storage has languished for a while; perhaps as that technology improves, we may see drives using 3D blocks of it replacing the old-fashioned spindles.

Ask what does Google do (0)

Anonymous Coward | more than 4 years ago | (#29464019)

Ask, ask, ask

Re:Ask what does Google do (0, Offtopic)

K. S. Kyosuke (729550) | more than 4 years ago | (#29464075)

I thought the modern version of the old saying was "Ask what would Jesus google."

Re:Ask what does Google do (2, Insightful)

Carewolf (581105) | more than 4 years ago | (#29464621)

A search engine doesn't mind losing data, most of the storage is essentially just a cache or summary of the internet and can be regenerated. That said, Google already have so many mirrors for performance reasons that actual data loss is practically impossible.

RAID is here to stay (5, Insightful)

paulhar (652995) | more than 4 years ago | (#29464029)

Disclaimer: I work for a storage vendor.

> FTA: The real fix must be based on new technology such as OSD, where the disk knows what is stored on it and only has to read and write the objects being managed, not the whole device
OSD doesn't change anything. The disk has failed. How has OSD helped?

> FTA: or something like declustered RAID
Just skimming that document it seems to claim: only reconstruct data, not white space, and use a parity scheme that limits damage. Enterprise arrays that have native filesystem virtualisation (WAFL for example) already do this. RAID 6 arrays do this.

Let's recap. Physical devices, including SSDs, will fail. You need to be able to recover from failure. The failure could be as bad as the entire physical device failing, or as small as a single sector being unreadable. In the former case a RAID reconstruct will recover the data, but you risk hitting RAID recovery errors due to the raw amount of data that needs to be recovered. Enterprise arrays mitigate the risk of recovery errors by using RAID 6. They could even recover the data from a DR mirrored system as part of the recovery scheme.

And when RAID 6 has a high enough risk that it's worth expanding the scheme, everyone will start switching from double-parity schemes to triple-parity schemes, since they're much less expensive in terms of spindle count than RAID 6+1.
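
A quick spindle-count sketch behind that claim (the drive counts are illustrative assumptions):

```python
# Triple parity adds one extra drive per group, while mirroring a RAID-6
# group (RAID 6+1) doubles the whole group.
def drives_needed(data_drives: int, parity_drives: int, mirrored: bool = False) -> int:
    group = data_drives + parity_drives
    return group * 2 if mirrored else group

print(drives_needed(12, 3))                 # 15 spindles for 12 data drives: triple parity
print(drives_needed(12, 2, mirrored=True))  # 28 spindles for 12 data drives: RAID 6+1
```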

One assumption is that, at some point in the future, reconstructions will be a continually occurring background task just like any other background task that enterprise arrays handle. As long as there is enough resiliency and performance isn't impacted, it doesn't matter if a disk is being rebuilt.

Re:RAID is here to stay (4, Informative)

Kjella (173770) | more than 4 years ago | (#29464359)

And when RAID 6 has a high enough risk that it's worth expanding the scheme, everyone will start switching from double-parity schemes to triple-parity schemes, since they're much less expensive in terms of spindle count than RAID 6+1.

I don't think you've quite understood the problem described. You can have an infinite number of parity disks, but it does you no good if recovering one data disk causes another data disk to fail.

Imagine a disk fails on every 100TB of reads (10^14). You have ten 1TB data disks. Imagine you keep them in perfect rotation so they've spent 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100% of their lifetime. The last disk dies and you replace it with a new drive (0%). To rebuild the drive you read 1TB from each data disk and use whatever parity you need. They've now spent 11, 21, 31, 41, 51, 61, 71, 81, 91 and 1% (your new disk) of their lifetime and you can read another 9TB before you need a new disk.

Now we try doing the same with ten 10TB disks and the same reliability. The last disk dies and you replace it, only now you must read 10TB from each disk. Instead of adding 1% to the lifetime it adds 10% so that they've spent 20, 30, 40, 50, 60, 70, 80, 90, 100 and 10% (your new disk) of their lifetime. But now another disk fails, you can recover that but then another will fail and another and another and another.

Basically, parity does not solve that issue. If you had a mirror, you would instead copy the mirrored disk with significantly less wear on the disks. RAID is very nice as a high-level check that the data isn't corrupted but it's a very inefficient way of rebuilding a whole disk.
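
A small sketch of the arithmetic in this example, using the same assumed numbers (a 100TB read budget per drive and a perfectly staggered 10-drive set):

```python
# Does the rebuild itself push a second staggered drive past its read budget?
def fails_during_rebuild(drive_tb: float, drives: int = 10,
                         budget_tb: float = 100.0) -> bool:
    # Drives staggered as in the example: 10%, 20%, ..., 100% of budget used.
    wear = [budget_tb * (i + 1) / drives for i in range(drives)]
    wear[wear.index(max(wear))] = 0.0      # the 100% drive dies and is replaced
    wear = [w + drive_tb for w in wear]    # the rebuild reads drive_tb per member
    return max(wear) >= budget_tb          # did another drive hit its budget?

print(fails_during_rebuild(1.0))    # False: 1 TB drives end at 91% worst case
print(fails_during_rebuild(10.0))   # True: 10 TB drives push the 90% drive to 100%
```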

Re:RAID is here to stay (0)

Anonymous Coward | more than 4 years ago | (#29464481)

I don't think you can read. You can complain all you want but he already addressed your point here

One assumption is that, at some point in the future, reconstructions will be a continually occurring background task just like any other background task that enterprise arrays handle. As long as there is enough resiliency and performance isn't impacted, it doesn't matter if a disk is being rebuilt.

Re:RAID is here to stay (2, Interesting)

paulhar (652995) | more than 4 years ago | (#29464485)

RAID 1 has much less reliability than RAID 6. Assume a typical case: one disk totally fails. You then start to reconstruct - in a RAID 1 scheme a single sector error will result in the rebuild failing. Not great.

In RAID 6 you start the rebuild and you get a single sector error from one of the drives you're rebuilding from. At that point you've got yet another parity scheme available (in the form of the RAID 6 bit) that figures out what that sector should have been and then continues the rebuild. Then you go back and decide what to do about that drive that had the second error.

A lot of drive failures aren't full head crashes or motor errors but just single sector, track, bits of dirt on the platter style errors. Other than the affected area the drive can be read.

With RAID 6 you can fail two disks completely and still access the data. You're still reading from the same ten 10TB disks in your example and if the implementation of RAID 6 is optimal (RAID-DP) you aren't having to read additional data from the same physical disks.

In the world you describe with 10TB drives it sounds like you'd just not be able to use the disks at all since any process that reads from the disks will kill them. There are a few things that could happen:

1. Disks get more reliable. Hasn't happened much yet but...
2. We switch to different packaging. Instead of making disks larger we cram more of them into the same space similar to CPU cores - same MTBF per disk but lots of them presented out by one physical interface.
3. We change technologies completely. SSD (interesting failure modes there too... needs RAID)

I guess we'll find out in only a few years...

Re:RAID is here to stay (2, Insightful)

dpilot (134227) | more than 4 years ago | (#29464851)

Even this doesn't handle the other side of the scenario...

Buy your box of drives and put them in a RAID-6. Chances are you just bought all of the drives at the same time, from the same vendor, and they're probably all the same model of the same brand. Chances are also very good that they're from the same manufacturing lot. You've got N "identical" drives. Install them all into your drive enclosure, power the whole thing up, build your RAID-6, put it into service.

Now all of your "identical" drives are running off of the same power supply, getting the same voltage. There's likely to be some temperature gradient inside the box, but overall they're all at similar temperatures. They have the same number of POH, the same number of read requests, same number of write requests. In essence, they remain very nearly "identical" through their service life.

Next, let one drive fail. What are your chances of having a second drive failure, especially when you power the RAID down to replace the first failing drive?

That's what I've heard anecdotal evidence of, from those who manage this type of thing where I work. RAIDs tend not to have single-drive failures, or at least tend to have "time-clustered" drive failures. Plan for it.

Re:RAID is here to stay (1)

jittles (1613415) | more than 4 years ago | (#29464763)

I'm no expert on physical media here but as storage space increases wouldn't you expect the drive to be able to handle more reads?

Re:RAID is here to stay (0)

Anonymous Coward | more than 4 years ago | (#29464825)

To assume that we'll have 10TB disks which will give us on average one read error every 100TB of reads seems a little pessimistic. There are two kinds of read errors that a hard disk can give you: the fatal ones and the spurious ones. Fatal ones are caused by head crashes, material fatigue and other effects which will cause the whole drive to fail in short order. Spurious errors can be caused by vibration, surface impurities and electrical interference. These errors are more likely with increasing density (or to put it another way: as the capacity increases, the frequency of these errors remains constant, so they occur more often over the full capacity). These errors can cause data loss, but they don't affect the rest of the drive. This means that the disk itself can lower the failure rate through internal redundancy schemes. The typical fatal errors, on the other hand, do not increase with capacity. A head crash is a head crash, whether the disk stores 1GB or 1TB. That means the frequency of fatal errors goes down as the capacity goes up. (Actually as read-write speed goes up.) It is true that it is futile to build a RAID with hard disks that can only be completely read/written 10 times before a fatal error occurs, but using such a hard disk for anything else would be just as futile.

Re:RAID is here to stay (1, Interesting)

Anonymous Coward | more than 4 years ago | (#29464879)

Imagine a disk fails on every 100TB of reads (10^14). You have ten 1TB data disks. Imagine you keep them in perfect rotation so they've spent 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100% of their lifetime. The last disk dies and you replace it with a new drive (0%). To rebuild the drive you read 1TB from each data disk and use whatever parity you need. They've now spent 11, 21, 31, 41, 51, 61, 71, 81, 91 and 1% (your new disk) of their lifetime and you can read another 9TB before you need a new disk.

Except that it doesn't work anywhere NEAR like that. The lifetime of a disk is much, much greater than the read cost of rebuilding a failed drive. You're definitely not spending 1% of its total lifetime. You're not even spending 1/1000th of that 1%. You couldn't measure the difference between having to rebuild that drive and just using the disk as a new one.

The vast majority of hard disk failures are manufacturing issues, not end-of-lifetime for an average drive issue. You don't have a raid system because the average lifetime of a harddisk is small, you get a raid system because of the outliers. Every once in a while, a manufacturer puts out a deathstar, and it's going to fail a month after you put it in. At the same time, you'll have disks in there that are going to keep chugging away for five years straight, and you'll eventually replace them because you want a bigger disk, not because they've failed.

Re:RAID is here to stay (1)

Junta (36770) | more than 4 years ago | (#29464749)

Enterprise arrays that have native filesystem virtualisation

Do we really have to put the word 'virtualisation' on everything? I don't see what aspect of the concept is remotely 'virtual'. Filesystem-level or filesystem-aware RAID schemes I wouldn't mind, but 'virtualisation' is being tossed around to the point of becoming a meaningless buzzword, completely stripped of its original, specific meaning.

Other than that one word, I agree with the sentiment. RAID is a sufficiently genericized concept that it can cover everything from the dumbest array configs (unclean arrays requiring a full device resync, one bad sector read putting an array into degraded mode instantly, no filesystem awareness resulting in managing unimportant data) to smarter cases (unclean arrays being a near impossibility, or resync aided by a journal to know which specific parts could have stale parity; a bad sector read inducing a sector rewrite if the hard drive can still rewrite, and issuing a warning; and schemes that know the difference between used and unused space). If you focus on the low end and think it the end-all, be-all of RAID, then yes, you'll think it's in need of immediate attention. If you understand the more sophisticated implementations, you'll realize it scales no worse than the data you use.

Re:RAID is here to stay (1)

secmartin (1336705) | more than 4 years ago | (#29464801)

In fact, ZFS has just gained support for triple-parity RAID [storage-news.com] precisely because of the long rebuild times with current-generation drives.

But given the ever-increasing size of drives, moving to RAID-10 might be a good alternative; you'll need more disks to reach a certain desired array size, but rebuild times will be far lower because you don't need to do parity calculations. With RAID-1 and RAID-10, a 2TB drive can be completely rebuilt in less than 8 hours, depending on how busy it is, and you don't suffer the extreme performance penalty you get when using a RAID-5 array in degraded mode.
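
As a rough sanity check of that rebuild-time figure (the sustained copy rate is an assumed number, and live I/O on a busy array will stretch it):

```python
# A mirror rebuild is a straight sequential copy, so time is roughly
# capacity divided by the sustained transfer rate.
def mirror_rebuild_hours(capacity_tb: float, mb_per_s: float) -> float:
    return capacity_tb * 1e6 / mb_per_s / 3600

print(f"{mirror_rebuild_hours(2.0, 75):.1f} h")   # ~7.4 h for 2 TB at ~75 MB/s
```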

Hardware RAID is dead (3, Interesting)

PiSkyHi (1049584) | more than 4 years ago | (#29464049)

Hardware RAID is dead - software for redundant storage is just getting started. I am looking forward to making use of btrfs so I can have some consistency and confidence in how I deal with any ultimately disposable storage component.

The ZFS folks have been doing it fine for some time now.

Hardware RAID controllers have no place in modern storage arrays - except those forced to run Windows

Re:Hardware RAID is dead (4, Insightful)

Chrisje (471362) | more than 4 years ago | (#29464109)

First of all, "Hardware RAID" is still software, just executed by dedicated circuits. The distinction is kind of moot. For low-cost, low performance systems, software can run on your main box to perform this task, but for high-end applications you'll want dedicated hardware to take care of it, so your machine can do what it needs to do with more zeal.

So my guess is that you're not working for a storage vendor. I haven't seen many people switch to SW RAID recently. If anything, the Unix world is finally crawling out of its "lvm striping" hole. Most servers anywhere are running on stuff like HP's Proliants, and I don't see customers ship back the SmartArray controllers.

Re:Hardware RAID is dead (2, Informative)

paulhar (652995) | more than 4 years ago | (#29464141)

> First of all, "Hardware RAID" is still software, just executed by dedicated circuits. The distinction is kind of moot.

I'm not sure where in my post you saw anything about a comparison between Hardware RAID or Software RAID.

> So my guess is that you're not working for a storage vendor. I haven't seen many people switch to SW RAID recently.

I work for NetApp. I didn't think it mattered much in the post I made though. To your second point, as all of the NetApp Enterprise storage systems use software based RAID I can happily confirm that many hundreds of thousands of customers have switched to software RAID.

As you mentioned earlier, though, the point is moot, since when you're delivering an enterprise array to a customer it doesn't matter if the array uses RAID cards provided by a 3rd party vendor, uses RAID cards built in-house, or uses software RAID to write the data that the customer gives you. The ingress point for the customer is a physical port (IP/FC typically) and that port provides RAID capabilities. Maybe that's also hardware RAID?

Re:Hardware RAID is dead (0)

Anonymous Coward | more than 4 years ago | (#29464347)

Hint: he wasn't replying to you

Re:Hardware RAID is dead (-1, Flamebait)

Anonymous Coward | more than 4 years ago | (#29464349)

To your second point, as all of the NetApp Enterprise storage systems use fakeRAID I can happily confirm that we have been ripping off many hundreds of thousands of customers whom we've deceived into thinking they have switched to hardware RAID.

I fixed your typos.

Re:Hardware RAID is dead (3, Informative)

RulerOf (975607) | more than 4 years ago | (#29464377)

FWIW, I'm a happy 3ware customer... saddened by their sellout to LSI, but I digress.

When I think of software RAID, I think of parity data being handled by the operating system, being done on x86 chips as part of the kernel or offloaded via a driver (thinking Fake-RAID).

If you're abstracting your storage away from the operating system that uses it, say via iSCSI or NFS or SMB to a dedicated storage box, like a NetApp filer or a Celerra, then I would consider that hardware RAID, personally speaking. If you're saying that these dedicated storage boxes manage parity, mirroring and so on all done with the same chip that's also running their local operating systems, then I have to admit that yes, that sounds like software RAID to me, but the real distinction I've come to draw between software and hardware RAID is a matter of performance and feature set. If said boxes give the same or better performance (I/Ops and throughput) to a workload as a dedicated, internal storage system managed by something like my 9650SE, then hell..... who cares, right? Aside from being rather impressed that such is possible without dedicated XOR chips, that is.

Re:Hardware RAID is dead (1)

amorsen (7485) | more than 4 years ago | (#29464569)

I don't see customers ship back the SmartArray controllers.

There's no need to ship them back, especially the low end versions. The high end ones can't handle non-RAID-formatted disks, which is a bit of a pain, but they perform ok as long as you avoid using real RAID (anything that requires calculating parity.) So stick with RAID-1, and SmartArray is fine -- just have a spare controller or server around, because you can't switch controller vendor without losing your data.

Re:Hardware RAID is dead (1)

StormReaver (59959) | more than 4 years ago | (#29464949)

Most servers anywhere are running on stuff like HP's Proliants, and I don't see customers ship back the SmartArray controllers.

That doesn't mean it isn't happening. We had three SmartArray controllers fail in rapid succession. Each one we replaced failed within days, until the fourth one finally worked. That was certainly a rare situation, but SmartArrays are not a magic bullet. They sometimes fail, and they sometimes fail spectacularly, just like everything else.

Non-issue ... (3, Interesting)

Lazy Jones (8403) | more than 4 years ago | (#29464053)

Modern RAID arrays show no dramatic performance degradation while rebuilding; even with RAID-50/RAID-60 arrays, only a fraction of the disk accesses are slower than usual when a single drive is replaced.

For enterprise level storage systems, this is also a non-issue because of thin provisioning.

I thought RAID was about spindle count (4, Insightful)

BlueParrot (965239) | more than 4 years ago | (#29464073)

I admit I'm not an expert, but I was under the impression that RAID was mainly about ensuring you have a large number of spindles and some redundancy, so you can serve data quickly even if a couple of drives fail while the servers are under pressure. Surely you would not rely on a RAID to avoid data loss, since you should be keeping external backups anyway?

Re:I thought RAID was about spindle count (4, Informative)

gedhrel (241953) | more than 4 years ago | (#29464245)

You don't rely on RAID to avoid data loss; you rely on it as a first line in providing continuity. We run backups of large systems here, but we tend to do other things too: synchronous live mirroring between sites of the critical data. And better system design. There are some systems where, whilst we _could_ go back to tape (or VTL) at a pinch, having to do so would be a disaster in itself.

We're designing systems that permit rapid service recovery (for the most critical live data) and a second tier of online recovery to get the rest back. We just can't afford the downtime.

Double-spindle failures on RAID systems are just one of those things that you _will_ see. Deciding whether a system deserves some other measure of redundancy is mostly an actuarial, rather than a technical, decision.

Re:I thought RAID was about spindle count (1)

Sobrique (543255) | more than 4 years ago | (#29464445)

Yeah, RAID is just playing statistics - you're taking a chance that during your rebuild window you don't get a second drive outage in the same RAID set. The bigger the RAID set, the lower the chance is, but the chance is always present. Even if you go to extremes like triple mirrors and remote-site replicas, the chance of a compound 6-drive failure exists - it's just that the odds are so phenomenally low that at that point it's far more likely that a plane has fallen out of the sky onto your datacentre instead.

well they are right about one thing (0)

Anonymous Coward | more than 4 years ago | (#29464129)

the rebuilding times are really astronomical. I don't know how my arrays do it, but it routinely costs me 3+ hours to rebuild

that, and the various scanning / fixing / searching tasks... endless, if you work with the larger drives, even if SATA attached

'course it's nice to have large HDs on desktop PCs, but when you have to fix the boss's PC and it takes 8 (!!) hours to clone, scan and repair... while he can't work... that's no good.

my 2cts

Wrong assumptions (5, Insightful)

vojtech (565680) | more than 4 years ago | (#29464131)

The article assumes that when a drive within a RAID5 array encounters a single sector failure (the most common failure scenario), the entire disk has to go offline, be replaced and be rebuilt.

That is utter nonsense, of course. All that's needed is to rebuild a single affected stripe of the array to a spare disk. (You do have spares in your RAID setups, right?)

As soon as the single stripe is rebuilt, the whole array is in a fully redundant state again - although the redundancy is now spread across the drive with a bad sector and the spare.

Even better, modern drives have internal sector remapping tables and when a bad sector occurs, all the array has to do is to read the other disks, calculate the sector, and WRITE it back to the FAILED drive.
The drive will remap the sector, replace it with a good one, and tada, we have a well working array again. In fact, this is exactly what Linux's MD RAID5 driver does, so it's not just a theory.

Catastrophic whole-drive failures (head crash, etc) do happen, too. And there the article would have a point - you need to rebuild the whole array. But then - these are by a couple orders of magnitude less frequent than simple data errors. So no reason to worry again.

*sigh*

Re:Wrong assumptions (2, Insightful)

Anonymous Coward | more than 4 years ago | (#29464351)

Even if only a sector in a disk has failed, I'd mark the entire disk as failed and replace it as soon as I could. Maybe I'm paranoid, but I've seen many times that when something starts to fail, it continues failing at increasing speed.

Re:Wrong assumptions (1)

Junta (36770) | more than 4 years ago | (#29464837)

Hence why it is a warning condition.

First off, people keep saying 'disks never report bad blocks until they've exceeded their bad block count'. That is just wrong: it holds true for write operations, but a drive *cannot* magically do that on a read (if it could read the sector, it isn't an error; if it can't, then it certainly can't reconstruct the data that it is missing). There may be some more complications involved, but that describes one scenario accurately.

If the bad sector relocation count is exceeded, then the drive is failed. Maybe frequent relocations before exhausting that overhead would be a sign too, but a one-off bad block on read is not something to be overly worried about.

Re:Wrong assumptions (0)

Anonymous Coward | more than 4 years ago | (#29464915)

Unfortunately, many RAID implementations do exactly that -- a single read/write error and the drive is marked bad and taken offline, and a rebuild is required. That can be avoided. If, assuming the article is correct, you get a second disk failure during one out of 234 rebuilds, then you need to REDUCE the frequency of rebuilds. If you have 1 rebuild a year, that means you're good for 200 years. I'm OK with that.

But still, by using intelligent remapping with a single error rather than taking the whole drive offline and requiring a rebuild, you can drastically reduce the number of rebuilds.

Other technologies, such as unRAID, make failure much less of a problem -- with striped RAID you lose the whole array and have to try expensive RAID recovery services. With non-striped RAID, such as unRAID, each drive in the array is a standard filesystem, holds entire files, and can simply be put in another system and the data copied off of it.

If you want smaller drives... (4, Interesting)

asdf7890 (1518587) | more than 4 years ago | (#29464133)

If you want smaller drives to speed up rebuild times then, erm, buy smaller drives? You can get ~70GB 10Krpm and 15Krpm drives fairly readily - much smaller than the 500-to-2000-GB monsters and faster too. You can still buy ~80GB PATA drives too, I've seen them when shopping for larger models, though you only save a couple of peanuts compared to the cost of 250+GB units.

If you can't afford those but still don't want 500+GB drives because they take too long to rebuild if the array is compromised, and management won't let you buy bog-standard 160GB (or smaller) drives as they only cost 20% less than 750GB units without the speed benefits of the high-cost 15Krpm ones, how about using software RAID and only using the first part of the drive? Easily done with Linux's software RAID (partition the drives with a single 100GB (for example) partition, and RAID that instead of the full drive) and I'm sure just as easy with other OSes. You'll get speed bonuses too: you'll be using the fastest part of the drive in terms of bulk transfer speed (most spinning drives are arranged such that the earlier tracks have higher data density) and you'll have lower latency on average, as the heads will never need to move the full diameter of the platter. And you've got the rest of the drive space to expand onto if needed later. Or maybe you could hide your porn stash there.

ZFS, Anyone? (2, Interesting)

Tomsk70 (984457) | more than 4 years ago | (#29464145)

I've managed to get this going, using the excellent FreeNAS - although proceed with caution, as only the beta build supports it, and I've already had serious (all data lost) crashes twice.

However the principle is sound, and I'm sure this will become standard before long - the only trouble being that HP, Dell and the like can't simply offer upgrades for existing RAID cards; due to the nature of ZFS, it needs a 'proper' CPU and a gig or two of RAM. Even so, it does protect against many of the problems now besetting RAID (which was never meant to handle modern, gargantuan disk sizes).

Re:ZFS, Anyone? (1)

c6gunner (950153) | more than 4 years ago | (#29464493)

I've managed to get this going, using the excellent FreeNAS - although proceed with caution, as only the beta build supports it, and I've already had serious (all data lost) crashes twice.

That's horrible!!! Even when I was running ZFS under FUSE on Ubuntu, it didn't take out any of my data. I did that for well over a year, and felt pretty nervous about it, but never had an issue. You need to ditch FreeNAS ASAP and get your server on OpenSolaris.

Re:ZFS, Anyone? (1)

Cheeze (12756) | more than 4 years ago | (#29464771)

I ran ZFS/FUSE on Ubuntu 64-bit for about 3 months. Aside from some performance issues, it worked great up until about 20-30 reading and writing threads, when it crashed. It was easy enough to restart the file system, but I also had to restart the 15 VMs I had running on it. It would crash predictably though, so that's something.

ZFS under FreeBSD or Solaris is so much nicer. The performance even on the same hardware is many times better in straight reading and writing throughput.

Fountain codes? (3, Interesting)

andrewagill (700624) | more than 4 years ago | (#29464155)

What about fountain codes [wikipedia.org] ? The coding there is capable of recovering from a greater variety of faults.

Re:Fountain codes? (1, Interesting)

Anonymous Coward | more than 4 years ago | (#29464311)

Why fountain codes ? Any other erasure code http://en.wikipedia.org/wiki/Erasure_codes [wikipedia.org] will do the job. Parity and Reed Solomon codes used in RAID are in fact erasure codes.

This video would disagree... (0)

Anonymous Coward | more than 4 years ago | (#29464167)

http://www.youtube.com/watch?v=96dWOEa4Djs

ZFS (5, Informative)

DiSKiLLeR (17651) | more than 4 years ago | (#29464179)

This is something the ZFS creators have been talking about for some time, and been actively trying to solve.

ZFS now has triple parity, as well as actively checksumming every disk block.

Re:ZFS (5, Informative)

DiSKiLLeR (17651) | more than 4 years ago | (#29464209)

I thought I should add:

ZFS speeds up rebuilding a RAID (called resilvering) compared to traditional non-intelligent or non-filesystem-based RAIDs by only rebuilding the blocks that actually contain live data; there's no need to rebuild EVERYTHING if only half the filesystem is in use.

ZFS also starts the resilvering process by rebuilding the most IMPORTANT parts first - the filesystem metadata - and works its way down the tree to the leaf nodes, rebuilding data. This way, if more disks fail, you have attempted to rebuild the most data possible. If filesystem metadata is hosed, everything is hosed.

ZFS tells you which files are corrupt, if any are and insufficient replicas exist due to failed disks.

All this on top of double or triple parity. :)

Re:ZFS (1, Insightful)

Anonymous Coward | more than 4 years ago | (#29464677)

But does it run on Linux?

I wish someone would just make a friggin kernel patch to add real ZFS support to Linux. You can't distribute pre-built Linux kernels with ZFS support due to licensing issues, BUT you could distribute a kernel patch that we can then apply to our kernels and compile ourselves and everything would be OK legally as long as you don't redistribute the patched binaries.

Re:ZFS (0)

Anonymous Coward | more than 4 years ago | (#29464863)

But when will you be able to raidz across mirrors?

Our current setup (ZFS mirror across two hw-raid external units) allows an entire unit to fail (power to rack, link or whatever) and then a disk to fail in the other unit. It keeps the ZFS data integrity and recovery aspect at the cost of the probably-not-that-likely "RAID5 write hole".

I can't see how you do that yet with a pure ZFS solution. The ability to raidz across mirrored pairs (each side being in a different external JBOD) would eliminate the "write hole" while still coping with the suggested failure mode (which it isn't too hard to imagine happening).

Parity declustering (4, Interesting)

Biolo (25082) | more than 4 years ago | (#29464273)

Actually I like the parity declustering idea that was linked to in that article, seems to me if implemented correctly it could mitigate a large part of the issue. I have personally encountered the hard error on RAID5 rebuild issue, twice, so there definitely is a problem to be addressed...and yes, I do now only implement RAID6 as a result.

For those who haven't RTFATFALT (RTFA the f*** article links to), parity declustering, as I understand it, is where you have, say, an 8-drive array, but each block is written to only a subset of those drives, say 4. Now, obviously you lose 25% of your storage capacity (1/4), but consider a rebuild for a failed disk. In this instance only 50% of your blocks are likely to be on your failed drive, so immediately you cut your rebuild time in half, halving your data reads and therefore your chance of encountering a hard error. Larger numbers of disks in the array, or spanning your data over fewer drives, cuts this further.
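
A quick sketch of that arithmetic (the drive and span counts are just example assumptions):

```python
# If each stripe spans only `span` of the `n_drives` in the array, any one
# drive appears in span/n_drives of all stripes, so a rebuild touches
# proportionally less data and spreads its reads over more spindles.
def declustered_rebuild_fraction(n_drives: int, span: int) -> float:
    return span / n_drives

print(declustered_rebuild_fraction(8, 4))    # 0.5: half the stripes, half the reads
print(declustered_rebuild_fraction(16, 4))   # 0.25: a wider array cuts it further
```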

Now, consider the flexibility you could build into an implementation of this scheme. Simply by allowing the number of drives a block spans to be configurable on a per-block basis, you could then allow any filesystem that is on that array to say, on a per-file basis, how many disks to span over. You could then allow apps and sysadmins to say that a given file needs to have the maximum write performance, so diskSpan=2, which gives you effectively RAID10 for that file (each block is written to 2 drives, but each of the multiple blocks in the file is likely to be written to a different pair of drives; not quite RAID10, but close). Where you didn't want a file to consume 2x its size on the storage system, you could allow a higher diskSpan number. You could also allow configurable parity on a per-block basis, so particularly important files can survive multiple disk failures and temp files could have no parity. There would need to be a rule, however, that parity+diskSpan is less than or equal to the number of devices in the array.

Obviously there is an issue here where the total capacity of the array is not knowable: files with diskSpan numbers lower than the default for the array will reduce the capacity, and higher numbers will increase it. This alone might require new filesystems, but you could implement today's filesystems on this array as long as you disallowed the per-block diskSpan feature.

This even helps for expanding the array, as there is now no need to re-read all of the data in the array (with the resulting chance of encountering a hard error, adding huge load to the system causing a drive to fail, etc). The extra capacity is simply available. Over time you probably want a redistribution routine to move data from the existing array members to the new members to spread the load and capacity.

How about you implement a performance optimiser too, that looks for the most frequently accessed blocks and ensures they are evenly spread over the disks. If you take into account the performance of the individual disks themselves, you could allow for effectively a hierarchical filesystem, so that one array contains, say, SSD, SAS and SATA drives, and the optimiser ensures that data is allocated to individual drives based on the frequency of access of that data and the performance of the drive. Obviously the applications or sysadmin could indicate to the array which files were more performance sensitive, so influencing the eventual location of the data as it is written.

Remembering an article earlier this week: (3, Interesting)

Chrisq (894406) | more than 4 years ago | (#29464315)

Will scalable distributed storage systems like Hadoop [wikipedia.org] and Google File System take over from RAID?

RAID concept is fine, it's that HDs are too big (5, Interesting)

trims (10010) | more than 4 years ago | (#29464323)

As others have mentioned, this is something that is discussed on the ZFS mailing lists frequently.

For more info there, check out the digest for zfs-discuss@opensolaris.org

and, in particular, check out Richard Elling's blog [sun.com]

(Disclaimer: I work for Sun, but not in the ZFS group)

The fundamental problem here isn't the RAID concept, it's that the throughput and access times of spinning rust haven't changed much in 30 years. Fundamentally, today's hard drive is no more than 100 times as fast (in both throughput and latency) as a 1980s one, while it holds well over 1 million times more.

ZFS (and other advanced filesystems) will now do partial reconstruction of a failed drive (that is, they don't have to bit copy the entire drive, only the parts which are used), which helps. But there are still problems. ZFS's pathological case results in rebuild times of 2-3 WEEKS for a 1TB drive in a RAID-Z (similar to RAID-5). It's all due to the horribly small throughput, maximum IOPs, and latency of the hard drive.

SSDs, on the other hand, are nowhere near as much of a problem. They've got considerably more throughput than a hard drive, and, more importantly, THOUSANDS of times better IOPS. Frankly, more than any other reason, I expect the vastly higher IOPS of the SSD to sound the death knell of HDs in the next decade. By 2020, expect HDs to be gone from everything, even in places where HDs still have better GB/$. The rebuild rates and maintenance of HDs simply can't compete with flash.

Note: IOPS = I/O operations Per Second, the number of read/write operations (regardless of size) which a disk can service. HDs top out around 350, consumer SSDs do under 10,000, and high-end SSDs can do up to 100,000.

-Erik
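
For what it's worth, a back-of-the-envelope sketch of why IOPS dominate the pathological (small-random-I/O) rebuild case, using the rough figures quoted above; the 4 KB block size and the numbers are illustrative assumptions, not measurements:

    def rebuild_hours(capacity_gb, iops, io_size_kb=4):
        """Hours needed to touch every block of a drive at a given random-IOPS rate."""
        ios_needed = capacity_gb * 1024 * 1024 / io_size_kb
        return ios_needed / iops / 3600

    # 1 TB drive rebuilt with small random I/O (badly fragmented pool)
    print(f"HDD @    350 IOPS: {rebuild_hours(1000, 350):6.1f} h")     # ~208 h, over a week
    print(f"SSD @ 10,000 IOPS: {rebuild_hours(1000, 10000):6.1f} h")   # ~7 h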

Re:RAID concept is fine, it's that HDs are too big (1)

c6gunner (950153) | more than 4 years ago | (#29464529)

But there are still problems. ZFS's pathological case results in rebuild times of 2-3 WEEKS for a 1TB drive in a RAID-Z (similar to RAID-5).

Huh? How big is your array? 2,500 Petabytes?

I've got 2 RAIDZ zpools on one server, one is 5x500GB, the other is 6x1TB. When I ran some tests with the arrays half-filled, the 500GB drives rebuild in around 2 hours, the 1TB in around 5. That gives me a rebuild time of around 10 hours for a FULL terabyte array. That's quite a bit shorter than your 2-3 weeks.

Re:RAID concept is fine, it's that HDs are too big (1)

Schraegstrichpunkt (931443) | more than 4 years ago | (#29464835)

Do you understand the difference between a pathological case and a common case?

Re:RAID concept is fine, it's that HDs are too big (1)

Joce640k (829181) | more than 4 years ago | (#29464885)

He said "pathological case", not "average".

Wrong title. Or dramatization again? (1)

dostick (69711) | more than 4 years ago | (#29464381)

Article should be titled "Parity-based RAID's days are numbered". There's nothing wrong with RAID 0, 1, or 10.

Re:Wrong title. Or dramatization again? (2, Informative)

defireman (1365467) | more than 4 years ago | (#29464491)

RAID 0 does not offer any redundancy, just a performance increase from striping reads and writes across two or more drives.

1 error = 1TB rebuild (1)

valentyn (248783) | more than 4 years ago | (#29464401)

The real problem with "classic" RAID is that 1 single error means a total rebuild of the array.

Look the solution is obvious (5, Funny)

jayhawk88 (160512) | more than 4 years ago | (#29464417)

The cloud. Just cloud it, baby. Nothing bad ever happens in the cloud; they're so white and fluffy after all.

doesn't raid 10 solve this? (2, Interesting)

davros-too (987732) | more than 4 years ago | (#29464441)

Um, don't schemes like raid 1+0 solve the parity rebuild problem? Even in the worst case of full disk loss, only one disk needs to be rebuilt and even for a large disk that doesn't take very long. Am I missing something?

Tahoe-LAFS (0)

Anonymous Coward | more than 4 years ago | (#29464465)

The RAID concept can be extended to multiple PCs forming a storage grid. One open-source implementation is Tahoe LAFS [allmydata.org] .

RAID 4 has a dedicated parity drive, not 5 (4, Interesting)

Targon (17348) | more than 4 years ago | (#29464615)

RAID 4 is where you have one dedicated parity drive. RAID 5 solves the resulting bottleneck by spreading the parity information across all the drives in the array. RAID 6 adds a second parity block for increased reliability, but the extra parity write slows down write speeds.
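
A minimal sketch of the single-parity idea behind RAID 4 and 5 (toy byte strings standing in for whole disk blocks; RAID 6's second syndrome needs Galois-field math and is left out):

    from functools import reduce

    def xor_blocks(blocks):
        """XOR equal-length byte strings together, byte by byte."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    # One stripe: three data blocks plus one parity block
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # The drive holding data[1] fails: rebuild its block from the survivors plus parity
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]
    print("reconstructed:", rebuilt)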

The real key to making RAID 4, 5, or 6 work is that you really need 4-6 drives in the array to take advantage of the design. I wouldn't say that it will fall out of favor though, because having solid protection from a single drive going bad really is critical for many businesses. Backups are all well and good if your system crashes, but for most businesses, uptime is even more critical. So: backups for data, so corruption problems can be rolled back, and RAID 5, 6, or 10 for stability and to avoid having the entire system die if one drive goes bad. What takes more time, doing a data restore from a backup when an individual application has problems, or having to restore the entire system from a backup, with the potential that the backup itself was corrupted?

With that said, web farms and other applications that can get away with just using a cluster approach instead of a single well-designed machine (or set of machines) have become popular, but there are many situations which make a system with one or more RAID arrays a better choice. The focus on RAID 0 and 1 for SMALL systems and residential setups has simply kept many people from realizing how useful a 4-drive RAID 5 setup would be.

Then again, most people go to a backup when they screw up their system, not because of a hard drive failure. With techs upgrading hardware before they run into a hard drive failure, the need for RAID 1, 4, 5, and 6 has dropped.

I will say this: since a RAID 5 array can rebuild on the fly (it keeps working even if one drive fails), the rebuild time itself does not significantly impact system availability. Gone are the days when a rebuild has to be done while the system is down.

Crystals? (1)

Hitman_Frost (798840) | more than 4 years ago | (#29464623)

Sorry if this is a bit offtopic, guys.

I noticed the crystals tag on the story, which reminded me of the old Star Trek episodes where someone would open a case of storage crystals, select one, and then access some tremendously huge amount of data on a local terminal using it.

The thought that popped into my head the other day was this - how do they always seem to know what crystal to select? There would often be 20 - 25 in a case, and they were all unlabelled!

I had an amusing image of some kind of 100-petabyte crystal technology marred by users sticking labels on the sides with "movie collection" scrawled in biro!

To summarize the summary, (0)

Anonymous Coward | more than 4 years ago | (#29464695)

In my opinion, RAID-6 is a reliability Band Aid for RAID-5, and going from one parity drive to two is simply delaying the inevitable. The bottom line is this: Disk density has increased far more than performance and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss. In short, it's a scenario that will eventually require a solution, if not a whole new way of storing and protecting data.

So, in summary, current RAID designs have problems with large drives. This basically means that you'll encounter issues. The simplest way of saying this is that it fails. A quick recap: RAID has problems.

RAID6 with enterprise hardware is reliable (2, Interesting)

niola (74324) | more than 4 years ago | (#29464713)

I use RAID6 for several high-volume machines at work. Having double parity plus a hot spare means rebuild time is no worry.

But if you are not a fan you can always throw something together with ZFS's RAIDZ or RAIDZ2, which are also distributed parity. On top of that, the ZFS filesystem checksums every block and keeps multiple (distributed) copies of metadata blocks (and of data blocks too, if you set copies=2), so it can detect and fix data corruption before it becomes a bigger problem.

People using ZFS have been able to detect silent data corruption from a faulty power supply that other solutions would never have found just because of the checksumming process.
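
A rough sketch of the end-to-end checksum idea (nothing like ZFS's actual on-disk format; write_block and read_block are hypothetical helpers): store a checksum alongside the block pointer, verify it on every read, and fall back to another replica when it doesn't match.

    import hashlib

    def write_block(data):
        """Store a block and return the checksum kept with its block pointer."""
        return data, hashlib.sha256(data).hexdigest()

    def read_block(copies, checksum):
        """Return the first replica whose checksum verifies; fail if none do."""
        for data in copies:
            if hashlib.sha256(data).hexdigest() == checksum:
                return data      # a real filesystem would also rewrite the bad replicas
        raise IOError("all replicas failed checksum: insufficient replicas")

    block, cksum = write_block(b"important data")
    corrupted = b"importent data"      # silent corruption, e.g. from a flaky PSU
    print(read_block([corrupted, block], cksum))   # detects the bad copy, returns the good one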

Raid does not equal a valid backup (0)

Anonymous Coward | more than 4 years ago | (#29464821)

No RAID-based solution equals having a backup... period.

If you're worried about downtime, build in redundancy as was mentioned several times above...

We run RAID 5 and RAID 10 on various systems for their data, and back up to multiple destinations (tape and hard drive)... we can afford some downtime if things fail...

Not sure I see a real problem...

I'm not sure I get it (2, Interesting)

Joce640k (829181) | more than 4 years ago | (#29464913)

Is he saying that you can never read a whole hard disk because it will fail before you get to the end?

That's what it seems like he's saying, but my hard disks usually last for years of continuous use, so I'm not sure it's true.
