RAID Failures: RAID Gets Drive-Hungry

There are various forms of RAID, often referred to as levels.  This is misleading, however, because it gives the impression that higher "levels" are better.  Some may argue that higher levels offer more reliability but, truth be told, they are just different ways of making multiple devices (usually drives) appear as one.

Most people who deal with a lot of drives are familiar with RAID5, which is described as a striped set with distributed parity.  All the technical stuff aside, that means you can lose any single disk and the rest can rebuild the data that was contained on it.  All you have to give up is one drive's worth of space.  When a drive fails, you just replace it and tell the device or software that is doing all this work for you to rebuild the information - possibly just by physically replacing the drive.  You may even use a spare drive so the rebuild can be done completely automatically (recommended).
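
To make the "rest can rebuild the data" part concrete, here is a minimal sketch of XOR parity for a single stripe.  The block contents and the helper name xor_blocks are made up for illustration; a real RAID5 implementation also rotates the parity block across drives from stripe to stripe, which this sketch leaves out.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data drive in a single stripe.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block for this stripe lives on a fourth drive.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR of everything that survived
# (remaining data blocks plus parity) gives back the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same trick works no matter which single block is lost, which is why RAID5 costs exactly one drive's worth of space.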

What's all this about?  Well, drives fail fairly often, and a group of multiple drives sees failures even more often.  Recreating data on a drive takes time.  As drives get bigger, they take more time.  If you are actively using these disks, it takes even more time.  All the while you are stressing these disks to their maximum, increasing their chance of failure.  To make things worse, basically any RAID5 setup will lose all your data if a second drive fails during one of these rebuilds.  So what to do?

Enter RAID6, a striped set with dual distributed parity.  This means you can lose any single disk during a drive rebuild and still have no problem - and at the cost of only one extra drive's worth of space compared to RAID5.  Of course, it raises the question: when, if not now, will we need a fix that allows even more failures?
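
The point about bigger and busier drives taking longer to rebuild can be put into rough numbers.  This is only a back-of-the-envelope sketch: the drive sizes and the 100 MB/s and 25 MB/s rebuild rates are assumptions picked for illustration, not measurements, and real rebuild times depend on the controller, the layout and how full the array is.

```python
def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Best-case hours to rewrite one whole drive at a sustained rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal terabytes to megabytes
    return capacity_mb / rebuild_mb_per_s / 3600

for capacity_tb in (1, 2, 4):
    idle = rebuild_hours(capacity_tb, 100)  # assumed rate on an otherwise idle array
    busy = rebuild_hours(capacity_tb, 25)   # assumed rate while serving production load
    print(f"{capacity_tb} TB drive: {idle:.1f} h idle, {busy:.1f} h busy")
```

Whatever the real rates are, the window in which another failure can hurt you grows in step with drive size and shrinks only if the rebuild gets more of the array's bandwidth.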

Enter raidz3.  Yup, you guessed it.  It allows a single drive to fail while you are rebuilding two others.  Lucky for those of you who are sick of the repetition, that's the end of it ... for now.  Where do you get raidz3?  It is currently available as part of ZFS in OpenSolaris and in the Sun Storage 7000 Series, and it is coming to Solaris 10 soon.  There are also raidz and raidz2, which are variations of RAID5 and RAID6, respectively.  I'm thinking the next addition to raidz technology (raidzx?) will allow for a variable number of failures.  If you feel the need for more now, you could set up a 5-way (or more) mirror with ZFS.  Of course, that would consume at least 4 out of every 5 of your drives, leaving at most 20% of total raw capacity usable.
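
As a rough comparison of what each scheme costs, the sketch below works out the usable fraction of a five-drive group for each layout mentioned so far.  The function name and the five-drive group size are chosen only for illustration, and the figures ignore filesystem overhead.

```python
def usable_fraction(drives, redundancy):
    """Share of raw capacity left for data when `redundancy` drives'
    worth of space goes to parity or extra mirror copies."""
    return (drives - redundancy) / drives

# (layout, drives in the group, failures it can survive)
layouts = [
    ("raidz  (like RAID5)", 5, 1),
    ("raidz2 (like RAID6)", 5, 2),
    ("raidz3",              5, 3),
    ("5-way mirror",        5, 4),
]

for name, drives, failures in layouts:
    share = usable_fraction(drives, failures)
    print(f"{name:22s} survives {failures} failure(s), {share:.0%} usable")
```

Each extra failure you want to survive costs one more drive's worth of space, and the 5-way mirror is simply the far end of that trade.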

The point of these double and triple parity protection schemes is to keep your volume/pool from failing during a rebuild, but in order to do that you will need hot spares.  One for every tolerable failure is probably a good place to start.  If not, you'll need at least one less than that (ex: 2 spares for raidz3).  Otherwise you'll be missing the point, because the larger number of parity drives won't be much, if any, better.  Now if you don't plan on regularly swapping out drives, and plan on using your array for a significant amount of time, you will need even more spares.  If you have someone onsite constantly monitoring your systems and replacing failures immediately, you may be able to get away with fewer or even no spares, but I wouldn't bet my data on that.  Or someone else's, for that matter.

As you can see, the parity schemes have an increasing demand for drive count.  Raidz3 cannot be done with fewer than 4 drives, and by what I am suggesting you need at least 7 (those 4 plus 3 hot spares).  Using 7, however, would only allow for one drive's worth of space, which isn't a particularly attractive option, so I'd suggest more.  Since ZFS offers it, and there is often a huge capacity here, you should probably, for performance's sake, use at least 1 read cache drive and 2 mirrored drives for a write cache.  I'll talk more about ZFS cache drives another time, but I mention them now because it's 3 more drives for a minimum recommended configuration.  That makes ten drive slots filled to represent one drive's worth of data.  Of course, drive usage actually improves with drive count for these schemes.  For example, an X4500 has 48 internal drive slots which, if filled, could improve the previously mentioned 1 in 10 slot config to 39 out of 48.  This would allow you to use approximately 81% of the total possible raw capacity.  That is much better drive usage than 2-way (standard) mirroring, which is always 50%.  Be careful how far you stretch this, though, or not only will you always have drives rebuilding, but they may not finish in time and the pool of storage will fail.
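
The slot arithmetic above is easy to redo for other chassis sizes.  Here is a minimal sketch, assuming the same budget as in this post: 3 drives' worth of raidz3 parity, 3 hot spares, and 3 cache drives (1 read cache plus a 2-drive mirrored write cache).  The function name and its defaults are mine, not anything ZFS provides.

```python
def drive_budget(total_slots, parity=3, spares=3, cache=3):
    """Return (data_drives, usable_fraction) for a filled chassis.

    parity: drives' worth of parity (3 for raidz3)
    spares: hot spares, one per tolerable failure
    cache:  1 read cache drive + 2 mirrored write cache drives
    """
    data = total_slots - parity - spares - cache
    if data < 1:
        raise ValueError("not enough slots for even one data drive")
    return data, data / total_slots

for slots in (10, 24, 48):
    data, fraction = drive_budget(slots)
    print(f"{slots} slots: {data} data drives, {fraction:.1%} of raw capacity")
```

The fixed overhead of nine drives is what makes small configurations so wasteful and large ones comparatively cheap.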

Something I can't stress enough is that all this protection does not remove the need for backup!

There is a great deal more that I can say about this subject but I think that is more than enough for now.

Disclaimer

Copyright ©2012 Acclinet Corporation. All rights reserved. Sun, Sun Microsystems, Java, Netra, Solaris, Sun Fire and Sun StorEdge are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and/or other countries. Other product, service and company names mentioned herein may be trademarks of their respective owners.