RAID stands for "redundant array of independent disks", which Wikipedia describes as:
... a data storage virtualization technology that combines multiple disk drive components into a logical unit for the purposes of data redundancy or performance improvement.
Why Should I Care?
The slowest part of a computer is usually the permanent storage where data is kept. RAID can increase the bandwidth at which data is read or written, which can remove key bottlenecks in your application. RAID can also provide redundancy, so that if a single drive fails, the system carries on as if nothing happened. However, this is no substitute for creating backups, as it does not prevent data from being deleted, whether accidentally or by malicious users.
There are many levels of RAID, but most of them are just subtle variations on the main three, which I will describe first.
RAID 0
Data is split evenly across all of the drives as it is written. This results in the best performance of any RAID level, but it also means that there is no redundancy: if any drive fails, all the data in the array is lost. Adding drives to the array linearly increases the likelihood of losing all of the data to a drive failure.
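The striping idea can be sketched in a few lines of Python. This is purely illustrative (the function names and the tiny 4-byte chunk size are made up for the example); real implementations work on fixed-size blocks at the device level.

```python
# Illustrative sketch of RAID 0 striping: chunks of the data stream are
# dealt round-robin across the drives, so reads and writes can hit all
# drives in parallel - but no chunk exists in more than one place.

def stripe(data: bytes, drives: int, chunk: int = 4):
    """Split data into chunks and deal them round-robin across drives."""
    layout = [[] for _ in range(drives)]
    for i in range(0, len(data), chunk):
        layout[(i // chunk) % drives].append(data[i:i + chunk])
    return layout

def read_back(layout):
    """Interleave the chunks back into the original byte stream."""
    out = []
    for row in range(max(len(drive) for drive in layout)):
        for drive in layout:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)

disks = stripe(b"ABCDEFGHIJKL", drives=3)
# Each drive holds a third of the data; lose any one and the stream is gone.
assert read_back(disks) == b"ABCDEFGHIJKL"
```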
RAID 1
Instead of being split across drives, data is duplicated, or "mirrored", as it is written. This means that writes are only as fast as the slowest drive, though in theory the array can achieve increased read speeds. It also means that the drives effectively lose half of their combined storage capacity.
The main reason for choosing this type of RAID is that the system is guaranteed to be able to handle a single drive failure. The more drives there are in such a system, the more likely it is that the array will be able to handle multiple simultaneous drive failures.
One can further increase the mirror level so that more than one copy is made of the data, in order to further increase redundancy. A mirror level of 2 means that the array is guaranteed to be able to handle two simultaneous drive failures, but is likely to be able to handle more. The array only fails if all copies of a single piece of data are lost.
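The mirror-level idea can be sketched as follows. The helper name and the in-memory model (a list where None marks a failed drive) are assumptions made for the example; mirror level n keeps n + 1 identical copies.

```python
# Sketch of RAID 1 mirroring: the array survives as long as at least
# one copy of the data is still readable. None represents a failed drive.

def surviving_copy(drives):
    """Return the data from the first drive that has not failed."""
    for d in drives:
        if d is not None:
            return d
    raise RuntimeError("all copies lost - the array has failed")

data = b"important"
array = [data] * 3            # mirror level 2: three identical copies
array[0] = None               # two simultaneous drive failures...
array[2] = None
assert surviving_copy(array) == b"important"   # ...and the data survives
```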
RAID 5
RAID 5, at its most basic level, provides protection from a single drive failure whilst sacrificing only 33% of storage capacity (with three drives), compared to the 50% that RAID 1 suffers. It also achieves faster write speeds than RAID 1 by continuing to split data across drives, similarly to RAID 0. This sounds too good to be true, but it is achieved through "parity".
Drive parity revolves around the idea that if you have any two of three bits (literally) of information, you can calculate the third. One of these bits is called the "parity" bit, and it simply records whether the two data bits are the same or different. In this example, 1 means that the two bits are different, and 0 means that they are the same.
drive 1   drive 2   parity bit
   1         1          0
   0         0          0
   1         0          1
   0         1          1
Looking at the table above, if we were to imagine that drive 2 failed, we could calculate its values by comparing drive 1 and the parity bit. E.g. when drive 1's bit is 1 and the parity bit is 0, drive 2's bit must have been 1. If drive 1 failed, we could do the same with drive 2 and the parity bits; and if the drive containing the parity bits failed, then what does it matter anyway?
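The parity rule above is exactly the XOR operation, and XOR is its own inverse, which is what makes the reconstruction work. A short sketch (the byte values at the end are arbitrary examples):

```python
# Parity is XOR: parity = d1 ^ d2 gives 1 when the bits differ, 0 when
# they match. Because XOR is its own inverse, either data bit can be
# rebuilt from the other bit plus the parity bit.

for d1 in (0, 1):
    for d2 in (0, 1):
        parity = d1 ^ d2              # the parity bit from the table
        assert d2 == d1 ^ parity      # rebuild drive 2 from drive 1 + parity
        assert d1 == d2 ^ parity      # rebuild drive 1 from drive 2 + parity

# The same trick works bytewise across whole drives, and generalises to
# any number of data drives: parity = d1 ^ d2 ^ ... ^ dn.
d1, d2 = b"\x0f\xf0", b"\x33\x33"
parity = bytes(a ^ b for a, b in zip(d1, d2))
rebuilt_d2 = bytes(a ^ p for a, p in zip(d1, parity))
assert rebuilt_d2 == d2
```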
Note: a real RAID 5 array is unlikely to lay data out this way, with all the parity bits kept on one drive. That would result in terrible performance if one of the data drives failed, because every request for data on that drive would require a calculation. By spreading the parity bits across all of the drives, one can ensure a consistent "load" on the CPU whichever drive fails. The only requirement is that the bits of each set are kept on separate drives.
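The rotating-parity placement can be sketched like this. The rotation rule (parity moving one drive to the left on each stripe) and the block names are assumptions for illustration; real controllers use one of several standardised rotation layouts, but the principle is the same.

```python
# Sketch of distributed parity: the parity block rotates to a different
# drive on each stripe, so no single drive holds all the parity.

def raid5_layout(stripes: int, drives: int = 3):
    rows = []
    for s in range(stripes):
        parity_drive = (drives - 1 - s) % drives   # rotate left each stripe
        row, n = [], 0
        for d in range(drives):
            if d == parity_drive:
                row.append(f"P{s}")                # parity block for stripe s
            else:
                row.append(f"D{s}.{n}")            # data blocks of stripe s
                n += 1
        rows.append(row)
    return rows

for s, row in enumerate(raid5_layout(3)):
    print(s, row)
# The parity block moves one drive to the left on each stripe:
# 0 ['D0.0', 'D0.1', 'P0']
# 1 ['D1.0', 'P1', 'D1.1']
# 2 ['P2', 'D2.0', 'D2.1']
```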
Since RAID 5 offers increased write speed over RAID 1, and still offers redundancy whilst sacrificing less storage to do it, why don't we always use it?
- Because each bit in a set needs to be on a different drive, and there are 3 bits, RAID 5 requires at least 3 drives, compared to only 2 for RAID 0 or RAID 1.
- When writing chunks of data, the CPU has to keep calculating and updating the parity, which reduces write speed whilst increasing CPU utilization.
- When a drive fails, a RAID 5 system has to spend a long time rebuilding, recalculating the entire array. RAID 1 can instead simply copy data, which is faster and simpler, especially with RAID 1 across more than 2 drives, where it can read from multiple sources. The rebuild stage is intensive on the remaining drives, and it is not uncommon for another drive to fail during this period, resulting in the loss of the array.
- Some people try to resolve this issue by using two parity blocks per set of bits instead of one. The system can then handle another drive failure during the rebuild process, but this raises the minimum number of drives from three to four. This is often referred to as RAID 6.
RAID 10
This form of RAID is simply a combination of RAID 1 and RAID 0. One gains redundancy through mirroring data, whilst also gaining the performance boost that striping provides. RAID 10 still only loses half the storage capacity, the same as RAID 1. Since parity is not being used, there is less load on the CPU, and rebuilding the array is less intensive on the remaining drives.
In theory, one should be able to implement RAID 10 on just 3 drives, as shown in the diagram below, where "m" denotes a mirror copy. However, most systems require at least 4 drives to implement RAID 10, because they require each bit in a set to be on a different drive.
drive 1   drive 2   drive 3
  a1        a2        a1m
  a2m       b1        b2
  b1m       b2m       c1
  c2        c1m       c2m
As you can see, there are two copies of each piece of data, each held on a different drive, and data is still striped across the drives. If any single drive fails, the system still has another copy of its data on the remaining drives. However, every set of bits has at least two bits on the same drive.
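One way such a layout can be generated is the "near" style used by mdadm-like implementations: each chunk is written and then immediately followed by its mirror, round-robin across the drives. The sketch below illustrates that idea (the helper name, and the exact placement it produces, are assumptions, not any particular controller's layout), and then checks that any single drive can fail without losing data.

```python
# Sketch of a "near 2" RAID 10 layout: write each chunk, then its
# mirror, dealing them round-robin across the drives in order.

def raid10_near(chunks, drives):
    cols = [[] for _ in range(drives)]
    pos = 0
    for c in chunks:
        for copy in (c, c + "m"):          # the chunk, then its mirror
            cols[pos % drives].append(copy)
            pos += 1
    return cols

layout = raid10_near(["a1", "a2", "b1", "b2", "c1", "c2"], drives=3)

# Simulate each possible single-drive failure: a copy of every chunk
# must survive on the remaining two drives.
for failed in range(3):
    survivors = {c.rstrip("m")
                 for i, col in enumerate(layout) if i != failed
                 for c in col}
    assert survivors == {"a1", "a2", "b1", "b2", "c1", "c2"}
```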
Another downside to RAID 10 is that different tools for implementing it can have different restrictions. For example, whilst mdadm will allow you to create RAID 10 across any number of drives with a minimum of 4, LVM will only allow you to create it across an even number of drives. Thus, if you have 5 drives, mdadm may be your solution.
Note: whilst RAID 10 is a combination of RAID 0 and 1, one should specify RAID 10 when building the array wherever possible, rather than manually layering a RAID 1 array over RAID 0 or vice-versa. This allows the system to perform mirroring at the stripe level instead of the logical volume level, since the layout doesn't need to be symmetrical.
You may have heard of RAID 11. This is essentially RAID 10 with a spare drive sitting idle in the system (a "hot spare"), ready to act as a replacement as soon as another drive fails. Thus, the owner never needs to rush to swap a drive the moment one fails in order to start the rebuild process. Instead, they can buy the replacement drive at that point in time and swap out the dud when the replacement arrives.