What is RAID (Redundant Array of Inexpensive Disks)?

What is RAID (Redundant Array of Inexpensive Disks)

RAID is a storage technology based acronym which stands for Redundant Array of Independent Disks or Redundant array of Inexpensive disks. The letter “I” stands for two words, which are mentioned above; but the term “Independent” sounds more appropriate and is far or less subjective to it and depends on the context of the conversation.

Calculate which RAID works best for you with our RAID calculator.

From the layman’s point of view, RAID can be termed as a confusing technical phrase, but if subjected to a good explanation, it can be as simple as it is. RAID is configuration of two or multiple hard drives to work as a single unit on a single computer system. Although, the configuration may vary, as per the intended use, but the main concept of it is to provide high- availability installation, which has to isolate the whole data storage from situations like system failures or infrastructure failures.

In the world of Information Technology, the term “high-availability” is obtained by achieving data redundancy. This can be done, by deploying RAID arrays into the data center, which reduce the likelihood of system failures, occurring due to hard drive failures. This prevents the properly configured disk arrays from failing and so, its general operations will also be unaffected making users of data center, never experience downtime.

In this article, below mentioned technological terms will be repeated occasionally and so to simplify things to the reader, the terms have been explained in brief.

Redundancy – Redundancy refers to having numerous components which offer the same function, so that the system functioning can continue in the event of partial system failure.

Fault Tolerance – fault tolerance means, in the event of a component failure, the system is designed in such a way that a backup component will be available, in order to prevent loss of service.

Types of RAID

RAID technology deployment is possible by using a software as well as hardware.

Software Raid vs. Hardware Raid

Software RAID
– Most of the popular operating systems are coming up with software raid support and so, the cost of deployment decreases a bit. However, software raid works on partition level and so it increases complexity if the number of partitions is increased and at this stage hardware raid comes into affect. In case of power failure, software raid fails completely, while in hardware raid, the battery back up continues with the pending writes. The software raid offers high performance levels in Raid 0 and Raid 1 levels. But as the levels increase and in nested raid levels, the performance decreases as it is dependent on CPU performance and current load. Hot swapping of disks is supported in software raid, which means replacing of failed hard disk with a new one. Software RAID offers hot spare support, where spare hard drives are installed in raid arrangement, which stay inactive until an active drive fails and then automatically takes on the working of the fault disk to reduce downtime. Software Raid usage is seen in no vendor lock-ins, single server workstations, in low cost solutions and is better for RAID 0 and RAID 1 levels and is ideal for home and small businesses.

Hardware RAID
– The cost factor is a bit high, if an enterprise deploys hardware raid in its data center. But the factors such as complexity in management are low and the back up caching is also available in this raid through battery backup. The performance of hardware RAID is excellent and it supports disk hot swapping and hot spare support. Hardware Raid supports higher write throughputs and offers faster reconstruction of data loss. Ideal for all mission critical and performance oriented solutions.

Data storage in RAID architecture

It is a fact that RAID technology tricks the computer system into believing that it is a single hard disk, but the fact is that it distributes the reproduced data across several hard disk drives. The distribution of data is done in two ways, which are two generalized concepts. Mirroring is one of the ways, which offers replication of data on another disk and the second concept is striping, where splitting of reproduced data takes place across the available disk drives. Parity is a concept, which is also streamlined into RAID technology as another way of storage method. It involves saving of information across the disk arrays, so that, the same information can be used to recreate or reconstruct the affected data, which is otherwise filled with errors or data loss, when disk drive fails.

The concept of redundancy comes into affect, as soon as the disk drives fail, while rest of the disk arrays continue to function. But the wise duty of the IT administrator is to solve the disk drive failure, as long as it is diminutive. Its prolonged existence can make other drives vulnerable to failures. This can be done, by replacing the faulty drives, without affecting the whole system and this terminology is called as hot swapping. But this type of replacement flexibility, without affecting the whole system working is seen in only certain types of RAID types. So, the data recovery strictly depends on the RAID level, which is deployed on the data center. While planning the RAID architecture, the hot swapping technique must be given utmost importance, in order to replace the faulty drives without affecting the whole working system.

In order to ensure foolproof redundancy, enterprises opt for Nested RAID levels which constitute the combination of two or more RAID configurations that offer advantages of both the methods. A paradigm is RAID 10, which is a combination of RAID level 1 and RAID level 0, where RIAD level 1 configurations works on the RAID 0 technique.

Requirements for Deploying RAID

Hardware
– RAID can be deployed on hard drives which include SATA, ATA and SCSI. The number of hard disks, required will be depending on selection of the RAID level configuration. It is always recommended for the use of matched hard drives of same capacities. It is a fact that most of the arrays will be using the capacity of the smallest drive. So, if a 250GB capacity drive is deployed in the RAID configuration, which has an 80GB hard disk drive, then the 170GB will be waste and will only be useful in JBOD Or just a bunch of disk raid level. Moreover, the hardware drives must not only match in capacities, they must in terms of writing speeds, transfer rates and so on. RAID controllers are deployed for SCSI, SATA and ATA hard disks and some systems also allow RAID arrays to be operated across controllers of different formats.

For those who are ignorant about RAID technology, here is a detail. RAID controller is a hardware, through which all the hard drives are connected and this is responsible for processing of data. It is similar to the motherboard arrangement where typical drive connections are found.

The requirement for a hot swappable drive bay is also essential, in conditions, where a hard disk drive turns faulty and needs to be replaced. It allows the replacement of failed hard drive from a live system by the simple method of unlocking the drive cage out of the case and then sliding in the locked into a the place.

Software requirements
– RAID technology can be deployed on any modern operating system, if it has all appropriate drivers provided by manufacturers of RAID controllers. The operating system and the software needs should be upgraded from the beginning and prior to this step, all the data must be backed up, so that it can be restored on to the newly laid RAID technology. If the RAID array is specifically maintained for data storage and not for any other operating system run, then things get simple.

RAID Level Configurations

There are almost dozen RAID level schemes prevailing in RAID technology. But below only the most prevailing RAID level schemes are summarized below.

RAID Level 0
– This RAID level 0 configuration doesn’t provide redundancy and so it doesn’t exactly fit into the arrays of RAID technology. In this RAID level 0, two disks are used to write data to two drives in an alternating way, which is striping. This can be explained with a paradigm. Let us assume, ten chunks of data say 1,2,3,4,5,6,7,8,9,10. From it 1, 3,5,7,9 will be written to drive one and the rest i.e. 2, 4, 6,8,10 will be written to second drive and that too in sequential order. This splitting of data will allow doubling of speed of a single hard drive and will also enhance its performance. But if one drive fails in this array, then data loss is incurred. The capacity of Raid 0 is equal to the whole sum of the individual drives. That is, if two 80GB hard disks will be deployed into RAID, then the total capacity will be around 160GB of the RAID 0 level.

RAID Level 1
– From this RAID level 1, the redundancy factor becomes influential. Minimum use of hard drives in this level will be two and the data is written to both drives. It is like cloning or mirroring the data of the first drive to the second one and making drive one identical to the second one. If the first drive fails, then the data backup will be available from the drive 2. In this RAID level, the actual capacity of RAID 1 level array will be equal to half the capacity of the sum of individual drives. That is, if two 160GB drives are deployed, then the total capacity will be just 160GB only.

RAID Level 2
– In this form of RAID data is striped in a way that each sequential bit is on different drive. Each data word is having its own hamming code and on each read, the Hamming code verifies the data accuracy and also corrects the single disk errors. In this level, the array can recover from multiple and simultaneous hard drive failures. Minimum two drives are required in this RAID level 2.

RAID Level 3
– In this level of RAID a minimum of three drives is required for implementation. The data block is split and is written on data disks. Stripe parity is generated on writes and this writing is recorded on parity disks and can be checked on reads. A raid 3 array can recover from hard drive failures and is deployed in environments where applications need high speed throughputs, which can be video production, video editing and live streaming. In this level the read/write speed is high.

RAID Level 4
– In this RAID level the requirement three drives is seen and in this level, the entire block is written on a disk. Parity is generated in writes and recorded on parity disks and checked on reads. This level of RAID has high reading speeds and is highly efficient.

RAID Level 5
– In this RAID 5 level, three drives are implemented and data block is written on a data disks and parity which is generated from writes is distributed onto three drives and is checked on reads. In the situation of drive failure, the reads can be calculated from the distributed parity and the drive failure is masked from the user. But in situations of single drive failure, the performance of the entire array gets depleted.

RAID Level 6
– This level of raid 6 (corrected as RAID 6), offers fault tolerance in case of two drive failures and still the system functions. In this level, block level stripping is observed along with double distributed parity. In case of data recovery, the time for recovery takes place on the size of the disk drive. Double parity offers additional time for rebuilding the array without the data being at risk if single additional drive fails, while the data recovery through rebuild is happening.

Now come the nested levels of RAID which are also known as Hybrid raid levels, structured by the conjugation of two RAID levels.

RAID Level 10 (1+0)
– In this level of RAID minimum requirement of drives is 4. RAID 10 will also have fault tolerance and will also have redundancy. It will have the splitting of data feature seen in RAID 0 level and will also have mirroring feature seen in raid 1 level. RAID 10 array can recover from multiple and simultaneous hard drive failures and is ideal for high end server applications.

RAID Level 0+1
– This level of RAID requires a minimum of 4 drives and multiple drive failures can be handled by this raid level. This level is a combination of raid 0 and raid 1 level and is used in imaging applications meant file servers. It offers high performance and reduces emphasis on reliability. In the raid 0+1 a second striped set to mirror the primary set is created The array continues to operated with one or more drives failure in mirror set.

RAID 30

RAID 30, also known as RAID 3+0, is a nested RAID level that combines the features of RAID 3 and RAID 0. It offers a balance of performance, storage efficiency, and fault tolerance, making it suitable for environments that require high-speed data access and reliability. RAID 3 uses byte-level striping with a dedicated parity drive, which means data is split into bytes and distributed across multiple drives, while a single drive stores parity information for redundancy. RAID 0, on the other hand, stripes data across multiple RAID 3 arrays to improve performance.

In a RAID 30 setup, you need a minimum of 6 drives: two RAID 3 arrays (each requiring at least 3 drives) are combined using RAID 0. This configuration allows RAID 30 to withstand the failure of one drive in each RAID 3 array without data loss. The striping across RAID 3 arrays enhances read/write speeds, making RAID 30 ideal for applications like video streaming, large databases, and backup systems where both performance and data protection are critical. However, RAID 30 has significant storage overhead due to parity and requires careful management, making it less common in general use cases.

RAID 50

RAID 50, or RAID 5+0, is a nested RAID level that combines the benefits of RAID 5 and RAID 0. It provides a good balance of performance, storage efficiency, and fault tolerance, making it a popular choice for medium to large-scale storage systems. RAID 5 uses block-level striping with distributed parity, meaning data and parity information are spread across all drives in the array. This allows RAID 5 to survive the failure of one drive per array. RAID 0 stripes data across multiple RAID 5 arrays to improve performance.

A RAID 50 setup requires a minimum of 6 drives: two RAID 5 arrays (each requiring at least 3 drives) are combined using RAID 0. This configuration allows RAID 50 to withstand the failure of one drive in each RAID 5 array while delivering faster read/write speeds due to striping. RAID 50 is well-suited for applications like enterprise databases, virtualization, and file servers, where a combination of performance, storage capacity, and fault tolerance is essential. However, like other nested RAID levels, RAID 50 has storage overhead due to parity and can be complex to manage, requiring careful planning and maintenance.

RAID Level 60

RAID 60, also known as RAID 6+0, is a nested RAID level that combines the features of RAID 6 and RAID 0. It provides both high performance and robust data protection, making it ideal for environments that require large storage capacities and fault tolerance. The minimum number of drives required for RAID 60 is 8. RAID 6 requires a minimum of 4 drives (for data + double parity). RAID 0 stripes data across at least 2 RAID 6 arrays. So, 2 RAID 6 arrays x 4 drives each = 8 drives minimum.

It is essentially a striped set of RAID 6 arrays. For example, if you have 16 drives, you could create two RAID 6 arrays (each with 8 drives) and then stripe data across them using RAID 0.

JBOD RAID
– This is an abbreviation of just a bunch of disks and is officially known as spanning. In this JBOD raid level, the array of hard disks is made to appear as a single disk system. But there is no raid level implementation on it, which results in lack of fault tolerance.

RAID Server Failures and RAID Recovery Software

RAID technology is to offer redundancy for the stored data, but sometimes the servers, which are having RAID as protection, can also face failures and data loss issues. This is possible by the following ways-

Hardware conflicts.
Malware attacks caused by software corruption or operating system upgrades.
Addition of incompatible hard drives.
The whole configuration setup gets damaged or corrupted.
Multiple hard drive failures at a time.
Server crashes, which doesn’t remount the array or volumes.
RAID controller failure or when configuration is changed.

Things to consider when RAID fails

When the RAID technology fails in the data center, which is not uncommon, IT administrator must make sure that, they must not take any hasty decision.

In the case of logical problems, the work gets difficult on the analysis of the file system and the correctness in working on it. In the case of Mirrored RAID levels, the data can be mixed and matched to reconstruct a good drive, from the good sectors of the two drives. Data rebuilding can be done from the striped RAID schemes which use parity. This strip level rebuilding of data is easier than the per drive basis. If they are multiple bad sectors in more than one drive, then they have to be corrected individually.

If data goes missing, then the RAID controller will take a disk off-line when it fails and is operated in degraded mode for reconstruction of lost data from a missing disk and that too on demand.

If all this seems to be too tedious then here is a solution. Nowadays, Raid recovery software is available and is being developed and offered by many companies. This ensures that the valuable data in the data centers, protected by RAID technology can be recovered without much struggle, in case of unexpected failures.

Conclusion

Those who are unfamiliar with the concept of RAID technology, it may be sounding like a daunting task to implement it. But the benefits offered by its implementation will surely justify its technology, at the time of disaster recovery.