Skip to main content

RAID levels and benefits explained

Can you explain the different RAID levels and their pros and cons? What are the benefits of using more than the minimum drives -- is it additional capacity, more drive loss, fault tolerance? Under what circumstances can RAID types, specifically RAID 10, survive multiple simultaneous drive failures? Can you also provide some tips for the rebuild process for failed drives in RAID 10?

We use cookies and other similar technologies (Cookies) to enhance your experience and to provide you with relevant content and ads. By using our website, you are agreeing to the use of Cookies. You can change your settings at any time. Cookie Policy.X

SearchStorage

ALPHASPIRIT - FOTOLIA

ALPHASPIRIT - FOTOLIA

Q

PROBLEM SOLVE

RAID levels and benefits explained

4

RAID can enable better storage performance and higher availability, and there are many different RAID types. Read about the different levels of RAID and where they work best.

Erin Sullivan and Christopher Poelker

Editor's Note: This content was expanded and updated in October 2017.

Can you explain the different RAID levels and their pros and cons? What are the benefits of using more than the minimum drives -- is it additional capacity, more drive loss, fault tolerance? Under what circumstances can RAID types, specifically RAID 10, survive multiple simultaneous drive failures? Can you also provide some tips for the rebuild process for failed drives in RAID 10?

RAID used to stand for redundant array of inexpensive disks. Today, the term has been updated to redundant array of independent disks.

RAID is a way of grouping individual physical drives together to form one bigger drive called a RAID set. The RAID set represents all the smaller physical drives as one logical disk to your server. The logical disk is called a logical unit number, or LUN.

Using RAID has two main advantages: better performance and higher availability, which means it goes faster and breaks down less often.

Benefits of RAID

The primary benefits of using RAID are performance improvements, resiliency and low costs. Performance is increasedbecause the server has more spindles to read from or write to when data is accessed from a drive. Availability and resiliency are increased because the RAID controller can recreate lost data from parity information.

Parity is basically a checksum of the data that was written to the disks, which gets written along with the original data. RAID can be done in software on a host, such as Windows FTDISK volumes, or in hardware on the storage controllers. The server accessing the data on a hardware-based RAID set never knows that one of the drives in the RAID set went bad. The controller recreates the data that was lost when the drive goes bad by using the parity information stored on the surviving disks in the RAID set.

Standard vs. nonstandard RAID levels

The expansive number of RAID levels can be broken into three categories: standard, nonstandard and nested. Standard levels of RAID are made up of the basic types of RAID numbered 0 through 6.

A nonstandard RAID level is set to the standards of a particular company or open source project. Nonstandard RAID includes RAID 7, adaptive RAID, RAID S and Linux md RAID 10.

Nested RAID refers to combinations of RAID levels, such as RAID 01 (RAID 0+1), RAID 03 (RAID 0+3) and RAID 50 (RAID 5+0).

RAID levels explained

The RAID level you should use depends on the type of application you are running on your server. RAID 0 is the fastest, RAID 1 is the most reliable and RAID 5 is a good combination of both. The best RAID for your organization may depend on the level of redundancy you're looking for, the length of your retention period, the number of disks you're working with and the importance you place on data protection versus performance optimization.

Below is a description of the different RAID levels that are most commonly used in SAN storage arrays. Not all storage array vendors support all the various RAID types, so be sure to check with your vendor for the types of RAID that are available with their storage.

RAID 0: RAID 0 is called disk striping. All the data is spread out in chunks across all the disks in the RAID set. RAID 0 has great performance because you spread the load of storing data onto more physical drives. There is no parity generated for RAID 0. Therefore, there is no overhead to write data to RAID 0 disks.

RAID 0 is only good for better performance, and not for high availability, since parity is not generated for RAID 0 disks. RAID 0 requires at least two physical disks.

RAID 1: In RAID 1, also called disk mirroring, all the data is written to at least two separate physical disks. The disks are essentially mirror images of each other. If one disk fails, the other can be used to retrieve data.

Disk mirroring is good for very fast read operations. It's slower when writing to the disks, since the data needs to be written twice. RAID 1 requires at least two physical disks.

RAID 1+0: RAID 1+0, which is also called RAID 10, uses a combination of disk mirroring and disk striping. The data is normally mirrored first, and then striped. Mirroring striped sets accomplishes the same task, but is less fault-tolerant than striping mirror sets.

If you lose a drive in a stripe set, all access to data must be from the other stripe set because stripe sets have no parity. RAID 1+0 requires a minimum of four physical disks.

RAID 2: Using a Hamming code, RAID 2stripes data at the bit level. These days, Hamming codes are already used in the error correction codes of hard drives, so RAID 2 is no longer used.

RAID 3: RAID 3 uses something called a parity disk to store the parity information generated by the RAID controller on a separate disk from the actual data disks, instead of striping it with the data, as in RAID 5.

This RAID type performs poorly when there are a lot of requests for data, as in an application such as a database. RAID 3 performs well with applications that require one long sequential data transfer, such as video servers. RAID 3 requires a minimum of three physical disks.

RAID 4: RAID 4 uses a dedicated parity disk, as well as block-level striping across disks. While it is good for sequential data access, the use of a dedicated parity disk can cause performance bottlenecks for write operations. With alternatives such as RAID 5 available, RAID 4 is not used much.

RAID 5: RAID 5 uses disk striping with parity. The data is striped across all the disks in the RAID set, along with the parity information needed to reconstruct the data in case of disk failure.

RAID 5 is the most common method used because it achieves a good balance between performance and availability. RAID 5 requires at least three physical disks.

RAID 6: RAID 6 increases reliability by utilizing two parity stripes, which allow for two disk failures within the RAID set before data is lost. RAID 6 is seen in SATA environments, and solutions that require long data retention periods, such as data archiving or disk-based backup.

Adaptive RAID: Adaptive RAID lets the RAID controller figure out how to store the parity on the disks. It chooses between RAID 3 and RAID 5, depending on which RAID set type will perform better with the type of data being written to the disks.

RAID 7: RAID 7 is a nonstandard RAID level -- based on RAID 3 and RAID 4 -- that requires proprietary hardware. This RAID level is owned and trademarked by the now-defunct Storage Computer Corp.

Minimum drives and rebuilds for RAID levels

Regarding the question about the benefit of using more disks in a RAID set than the minimum, the answer is that you get more available storage and more actuators or spindles for the OS to use.

Most RAID arrays use a maximum of 16 drives within a RAID set due to higher overhead and diminishing returns in performance when exceeding that many drives. Up to eight drives seems to be a good rule of thumb for RAID 5 and RAID 10. If you need more space, you can create another RAID set with the other disks.

As another rule of thumb, try to keep different workload data types on separate RAID sets. You can use RAID 10 for best performance everywhere, but most budgets dictate the use of RAID 5 for database data volumes, with RAID 1 or RAID 10 used on database log volumes. The database volumes can be highly random I/O, and the logs tend to be sequential in nature.

Rebuild times depend on the kind of RAID. If you are using software-based RAID, then more spindles within the group means longer rebuild times. If it's hardware-based RAID, rebuild times are usually dictated by the size of the drives themselves, since the hardware usually does the sparing in and out of the set. A 146 GB drive takes longer to rebuild than a 73 GB drive.

How RAID is used today

Many experts say the need for RAID technology has shrunk. Erasure coding and solid-state drives have presented reliable -- if more expensive -- alternatives, and as storage capacity increases, the chance of RAID array errors increases, as well.

Erin Sullivan and Christopher Poelker asks:

Are you still using RAID in your storage environment or have you implemented erasure coding?

Join the Discussion

However, if RAID is dead, there are some major storage vendors that have not gotten the memo. Dell EMC recently released the Unity platform, which supports multiple RAID levels, and IBM and Intel have also released products that support and boost RAID performance.

Next Steps

Can RAID technology benefit object storage?

Supplement RAID with storage virtualization

How data growth affects the RAID and erasure coding battle

This was last published in October 2017

Related Resources

Performance Expectations of Hybrid Cloud NAS–PanzuraA Buyer's Guide to Hybrid Cloud NAS–PanzuraThink you have a NAS solution? Think Again–IBMWhy NAS File Migrations Don't Have to Be a Nightmare Anymore–Data DynamicsVIEW MORE

Dig Deeper on NAS devices

ALLNEWSGET STARTEDEVALUATEMANAGEPROBLEM SOLVE

Qumulo storage parts the clouds with its K-Series active archive

network-attached storage (NAS)

Server maker IXsystems sets sail with new TrueNAS flagship

Qumulo P-Series packs NVMe into scale-out NAS

Load More

Join the conversation

 Send me notifications when other members comment.Add My Comment

Michael Tidmarsh- 22 Sep 2014 1:07 PM

Even though I do think that erasure coding does offer better data protection, it's too costly so I'll stick with RAID

 SANdummy- 10 Nov 2014 7:00 PM

Are you still using RAID in your storage environment or have you implemented erasure coding?

 VasudevanSanjeevee- 10 Sep 2016 6:16 AM

i need to refer top to bottom of RAID in VMware environment ..What to do ?

 Surferboy- 10 Dec 2017 8:11 AM

I have no idea how a RAID is configured. So, can I have a 4 bay NAS and insert one drive into the later, insert another drive of the same size. Then later again insert two more drives of a different size and end up with a RAID 5 system.

Comments

Popular posts from this blog

Boot process hangs at dracut: Switching root

Environment Red Hat Enterprise Linux 6 Issue When server is booting the boot process hangs at  dracut: Switching root , and never displays anything else. Raw device-mapper: ioctl: 4.33.1-ioctl (2015-8-18) initialised: xx-xxxx@redhat.com udev: starting version 147 dracut: Starting plymouth daemon dracut: rd_NO_DM: removing DM RAID activation dracut: rd_NO_MD: removing MD RAID activation scsi0 : ata_piix scsi1 : ata_piix ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc120 irq 14 ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc128 irq 15 Refined TSC clocksource calibration: 2599.999 MHz. virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11 virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10 virtio-pci 0000:00:07.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11 virtio-pci 0000:00:08.0: PCI INT A -> Link[LNKD] -> GSI 11 (level, high) -> IRQ 11 input: ImExPS/2 G...

Interpreting the output of lspci

On Linux, the lspci command lists all PCI devices connected to a host (a computer). Modern computers and PCI devices communicate with each other via PCI Express buses instead of the older Conventional PCI and PCI-X buses since the former buses offer many advantages such as higher throughput rates, smaller physical footprint and native hot plugging functionality. The high performance of the PCI Express bus has also led it to take over the role of other buses such as AGP ; it is also expected that SATA buses too will be replaced by PCI Express buses in the future as solid-state drives become faster and therefore demand higher throughputs from the bus they are attached to (see this article for more on this topic). As a first step, open a terminal and run lspci without any flags (note: lspci may show more information if executed with root privileges): lspci   This is the output I get on my laptop: 00:00.0 Host bridge: Intel Corporation Haswell-ULT DRA...

How to get the SAN environment information and statistics on AIX, HP-UX, Linux, Solaris, and Windows

How to get the SAN environment information and statistics on AIX, HP-UX, Linux, Solaris, and Windows Description NetBackup SAN Client is supported on the Linux , Solaris, Windows, HP-UX and AIX operating systems.  These environments provide the initiator device driver which can login to the SAN client media server and mount an pseudo   target device “ARCHIVE PYTHON” so that the backup or restore can be use the fiber transport (FT).  If there is an issue in the SAN environment, it is necessary to get the information/statistics from the SAN fabric for analysis.  The commands below can be used, on the respective operating system, to gather the necessary information. If the outputs show many or steadily increasing error counts, that indicates one or more issues with  the fabric  infrastructure. The issue(s) can be caused by cabling, SFP, san switch, DWDM, HBA or ISL and those components will need to be analyzed and...