With traditional and legacy storage systems, Redundant Array of Inexpensive Disks (RAID) has long been the means of ensuring that data is resilient on disk and can tolerate the loss of an entire disk in the array, or even multiple disks, without data loss. However, with the capacities of today's hard drives growing ever larger and software-defined storage advancing rapidly, the industry is having to reevaluate the way data is protected on disk and across the nodes of a software-defined solution. Three data protection mechanisms are commonly used in storage systems today:
- Redundant Array of Inexpensive Disks (RAID)
- Erasure Coding
- Replication
How does each of these data protection solutions work for storage systems? What are the advantages and disadvantages of each? Let's take a closer look at the data protection mechanisms commonly implemented in storage systems today.
Redundant Array of Inexpensive Disks (RAID)
Anyone who has been around traditional storage systems or server architecture for the past few decades is familiar with Redundant Array of Inexpensive Disks, or RAID. RAID groups a relatively small number of disks into a storage set and, by spreading parity information across the disks, is able to recreate data in the event of a disk failure. The RAID levels commonly implemented today, which provide a good mix of performance and reliability, include:
- RAID1 – Known as a "disk mirror". Needs at least two drives. 2x write penalty (every write lands on both disks).
- RAID5 – Needs at least three drives to create a RAID5 array. Can withstand the loss of one drive. 4x write penalty.
- RAID6 – Needs at least four disks to create a RAID6 array. Can withstand the loss of two drives. 6x write penalty.
- RAID10 – Needs at least four disks. Combines striping and mirroring; there is no parity, and each write is simply mirrored to a replica. 2x write penalty.
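To make the parity idea concrete, here is a minimal sketch of RAID5-style XOR parity. This is illustrative only; real controllers rotate parity across all member disks and work at the stripe level.

```python
# Hypothetical sketch: RAID5-style parity using XOR.
# Each "disk" holds an equal-length block of bytes; the parity block is
# the XOR of all data blocks, so any single lost block can be rebuilt
# from the surviving blocks plus parity.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

data_disks = [b"AAAA", b"BBBB", b"CCCC"]  # three data disks
parity = xor_blocks(data_disks)           # the parity disk

# Simulate losing disk 1 and rebuilding it from the survivors + parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_disks[1]  # the lost block is fully recovered
```

The same XOR property is why a RAID5 write is expensive: updating one block requires reading the old data and old parity, then writing new data and new parity.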
RAID parity is commonly paired with CRC error checking, which helps ensure there is no data loss due to corruption. For performance, today's RAID arrays generally place some form of cache in front of the array, allowing write operations to be acknowledged before the RAID parity calculations are performed. It is worth noting, however, that the RAID controller itself can become both a performance bottleneck and an additional point of failure.
Another consideration with today's RAID arrays is the size of the disks. The available sizes of modern disks have grown exponentially, with consumer drives today topping out at a whopping 12TB! This matters for RAID arrays because disk size directly affects rebuild times when disks fail. In days gone by, RAID arrays with much smaller member disks could rebuild a failed disk in minutes to hours. With multi-terabyte drives, however, rebuild times can potentially be measured in days or even weeks!
These excessively long rebuild times are unacceptable in most enterprise data center environments. During a rebuild you are even more exposed to another drive failure or to data corruption, because a RAID array that is rebuilding is said to be in a degraded state. In a degraded array, performance also suffers, since the surviving drives must be read continuously to reconstruct the failed drive's data.
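The impact of drive size on rebuild time is simple arithmetic: the array must reconstruct the failed disk's entire capacity at whatever rate it can sustain while still serving production I/O. A rough sketch, using illustrative throughput figures (assumptions, not measurements):

```python
# Back-of-the-envelope rebuild-time estimate.
# A rebuild must regenerate the full capacity of the failed disk,
# throttled to whatever rate the array can spare alongside production I/O.

def rebuild_hours(disk_tb, rebuild_mb_per_s):
    """Hours to rebuild a disk of disk_tb terabytes at a given MB/s."""
    total_mb = disk_tb * 1_000_000  # 1 TB ~ 1,000,000 MB (decimal units)
    return total_mb / rebuild_mb_per_s / 3600

# A small legacy disk vs. a large modern disk on a busy array:
print(round(rebuild_hours(0.5, 100), 1))  # 500 GB at 100 MB/s -> 1.4 hours
print(round(rebuild_hours(12, 30), 1))    # 12 TB at 30 MB/s -> 111.1 hours (~4.6 days)
```

The exact numbers depend entirely on the array and its workload, but the trend is clear: capacity has grown far faster than the rebuild rates arrays can sustain.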
RAID certainly has been a staple for the enterprise datacenter for the past few decades. However, it is clear that as drive capacities increase and as newer software defined workloads are utilized in the datacenter, RAID as a data protection technology for storage systems is becoming less practical and does not scale very well. Let’s look at the next data protection technology for protecting storage systems and data – erasure coding.
Erasure Coding for Storage System Data Protection
If you have delved into the world of software-defined storage, you have most likely run across the term erasure coding. What is erasure coding and how does it work? The term "erasure code" refers to any scheme that encodes and partitions data into fragments in a way that allows the data to be recovered even when some fragments are missing. At the risk of confusing the comparison, RAID is itself a type of erasure code: RAID5 parity is a simple erasure code built on bitwise parity.
Many of the well-known erasure codes are Reed-Solomon error-correcting codes, developed by Irving S. Reed and Gustave Solomon in 1960. These codes use polynomial arithmetic over finite (Galois) fields to compute the redundant fragments that allow data to be reconstructed.
With erasure coding, data is often encoded with a 10/16 ratio: every 10 fragments of data are encoded into 16 fragments, which allows any 6 fragments to be lost before the data becomes unrecoverable. Because erasure coding encodes all of the data, any sufficiently large subset of the remaining fragments can be used to recover the missing ones.
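The overhead arithmetic behind such a scheme can be sketched with a generic "k data + m parity" model (this is the general shape of an erasure-coded layout, not any particular product's implementation):

```python
# Storage overhead vs. fault tolerance for a k+m erasure code.
# k data fragments plus m parity fragments are stored across k+m
# locations; the layout survives the loss of any m fragments.

def ec_overhead(k, m):
    """Ratio of raw storage consumed to usable data stored."""
    return (k + m) / k

# The 10/16 scheme from the text: 10 data + 6 parity fragments.
print(ec_overhead(10, 6))  # 1.6 -> 1.6x raw storage, tolerates 6 lost fragments

# Compare: 3-way replication (1 original + 2 copies) also tolerates
# losing any 2 copies, but costs 3x the raw storage.
print(ec_overhead(1, 2))   # 3.0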
Erasure coding as we are describing here is generally implemented with what is called Redundant Array of Independent Nodes or RAIN. So, here we are moving beyond a simple RAID array inside a single host and thinking more in terms of scale out or software defined systems which span multiple hosts. Software defined storage solutions like VMware vSAN employ erasure coding to protect data between nodes.
Erasure coding is much better suited to scale-out systems; however, it does come with penalties in CPU overhead and disk writes. These traditional limitations have been offset by the power of today's modern CPUs, whose instruction sets such as SSSE3 and AVX2 make erasure code operations extremely efficient. One of the main benefits of erasure coding is its better space efficiency compared to replication. These space efficiency benefits come at the price of write amplification, however.
Replication for Data Protection
In the context of protecting storage systems, replication synchronously creates copies of the data across different locations in the storage system on every write operation. Replication can be deployed in a RAIN architecture, as mentioned earlier. If data loss happens, the data can be recreated from another replica copy. Replication has advantages of its own: it is less CPU-intensive than erasure coding and offers faster rebuilds. Write operations are simple, and read performance can be boosted since data can be pulled from more than one location. The downside is that replication requires at least 2x the space, though this cost can be offset to some degree with compression and deduplication.
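As a minimal illustration of the behavior described above, here is a toy in-memory model of synchronous replication (an assumed API for illustration, not any specific product): a write is acknowledged only after every replica has stored the data, and a read can be served by any surviving replica.

```python
# Toy sketch of synchronous replication across nodes.
# Each "node" is modeled as a dict; a write lands on every node before
# it is considered complete, and a read succeeds as long as at least
# one replica survives.

class ReplicatedStore:
    def __init__(self, replicas=2):
        self.nodes = [dict() for _ in range(replicas)]  # one dict per node

    def write(self, key, value):
        for node in self.nodes:          # synchronous: all copies, every write
            node[key] = value

    def read(self, key, failed=()):
        for i, node in enumerate(self.nodes):
            if i not in failed and key in node:
                return node[key]         # any surviving replica will do
        raise KeyError(key)

store = ReplicatedStore(replicas=3)      # 3x raw space cost
store.write("block42", b"payload")
print(store.read("block42", failed={0, 1}))  # still readable after 2 node losses
```

Note how simple both paths are compared to erasure coding: no encode/decode math on writes, and a rebuild is just a straight copy from a surviving replica.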
Today's high-performance and software-defined storage workloads found in the latest enterprise data centers require looking beyond traditional techniques to protect data in storage systems. While RAID has proven to be a reliable means of data protection, it has been outpaced by ever-growing hard disk sizes and by the need to scale workloads across multiple nodes or even between sites. Looking at the differences between RAID, erasure coding, and replication, you can see that each data protection technology has its own advantages and disadvantages. Organizations today have to look at their specific workloads and choose the data protection technology that fits the use case. Regardless of the protection used at the storage layer, you should still consider a data protection software solution that can protect the production workloads running on top of the technologies mentioned in this article. This allows you to withstand data loss from disaster events that are not hardware related, such as ransomware attacks. Capable data protection solutions such as Vembu BDR Suite protect workloads using 3-2-1 backup methodologies, allowing you to maintain business continuity regardless of the disaster event.
Experience modern data protection with the latest Vembu BDR Suite v3.8.0 FREE edition. Try the 30-day free trial here: https://www.vembu.com/vembu-bdr-suite-download/