Sunday 3 December 2023

Redundant Array of Independent Disks (RAID)

RAID, which stands for "Redundant Array of Independent Disks," is a way of combining several disks so that they perform better, tolerate failures, or both. It spreads data across different disks, which speeds things up because several disks can work at the same time. In the RAID levels that store redundant data, if one disk fails, the information on the other disks can be used to reconstruct the lost data.

 


The most common RAID types are

1. RAID 0 (striping),

2. RAID 1 (mirroring) and its variants,

3. RAID 5 (distributed parity), and

4. RAID 6 (dual parity)

 

RAID 0 (Striping)

Data striping is a way of spreading information across many disks or other storage devices. It makes data access faster because different disks can be used at the same time.

 

For instance, assume your computer has 4 disks, and imagine a file called a.txt divided into 9 parts (a1, a2, a3, a4, a5, a6, a7, a8, and a9). These parts are stored one after another across all four disks: a1 on disk 1, a2 on disk 2, a3 on disk 3, a4 on disk 4, then a5 on disk 1 again, and so on. You can picture the layout as in the diagram below.



In RAID 0 there is no redundant copy of the data, so if one disk fails, you can't get the whole file back. With n disks, RAID 0 can read and write at up to n times the speed of a single drive, but it provides no data redundancy.
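
The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration: the function name stripe and the in-memory lists standing in for disks are assumptions made for this example, not part of any real RAID tool.

```python
def stripe(parts, num_disks):
    """Distribute parts across disks in round-robin order (RAID 0 style)."""
    disks = [[] for _ in range(num_disks)]
    for index, part in enumerate(parts):
        # Part 0 goes to disk 0, part 1 to disk 1, ... then wrap around.
        disks[index % num_disks].append(part)
    return disks


parts = [f"a{i}" for i in range(1, 10)]  # a1 .. a9, as in the a.txt example
disks = stripe(parts, 4)

for number, contents in enumerate(disks, start=1):
    print(f"Disk {number}: {contents}")

# Simulate losing disk 2: its parts are stored nowhere else,
# so the file can no longer be reassembled.
lost = disks[1]
print("Lost together with disk 2:", lost)
```

With 4 disks, disk 1 ends up with a1, a5, and a9, while the other disks hold two parts each. Losing any single disk removes parts that exist nowhere else, which is why RAID 0 offers no protection.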

 

RAID 1 (Mirroring)

RAID 1 maintains an exact copy of the data on two or more disks. The classic example is a mirrored pair: two disks, each holding the same content. This setup doesn't use parity or spread data across multiple disks. Instead, it simply copies the same data onto all the disks in the group. The usable capacity of the whole array is limited to the size of the smallest disk in it.

 


 

RAID-1 keeps working as long as there is at least one working drive in the group.
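
As a rough illustration of mirroring, here is a toy model in Python. The class Mirror and its methods are hypothetical names invented for this sketch, and disks are modelled as plain dictionaries. It shows the two properties mentioned above: capacity is bounded by the smallest disk, and reads succeed while at least one disk survives.

```python
class Mirror:
    """Toy RAID 1 array: every write goes to every working disk."""

    def __init__(self, disk_sizes):
        # Usable capacity (in blocks) is limited by the smallest disk.
        self.capacity = min(disk_sizes)
        self.disks = [dict() for _ in disk_sizes]
        self.failed = set()

    def write(self, block_number, data):
        if not 0 <= block_number < self.capacity:
            raise IndexError("block number outside the array")
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block_number] = data

    def fail(self, disk_index):
        # Simulate a disk crash: its contents are gone.
        self.failed.add(disk_index)
        self.disks[disk_index].clear()

    def read(self, block_number):
        # Any surviving disk can serve the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block_number]
        raise RuntimeError("all disks in the mirror have failed")


mirror = Mirror([500, 1000])   # two disks of different sizes
print(mirror.capacity)         # limited by the 500-block disk
mirror.write(0, "hello")
mirror.fail(0)                 # first disk crashes
print(mirror.read(0))          # still served by the second disk
```

The sketch leaves out what a real array also has to handle, such as resynchronising a replacement disk.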

 

RAID 4 (Block level striping and parity disk)

RAID 4 consists of block-level striping with a dedicated parity disk.

 

 

RAID 4 performs well on random reads. Random writes are slower, because every write must also update the parity, and all the parity information lives on one disk, which becomes a bottleneck.

 

To understand how a parity block helps recover lost data, consider three hard disks. Two of them store data, and the third stores the parity. Each parity block is computed by applying a bitwise X-OR operation to the corresponding data blocks. You can see this in the table below.

 

Data block on Disk1    Data block on Disk2    Parity block on Disk3
1010                   0101                   1111
1111                   1100                   0011
0000                   1011                   1011
0011                   1101                   1110

 

Parity for the first row data blocks

1010 (X-OR) 0101 = 1111

 

Parity for the second row data blocks

1111 (X-OR) 1100 = 0011

 

Parity for the third row data blocks

0000 (X-OR) 1011 = 1011

 

Parity for the fourth row data blocks

0011 (X-OR) 1101 = 1110
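
The same parity calculation can be reproduced with Python's bitwise X-OR operator (^). This is a minimal sketch using the four example rows; the helper name parity and the list names are assumptions made for illustration.

```python
def parity(*blocks):
    """X-OR all given blocks together to obtain the parity block."""
    result = 0
    for block in blocks:
        result ^= block
    return result


# The four data blocks on each disk, written as binary literals.
disk1 = [0b1010, 0b1111, 0b0000, 0b0011]
disk2 = [0b0101, 0b1100, 0b1011, 0b1101]

# One parity block per row, stored on the third disk.
disk3 = [parity(a, b) for a, b in zip(disk1, disk2)]

for a, b, p in zip(disk1, disk2, disk3):
    print(f"{a:04b} (X-OR) {b:04b} = {p:04b}")
```

Because parity accepts any number of blocks, the same helper would also work for arrays with more than two data disks.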

 

Scenario: assume Disk2 crashes

To recover the data of Disk2, we apply the X-OR operation to each data block on Disk1 and the corresponding parity block on Disk3.

 

Block 1 of Disk2 = block 1 of Disk1 (X-OR) parity block 1 of Disk3
                 = 1010 (X-OR) 1111
                 = 0101

Block 2 of Disk2 = block 2 of Disk1 (X-OR) parity block 2 of Disk3
                 = 1111 (X-OR) 0011
                 = 1100

Block 3 of Disk2 = block 3 of Disk1 (X-OR) parity block 3 of Disk3
                 = 0000 (X-OR) 1011
                 = 1011

Block 4 of Disk2 = block 4 of Disk1 (X-OR) parity block 4 of Disk3
                 = 0011 (X-OR) 1110
                 = 1101
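
The recovery steps above can be checked with a short Python sketch. The function name rebuild is hypothetical; it simply X-ORs the surviving blocks of each row, which for this three-disk example means Disk1 and the parity on Disk3.

```python
def rebuild(*surviving_disks):
    """Rebuild a lost disk by X-ORing the surviving blocks row by row."""
    recovered = []
    for row in zip(*surviving_disks):
        block = 0
        for value in row:
            block ^= value
        recovered.append(block)
    return recovered


disk1 = [0b1010, 0b1111, 0b0000, 0b0011]   # surviving data disk
disk3 = [0b1111, 0b0011, 0b1011, 0b1110]   # surviving parity disk

recovered_disk2 = rebuild(disk1, disk3)
for number, block in enumerate(recovered_disk2, start=1):
    print(f"Block {number} of Disk2 = {block:04b}")
```

The same approach works whichever single disk is lost: X-ORing the remaining two disks always yields the missing one, including the parity disk itself.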

 

 

RAID 5 (distributed parity)

RAID 5 uses block-level striping with distributed parity to protect against the failure of a single disk. Unlike RAID 4, where all the parity information is stored on one dedicated drive, RAID 5 spreads the parity blocks across all the drives, so no single disk has to absorb every parity write.
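
To make the difference concrete, the following Python sketch prints where the parity block of each stripe lands with a dedicated parity disk (RAID 4) and with rotating parity (RAID 5). The exact rotation order differs between implementations; the rule used here (parity starts on the last disk and moves one disk to the left per stripe) is only one possible choice, and all function names are hypothetical.

```python
def parity_disk(stripe, num_disks, distributed):
    """Return the index of the disk holding parity for a given stripe."""
    if distributed:
        # Assumed rotation for this sketch: last disk first, then move left.
        return (num_disks - 1 - stripe) % num_disks
    # Dedicated parity: always the last disk.
    return num_disks - 1


def layout(num_disks, num_stripes, distributed):
    """Build a grid of 'D' (data) and 'P' (parity), one row per stripe."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks, distributed)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


def parity_count_per_disk(rows):
    """Count how many parity blocks each disk holds."""
    return [sum(row[disk] == "P" for row in rows) for disk in range(len(rows[0]))]


raid4 = layout(4, 8, distributed=False)
raid5 = layout(4, 8, distributed=True)

for name, rows in (("RAID 4", raid4), ("RAID 5", raid5)):
    print(name)
    for row in rows:
        print("  " + " ".join(row))
    print("  parity blocks per disk:", parity_count_per_disk(rows))
```

In the dedicated layout, one disk holds all eight parity blocks. In the distributed layout, each of the four disks holds two, so parity updates are shared evenly.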




RAID 6 (dual parity)

In RAID 5, if one disk fails, the data is still safe because each stripe has a parity block from which the missing data can be rebuilt. RAID 6 is safer still: by keeping two parity blocks per stripe, it can survive two disks failing at the same time.

 

However, RAID 6 uses space less efficiently than RAID 5: with two parity blocks per stripe, the equivalent of two disks' capacity, rather than one, is reserved for redundancy. In exchange, the array tolerates more failures.
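
The space trade-off is simple arithmetic, sketched here in Python. The sketch assumes all disks are the same size, and the function name usable_capacity is made up for illustration. With n disks, RAID 5 keeps n - 1 disks' worth of data and RAID 6 keeps n - 2.

```python
# Capacity (in whole disks) reserved for parity, and minimum disk counts.
PARITY_DISKS = {5: 1, 6: 2}
MIN_DISKS = {5: 3, 6: 4}


def usable_capacity(level, num_disks, disk_size):
    """Usable capacity of a RAID 5 or RAID 6 array of equal-size disks."""
    if level not in PARITY_DISKS:
        raise ValueError("this sketch only covers RAID 5 and RAID 6")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    return (num_disks - PARITY_DISKS[level]) * disk_size


num_disks, disk_size_tb = 6, 4   # example values: six 4 TB disks
for level in (5, 6):
    usable = usable_capacity(level, num_disks, disk_size_tb)
    efficiency = usable / (num_disks * disk_size_tb)
    print(f"RAID {level}: {usable} TB usable ({efficiency:.0%} of raw capacity)")
```

For six 4 TB disks this gives 20 TB usable under RAID 5 and 16 TB under RAID 6. The gap between the two shrinks, relatively speaking, as more disks are added.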

 

When it comes to how fast it can write data, RAID 6 is a bit slower than RAID 5 because it has to do more calculations with the extra parity. But when it comes to reading data, both RAID 5 and RAID 6 are usually about the same in terms of speed.

 

Does RAID replace a backup plan?

No, using RAID doesn't mean you can skip having a backup plan. RAID helps keep your data available if a disk fails, but it can't protect against other issues such as human mistakes, software problems, or viruses. A good backup plan should cover all these possibilities, and RAID can be one part of that plan.

 
