Every article that i posted related to dba only, here is a diversion .Here i am giving a good notes about RAID. As we are using raids in different purpose in IT industry , a dba should know atleast the basic functioning of RAID.Hope it will helps to somebody. :)
A Redundant Array of Independent Drives (or Disks), also known as Redundant Array of Inexpensive Drives (or Disks) (RAID) is an term for data storage schemes that divide and/or replicate data among multiple hard drives. RAID can be designed to provide increased data reliability or increased I/O performance, though one goal may compromise the other. Raid provide fault tolerance by stripping , mirroring or parity .
RAID can be implemented on a hardware or software level. On a hardware level, you can have hard disks connected to a RAID hardware controller, usually a special PC card. Your operating system then accesses storage through the RAID hardware controller. Alternatively, you can implement RAID as a software controller,IN a software RAID controller ,special program manage access to hard disks treated as RAID devices. The software version lets you use IDE hard disks as RAID disks. Linux uses the MD driver, supported in the 2.4 kernel, to implement a software RAID controller. Linux software RAID supports five levels (linear, 0, 1, 4, 5, and 6), whereas hardware RAID supports many more. Hardware RAID levels, such as 7–10, provide combinations of greater performance and reliability. Before you can use RAID on your system, make sure the RAID levels you want to use. If not, you will have for the kernel. Check the Multi-Driver Support component specify support for any or all of the RAID levels.
Note: we will get redundancy only when parity or mirroring is present on RAID.
Commonly used RAID levels for UNIX / Linux and Windows server
This level is achieved by grouping 2 or more hard disks into a single unit with the total size equaling that of the smallest of disks used. This is because RAID 1 keeps every bit of data replicated on each of its devices in the exactly same fashion, create identical clones. Hence the name, mirroring. Practical example: 2 disks, each 80GB in size can be used in a 80GB RAID 1 configuration.
Because of its configuration, RAID 1 reduced write performance, as every chunk of data
has to be written n times, on each of the paired devices. The read performance is identical to single disks. Redundancy is improved, as the normal operation of the system can be maintained as long as any one disk is functional. Of Course, the disks must be of equal size. If one disk is larger than another, your RAID device will be the size of the smallest disk.
If there are spare disks available, and if the system survived the crash, reconstruction of the mirror will immediately begin on one of the spare disks, after detection of the drive fault.
RAID 1 is suitable for data storage, especially with non-intensive I/O tasks.
RAID4 (striping with parity)
This RAID level is not used very often. It can be used on three or more disks. Instead of completely mirroring the information, it keeps parity information on one drive, and writes data to the other disks in a RAID-0 like way .If one of the data drives in array fails, the parity information can be used to reconstruct all data. However if more than one disks fail, whole of the data is lost.
The reason this level is not more frequently used, is because the parity information is kept on one drive. This information must be updated every time one of the other disks are written to. Thus, the parity disk will become a bottleneck, if it is not a lot faster than the other disks. However, if you just happen to have a lot of slow disks and a very fast one, this RAID level can be very useful.
In this configuration 25% of the combined dik space is used to store the parity information
RAID 10
IDE Drives
Serial ATA Drives
SCSI Drives
A Redundant Array of Independent Drives (or Disks), also known as Redundant Array of Inexpensive Drives (or Disks) (RAID) is an term for data storage schemes that divide and/or replicate data among multiple hard drives. RAID can be designed to provide increased data reliability or increased I/O performance, though one goal may compromise the other. Raid provide fault tolerance by stripping , mirroring or parity .
RAID can be implemented on a hardware or software level. On a hardware level, you can have hard disks connected to a RAID hardware controller, usually a special PC card. Your operating system then accesses storage through the RAID hardware controller. Alternatively, you can implement RAID as a software controller,IN a software RAID controller ,special program manage access to hard disks treated as RAID devices. The software version lets you use IDE hard disks as RAID disks. Linux uses the MD driver, supported in the 2.4 kernel, to implement a software RAID controller. Linux software RAID supports five levels (linear, 0, 1, 4, 5, and 6), whereas hardware RAID supports many more. Hardware RAID levels, such as 7–10, provide combinations of greater performance and reliability. Before you can use RAID on your system, make sure the RAID levels you want to use. If not, you will have for the kernel. Check the Multi-Driver Support component specify support for any or all of the RAID levels.
Note: we will get redundancy only when parity or mirroring is present on RAID.
RAID 0 (Striping)
This level is achieved by grouping 2 or more hard disks into a single unit with the total size equaling that of all disks used. Practical example: 3 disks, each 80GB in size can be used in a 240GB RAID 0 configuration. You can also use disk having different sizes.
RAID 0 works by breaking data into fragments and writing to all disk simultaneously. Read and write operations are improved as compared to single disk system, since the load is shared across many channel and are done in parallel on the disks.On the other hand, no single disk contains the entire information for any bit of data committed. This means that if one of the disks fails, the entire RAID is rendered inoperable, with unrecoverable loss of data.
However since there is no redundancy, it doesn't provide fault tolerance. If even one drive fails, data across the drive array will be lost.
RAID 0 is suitable for non-critical operations that require good performance, like the system partition or the /tmp partition where lots of temporary data is constantly written. It is not suitable for data storage.
Suggested application:- Video Production and Editing , Image Editing etc
RAID 1 (Mirroring)
This level is achieved by grouping 2 or more hard disks into a single unit with the total size equaling that of the smallest of disks used. This is because RAID 1 keeps every bit of data replicated on each of its devices in the exactly same fashion, create identical clones. Hence the name, mirroring. Practical example: 2 disks, each 80GB in size can be used in a 80GB RAID 1 configuration.
Because of its configuration, RAID 1 reduced write performance, as every chunk of data
has to be written n times, on each of the paired devices. The read performance is identical to single disks. Redundancy is improved, as the normal operation of the system can be maintained as long as any one disk is functional. Of Course, the disks must be of equal size. If one disk is larger than another, your RAID device will be the size of the smallest disk.
If there are spare disks available, and if the system survived the crash, reconstruction of the mirror will immediately begin on one of the spare disks, after detection of the drive fault.
RAID 1 is suitable for data storage, especially with non-intensive I/O tasks.
This RAID level is not used very often. It can be used on three or more disks. Instead of completely mirroring the information, it keeps parity information on one drive, and writes data to the other disks in a RAID-0 like way .If one of the data drives in array fails, the parity information can be used to reconstruct all data. However if more than one disks fail, whole of the data is lost.
The reason this level is not more frequently used, is because the parity information is kept on one drive. This information must be updated every time one of the other disks are written to. Thus, the parity disk will become a bottleneck, if it is not a lot faster than the other disks. However, if you just happen to have a lot of slow disks and a very fast one, this RAID level can be very useful.
RAID 5 (Stripping with distributed parity)
Raid 5 support both striping as well as redundancy in terms of parity. RAID 5 improves on RAID 4 by striping the parity data between all the disks in the RAID set. RAID-5 can be used on three or more disks, with zero or more spare-disks.
This is a more complex solution, with a minimum of three devices used. If one of the RAID 0 devices malfunctions, the array will continue operating, using the parity device as a backup. If spare disks are available, reconstruction will begin immediately after the device failure. If two disks fail simultaneously, all data are lost.Like raid 4 RAID-5 can survive one disk failure, but not two or more.
RAID 5 improves the write performance than raid 4, as well as redundancy and is useful in mission-critical scenarios, where both good throughput and data integrity are important.
In this configuration 25% of the combined dik space is used to store the parity information
And around 75% of the total disk capacity is available for use.
Linear RAID
This is a less common level, although fully usable. Linear is similar to RAID 0, except that data is written sequentially rather than in parallel. Linear RAID is a simple grouping of several devices into a larger volume, the total size of which is the sum of all members. For instance, three disks the sizes of 40, 60 and 250GB can be grouped into a linear RAID the total size of 350GB.
Linear RAID provides no read/write performance, not does it provide redundancy; a loss of any member will render the entire array unusable. It merely increases size. It's very similar to LVM. Linear RAID is suitable when large data exceeding the individual size of any disk or partition must be used.
RAID6
RAID-6 is an extension of RAID-5 to provide additional fault tolerance by using dual distributed parity schemes. Dual parity scheme helps survive two disk failures at a time without data loss. It extends RAID 5 by adding an additional parity block, thus it uses block-level striping with two parity blocks distributed across all member disks. The read performance is same as RAID 5. However, its write performance is poorer than RAID 5 due to overhead associated with the additional parity calculations. But it does better than RAID 5 on file data protection because RAID 6 provides protection against double disk failures and failures while a single disk is rebuilding. RAID 5 fails to do so.
RAID 10 or RAID 1+0 - Combination of RAID 0 (data striping) and RAID 1 (mirroring).
- RAID 10 is also called as RAID 1+0
- It is also called as “stripe of mirrors”
- It requires minimum of 4 disks
- To understand this better, group the disks in pair of two (for mirror). For example, if you have a total of 6 disks in RAID 10, there will be three groups–Group 1, Group 2, Group 3 as shown in the above diagram.
- Within the group, the data is mirrored. In the above example, Disk 1 and Disk 2 belongs to Group 1. The data on Disk 1 will be exactly same as the data on Disk 2. So, block A written on Disk 1 will be mirroed on Disk 2. Block B written on Disk 3 will be mirrored on Disk 4.
- Across the group, the data is striped. i.e Block A is written to Group 1, Block B is written to Group 2, Block C is written to Group 3.
- This is why it is called “stripe of mirrors”. i.e the disks within the group are mirrored. But, the groups themselves are striped.
RAID 01
- RAID 01 is also called as RAID 0+1
- It is also called as “mirror of stripes”
- It requires minimum of 3 disks. But in most cases this will be implemented as minimum of 4 disks.
- To understand this better, create two groups. For example, if you have total of 6 disks, create two groups with 3 disks each as shown below. In the above example, Group 1 has 3 disks and Group 2 has 3 disks.
- Within the group, the data is striped. i.e In the Group 1 which contains three disks, the 1st block will be written to 1st disk, 2nd block to 2nd disk, and the 3rd block to 3rd disk. So, block A is written to Disk 1, block B to Disk 2, block C to Disk 3.
- Across the group, the data is mirrored. i.e The Group 1 and Group 2 will look exactly the same. i.e Disk 1 is mirrored to Disk 4, Disk 2 to Disk 5, Disk 3 to Disk 6.
- This is why it is called “mirror of stripes”. i.e the disks within the groups are striped. But, the groups are mirrored.
Main difference between RAID 10 vs RAID 01
- Performance on both RAID 10 and RAID 01 will be the same.
- The storage capacity on these will be the same.
- The main difference is the fault tolerance level. On most implememntations of RAID controllers, RAID 01 fault tolerance is less. On RAID 01, since we have only two groups of RAID 0, if two drives (one in each group) fails, the entire RAID 01 will fail. In the above RAID 01 diagram, if Disk 1 and Disk 4 fails, both the groups will be down. So, the whole RAID 01 will fail.
- RAID 10 fault tolerance is more. On RAID 10, since there are many groups (as the individual group is only two disks), even if three disks fails (one in each group), the RAID 10 is still functional. In the above RAID 10 example, even if Disk 1, Disk 3, Disk 5 fails, the RAID 10 will still be functional.
- So, given a choice between RAID 10 and RAID 01, always choose RAID 10.
Specially built hardware-based RAID disk controllers are available for both IDE and SCSI drives. They usually have their own BIOS, so you can configure them right after your system's the power on self test (POST). Hardware-based RAID is transparent to your operating system; the hardware does all the work.
If hardware RAID isn't available, then you should be aware of these basic guidelines to follow when setting up software RAID.
To save costs, many small business systems will probably use IDE disks, but they do have some limitations.
- The total length of an IDE cable can be only a few feet long, which generally limits IDE drives to small home systems.
- IDE drives do not hot swap. You cannot replace them while your system is running.
- Only two devices can be attached per controller.
- The performance of the IDE bus can be degraded by the presence of a second device on the cable.
- The failure of one drive on an IDE bus often causes the malfunctioning of the second device. This can be fatal if you have two IDE drives of the same RAID set attached to the same cable.
For these reasons, I recommend you use only one IDE drive per controller when using RAID, especially in a corporate environment.
Serial ATA Drives
Serial ATA type drives are rapidly replacing IDE drives as the preferred entry level disk storage option because of a number of advantages:
- The drive data cable can be as long as 1 meter in length versus IDE's 18 inches.
- Serial ATA has better error checking than IDE.
- There is only one drive per cable which makes hot swapping, or the capability to replace components while the system is still running, possible without the fear of affecting other devices on the data cable.
- There are no jumpers to set on Serial ATA drives to make it a master or slave which makes them simpler to configure.
- IDE drives have a 133Mbytes/s data rate whereas the Serial ATA specification starts at 150 Mbytes/sec with a goal of reaching 600 Mbytes/s over the expected ten year life of the specification.
If you can't afford more expensive and faster SCSI drives, Serial ATA would be the preferred device for software and hardware RAID
SCSI hard disks have a number of features that make them more attractive for RAID use than either IDE or Serial ATA drives.
- SCSI controllers are more tolerant of disk failures. The failure of a single drive is less likely to disrupt the remaining drives on the bus.
- SCSI cables can be up to 25 meters long, making them suitable for data center applications.
- Much more than two devices may be connected to a SCSI cable bus. It can accommodate 7 (single-ended SCSI) or 15 (all other SCSI types) devices.
- Some models of SCSI devices support "hot swapping" which allows you to replace them while the system is running.
- SCSI currently supports data rates of up to 640 Mbytes/s making them highly desirable for installations where rapid data access is imperative.
No comments:
Post a Comment