The order in which you mirror and RAID matters in terms of reliability and failure modes... And when you are building clusters with 80-160 spindles like I am, these probabilities matter.

So let's assume the cheap consumer 1TB drives I'm buying have an MTBF of about 3 years. That means I've got a 50% chance that a new drive will fail in the next 150 weeks. If the drive failures are evenly distributed over this time, then the chance of a single drive failing in any given week is 1/300 or 0.33%. I'm using a week here since it is reasonable to assume that bad drives will be replaced within a week of failure.

Now let's look at the RAID models. All percentages are the chance that in a given week I'll have a failure which causes data loss.

  Single Disk:                             1/300                              = 0.33%
  RAID1  - 1 mirror:                       (1/300)^2 = 1/90000                = 0.001%
  RAID10 - 40 mirrors:                     (1/300)^2 * 40                     = 0.04%
  RAID10 - 80 mirrors:                     (1/300)^2 * 80                     = 0.09%
  RAID6  - 40 disks:                       (1/300)^3 * 40 * 39 * 38           = 0.22%
  RAID6  - 80 disks:                       (1/300)^3 * 80 * 79 * 78           = 1.83%
  RAID61 - 40 RAID6 disks mirrored:        ((1/300)^3 * 40 * 39 * 38)^2       = 0.0005%
  RAID61 - 80 RAID6 disks mirrored:        ((1/300)^3 * 80 * 79 * 78)^2       = 0.03%
  RAID16 - RAID6 of 40 RAID1 mirrors:      ((1/300)^2)^3 * 40 * 39 * 38       = 0.000000008%
  RAID16 - RAID6 of 80 RAID1 mirrors:      ((1/300)^2)^3 * 80 * 79 * 78       = 0.00000007%

My stats are a bit rusty so please double check the above formulas ;-)

Assuming my numbers are right, RAID16 is *much* better than RAID61 ;-) But to my surprise, RAID61 is better than RAID10 in terms of data reliability.

That said, I don't think RAID61 is good for my application, because the chance of having half the mirror fail is too high. 1.83%/week means that on average I'll have 1 failure per cluster per year which requires a re-sync of 80TB across the network :-( While RAID10 is less reliable overall, its failures are always at the RAID1 mirror level and only require me to re-sync 1TB.
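For anyone who wants to double check the figures, here is a minimal Python sketch that evaluates the formulas exactly as written above. The function names are made up for illustration, and the 1/300 weekly failure chance is the assumption from this post, not a measured value. Note that the RAID6 factor n * (n-1) * (n-2) counts ordered triples of disks; counting unordered triples instead would divide those results by 6, so treat the output as a check on the arithmetic only, not on the model.

```python
from fractions import Fraction

# Assumption taken from the post: each drive has a 1/300 chance of
# failing in any given week.
P_DRIVE = Fraction(1, 300)


def raid1(p):
    """Both halves of one mirror fail in the same week."""
    return p ** 2


def raid10(p, mirrors):
    """Any one of `mirrors` RAID1 pairs is lost (first-order sum)."""
    return raid1(p) * mirrors


def raid6(p, members):
    """Three members fail, using the post's n*(n-1)*(n-2) factor."""
    return p ** 3 * members * (members - 1) * (members - 2)


def raid61(p, members):
    """Two mirrored RAID6 arrays are both lost."""
    return raid6(p, members) ** 2


def raid16(p, mirrors):
    """RAID6 whose members are RAID1 mirrors."""
    return raid6(raid1(p), mirrors)


if __name__ == "__main__":
    rows = [
        ("Single disk", P_DRIVE),
        ("RAID1, 1 mirror", raid1(P_DRIVE)),
        ("RAID10, 40 mirrors", raid10(P_DRIVE, 40)),
        ("RAID10, 80 mirrors", raid10(P_DRIVE, 80)),
        ("RAID6, 40 disks", raid6(P_DRIVE, 40)),
        ("RAID6, 80 disks", raid6(P_DRIVE, 80)),
        ("RAID61, 40 disks per side", raid61(P_DRIVE, 40)),
        ("RAID61, 80 disks per side", raid61(P_DRIVE, 80)),
        ("RAID16, 40 mirrors", raid16(P_DRIVE, 40)),
        ("RAID16, 80 mirrors", raid16(P_DRIVE, 80)),
    ]
    for name, prob in rows:
        # Print each weekly data-loss chance as a percentage.
        print(f"{name:28s} {float(prob) * 100:.3g}%")
```

Using Fraction keeps the intermediate values exact, so any difference from the table comes from rounding only.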
What I really want is RAID16, since it has incredible reliability along with the low-impact failure modes of RAID1/RAID10. Which brings me back to my original question...

Can I create a RAID6 array on top of DRBD devices?

I already have the DRBD mirrors set up and know how to create RAID10 using LVM. I can use mdadm to create a RAID5/6 array on top of DRBD volumes, and everything works on the initial node. The problem is that I can't assemble the array on the mirrored node :-/

Tim

On May 30, 2008, at 4:06 AM, drbd at bobich.net wrote:

> On Fri, 30 May 2008, Stefan Seifert wrote:
>
>> On Friday, 30. May 2008, drbd at bobich.net wrote:
>>> I think you'll find that you are. With RAID 51, you can lose up to
>>> 1 disk per side. With RAID 15 you could lose all disks on one side
>>> without even needing to fail over to the backup node.
>>
>> If you could lose all disks on one side in a RAID 15 without having
>> to fail over, why would you need to failover if one of the RAID 5s
>> in a RAID 51 fails due to two drives failing?
>
> That is a fair point, but with RAID 51 you can withstand much fewer
> disk failure combinations than with RAID 15.
>
>>> RAID isn't about speed, it's about fault tolerance, and RAID 15 is
>>> more fault tolerant than RAID 51.
>>
>> So in your RAID 15 you lose two hard drives of one node and before
>> being able to replace it the other node goes down because of a
>> failing power supply or whatever. Your cluster's down.
>> On RAID 51 you lose two hard drives of one node and then the other
>> node goes down. Your cluster's down, too. No difference here.
>
> You're throwing in PSU failures here which are out of scope for
> RAID. It's not a reasonable comparison.
>
>> I've played it through with many other cases. In each of them I get
>> exactly the same characteristics. The only difference is whether
>> the RAID 1 or RAID 5 fails first which makes no difference at all
>> on the cluster's status.
>
> It does. With RAID 15 you can tolerate failures of 1/2 of the disks
> as long as you don't lose more than one mirror set altogether.
>
> With RAID 51 you can tolerate failure of 1/2 of the disks only if
> they are all on the same machine - which won't happen because that
> machine will be down from the moment the 2nd disk fails.
>
> RAID 15 will yield better uptimes than RAID 51. If I have a moment
> I'll post an equation for it.
>
> Gordan
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
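
The quoted claim that RAID 51 withstands fewer disk failure combinations than RAID 15 can be checked by brute force on a small layout. The sketch below is an illustration only, under stated assumptions: two nodes with 4 disks each (a hypothetical size), RAID 15 taken as a RAID5 over cross-node mirror pairs that survives while at most one pair has lost both disks, and RAID 51 taken as a mirror of two per-node RAID5 arrays that survives while at least one side has lost at most one disk. Node outages, failover and rebuild windows are ignored, and the function names are invented for this example.

```python
from itertools import product


def raid15_survives(failed, disks_per_side):
    """RAID5 over mirror pairs: at most one pair may lose both disks."""
    dead_pairs = sum(
        1
        for slot in range(disks_per_side)
        if (0, slot) in failed and (1, slot) in failed
    )
    return dead_pairs <= 1


def raid51_survives(failed, disks_per_side):
    """Mirror of two RAID5 arrays: one side must have at most 1 failure."""
    per_side = [
        sum(1 for slot in range(disks_per_side) if (side, slot) in failed)
        for side in (0, 1)
    ]
    return min(per_side) <= 1


def count_survivable(disks_per_side):
    """Count the failure sets (including 'no failures') each layout survives."""
    disks = [(side, slot) for side in (0, 1) for slot in range(disks_per_side)]
    raid15 = raid51 = 0
    # Enumerate every subset of failed disks.
    for mask in product([False, True], repeat=len(disks)):
        failed = {disk for disk, is_failed in zip(disks, mask) if is_failed}
        raid15 += raid15_survives(failed, disks_per_side)
        raid51 += raid51_survives(failed, disks_per_side)
    return raid15, raid51


if __name__ == "__main__":
    r15, r51 = count_survivable(4)
    print(f"RAID 15 survives {r15} of 256 failure sets")
    print(f"RAID 51 survives {r51} of 256 failure sets")
```

Under these assumptions every failure set that RAID 51 survives is also survived by RAID 15, since a side with at most one failed disk can leave at most one mirror pair fully dead. The counts say nothing about how likely each set is, so this is a comparison of combinations, not of uptime.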