[DRBD-user] Trouble getting RAID5 on top of DRBD to work..

Tim Nufire drbd-user_tim at ibink.com
Fri May 30 18:54:05 CEST 2008


The order in which you mirror and RAID matters in terms of reliability
and failure modes. And when you are building clusters with 80-160
spindles, like I am, these probabilities matter.

So let's assume the cheap consumer 1TB drives I'm buying have an MTBF
of about 3 years. That means I've got a 50% chance that a new drive
will fail in the next 150 weeks. If the drive failures are evenly
distributed over this time, then the chance of a single drive failing
in any given week is 1/300 or 0.33%. I'm using a week here since it is
reasonable to assume that bad drives will be replaced within a week of
failure.

Now let's look at the RAID models... All percentages are the chance  
that in a given week I'll have a failure which causes data loss.

Single Disk: 1/300 = 0.33%

RAID1  - 1 mirror: (1/300)^2 = 1/90000 = 0.001%

RAID10 - 40 mirrors: (1/300)^2 * 40 = 0.04%
RAID10 - 80 mirrors: (1/300)^2 * 80 = 0.09%

RAID6  - 40 disks: (1/300)^3 * 40 * 39 * 38 = 0.22%
RAID6  - 80 disks: (1/300)^3 * 80 * 79 * 78 = 1.83%

RAID61 - 40 RAID6 disks mirrored: ((1/300)^3 * 40 * 39 * 38)^2 = 0.0005%
RAID61 - 80 RAID6 disks mirrored: ((1/300)^3 * 80 * 79 * 78)^2 = 0.03%

RAID16 - RAID6 of 40 RAID1 mirrors: ((1/300)^2)^3 * 40 * 39 * 38 = 0.000000008%
RAID16 - RAID6 of 80 RAID1 mirrors: ((1/300)^2)^3 * 80 * 79 * 78 = 0.00000007%

My stats are a bit rusty so please double check the above formulas ;-)
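
As a cross-check, here is a minimal Python sketch that evaluates the
formulas exactly as written above. The function names are made up for
illustration, and the 1/300 weekly failure chance is the assumption from
this mail, not a measured value. One caveat: n * (n-1) * (n-2) counts
ordered triples of disks, so it is 6 times the number of distinct
three-disk sets, C(n, 3). The sketch includes that variant as well so the
two can be compared.

```python
from math import comb

P_WEEK = 1 / 300  # per-drive weekly failure chance assumed in the text


def raid1(p=P_WEEK):
    """Both halves of one mirror fail in the same week."""
    return p ** 2


def raid10(mirrors, p=P_WEEK):
    """Any one of `mirrors` mirrors is lost (simple sum over mirrors)."""
    return mirrors * raid1(p)


def raid6(units, p=P_WEEK):
    """Three of `units` members fail, counted as in the formula above
    (ordered triples: n * (n-1) * (n-2))."""
    return p ** 3 * units * (units - 1) * (units - 2)


def raid6_unordered(units, p=P_WEEK):
    """Same event, but counting each set of three members once: C(n, 3)."""
    return p ** 3 * comb(units, 3)


def raid61(disks, p=P_WEEK):
    """Both RAID6 halves of the mirror are lost in the same week."""
    return raid6(disks, p) ** 2


def raid16(mirrors, p=P_WEEK):
    """RAID6 whose members are RAID1 mirrors instead of single disks."""
    return raid6(mirrors, raid1(p))


rows = [
    ("Single disk", P_WEEK),
    ("RAID1", raid1()),
    ("RAID10, 40 mirrors", raid10(40)),
    ("RAID10, 80 mirrors", raid10(80)),
    ("RAID6, 40 disks", raid6(40)),
    ("RAID6, 80 disks", raid6(80)),
    ("RAID6, 80 disks, C(n,3)", raid6_unordered(80)),
    ("RAID61, 2 x 40 disks", raid61(40)),
    ("RAID61, 2 x 80 disks", raid61(80)),
    ("RAID16, 40 mirrors", raid16(40)),
    ("RAID16, 80 mirrors", raid16(80)),
]
for name, value in rows:
    # Print as a percentage, matching the list above.
    print(f"{name:<26} {value * 100:.10f}%")
```

Switching to C(n, 3) divides every RAID6-based figure by 6 (and the
RAID61 figures by 36), which changes the magnitudes but not the ranking
of the layouts.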

Assuming my numbers are right, RAID16 is *much* better than RAID61 ;-)
But to my surprise, RAID61 is better than RAID10 in terms of data
reliability. That said, I don't think RAID61 is good for my
application because the chance of having half the mirror fail is too
high. 1.83%/week means that on average I'll have about 1 failure per
cluster per year (1.83% * 52 weeks), each of which requires a re-sync
of 80TB across the network :-( While RAID10 is less reliable overall,
its failures are always at the RAID1 mirror level and only require me
to re-sync 1TB.
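
The step from a weekly chance to "about 1 failure per year" can be
sketched as follows. This is only an illustration under the same
assumptions as above (1/300 per drive per week, the RAID6 formula as
written in this mail); the constant and function names are hypothetical.

```python
WEEKS_PER_YEAR = 52
P_WEEK = 1 / 300      # per-drive weekly failure chance from the text
RAID6_HALF_TB = 80    # size of one RAID6 half of the RAID61 mirror, in TB
RAID1_PAIR_TB = 1     # size of one RAID1 mirror in the RAID10 layout, in TB


def raid6_loss_per_week(disks, p=P_WEEK):
    """Weekly chance of losing a RAID6 set, using the formula from the text."""
    return p ** 3 * disks * (disks - 1) * (disks - 2)


def events_per_year(p_week):
    """Expected number of events per year for a given weekly chance."""
    return p_week * WEEKS_PER_YEAR


half_losses = events_per_year(raid6_loss_per_week(80))
print(f"RAID61: a RAID6 half is lost about {half_losses:.2f} times/year, "
      f"re-syncing {RAID6_HALF_TB} TB each time")
print(f"RAID10: each failure re-syncs only {RAID1_PAIR_TB} TB")
```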

What I really want is RAID16 since it has incredible reliability along
with the low impact failure modes of RAID1/RAID10. Which brings me
back to my original question: can I create a RAID6 array on top of
DRBD devices? I already have the DRBD mirrors set up and know how to
create RAID10 using LVM. I can use mdadm to create a RAID5/6 array on
top of DRBD volumes and everything works on the initial node. The
problem is that I can't assemble the array on the mirrored node :-/

Tim

On May 30, 2008, at 4:06 AM, drbd at bobich.net wrote:

>
>
> On Fri, 30 May 2008, Stefan Seifert wrote:
>
>> On Friday, 30. May 2008, drbd at bobich.net wrote:
>>> I think you'll find that you are. With RAID 51, you can lose up to
>>> 1 disk per side. With RAID 15 you could lose all disks on one side
>>> without even needing to fail over to the backup node.
>>
>> If you could lose all disks on one side in a RAID 15 without having
>> to fail over, why would you need to fail over if one of the RAID 5s
>> in a RAID 51 fails due to two drives failing?
>
> That is a fair point, but with RAID 51 you can withstand far fewer
> disk failure combinations than with RAID 15.
>
>>> RAID isn't about speed, it's about fault tolerance, and RAID 15 is
>>> more fault tolerant than RAID 51.
>>
>> So in your RAID 15 you lose two hard drives of one node and before
>> being able to replace them the other node goes down because of a
>> failing power supply or whatever. Your cluster's down.
>> On RAID 51 you lose two hard drives of one node and then the other
>> node goes down. Your cluster's down, too. No difference here.
>
> You're throwing in PSU failures here which are out of scope for
> RAID. It's not a reasonable comparison.
>
>> I've played it through with many other cases. In each of them I get
>> exactly the same characteristics. The only difference is whether the
>> RAID 1 or RAID 5 fails first, which makes no difference at all to the
>> cluster's status.
>
> It does. With RAID 15 you can tolerate failures of 1/2 of the disks
> as long as you don't lose more than one mirror set altogether.
>
> With RAID 51 you can tolerate failure of 1/2 of the disks only if  
> they are all on the same machine - which won't happen because that  
> machine will be down from the moment the 2nd disk fails.
>
> RAID 15 will yield better uptimes than RAID 51. If I have a moment  
> I'll post an equation for it.
>
> Gordan
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user



