[DRBD-user] Trouble getting RAID5 on top of DRBD to work..

Lars Ellenberg lars.ellenberg at linbit.com
Fri May 30 20:19:10 CEST 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Fri, May 30, 2008 at 09:54:05AM -0700, Tim Nufire wrote:
> The order in which you mirror and RAID matters in terms of reliability
> and failure modes... And when you are building clusters with 80-160
> spindles like I am, these probabilities matter...
>
> So let's assume the cheap consumer 1TB drives I'm buying have an MTBF of
> about 3 years... That means I've got a 50% chance that a new drive will
> fail in the next 150 weeks. If the drive failures are evenly distributed
> over this time, then the chance of a single drive failing in any given
> week is 1/300, or 0.33%. I'm using a week here since it is reasonable to
> assume that bad drives will be replaced within a week of failure.
>
> Now let's look at the RAID models... All percentages are the chance that 
> in a given week I'll have a failure which causes data loss.
>
> Single Disk: 1/300 = 0.33%

Replication link: xyz % ?? -- you left the failure probability of the
replication link itself out of that list.

and if you lose the replication link, however briefly,
_all_ your RAID 1s (DRBD) are degraded at once.
now if _any_ two of them had already lost their local disk,
you have just lost the RAID6.
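
to put an illustrative number on that -- this is not a model of DRBD, just
the 1/300-per-week disk failure rate quoted above plus the assumption that
a failed local disk stays dead for up to a week before it is replaced --
here is a rough sketch of the chance that two or more legs are already
degraded at the moment the link drops:

    d = 1.0 / 300   # rough chance a given local disk is dead right now,
                    # assuming the quoted 1/300-per-week failure rate and
                    # up to a week before a failed disk gets replaced

    def p_two_or_more_degraded(n, d):
        # chance that at least two of n independent DRBD legs are already
        # running without their local disk
        return 1 - (1 - d)**n - n * d * (1 - d)**(n - 1)

    for n in (40, 80):
        print(f"{n} drbd devices: {100 * p_two_or_more_degraded(n, d):.2f}%")

that comes out to roughly 0.8% with 40 devices and about 3% with 80, at any
random moment.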

and: DRBD still does not support replication groups, so we cannot
guarantee write ordering over the full RAID6 set you want to build on
top of that. if you lose the replication link, then with 40 (80) drbds
there is a good chance that some of the component disks will have newer
data than others once you try to assemble the RAID6 set on top of them,
because some drbds will notice the link loss before the others do.

I think I'd set up several smallish local RAID6 sets, and then have one
drbd per logical data group.
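
for a feel of the numbers with that layout -- same simplified weekly model
as the figures quoted above, purely illustrative, assuming for example ten
8-disk RAID6 groups per node with one drbd on top of each:

    from math import comb

    p = 1.0 / 300                    # per-disk weekly failure probability from the quoted mail

    def raid6_weekly_loss(g, p):
        # chance that a g-disk RAID6 loses 3 or more members within one week
        return sum(comb(g, k) * p**k * (1 - p)**(g - k) for k in range(3, g + 1))

    groups, disks_per_group = 10, 8  # illustrative split of 80 disks per node
    one_side = raid6_weekly_loss(disks_per_group, p)
    both_sides = one_side ** 2       # local RAID6 and its drbd peer die in the same week
    print(f"one local {disks_per_group}-disk RAID6 losing data: {100 * one_side:.6f}%/week")
    print(f"any of the {groups} groups losing both copies: {100 * (1 - (1 - both_sides) ** groups):.12f}%/week")

and the important part is not so much the (tiny) loss probability, but that
any single failure only ever forces a resync of one group, not of the whole
set.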

BUT.
as the OP described his use case as an archive store, which is slowly
created and seldom, if ever, actually accessed,
you don't need synchronous replication there.
so I'd just skip the DRBD layer
and replace it with rsync or csync.

> RAID1  - 1 mirror: (1/300)^2 = 1/90000 = 0.001%
>
> RAID10 - 40 mirrors: (1/300)^2 * 40 = 0.04%
> RAID10 - 80 mirrors: (1/300)^2 * 80 = 0.08%
>
> RAID6  - 40 disks: (1/300)^3 * 40 * 39 * 38 = 0.22%
> RAID6  - 80 disks: (1/300)^3 * 80 * 79 * 78 = 1.83%
>
> RAID61 - 40 RAID6 disks mirrored: ((1/300)^3 * 40 * 39 * 38)^2 = 0.0005%
> RAID61 - 80 RAID6 disks mirrored: ((1/300)^3 * 80 * 79 * 78)^2 = 0.03%
>
> RAID16 - RAID6 of 40 RAID1 mirrors: ((1/300)^2)^3 * 40 * 39 * 38 =  
> 0.000000008%
> RAID16 - RAID6 of 80 RAID1 mirrors: ((1/300)^2)^3 * 80 * 79 * 78 =  
> 0.00000007%
>
> My stats are a bit rusty so please double check the above formulas ;-)
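
just to double-check the arithmetic (not the model): the n*(n-1)*(n-2)
products count ordered triples, so the triple-failure figures above come
out about 3! = 6 times too high. redone with binomial coefficients, under
the same independence and one-week assumptions, the RAID16 conclusion does
not change, but the RAID6 numbers shrink by that factor -- the 40-disk
RAID6, for instance, drops to roughly 0.04%/week, about the same as the
40-mirror RAID10. a small illustrative script:

    from math import comb

    p = 1.0 / 300                                  # per-disk weekly failure probability

    def p_at_least(k, n, q):
        # chance that k or more of n independent components fail,
        # each with probability q
        return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

    print(f"single disk:        {100 * p:.4f}%")
    print(f"RAID1 (one mirror): {100 * p * p:.4f}%")
    for n in (40, 80):
        raid6 = p_at_least(3, n, p)                # RAID6 dies when 3+ members fail
        print(f"RAID10, {n} mirrors: {100 * p_at_least(1, n, p * p):.4f}%")
        print(f"RAID6,  {n} disks:   {100 * raid6:.4f}%")
        print(f"RAID61, {n}+{n}:      {100 * raid6**2:.6f}%")
        print(f"RAID16, {n} mirrors: {100 * p_at_least(3, n, p * p):.10f}%")
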
>
> Assuming my numbers are right, RAID16 is *much* better than RAID61 ;-)  
> But to my surprise, RAID61 is better than RAID10 in terms of data  
> reliability. That said, I don't think RAID61 is good for my application 
> because the chance of having half the mirror fail is too high.... 
> 1.83%/week means that on average I'll have 1 failure per cluster per year,
> which requires a re-sync of 80TB across the network :-( While RAID10 is
> less reliable overall, its failures are always at the RAID1 mirror level
> and only require me to re-sync 1TB.
>
> What I really want is RAID16 since it has incredible reliability along  
> with the low impact failure modes of RAID1/RAID10... Which brings me  
> back to my original question... Can I create a RAID6 array on top of
> DRBD devices? I already have the DRBD mirrors set up and know how to
> create RAID10 using LVM. I can use mdadm to create a RAID5/6 array on  
> top of DRBD volumes and everything works on the initial node. The  
> problem is that I can't assemble the array on the mirrored node :-/
>
> Tim
>
> On May 30, 2008, at 4:06 AM, drbd at bobich.net wrote:
>
>>
>>
>> On Fri, 30 May 2008, Stefan Seifert wrote:
>>
>>> On Friday, 30. May 2008, drbd at bobich.net wrote:
>>>> I think you'll find that you are. With RAID 51, you can lose up to  
>>>> 1 disk
>>>> per side. With RAID 15 you could lose all disks on one side  
>>>> without even
>>>> needing to fail over to the backup node.
>>>
>>> If you could lose all disks on one side in a RAID 15 without having  
>>> to fail
>>> over, why would you need to fail over if one of the RAID 5s in a RAID
>>> 51 fails
>>> due to two drives failing?
>>
>> That is a fair point, but with RAID 51 you can withstand far fewer
>> disk failure combinations than with RAID 15.
>>
>>>> RAID isn't about speed, it's about fault tolerance, and RAID 15 is  
>>>> more
>>>> fault tolerant than RAID 51.
>>>
>>> So in your RAID 15 you lose two hard drives on one node, and before
>>> being able to replace them the other node goes down because of a failing
>>> power supply or whatever. Your cluster's down.
>>> On RAID 51 you lose two hard drives on one node and then the other node
>>> goes down. Your cluster's down, too. No difference here.
>>
>> You're throwing in PSU failures here, which are out of scope for RAID.
>> It's not a reasonable comparison.
>>
>>> I've played it through with many other cases. In each of them I get
>>> exactly the same characteristics. The only difference is whether the
>>> RAID 1 or RAID 5 fails first, which makes no difference at all to the
>>> cluster's status.
>>
>> It does. With RAID 15 you can tolerate failures of 1/2 of the disks as
>> long as you don't lose more than one mirror set altogether.
>>
>> With RAID 51 you can tolerate failure of 1/2 of the disks only if they 
>> are all on the same machine - which won't happen because that machine 
>> will be down from the moment the 2nd disk fails.
>>
>> RAID 15 will yield better uptimes than RAID 51. If I have a moment  
>> I'll post an equation for it.
>>
>> Gordan
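
on Gordan's last point: the difference in survivable failure combinations
can be counted directly. here is a small brute-force sketch, purely
illustrative -- 6 disks per node, with "RAID 15" meaning RAID5 across
cross-node mirror pairs and "RAID 51" meaning one RAID5 per node mirrored
across the nodes, as used in this sub-thread:

    from itertools import combinations

    N = 6                                  # disks per node, illustrative only
    disks = [(node, i) for node in ("a", "b") for i in range(N)]

    def raid15_dead(failed):
        # RAID5 across cross-node mirror pairs: data is gone once two or
        # more pairs have lost both of their legs
        dead_pairs = sum(1 for i in range(N)
                         if ("a", i) in failed and ("b", i) in failed)
        return dead_pairs >= 2

    def raid51_dead(failed):
        # one RAID5 per node, mirrored across nodes: data is gone once
        # both nodes have lost two or more of their own disks
        lost = {n: sum(1 for node, _ in failed if node == n) for n in ("a", "b")}
        return lost["a"] >= 2 and lost["b"] >= 2

    for k in range(2, 7):
        combos = [set(c) for c in combinations(disks, k)]
        print(f"{k} failed disks: RAID15 fatal in "
              f"{sum(map(raid15_dead, combos))}/{len(combos)} cases, "
              f"RAID51 fatal in {sum(map(raid51_dead, combos))}/{len(combos)}")

with four dead disks that gives 15 fatal combinations out of 495 for
RAID 15 versus 225 out of 495 for RAID 51, which puts numbers on Gordan's
point that RAID 51 survives far fewer failure combinations.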

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please don't Cc me, but send to list -- I'm subscribed


