[DRBD-user] Trouble getting RAID5 on top of DRBD to work..

drbd at bobich.net
Fri May 30 11:15:24 CEST 2008



On Fri, 30 May 2008, Christian Balzer wrote:

>> The thread "drbd8 and 80+ 1TB mirrors/cluster, can it be done?"
>> suggests using RAID5 or RAID6 on top of DRBD to improve redundancy...
>> A distributed RAID 15/16 :-)
>>
> No, what people were suggesting was to have RAID5 (or really RAID6
> if you use more than 20 or so drives per RAID) native with MD per node
> and DRBD on top of that.

Actually, no. That's RAID 51/61. I was talking about RAID 15/16, because 
in theory it provides better redundancy under random failures (there are 
more disk failure combinations that don't cause node failure). Same reason 
why RAID 10 is used rather than RAID 01.
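The combinatorics can be brute-forced for a toy setup - say 2 nodes with 4 
disks each (all counts invented purely for illustration). Here "RAID 51" 
means a local RAID5 per node with DRBD mirroring the two arrays, and 
"RAID 15" means per-disk DRBD mirrors with RAID5 assembled across the 
mirrored devices:

```python
from itertools import combinations

N = 4  # disks per node; toy example size
disks = [(node, i) for node in ("A", "B") for i in range(N)]

def raid51_survives(failed):
    # RAID 51: each node runs a local RAID5 (tolerates 1 failed disk);
    # the RAID1 across nodes survives while either node's array is intact.
    a = sum(1 for (node, _) in failed if node == "A")
    b = len(failed) - a
    return a <= 1 or b <= 1

def raid15_survives(failed):
    # RAID 15: disk i on node A is mirrored with disk i on node B;
    # a mirrored device dies only if both halves fail, and the top-level
    # RAID5 tolerates 1 dead mirrored device.
    dead_pairs = sum(1 for i in range(N)
                     if ("A", i) in failed and ("B", i) in failed)
    return dead_pairs <= 1

for k in range(2, 2 * N + 1):
    combos = [set(c) for c in combinations(disks, k)]
    s51 = sum(raid51_survives(c) for c in combos)
    s15 = sum(raid15_survives(c) for c in combos)
    print(f"{k} failed disks: RAID51 survives {s51}/{len(combos)}, "
          f"RAID15 {s15}/{len(combos)}")
```

With these toy numbers, at 4 failed disks RAID 51 survives 34 of the 70 
combinations versus 64 of 70 for RAID 15, and at 5 failed disks it's 8 
versus 32 - i.e. RAID 15 tolerates more random failure patterns.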

> Again, the idea is to improve reliability _locally_ first, step by step.

RAID 15/16 improves reliability locally - it would reduce the need to 
fail over to the backup node compared to RAID 51/61.

> And as far as I know DRBD _always_ has to sit on TOP of the IO layer,
> be it the actual disk or MD/LVM.

Since LVM can sit under or over DRBD I don't think there is a restriction 
on the stacking order.
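As an illustration of one stacking direction, a minimal drbd.conf sketch 
with DRBD on top of an MD array (hostnames, addresses and device names 
are all invented); the RAID 15 direction would instead assemble MD 
across the /dev/drbdX devices:

```text
# Hypothetical drbd.conf fragment: DRBD backed by a local MD RAID5
# array (/dev/md0), i.e. the RAID 51 stacking.
resource r0 {
  protocol C;
  on nodea {
    device    /dev/drbd0;
    disk      /dev/md0;        # local MD RAID5 as backing store
    address   192.168.0.1:7788;
    meta-disk internal;
  }
  on nodeb {
    device    /dev/drbd0;
    disk      /dev/md0;
    address   192.168.0.2:7788;
    meta-disk internal;
  }
}
```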

>> I like this idea so I tried to build it in VMware but I can't get the
>> RAID5 array to assemble on the mirrored node :-/ I hope I'm missing
>> something basic but I could not find any documentation on the web
>> about this. My best guess is that drbd is not synchronizing the RAID
>> super blocks but I don't know how to change this... I know I can flip
>> this setup and use DRBD to mirror a RAID5 array but I would rather
>> mirror first for improved reliability and better re-sync characteristics...
>>
> You are _not_ improving reliability with many RAID1s.

I think you'll find that you are. With RAID 51, you can lose up to 1 disk 
per side. With RAID 15 you could lose all disks on one side without even 
needing to fail over to the backup node.

> In your setup
> with 160 total drives statistics and good ole Murphy's law will see to it
> that 2 drives fail at the same time (or at least before you can replace
> and re-sync).

A more reasonable way to assess failure frequency would be according to 
MTBF. Maths tends to give a better ballpark figure than Murphy's law. ;)
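For instance, a back-of-envelope sketch with invented figures (160 drives, 
a 1,000,000-hour per-drive MTBF, a 24-hour replace-and-resync window - 
none of these numbers come from the thread):

```python
# All figures below are assumed for illustration, not measured values.
n_drives = 160
mtbf_hours = 1_000_000      # assumed per-drive MTBF
rebuild_hours = 24          # assumed replace-and-resync window

# Expected single-drive failures per year across the whole pool:
failures_per_year = n_drives / mtbf_hours * 24 * 365

# Chance that any of the remaining drives fails while one rebuild runs
# (treating failures as independent with a constant rate):
p_second_failure = 1 - (1 - rebuild_hours / mtbf_hours) ** (n_drives - 1)

print(f"expected drive failures per year: {failures_per_year:.2f}")
print(f"P(second failure during a rebuild): {p_second_failure:.4%}")
```

Whether that second failure actually loses data depends on whether it 
lands in the same redundancy group - which is exactly where the RAID 15 
and RAID 51 layouts differ.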

> The setup (RAID6 with several spares) suggested in the previous thread
> would be such that you never have to re-sync on the DRBD level except for
> a total node failure.
> In your configuration with all those DRBD RAID1s you would nearly
> constantly have some drives gone bad and in "DETACH"ed mode, thus
> resulting in very much reduced read speed. And with the above mentioned
> risk factor.

RAID isn't about speed, it's about fault tolerance, and RAID 15 is more 
fault tolerant than RAID 51.

Gordan


