Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
> re: paranoid:
> He is effectively trying to do a RAID10. That is in theory the most
> reliable of the basic RAID levels. (better than raid3/4/5 for sure. I
> don't know about raid6, raid50, or raid60.) In all cases raid is only
> reliable if the system is well monitored and failed disks are rapidly
> replaced. ie. if you leave a disk in a failed mode for a month you
> have a huge window of vulnerability for a second disk crash bringing
> down the whole raid.

The second disk failure would have to be the partner of the first (as you say later), but yes, I agree. With 4+ disks, raid10 is likely to be one of the better options. You could go down the route of raid 15 if you wanted to be even more sure - but that's starting to get silly for almost all uses. 3-way mirroring would be more sensible (isn't there something about >2 nodes in DRBD on the roadmap? *grin*)

> Specifically, he wants to stripe together 16 mirror pairs. Each
> mirror pair should be extremely reliably if the failed drive is
> rapidly detected, replaced, and resync'ed. The RAID10 setup would be
> 1/16th as reliable, but in theory that should still be very good.

Striping the mirrored pairs is certainly the most sensible (under the majority of circumstances) of the stacked RAID options. In the case of Solaris you configure the system as raid 0+1, but it actually runs as raid 1+0 underneath (DiskSuite/LVM/whatever it is called this week).

> re: MD not cluster aware.
> I'm assuming the OP wants to have MD itself managed by heartbeat in a
> Active / Passive setup. If so, you only have one instance of MD
> running at a time. MD maintains all of its meta-data on the
> underlying disks I believe, so drbd should be replicating the drbd
> meta-data between the drives as required.

If you're running active/passive, and are willing to do all of the logic in your own scripts, then there's no reason why this wouldn't work - hell, you could run LVM on top of your MD device quite happily. DRBD will keep the block devices in sync, and you can do the rest yourself. As long as you take the appropriate precautions (make sure the MD devices are accessed SOLELY through drbd, make sure that they are NOT auto-initialised/started, etc.) this should be fine - there's a rough sketch of one way to wire this up further down. However, the OP was talking about running GFS on the DRBD device, and that only really makes sense if you're going dual active.

> If you have a complete disk failure, drbd should initiate i/o shipping
> to the alternate disk, right? So the potential exists to have a
> functioning RAID10 even in the presence of 16 disk failures. (ie.
> exactly one failure from each of the mirror pairs.) OTOH, if you lose
> both disks from any one pair, the whole raid10 is failed.

I've had a dual-primary setup with one of the mirrors out of sync, so it can certainly manage that. If you're looking at this much data, you're going to want to source the disks from different manufacturers (ideally), or at least from different batches. Not all the stories about whole batches failing at the same time are urban myths.

> What happens (from a drbd perspective) if you only have sector level
> failures on a disk?

That's a configuration option within DRBD itself - what to do on I/O errors. Most people (I would have thought) are going to end up using the option to detach from the local copy and serve the data from the remote version. With adequate monitoring/hotswap, the failed disk can then be replaced and re-synced - I believe, hardware willing, totally transparently to the higher levels of the system. (Sample config fragment below.)
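For what it's worth, here's a rough sketch of the sort of stacking I have in mind - 16 md mirror pairs striped together, the arrays kept out of the distro's automatic raid startup, and the resulting device only ever touched through a DRBD resource sitting on top. Treat it as a sketch of one possible reading of the setup rather than a recipe: the device names, hostnames and addresses are made up, and you should check the exact syntax against the mdadm and drbd.conf man pages for whatever versions you're running.

# create the mirror pairs (placeholder disk names; using whole disks
# or non-0xfd partitions also keeps them away from kernel autodetect)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
# ...and likewise for /dev/md1 .. /dev/md15 with the remaining disks...

# stripe the pairs together (newer md also has a native raid10
# personality that can do the whole thing as a single array)
mdadm --create /dev/md16 --level=0 --raid-devices=16 /dev/md{0..15}

# keep boot-time assembly out of the picture so the arrays only come up
# under the control of your own / heartbeat's scripts: raid=noautodetect
# on the kernel command line, and on newer mdadm an "AUTO -all" line in
# /etc/mdadm.conf

# /etc/drbd.conf - the stripe is only the backing store; everything
# above this point in the stack sees /dev/drbd0 and nothing else
resource r0 {
    protocol C;
    on nodeA {
        device    /dev/drbd0;
        disk      /dev/md16;
        address   192.168.1.1:7788;
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd0;
        disk      /dev/md16;
        address   192.168.1.2:7788;
        meta-disk internal;
    }
}

Heartbeat then just promotes /dev/drbd0 and mounts it on whichever node is active; the MD layer never gets touched directly.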
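And the I/O error policy I mentioned above lives in the disk section of the resource - something like the fragment below, though the exact option names have moved around between DRBD versions, so check drbd.conf(5) for yours:

disk {
    on-io-error detach;   # drop the failed local backing device and keep
                          # serving reads/writes from the peer's copy
                          # (the other choices are along the lines of
                          # pass_on / call-local-io-error, version depending)
}

Once the dead disk has been swapped out and the backing device re-created, re-attaching it should trigger a resync from the peer, which is the "transparent to the higher levels" part.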
The only reasons why this solution would be a problem are if the design is supposed to be dual active - or the person setting it up didn't cover their options properly.

Graham