[DRBD-user] (off topic) alternative to drbd

Kaloyan Kovachev kkovachev at varna.net
Thu Feb 2 11:40:26 CET 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Wed, 01 Feb 2012 22:24:15 -0500 (EST), "John Lauro"
<john.lauro at covenanteyes.com> wrote:
> I apologize for the off topic discussion.  If there is a more general
> list about the dozens of network storage options, please point me in
> the right direction.
> 
>  
> 
> I am testing various network redundancy and failover options, and
> throwing out an option for comment that I haven't really seen discussed
> anywhere (here or elsewhere) as a viable configuration.  I only mention
> it because I ran some tests in the lab and it worked better than I
> expected, given how rarely it comes up as an option.  The test
> consisted of two storage servers (I used NFS, but iSCSI, etc. should
> work if someone wants to replicate it) and a compute server hosting
> VMs whose storage lived on the NFS servers.
> 
>  
> 
> Here is the twist: instead of using drbd, or some other mechanism to
> provide the redundant storage, I set up a software RAID 1 inside the
> guest with two different virtual disks on different NFS servers.  After
> everything was set up and running fine, I killed one of the NFS servers
> (technically just blocked it with iptables) and the guest would freeze
> on some I/O for a while.  During that time cached I/O was fine, so you
> were OK if you stuck to common directories, etc.; sessions touching
> non-cached data would freeze, but you could open a new session just
> fine.  After a bit (maybe two minutes, but probably configurable
> somewhere) the software RAID disabled the failed drive and all was well
> within the VM for all processes.  The only noticeable problem was the
> momentary hang of a few processes until the drive was marked as failed.
> 

I tested a similar setup some time ago, but with iSCSI exports from 3+
storage servers. DRBD works with only 2, and if you don't need more than
2, just use DRBD.
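
For anyone who wants to reproduce the test, the in-guest mirror is plain
mdadm. A minimal sketch, assuming the two NFS- or iSCSI-backed virtual
disks show up in the guest as /dev/vdb and /dev/vdc (device names are an
assumption, not from the original post):

  # mirror two virtual disks that live on different storage servers
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/vdb /dev/vdc
  # a write-intent bitmap keeps the "rough track" of changed areas, so a
  # returning disk only resyncs the regions written while it was gone
  mdadm --grow /dev/md0 --bitmap=internal
  mkfs.ext4 /dev/md0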

>  
> 
> Made the NFS server available again, and I had to manually re-add the
> failed device, but it quickly re-synced.  The software RAID keeps a
> rough track of the areas of the disk that changed.  Then I repeated the
> process with the other NFS server to verify I could kill either NFS
> server without significant downtime.
> 
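The manual re-add described above is a one-liner (same assumed names as
in the sketch earlier):

  # once the failed NFS server is reachable again
  mdadm /dev/md0 --re-add /dev/vdb
  cat /proc/mdstat    # watch the (bitmap-assisted) resync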
>  
> 
>  
> 
> Pros:
> 
> No split brain risk, as the brain is in the VM instead of the storage
> nodes.

Not quite true:
 stor1 fails -> VM keeps using stor2 -> power outage (all machines go
down) -> stor1 boots faster and the VM starts using it ... oops
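
If you end up assembling the mirror by hand in that situation, mdadm's
per-device metadata shows which half is newer before you trust either
one (again using the assumed device names):

  # the member with the higher event count / later update time holds
  # the most recent data
  mdadm --examine /dev/vdb | grep -E 'Events|Update Time'
  mdadm --examine /dev/vdc | grep -E 'Events|Update Time'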

> 
> Load balanced reads - very fast I/O when you have multiple processes
> reading.
> 
> Can have some VMs use just one server, and some use redundant storage,
> without complex preplanning or LVM changes, etc.
> 
> (In addition - not exactly a pro, since it is common to most commodity
> hardware options - just noting it should be fine to have compute
> resources and storage resources on the same physical box.  If you do,
> you can optionally give the local disk read priority in the software
> RAID if desired.)
> 
> 
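The local-disk read priority mentioned above maps onto mdadm's
--write-mostly flag. A sketch, assuming the local disk partition is
/dev/sda3 and the NFS-backed disk is /dev/vdb (both names are
assumptions):

  # md avoids reading from the write-mostly (network) disk while the
  # local copy is available; writes still go to both
  mdadm --create /dev/md1 --level=1 --raid-devices=2 \
        /dev/sda3 --write-mostly /dev/vdb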

+ If you need more than 2 storage servers (mostly for speed, and DRBD is
not an option there), you can even go with RAID5/6
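
For example (three exported disks, assumed to appear in the guest as
/dev/vdb, /dev/vdc and /dev/vdd):

  # striping with parity across three independent storage servers
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/vdb /dev/vdc /dev/vdd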
 
> 
> Cons:
> 
> More work in setup of the guest instead of the upfront extra work in the
> setup of storage and STONITH.  (Not that bad, and that's what templates
> are for).
> 
> Not sure how difficult booting would be with one of the storage units
> down.  Having disks go away and come back while running seems fine, but
> there may be extra work to force a guest online with one of the storage
> devices down.
> 
> Did not autorecover.  A reboot probably would have recovered it easily,
> but assuming you want 0 downtime...  You just have to make sure you are
> set up to monitor /proc/mdstat or similar to detect a broken RAID so
> you know it needs attention...
> 
> Could get complicated for automatic recovery if you want the compute
> side failover/redundant at the same time, and want things to work
> smoothly when one of the storage nodes is down at the same time.
> Shouldn't be too bad if you are OK with either a storage node going
> down or a compute node going down, but if 2 of the 4 go down at the
> same time, manual intervention may be required if you can't get the
> other storage unit back online.
> 
>  
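On the "booting with one storage unit down" question, forcing a degraded
start is normally just (names as assumed earlier):

  # start the mirror from the surviving half only
  mdadm --assemble --run /dev/md0 /dev/vdc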

- If you have a large data set you risk data loss, because of the
extended rebuild time
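
The /proc/mdstat monitoring mentioned above is built into mdadm, which
helps keep that degraded window short. A sketch (the mail address is a
placeholder):

  # alert as soon as a member is marked faulty
  mdadm --monitor --scan --daemonise --mail root@localhost
  # or just poll the array state
  cat /proc/mdstat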

> 
>  
> 
> At this point I am not sure I would recommend/use it over drbd or any
> of the various cluster filesystems, etc. - just that it tested out well
> enough that I am at least considering it.  Most of my servers (all but
> maybe 3%) don't need redundant network storage beyond what's built into
> the boxes, as the majority of our servers are active/active with
> redundant failover load balancers in front of them, or active/passive
> with synced configs, or are simply not critical 24/7/365.

It has some use cases, but is generally not recommended. I would probably
use such a setup for load-balanced (mostly) read-only data, for a small
partition with a clustered (session cache) fs for a high-load web cloud,
or for a really (really) large storage pool with small chunks on each
server combined in LVM on top of the software RAID, as sketched below.
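
A rough sketch of that last layout; the md array names and the volume
group name are assumptions:

  # each mdX is an in-guest RAID built from chunks on different servers
  pvcreate /dev/md0 /dev/md1 /dev/md2
  vgcreate poolvg /dev/md0 /dev/md1 /dev/md2
  lvcreate -L 500G -n data poolvg
  mkfs.ext4 /dev/poolvg/data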

For 2 nodes, always use DRBD.



