Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Wed, 01 Feb 2012 22:24:15 -0500 (EST), "John Lauro" <john.lauro at covenanteyes.com> wrote:

> I apologize for the off-topic discussion. If there is a more general
> list about the dozens of network storage options, please point me in
> the right direction.
>
> I am testing various network redundancy and failover options, and
> throwing out an option for comment that I haven't really seen talked
> about anywhere (here or elsewhere) as a viable configuration. I only
> mention it because I did some tests in the lab and it worked better
> than I expected, given the lack of it being mentioned as an option
> anywhere. The test consisted of two storage servers (I used NFS, but
> iSCSI, etc. should work if someone wants to replicate), and a compute
> server that housed the VM's storage on the NFS servers.
>
> Here is the twist: instead of using drbd, or some other mechanism to
> provide the redundant storage, I set up a software RAID 1 inside the
> guest with two different virtual disks on different NFS servers.
> After everything was set up and running fine, I killed one of the NFS
> servers (technically, I just blocked it with iptables) and the guest
> would freeze on some I/O for a while. During that time, cached I/O was
> fine, so it was fine if you stuck to common directories, etc.;
> sessions touching non-cached data would freeze, but you could open a
> new session just fine. After a bit (maybe two minutes, but probably
> configurable somewhere) the software RAID disabled the failed drive
> and all was well within the VM for all processes. The only noticeable
> problem was the momentary hang of a few processes until the drive was
> marked as failed.

I tested a similar setup some time ago, but with iSCSI exports from 3+
storage nodes. DRBD works with only 2, and if you don't need more than
2, just use DRBD.

> Made the NFS server available again, and I had to manually re-add the
> failed device, but it quickly re-synced. The software RAID keeps a
> rough track of the areas of the disk that changed. Then I repeated
> the process with the other NFS server to verify I could kill either
> NFS server without significant downtime.
>
> Pros:
>
> No split-brain risk, as the brain is in the VM instead of the storage
> nodes.

Not quite true: stor1 fails -> the VM keeps using stor2 -> power outage
(all machines go down) -> stor1 boots faster and the VM starts using
it... oops.

> Load-balanced reads - very fast I/O when you have multiple processes
> reading.
>
> Can have some VMs use just one server, and some use redundant
> storage, without complex preplanning or LVM changes, etc.
>
> (In addition, not exactly a pro since it is common with most
> commodity hardware options, but just noting that it should be fine to
> have compute resources and storage resources on the same physical
> box. If you do, you can optionally give the local disk read priority
> in the software RAID if desired.)

+ If you need more than 2 storage nodes (mostly for speed; DRBD is not
an option there), then you can even go with RAID5/6.

> Cons:
>
> More work in the setup of the guest instead of the upfront extra work
> in the setup of storage and STONITH. (Not that bad, and that's what
> templates are for.)
>
> Not sure how difficult booting would be with one of the storage units
> down. Having disks go away and come back while running seems fine,
> but there may be extra work to force a guest online with one of the
> storage devices down.
>
> Did not autorecover.

A reboot would probably autorecover easily, but assuming you want zero
downtime...
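For reference, here is roughly what I assume the in-guest setup and that
manual re-add look like. This is only a sketch: /dev/md0, /dev/vdb and
/dev/vdc are placeholder names for the array and the two NFS-backed
virtual disks, and the internal write-intent bitmap is my guess at why
the re-sync was so quick.

  # inside the guest: mirror the two NFS-backed virtual disks; the
  # write-intent bitmap lets a re-added member sync only changed blocks
  mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal \
      /dev/vdb /dev/vdc
  mkfs.ext4 /dev/md0

  # optional: if one member is local (or closer), mark the remote one
  # write-mostly so reads prefer the local disk:
  #   mdadm --create /dev/md0 --level=1 --raid-devices=2 \
  #       --bitmap=internal /dev/vdb --write-mostly /dev/vdc

  # after the failed NFS server comes back, put its member back by hand
  mdadm /dev/md0 --remove /dev/vdc   # clear the faulty slot if still listed
  mdadm /dev/md0 --re-add /dev/vdc   # bitmap makes this a quick partial sync
  cat /proc/mdstat                   # watch the recovery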
> You just have to make sure you are set up to monitor /proc/mdstat or
> similar to detect a broken RAID, so you know it needs attention...
>
> Could get complicated for automatic recovery if you want the compute
> side failover/redundant at the same time, and want things to work
> smoothly when one of the storage nodes is down at the same time.
> Shouldn't be too bad if you are OK with either a storage node going
> down or a compute node going down, but if 2 of the 4 go down at the
> same time, manual intervention may be required if you can't get the
> other storage unit back online.

- If you have a large data set, you also risk data loss because of the
extended rebuild time.

> At this point I am not sure I would recommend/use it over drbd or any
> of the various cluster filesystems, etc.; it just tested out well
> enough that I am at least considering it, given that I don't need
> redundant network storage beyond what's built into the boxes for most
> of my servers (maybe 3% need it), as the majority of our servers are
> active/active with redundant failover load balancers in front of
> them, or active/passive with synced configs, or are simply not
> critical 24/7/365.

It has some use cases, but I generally would not recommend it. I would
probably use such a setup for load-balanced, (mostly) read-only data; a
small partition with a clustered (session cache) fs for a high-load web
cloud; or a really (really) large storage pool with small chunks on
each server, combined with LVM on top of the software RAID.

For 2 nodes, always use DRBD.
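On the quoted point about watching /proc/mdstat, a minimal sketch of
what I would run inside each guest; the delay and mail address are just
examples.

  # let mdadm itself watch the arrays and send mail on Fail /
  # DegradedArray events
  mdadm --monitor --scan --daemonise --delay 60 --mail root@localhost

  # or a crude periodic check from cron: an underscore in the member
  # status field (e.g. [U_] instead of [UU]) means a missing member
  grep '_' /proc/mdstat && \
      echo "degraded md array on $(hostname)" | mail -s "mdstat" root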