Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
This is very timely feedback, thanks to everyone who has taken the time to respond in such detail :-)

>> In fact - scratch that - the bottleneck will almost certainly be
>> the network device you will be doing mirroring (DRBD) over, even if
>> you are using multiple bonded Gb ethernet NICs. So the overhead of
>> spending a bit of CPU on RAID6 is certainly not going to be what
>> will be holding you back.

Yes, network bandwidth is my limiting factor. Not only do I mirror the data via DRBD, but the data being written is coming across the network to begin with.

>> Please read the archives of linux-raid as to what is the recommended
>> raid5 size (as in number of drives); it's definitely below 20.
>
> On _anything_ RAID5 generally makes sense up to about 20 disks,
> because the failure rate will mean you need more redundancy than
> that. RAID6 should just about cope with 80 disks, especially if you
> are looking at mirroring the setup with DRBD (effectively giving you
> RAID61).

Ah, my misunderstanding... I was thinking of RAID5 and RAID6 in groups of more like 5 disks, which is why I thought the overhead was so high.

> Resyncing will involve lots of pain anyway. Have you checked how
> long it takes to write a TB of data?? RAID6 will keep you going with
> 2 failed drives, and if you do it so that you have a RAID6 stripe of
> mirrors (RAID16), with each mirror being a DRBD device, it would
> give you pretty spectacular redundancy, because you would have to
> lose three complete mirror sets.

Syncing 8 1TB drives across a Gb switch is going to take 14+ hours :-/

But I'm a bit confused here.... My original proposal was essentially RAID1+JBOD, with RAID1 provided by DRBD and JBOD by LVM. In this setup, a single drive failure would be handled transparently by the DRBD driver without the need for a cluster failover. Am I understanding this correctly? I also don't see why I would need to sync more than a single drive's worth of data to recover from a failure..... Since drives are mirrored before being combined into LVM volumes, data loss will only happen if I lose both sides of a mirror.

That said, I'm intrigued by the idea of using RAID5 or RAID6 instead of LVM to create my logical volumes on top of DRBD mirrors.... It adds a bit more redundancy at a reasonable price. In addition, while a drive failure in my setup would not cause a cluster failover, I think I would need to fail over the cluster to *replace* the bad drive, or even to re-mirror the DRBD set to another drive. Is this correct? Am I correct in thinking that RAID5/6 would solve this?

With the added complexity of heartbeat running on top of all this, it will be interesting to see if I can get it all configured correctly ;-)
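To make the comparison concrete, here is a rough sketch of the two layerings I'm weighing, assuming DRBD 8.x-style tooling; the resource names (r0..r7), device paths and volume group names below are made up for illustration, and this is only a sketch, not a tested recipe:

    # One DRBD resource per physical disk on each node pair:
    # r0 backed by /dev/sda, r1 by /dev/sdb, ... r7 by /dev/sdh.
    drbdadm create-md r0 && drbdadm up r0          # repeat for r1..r7
    # start the initial sync from the node that holds the good copy:
    drbdadm -- --overwrite-data-of-peer primary r0

    # Option 1 (my original plan): concatenate the mirrors with LVM (JBOD).
    pvcreate /dev/drbd[0-7]
    vgcreate vg_data /dev/drbd[0-7]
    lvcreate -l 100%FREE -n lv_data vg_data

    # Option 2 (as suggested above): md RAID6 across the same DRBD mirrors.
    # Any two complete mirror pairs can fail before data loss, i.e. you
    # would have to lose three whole mirror sets.
    mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/drbd[0-7]

Either way, DRBD sits directly on the raw disks, and whatever is stacked on top only ever sees the already-mirrored /dev/drbdN devices.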
>> http://www.addonics.com/products/raid_system/rack_overview.asp and
>> http://www.addonics.com/products/raid_system/mst4.asp
>>
> Those don't seem to wind up being all that much cheaper, given the
> density (is your rack space "free"?) and the lack of hot-swap (or
> at least swap w/o having to dismantle the box) ability.

It's not that rack space is free, but rather that I'll run out of power in my current colo before I run out of space. As a result, density is not my primary goal. With the 4-drive enclosures my plan is to leave dead drives in place until I have at least 2 failures in an enclosure, at which point I can pull/fix it as a single unit.

The 4U rack case I link to above is more expensive, but after a few drive failures I may decide it's worth the price ;-)

> At least I'm not suggesting that you should get a "Thumper", but I'm
> sure for some people that is the ideal and most economic solution.
> (http://www.sun.com/servers/x64/x4500/)

Wow, that's about $1.30 per GB of unformatted, non-redundant raw storage. I'm currently at about $0.70/GB for formatted and mirrored storage in a "share nothing" cluster :-) My goal is to get below $0.50/GB, but I'll need a little help from Moore's Law to get there. That, or I figure out a way to get high availability and data redundancy without a RAID1 mirror in the mix. At $200 for a 1TB SATA drive, I can't get below $0.25/GB ;-)

> I'm always aiming for the cheapest possible solution as well, but
> this is always tempered by the reliability and how serviceable the
> result needs to be.

Agreed. And I'm not sure I've made the right trade-offs here... I've got lots of cuts on my hands from working with the 4-drive bays, which suggests I've saved a bit too much money :-/

> Come again? I was suggesting an overhead of 2 drives, which comes to
> 2.5% with 80 drives. Other than that RAID5 is free (md driver) and you
> sure were not holding back with CPU power in your specs (less, but
> faster cores and most likely Opteron instead of Intel would do better
> in this scenario).

The 'overkill' on CPU was an accident of Dell promotional pricing.... They threw in the 2nd CPU for free ;-)

>> I'm using DRBD in part because it both
>> replicates data and provides high-availability for the servers/
>> services. I'll have some spare drives racked and powered so when
>> drives go bad I can just re-mirror to a good drive leaving the dead
>> device in the rack indefinitely.
>>
> Er, if a drive in your proposed setup dies that volume becomes
> corrupted and you will have to fail over to the other DRBD node and
> re-sync.

I'm mirroring first and then using LVM to create JBOD volumes. So if one drive fails, DRBD will just handle it transparently. A fail-over would be required to fix the drive, but even in that case the most I would need to resync is a single drive. RAID5 and RAID6 are similar in the sense that a failure will require rebuilding the replaced drive from parity.... (A rough sketch of the spare-drive swap is in the P.S. at the end of this mail.)

>> Has anyone else tried to do something like this? How many drives can
>> DRBD handle? How much total storage? If I'm the first then I'm
>> guessing drive failures will be the least of my issues :-/
>>
> If you get this all worked out, drive failures and the ability to
> address them in an automatic and efficient manner will be the issue
> for most of the lifetime of this project. ^_-

Ah, so true.... :-/

Thanks again for all the great feedback!

Tim
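P.S. Since I keep coming back to the "leave the dead drive in place and re-mirror to a spare" idea, here is roughly what I have in mind, again assuming DRBD 8.x-style tooling; the resource name (r3) and device names are made up, and this is only a sketch, not a tested procedure:

    # The backing disk behind resource r3 has died. DRBD can keep the
    # volume available by serving I/O from the peer's copy.
    drbdadm detach r3              # drop the dead local backing device

    # Edit drbd.conf so r3's "disk" points at a pre-racked spare (e.g.
    # /dev/sdq), give the spare fresh metadata, and re-attach it; DRBD
    # then resyncs one drive's worth of data from the peer.
    drbdadm create-md r3
    drbdadm attach r3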