Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Wed, Feb 18, 2009 at 03:44:15PM -0700, David.Livingstone at cn.ca wrote:
> Lars,
>
> Thanks for the reply. See below.
>
> > On Tue, Feb 17, 2009 at 03:52:16PM -0700, David.Livingstone at cn.ca wrote:
> > > Hello,
> > >
> > > I currently have two two-node clusters running heartbeat and
> > > drbd (see background below). I also have a two-node test cluster
> > > which I decided to update to the latest releases of everything.
> > > In doing so I downloaded and installed drbd 8.3.0
> > > (drbd-8.3.0.tar.gz), which supports three-node setups using
> > > stacked resources. Specifically, having a third backup/brp node
> > > geographically removed from our production cluster is very
> > > appealing.
> > >
> > > I have looked at the online manual (http://www.drbd.org/users-guide/)
> > > and read the current information for three-node setups, and have
> > > some observations/questions:
> > > - An illustration/figure of a three-node setup would help.
> >
> > there are several ways to do it.
> > you can also have four nodes: two two-node DRBDs, the primary of
> > which is the "lower" resource of a "stacked" DRBD.
>
> Are there some examples I can review somewhere?

I'll try to make you an ascii art picture so you can better see what
should be happening:

                                 ,---- replicated to 3rd
|--------- FILE SYSTEM /data ---|/[*]
|--------- drbd1 - on 1st ------|- drbd1-metadata|
|--------- drbd0 - on 1st -----------------------| <==> replicated to 2nd node

Alpha and Bravo may change roles at any time (for failover/switchover);
node Charlie will connect to the respective "Primary" of those. Because
of that, [*] has to be the "floating ip" of the drbd cluster, which you
wrote in your config file as the ip in the stacked-on-top-of section.

for data to be replicated to all three nodes, it has to go through both
drbd1, which replicates to the 3rd node, and drbd0 (where drbd1 passes
its local writes to), which replicates to the local respective "other"
node. thus our naming of "upper" and "lower" drbd: they are stacked.

as soon as drbd1 is active, you can no longer mount drbd0 (and you
should not, either, see above). you need to mount the "upper" drbd.

I suggest having _two_ "floating cluster ips", one for the drbd
replication link to the 3rd node, and one for any cluster services
clients may connect to, as they may be on different interfaces/network
segments:

 ,------------------ local cluster ---------------------------------.
 |  ,----- Alpha -----.                          ,----- Bravo -----. |
 |  |                 |- fixed-ip <==> fixed-ip -|                 | |
 |  |                 |                          |                 | |
 |  `-----------------´                          `-----------------´ |
 |      [cluster ip 1]                                               |
 |                                   [cluster ip 2]                  |
 `------------------ local cluster ---------------------------------´

clients would connect to [cluster ip 1], Charlie would connect its drbd
to [cluster ip 2].

the drbd on Charlie may well be an "upper" drbd stacked on a "lower"
drbd on Charlie, which would then replicate further to Delta. in that
case, you'd have a floating ip on Charlie and Delta as well, and the
upper DRBD would replicate from either (Alpha or Bravo) to either
(Charlie or Delta), or vice versa, depending on which node is "active"
for the respective lower drbd.

if it is all local networks, you can have them all managed within one
pacemaker cluster, and the currently active "upper" DRBD (and resources
using it) may then be moved freely among all four nodes. if you get the
constraints right, that is. use four such DRBD, and add preferential
constraints, so they would be equally distributed in normal operation.
you'd use protocol C (or maybe B) throughout, and do fully automatic
pacemaker controlled failovers.

if the replication link between both clusters is a WAN, you probably do
NOT want them all within one pacemaker cluster. you'd use protocol A
(and potentially the DRBD Proxy; contact LINBIT) on that link, and
would do only semi-automatic, operator confirmed, failover between
sites.
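for concreteness, here is a rough, untested sketch of what the two
resource definitions for the pictures above could look like in
drbd.conf (all resource names, host names, ips, ports and backing
disks are made up; see the stacked example in the users guide for the
authoritative version):

    resource data-lower {
      protocol C;                  # synchronous, within the local cluster

      on alpha {
        device    /dev/drbd0;
        disk      /dev/sda7;       # made-up backing disk
        address   10.0.0.1:7788;   # alpha's fixed ip
        meta-disk internal;
      }
      on bravo {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.2:7788;   # bravo's fixed ip
        meta-disk internal;
      }
    }

    resource data-upper {
      protocol A;                  # WAN link; C (or maybe B) if all local

      stacked-on-top-of data-lower {
        device    /dev/drbd1;
        # [cluster ip 2]: the floating ip, moves with the lower Primary
        address   192.168.42.10:7789;
        # no disk/meta-disk here: drbd1 uses drbd0 as its disk, and its
        # meta data lives at the end of drbd0 ("drbd1-metadata" above)
      }
      on charlie {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   192.168.42.20:7789;
        meta-disk internal;
      }
    }

you would then create the meta data and bring up the stacked resource
on whichever node is currently Primary of data-lower, using drbdadm's
--stacked option (drbdadm --stacked create-md data-upper; drbdadm
--stacked up data-upper; drbdadm --stacked primary data-upper), and
mount /dev/drbd1, not /dev/drbd0, as /data.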
hope that helps.

> > > Other Questions:
> > > - Is the manual available for download/printing?
> >
> > No. We hand it out in training sessions, though.
>
> Vienna sounds good ... now if I could convince my boss ...

we also do London and Berlin regularly. I'm not sure about North
America; I don't think we have a fixed training schedule there yet.
But if you are interested, and we get a few "me too" from North
America, we shall be able to arrange one.

> > > - Has anyone used the nx_lsa (Linux Sockets Acceleration) driver
> > > to run drbd?
> >
> > I'm not exactly sure what that is supposed to do.
>
> See http://www.netxen.com/technology/pdfs/Netxen_LinuxSocketsAcc_r3.pdf
> Essentially it implements a socket-level offload of the network
> subsystem to a TCP stack running in firmware on the NIC. By using the
> nxoffload facility you can specify tcp ip, ports or applications to
> offload.

"Linux simply cannot drive a 10-Gigabit Ethernet pipe using standard
1500 byte packets"

so what. use jumbo frames. use interrupt coalescing.

"Another approach is to use a TCP offload engine (TOE). The Linux
community has opposed this approach for several reasons, including
rigid implementations and limited functionality"

uh? then why do several (all?) 10-Gigabit network drivers in the linux
kernel have TOE functionality, which can be switched on and off using
ethtool -K? blablah...

"And because LSA is implemented completely in firmware, it can
accommodate TCP stack or Linux kernel changes, as required."

well, if that is true, then we should not even notice, right? it
should "just work".

"In operation, LSA intercepts calls at the INET ops layer. Based on
the rules defined by Selective Acceleration, the offload decision is
taken on connect() for active connections and listen() and accept()
for passive connections."

now, this sounds more like an LD_PRELOAD hack? in which case it would
be of no use for DRBD, because DRBD's connections are established from
kernel context. no LD_PRELOAD there.

I guess you just have to try, and report back. if it does not "just
work", ask them whether they support connections established from
kernel context, and if/how code would have to be modified to leverage
this LSA stuff. and then send a patch. or get us to do the work for
you ;)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

__
please don't Cc me, but send to list -- I'm subscribed