[DRBD-user] Corosync Configuration

Jake Smith jsmith at argotec.com
Thu Jun 14 05:01:53 CEST 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


----- Original Message -----
> From: "Arnold Krille" <arnold at arnoldarts.de>
> To: drbd-user at lists.linbit.com
> Sent: Wednesday, June 13, 2012 3:04:04 PM
> Subject: Re: [DRBD-user] Corosync Configuration
> 
> On 13.06.2012 17:56, William Seligman wrote:
> > A data point:
> > 
> > On my cluster, I have two dedicated direct-link cables between the
> > two nodes, one for DRBD traffic, the other for corosync/pacemaker
> > traffic. Roughly once per week, I get "link down" messages on one
> > of the nodes:
> 
> A) Use several communication rings in corosync. We use one on the
> regular user network and a second on the storage network. If one
> fails, no problem: corosync doesn't see any need to fence anything.
> B) Use bonded/bridged interfaces for the storage connection. We
> currently carry our storage network (aka vlan17) as a tagged VLAN on
> eth0 of all the servers and untagged on eth1, in an active-backup
> bond where eth1 is the primary and the vlan17 interface is the
> backup.
> 
> With these two in place I didn't even notice when my boss unplugged
> the network cables of one of our servers one by one. DRBD didn't
> feel any glitch, nor did the cluster see a need to move/kill/fence
> anything. And a 5-second hang for the x2go sessions on one of the
> machines doesn't matter when everyone is on break.
> 
> I haven't yet figured out how to build the bridges/bonds when all
> the servers have 4 NICs. But that isn't a real problem until I have
> also done functionality tests with two (or three) new switches.
> I think I will do one bridge of two ports with RSTP for the normal
> user network and one bridge of two ports with RSTP for the storage
> network, then skip the active-backup bonding and let RSTP find the
> paths. Of course this wouldn't necessarily improve throughput
> between two nodes, but throughput from one node to two nodes would
> probably be higher.
> Or I extend my current setup and, instead of eth0 and eth1, use one
> pair of bonded ports for each. That would give me a total of three
> bonds per server: two in one of the 'real' modes and one in
> active-backup mode...
> 

This whole thing is somewhat off-topic for DRBD and we should probably move the thread to the Pacemaker mailing list, but since no one has complained I'll chime in with our setup :-)
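
To put Arnold's point (A) in concrete terms first: a second ring is just a second interface block in the totem section of corosync.conf. A rough sketch only (corosync 1.x syntax; the rrp_mode choice and the bindnetaddr/mcast values below are made up and need adjusting to your own networks):

    totem {
        version: 2
        # passive rrp falls back to ring 1 only when ring 0 is faulty
        rrp_mode: passive
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.10.0
            mcastaddr: 226.94.1.1
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 192.168.20.0
            mcastaddr: 226.94.1.2
            mcastport: 5407
        }
    }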

We have a pair of bonded NICs in each server, in round-robin mode and directly connected to each other, carrying DRBD sync traffic and the first corosync ring.
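
For anyone wanting to copy that, the round-robin bond on the direct link is plain Linux bonding; something like this Debian-style /etc/network/interfaces stanza should be close (interface names and addresses here are examples, not our actual ones):

    auto bond0
    iface bond0 inet static
        address 192.168.10.1
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        # mode balance-rr: round-robin over both direct cables
        bond-mode balance-rr
        bond-miimon 100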

Then we have a second pair of NICs on our regular network.  These use 802.3ad link aggregation through an HP 5412 chassis switch.  That switch supports link aggregation across separate switch modules, so the two NICs on each server are connected to different modules, giving us greater fault tolerance as well as aggregation.  We run our second corosync ring on that bond.
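
The 802.3ad bond on the user-facing pair is much the same, apart from the mode and the fact that the switch side needs a matching LACP trunk spanning the two modules. Again only a sketch, with made-up names/addresses:

    auto bond1
    iface bond1 inet static
        address 10.20.30.1
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        # mode 802.3ad (LACP): needs a matching trunk on the switch
        bond-mode 802.3ad
        bond-miimon 100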

In theory this config lets us survive three NIC/cable failures without requiring a STONITH.
We've not had a STONITH event due to a link problem since we got the setup running (over 9 months). 
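
A quick way to confirm both rings are actually healthy (and to watch a ring recover after you pull a cable) is:

    corosync-cfgtool -s

which prints the status of each ring; with everything plugged in, both rings should report no faults.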

Jake


