[DRBD-user] Corosync Configuration

William Seligman seligman at nevis.columbia.edu
Wed Jun 13 17:56:17 CEST 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 6/13/12 11:45 AM, Arnold Krille wrote:
> On Wednesday 13 June 2012 09:26:45 Felix Frank wrote:
>> On 06/12/2012 08:23 PM, Dennis Jacobfeuerborn wrote:
>>>> Don't use crossover cables. In my experience, crossover cables for a
>>>> two-node cluster only make problems... use a simple switch.
>>>
>>> Why would a setup with 2 cables and a switch be more reliable than just a
>>> single cable? That doesn't make sense.
>>
>> Uhm, I don't think that's what Eduardo was suggesting.
>>
>> Someone on this list (Digimer?) made a good point some time ago about
>> switches allowing for better forensics in case of link problems (i.e.,
>> the switch can help you identify the side with a faulty NIC/cable).
>>
>> On the other hand, a switch introduces one more possible point of
>> replication failure (two, really, counting the extra required cable).
>> I've never had negative experiences with back-to-back connections either.
> 
> A switch also has input and output buffers, introducing another source of 
> latency.
> 
> We would use a direct link cable if that scaled beyond 2 servers. (We 
> actually thought about just adding more network cards, connecting three 
> servers directly with three cables, and using bridges with (R)STP for the 
> storage ring. But now we will just use the additional cards for more 
> redundancy, with trunked connections to two switches...)
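(Side note: the trunking Arnold describes is usually plain Linux bonding under
the hood. A minimal active-backup sketch, assuming a RHEL-style layout; the
interface names and addresses here are made up for illustration:

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=192.168.100.1
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none
  BONDING_OPTS="mode=active-backup miimon=100"

  # /etc/sysconfig/network-scripts/ifcfg-eth4 -- repeat for each slave NIC
  DEVICE=eth4
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none

With the two slaves cabled to two different switches, active-backup needs no
switch cooperation, so either switch can fail without taking the link down.)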

A data point:

On my cluster, I have two dedicated direct-link cables between the two nodes,
one for DRBD traffic, the other for corosync/pacemaker traffic. Roughly once a
week, I get "link down" messages on one of the nodes:

Jun 12 09:39:33 orestes-corosync kernel: igb: eth1 NIC Link is Down
Jun 12 09:39:33 orestes-corosync kernel: igb: eth3 NIC Link is Down
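
(For the curious, the corosync side of that dedicated link is nothing exotic:
just a totem ring bound to the back-to-back subnet. A minimal sketch, with the
addresses made up for illustration -- not my exact config:

  # /etc/corosync/corosync.conf (excerpt)
  totem {
          version: 2
          # ring 0 rides the back-to-back cable (eth1 here); a second
          # interface block with ringnumber: 1 plus rrp_mode would add a
          # redundant ring over the switched network
          interface {
                  ringnumber: 0
                  bindnetaddr: 192.168.50.0   # subnet of the direct cable
                  mcastaddr: 226.94.1.1
                  mcastport: 5405
          }
  }

A redundant second ring is exactly what would ride out this kind of link flap.)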

When that happens, the cluster responds by STONITHing (rebooting) the other
node. Everything comes back up fine, and all the resources remain available
(though some VMs get rebooted, which is mildly annoying).
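
The fencing itself is ordinary Pacemaker STONITH. Something along these lines,
though the agent, addresses, and credentials here are placeholders rather than
my actual setup:

  # crm configure (sketch)
  primitive fence-nodeA stonith:fence_ipmilan \
          params ipaddr=10.0.0.11 login=admin passwd=secret \
                 pcmk_host_list=nodeA
  primitive fence-nodeB stonith:fence_ipmilan \
          params ipaddr=10.0.0.12 login=admin passwd=secret \
                 pcmk_host_list=nodeB
  # a node must never run its own fence device
  location l-fence-nodeA fence-nodeA -inf: nodeA
  location l-fence-nodeB fence-nodeB -inf: nodeB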

eth0 and eth2, which are connected to switches, don't have this problem. eth1
is on the motherboard while eth3 is on an expansion card, and only one of the
two nodes shows the error, which makes it difficult to diagnose.

It's not a big deal, but it does lend some weight to the idea that an
intermediate switch increases the chance of a reliable connection, at the
obvious cost of an additional point of failure.

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto:seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
