On 6/13/12 8:02 PM, Dennis Jacobfeuerborn wrote:
> On 06/13/2012 05:56 PM, William Seligman wrote:
>> On 6/13/12 11:45 AM, Arnold Krille wrote:
>>> On Wednesday 13 June 2012 09:26:45 Felix Frank wrote:
>>>> On 06/12/2012 08:23 PM, Dennis Jacobfeuerborn wrote:
>>>>>> Don't use crossover cables.. In my experience use crossover cables for
>>>>>> two node cluster make only problems... use a simple switch..
>>>>>
>>>>> Why would a setup with 2 cables and a switch be more reliable than just a
>>>>> single cable? That doesn't make sense.
>>>>
>>>> Uhm, I don't think that was what Eduardo was suggesting.
>>>>
>>>> Someone on this list (Digimer?) has made a good point some time ago about
>>>> switches allowing for better forensics in case of link problems (i.e.,
>>>> the switch can help you identify the side with a faulty NIC/cable).
>>>>
>>>> On the other hand, a switch introduces one more (two really, counting
>>>> the extra required cable) possible point of replication failure. I've
>>>> never had negative experiences with back-to-back connections either.
>>>
>>> A switch also has input- and output-buffers introducing another step of
>>> latency.
>>>
>>> We would use a direct link-cable if that scaled for more than 2 servers. (We
>>> actually thought about just adding more network-cards, connect three servers
>>> with three cables directly and use bridges with (r)stp for the storage ring.
>>> But now we will just use the additional cards for more redundancy to have
>>> trunked connections to two switches...)
>>
>> A data point:
>>
>> On my cluster, I have two dedicated direct-link cables between the two nodes,
>> one for DRBD traffic, the other for corosync/pacemaker traffic. Roughly once per
>> week, I get a "link down" message on one of the nodes:
>>
>> Jun 12 09:39:33 orestes-corosync kernel: igb: eth1 NIC Link is Down
>> Jun 12 09:39:33 orestes-corosync kernel: igb: eth3 NIC Link is Down
>>
>> The cluster responds by STONITHing (rebooting) the other node. Everything comes
>> up fine, and all the resources continue to be available (though some VMs get
>> rebooted, which is mildly annoying).
>>
>> eth0 and eth2, which are connected to switches, don't have this problem; eth1 is
>> on the motherboard while eth3 is on an expansion card; only one node has the
>> error. This makes it difficult to diagnose.
>>
>> It's not a big deal, but it does contribute to the idea that perhaps using an
>> intermediate switch increases the chance of a reliable connection, at the
>> obvious cost of an additional mode of failure.
>
> With how many systems have you tested this i.e. how large was your sample
> size? An anecdote cannot be generalized and you might as well have started off
> with a switch, then run into problems and fixed it by using a direct connection.
>
> In order to prefer one case over the other in general one has to provide
> sound reasoning *why* exactly one case is inherently better.

You are correct; my sample size is one, which proves nothing. I offered the
anecdote so that other folks can add it to any other anecdotes upthread,
downthread, or elsewhere and perhaps come to a general conclusion.

-- 
Bill Seligman             | mailto://seligman@nevis.columbia.edu
Nevis Labs, Columbia Univ | http://www.nevis.columbia.edu/~seligman/
PO Box 137                |
Irvington NY 10533  USA   | Phone: (914) 591-2823