Now you know why the bottom of http://www.drbd.org/users-guide/ch-heartbeat.html is so emphatic about having two _independent_ channels that do not depend on a SPOF, such as your switch.

'Heartbeat communication channels

Heartbeat uses a UDP-based communication protocol to periodically check for node availability (the "heartbeat" proper). For this purpose, Heartbeat can use several communication methods, including:

• IP multicast,
• IP broadcast,
• IP unicast,
• serial line.

Of these, IP multicast and IP broadcast are the most relevant in practice. The absolute minimum requirement for stable cluster operation is _two independent_ communication channels.'

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Heiko
Sent: Friday, July 03, 2009 09:01
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] switch was down, all drbd machines rebootet

Hello,

I had an earlier discussion here where we came to the conclusion that using Protocol C can cause crashes.
Yesterday we had problems with one of our switches, and therefore the DRBD-enabled machines couldn't see each other. Then all the machines rebooted, creating split-brains and a lot of work.
Do you think the crashes/reboots are caused by the same problem, or can we prevent this behaviour by optimizing our heartbeat/drbd config?

I'll attach a drbd config and the ha.cf.

---------------------------------
drbd.conf

common {
  protocol C;
}

resource drbd_backend {
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 500M;
    al-extents 257;
  }
  on xen-B1.fra1.mailcluster {
    device    /dev/drbd0;
    disk      /dev/md3;
    address   172.20.2.1:7788;
    meta-disk internal;
  }
  on xen-A1.fra1.mailcluster {
    device    /dev/drbd0;
    disk      /dev/md3;
    address   172.20.1.1:7788;
    meta-disk internal;
  }
}

---------------------------------------
ha.cf

#use_logd on
logfile /var/log/ha-log
debugfile /var/log/ha-debug
logfacility local0
keepalive 2
deadtime 10
warntime 3
initdead 20
udpport 694
ucast eth0 172.20.1.1
ucast eth0 172.20.2.1
node xen-A1.fra1.mailcluster
node xen-B1.fra1.mailcluster
auto_failback on

Thanks a lot
.r
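
For illustration only, here is a rough sketch of what the "two independent channels" advice quoted above might look like when added to the posted ha.cf. The eth0 lines are taken from the original file. Everything else is an assumption and not something from the poster's setup: a dedicated back-to-back link on a second NIC (called eth1 here, with made-up 10.0.0.x addresses) and, optionally, a null-modem cable on /dev/ttyS0. The important property is that the extra paths do not pass through the switch that failed.

```
# Existing channel through the switch (from the posted ha.cf)
ucast eth0 172.20.1.1
ucast eth0 172.20.2.1

# Second, independent channel.
# ASSUMPTION: dedicated crossover link on eth1; the interface name and
# the 10.0.0.x addresses are hypothetical placeholders.
ucast eth1 10.0.0.1
ucast eth1 10.0.0.2

# Optional further path over a serial line.
# ASSUMPTION: null-modem cable on the first serial port.
# baud has to be set before the serial directive.
baud 19200
serial /dev/ttyS0
```

With something along these lines, losing the switch should leave the nodes still able to see each other over the direct link, so neither side declares its peer dead after deadtime. Treat it as a starting point to check against the Heartbeat documentation for your version, not as a tested configuration.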