Hi,

We have the following setup: two physical servers running DRBD 8.3.2 and Heartbeat 2.1.3 on CentOS 5.4. Everything was installed via the official RPM packages in the CentOS repositories. The servers have two bonded direct links between them for DRBD replication, and two other bonded links for all other traffic (management, iSCSI, etc.).

We can do hb_takeover from host to host without any issues. When we power off the primary host, the other host tries to take over but never succeeds. We see the following lines in the log several times, until heartbeat gives up and goes into standby again:

    block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 0 (0x0)
    block drbd0: fence-peer helper broken, returned 0
    block drbd0: helper command: /sbin/drbdadm fence-peer minor-0

After the "failed" node is powered on again, the two nodes are in a split-brain condition.

We have tried compiling the latest DRBD and Heartbeat and using those, but the error is the same.

Here is our drbd.conf:

    resource r0 {
      protocol C;
      startup {
        wfc-timeout 0;
      }
      disk {
        on-io-error detach;
        no-disk-barrier;
        no-disk-flushes;
        no-md-flushes;
        fencing resource-only;
      }
      net {
        max-buffers 20000;
        max-epoch-size 20000;
        sndbuf-size 1M;
      }
      syncer {
        rate 2000M;
        al-extents 1201;
      }
      on server1 {
        device /dev/drbd0;
        disk /dev/dm-1;
        address 172.16.0.127:7788;
        meta-disk internal;
      }
      on server2 {
        device /dev/drbd0;
        disk /dev/dm-1;
        address 172.16.0.227:7788;
        meta-disk internal;
      }
    }

Here is our ha.cf:

    use_logd yes
    keepalive 1
    deadtime 10
    warntime 10
    initdead 20
    udpport 694
    ucast bond0.20 10.0.0.127
    auto_failback off
    node server1 server2
    uuidfrom nodename
    respawn hacluster /usr/lib/heartbeat/ipfail
    ping 10.0.0.1
    deadping 20

How can we solve this problem?

Best Regards,
Mikkel R. Jakobsen
Systems Consultant
DANSUPPORT A/S