[DRBD-user] fence-peer helper broken, returned 0

Thu Mar 11 14:34:27 CET 2010

Hi,

We have the following setup:

Two physical servers running DRBD 8.3.2 and Heartbeat 2.1.3 on
CentOS 5.4. Everything was installed from the official RPM packages in
the CentOS repositories.
The servers have two bonded direct links between them for DRBD
replication, and two other bonded links for all other traffic
(management, iSCSI, etc.).

We can do hb_takeover from host to host without any issues.
When we power off the primary host, the other host tries to take over,
but never succeeds.

The following lines repeat in the log until Heartbeat gives up and goes
back to standby:

block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 0 (0x0)
block drbd0: fence-peer helper broken, returned 0
block drbd0: helper command: /sbin/drbdadm fence-peer minor-0

After the "failed" node is powered on again, the two nodes are in a
split-brain condition.
We have tried compiling the latest DRBD and Heartbeat and using those,
but the error is the same.

Here is our drbd.conf:
resource r0 {
        protocol C;

        startup { wfc-timeout 0; }

        disk {
                on-io-error detach;
                no-disk-barrier;
                no-disk-flushes;
                no-md-flushes;
                fencing resource-only;
        }

        net {
                max-buffers 20000;
                max-epoch-size 20000;
                sndbuf-size 1M;
        }

        syncer {
                rate 2000M;
                al-extents 1201;
        }

        on server1 {
                device /dev/drbd0;
                disk /dev/dm-1;
                address 172.16.0.127:7788;
                meta-disk internal;
        }

        on server2 {
                device /dev/drbd0;
                disk /dev/dm-1;
                address 172.16.0.227:7788;
                meta-disk internal;
        }
}
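
One thing we are unsure about: the configuration above sets
"fencing resource-only" but has no handlers section. As far as we
understand, DRBD only accepts exit codes 3 through 7 from the fence-peer
helper, which would explain why a return value of 0 is reported as
broken. Below is a minimal sketch of what we think such a section could
look like inside the resource, assuming the drbd-peer-outdater helper
from Heartbeat's dopd is installed at the path shown. The path and the
5-second timeout are assumptions, not something we have tested:

```
resource r0 {
        handlers {
                # Assumed path; it may be /usr/lib64/heartbeat/... on 64-bit installs.
                fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        }
        # the rest of the resource stays as posted above
}
```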

Here is our ha.cf:
use_logd        yes
keepalive       1
deadtime        10
warntime        10
initdead        20
udpport         694
ucast           bond0.20 10.0.0.127
auto_failback   off
node            server1 server2

uuidfrom        nodename
respawn hacluster /usr/lib/heartbeat/ipfail
ping            10.0.0.1
deadping        20
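
If a dopd-based fence-peer helper turns out to be needed, our
understanding is that ha.cf would also have to start dopd and authorize
it. A sketch, assuming the same /usr/lib/heartbeat path used for ipfail
above and the default hacluster user and haclient group (untested on our
side):

```
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster
```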

How can we solve this problem?

Best Regards,

Mikkel R. Jakobsen
Systems Consultant
DANSUPPORT A/S