[DRBD-user] "fence-peer helper broken, returned 1"

Ben Beuchler insyte at gmail.com
Tue Sep 29 02:55:24 CEST 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I've implemented a simple two-node DRBD cluster with Heartbeat v1.  A
single back-end Ethernet link carries both DRBD replication and
Heartbeat traffic.  I realize this is not a robust configuration.

If I kill the Ethernet link between the two nodes, DRBD does not fail
over, logging these errors:

Sep 28 19:19:55 test01 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Sep 28 19:19:55 test01 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 1 (0x100)
Sep 28 19:19:55 test01 kernel: block drbd0: fence-peer helper broken, returned 1
Sep 28 19:19:55 test01 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'

Is this caused by the lack of a secondary communication path for
Heartbeat to convey the 'fence-peer' command to the second node?  If
so, how do Heartbeat and DRBD handle a node that fails suddenly and
catastrophically, before the 'fence-peer' command can be conveyed?
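
From what I can tell from the documentation, the strict answer for the
catastrophic case would be 'fencing resource-and-stonith;' together
with a handler that actually powers the peer off.  Something roughly
like this, where the script path is only a placeholder:

  disk {
    fencing resource-and-stonith;   # freeze I/O until the peer is confirmed fenced
  }
  handlers {
    # placeholder: must be a script that really fences (powers off) the peer
    fence-peer "/usr/local/sbin/stonith-peer.sh";
  }

Is that the intended design here?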

Thanks!

-Ben

ha.cf contains these lines:
respawn hacluster /usr/lib64/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster
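
If the root cause really is the shared link, I assume the fix is to
give Heartbeat (and therefore dopd) at least one path to the peer that
does not ride on the replication link.  A sketch of the extra ha.cf
lines I have in mind (interface and device names are only placeholders;
the test pair doesn't have the extra hardware yet):

bcast  eth0            # front-end NIC (placeholder name)
bcast  eth2            # dedicated second heartbeat NIC (placeholder)
serial /dev/ttyS0      # null-modem crossover, if serial ports exist
baud   19200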

drbd.conf:

global {
    usage-count no;
}

common {
  syncer { rate 10M; }
}

resource ha_disk {
  protocol C;
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
  }

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
    fencing       resource-only;
  }

  net {
    cram-hmac-alg "sha1";
    shared-secret "eea0a7bfc04b965b9beb5ba37096bc7a";
    after-sb-0pri discard-older-primary;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 10M;
    al-extents 257;
  }

  on test01 {
    device     /dev/drbd0;
    disk       /dev/vg0/drbd-disk;
    address    10.1.1.210:7788;
    flexible-meta-disk  internal;
  }

  on test02 {
    device     /dev/drbd0;
    disk       /dev/vg0/drbd-disk;
    address    10.1.1.211:7788;
    flexible-meta-disk  internal;
  }
}
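
One more thing I noticed while pasting the config: the kernel log
invokes the helper as 'fence-peer', but the handlers section above
still uses the older 'outdate-peer' keyword.  My understanding is that
DRBD 8.3 renamed that handler, so the equivalent stanza with the newer
spelling would be (same helper, just the new name):

  handlers {
    fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
  }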


