[DRBD-user] heartbeat and drbd / Failover / Failback

Rois Cannon rois at cobiz.com
Fri Nov 30 21:16:08 CET 2007

I have 2 nodes, each with 2 NICs and a serial link, and I'm working on a
Primary/Standby setup.

    /----- eth0 ------\
node1 --- Serial ---- node2
    \----- eth1 ------/

node1 eth0
node1 eth1

node2 eth0
node2 eth1

VirtualIP on eth0

drbd uses eth1.
heartbeat uses all three and has ping nodes to test on eth0 and eth1.
node1 is the primary to start.  

If, for any reason, node2 becomes primary, I want node1 to be outdated
so it can not become primary again without human intervention.  

Here's why:
If the eth1 NIC fails on node1, heartbeat will roll over to node2 as
primary and drbd will disconnect, leaving
node1 -> StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown
node2 -> WFConnection st:Primary/Unknown ds:UpToDate/DUnknown

Both are showing UpToDate and so heartbeat can bring up either as primary.

If node2 fails before human intervention can correct the problem on
node1, heartbeat rolls node1 back to the primary.  That could cause loss
of data that was written to node2 but not written to node1 because of
the disconnected state.  I want to fail over to node2 but I don't want
it coming back to node1 without human intervention.  (btw I'm not
talking about auto_failback.  I already have that off)  

Also, in the case where a secondary node2 has eth1 fail, I'd like to
outdate node2 so heartbeat can't roll over to it until human intervention
corrects the situation.
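
To make the two cases above concrete, here is a toy Python model of the sequence. It is only a sketch of the reasoning, not how drbd or heartbeat actually decide anything: the Node class, can_promote() and simulate() are hypothetical names made up for illustration, and the write counts are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str = "Secondary"   # "Primary" or "Secondary"
    disk: str = "UpToDate"    # "UpToDate" or "Outdated"
    writes: int = 0           # how many writes this node's disk has seen
    alive: bool = True


def can_promote(node):
    """Toy rule: only a live node whose disk is not Outdated may become Primary."""
    return node.alive and node.disk == "UpToDate"


def simulate(outdate_on_disconnect):
    # Both nodes start in sync, node1 is Primary.
    node1 = Node("node1", role="Primary", writes=100)
    node2 = Node("node2", writes=100)

    # eth1 fails on node1: replication stops and heartbeat fails over to node2.
    node1.role, node2.role = "Secondary", "Primary"
    if outdate_on_disconnect:
        node1.disk = "Outdated"   # the behaviour I am asking for

    # node2 keeps accepting writes that never reach node1.
    node2.writes += 25

    # node2 dies before anyone has repaired node1.
    node2.alive = False

    if can_promote(node1):
        node1.role = "Primary"
        lost = node2.writes - node1.writes
        return f"node1 promoted, {lost} writes lost"
    return "node1 refused promotion, waiting for an admin"


print(simulate(outdate_on_disconnect=False))
print(simulate(outdate_on_disconnect=True))
```

Without the outdate step node1 comes back as primary with stale data; with it, node1 stays down until someone intervenes. The same can_promote() check covers the second case, where a secondary node2 that lost eth1 is marked Outdated and so cannot be rolled over to.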

I'm guessing there must be some way of doing this already before I start
hacking it up.

Feel free to tell me I'm all wet and what I should be doing.  I'm pretty
new to drbd.  I'm looking for failover and NOT auto_failback.  Is this
making any sense?


I'm using Mandriva 2008.0 with drbd 8.0.6 and heartbeat 2.0.8.

Here is my ha.cf:
auto_failback off
logfacility     local0
debugfile /var/log/ha-debug
keepalive 2
deadtime 10
deadping 6
initdead 30
baud 460800
serial /dev/ttyS0
ucast eth0
ucast eth1
node svr91 svr92
respawn hacluster /usr/lib/heartbeat/ipfail

Here is my drbd.conf:
global {
    usage-count no;
}
common {
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; /usr/bin/halt -p";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; /usr/bin/halt -p";
    local-io-error "echo o > /proc/sysrq-trigger ; /usr/bin/halt -p";
  }
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
    cram-hmac-alg "[..removed..]";
    shared-secret "[..removed..]";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    rate 10M;
    al-extents 257;
  }
}
resource home {
  protocol C;
  on svr91 {
    device     /dev/drbd0;
    disk       /dev/vg0/home;
    meta-disk  internal;
  }
  on svr92 {
    device     /dev/drbd0;
    disk       /dev/vg0/home;
    meta-disk  internal;
  }
}

Here is my haresources:
svr91 IPaddr:: drbddisk::home
