[DRBD-user] heartbeat and drbd / Failover / Failback

Fri Nov 30 21:16:08 CET 2007

I have 2 nodes, each with 2 NICs and a serial link, set up in a
Primary/Standby model.

    /----- eth0 ------\
node1 --- Serial ---- node2
    \----- eth1 ------/

node1 eth0 192.168.151.91
node1 eth1 192.168.1.91

node2 eth0 192.168.151.92
node2 eth1 192.168.1.92

VirtualIP on eth0 192.168.151.90

drbd uses eth1.
heartbeat uses all three and has ping nodes to test on eth0 and eth1.
node1 is the primary to start.  

If, for any reason, node2 becomes primary, I want node1 to be outdated
so it cannot become primary again without human intervention.

Here's why:
If the eth1 NIC fails on node1, heartbeat will fail over to node2 as
primary and drbd will disconnect, leaving:
node1 -> StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown
node2 -> WFConnection st:Primary/Unknown ds:UpToDate/DUnknown

Both are showing UpToDate and so heartbeat can bring up either as
primary.

If node2 fails before human intervention can correct the problem on
node1, heartbeat makes node1 primary again.  That could lose data that
was written to node2 but never reached node1 because of the
disconnected state.  I want to fail over to node2, but I don't want
the primary role coming back to node1 without human intervention.
(btw I'm not talking about auto_failback.  I already have that off.)

Also, in the case where node2 is secondary and its eth1 fails, I'd like
to outdate node2 so heartbeat can't fail over to it until human
intervention corrects the situation.
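
To make the two cases above concrete, here is a toy model in Python.
It is only a sketch of the rule I'm after, not how drbd or heartbeat
actually decide anything; the class, the function names and the
"only an UpToDate node may be promoted" rule are all made up for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str = "Secondary"  # "Primary" or "Secondary"
    disk: str = "UpToDate"   # "UpToDate" or "Outdated"


def can_promote(node: Node) -> bool:
    """Toy rule: only a node whose data is UpToDate may become primary."""
    return node.disk == "UpToDate"


def fail_over(old: Node, new: Node, outdate_old: bool) -> None:
    """Move the primary role from `old` to `new`.

    With outdate_old=True the old primary is marked Outdated, so it stays
    out of the running until someone resets it by hand.
    """
    if not can_promote(new):
        raise RuntimeError(f"{new.name} is {new.disk}, refusing to promote")
    old.role, new.role = "Secondary", "Primary"
    if outdate_old:
        old.disk = "Outdated"


def failback_possible(outdate_old: bool) -> bool:
    """Case 1: node1 is primary, loses eth1, and node2 takes over.

    Returns True if node1 could later be promoted again.
    """
    node1, node2 = Node("node1", role="Primary"), Node("node2")
    fail_over(node1, node2, outdate_old)
    return can_promote(node1)


def secondary_promotable_after_link_loss(outdate_secondary: bool) -> bool:
    """Case 2: the secondary loses eth1 while node1 stays primary.

    Returns True if the cut-off secondary could still be promoted.
    """
    node2 = Node("node2")
    if outdate_secondary:
        node2.disk = "Outdated"
    return can_promote(node2)


if __name__ == "__main__":
    print("today,  node1 can come back:", failback_possible(False))
    print("wanted, node1 can come back:", failback_possible(True))
    print("today,  cut-off node2 can take over:",
          secondary_promotable_after_link_loss(False))
    print("wanted, cut-off node2 can take over:",
          secondary_promotable_after_link_loss(True))
```

With outdating switched off (what I see today) both nodes stay
promotable, which is the data-loss window.  With it switched on, the
node that was left behind cannot be promoted until a human clears it.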

I'm guessing there must be some way of doing this already, so I'm
asking before I start hacking it up.

Feel free to tell me I'm all wet and what I should be doing.  I'm pretty
new to drbd.  I'm looking for failover and NOT auto_failback.  Is this
making any sense?

Thanx
Rois

I'm using Mandriva 2008.0 with drbd 8.0.6 and heartbeat 2.0.8.

Here is my ha.cf:
----------------------------------------------------
auto_failback off
logfacility     local0
debugfile /var/log/ha-debug
keepalive 2
deadtime 10
deadping 6
initdead 30
baud 460800
serial /dev/ttyS0
ucast eth0 192.168.151.91 192.168.151.92
ucast eth1 192.168.1.91 192.168.1.92
node svr91 svr92
ping 192.168.151.1
ping 192.168.1.3
respawn hacluster /usr/lib/heartbeat/ipfail
----------------------------------------------------

Here is my drbd.conf:
----------------------------------------------------
global {
    usage-count no;
}
common {
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; /usr/bin/halt -p";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; /usr/bin/halt -p";
    local-io-error "echo o > /proc/sysrq-trigger ; /usr/bin/halt -p";
  }
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
    cram-hmac-alg "[..removed..]";
    shared-secret "[..removed..]";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    rate 10M;
    al-extents 257;
  }
}
resource home {
  protocol C;
  on svr91 {
    device     /dev/drbd0;
    disk       /dev/vg0/home;
    address    192.168.1.91:7788;
    meta-disk  internal;
  }
  on svr92 {
    device     /dev/drbd0;
    disk       /dev/vg0/home;
    address    192.168.1.92:7788;
    meta-disk  internal;
  }
}
----------------------------------------------------

haresources:
----------------------------------------------------
svr91 IPaddr::192.168.151.90/24/eth0 drbddisk::home Filesystem::/dev/drbd0::/home::xfs
----------------------------------------------------