[DRBD-user] after heartbeat restart device dont switch back

Heiko rupertt at gmail.com
Thu Aug 14 10:56:05 CEST 2008

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello,
i have a 2 Server fallback cluster with heartbeat and a XEN VM on a drbd device.
When i do manuelly stop heartbeat the fallback works and the VM starts
on the other machine,
but when I do a heartbeat restart or reboot one server the VM does not
go back to server1.
Sometimes I had the situation where I only come to the login prompt of
the VM, but it seems hanging.
I had to do a drbdb primary(on server1) and shutdown the VM to get it
back working.

Do I have done something wrong here?
Right after the restart of heartbeat I get this message:

heartbeat[7470]: 2008/08/14_08:33:27 info: devmaster-1.blabla.com
wants to go standby [foreign]

but this is the machine  "devmaster-1.blabla.com"


here are my configs(same on both servers):

drbd.conf
-------------------------
resource ticketsystem {
  protocol C;
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
    degr-wfc-timeout 120; ## 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;

#       allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        timeout 60;
#       connect-int 10;
#       ping-int 10;
#       max-buffers 2048;
#       max-epoch-size 2048;
  }
  syncer {
#    group 100;
    al-extents 257;
        rate 20M;
  }

  on devmaster-1.blabla.com {
    device     /dev/drbd0;
    disk       /dev/vg01/lv_ticketsystem;
    address    192.168.60.101:7788;
    meta-disk  internal;
  }

  on devmaster-2.blabla.com {
    device    /dev/drbd0;
    disk      /dev/vg01/lv_ticketsystem;
    address   192.168.60.102:7788;
    meta-disk internal;
  }
}

ha.cf
-------------------------
autojoin none
logfacility local0
udpport 694
keepalive 2
deadtime 10
warntime 3
initdead 20
ucast eth0 192.168.60.101
ucast eth0 192.168.60.102
auto_failback on
watchdog /dev/watchdog
debugfile /var/log/ha-debug
node devmaster-1.blabla.com
node devmaster-2.blabla.com

haresources:
-------------------------
devmaster-1.blabla.com drbddisk::ticketsystem  xen::ticketsystem


and the last lines of the hadebug logfile


master(where the VM ist primarlily)
-------------------------------------------------------
heartbeat[18428]: 2008/08/14_08:31:00 info: Received shutdown notice
from 'devmaster-2.blabla.com'.
heartbeat[18428]: 2008/08/14_08:31:00 info: Resources being acquired
from devmaster-2.blabla.com.
heartbeat[18428]: 2008/08/14_08:31:00 debug: StartNextRemoteRscReq():
child count 1
heartbeat[6872]: 2008/08/14_08:31:00 info: acquire local HA resources (standby).
ResourceManager[6899]:  2008/08/14_08:31:00 info: Acquiring resource
group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem
heartbeat[6873]: 2008/08/14_08:31:00 info: Local Resource acquisition completed.
heartbeat[18428]: 2008/08/14_08:31:00 debug: StartNextRemoteRscReq():
child count 1
heartbeat[6872]: 2008/08/14_08:31:01 info: local HA resource
acquisition completed (standby).
heartbeat[18428]: 2008/08/14_08:31:01 info: Standby resource
acquisition done [foreign].
heartbeat[6972]: 2008/08/14_08:31:01 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[6972]:     2008/08/14_08:31:01 info: Running /etc/ha.d/rc.d/status status
mach_down[6988]:        2008/08/14_08:31:01 info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources
acquired
mach_down[6988]:        2008/08/14_08:31:01 info: mach_down takeover
complete for node devmaster-2.blabla.com.
heartbeat[18428]: 2008/08/14_08:31:01 info: mach_down takeover complete.
heartbeat[18428]: 2008/08/14_08:31:13 WARN: node devmaster-2.blabla.com: is dead
heartbeat[18428]: 2008/08/14_08:31:13 info: Dead node
devmaster-2.blabla.com gave up resources.
heartbeat[18428]: 2008/08/14_08:31:13 info: Link
devmaster-2.blabla.com:eth0 dead.
heartbeat[18428]: 2008/08/14_08:31:39 info: Heartbeat shutdown in
progress. (18428)
heartbeat[7131]: 2008/08/14_08:31:39 info: Giving up all HA resources.
ResourceManager[7145]:  2008/08/14_08:31:39 info: Releasing resource
group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem
ResourceManager[7145]:  2008/08/14_08:31:39 info: Running
/etc/ha.d/resource.d/xen ticketsystem stop
ResourceManager[7145]:  2008/08/14_08:31:39 debug: Starting
/etc/ha.d/resource.d/xen ticketsystem stop
Domain ticketsystem terminated
All domains terminated
ResourceManager[7145]:  2008/08/14_08:32:00 debug:
/etc/ha.d/resource.d/xen ticketsystem stop done. RC=0
ResourceManager[7145]:  2008/08/14_08:32:00 info: Running
/etc/ha.d/resource.d/drbddisk ticketsystem stop
ResourceManager[7145]:  2008/08/14_08:32:00 debug: Starting
/etc/ha.d/resource.d/drbddisk ticketsystem stop
ResourceManager[7145]:  2008/08/14_08:32:00 debug:
/etc/ha.d/resource.d/drbddisk ticketsystem stop done. RC=0
heartbeat[7131]: 2008/08/14_08:32:00 info: All HA resources relinquished.
heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBFIFO process
18431 with signal 15
heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBWRITE process
18432 with signal 15
heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBREAD process
18433 with signal 15
heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBWRITE process
18434 with signal 15
heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBREAD process
18435 with signal 15
heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18435 exited.
5 remaining
heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18431 exited.
4 remaining
heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18433 exited.
3 remaining
heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18434 exited.
2 remaining
heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18432 exited.
1 remaining
heartbeat[18428]: 2008/08/14_08:32:02 info: devmaster-1.blabla.com
Heartbeat shutdown complete.
heartbeat[7469]: 2008/08/14_08:32:23 info: Version 2 support: false
heartbeat[7469]: 2008/08/14_08:32:23 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[7469]: 2008/08/14_08:32:23 info: **************************
heartbeat[7469]: 2008/08/14_08:32:23 info: Configuration validated.
Starting heartbeat 2.1.3
heartbeat[7470]: 2008/08/14_08:32:23 info: heartbeat: version 2.1.3
heartbeat[7470]: 2008/08/14_08:32:23 info: Heartbeat generation: 1217252823
heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: bound send
socket to device: eth0
heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: bound receive
socket to device: eth0
heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: started on
port 694 interface eth0 to 192.168.60.101
heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: bound send
socket to device: eth0
heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: bound receive
socket to device: eth0
heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: started on
port 694 interface eth0 to 192.168.60.102
heartbeat[7470]: 2008/08/14_08:32:23 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[7470]: 2008/08/14_08:32:23 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[7470]: 2008/08/14_08:32:23 notice: Using watchdog device:
/dev/watchdog
heartbeat[7470]: 2008/08/14_08:32:23 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[7470]: 2008/08/14_08:32:23 info: Local status now set to: 'up'
heartbeat[7470]: 2008/08/14_08:32:44 WARN: node devmaster-2.blabla.com: is dead
heartbeat[7470]: 2008/08/14_08:32:44 info: Comm_now_up(): updating
status to active
heartbeat[7470]: 2008/08/14_08:32:44 info: Local status now set to: 'active'
heartbeat[7470]: 2008/08/14_08:32:44 WARN: No STONITH device configured.
heartbeat[7470]: 2008/08/14_08:32:44 WARN: Shared disks are not protected.
heartbeat[7470]: 2008/08/14_08:32:44 info: Resources being acquired
from devmaster-2.blabla.com.
heartbeat[7535]: 2008/08/14_08:32:44 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[7535]:     2008/08/14_08:32:44 info: Running /etc/ha.d/rc.d/status status
mach_down[7565]:        2008/08/14_08:32:44 info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources
acquired
heartbeat[7536]: 2008/08/14_08:32:44 info: Local Resource acquisition completed.
heartbeat[7470]: 2008/08/14_08:32:44 info: mach_down takeover complete.
heartbeat[7470]: 2008/08/14_08:32:44 info: Initial resource
acquisition complete (mach_down)
heartbeat[7470]: 2008/08/14_08:32:44 debug: StartNextRemoteRscReq():
child count 2
heartbeat[7470]: 2008/08/14_08:32:44 debug: StartNextRemoteRscReq():
child count 1
mach_down[7565]:        2008/08/14_08:32:44 info: mach_down takeover
complete for node devmaster-2.blabla.com.
heartbeat[7627]: 2008/08/14_08:32:44 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[7627]:     2008/08/14_08:32:44 info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[7627]:  2008/08/14_08:32:44 received ip-request-resp
drbddisk::ticketsystem OK yes
ResourceManager[7648]:  2008/08/14_08:32:44 info: Acquiring resource
group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem
ResourceManager[7648]:  2008/08/14_08:32:44 info: Running
/etc/ha.d/resource.d/drbddisk ticketsystem start
ResourceManager[7648]:  2008/08/14_08:32:44 debug: Starting
/etc/ha.d/resource.d/drbddisk ticketsystem start
/dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config
Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11
/dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config
Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11
/dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config
Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11
/dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config
Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11
/dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config
Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11
heartbeat[7470]: 2008/08/14_08:32:54 info: Local Resource acquisition
completed. (none)
heartbeat[7470]: 2008/08/14_08:32:54 info: local resource transition completed.
/dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config
Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11
ResourceManager[7648]:  2008/08/14_08:32:56 debug:
/etc/ha.d/resource.d/drbddisk ticketsystem start done. RC=1
ResourceManager[7648]:  2008/08/14_08:32:56 ERROR: Return code 1 from
/etc/ha.d/resource.d/drbddisk
ResourceManager[7648]:  2008/08/14_08:32:56 CRIT: Giving up resources
due to failure of drbddisk::ticketsystem
ResourceManager[7648]:  2008/08/14_08:32:56 info: Releasing resource
group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem
ResourceManager[7648]:  2008/08/14_08:32:56 info: Running
/etc/ha.d/resource.d/xen ticketsystem stop
ResourceManager[7648]:  2008/08/14_08:32:56 debug: Starting
/etc/ha.d/resource.d/xen ticketsystem stop
Error: Domain 'ticketsystem' does not exist.
Usage: xm shutdown <Domain> [-waRH]

Shutdown a domain.
ResourceManager[7648]:  2008/08/14_08:32:56 debug:
/etc/ha.d/resource.d/xen ticketsystem stop done. RC=0
ResourceManager[7648]:  2008/08/14_08:32:56 info: Running
/etc/ha.d/resource.d/drbddisk ticketsystem stop
ResourceManager[7648]:  2008/08/14_08:32:56 debug: Starting
/etc/ha.d/resource.d/drbddisk ticketsystem stop
ResourceManager[7648]:  2008/08/14_08:32:57 debug:
/etc/ha.d/resource.d/drbddisk ticketsystem stop done. RC=0
hb_standby[7811]:       2008/08/14_08:33:27 Going standby [foreign].
heartbeat[7470]: 2008/08/14_08:33:27 info: devmaster-1.blabla.com
wants to go standby [foreign]
heartbeat[7470]: 2008/08/14_08:33:37 WARN: No reply to standby
request.  Standby request cancelled.


slave:
-------------
heartbeat[4593]: 2008/08/14_08:31:00 info: Heartbeat shutdown in
progress. (4593)
heartbeat[14462]: 2008/08/14_08:31:00 info: Giving up all HA resources.
ResourceManager[14475]: 2008/08/14_08:31:00 info: Releasing resource
group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem
ResourceManager[14475]: 2008/08/14_08:31:00 info: Running
/etc/ha.d/resource.d/xen ticketsystem stop
ResourceManager[14475]: 2008/08/14_08:31:00 debug: Starting
/etc/ha.d/resource.d/xen ticketsystem stop
Error: Domain 'ticketsystem' does not exist.
Usage: xm shutdown <Domain> [-waRH]

Shutdown a domain.
ResourceManager[14475]: 2008/08/14_08:31:00 debug:
/etc/ha.d/resource.d/xen ticketsystem stop done. RC=0
ResourceManager[14475]: 2008/08/14_08:31:00 info: Running
/etc/ha.d/resource.d/drbddisk ticketsystem stop
ResourceManager[14475]: 2008/08/14_08:31:00 debug: Starting
/etc/ha.d/resource.d/drbddisk ticketsystem stop
ResourceManager[14475]: 2008/08/14_08:31:00 debug:
/etc/ha.d/resource.d/drbddisk ticketsystem stop done. RC=0
heartbeat[14462]: 2008/08/14_08:31:00 info: All HA resources relinquished.
heartbeat[4593]: 2008/08/14_08:31:01 WARN: 1 lost packet(s) for
[devmaster-1.blabla.com] [67382:67384]
heartbeat[4593]: 2008/08/14_08:31:01 info: No pkts missing from
devmaster-1.blabla.com!
heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBWRITE process
4598 with signal 15
heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBREAD process 4599
with signal 15
heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBFIFO process 4595
with signal 15
heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBWRITE process
4596 with signal 15
heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBREAD process 4597
with signal 15
heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4597 exited. 5 remaining
heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4595 exited. 4 remaining
heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4596 exited. 3 remaining
heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4598 exited. 2 remaining
heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4599 exited. 1 remaining
heartbeat[4593]: 2008/08/14_08:31:02 info: devmaster-2.blabla.com
Heartbeat shutdown complete.
heartbeat[14601]: 2008/08/14_08:31:23 info: Version 2 support: false
heartbeat[14601]: 2008/08/14_08:31:23 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[14601]: 2008/08/14_08:31:23 info: **************************
heartbeat[14601]: 2008/08/14_08:31:23 info: Configuration validated.
Starting heartbeat 2.1.3
heartbeat[14602]: 2008/08/14_08:31:23 info: heartbeat: version 2.1.3
heartbeat[14602]: 2008/08/14_08:31:23 info: Heartbeat generation: 1217252802
heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: bound send
socket to device: eth0
heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: bound receive
socket to device: eth0
heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: started on
port 694 interface eth0 to 192.168.60.101
heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: bound send
socket to device: eth0
heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: bound receive
socket to device: eth0
heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: started on
port 694 interface eth0 to 192.168.60.102
heartbeat[14602]: 2008/08/14_08:31:23 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[14602]: 2008/08/14_08:31:23 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[14602]: 2008/08/14_08:31:23 notice: Using watchdog device:
/dev/watchdog
heartbeat[14602]: 2008/08/14_08:31:23 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[14602]: 2008/08/14_08:31:23 info: Local status now set to: 'up'
heartbeat[14602]: 2008/08/14_08:31:24 info: Link devmaster-1.blabla.com:eth0 up.
heartbeat[14602]: 2008/08/14_08:31:24 info: Status update for node
devmaster-1.blabla.com: status active
heartbeat[14610]: 2008/08/14_08:31:24 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[14610]:    2008/08/14_08:31:24 info: Running /etc/ha.d/rc.d/status status
heartbeat[14602]: 2008/08/14_08:32:01 info: Received shutdown notice
from 'devmaster-1.blabla.com'.
heartbeat[14602]: 2008/08/14_08:32:01 info: Resources being acquired
from devmaster-1.blabla.com.
heartbeat[14602]: 2008/08/14_08:32:01 debug: StartNextRemoteRscReq():
child count 1
heartbeat[14632]: 2008/08/14_08:32:01 info: acquire all HA resources (standby).
heartbeat[14633]: 2008/08/14_08:32:01 info: No local resources
[/usr/share/heartbeat/ResourceManager listkeys devmaster-2.blabla.com]
to acquire.
heartbeat[14602]: 2008/08/14_08:32:01 debug: StartNextRemoteRscReq():
child count 1
ResourceManager[14658]: 2008/08/14_08:32:01 info: Acquiring resource
group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem
ResourceManager[14658]: 2008/08/14_08:32:01 info: Running
/etc/ha.d/resource.d/drbddisk ticketsystem start
ResourceManager[14658]: 2008/08/14_08:32:01 debug: Starting
/etc/ha.d/resource.d/drbddisk ticketsystem start
ResourceManager[14658]: 2008/08/14_08:32:01 debug:
/etc/ha.d/resource.d/drbddisk ticketsystem start done. RC=0
ResourceManager[14658]: 2008/08/14_08:32:01 info: Running
/etc/ha.d/resource.d/xen ticketsystem start
ResourceManager[14658]: 2008/08/14_08:32:01 debug: Starting
/etc/ha.d/resource.d/xen ticketsystem start Using config file
"/etc/xen/ticketsystem".
Started domain ticketsystem
ResourceManager[14658]: 2008/08/14_08:32:02 debug:
/etc/ha.d/resource.d/xen ticketsystem start done. RC=0
heartbeat[14632]: 2008/08/14_08:32:02 info: all HA resource
acquisition completed (standby).
heartbeat[14963]: 2008/08/14_08:32:02 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[14963]:    2008/08/14_08:32:02 info: Running /etc/ha.d/rc.d/status status
mach_down[14979]:       2008/08/14_08:32:02 info: Taking over resource
group drbddisk::ticketsystem
ResourceManager[15005]: 2008/08/14_08:32:02 info: Acquiring resource
group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem
mach_down[14979]:       2008/08/14_08:32:02 info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources
acquired
mach_down[14979]:       2008/08/14_08:32:02 info: mach_down takeover
complete for node devmaster-1.blabla.com.
heartbeat[14602]: 2008/08/14_08:32:12 WARN: node devmaster-1.blabla.com: is dead
heartbeat[14602]: 2008/08/14_08:32:12 info: Cancelling pending standby operation
heartbeat[14602]: 2008/08/14_08:32:12 info: Dead node
devmaster-1.blabla.com gave up resources.
heartbeat[14602]: 2008/08/14_08:32:12 info: Link
devmaster-1.blabla.com:eth0 dead.
heartbeat[14602]: 2008/08/14_08:32:12 info: Comm_now_up(): updating
status to active
heartbeat[14602]: 2008/08/14_08:32:12 info: Local status now set to: 'active'



thx for your help

Rupertt



More information about the drbd-user mailing list