Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello, i have a 2 Server fallback cluster with heartbeat and a XEN VM on a drbd device. When i do manuelly stop heartbeat the fallback works and the VM starts on the other machine, but when I do a heartbeat restart or reboot one server the VM does not go back to server1. Sometimes I had the situation where I only come to the login prompt of the VM, but it seems hanging. I had to do a drbdb primary(on server1) and shutdown the VM to get it back working. Do I have done something wrong here? Right after the restart of heartbeat I get this message: heartbeat[7470]: 2008/08/14_08:33:27 info: devmaster-1.blabla.com wants to go standby [foreign] but this is the machine "devmaster-1.blabla.com" here are my configs(same on both servers): drbd.conf ------------------------- resource ticketsystem { protocol C; startup { degr-wfc-timeout 120; # 2 minutes. degr-wfc-timeout 120; ## 2 minutes. } disk { on-io-error detach; } net { after-sb-0pri disconnect; after-sb-1pri disconnect; after-sb-2pri disconnect; # allow-two-primaries; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; timeout 60; # connect-int 10; # ping-int 10; # max-buffers 2048; # max-epoch-size 2048; } syncer { # group 100; al-extents 257; rate 20M; } on devmaster-1.blabla.com { device /dev/drbd0; disk /dev/vg01/lv_ticketsystem; address 192.168.60.101:7788; meta-disk internal; } on devmaster-2.blabla.com { device /dev/drbd0; disk /dev/vg01/lv_ticketsystem; address 192.168.60.102:7788; meta-disk internal; } } ha.cf ------------------------- autojoin none logfacility local0 udpport 694 keepalive 2 deadtime 10 warntime 3 initdead 20 ucast eth0 192.168.60.101 ucast eth0 192.168.60.102 auto_failback on watchdog /dev/watchdog debugfile /var/log/ha-debug node devmaster-1.blabla.com node devmaster-2.blabla.com haresources: ------------------------- devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem and the last lines of the hadebug logfile master(where the VM ist primarlily) ------------------------------------------------------- heartbeat[18428]: 2008/08/14_08:31:00 info: Received shutdown notice from 'devmaster-2.blabla.com'. heartbeat[18428]: 2008/08/14_08:31:00 info: Resources being acquired from devmaster-2.blabla.com. heartbeat[18428]: 2008/08/14_08:31:00 debug: StartNextRemoteRscReq(): child count 1 heartbeat[6872]: 2008/08/14_08:31:00 info: acquire local HA resources (standby). ResourceManager[6899]: 2008/08/14_08:31:00 info: Acquiring resource group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem heartbeat[6873]: 2008/08/14_08:31:00 info: Local Resource acquisition completed. heartbeat[18428]: 2008/08/14_08:31:00 debug: StartNextRemoteRscReq(): child count 1 heartbeat[6872]: 2008/08/14_08:31:01 info: local HA resource acquisition completed (standby). heartbeat[18428]: 2008/08/14_08:31:01 info: Standby resource acquisition done [foreign]. heartbeat[6972]: 2008/08/14_08:31:01 debug: notify_world: setting SIGCHLD Handler to SIG_DFL harc[6972]: 2008/08/14_08:31:01 info: Running /etc/ha.d/rc.d/status status mach_down[6988]: 2008/08/14_08:31:01 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired mach_down[6988]: 2008/08/14_08:31:01 info: mach_down takeover complete for node devmaster-2.blabla.com. heartbeat[18428]: 2008/08/14_08:31:01 info: mach_down takeover complete. heartbeat[18428]: 2008/08/14_08:31:13 WARN: node devmaster-2.blabla.com: is dead heartbeat[18428]: 2008/08/14_08:31:13 info: Dead node devmaster-2.blabla.com gave up resources. heartbeat[18428]: 2008/08/14_08:31:13 info: Link devmaster-2.blabla.com:eth0 dead. heartbeat[18428]: 2008/08/14_08:31:39 info: Heartbeat shutdown in progress. (18428) heartbeat[7131]: 2008/08/14_08:31:39 info: Giving up all HA resources. ResourceManager[7145]: 2008/08/14_08:31:39 info: Releasing resource group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem ResourceManager[7145]: 2008/08/14_08:31:39 info: Running /etc/ha.d/resource.d/xen ticketsystem stop ResourceManager[7145]: 2008/08/14_08:31:39 debug: Starting /etc/ha.d/resource.d/xen ticketsystem stop Domain ticketsystem terminated All domains terminated ResourceManager[7145]: 2008/08/14_08:32:00 debug: /etc/ha.d/resource.d/xen ticketsystem stop done. RC=0 ResourceManager[7145]: 2008/08/14_08:32:00 info: Running /etc/ha.d/resource.d/drbddisk ticketsystem stop ResourceManager[7145]: 2008/08/14_08:32:00 debug: Starting /etc/ha.d/resource.d/drbddisk ticketsystem stop ResourceManager[7145]: 2008/08/14_08:32:00 debug: /etc/ha.d/resource.d/drbddisk ticketsystem stop done. RC=0 heartbeat[7131]: 2008/08/14_08:32:00 info: All HA resources relinquished. heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBFIFO process 18431 with signal 15 heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBWRITE process 18432 with signal 15 heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBREAD process 18433 with signal 15 heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBWRITE process 18434 with signal 15 heartbeat[18428]: 2008/08/14_08:32:02 info: killing HBREAD process 18435 with signal 15 heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18435 exited. 5 remaining heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18431 exited. 4 remaining heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18433 exited. 3 remaining heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18434 exited. 2 remaining heartbeat[18428]: 2008/08/14_08:32:02 info: Core process 18432 exited. 1 remaining heartbeat[18428]: 2008/08/14_08:32:02 info: devmaster-1.blabla.com Heartbeat shutdown complete. heartbeat[7469]: 2008/08/14_08:32:23 info: Version 2 support: false heartbeat[7469]: 2008/08/14_08:32:23 WARN: Logging daemon is disabled --enabling logging daemon is recommended heartbeat[7469]: 2008/08/14_08:32:23 info: ************************** heartbeat[7469]: 2008/08/14_08:32:23 info: Configuration validated. Starting heartbeat 2.1.3 heartbeat[7470]: 2008/08/14_08:32:23 info: heartbeat: version 2.1.3 heartbeat[7470]: 2008/08/14_08:32:23 info: Heartbeat generation: 1217252823 heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0 heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: bound send socket to device: eth0 heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: bound receive socket to device: eth0 heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: started on port 694 interface eth0 to 192.168.60.101 heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0 heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: bound send socket to device: eth0 heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: bound receive socket to device: eth0 heartbeat[7470]: 2008/08/14_08:32:23 info: glib: ucast: started on port 694 interface eth0 to 192.168.60.102 heartbeat[7470]: 2008/08/14_08:32:23 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[7470]: 2008/08/14_08:32:23 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[7470]: 2008/08/14_08:32:23 notice: Using watchdog device: /dev/watchdog heartbeat[7470]: 2008/08/14_08:32:23 info: G_main_add_SignalHandler: Added signal handler for signal 17 heartbeat[7470]: 2008/08/14_08:32:23 info: Local status now set to: 'up' heartbeat[7470]: 2008/08/14_08:32:44 WARN: node devmaster-2.blabla.com: is dead heartbeat[7470]: 2008/08/14_08:32:44 info: Comm_now_up(): updating status to active heartbeat[7470]: 2008/08/14_08:32:44 info: Local status now set to: 'active' heartbeat[7470]: 2008/08/14_08:32:44 WARN: No STONITH device configured. heartbeat[7470]: 2008/08/14_08:32:44 WARN: Shared disks are not protected. heartbeat[7470]: 2008/08/14_08:32:44 info: Resources being acquired from devmaster-2.blabla.com. heartbeat[7535]: 2008/08/14_08:32:44 debug: notify_world: setting SIGCHLD Handler to SIG_DFL harc[7535]: 2008/08/14_08:32:44 info: Running /etc/ha.d/rc.d/status status mach_down[7565]: 2008/08/14_08:32:44 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired heartbeat[7536]: 2008/08/14_08:32:44 info: Local Resource acquisition completed. heartbeat[7470]: 2008/08/14_08:32:44 info: mach_down takeover complete. heartbeat[7470]: 2008/08/14_08:32:44 info: Initial resource acquisition complete (mach_down) heartbeat[7470]: 2008/08/14_08:32:44 debug: StartNextRemoteRscReq(): child count 2 heartbeat[7470]: 2008/08/14_08:32:44 debug: StartNextRemoteRscReq(): child count 1 mach_down[7565]: 2008/08/14_08:32:44 info: mach_down takeover complete for node devmaster-2.blabla.com. heartbeat[7627]: 2008/08/14_08:32:44 debug: notify_world: setting SIGCHLD Handler to SIG_DFL harc[7627]: 2008/08/14_08:32:44 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp ip-request-resp[7627]: 2008/08/14_08:32:44 received ip-request-resp drbddisk::ticketsystem OK yes ResourceManager[7648]: 2008/08/14_08:32:44 info: Acquiring resource group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem ResourceManager[7648]: 2008/08/14_08:32:44 info: Running /etc/ha.d/resource.d/drbddisk ticketsystem start ResourceManager[7648]: 2008/08/14_08:32:44 debug: Starting /etc/ha.d/resource.d/drbddisk ticketsystem start /dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11 /dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11 /dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11 /dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11 /dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11 heartbeat[7470]: 2008/08/14_08:32:54 info: Local Resource acquisition completed. (none) heartbeat[7470]: 2008/08/14_08:32:54 info: local resource transition completed. /dev/drbd0: State change failed: (-1) Multiple primaries not allowed by config Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11 ResourceManager[7648]: 2008/08/14_08:32:56 debug: /etc/ha.d/resource.d/drbddisk ticketsystem start done. RC=1 ResourceManager[7648]: 2008/08/14_08:32:56 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk ResourceManager[7648]: 2008/08/14_08:32:56 CRIT: Giving up resources due to failure of drbddisk::ticketsystem ResourceManager[7648]: 2008/08/14_08:32:56 info: Releasing resource group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem ResourceManager[7648]: 2008/08/14_08:32:56 info: Running /etc/ha.d/resource.d/xen ticketsystem stop ResourceManager[7648]: 2008/08/14_08:32:56 debug: Starting /etc/ha.d/resource.d/xen ticketsystem stop Error: Domain 'ticketsystem' does not exist. Usage: xm shutdown <Domain> [-waRH] Shutdown a domain. ResourceManager[7648]: 2008/08/14_08:32:56 debug: /etc/ha.d/resource.d/xen ticketsystem stop done. RC=0 ResourceManager[7648]: 2008/08/14_08:32:56 info: Running /etc/ha.d/resource.d/drbddisk ticketsystem stop ResourceManager[7648]: 2008/08/14_08:32:56 debug: Starting /etc/ha.d/resource.d/drbddisk ticketsystem stop ResourceManager[7648]: 2008/08/14_08:32:57 debug: /etc/ha.d/resource.d/drbddisk ticketsystem stop done. RC=0 hb_standby[7811]: 2008/08/14_08:33:27 Going standby [foreign]. heartbeat[7470]: 2008/08/14_08:33:27 info: devmaster-1.blabla.com wants to go standby [foreign] heartbeat[7470]: 2008/08/14_08:33:37 WARN: No reply to standby request. Standby request cancelled. slave: ------------- heartbeat[4593]: 2008/08/14_08:31:00 info: Heartbeat shutdown in progress. (4593) heartbeat[14462]: 2008/08/14_08:31:00 info: Giving up all HA resources. ResourceManager[14475]: 2008/08/14_08:31:00 info: Releasing resource group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem ResourceManager[14475]: 2008/08/14_08:31:00 info: Running /etc/ha.d/resource.d/xen ticketsystem stop ResourceManager[14475]: 2008/08/14_08:31:00 debug: Starting /etc/ha.d/resource.d/xen ticketsystem stop Error: Domain 'ticketsystem' does not exist. Usage: xm shutdown <Domain> [-waRH] Shutdown a domain. ResourceManager[14475]: 2008/08/14_08:31:00 debug: /etc/ha.d/resource.d/xen ticketsystem stop done. RC=0 ResourceManager[14475]: 2008/08/14_08:31:00 info: Running /etc/ha.d/resource.d/drbddisk ticketsystem stop ResourceManager[14475]: 2008/08/14_08:31:00 debug: Starting /etc/ha.d/resource.d/drbddisk ticketsystem stop ResourceManager[14475]: 2008/08/14_08:31:00 debug: /etc/ha.d/resource.d/drbddisk ticketsystem stop done. RC=0 heartbeat[14462]: 2008/08/14_08:31:00 info: All HA resources relinquished. heartbeat[4593]: 2008/08/14_08:31:01 WARN: 1 lost packet(s) for [devmaster-1.blabla.com] [67382:67384] heartbeat[4593]: 2008/08/14_08:31:01 info: No pkts missing from devmaster-1.blabla.com! heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBWRITE process 4598 with signal 15 heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBREAD process 4599 with signal 15 heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBFIFO process 4595 with signal 15 heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBWRITE process 4596 with signal 15 heartbeat[4593]: 2008/08/14_08:31:02 info: killing HBREAD process 4597 with signal 15 heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4597 exited. 5 remaining heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4595 exited. 4 remaining heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4596 exited. 3 remaining heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4598 exited. 2 remaining heartbeat[4593]: 2008/08/14_08:31:02 info: Core process 4599 exited. 1 remaining heartbeat[4593]: 2008/08/14_08:31:02 info: devmaster-2.blabla.com Heartbeat shutdown complete. heartbeat[14601]: 2008/08/14_08:31:23 info: Version 2 support: false heartbeat[14601]: 2008/08/14_08:31:23 WARN: Logging daemon is disabled --enabling logging daemon is recommended heartbeat[14601]: 2008/08/14_08:31:23 info: ************************** heartbeat[14601]: 2008/08/14_08:31:23 info: Configuration validated. Starting heartbeat 2.1.3 heartbeat[14602]: 2008/08/14_08:31:23 info: heartbeat: version 2.1.3 heartbeat[14602]: 2008/08/14_08:31:23 info: Heartbeat generation: 1217252802 heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0 heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: bound send socket to device: eth0 heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: bound receive socket to device: eth0 heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: started on port 694 interface eth0 to 192.168.60.101 heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0 heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: bound send socket to device: eth0 heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: bound receive socket to device: eth0 heartbeat[14602]: 2008/08/14_08:31:23 info: glib: ucast: started on port 694 interface eth0 to 192.168.60.102 heartbeat[14602]: 2008/08/14_08:31:23 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[14602]: 2008/08/14_08:31:23 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[14602]: 2008/08/14_08:31:23 notice: Using watchdog device: /dev/watchdog heartbeat[14602]: 2008/08/14_08:31:23 info: G_main_add_SignalHandler: Added signal handler for signal 17 heartbeat[14602]: 2008/08/14_08:31:23 info: Local status now set to: 'up' heartbeat[14602]: 2008/08/14_08:31:24 info: Link devmaster-1.blabla.com:eth0 up. heartbeat[14602]: 2008/08/14_08:31:24 info: Status update for node devmaster-1.blabla.com: status active heartbeat[14610]: 2008/08/14_08:31:24 debug: notify_world: setting SIGCHLD Handler to SIG_DFL harc[14610]: 2008/08/14_08:31:24 info: Running /etc/ha.d/rc.d/status status heartbeat[14602]: 2008/08/14_08:32:01 info: Received shutdown notice from 'devmaster-1.blabla.com'. heartbeat[14602]: 2008/08/14_08:32:01 info: Resources being acquired from devmaster-1.blabla.com. heartbeat[14602]: 2008/08/14_08:32:01 debug: StartNextRemoteRscReq(): child count 1 heartbeat[14632]: 2008/08/14_08:32:01 info: acquire all HA resources (standby). heartbeat[14633]: 2008/08/14_08:32:01 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys devmaster-2.blabla.com] to acquire. heartbeat[14602]: 2008/08/14_08:32:01 debug: StartNextRemoteRscReq(): child count 1 ResourceManager[14658]: 2008/08/14_08:32:01 info: Acquiring resource group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem ResourceManager[14658]: 2008/08/14_08:32:01 info: Running /etc/ha.d/resource.d/drbddisk ticketsystem start ResourceManager[14658]: 2008/08/14_08:32:01 debug: Starting /etc/ha.d/resource.d/drbddisk ticketsystem start ResourceManager[14658]: 2008/08/14_08:32:01 debug: /etc/ha.d/resource.d/drbddisk ticketsystem start done. RC=0 ResourceManager[14658]: 2008/08/14_08:32:01 info: Running /etc/ha.d/resource.d/xen ticketsystem start ResourceManager[14658]: 2008/08/14_08:32:01 debug: Starting /etc/ha.d/resource.d/xen ticketsystem start Using config file "/etc/xen/ticketsystem". Started domain ticketsystem ResourceManager[14658]: 2008/08/14_08:32:02 debug: /etc/ha.d/resource.d/xen ticketsystem start done. RC=0 heartbeat[14632]: 2008/08/14_08:32:02 info: all HA resource acquisition completed (standby). heartbeat[14963]: 2008/08/14_08:32:02 debug: notify_world: setting SIGCHLD Handler to SIG_DFL harc[14963]: 2008/08/14_08:32:02 info: Running /etc/ha.d/rc.d/status status mach_down[14979]: 2008/08/14_08:32:02 info: Taking over resource group drbddisk::ticketsystem ResourceManager[15005]: 2008/08/14_08:32:02 info: Acquiring resource group: devmaster-1.blabla.com drbddisk::ticketsystem xen::ticketsystem mach_down[14979]: 2008/08/14_08:32:02 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired mach_down[14979]: 2008/08/14_08:32:02 info: mach_down takeover complete for node devmaster-1.blabla.com. heartbeat[14602]: 2008/08/14_08:32:12 WARN: node devmaster-1.blabla.com: is dead heartbeat[14602]: 2008/08/14_08:32:12 info: Cancelling pending standby operation heartbeat[14602]: 2008/08/14_08:32:12 info: Dead node devmaster-1.blabla.com gave up resources. heartbeat[14602]: 2008/08/14_08:32:12 info: Link devmaster-1.blabla.com:eth0 dead. heartbeat[14602]: 2008/08/14_08:32:12 info: Comm_now_up(): updating status to active heartbeat[14602]: 2008/08/14_08:32:12 info: Local status now set to: 'active' thx for your help Rupertt