Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I attached the logs for each. node2 is svr92_syslog. Failure of node1 shows up first at 13:22:07 when the serial shows dead. Looks like I'm getting a drbd0 error at 13:22:13 ----------------------------------------------------------------- Dec 4 13:22:13 svr92 kernel: drbd0: outdate-peer helper broken, returned 255 ----------------------------------------------------------------- and then a bunch of refused to become primary stuff. Anyway, have a look and tell me what you think. It's greatly appreciated. Rois On Tue, 2007-12-04 at 17:27 +0100, Florian Haas wrote: > Rois, > > can you provide a syslog snippet to include any drbd/dopd message from node2, > around the time you pulled the plug on node1? > > Thanks. > > Cheers, > Florian > > On Tuesday 04 December 2007 01:55:03 Rois Cannon wrote: > > Florian, > > I'm sure I'm just missing something. Probably a timing thing. I added > > the lines to drbd.conf and ha.cf per the instructions on your > > blog (see below for for full file.) Brought up the system and made sure > > it was correctly primary on node1 and secondary on node2. On node1, if > > I do a "halt" on the machine or restart heartbeat it correctly brings up > > node2 as primary. If I pull the plug on node1, then node2 is being set > > to outdated so heartbeat can't bring it up. Can you tell me what I'm > > missing? Just FYI (in case it makes a difference) I'm running this in 2 > > VMServer's as a test bed. > > > > [...] > -------------- next part -------------- Dec 4 13:20:30 svr91 kernel: drbd: initialised. Version: 8.0.6 (api:86/proto:86) Dec 4 13:20:30 svr91 kernel: drbd: SVN Revision: 3048 build by phil at mescal, 2007-09-03 10:39:27 Dec 4 13:20:30 svr91 kernel: drbd: registered as block device major 147 Dec 4 13:20:30 svr91 kernel: drbd: minor_table @ 0xc6b5a740 Dec 4 13:20:30 svr91 kernel: drbd0: disk( Diskless -> Attaching ) Dec 4 13:20:30 svr91 kernel: klogd 1.4.2, ---------- state change ---------- Dec 4 13:20:30 svr91 kernel: drbd0: Found 6 transactions (19 active extents) in activity log. Dec 4 13:20:30 svr91 kernel: drbd0: max_segment_size ( = BIO size ) = 32768 Dec 4 13:20:30 svr91 kernel: drbd0: drbd_bm_resize called with capacity == 262064 Dec 4 13:20:30 svr91 kernel: drbd0: resync bitmap: bits=32758 words=1024 Dec 4 13:20:30 svr91 kernel: drbd0: size = 127 MB (131032 KB) Dec 4 13:20:30 svr91 kernel: drbd0: reading of bitmap took 0 jiffies Dec 4 13:20:30 svr91 kernel: drbd0: recounting of set bits took additional 0 jiffies Dec 4 13:20:30 svr91 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map. Dec 4 13:20:30 svr91 kernel: drbd0: disk( Attaching -> Consistent ) Dec 4 13:20:30 svr91 kernel: drbd0: Writing meta data super block now. Dec 4 13:20:30 svr91 kernel: drbd0: conn( StandAlone -> Unconnected ) Dec 4 13:20:30 svr91 kernel: drbd0: receiver (re)started Dec 4 13:20:30 svr91 kernel: drbd0: conn( Unconnected -> WFConnection ) Dec 4 13:20:30 svr91 kernel: drbd0: conn( WFConnection -> WFReportParams ) Dec 4 13:20:30 svr91 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86 Dec 4 13:20:30 svr91 kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC Dec 4 13:20:30 svr91 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate ) Dec 4 13:20:30 svr91 kernel: drbd0: Writing meta data super block now. Dec 4 13:20:37 svr91 logd: [5943]: info: logd started with default configuration. Dec 4 13:20:37 svr91 logd: [5944]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Dec 4 13:20:37 svr91 logd: [5943]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Dec 4 13:20:37 svr91 heartbeat: [6018]: WARN: Logging daemon is disabled --enabling logging daemon is recommended Dec 4 13:20:37 svr91 heartbeat: [6018]: info: ************************** Dec 4 13:20:37 svr91 heartbeat: [6018]: info: Configuration validated. Starting heartbeat 2.0.8 Dec 4 13:20:37 svr91 heartbeat: [6019]: info: heartbeat: version 2.0.8 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: Heartbeat generation: 198 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: G_main_add_TriggerHandler: Added signal manual handler Dec 4 13:20:38 svr91 heartbeat: [6019]: info: G_main_add_TriggerHandler: Added signal manual handler Dec 4 13:20:38 svr91 heartbeat: [6019]: info: Removing /var/run/heartbeat/rsctmp failed, recreating. Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: Starting serial heartbeat on tty /dev/ttyS0 (115200 baud) Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: bound send socket to device: eth0 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: bound receive socket to device: eth0 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: started on port 694 interface eth0 to 192.168.151.91 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: bound send socket to device: eth1 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: bound receive socket to device: eth1 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: started on port 694 interface eth1 to 192.168.1.91 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ping heartbeat started. Dec 4 13:20:38 svr91 heartbeat: [6019]: info: glib: ping heartbeat started. Dec 4 13:20:38 svr91 heartbeat: [6019]: info: G_main_add_SignalHandler: Added signal handler for signal 17 Dec 4 13:20:38 svr91 heartbeat: [6019]: info: Local status now set to: 'up' Dec 4 13:20:39 svr91 heartbeat: [6019]: info: Link svr92:eth1 up. Dec 4 13:20:39 svr91 heartbeat: [6019]: info: Status update for node svr92: status up Dec 4 13:20:39 svr91 heartbeat: [6035]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Dec 4 13:20:39 svr91 harc[6035]: info: Running /etc/ha.d/rc.d/status status Dec 4 13:20:39 svr91 heartbeat: [6019]: info: Link svr92:/dev/ttyS0 up. Dec 4 13:20:39 svr91 heartbeat: [6019]: info: Link 192.168.151.1:192.168.151.1 up. Dec 4 13:20:39 svr91 heartbeat: [6019]: info: Status update for node 192.168.151.1: status ping Dec 4 13:20:39 svr91 heartbeat: [6019]: info: Status update for node 192.168.1.3: status ping Dec 4 13:20:39 svr91 heartbeat: [6019]: info: Link 192.168.1.3:192.168.1.3 up. Dec 4 13:20:39 svr91 heartbeat: [6019]: debug: get_delnodelist: delnodelist= Dec 4 13:20:39 svr91 heartbeat: [6019]: info: Link svr92:eth0 up. Dec 4 13:20:40 svr91 heartbeat: [6019]: info: Comm_now_up(): updating status to active Dec 4 13:20:40 svr91 heartbeat: [6019]: info: Local status now set to: 'active' Dec 4 13:20:40 svr91 heartbeat: [6019]: info: Starting child client "/usr/lib/heartbeat/ipfail" (501,501) Dec 4 13:20:40 svr91 heartbeat: [6019]: info: Starting child client "/usr/lib/heartbeat/dopd " (501,501) Dec 4 13:20:40 svr91 heartbeat: [6019]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 680 ms (> 50 ms) (GSource: 0x8108290) Dec 4 13:20:40 svr91 heartbeat: [6019]: WARN: G_CH_dispatch_int: Dispatch function for read child was delayed 680 ms (> 500 ms) before being called (GSource: 0x81083c0) Dec 4 13:20:40 svr91 heartbeat: [6019]: info: G_CH_dispatch_int: started at 103955 should have started at 103887 Dec 4 13:20:40 svr91 heartbeat: [6019]: WARN: G_CH_dispatch_int: Dispatch function for read child was delayed 680 ms (> 500 ms) before being called (GSource: 0x81084f0) Dec 4 13:20:40 svr91 heartbeat: [6019]: info: G_CH_dispatch_int: started at 103955 should have started at 103887 Dec 4 13:20:40 svr91 heartbeat: [6046]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 501 gid 501 (pid 6046) Dec 4 13:20:40 svr91 ipfail: [6046]: debug: PID=6046 Dec 4 13:20:40 svr91 heartbeat: [6047]: info: Starting "/usr/lib/heartbeat/dopd " as uid 501 gid 501 (pid 6047) Dec 4 13:20:40 svr91 ipfail: [6046]: debug: Signing in with heartbeat Dec 4 13:20:41 svr91 heartbeat: [6019]: info: Status update for node svr92: status active Dec 4 13:20:41 svr91 heartbeat: [6048]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Dec 4 13:20:41 svr91 ipfail: [6046]: debug: [We are svr91] Dec 4 13:20:41 svr91 harc[6048]: info: Running /etc/ha.d/rc.d/status status Dec 4 13:20:41 svr91 ipfail: [6046]: debug: auto_failback -> 0 (off) Dec 4 13:20:41 svr91 ipfail: [6046]: debug: Setting message filter mode Dec 4 13:20:42 svr91 ipfail: [6046]: debug: Starting node walk Dec 4 13:20:43 svr91 ipfail: [6046]: debug: Cluster node: 192.168.1.3: status: ping Dec 4 13:20:43 svr91 ipfail: [6046]: debug: Cluster node: 192.168.151.1: status: ping Dec 4 13:20:44 svr91 ipfail: [6046]: debug: Cluster node: svr92: status: active Dec 4 13:20:45 svr91 ipfail: [6046]: debug: [They are svr92] Dec 4 13:20:45 svr91 ipfail: [6046]: debug: Cluster node: svr91: status: active Dec 4 13:20:46 svr91 ipfail: [6046]: debug: Setting message signal Dec 4 13:20:46 svr91 ipfail: [6046]: debug: Waiting for messages... Dec 4 13:20:47 svr91 ipfail: [6046]: info: Status update: Node svr92 now has status active Dec 4 13:20:48 svr91 ipfail: [6046]: debug: Other side is unstable. Dec 4 13:20:48 svr91 ipfail: [6046]: debug: Got join message from another ipfail client. (svr92) Dec 4 13:20:49 svr91 ipfail: [6046]: debug: Found ping node 192.168.1.3! Dec 4 13:20:50 svr91 ipfail: [6046]: debug: Found ping node 192.168.151.1! Dec 4 13:20:51 svr91 heartbeat: [6019]: info: remote resource transition completed. Dec 4 13:20:51 svr91 heartbeat: [6019]: info: remote resource transition completed. Dec 4 13:20:51 svr91 heartbeat: [6019]: info: Initial resource acquisition complete (T_RESOURCES(us)) Dec 4 13:20:51 svr91 ipfail: [6046]: info: Asking other side for ping node count. Dec 4 13:20:51 svr91 ipfail: [6046]: debug: Message [num_ping] sent. Dec 4 13:20:51 svr91 ipfail: [6046]: debug: Other side is now stable. Dec 4 13:20:51 svr91 IPaddr[6085]: INFO: Resource is stopped Dec 4 13:20:51 svr91 heartbeat: [6058]: info: Local Resource acquisition completed. Dec 4 13:20:51 svr91 heartbeat: [6019]: debug: StartNextRemoteRscReq(): child count 1 Dec 4 13:20:51 svr91 heartbeat: [6122]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Dec 4 13:20:51 svr91 harc[6122]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp Dec 4 13:20:51 svr91 ip-request-resp[6122]: received ip-request-resp IPaddr::192.168.151.90/24/eth0 OK yes Dec 4 13:20:51 svr91 ResourceManager[6137]: info: Acquiring resource group: svr91 IPaddr::192.168.151.90/24/eth0 drbddisk::home Filesystem::/dev/drbd0::/home::xfs Dec 4 13:20:51 svr91 IPaddr[6161]: INFO: Resource is stopped Dec 4 13:20:51 svr91 ResourceManager[6137]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start Dec 4 13:20:51 svr91 ResourceManager[6137]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start Dec 4 13:20:51 svr91 IPaddr[6237]: INFO: Using calculated netmask for 192.168.151.90: 255.255.255.0 Dec 4 13:20:51 svr91 IPaddr[6237]: DEBUG: Using calculated broadcast for 192.168.151.90: 192.168.151.255 Dec 4 13:20:51 svr91 IPaddr[6237]: INFO: eval /sbin/ifconfig eth0:0 192.168.151.90 netmask 255.255.255.0 broadcast 192.168.151.255 Dec 4 13:20:51 svr91 IPaddr[6237]: DEBUG: Sending Gratuitous Arp for 192.168.151.90 on eth0:0 [eth0] Dec 4 13:20:51 svr91 IPaddr[6216]: INFO: Success Dec 4 13:20:51 svr91 ResourceManager[6137]: debug: /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start done. RC=0 Dec 4 13:20:51 svr91 ResourceManager[6137]: info: Running /etc/ha.d/resource.d/drbddisk home start Dec 4 13:20:51 svr91 ResourceManager[6137]: debug: Starting /etc/ha.d/resource.d/drbddisk home start Dec 4 13:20:51 svr91 kernel: drbd0: role( Secondary -> Primary ) Dec 4 13:20:51 svr91 kernel: drbd0: Writing meta data super block now. Dec 4 13:20:52 svr91 ipfail: [6046]: debug: Other side is now stable. Dec 4 13:20:52 svr91 ResourceManager[6137]: debug: /etc/ha.d/resource.d/drbddisk home start done. RC=0 Dec 4 13:20:52 svr91 Filesystem[6367]: INFO: Resource is stopped Dec 4 13:20:52 svr91 ResourceManager[6137]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs start Dec 4 13:20:52 svr91 ResourceManager[6137]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs start Dec 4 13:20:52 svr91 Filesystem[6408]: INFO: Running start for /dev/drbd0 on /home Dec 4 13:20:53 svr91 kernel: Filesystem "drbd0": Disabling barriers, not supported by the underlying device Dec 4 13:20:53 svr91 kernel: XFS mounting filesystem drbd0 Dec 4 13:20:53 svr91 kernel: Ending clean XFS mount for filesystem: drbd0 Dec 4 13:20:53 svr91 Filesystem[6405]: INFO: Success Dec 4 13:20:53 svr91 ResourceManager[6137]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs start done. RC=0 Dec 4 13:20:54 svr91 ipfail: [6046]: info: No giveup timer to abort. Dec 4 13:21:41 svr91 ntpd[3621]: synchronized to LOCAL(0), stratum 10 -------------- next part -------------- Dec 4 13:20:26 svr92 kernel: drbd: initialised. Version: 8.0.6 (api:86/proto:86) Dec 4 13:20:26 svr92 kernel: drbd: SVN Revision: 3048 build by phil at mescal, 2007-09-03 10:39:27 Dec 4 13:20:26 svr92 kernel: drbd: registered as block device major 147 Dec 4 13:20:26 svr92 kernel: drbd: minor_table @ 0xc68fc440 Dec 4 13:20:26 svr92 kernel: drbd0: disk( Diskless -> Attaching ) Dec 4 13:20:26 svr92 kernel: klogd 1.4.2, ---------- state change ---------- Dec 4 13:20:26 svr92 kernel: drbd0: Found 2 transactions (2 active extents) in activity log. Dec 4 13:20:26 svr92 kernel: drbd0: max_segment_size ( = BIO size ) = 32768 Dec 4 13:20:26 svr92 kernel: drbd0: drbd_bm_resize called with capacity == 262064 Dec 4 13:20:26 svr92 kernel: drbd0: resync bitmap: bits=32758 words=1024 Dec 4 13:20:26 svr92 kernel: drbd0: size = 127 MB (131032 KB) Dec 4 13:20:26 svr92 kernel: drbd0: reading of bitmap took 0 jiffies Dec 4 13:20:26 svr92 kernel: drbd0: recounting of set bits took additional 0 jiffies Dec 4 13:20:26 svr92 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map. Dec 4 13:20:26 svr92 kernel: drbd0: disk( Attaching -> Consistent ) Dec 4 13:20:26 svr92 kernel: drbd0: Writing meta data super block now. Dec 4 13:20:26 svr92 kernel: drbd0: conn( StandAlone -> Unconnected ) Dec 4 13:20:26 svr92 kernel: drbd0: receiver (re)started Dec 4 13:20:26 svr92 kernel: drbd0: conn( Unconnected -> WFConnection ) Dec 4 13:20:30 svr92 kernel: drbd0: conn( WFConnection -> WFReportParams ) Dec 4 13:20:30 svr92 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86 Dec 4 13:20:30 svr92 kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC Dec 4 13:20:30 svr92 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate ) Dec 4 13:20:30 svr92 kernel: drbd0: Writing meta data super block now. Dec 4 13:20:34 svr92 logd: [4846]: info: logd started with default configuration. Dec 4 13:20:34 svr92 logd: [4847]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Dec 4 13:20:34 svr92 logd: [4846]: info: G_main_add_SignalHandler: Added signal handler for signal 15 Dec 4 13:20:34 svr92 heartbeat: [4921]: WARN: Logging daemon is disabled --enabling logging daemon is recommended Dec 4 13:20:34 svr92 heartbeat: [4921]: info: ************************** Dec 4 13:20:34 svr92 heartbeat: [4921]: info: Configuration validated. Starting heartbeat 2.0.8 Dec 4 13:20:34 svr92 heartbeat: [4922]: info: heartbeat: version 2.0.8 Dec 4 13:20:34 svr92 heartbeat: [4922]: info: Heartbeat generation: 115 Dec 4 13:20:34 svr92 heartbeat: [4922]: info: G_main_add_TriggerHandler: Added signal manual handler Dec 4 13:20:34 svr92 heartbeat: [4922]: info: G_main_add_TriggerHandler: Added signal manual handler Dec 4 13:20:34 svr92 heartbeat: [4922]: info: Removing /var/run/heartbeat/rsctmp failed, recreating. Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: Starting serial heartbeat on tty /dev/ttyS0 (115200 baud) Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: bound send socket to device: eth0 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: bound receive socket to device: eth0 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: started on port 694 interface eth0 to 192.168.151.91 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: bound send socket to device: eth1 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: bound receive socket to device: eth1 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: started on port 694 interface eth1 to 192.168.1.91 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ping heartbeat started. Dec 4 13:20:35 svr92 heartbeat: [4922]: info: glib: ping heartbeat started. Dec 4 13:20:35 svr92 heartbeat: [4922]: info: G_main_add_SignalHandler: Added signal handler for signal 17 Dec 4 13:20:35 svr92 heartbeat: [4922]: info: Local status now set to: 'up' Dec 4 13:20:36 svr92 heartbeat: [4922]: info: Status update for node 192.168.1.3: status ping Dec 4 13:20:36 svr92 heartbeat: [4922]: info: Link 192.168.1.3:192.168.1.3 up. Dec 4 13:20:36 svr92 heartbeat: [4922]: info: Link 192.168.151.1:192.168.151.1 up. Dec 4 13:20:36 svr92 heartbeat: [4922]: info: Status update for node 192.168.151.1: status ping Dec 4 13:20:39 svr92 heartbeat: [4922]: info: Link svr91:/dev/ttyS0 up. Dec 4 13:20:39 svr92 heartbeat: [4922]: info: Status update for node svr91: status up Dec 4 13:20:39 svr92 heartbeat: [4938]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Dec 4 13:20:39 svr92 harc[4938]: info: Running /etc/ha.d/rc.d/status status Dec 4 13:20:39 svr92 heartbeat: [4922]: debug: get_delnodelist: delnodelist= Dec 4 13:20:40 svr92 heartbeat: [4922]: info: Comm_now_up(): updating status to active Dec 4 13:20:40 svr92 heartbeat: [4922]: info: Local status now set to: 'active' Dec 4 13:20:40 svr92 heartbeat: [4922]: info: Starting child client "/usr/lib/heartbeat/ipfail" (501,501) Dec 4 13:20:40 svr92 heartbeat: [4922]: info: Starting child client "/usr/lib/heartbeat/dopd " (501,501) Dec 4 13:20:40 svr92 heartbeat: [4922]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 610 ms (> 50 ms) (GSource: 0x8108178) Dec 4 13:20:40 svr92 heartbeat: [4922]: info: Status update for node svr91: status active Dec 4 13:20:40 svr92 heartbeat: [4922]: WARN: Gmain_timeout_dispatch: Dispatch function for hb_pop_deadtime was delayed 610 ms (> 500 ms) before being called (GSource: 0x8106e98) Dec 4 13:20:40 svr92 heartbeat: [4922]: info: Gmain_timeout_dispatch: started at 103262 should have started at 103201 Dec 4 13:20:40 svr92 heartbeat: [4949]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 501 gid 501 (pid 4949) Dec 4 13:20:40 svr92 heartbeat: [4950]: info: Starting "/usr/lib/heartbeat/dopd " as uid 501 gid 501 (pid 4950) Dec 4 13:20:40 svr92 heartbeat: [4951]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Dec 4 13:20:40 svr92 ipfail: [4949]: debug: PID=4949 Dec 4 13:20:40 svr92 ipfail: [4949]: debug: Signing in with heartbeat Dec 4 13:20:40 svr92 ipfail: [4949]: debug: [We are svr92] Dec 4 13:20:40 svr92 harc[4951]: info: Running /etc/ha.d/rc.d/status status Dec 4 13:20:40 svr92 ipfail: [4949]: debug: auto_failback -> 0 (off) Dec 4 13:20:41 svr92 ipfail: [4949]: debug: Setting message filter mode Dec 4 13:20:41 svr92 ipfail: [4949]: debug: Starting node walk Dec 4 13:20:42 svr92 ipfail: [4949]: debug: Cluster node: 192.168.1.3: status: ping Dec 4 13:20:43 svr92 ipfail: [4949]: debug: Cluster node: 192.168.151.1: status: ping Dec 4 13:20:43 svr92 ipfail: [4949]: debug: Cluster node: svr92: status: active Dec 4 13:20:44 svr92 ipfail: [4949]: debug: Cluster node: svr91: status: active Dec 4 13:20:44 svr92 ipfail: [4949]: debug: [They are svr91] Dec 4 13:20:44 svr92 ipfail: [4949]: debug: Setting message signal Dec 4 13:20:45 svr92 ipfail: [4949]: debug: Waiting for messages... Dec 4 13:20:46 svr92 ipfail: [4949]: debug: Other side is unstable. Dec 4 13:20:51 svr92 heartbeat: [4922]: info: local resource transition completed. Dec 4 13:20:51 svr92 heartbeat: [4922]: info: Initial resource acquisition complete (T_RESOURCES(us)) Dec 4 13:20:51 svr92 heartbeat: [4961]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys svr92] to acquire. Dec 4 13:20:51 svr92 ipfail: [4949]: debug: Other side is unstable. Dec 4 13:20:51 svr92 heartbeat: [4922]: info: remote resource transition completed. Dec 4 13:20:51 svr92 ipfail: [4949]: debug: Other side is now stable. Dec 4 13:20:51 svr92 ipfail: [4949]: debug: Got asked for num_ping. Dec 4 13:20:51 svr92 kernel: drbd0: peer( Secondary -> Primary ) Dec 4 13:20:52 svr92 ipfail: [4949]: debug: Found ping node 192.168.1.3! Dec 4 13:20:53 svr92 ipfail: [4949]: debug: Found ping node 192.168.151.1! Dec 4 13:20:53 svr92 ipfail: [4949]: info: Ping node count is balanced. Dec 4 13:20:53 svr92 ipfail: [4949]: debug: Abort message sent. Dec 4 13:22:07 svr92 heartbeat: [4922]: info: Link svr91:/dev/ttyS0 dead. Dec 4 13:22:07 svr92 ipfail: [4949]: info: Link Status update: Link svr91//dev/ttyS0 now has status dead Dec 4 13:22:07 svr92 ipfail: [4949]: debug: Found ping node 192.168.1.3! Dec 4 13:22:07 svr92 kernel: drbd0: PingAck did not arrive in time. Dec 4 13:22:07 svr92 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) Dec 4 13:22:07 svr92 kernel: drbd0: asender terminated Dec 4 13:22:07 svr92 kernel: drbd0: short read expecting header on sock: r=-512 Dec 4 13:22:07 svr92 kernel: drbd0: tl_clear() Dec 4 13:22:07 svr92 kernel: drbd0: Connection closed Dec 4 13:22:07 svr92 kernel: drbd0: Writing meta data super block now. Dec 4 13:22:07 svr92 kernel: drbd0: conn( NetworkFailure -> Unconnected ) Dec 4 13:22:07 svr92 kernel: drbd0: receiver terminated Dec 4 13:22:07 svr92 kernel: drbd0: receiver (re)started Dec 4 13:22:07 svr92 kernel: drbd0: conn( Unconnected -> WFConnection ) Dec 4 13:22:08 svr92 ipfail: [4949]: debug: Found ping node 192.168.151.1! Dec 4 13:22:09 svr92 ipfail: [4949]: info: Asking other side for ping node count. Dec 4 13:22:09 svr92 ipfail: [4949]: debug: Message [num_ping] sent. Dec 4 13:22:09 svr92 ipfail: [4949]: info: Checking remote count of ping nodes. Dec 4 13:22:13 svr92 heartbeat: [4922]: WARN: node svr91: is dead Dec 4 13:22:13 svr92 heartbeat: [4922]: WARN: No STONITH device configured. Dec 4 13:22:13 svr92 heartbeat: [4922]: WARN: Shared disks are not protected. Dec 4 13:22:13 svr92 heartbeat: [4922]: info: Resources being acquired from svr91. Dec 4 13:22:13 svr92 ipfail: [4949]: info: Status update: Node svr91 now has status dead Dec 4 13:22:13 svr92 heartbeat: [5010]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Dec 4 13:22:13 svr92 harc[5010]: info: Running /etc/ha.d/rc.d/status status Dec 4 13:22:13 svr92 heartbeat: [5011]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys svr92] to acquire. Dec 4 13:22:13 svr92 heartbeat: [4922]: debug: StartNextRemoteRscReq(): child count 1 Dec 4 13:22:13 svr92 ipfail: [4949]: debug: Found ping node 192.168.1.3! Dec 4 13:22:13 svr92 mach_down[5030]: info: Taking over resource group IPaddr::192.168.151.90/24/eth0 Dec 4 13:22:13 svr92 ResourceManager[5050]: info: Acquiring resource group: svr91 IPaddr::192.168.151.90/24/eth0 drbddisk::home Filesystem::/dev/drbd0::/home::xfs Dec 4 13:22:13 svr92 IPaddr[5074]: INFO: Resource is stopped Dec 4 13:22:13 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start Dec 4 13:22:13 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start Dec 4 13:22:13 svr92 IPaddr[5150]: INFO: Using calculated netmask for 192.168.151.90: 255.255.255.0 Dec 4 13:22:13 svr92 IPaddr[5150]: DEBUG: Using calculated broadcast for 192.168.151.90: 192.168.151.255 Dec 4 13:22:13 svr92 IPaddr[5150]: INFO: eval /sbin/ifconfig eth0:0 192.168.151.90 netmask 255.255.255.0 broadcast 192.168.151.255 Dec 4 13:22:13 svr92 IPaddr[5150]: DEBUG: Sending Gratuitous Arp for 192.168.151.90 on eth0:0 [eth0] Dec 4 13:22:13 svr92 IPaddr[5129]: INFO: Success Dec 4 13:22:13 svr92 ipfail: [4949]: debug: Found ping node 192.168.151.1! Dec 4 13:22:13 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start done. RC=0 Dec 4 13:22:13 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/drbddisk home start Dec 4 13:22:13 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/drbddisk home start Dec 4 13:22:13 svr92 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer Dec 4 13:22:13 svr92 kernel: drbd0: disk( UpToDate -> Outdated ) Dec 4 13:22:13 svr92 kernel: drbd0: outdate-peer helper broken, returned 255 Dec 4 13:22:13 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk Dec 4 13:22:13 svr92 kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:13 svr92 kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:14 svr92 ipfail: [4949]: info: NS: We are still alive! Dec 4 13:22:14 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk Dec 4 13:22:14 svr92 kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:14 svr92 kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:15 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk Dec 4 13:22:15 svr92 kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:15 svr92 kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:16 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk Dec 4 13:22:16 svr92 kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:16 svr92 kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:17 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk Dec 4 13:22:17 svr92 kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:17 svr92 kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:18 svr92 kernel: drbd0: BUG! md_sync_timer expired! Worker calls drbd_md_sync(). Dec 4 13:22:18 svr92 kernel: drbd0: Writing meta data super block now. Dec 4 13:22:18 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk Dec 4 13:22:18 svr92 kernel: drbd0: state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:18 svr92 kernel: drbd0: wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- } Dec 4 13:22:18 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/drbddisk home start done. RC=20 Dec 4 13:22:18 svr92 ResourceManager[5050]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk Dec 4 13:22:18 svr92 ResourceManager[5050]: CRIT: Giving up resources due to failure of drbddisk::home Dec 4 13:22:18 svr92 ResourceManager[5050]: info: Releasing resource group: svr91 IPaddr::192.168.151.90/24/eth0 drbddisk::home Filesystem::/dev/drbd0::/home::xfs Dec 4 13:22:18 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs stop Dec 4 13:22:18 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs stop Dec 4 13:22:19 svr92 Filesystem[5329]: INFO: Running stop for /dev/drbd0 on /home Dec 4 13:22:19 svr92 Filesystem[5326]: INFO: Success Dec 4 13:22:19 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs stop done. RC=0 Dec 4 13:22:19 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/drbddisk home stop Dec 4 13:22:19 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/drbddisk home stop Dec 4 13:22:19 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/drbddisk home stop done. RC=0 Dec 4 13:22:19 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 stop Dec 4 13:22:19 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 stop Dec 4 13:22:19 svr92 IPaddr[5420]: INFO: /sbin/ifconfig eth0:0 192.168.151.90 down Dec 4 13:22:19 svr92 IPaddr[5399]: INFO: Success Dec 4 13:22:19 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 stop done. RC=0 Dec 4 13:22:19 svr92 mach_down[5030]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired Dec 4 13:22:19 svr92 mach_down[5030]: info: mach_down takeover complete for node svr91. Dec 4 13:22:19 svr92 heartbeat: [4922]: info: mach_down takeover complete. Dec 4 13:22:49 svr92 hb_standby[5480]: Going standby [foreign]. Dec 4 13:22:49 svr92 heartbeat: [4922]: info: svr92 wants to go standby [foreign] Dec 4 13:23:01 svr92 heartbeat: [4922]: WARN: No reply to standby request. Standby request cancelled. Dec 4 13:23:13 svr92 ntpd[3734]: time reset -0.238043 s