[Fwd: [DRBD-user] heartbeat and drbd / Failover / Failback]

Rois Cannon rois at cobiz.com
Tue Dec 4 23:05:10 CET 2007

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I attached the logs for each.  node2 is svr92_syslog.  Failure of node1
shows up first at 13:22:07 when the serial shows dead.

Looks like I'm getting a drbd0 error at 13:22:13
-----------------------------------------------------------------
Dec  4 13:22:13 svr92 kernel: drbd0: outdate-peer helper broken,
returned 255
-----------------------------------------------------------------
and then a bunch of refused to become primary stuff.

Anyway, have a look and tell me what you think.

It's greatly appreciated.
Rois


On Tue, 2007-12-04 at 17:27 +0100, Florian Haas wrote:
> Rois,
> 
> can you provide a syslog snippet to include any drbd/dopd message from node2, 
> around the time you pulled the plug on node1?
> 
> Thanks.
> 
> Cheers,
> Florian
> 
> On Tuesday 04 December 2007 01:55:03 Rois Cannon wrote:
> > Florian,
> > I'm sure I'm just missing something.  Probably a timing thing.  I added
> > the lines to drbd.conf and ha.cf per the instructions on your
> > blog (see below for for full file.)  Brought up the system and made sure
> > it was correctly primary on node1 and secondary on node2.  On node1, if
> > I do a "halt" on the machine or restart heartbeat it correctly brings up
> > node2 as primary.  If I pull the plug on node1, then node2 is being set
> > to outdated so heartbeat can't bring it up.  Can you tell me what I'm
> > missing?  Just FYI (in case it makes a difference) I'm running this in 2
> > VMServer's as a test bed.
> >
> > [...]
> 
-------------- next part --------------
Dec  4 13:20:30 svr91 kernel: drbd: initialised. Version: 8.0.6 (api:86/proto:86)
Dec  4 13:20:30 svr91 kernel: drbd: SVN Revision: 3048 build by phil at mescal, 2007-09-03 10:39:27
Dec  4 13:20:30 svr91 kernel: drbd: registered as block device major 147
Dec  4 13:20:30 svr91 kernel: drbd: minor_table @ 0xc6b5a740
Dec  4 13:20:30 svr91 kernel: drbd0: disk( Diskless -> Attaching ) 
Dec  4 13:20:30 svr91 kernel: klogd 1.4.2, ---------- state change ---------- 
Dec  4 13:20:30 svr91 kernel: drbd0: Found 6 transactions (19 active extents) in activity log.
Dec  4 13:20:30 svr91 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
Dec  4 13:20:30 svr91 kernel: drbd0: drbd_bm_resize called with capacity == 262064
Dec  4 13:20:30 svr91 kernel: drbd0: resync bitmap: bits=32758 words=1024
Dec  4 13:20:30 svr91 kernel: drbd0: size = 127 MB (131032 KB)
Dec  4 13:20:30 svr91 kernel: drbd0: reading of bitmap took 0 jiffies
Dec  4 13:20:30 svr91 kernel: drbd0: recounting of set bits took additional 0 jiffies
Dec  4 13:20:30 svr91 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Dec  4 13:20:30 svr91 kernel: drbd0: disk( Attaching -> Consistent ) 
Dec  4 13:20:30 svr91 kernel: drbd0: Writing meta data super block now.
Dec  4 13:20:30 svr91 kernel: drbd0: conn( StandAlone -> Unconnected ) 
Dec  4 13:20:30 svr91 kernel: drbd0: receiver (re)started
Dec  4 13:20:30 svr91 kernel: drbd0: conn( Unconnected -> WFConnection ) 
Dec  4 13:20:30 svr91 kernel: drbd0: conn( WFConnection -> WFReportParams ) 
Dec  4 13:20:30 svr91 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Dec  4 13:20:30 svr91 kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Dec  4 13:20:30 svr91 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate ) 
Dec  4 13:20:30 svr91 kernel: drbd0: Writing meta data super block now.
Dec  4 13:20:37 svr91 logd: [5943]: info: logd started with default configuration.
Dec  4 13:20:37 svr91 logd: [5944]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Dec  4 13:20:37 svr91 logd: [5943]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Dec  4 13:20:37 svr91 heartbeat: [6018]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Dec  4 13:20:37 svr91 heartbeat: [6018]: info: **************************
Dec  4 13:20:37 svr91 heartbeat: [6018]: info: Configuration validated. Starting heartbeat 2.0.8
Dec  4 13:20:37 svr91 heartbeat: [6019]: info: heartbeat: version 2.0.8
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: Heartbeat generation: 198
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: G_main_add_TriggerHandler: Added signal manual handler
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: G_main_add_TriggerHandler: Added signal manual handler
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: Starting serial heartbeat on tty /dev/ttyS0 (115200 baud)
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: bound send socket to device: eth0
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: bound receive socket to device: eth0
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: started on port 694 interface eth0 to 192.168.151.91
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: bound send socket to device: eth1
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: bound receive socket to device: eth1
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ucast: started on port 694 interface eth1 to 192.168.1.91
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ping heartbeat started.
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: glib: ping heartbeat started.
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Dec  4 13:20:38 svr91 heartbeat: [6019]: info: Local status now set to: 'up'
Dec  4 13:20:39 svr91 heartbeat: [6019]: info: Link svr92:eth1 up.
Dec  4 13:20:39 svr91 heartbeat: [6019]: info: Status update for node svr92: status up
Dec  4 13:20:39 svr91 heartbeat: [6035]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Dec  4 13:20:39 svr91 harc[6035]: info: Running /etc/ha.d/rc.d/status status
Dec  4 13:20:39 svr91 heartbeat: [6019]: info: Link svr92:/dev/ttyS0 up.
Dec  4 13:20:39 svr91 heartbeat: [6019]: info: Link 192.168.151.1:192.168.151.1 up.
Dec  4 13:20:39 svr91 heartbeat: [6019]: info: Status update for node 192.168.151.1: status ping
Dec  4 13:20:39 svr91 heartbeat: [6019]: info: Status update for node 192.168.1.3: status ping
Dec  4 13:20:39 svr91 heartbeat: [6019]: info: Link 192.168.1.3:192.168.1.3 up.
Dec  4 13:20:39 svr91 heartbeat: [6019]: debug: get_delnodelist: delnodelist= 
Dec  4 13:20:39 svr91 heartbeat: [6019]: info: Link svr92:eth0 up.
Dec  4 13:20:40 svr91 heartbeat: [6019]: info: Comm_now_up(): updating status to active
Dec  4 13:20:40 svr91 heartbeat: [6019]: info: Local status now set to: 'active'
Dec  4 13:20:40 svr91 heartbeat: [6019]: info: Starting child client "/usr/lib/heartbeat/ipfail" (501,501)
Dec  4 13:20:40 svr91 heartbeat: [6019]: info: Starting child client "/usr/lib/heartbeat/dopd  " (501,501)
Dec  4 13:20:40 svr91 heartbeat: [6019]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 680 ms (> 50 ms) (GSource: 0x8108290)
Dec  4 13:20:40 svr91 heartbeat: [6019]: WARN: G_CH_dispatch_int: Dispatch function for read child was delayed 680 ms (> 500 ms) before being called (GSource: 0x81083c0)
Dec  4 13:20:40 svr91 heartbeat: [6019]: info: G_CH_dispatch_int: started at 103955 should have started at 103887
Dec  4 13:20:40 svr91 heartbeat: [6019]: WARN: G_CH_dispatch_int: Dispatch function for read child was delayed 680 ms (> 500 ms) before being called (GSource: 0x81084f0)
Dec  4 13:20:40 svr91 heartbeat: [6019]: info: G_CH_dispatch_int: started at 103955 should have started at 103887
Dec  4 13:20:40 svr91 heartbeat: [6046]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 501  gid 501 (pid 6046)
Dec  4 13:20:40 svr91 ipfail: [6046]: debug: PID=6046
Dec  4 13:20:40 svr91 heartbeat: [6047]: info: Starting "/usr/lib/heartbeat/dopd  " as uid 501  gid 501 (pid 6047)
Dec  4 13:20:40 svr91 ipfail: [6046]: debug: Signing in with heartbeat
Dec  4 13:20:41 svr91 heartbeat: [6019]: info: Status update for node svr92: status active
Dec  4 13:20:41 svr91 heartbeat: [6048]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Dec  4 13:20:41 svr91 ipfail: [6046]: debug: [We are svr91]
Dec  4 13:20:41 svr91 harc[6048]: info: Running /etc/ha.d/rc.d/status status
Dec  4 13:20:41 svr91 ipfail: [6046]: debug: auto_failback -> 0 (off)
Dec  4 13:20:41 svr91 ipfail: [6046]: debug: Setting message filter mode
Dec  4 13:20:42 svr91 ipfail: [6046]: debug: Starting node walk
Dec  4 13:20:43 svr91 ipfail: [6046]: debug: Cluster node: 192.168.1.3: status: ping
Dec  4 13:20:43 svr91 ipfail: [6046]: debug: Cluster node: 192.168.151.1: status: ping
Dec  4 13:20:44 svr91 ipfail: [6046]: debug: Cluster node: svr92: status: active
Dec  4 13:20:45 svr91 ipfail: [6046]: debug: [They are svr92]
Dec  4 13:20:45 svr91 ipfail: [6046]: debug: Cluster node: svr91: status: active
Dec  4 13:20:46 svr91 ipfail: [6046]: debug: Setting message signal
Dec  4 13:20:46 svr91 ipfail: [6046]: debug: Waiting for messages...
Dec  4 13:20:47 svr91 ipfail: [6046]: info: Status update: Node svr92 now has status active
Dec  4 13:20:48 svr91 ipfail: [6046]: debug: Other side is unstable.
Dec  4 13:20:48 svr91 ipfail: [6046]: debug: Got join message from another ipfail client. (svr92)
Dec  4 13:20:49 svr91 ipfail: [6046]: debug: Found ping node 192.168.1.3!
Dec  4 13:20:50 svr91 ipfail: [6046]: debug: Found ping node 192.168.151.1!
Dec  4 13:20:51 svr91 heartbeat: [6019]: info: remote resource transition completed.
Dec  4 13:20:51 svr91 heartbeat: [6019]: info: remote resource transition completed.
Dec  4 13:20:51 svr91 heartbeat: [6019]: info: Initial resource acquisition complete (T_RESOURCES(us))
Dec  4 13:20:51 svr91 ipfail: [6046]: info: Asking other side for ping node count.
Dec  4 13:20:51 svr91 ipfail: [6046]: debug: Message [num_ping] sent.
Dec  4 13:20:51 svr91 ipfail: [6046]: debug: Other side is now stable.
Dec  4 13:20:51 svr91 IPaddr[6085]: INFO:  Resource is stopped
Dec  4 13:20:51 svr91 heartbeat: [6058]: info: Local Resource acquisition completed.
Dec  4 13:20:51 svr91 heartbeat: [6019]: debug: StartNextRemoteRscReq(): child count 1
Dec  4 13:20:51 svr91 heartbeat: [6122]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Dec  4 13:20:51 svr91 harc[6122]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Dec  4 13:20:51 svr91 ip-request-resp[6122]: received ip-request-resp IPaddr::192.168.151.90/24/eth0 OK yes
Dec  4 13:20:51 svr91 ResourceManager[6137]: info: Acquiring resource group: svr91 IPaddr::192.168.151.90/24/eth0 drbddisk::home Filesystem::/dev/drbd0::/home::xfs
Dec  4 13:20:51 svr91 IPaddr[6161]: INFO:  Resource is stopped
Dec  4 13:20:51 svr91 ResourceManager[6137]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start
Dec  4 13:20:51 svr91 ResourceManager[6137]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start
Dec  4 13:20:51 svr91 IPaddr[6237]: INFO: Using calculated netmask for 192.168.151.90: 255.255.255.0
Dec  4 13:20:51 svr91 IPaddr[6237]: DEBUG: Using calculated broadcast for 192.168.151.90: 192.168.151.255
Dec  4 13:20:51 svr91 IPaddr[6237]: INFO: eval /sbin/ifconfig eth0:0 192.168.151.90 netmask 255.255.255.0 broadcast 192.168.151.255
Dec  4 13:20:51 svr91 IPaddr[6237]: DEBUG: Sending Gratuitous Arp for 192.168.151.90 on eth0:0 [eth0]
Dec  4 13:20:51 svr91 IPaddr[6216]: INFO:  Success
Dec  4 13:20:51 svr91 ResourceManager[6137]: debug: /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start done. RC=0
Dec  4 13:20:51 svr91 ResourceManager[6137]: info: Running /etc/ha.d/resource.d/drbddisk home start
Dec  4 13:20:51 svr91 ResourceManager[6137]: debug: Starting /etc/ha.d/resource.d/drbddisk home start
Dec  4 13:20:51 svr91 kernel: drbd0: role( Secondary -> Primary ) 
Dec  4 13:20:51 svr91 kernel: drbd0: Writing meta data super block now.
Dec  4 13:20:52 svr91 ipfail: [6046]: debug: Other side is now stable.
Dec  4 13:20:52 svr91 ResourceManager[6137]: debug: /etc/ha.d/resource.d/drbddisk home start done. RC=0
Dec  4 13:20:52 svr91 Filesystem[6367]: INFO:  Resource is stopped
Dec  4 13:20:52 svr91 ResourceManager[6137]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs start
Dec  4 13:20:52 svr91 ResourceManager[6137]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs start
Dec  4 13:20:52 svr91 Filesystem[6408]: INFO: Running start for /dev/drbd0 on /home
Dec  4 13:20:53 svr91 kernel: Filesystem "drbd0": Disabling barriers, not supported by the underlying device
Dec  4 13:20:53 svr91 kernel: XFS mounting filesystem drbd0
Dec  4 13:20:53 svr91 kernel: Ending clean XFS mount for filesystem: drbd0
Dec  4 13:20:53 svr91 Filesystem[6405]: INFO:  Success
Dec  4 13:20:53 svr91 ResourceManager[6137]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs start done. RC=0
Dec  4 13:20:54 svr91 ipfail: [6046]: info: No giveup timer to abort.
Dec  4 13:21:41 svr91 ntpd[3621]: synchronized to LOCAL(0), stratum 10
-------------- next part --------------
Dec  4 13:20:26 svr92 kernel: drbd: initialised. Version: 8.0.6 (api:86/proto:86)
Dec  4 13:20:26 svr92 kernel: drbd: SVN Revision: 3048 build by phil at mescal, 2007-09-03 10:39:27
Dec  4 13:20:26 svr92 kernel: drbd: registered as block device major 147
Dec  4 13:20:26 svr92 kernel: drbd: minor_table @ 0xc68fc440
Dec  4 13:20:26 svr92 kernel: drbd0: disk( Diskless -> Attaching ) 
Dec  4 13:20:26 svr92 kernel: klogd 1.4.2, ---------- state change ---------- 
Dec  4 13:20:26 svr92 kernel: drbd0: Found 2 transactions (2 active extents) in activity log.
Dec  4 13:20:26 svr92 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
Dec  4 13:20:26 svr92 kernel: drbd0: drbd_bm_resize called with capacity == 262064
Dec  4 13:20:26 svr92 kernel: drbd0: resync bitmap: bits=32758 words=1024
Dec  4 13:20:26 svr92 kernel: drbd0: size = 127 MB (131032 KB)
Dec  4 13:20:26 svr92 kernel: drbd0: reading of bitmap took 0 jiffies
Dec  4 13:20:26 svr92 kernel: drbd0: recounting of set bits took additional 0 jiffies
Dec  4 13:20:26 svr92 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Dec  4 13:20:26 svr92 kernel: drbd0: disk( Attaching -> Consistent ) 
Dec  4 13:20:26 svr92 kernel: drbd0: Writing meta data super block now.
Dec  4 13:20:26 svr92 kernel: drbd0: conn( StandAlone -> Unconnected ) 
Dec  4 13:20:26 svr92 kernel: drbd0: receiver (re)started
Dec  4 13:20:26 svr92 kernel: drbd0: conn( Unconnected -> WFConnection ) 
Dec  4 13:20:30 svr92 kernel: drbd0: conn( WFConnection -> WFReportParams ) 
Dec  4 13:20:30 svr92 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Dec  4 13:20:30 svr92 kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Dec  4 13:20:30 svr92 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate ) 
Dec  4 13:20:30 svr92 kernel: drbd0: Writing meta data super block now.
Dec  4 13:20:34 svr92 logd: [4846]: info: logd started with default configuration.
Dec  4 13:20:34 svr92 logd: [4847]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Dec  4 13:20:34 svr92 logd: [4846]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Dec  4 13:20:34 svr92 heartbeat: [4921]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Dec  4 13:20:34 svr92 heartbeat: [4921]: info: **************************
Dec  4 13:20:34 svr92 heartbeat: [4921]: info: Configuration validated. Starting heartbeat 2.0.8
Dec  4 13:20:34 svr92 heartbeat: [4922]: info: heartbeat: version 2.0.8
Dec  4 13:20:34 svr92 heartbeat: [4922]: info: Heartbeat generation: 115
Dec  4 13:20:34 svr92 heartbeat: [4922]: info: G_main_add_TriggerHandler: Added signal manual handler
Dec  4 13:20:34 svr92 heartbeat: [4922]: info: G_main_add_TriggerHandler: Added signal manual handler
Dec  4 13:20:34 svr92 heartbeat: [4922]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: Starting serial heartbeat on tty /dev/ttyS0 (115200 baud)
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: bound send socket to device: eth0
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: bound receive socket to device: eth0
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: started on port 694 interface eth0 to 192.168.151.91
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: bound send socket to device: eth1
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: bound receive socket to device: eth1
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ucast: started on port 694 interface eth1 to 192.168.1.91
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ping heartbeat started.
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: glib: ping heartbeat started.
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Dec  4 13:20:35 svr92 heartbeat: [4922]: info: Local status now set to: 'up'
Dec  4 13:20:36 svr92 heartbeat: [4922]: info: Status update for node 192.168.1.3: status ping
Dec  4 13:20:36 svr92 heartbeat: [4922]: info: Link 192.168.1.3:192.168.1.3 up.
Dec  4 13:20:36 svr92 heartbeat: [4922]: info: Link 192.168.151.1:192.168.151.1 up.
Dec  4 13:20:36 svr92 heartbeat: [4922]: info: Status update for node 192.168.151.1: status ping
Dec  4 13:20:39 svr92 heartbeat: [4922]: info: Link svr91:/dev/ttyS0 up.
Dec  4 13:20:39 svr92 heartbeat: [4922]: info: Status update for node svr91: status up
Dec  4 13:20:39 svr92 heartbeat: [4938]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Dec  4 13:20:39 svr92 harc[4938]: info: Running /etc/ha.d/rc.d/status status
Dec  4 13:20:39 svr92 heartbeat: [4922]: debug: get_delnodelist: delnodelist= 
Dec  4 13:20:40 svr92 heartbeat: [4922]: info: Comm_now_up(): updating status to active
Dec  4 13:20:40 svr92 heartbeat: [4922]: info: Local status now set to: 'active'
Dec  4 13:20:40 svr92 heartbeat: [4922]: info: Starting child client "/usr/lib/heartbeat/ipfail" (501,501)
Dec  4 13:20:40 svr92 heartbeat: [4922]: info: Starting child client "/usr/lib/heartbeat/dopd  " (501,501)
Dec  4 13:20:40 svr92 heartbeat: [4922]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 610 ms (> 50 ms) (GSource: 0x8108178)
Dec  4 13:20:40 svr92 heartbeat: [4922]: info: Status update for node svr91: status active
Dec  4 13:20:40 svr92 heartbeat: [4922]: WARN: Gmain_timeout_dispatch: Dispatch function for hb_pop_deadtime was delayed 610 ms (> 500 ms) before being called (GSource: 0x8106e98)
Dec  4 13:20:40 svr92 heartbeat: [4922]: info: Gmain_timeout_dispatch: started at 103262 should have started at 103201
Dec  4 13:20:40 svr92 heartbeat: [4949]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 501  gid 501 (pid 4949)
Dec  4 13:20:40 svr92 heartbeat: [4950]: info: Starting "/usr/lib/heartbeat/dopd  " as uid 501  gid 501 (pid 4950)
Dec  4 13:20:40 svr92 heartbeat: [4951]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Dec  4 13:20:40 svr92 ipfail: [4949]: debug: PID=4949
Dec  4 13:20:40 svr92 ipfail: [4949]: debug: Signing in with heartbeat
Dec  4 13:20:40 svr92 ipfail: [4949]: debug: [We are svr92]
Dec  4 13:20:40 svr92 harc[4951]: info: Running /etc/ha.d/rc.d/status status
Dec  4 13:20:40 svr92 ipfail: [4949]: debug: auto_failback -> 0 (off)
Dec  4 13:20:41 svr92 ipfail: [4949]: debug: Setting message filter mode
Dec  4 13:20:41 svr92 ipfail: [4949]: debug: Starting node walk
Dec  4 13:20:42 svr92 ipfail: [4949]: debug: Cluster node: 192.168.1.3: status: ping
Dec  4 13:20:43 svr92 ipfail: [4949]: debug: Cluster node: 192.168.151.1: status: ping
Dec  4 13:20:43 svr92 ipfail: [4949]: debug: Cluster node: svr92: status: active
Dec  4 13:20:44 svr92 ipfail: [4949]: debug: Cluster node: svr91: status: active
Dec  4 13:20:44 svr92 ipfail: [4949]: debug: [They are svr91]
Dec  4 13:20:44 svr92 ipfail: [4949]: debug: Setting message signal
Dec  4 13:20:45 svr92 ipfail: [4949]: debug: Waiting for messages...
Dec  4 13:20:46 svr92 ipfail: [4949]: debug: Other side is unstable.
Dec  4 13:20:51 svr92 heartbeat: [4922]: info: local resource transition completed.
Dec  4 13:20:51 svr92 heartbeat: [4922]: info: Initial resource acquisition complete (T_RESOURCES(us))
Dec  4 13:20:51 svr92 heartbeat: [4961]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys svr92] to acquire.
Dec  4 13:20:51 svr92 ipfail: [4949]: debug: Other side is unstable.
Dec  4 13:20:51 svr92 heartbeat: [4922]: info: remote resource transition completed.
Dec  4 13:20:51 svr92 ipfail: [4949]: debug: Other side is now stable.
Dec  4 13:20:51 svr92 ipfail: [4949]: debug: Got asked for num_ping.
Dec  4 13:20:51 svr92 kernel: drbd0: peer( Secondary -> Primary ) 
Dec  4 13:20:52 svr92 ipfail: [4949]: debug: Found ping node 192.168.1.3!
Dec  4 13:20:53 svr92 ipfail: [4949]: debug: Found ping node 192.168.151.1!
Dec  4 13:20:53 svr92 ipfail: [4949]: info: Ping node count is balanced.
Dec  4 13:20:53 svr92 ipfail: [4949]: debug: Abort message sent.
Dec  4 13:22:07 svr92 heartbeat: [4922]: info: Link svr91:/dev/ttyS0 dead.
Dec  4 13:22:07 svr92 ipfail: [4949]: info: Link Status update: Link svr91//dev/ttyS0 now has status dead
Dec  4 13:22:07 svr92 ipfail: [4949]: debug: Found ping node 192.168.1.3!
Dec  4 13:22:07 svr92 kernel: drbd0: PingAck did not arrive in time.
Dec  4 13:22:07 svr92 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) 
Dec  4 13:22:07 svr92 kernel: drbd0: asender terminated
Dec  4 13:22:07 svr92 kernel: drbd0: short read expecting header on sock: r=-512
Dec  4 13:22:07 svr92 kernel: drbd0: tl_clear()
Dec  4 13:22:07 svr92 kernel: drbd0: Connection closed
Dec  4 13:22:07 svr92 kernel: drbd0: Writing meta data super block now.
Dec  4 13:22:07 svr92 kernel: drbd0: conn( NetworkFailure -> Unconnected ) 
Dec  4 13:22:07 svr92 kernel: drbd0: receiver terminated
Dec  4 13:22:07 svr92 kernel: drbd0: receiver (re)started
Dec  4 13:22:07 svr92 kernel: drbd0: conn( Unconnected -> WFConnection ) 
Dec  4 13:22:08 svr92 ipfail: [4949]: debug: Found ping node 192.168.151.1!
Dec  4 13:22:09 svr92 ipfail: [4949]: info: Asking other side for ping node count.
Dec  4 13:22:09 svr92 ipfail: [4949]: debug: Message [num_ping] sent.
Dec  4 13:22:09 svr92 ipfail: [4949]: info: Checking remote count of ping nodes.
Dec  4 13:22:13 svr92 heartbeat: [4922]: WARN: node svr91: is dead
Dec  4 13:22:13 svr92 heartbeat: [4922]: WARN: No STONITH device configured.
Dec  4 13:22:13 svr92 heartbeat: [4922]: WARN: Shared disks are not protected.
Dec  4 13:22:13 svr92 heartbeat: [4922]: info: Resources being acquired from svr91.
Dec  4 13:22:13 svr92 ipfail: [4949]: info: Status update: Node svr91 now has status dead
Dec  4 13:22:13 svr92 heartbeat: [5010]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Dec  4 13:22:13 svr92 harc[5010]: info: Running /etc/ha.d/rc.d/status status
Dec  4 13:22:13 svr92 heartbeat: [5011]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys svr92] to acquire.
Dec  4 13:22:13 svr92 heartbeat: [4922]: debug: StartNextRemoteRscReq(): child count 1
Dec  4 13:22:13 svr92 ipfail: [4949]: debug: Found ping node 192.168.1.3!
Dec  4 13:22:13 svr92 mach_down[5030]: info: Taking over resource group IPaddr::192.168.151.90/24/eth0
Dec  4 13:22:13 svr92 ResourceManager[5050]: info: Acquiring resource group: svr91 IPaddr::192.168.151.90/24/eth0 drbddisk::home Filesystem::/dev/drbd0::/home::xfs
Dec  4 13:22:13 svr92 IPaddr[5074]: INFO:  Resource is stopped
Dec  4 13:22:13 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start
Dec  4 13:22:13 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start
Dec  4 13:22:13 svr92 IPaddr[5150]: INFO: Using calculated netmask for 192.168.151.90: 255.255.255.0
Dec  4 13:22:13 svr92 IPaddr[5150]: DEBUG: Using calculated broadcast for 192.168.151.90: 192.168.151.255
Dec  4 13:22:13 svr92 IPaddr[5150]: INFO: eval /sbin/ifconfig eth0:0 192.168.151.90 netmask 255.255.255.0 broadcast 192.168.151.255
Dec  4 13:22:13 svr92 IPaddr[5150]: DEBUG: Sending Gratuitous Arp for 192.168.151.90 on eth0:0 [eth0]
Dec  4 13:22:13 svr92 IPaddr[5129]: INFO:  Success
Dec  4 13:22:13 svr92 ipfail: [4949]: debug: Found ping node 192.168.151.1!
Dec  4 13:22:13 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 start done. RC=0
Dec  4 13:22:13 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/drbddisk home start
Dec  4 13:22:13 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/drbddisk home start
Dec  4 13:22:13 svr92 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Dec  4 13:22:13 svr92 kernel: drbd0: disk( UpToDate -> Outdated ) 
Dec  4 13:22:13 svr92 kernel: drbd0: outdate-peer helper broken, returned 255 
Dec  4 13:22:13 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
Dec  4 13:22:13 svr92 kernel: drbd0:   state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:13 svr92 kernel: drbd0:  wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:14 svr92 ipfail: [4949]: info: NS: We are still alive!
Dec  4 13:22:14 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
Dec  4 13:22:14 svr92 kernel: drbd0:   state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:14 svr92 kernel: drbd0:  wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:15 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
Dec  4 13:22:15 svr92 kernel: drbd0:   state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:15 svr92 kernel: drbd0:  wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:16 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
Dec  4 13:22:16 svr92 kernel: drbd0:   state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:16 svr92 kernel: drbd0:  wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:17 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
Dec  4 13:22:17 svr92 kernel: drbd0:   state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:17 svr92 kernel: drbd0:  wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:18 svr92 kernel: drbd0: BUG! md_sync_timer expired! Worker calls drbd_md_sync().
Dec  4 13:22:18 svr92 kernel: drbd0: Writing meta data super block now.
Dec  4 13:22:18 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
Dec  4 13:22:18 svr92 kernel: drbd0:   state = { cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:18 svr92 kernel: drbd0:  wanted = { cs:WFConnection st:Primary/Unknown ds:Outdated/DUnknown r--- }
Dec  4 13:22:18 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/drbddisk home start done. RC=20
Dec  4 13:22:18 svr92 ResourceManager[5050]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
Dec  4 13:22:18 svr92 ResourceManager[5050]: CRIT: Giving up resources due to failure of drbddisk::home
Dec  4 13:22:18 svr92 ResourceManager[5050]: info: Releasing resource group: svr91 IPaddr::192.168.151.90/24/eth0 drbddisk::home Filesystem::/dev/drbd0::/home::xfs
Dec  4 13:22:18 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs stop
Dec  4 13:22:18 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs stop
Dec  4 13:22:19 svr92 Filesystem[5329]: INFO: Running stop for /dev/drbd0 on /home
Dec  4 13:22:19 svr92 Filesystem[5326]: INFO:  Success
Dec  4 13:22:19 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home xfs stop done. RC=0
Dec  4 13:22:19 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/drbddisk home stop
Dec  4 13:22:19 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/drbddisk home stop
Dec  4 13:22:19 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/drbddisk home stop done. RC=0
Dec  4 13:22:19 svr92 ResourceManager[5050]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 stop
Dec  4 13:22:19 svr92 ResourceManager[5050]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 stop
Dec  4 13:22:19 svr92 IPaddr[5420]: INFO: /sbin/ifconfig eth0:0 192.168.151.90 down
Dec  4 13:22:19 svr92 IPaddr[5399]: INFO:  Success
Dec  4 13:22:19 svr92 ResourceManager[5050]: debug: /etc/ha.d/resource.d/IPaddr 192.168.151.90/24/eth0 stop done. RC=0
Dec  4 13:22:19 svr92 mach_down[5030]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec  4 13:22:19 svr92 mach_down[5030]: info: mach_down takeover complete for node svr91.
Dec  4 13:22:19 svr92 heartbeat: [4922]: info: mach_down takeover complete.
Dec  4 13:22:49 svr92 hb_standby[5480]: Going standby [foreign].
Dec  4 13:22:49 svr92 heartbeat: [4922]: info: svr92 wants to go standby [foreign]
Dec  4 13:23:01 svr92 heartbeat: [4922]: WARN: No reply to standby request.  Standby request cancelled.
Dec  4 13:23:13 svr92 ntpd[3734]: time reset -0.238043 s


More information about the drbd-user mailing list