Andreas, Lars,

Thanks much for the quick response. I made the changes. Here's the current drbd.conf:

global {
  usage-count yes;
}
common {
  protocol C;
  disk {
    on-io-error detach;
    fencing resource-and-stonith;
  }
  syncer {
    rate 33M;
    al-extents 3389;
  }
  net {
    allow-two-primaries; # Enable this *after* initial testing
    cram-hmac-alg sha1;
    shared-secret "a6a0680c40bca2439dbe48343ddddcf4";
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  handlers {
    fence-peer "/usr/lib/drbd/stonith_admin-fence-peer.sh";
  }
}
resource vmsvn {
  device /dev/drbd0;
  disk /dev/sdb;
  meta-disk internal;
  on xm01 {
    address 100.0.0.1:7788;
  }
  on xm02 {
    address 100.0.0.2:7788;
  }
}
resource srvsvn1 {
  protocol C;
  device /dev/drbd1;
  disk /dev/sdc;
  meta-disk internal;
  on xm01 {
    address 100.0.0.1:7789;
  }
  on xm02 {
    address 100.0.0.2:7789;
  }
}
resource srvsvn2 {
  protocol C;
  device /dev/drbd2;
  disk /dev/sdd;
  meta-disk internal;
  on xm01 {
    address 100.0.0.1:7790;
  }
  on xm02 {
    address 100.0.0.2:7790;
  }
}
resource vmconfig {
  protocol C;
  device /dev/drbd3;
  meta-disk internal;
  on xm01 {
    address 100.0.0.1:7791;
    disk /dev/vg_xm01/lv_xm01_vmconfig;
  }
  on xm02 {
    address 100.0.0.2:7791;
    disk /dev/vg_xm02/lv_xm02_vmconfig;
  }
}

And here's what happened:

- rcnetwork stop on XM01 @ 1:33:00 PM:

Mar 1 13:32:59 xm01 ifdown: eth0 device: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
Mar 1 13:33:00 xm01 ifdown: eth1 device: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
Mar 1 13:33:01 xm01 /usr/sbin/cron[9479]: (root) CMD (/usr/sbin/logwatch --service dmeventd)
Mar 1 13:33:01 xm01 ifdown: usb0 name: RNDIS/CDC ETHER
Mar 1 13:33:02 xm01 ifdown: vif1.0
Mar 1 13:33:02 xm01 ifdown: No configuration found for vif1.0
Mar 1 13:33:02 xm01 ifdown: Nevertheless the interface will be shut down.

- XM01 is back:

Mar 1 13:36:35 xm01 kernel: [ 51.170175] drbd: initialized.
Version: 8.3.11 (api:88/proto:86-96) Mar 1 13:36:35 xm01 kernel: [ 51.170178] drbd: GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by phil at fat-tyre, 2011-06-29 11:37:11 Mar 1 13:36:35 xm01 kernel: [ 51.170181] drbd: registered as block device major 147 Mar 1 13:36:35 xm01 kernel: [ 51.170184] drbd: minor_table @ 0xffff8807d66c5480 Mar 1 13:36:35 xm01 kernel: [ 51.319210] block drbd0: Starting worker thread (from cqueue [4927]) Mar 1 13:36:35 xm01 kernel: [ 51.319283] block drbd0: disk( Diskless -> Attaching ) Mar 1 13:36:35 xm01 kernel: klogd 1.4.1, ---------- state change ---------- Mar 1 13:36:35 xm01 kernel: [ 51.332408] block drbd0: Found 57 transactions (91 active extents) in activity log. Mar 1 13:36:35 xm01 kernel: [ 51.332411] block drbd0: Method to ensure write ordering: barrier Mar 1 13:36:35 xm01 kernel: [ 51.332414] block drbd0: max BIO size = 131072 Mar 1 13:36:35 xm01 kernel: [ 51.332418] block drbd0: drbd_bm_resize called with capacity == 1172087720 Mar 1 13:36:35 xm01 kernel: [ 51.336592] block drbd0: resync bitmap: bits=146510965 words=2289234 pages=4472 Mar 1 13:36:35 xm01 kernel: [ 51.336598] block drbd0: size = 559 GB (586043860 KB) Mar 1 13:36:35 xm01 kernel: [ 51.534814] block drbd0: bitmap READ of 4472 pages took 50 jiffies Mar 1 13:36:35 xm01 kernel: [ 51.551170] block drbd0: recounting of set bits took additional 4 jiffies Mar 1 13:36:35 xm01 kernel: [ 51.551174] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Mar 1 13:36:35 xm01 kernel: [ 51.551231] block drbd0: Marked additional 224 MB as out-of-sync based on AL. Mar 1 13:36:35 xm01 kernel: [ 51.551274] block drbd0: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:36:35 xm01 kernel: [ 51.551296] block drbd0: 224 MB (57344 bits) marked out-of-sync by on disk bit-map. 
Mar 1 13:36:35 xm01 kernel: [ 51.551304] block drbd0: disk( Attaching -> Consistent ) Mar 1 13:36:35 xm01 kernel: [ 51.551307] block drbd0: attached to UUIDs EEDF542BD48564B5:0000000000000000:AF298F27A3172092:AF288F27A3172093 Mar 1 13:36:35 xm01 kernel: [ 51.567908] block drbd1: Starting worker thread (from cqueue [4927]) Mar 1 13:36:35 xm01 kernel: [ 51.567981] block drbd1: disk( Diskless -> Attaching ) Mar 1 13:36:35 xm01 kernel: [ 51.581253] block drbd1: Found 57 transactions (57 active extents) in activity log. Mar 1 13:36:35 xm01 kernel: [ 51.581257] block drbd1: Method to ensure write ordering: barrier Mar 1 13:36:35 xm01 kernel: [ 51.581260] block drbd1: max BIO size = 131072 Mar 1 13:36:35 xm01 kernel: [ 51.581265] block drbd1: drbd_bm_resize called with capacity == 1172087720 Mar 1 13:36:35 xm01 kernel: [ 51.585510] block drbd1: resync bitmap: bits=146510965 words=2289234 pages=4472 Mar 1 13:36:35 xm01 kernel: [ 51.585525] block drbd1: size = 559 GB (586043860 KB) Mar 1 13:36:36 xm01 kernel: [ 51.778368] block drbd1: bitmap READ of 4472 pages took 48 jiffies Mar 1 13:36:36 xm01 kernel: [ 51.794740] block drbd1: recounting of set bits took additional 4 jiffies Mar 1 13:36:36 xm01 kernel: [ 51.794744] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Mar 1 13:36:36 xm01 kernel: [ 51.794797] block drbd1: Marked additional 120 MB as out-of-sync based on AL. Mar 1 13:36:36 xm01 kernel: [ 51.794838] block drbd1: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:36:36 xm01 kernel: [ 51.794860] block drbd1: 120 MB (30720 bits) marked out-of-sync by on disk bit-map. 
Mar 1 13:36:36 xm01 kernel: [ 51.794867] block drbd1: disk( Attaching -> Consistent ) Mar 1 13:36:36 xm01 kernel: [ 51.794871] block drbd1: attached to UUIDs E6E23470FD3656AD:0000000000000000:65C464E576893480:65C364E576893481 Mar 1 13:36:36 xm01 kernel: [ 51.811431] block drbd2: Starting worker thread (from cqueue [4927]) Mar 1 13:36:36 xm01 kernel: [ 51.811511] block drbd2: disk( Diskless -> Attaching ) Mar 1 13:36:36 xm01 kernel: [ 51.825901] block drbd2: Found 57 transactions (57 active extents) in activity log. Mar 1 13:36:36 xm01 kernel: [ 51.825905] block drbd2: Method to ensure write ordering: barrier Mar 1 13:36:36 xm01 kernel: [ 51.825908] block drbd2: max BIO size = 131072 Mar 1 13:36:36 xm01 kernel: [ 51.825915] block drbd2: drbd_bm_resize called with capacity == 1172087720 Mar 1 13:36:36 xm01 kernel: [ 51.830989] block drbd2: resync bitmap: bits=146510965 words=2289234 pages=4472 Mar 1 13:36:36 xm01 kernel: [ 51.830995] block drbd2: size = 559 GB (586043860 KB) Mar 1 13:36:36 xm01 kernel: [ 52.033592] block drbd2: bitmap READ of 4472 pages took 51 jiffies Mar 1 13:36:36 xm01 kernel: [ 52.050223] block drbd2: recounting of set bits took additional 4 jiffies Mar 1 13:36:36 xm01 kernel: [ 52.050228] block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Mar 1 13:36:36 xm01 kernel: [ 52.050291] block drbd2: Marked additional 48 MB as out-of-sync based on AL. Mar 1 13:36:36 xm01 kernel: [ 52.050352] block drbd2: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:36:36 xm01 kernel: [ 52.050382] block drbd2: 48 MB (12288 bits) marked out-of-sync by on disk bit-map. 
Mar 1 13:36:36 xm01 kernel: [ 52.050391] block drbd2: disk( Attaching -> Consistent ) Mar 1 13:36:36 xm01 kernel: [ 52.050396] block drbd2: attached to UUIDs 324E9CEEF0227FAD:0000000000000000:F91D77DB4FF3672A:F91C77DB4FF3672B Mar 1 13:36:36 xm01 kernel: [ 52.079074] block drbd3: Starting worker thread (from cqueue [4927]) Mar 1 13:36:36 xm01 kernel: [ 52.079172] block drbd3: disk( Diskless -> Attaching ) Mar 1 13:36:36 xm01 kernel: [ 52.118864] block drbd3: Found 29 transactions (29 active extents) in activity log. Mar 1 13:36:36 xm01 kernel: [ 52.118868] block drbd3: Method to ensure write ordering: barrier Mar 1 13:36:36 xm01 kernel: [ 52.118872] block drbd3: max BIO size = 131072 Mar 1 13:36:36 xm01 kernel: [ 52.118877] block drbd3: drbd_bm_resize called with capacity == 2097016 Mar 1 13:36:36 xm01 kernel: [ 52.118888] block drbd3: resync bitmap: bits=262127 words=4096 pages=8 Mar 1 13:36:36 xm01 kernel: [ 52.118891] block drbd3: size = 1024 MB (1048508 KB) Mar 1 13:36:36 xm01 kernel: [ 52.125476] block drbd3: bitmap READ of 8 pages took 2 jiffies Mar 1 13:36:36 xm01 kernel: [ 52.125509] block drbd3: recounting of set bits took additional 0 jiffies Mar 1 13:36:36 xm01 kernel: [ 52.125511] block drbd3: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Mar 1 13:36:36 xm01 kernel: [ 52.125540] block drbd3: Marked additional 20 MB as out-of-sync based on AL. Mar 1 13:36:36 xm01 kernel: [ 52.125543] block drbd3: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:36:36 xm01 kernel: [ 52.129955] block drbd3: 20 MB (5120 bits) marked out-of-sync by on disk bit-map. Mar 1 13:36:36 xm01 kernel: [ 52.129960] block drbd3: disk( Attaching -> Consistent ) Mar 1 13:36:36 xm01 kernel: [ 52.129964] block drbd3: attached to UUIDs 75C7AE841CB0682F:0000000000000000:99ABCCCBF1E4D000:99AACCCBF1E4D001 Mar 1 13:36:36 xm01 kernel: [ 52.204837] padlock: VIA PadLock Hash Engine not detected. 
Mar 1 13:36:36 xm01 modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32.49-0.3-xen/kernel/drivers/crypto/padlock-sha.ko): No such device Mar 1 13:36:36 xm01 kernel: [ 52.238263] block drbd0: conn( StandAlone -> Unconnected ) Mar 1 13:36:36 xm01 kernel: [ 52.238301] block drbd0: Starting receiver thread (from drbd0_worker [4938]) Mar 1 13:36:36 xm01 kernel: [ 52.238341] block drbd0: receiver (re)started Mar 1 13:36:36 xm01 kernel: [ 52.238349] block drbd0: conn( Unconnected -> WFConnection ) Mar 1 13:36:36 xm01 kernel: [ 52.241205] block drbd1: conn( StandAlone -> Unconnected ) Mar 1 13:36:36 xm01 kernel: [ 52.241238] block drbd1: Starting receiver thread (from drbd1_worker [4960]) Mar 1 13:36:36 xm01 kernel: [ 52.241311] block drbd1: receiver (re)started Mar 1 13:36:36 xm01 kernel: [ 52.241318] block drbd1: conn( Unconnected -> WFConnection ) Mar 1 13:36:36 xm01 kernel: [ 52.243718] block drbd2: conn( StandAlone -> Unconnected ) Mar 1 13:36:36 xm01 kernel: [ 52.243743] block drbd2: Starting receiver thread (from drbd2_worker [4986]) Mar 1 13:36:36 xm01 kernel: [ 52.243808] block drbd2: receiver (re)started Mar 1 13:36:36 xm01 kernel: [ 52.243817] block drbd2: conn( Unconnected -> WFConnection ) Mar 1 13:36:36 xm01 kernel: [ 52.246305] block drbd3: conn( StandAlone -> Unconnected ) Mar 1 13:36:36 xm01 kernel: [ 52.246337] block drbd3: Starting receiver thread (from drbd3_worker [5016]) Mar 1 13:36:36 xm01 kernel: [ 52.246406] block drbd3: receiver (re)started Mar 1 13:36:36 xm01 kernel: [ 52.246415] block drbd3: conn( Unconnected -> WFConnection ) Mar 1 13:36:37 xm01 kernel: [ 52.738908] block drbd1: Handshake successful: Agreed network protocol version 96 Mar 1 13:36:37 xm01 kernel: [ 52.738985] block drbd0: Handshake successful: Agreed network protocol version 96 Mar 1 13:36:37 xm01 kernel: [ 52.739113] block drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:36:37 xm01 kernel: [ 52.739122] block drbd1: conn( WFConnection -> 
WFReportParams ) Mar 1 13:36:37 xm01 kernel: [ 52.739141] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:36:37 xm01 kernel: [ 52.739146] block drbd0: conn( WFConnection -> WFReportParams ) Mar 1 13:36:37 xm01 kernel: [ 52.739182] block drbd1: Starting asender thread (from drbd1_receiver [5114]) Mar 1 13:36:37 xm01 kernel: [ 52.739191] block drbd0: Starting asender thread (from drbd0_receiver [5110]) Mar 1 13:36:37 xm01 kernel: [ 52.739298] block drbd0: data-integrity-alg: <not-used> Mar 1 13:36:37 xm01 kernel: [ 52.739316] block drbd0: drbd_sync_handshake: Mar 1 13:36:37 xm01 kernel: [ 52.739320] block drbd0: self EEDF542BD48564B4:0000000000000000:AF298F27A3172092:AF288F27A3172093 bits:57344 flags:0 Mar 1 13:36:37 xm01 kernel: [ 52.739324] block drbd0: peer EEDF542BD48564B5:0000000000000000:AF298F27A3172093:AF288F27A3172093 bits:0 flags:0 Mar 1 13:36:37 xm01 kernel: [ 52.739328] block drbd0: uuid_compare()=1 by rule 40 Mar 1 13:36:37 xm01 kernel: [ 52.739334] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> Consistent ) Mar 1 13:36:37 xm01 kernel: [ 52.739374] block drbd1: data-integrity-alg: <not-used> Mar 1 13:36:37 xm01 kernel: [ 52.739389] block drbd1: drbd_sync_handshake: Mar 1 13:36:37 xm01 kernel: [ 52.739393] block drbd1: self E6E23470FD3656AC:0000000000000000:65C464E576893480:65C364E576893481 bits:30720 flags:0 Mar 1 13:36:37 xm01 kernel: [ 52.739397] block drbd1: peer E6E23470FD3656AD:0000000000000000:65C464E576893481:65C364E576893481 bits:0 flags:0 Mar 1 13:36:37 xm01 kernel: [ 52.739400] block drbd1: uuid_compare()=1 by rule 40 Mar 1 13:36:37 xm01 kernel: [ 52.739406] block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> Consistent ) Mar 1 13:36:37 xm01 kernel: [ 52.739584] block drbd1: meta connection shut down by peer. 
Mar 1 13:36:37 xm01 kernel: [ 52.739590] block drbd1: peer( Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( Consistent -> DUnknown ) Mar 1 13:36:37 xm01 kernel: [ 52.739646] block drbd0: sock_sendmsg returned -32 Mar 1 13:36:37 xm01 kernel: [ 52.739651] block drbd0: peer( Primary -> Unknown ) conn( WFBitMapS -> BrokenPipe ) pdsk( Consistent -> DUnknown ) Mar 1 13:36:37 xm01 kernel: [ 52.739657] block drbd0: short sent ReportBitMap size=4096 sent=3172 Mar 1 13:36:37 xm01 kernel: [ 52.739674] block drbd0: meta connection shut down by peer. Mar 1 13:36:37 xm01 kernel: [ 52.739683] block drbd0: asender terminated Mar 1 13:36:37 xm01 kernel: [ 52.739687] block drbd0: Terminating asender thread Mar 1 13:36:37 xm01 kernel: [ 52.739738] block drbd1: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:36:37 xm01 kernel: [ 52.741865] block drbd1: 120 MB (30720 bits) marked out-of-sync by on disk bit-map. Mar 1 13:36:37 xm01 kernel: [ 52.743017] block drbd2: Handshake successful: Agreed network protocol version 96 Mar 1 13:36:37 xm01 kernel: [ 52.743091] block drbd3: Handshake successful: Agreed network protocol version 96 Mar 1 13:36:37 xm01 kernel: [ 52.743270] block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:36:37 xm01 kernel: [ 52.743278] block drbd2: conn( WFConnection -> WFReportParams ) Mar 1 13:36:37 xm01 kernel: [ 52.743309] block drbd2: Starting asender thread (from drbd2_receiver [5120]) Mar 1 13:36:37 xm01 kernel: [ 52.743341] block drbd3: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:36:37 xm01 kernel: [ 52.743348] block drbd3: conn( WFConnection -> WFReportParams ) Mar 1 13:36:37 xm01 kernel: [ 52.743410] block drbd3: Starting asender thread (from drbd3_receiver [5124]) Mar 1 13:36:37 xm01 kernel: [ 52.743494] block drbd3: data-integrity-alg: <not-used> Mar 1 13:36:37 xm01 kernel: [ 52.743532] block drbd3: drbd_sync_handshake: Mar 1 13:36:37 xm01 kernel: [ 52.743536] block drbd3: self 
75C7AE841CB0682E:0000000000000000:99ABCCCBF1E4D000:99AACCCBF1E4D001 bits:5120 flags:0 Mar 1 13:36:37 xm01 kernel: [ 52.743540] block drbd3: peer 75C7AE841CB0682F:0000000000000000:99ABCCCBF1E4D001:99AACCCBF1E4D001 bits:0 flags:0 Mar 1 13:36:37 xm01 kernel: [ 52.743543] block drbd3: uuid_compare()=1 by rule 40 Mar 1 13:36:37 xm01 kernel: [ 52.743550] block drbd3: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> Consistent ) Mar 1 13:36:37 xm01 kernel: [ 52.743733] block drbd3: meta connection shut down by peer. Mar 1 13:36:37 xm01 kernel: [ 52.743740] block drbd3: peer( Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( Consistent -> DUnknown ) Mar 1 13:36:37 xm01 kernel: [ 52.743878] block drbd3: sock_sendmsg returned -32 Mar 1 13:36:37 xm01 kernel: [ 52.743884] block drbd3: short sent ReportBitMap size=4096 sent=276 Mar 1 13:36:37 xm01 kernel: [ 52.743894] block drbd2: data-integrity-alg: <not-used> Mar 1 13:36:37 xm01 kernel: [ 52.743905] block drbd3: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:36:37 xm01 kernel: [ 52.743908] block drbd2: drbd_sync_handshake: Mar 1 13:36:37 xm01 kernel: [ 52.743914] block drbd2: self 324E9CEEF0227FAC:0000000000000000:F91D77DB4FF3672A:F91C77DB4FF3672B bits:12288 flags:0 Mar 1 13:36:37 xm01 kernel: [ 52.743918] block drbd2: peer 324E9CEEF0227FAD:0000000000000000:F91D77DB4FF3672B:F91C77DB4FF3672B bits:0 flags:0 Mar 1 13:36:37 xm01 kernel: [ 52.743921] block drbd2: uuid_compare()=1 by rule 40 Mar 1 13:36:37 xm01 kernel: [ 52.743928] block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> Consistent ) Mar 1 13:36:37 xm01 kernel: [ 52.744091] block drbd2: meta connection shut down by peer. 
Mar 1 13:36:37 xm01 kernel: [ 52.744097] block drbd2: peer( Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( Consistent -> DUnknown ) Mar 1 13:36:37 xm01 kernel: [ 52.744279] block drbd2: sock_sendmsg returned -32 Mar 1 13:36:37 xm01 kernel: [ 52.744283] block drbd2: short sent ReportBitMap size=4096 sent=2180 Mar 1 13:36:37 xm01 kernel: [ 52.744335] block drbd2: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:36:37 xm01 kernel: [ 52.747349] block drbd2: 48 MB (12288 bits) marked out-of-sync by on disk bit-map. Mar 1 13:36:37 xm01 kernel: [ 52.747833] block drbd1: asender terminated Mar 1 13:36:37 xm01 kernel: [ 52.747837] block drbd1: Terminating asender thread Mar 1 13:36:37 xm01 kernel: [ 52.747902] block drbd1: Connection closed Mar 1 13:36:37 xm01 kernel: [ 52.747908] block drbd1: conn( NetworkFailure -> Unconnected ) Mar 1 13:36:37 xm01 kernel: [ 52.747915] block drbd1: receiver terminated Mar 1 13:36:37 xm01 kernel: [ 52.747917] block drbd1: Restarting receiver thread Mar 1 13:36:37 xm01 kernel: [ 52.747933] block drbd1: receiver (re)started Mar 1 13:36:37 xm01 kernel: [ 52.747938] block drbd1: conn( Unconnected -> WFConnection ) Mar 1 13:36:37 xm01 kernel: [ 52.749723] block drbd0: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:36:37 xm01 kernel: [ 52.749734] block drbd0: 224 MB (57344 bits) marked out-of-sync by on disk bit-map. 
Mar 1 13:36:37 xm01 kernel: [ 52.749775] block drbd0: Connection closed Mar 1 13:36:37 xm01 kernel: [ 52.749780] block drbd0: conn( BrokenPipe -> Unconnected ) Mar 1 13:36:37 xm01 kernel: [ 52.749787] block drbd0: receiver terminated Mar 1 13:36:37 xm01 kernel: [ 52.749789] block drbd0: Restarting receiver thread Mar 1 13:36:37 xm01 kernel: [ 52.749792] block drbd0: receiver (re)started Mar 1 13:36:37 xm01 kernel: [ 52.749796] block drbd0: conn( Unconnected -> WFConnection ) Mar 1 13:36:37 xm01 kernel: [ 52.753343] block drbd2: asender terminated Mar 1 13:36:37 xm01 kernel: [ 52.753347] block drbd2: Terminating asender thread Mar 1 13:36:37 xm01 kernel: [ 52.753391] block drbd2: Connection closed Mar 1 13:36:37 xm01 kernel: [ 52.753395] block drbd2: conn( NetworkFailure -> Unconnected ) Mar 1 13:36:37 xm01 kernel: [ 52.753399] block drbd2: receiver terminated Mar 1 13:36:37 xm01 kernel: [ 52.753401] block drbd2: Restarting receiver thread Mar 1 13:36:37 xm01 kernel: [ 52.753403] block drbd2: receiver (re)started Mar 1 13:36:37 xm01 kernel: [ 52.753407] block drbd2: conn( Unconnected -> WFConnection ) Mar 1 13:36:37 xm01 kernel: [ 52.754182] block drbd3: 20 MB (5120 bits) marked out-of-sync by on disk bit-map. Mar 1 13:36:37 xm01 kernel: [ 52.769214] block drbd3: asender terminated Mar 1 13:36:37 xm01 kernel: [ 52.769222] block drbd3: Terminating asender thread Mar 1 13:36:37 xm01 kernel: [ 52.769303] block drbd3: Connection closed Mar 1 13:36:37 xm01 kernel: [ 52.769309] block drbd3: conn( NetworkFailure -> Unconnected ) Mar 1 13:36:37 xm01 kernel: [ 52.769317] block drbd3: receiver terminated Mar 1 13:36:37 xm01 kernel: [ 52.769320] block drbd3: Restarting receiver thread Mar 1 13:36:37 xm01 kernel: [ 52.769322] block drbd3: receiver (re)started Mar 1 13:36:37 xm01 kernel: [ 52.769327] block drbd3: conn( Unconnected -> WFConnection ) ... 
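
For anyone trying to follow the reconnect cycle in the log above, a small script can pull the conn( old -> new ) transitions out per device. This is only a rough sketch: the regular expression is my own guess based on the lines quoted here, and the names conn_transitions and sample are made up for illustration, not part of any DRBD tool.

```python
import re
from collections import defaultdict

# Matches the connection-state part of a DRBD kernel message, e.g.
# "block drbd0: peer( ... ) conn( WFBitMapS -> BrokenPipe ) pdsk( ... )".
# The pattern is an assumption derived from the log lines in this post.
CONN_RE = re.compile(r"block (drbd\d+):.*?conn\( (\w+) -> (\w+) \)")


def conn_transitions(lines):
    """Return {device: [(old_state, new_state), ...]} in log order."""
    result = defaultdict(list)
    for line in lines:
        match = CONN_RE.search(line)
        if match:
            device, old, new = match.groups()
            result[device].append((old, new))
    return dict(result)


# A few drbd0 lines copied from the log above.
sample = [
    "Mar 1 13:36:36 xm01 kernel: [ 52.238349] block drbd0: conn( Unconnected -> WFConnection )",
    "Mar 1 13:36:37 xm01 kernel: [ 52.739146] block drbd0: conn( WFConnection -> WFReportParams )",
    "Mar 1 13:36:37 xm01 kernel: [ 52.739334] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> Consistent )",
    "Mar 1 13:36:37 xm01 kernel: [ 52.739651] block drbd0: peer( Primary -> Unknown ) conn( WFBitMapS -> BrokenPipe ) pdsk( Consistent -> DUnknown )",
    "Mar 1 13:36:37 xm01 kernel: [ 52.749780] block drbd0: conn( BrokenPipe -> Unconnected )",
]

for old, new in conn_transitions(sample)["drbd0"]:
    print(f"drbd0: {old} -> {new}")
```

Run against the full syslog instead of the sample, this should make it easier to see that every device gets as far as WFBitMapS and then drops back to WFConnection.
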
Mar 1 13:37:17 xm01 kernel: [ 93.073374] block drbd0: Handshake successful: Agreed network protocol version 96 Mar 1 13:37:17 xm01 kernel: [ 93.073589] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:37:17 xm01 kernel: [ 93.073609] block drbd0: conn( WFConnection -> WFReportParams ) Mar 1 13:37:17 xm01 kernel: [ 93.073647] block drbd0: Starting asender thread (from drbd0_receiver [5110]) Mar 1 13:37:17 xm01 kernel: [ 93.073768] block drbd0: data-integrity-alg: <not-used> Mar 1 13:37:17 xm01 kernel: [ 93.073786] block drbd0: drbd_sync_handshake: Mar 1 13:37:17 xm01 kernel: [ 93.073790] block drbd0: self EEDF542BD48564B4:0000000000000000:AF298F27A3172092:AF288F27A3172093 bits:57344 flags:0 Mar 1 13:37:17 xm01 kernel: [ 93.073794] block drbd0: peer EEDF542BD48564B5:0000000000000000:AF298F27A3172093:AF288F27A3172093 bits:0 flags:0 Mar 1 13:37:17 xm01 kernel: [ 93.073798] block drbd0: uuid_compare()=1 by rule 40 Mar 1 13:37:17 xm01 kernel: [ 93.073804] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) Mar 1 13:37:17 xm01 kernel: [ 93.073985] block drbd0: sock_sendmsg returned -32 Mar 1 13:37:17 xm01 kernel: [ 93.073990] block drbd0: peer( Primary -> Unknown ) conn( WFBitMapS -> BrokenPipe ) pdsk( Consistent -> DUnknown ) Mar 1 13:37:17 xm01 kernel: [ 93.073998] block drbd0: short sent ReportBitMap size=4096 sent=732 Mar 1 13:37:17 xm01 kernel: [ 93.074015] block drbd0: meta connection shut down by peer. 
Mar 1 13:37:17 xm01 kernel: [ 93.074021] block drbd0: asender terminated Mar 1 13:37:17 xm01 kernel: [ 93.074024] block drbd0: Terminating asender thread Mar 1 13:37:17 xm01 kernel: [ 93.077364] block drbd3: Handshake successful: Agreed network protocol version 96 Mar 1 13:37:17 xm01 kernel: [ 93.078584] block drbd3: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:37:17 xm01 kernel: [ 93.078593] block drbd3: conn( WFConnection -> WFReportParams ) Mar 1 13:37:17 xm01 kernel: [ 93.078633] block drbd3: Starting asender thread (from drbd3_receiver [5124]) Mar 1 13:37:17 xm01 kernel: [ 93.078756] block drbd3: data-integrity-alg: <not-used> Mar 1 13:37:17 xm01 kernel: [ 93.078786] block drbd3: drbd_sync_handshake: Mar 1 13:37:17 xm01 kernel: [ 93.078790] block drbd3: self 75C7AE841CB0682E:0000000000000000:99ABCCCBF1E4D000:99AACCCBF1E4D001 bits:5120 flags:0 Mar 1 13:37:17 xm01 kernel: [ 93.078794] block drbd3: peer 75C7AE841CB0682F:0000000000000000:99ABCCCBF1E4D001:99AACCCBF1E4D001 bits:0 flags:0 Mar 1 13:37:17 xm01 kernel: [ 93.078797] block drbd3: uuid_compare()=1 by rule 40 Mar 1 13:37:17 xm01 kernel: [ 93.078803] block drbd3: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) Mar 1 13:37:17 xm01 kernel: [ 93.078925] block drbd3: meta connection shut down by peer. Mar 1 13:37:17 xm01 kernel: [ 93.078930] block drbd3: peer( Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( Consistent -> DUnknown ) Mar 1 13:37:17 xm01 kernel: [ 93.078970] block drbd3: sock_sendmsg returned -32 Mar 1 13:37:17 xm01 kernel: [ 93.078975] block drbd3: short sent ReportBitMap size=4096 sent=276 Mar 1 13:37:17 xm01 kernel: [ 93.078983] block drbd3: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:37:17 xm01 kernel: [ 93.084657] block drbd0: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:37:17 xm01 kernel: [ 93.084668] block drbd0: 224 MB (57344 bits) marked out-of-sync by on disk bit-map. 
Mar 1 13:37:17 xm01 kernel: [ 93.084678] block drbd0: Connection closed Mar 1 13:37:17 xm01 kernel: [ 93.084683] block drbd0: conn( BrokenPipe -> Unconnected ) Mar 1 13:37:17 xm01 kernel: [ 93.084687] block drbd0: receiver terminated Mar 1 13:37:17 xm01 kernel: [ 93.084689] block drbd0: Restarting receiver thread Mar 1 13:37:17 xm01 kernel: [ 93.084692] block drbd0: receiver (re)started Mar 1 13:37:17 xm01 kernel: [ 93.084696] block drbd0: conn( Unconnected -> WFConnection ) Mar 1 13:37:17 xm01 kernel: [ 93.089359] block drbd1: Handshake successful: Agreed network protocol version 96 Mar 1 13:37:17 xm01 kernel: [ 93.089575] block drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:37:17 xm01 kernel: [ 93.089582] block drbd1: conn( WFConnection -> WFReportParams ) Mar 1 13:37:17 xm01 kernel: [ 93.089595] block drbd1: Starting asender thread (from drbd1_receiver [5114]) Mar 1 13:37:17 xm01 kernel: [ 93.089691] block drbd1: data-integrity-alg: <not-used> Mar 1 13:37:17 xm01 kernel: [ 93.089745] block drbd1: drbd_sync_handshake: Mar 1 13:37:17 xm01 kernel: [ 93.089749] block drbd1: self E6E23470FD3656AC:0000000000000000:65C464E576893480:65C364E576893481 bits:30720 flags:0 Mar 1 13:37:17 xm01 kernel: [ 93.089753] block drbd1: peer E6E23470FD3656AD:0000000000000000:65C464E576893481:65C364E576893481 bits:0 flags:0 Mar 1 13:37:17 xm01 kernel: [ 93.089757] block drbd1: uuid_compare()=1 by rule 40 Mar 1 13:37:17 xm01 kernel: [ 93.089762] block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) Mar 1 13:37:17 xm01 kernel: [ 93.089862] block drbd1: meta connection shut down by peer. 
Mar 1 13:37:17 xm01 kernel: [ 93.089868] block drbd1: peer( Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( Consistent -> DUnknown ) Mar 1 13:37:17 xm01 kernel: [ 93.089931] block drbd1: sock_sendmsg returned -32 Mar 1 13:37:17 xm01 kernel: [ 93.089935] block drbd1: short sent ReportBitMap size=4096 sent=2180 Mar 1 13:37:17 xm01 kernel: [ 93.089985] block drbd1: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:37:17 xm01 kernel: [ 93.094402] block drbd1: 120 MB (30720 bits) marked out-of-sync by on disk bit-map. Mar 1 13:37:17 xm01 kernel: [ 93.100362] block drbd1: asender terminated Mar 1 13:37:17 xm01 kernel: [ 93.100367] block drbd1: Terminating asender thread Mar 1 13:37:17 xm01 kernel: [ 93.100451] block drbd1: Connection closed Mar 1 13:37:17 xm01 kernel: [ 93.100456] block drbd1: conn( NetworkFailure -> Unconnected ) Mar 1 13:37:17 xm01 kernel: [ 93.100464] block drbd1: receiver terminated Mar 1 13:37:17 xm01 kernel: [ 93.100466] block drbd1: Restarting receiver thread Mar 1 13:37:17 xm01 kernel: [ 93.100468] block drbd1: receiver (re)started Mar 1 13:37:17 xm01 kernel: [ 93.100472] block drbd1: conn( Unconnected -> WFConnection ) Mar 1 13:37:17 xm01 kernel: [ 93.102859] block drbd3: 20 MB (5120 bits) marked out-of-sync by on disk bit-map. 
Mar 1 13:37:17 xm01 kernel: [ 93.119786] block drbd3: asender terminated Mar 1 13:37:17 xm01 kernel: [ 93.119794] block drbd3: Terminating asender thread Mar 1 13:37:17 xm01 kernel: [ 93.119847] block drbd3: Connection closed Mar 1 13:37:17 xm01 kernel: [ 93.119853] block drbd3: conn( NetworkFailure -> Unconnected ) Mar 1 13:37:17 xm01 kernel: [ 93.119859] block drbd3: receiver terminated Mar 1 13:37:17 xm01 kernel: [ 93.119861] block drbd3: Restarting receiver thread Mar 1 13:37:17 xm01 kernel: [ 93.119864] block drbd3: receiver (re)started Mar 1 13:37:17 xm01 kernel: [ 93.119868] block drbd3: conn( Unconnected -> WFConnection ) Mar 1 13:37:17 xm01 kernel: [ 93.625232] block drbd2: Handshake successful: Agreed network protocol version 96 Mar 1 13:37:17 xm01 kernel: [ 93.625450] block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:37:17 xm01 kernel: [ 93.625460] block drbd2: conn( WFConnection -> WFReportParams ) Mar 1 13:37:17 xm01 kernel: [ 93.625476] block drbd2: Starting asender thread (from drbd2_receiver [5120]) Mar 1 13:37:17 xm01 kernel: [ 93.625592] block drbd2: data-integrity-alg: <not-used> Mar 1 13:37:17 xm01 kernel: [ 93.625639] block drbd2: drbd_sync_handshake: Mar 1 13:37:17 xm01 kernel: [ 93.625643] block drbd2: self 324E9CEEF0227FAC:0000000000000000:F91D77DB4FF3672A:F91C77DB4FF3672B bits:12288 flags:0 Mar 1 13:37:17 xm01 kernel: [ 93.625647] block drbd2: peer 324E9CEEF0227FAD:0000000000000000:F91D77DB4FF3672B:F91C77DB4FF3672B bits:0 flags:0 Mar 1 13:37:17 xm01 kernel: [ 93.625651] block drbd2: uuid_compare()=1 by rule 40 Mar 1 13:37:17 xm01 kernel: [ 93.625657] block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) Mar 1 13:37:17 xm01 kernel: [ 93.625804] block drbd2: meta connection shut down by peer. 
Mar 1 13:37:17 xm01 kernel: [ 93.625812] block drbd2: peer( Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( Consistent -> DUnknown ) Mar 1 13:37:17 xm01 kernel: [ 93.625819] block drbd2: sock_sendmsg returned -32 Mar 1 13:37:17 xm01 kernel: [ 93.625824] block drbd2: short sent ReportBitMap size=4096 sent=2180 Mar 1 13:37:17 xm01 kernel: [ 93.625875] block drbd2: bitmap WRITE of 0 pages took 0 jiffies Mar 1 13:37:17 xm01 kernel: [ 93.632366] block drbd2: 48 MB (12288 bits) marked out-of-sync by on disk bit-map. Mar 1 13:37:17 xm01 kernel: [ 93.638339] block drbd2: asender terminated Mar 1 13:37:17 xm01 kernel: [ 93.638344] block drbd2: Terminating asender thread Mar 1 13:37:17 xm01 kernel: [ 93.638395] block drbd2: Connection closed Mar 1 13:37:17 xm01 kernel: [ 93.638400] block drbd2: conn( NetworkFailure -> Unconnected ) Mar 1 13:37:17 xm01 kernel: [ 93.638405] block drbd2: receiver terminated Mar 1 13:37:17 xm01 kernel: [ 93.638407] block drbd2: Restarting receiver thread Mar 1 13:37:17 xm01 kernel: [ 93.638409] block drbd2: receiver (re)started Mar 1 13:37:18 xm01 kernel: [ 93.638413] block drbd2: conn( Unconnected -> WFConnection ) Mar 1 13:37:19 xm01 lrmd: [5649]: info: rsc:vmconfig:0 promote[20] (pid 6032) Mar 1 13:37:19 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stdout) allow-two-primaries; Mar 1 13:37:19 xm01 kernel: [ 94.962300] block drbd3: helper command: /sbin/drbdadm fence-peer minor-3 Mar 1 13:37:20 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stderr) 3: State change failed: (-7) Refusing to be Primary while peer is not outdated Mar 1 13:37:20 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stderr) Command 'drbdsetup 3 primary Mar 1 13:37:20 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stderr) ' terminated with exit code 11 Mar 1 13:37:20 xm01 kernel: [ 95.978113] block drbd3: helper command: /sbin/drbdadm fence-peer minor-3 exit code 126 (0x7e00) Mar 1 13:37:20 xm01 kernel: [ 95.978117] block 
drbd3: fence-peer helper broken, returned 126 Mar 1 13:37:20 xm01 kernel: [ 95.978124] block drbd3: State change failed: Refusing to be Primary while peer is not outdated Mar 1 13:37:20 xm01 kernel: [ 95.978128] block drbd3: state = { cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown r----- } Mar 1 13:37:20 xm01 kernel: [ 95.978132] block drbd3: wanted = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s---F- } Mar 1 13:37:20 xm01 drbd[6032]: ERROR: vmconfig: Called drbdadm -c /etc/drbd.conf primary vmconfig Mar 1 13:37:20 xm01 drbd[6032]: ERROR: vmconfig: Exit code 11 Mar 1 13:37:20 xm01 drbd[6032]: ERROR: vmconfig: Command output: Mar 1 13:37:20 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stdout) Mar 1 13:37:20 xm01 kernel: [ 96.012979] block drbd3: helper command: /sbin/drbdadm fence-peer minor-3 Mar 1 13:37:21 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stderr) 3: State change failed: (-7) Refusing to be Primary while peer is not outdated Mar 1 13:37:21 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stderr) Command 'drbdsetup 3 primary Mar 1 13:37:21 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stderr) ' terminated with exit code 11 Mar 1 13:37:21 xm01 kernel: [ 97.020366] block drbd3: helper command: /sbin/drbdadm fence-peer minor-3 exit code 126 (0x7e00) Mar 1 13:37:21 xm01 kernel: [ 97.020369] block drbd3: fence-peer helper broken, returned 126 Mar 1 13:37:21 xm01 kernel: [ 97.020375] block drbd3: State change failed: Refusing to be Primary while peer is not outdated Mar 1 13:37:21 xm01 kernel: [ 97.020379] block drbd3: state = { cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown r----- } Mar 1 13:37:21 xm01 kernel: [ 97.020383] block drbd3: wanted = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s---F- } Mar 1 13:37:21 xm01 drbd[6032]: ERROR: vmconfig: Called drbdadm -c /etc/drbd.conf primary vmconfig Mar 1 13:37:21 xm01 drbd[6032]: ERROR: vmconfig: Exit code 11

This repeated several times until I got this:

Mar 1 13:38:47 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stdout) Mar 1 13:38:48 xm01 kernel: [ 184.088528] block drbd3: helper command: /sbin/drbdadm fence-peer minor-3 Mar 1 13:38:49 xm01 lrmd: [5649]: WARN: vmconfig:0:promote process (PID 6032) timed out (try 1). Killing with signal SIGTERM (15). Mar 1 13:38:49 xm01 lrmd: [5649]: WARN: operation promote[20] on vmconfig:0 for client 5652: pid 6032 timed out Mar 1 13:38:49 xm01 crmd: [5652]: ERROR: process_lrm_event: LRM operation vmconfig:0_promote_0 (20) Timed Out (timeout=90000ms) Mar 1 13:38:49 xm01 attrd: [5650]: notice: attrd_ais_dispatch: Update relayed from xm02 Mar 1 13:38:49 xm01 attrd: [5650]: info: find_hash_entry: Creating hash entry for fail-count-vmconfig:0 Mar 1 13:38:49 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing key=211:6:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmconfig:0_notify_0 ) Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_local_callback: Expanded fail-count-vmconfig:0=value++ to 1 Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-vmconfig:0 (1) Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_perform_update: Sent update 33: fail-count-vmconfig:0=1 Mar 1 13:38:49 xm01 lrmd: [5649]: info: rsc:vmconfig:0 notify[21] (pid 7100) Mar 1 13:38:49 xm01 attrd: [5650]: notice: attrd_ais_dispatch: Update relayed from xm02 Mar 1 13:38:49 xm01 attrd: [5650]: info: find_hash_entry: Creating hash entry for last-failure-vmconfig:0 Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-vmconfig:0 (1330619909) Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_perform_update: Sent update 36: last-failure-vmconfig:0=1330619909 Mar 1 13:38:52 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:notify:stderr) lock on /var/lock/drbd-147-3 currently held by pid:7099 Mar 1 13:38:52 xm01 crm_attribute: [7128]: info: Invoked: crm_attribute -N xm01 -n master-vmconfig:0 -l 
reboot -D Mar 1 13:38:52 xm01 attrd: [5650]: info: attrd_trigger_update: Sending flush op to all hosts for: master-vmconfig:0 (<null>) Mar 1 13:38:52 xm01 attrd: [5650]: info: attrd_perform_update: Sent delete 38: node=xm01, attr=master-vmconfig:0, id=<n/a>, set=(null), section=status Mar 1 13:38:52 xm01 attrd: [5650]: info: attrd_perform_update: Sent delete 40: node=xm01, attr=master-vmconfig:0, id=<n/a>, set=(null), section=status Mar 1 13:38:52 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:notify:stdout) Mar 1 13:38:52 xm01 lrmd: [5649]: info: operation notify[21] on vmconfig:0 for client 5652: pid 7100 exited with return code 0 Mar 1 13:38:52 xm01 crmd: [5652]: info: process_lrm_event: LRM operation vmconfig:0_notify_0 (call=21, rc=0, cib-update=26, confirmed=true) ok Mar 1 13:38:55 xm01 external/ipmi[7135]: [7146]: debug: ipmitool output: Chassis Power is on Mar 1 13:38:56 xm01 stonith: [7131]: info: external/ipmi device OK. Mar 1 13:39:01 xm01 /usr/sbin/cron[7148]: (root) CMD (/usr/sbin/logwatch --service dmeventd) Mar 1 13:39:11 xm01 external/ipmi[7176]: [7187]: debug: ipmitool output: Chassis Power is on Mar 1 13:39:12 xm01 stonith: [7172]: info: external/ipmi device OK. 
Mar 1 13:39:19 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing key=216:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmconfig:0_notify_0 ) Mar 1 13:39:19 xm01 lrmd: [5649]: info: rsc:vmconfig:0 notify[22] (pid 7188) Mar 1 13:39:19 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing key=224:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmsvn-drbd:0_notify_0 ) Mar 1 13:39:19 xm01 lrmd: [5649]: info: rsc:vmsvn-drbd:0 notify[23] (pid 7189) Mar 1 13:39:19 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing key=232:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=srvsvn1-drbd:0_notify_0 ) Mar 1 13:39:19 xm01 lrmd: [5649]: info: rsc:srvsvn1-drbd:0 notify[24] (pid 7190) Mar 1 13:39:19 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing key=240:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=srvsvn2-drbd:0_notify_0 ) Mar 1 13:39:19 xm01 lrmd: [5649]: info: rsc:srvsvn2-drbd:0 notify[25] (pid 7191) Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: crm_new_peer: Node xm02 now has id: 33554532 Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: crm_new_peer: Node 33554532 is now known as xm02 Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: stonith_queryQuery <stonith_command t="stonith-ng" st_async_id="c1be22cc-e535-441c-a674-89551a2b9d4c" st_op="st_query" st_callid="0" st_callopt="0" st_ remote_op="c1be22cc-e535-441c-a674-89551a2b9d4c" st_target="xm02" st_device_action="reboot" st_clientid="bb653c7a-6351-4517-ad06-6fb0e20fe375" st_timeout="6000" src="xm02" seq="5" /> Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: can_fence_host_with_device: Refreshing port list for ipmi-stonith-xm02 Mar 1 13:39:19 xm01 stonith-ng: [5647]: WARN: parse_host_line: Could not parse (0 0): Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: can_fence_host_with_device: ipmi-stonith-xm02 can fence xm02: dynamic-list Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: stonith_query: Found 1 matching devices for 'xm02' Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: stonith_fenceExec <stonith_command t="stonith-ng" 
st_async_id="c1be22cc-e535-441c-a674-89551a2b9d4c" st_op="st_fence" st_callid="0" st_callopt="0" st_r emote_op="c1be22cc-e535-441c-a674-89551a2b9d4c" st_target="xm02" st_device_action="reboot" st_timeout="54000" src="xm02" seq="7" /> Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: can_fence_host_with_device: ipmi-stonith-xm02 can fence xm02: dynamic-list Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: stonith_fence: Found 1 matching devices for 'xm02' Mar 1 13:39:20 xm01 external/ipmi[7288]: [7302]: debug: ipmitool output: Chassis Power Control: Reset Mar 1 13:39:21 xm01 stonith-ng: [5647]: info: log_operation: Operation 'reboot' [7277] for host 'xm02' with device 'ipmi-stonith-xm02' returned: 0 (call 0 from (null)) Mar 1 13:39:21 xm01 lrmd: [5649]: info: operation notify[22] on vmconfig:0 for client 5652: pid 7188 exited with return code 0 Mar 1 13:39:21 xm01 crmd: [5652]: info: process_lrm_event: LRM operation vmconfig:0_notify_0 (call=22, rc=0, cib-update=27, confirmed=true) ok Mar 1 13:39:22 xm01 kernel: [ 218.177661] bnx2: eth1 NIC Copper Link is Down Mar 1 13:39:24 xm01 kernel: [ 220.488280] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Mar 1 13:39:24 xm01 corosync[5621]: [TOTEM ] A processor failed, forming new configuration. Mar 1 13:39:27 xm01 external/ipmi[7311]: [7322]: debug: ipmitool output: Chassis Power is on Mar 1 13:39:28 xm01 stonith: [7307]: info: external/ipmi device OK. 
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] CLM CONFIGURATION CHANGE Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] New Configuration: Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] r(0) ip(100.0.0.1) Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] Members Left: Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] r(0) ip(100.0.0.2) Mar 1 13:39:30 xm01 cib: [5648]: notice: ais_dispatch_message: Membership 1028: quorum lost Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] Members Joined: Mar 1 13:39:30 xm01 crmd: [5652]: notice: ais_dispatch_message: Membership 1028: quorum lost Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1028: memb=1, new=0, lost=1 Mar 1 13:39:30 xm01 cib: [5648]: info: crm_update_peer: Node xm02: id=33554532 state=lost (new) addr=r(0) ip(100.0.0.2) votes=1 born=1016 seen=1024 proc=00000000000000000000000000151312 Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info: pcmk_peer_update: memb: xm01 16777316 Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info: pcmk_peer_update: lost: xm02 33554532 Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] CLM CONFIGURATION CHANGE Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] New Configuration: Mar 1 13:39:30 xm01 crmd: [5652]: info: ais_status_callback: status: xm02 is now lost (was member) Mar 1 13:39:30 xm01 crmd: [5652]: info: crm_update_peer: Node xm02: id=33554532 state=lost (new) addr=r(0) ip(100.0.0.2) votes=1 born=1016 seen=1024 proc=00000000000000000000000000151312 Mar 1 13:39:30 xm01 stonith-ng: [5647]: info: process_remote_stonith_execExecResult <st-reply st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_notify" st_remote_op="c1be22cc-e535- 441c-a674-89551a2b9d4c" st_callid="0" st_callopt="0" st_rc="0" st_output="Performing: stonith -t external/ipmi -T reset xm02 success: xm02 0 " src="xm01" seq="2" /> Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] r(0) ip(100.0.0.1) Mar 1 13:39:30 xm01 crmd: [5652]: WARN: check_dead_member: Our DC node (xm02) left the cluster Mar 1 
13:39:30 xm01 corosync[5621]: [CLM ] Members Left: Mar 1 13:39:30 xm01 stonith-ng: [5647]: info: remote_op_done: Notifing clients of c1be22cc-e535-441c-a674-89551a2b9d4c (reboot of xm02 from bb653c7a-6351-4517-ad06-6fb0e20fe375 by xm01): 0, rc=0 Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] Members Joined: Mar 1 13:39:30 xm01 stonith-ng: [5647]: info: stonith_notify_client: Sending st_fence-notification to client 5652/c9e6b033-73f2-43a9-b848-81bffa3c6d9b Mar 1 13:39:30 xm01 crmd: [5652]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ] Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1028: memb=1, new=0, lost=0 Mar 1 13:39:30 xm01 crmd: [5652]: info: update_dc: Unset DC xm02 Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info: pcmk_peer_update: MEMB: xm01 16777316 Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info: ais_mark_unseen_peer_dead: Node xm02 was not seen in the previous transition Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info: update_member: Node 33554532/xm02 is now: lost Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info: send_member_notification: Sending membership update 1028 to 2 children Mar 1 13:39:30 xm01 corosync[5621]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Mar 1 13:39:30 xm01 corosync[5621]: [CPG ] chosen downlist: sender r(0) ip(100.0.0.1) ; members(old:2 left:1) Mar 1 13:39:30 xm01 crmd: [5652]: info: tengine_stonith_notify: Peer xm02 was terminated (reboot) by xm01 for xm02 (ref=c1be22cc-e535-441c-a674-89551a2b9d4c): OK Mar 1 13:39:30 xm01 crmd: [5652]: notice: tengine_stonith_notify: Target was our leader xm02/xm02 (recorded leader: <unset>) Mar 1 13:39:30 xm01 corosync[5621]: [MAIN ] Completed service synchronization, ready to provide service. 
Mar 1 13:39:30 xm01 crmd: [5652]: info: send_stonith_update: Sending fencing update 28 for xm02 Mar 1 13:39:30 xm01 crmd: [5652]: notice: crmd_peer_update: Status update: Client xm02/crmd now has status [offline] (DC=<null>) Mar 1 13:39:30 xm01 crmd: [5652]: info: crm_update_peer: Node xm02: id=33554532 state=lost addr=r(0) ip(100.0.0.2) votes=1 born=1016 seen=1024 proc=00000000000000000000000000000001 (new) Mar 1 13:39:30 xm01 crmd: [5652]: info: cib_fencing_updated: Fencing update 28 for xm02: complete Mar 1 13:39:30 xm01 crmd: [5652]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]

At 13:39:32, XM02 was STONITHed. Why?

With the drbd.conf modifications, I no longer have the constraints (which is fine!) and both nodes become Master. But the VM never fails over to XM02 as it should when XM01 goes down. Here's the XM02 log between 13:33:00 and 13:36:40, when XM01 comes back up:

Mar 1 13:32:56 xm02 mgmtd: [6300]: info: CIB query: cib Mar 1 13:33:01 xm02 /usr/sbin/cron[8783]: (root) CMD (/usr/sbin/logwatch --service dmeventd) Mar 1 13:33:09 xm02 external/ipmi[8858]: [8869]: debug: ipmitool output: Chassis Power is on Mar 1 13:33:10 xm02 stonith: [8854]: info: external/ipmi device OK. Mar 1 13:33:21 xm02 kernel: [ 238.815026] bnx2: eth1 NIC Copper Link is Down Mar 1 13:33:23 xm02 kernel: [ 241.298581] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Mar 1 13:33:27 xm02 external/ipmi[9035]: [9061]: debug: ipmitool output: Chassis Power is on Mar 1 13:33:28 xm02 stonith: [9031]: info: external/ipmi device OK. 
Mar 1 13:33:36 xm02 kernel: [ 254.005922] bnx2: eth1 NIC Copper Link is Down Mar 1 13:33:39 xm02 kernel: [ 256.432743] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Mar 1 13:33:39 xm02 kernel: [ 256.820486] bnx2: eth1 NIC Copper Link is Down Mar 1 13:33:41 xm02 kernel: [ 259.290456] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Mar 1 13:33:44 xm02 external/ipmi[9254]: [9265]: debug: ipmitool output: Chassis Power is on Mar 1 13:33:45 xm02 stonith: [9250]: info: external/ipmi device OK. Mar 1 13:33:55 xm02 lrmd: [6296]: WARN: VMSVN:start process (PID 8644) timed out (try 1). Killing with signal SIGTERM (15). Mar 1 13:34:00 xm02 lrmd: [6296]: WARN: VMSVN:start process (PID 8644) timed out (try 2). Killing with signal SIGKILL (9). Mar 1 13:34:00 xm02 external/ipmi[9478]: [9489]: debug: ipmitool output: Chassis Power is on Mar 1 13:34:01 xm02 /usr/sbin/cron[9491]: (root) CMD (/usr/sbin/logwatch --service dmeventd) Mar 1 13:34:01 xm02 stonith: [9474]: info: external/ipmi device OK. Mar 1 13:34:05 xm02 lrmd: [6296]: ERROR: TrackedProcTimeoutFunction: VMSVN:start process (PID 8644) will not die! Mar 1 13:34:17 xm02 external/ipmi[9697]: [9708]: debug: ipmitool output: Chassis Power is on Mar 1 13:34:18 xm02 stonith: [9693]: info: external/ipmi device OK. Mar 1 13:34:31 xm02 kernel: [ 308.659429] bnx2: eth1 NIC Copper Link is Down Mar 1 13:34:33 xm02 external/ipmi[9804]: [9815]: debug: ipmitool output: Chassis Power is on Mar 1 13:34:33 xm02 kernel: [ 311.171354] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON Mar 1 13:34:34 xm02 stonith: [9800]: info: external/ipmi device OK. Mar 1 13:34:50 xm02 external/ipmi[10014]: [10025]: debug: ipmitool output: Chassis Power is on Mar 1 13:34:51 xm02 stonith: [10010]: info: external/ipmi device OK. 
Mar 1 13:34:55 xm02 crmd: [6299]: WARN: action_timer_callback: Timer popped (timeout=60000, abort_level=1000000, complete=false) Mar 1 13:34:55 xm02 crmd: [6299]: ERROR: print_elem: Aborting transition, action lost: [Action 180]: In-flight (id: VMSVN_start_0, loc: xm02, priority: 0) Mar 1 13:34:55 xm02 crmd: [6299]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost Mar 1 13:34:55 xm02 crmd: [6299]: WARN: cib_action_update: rsc_op 180: VMSVN_start_0 on xm02 timed out Mar 1 13:34:55 xm02 crmd: [6299]: info: create_operation_update: cib_action_update: Updating resouce VMSVN after Timed Out start op (interval=0) Mar 1 13:34:55 xm02 crmd: [6299]: info: run_graph: ==================================================== Mar 1 13:34:55 xm02 crmd: [6299]: notice: run_graph: Transition 0 (Complete=31, Pending=0, Fired=0, Skipped=35, Incomplete=37, Source=/var/lib/pengine/pe-warn-309.bz2): Stopped Mar 1 13:34:55 xm02 crmd: [6299]: info: te_graph_trigger: Transition 0 is now complete Mar 1 13:34:55 xm02 crmd: [6299]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ] Mar 1 13:34:55 xm02 crmd: [6299]: info: do_state_transition: All 1 cluster nodes are eligible to run resources. 
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke: Query 78: Requesting the current CIB: S_POLICY_ENGINE Mar 1 13:34:55 xm02 crmd: [6299]: info: process_graph_event: Action VMSVN_start_0 arrived after a completed transition Mar 1 13:34:55 xm02 crmd: [6299]: info: abort_transition_graph: process_graph_event:482 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=VMSVN_start_0, magic=2:1;180:0:0:8b7a050b-901b-4 db7-b1f7-c3c5dd8a9653, cib=0.2472.134) : Inactive graph Mar 1 13:34:55 xm02 crmd: [6299]: WARN: update_failcount: Updating failcount for VMSVN on xm02 after failed start: rc=1 (update=INFINITY, time=1330619695) Mar 1 13:34:55 xm02 attrd: [6297]: info: find_hash_entry: Creating hash entry for fail-count-VMSVN Mar 1 13:34:55 xm02 attrd: [6297]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-VMSVN (INFINITY) Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke: Query 79: Requesting the current CIB: S_POLICY_ENGINE Mar 1 13:34:55 xm02 attrd: [6297]: info: attrd_perform_update: Sent update 35: fail-count-VMSVN=INFINITY Mar 1 13:34:55 xm02 attrd: [6297]: info: find_hash_entry: Creating hash entry for last-failure-VMSVN Mar 1 13:34:55 xm02 attrd: [6297]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-VMSVN (1330619695) Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke_callback: Invoking the PE: query=79, ref=pe_calc-dc-1330619695-20, seq=1020, quorate=0 Mar 1 13:34:55 xm02 crmd: [6299]: info: abort_transition_graph: te_update_diff:142 - Triggered transition abort (complete=1, tag=nvpair, id=status-xm02-fail-count-VMSVN, magic=NA, cib=0.2472.135) : Transient attribute: update Mar 1 13:34:55 xm02 attrd: [6297]: info: attrd_perform_update: Sent update 38: last-failure-VMSVN=1330619695 Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_config: On loss of CCM Quorum: Ignore Mar 1 13:34:55 xm02 crmd: [6299]: info: abort_transition_graph: te_update_diff:142 - Triggered transition abort (complete=1, 
tag=nvpair, id=status-xm02-last-failure-VMSVN, magic=NA, cib=0.2472.136) : Transient attribute: update Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke: Query 80: Requesting the current CIB: S_POLICY_ENGINE Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke: Query 81: Requesting the current CIB: S_POLICY_ENGINE Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op: Operation vmsvn-drbd:1_monitor_0 found resource vmsvn-drbd:1 active on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op: Operation srvsvn1-drbd:1_monitor_0 found resource srvsvn1-drbd:1 active on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op: Operation srvsvn2-drbd:1_monitor_0 found resource srvsvn2-drbd:1 active on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op: Operation vmconfig:1_monitor_0 found resource vmconfig:1 active on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: WARN: unpack_rsc_op: Processing failed op VMSVN_start_0 on xm02: unknown error (1) Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke_callback: Invoking the PE: query=81, ref=pe_calc-dc-1330619695-21, seq=1020, quorate=0 Mar 1 13:34:55 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (30s) for VMSVN on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave ipmi-stonith-xm01 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave ipmi-stonith-xm02 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig:1 (Master xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmsvn-drbd:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmsvn-drbd:1 (Master xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave srvsvn1-drbd:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave srvsvn1-drbd:1 (Master xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: 
LogActions: Leave srvsvn2-drbd:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave srvsvn2-drbd:1 (Master xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave dlm:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave o2cb:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave clvm:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave dlm:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave o2cb:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave clvm:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig-pri:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig-pri:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vg_svn:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vg_svn:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Recover VMSVN (Started xm02) Mar 1 13:34:55 xm02 crmd: [6299]: info: handle_response: pe_calc calculation pe_calc-dc-1330619695-20 is obsolete Mar 1 13:34:55 xm02 pengine: [6298]: notice: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/pengine/pe-input-2288.bz2 Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_config: On loss of CCM Quorum: Ignore Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op: Operation vmsvn-drbd:1_monitor_0 found resource vmsvn-drbd:1 active on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op: Operation srvsvn1-drbd:1_monitor_0 found resource srvsvn1-drbd:1 active on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op: Operation srvsvn2-drbd:1_monitor_0 found resource srvsvn2-drbd:1 active on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op: Operation vmconfig:1_monitor_0 found resource vmconfig:1 active on xm02 Mar 1 13:34:55 xm02 pengine: [6298]: WARN: 
unpack_rsc_op: Processing failed op VMSVN_start_0 on xm02: unknown error (1) Mar 1 13:34:55 xm02 pengine: [6298]: WARN: common_apply_stickiness: Forcing VMSVN away from xm02 after 1000000 failures (max=1000000) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave ipmi-stonith-xm01 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave ipmi-stonith-xm02 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig:1 (Master xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmsvn-drbd:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmsvn-drbd:1 (Master xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave srvsvn1-drbd:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave srvsvn1-drbd:1 (Master xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave srvsvn2-drbd:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave srvsvn2-drbd:1 (Master xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave dlm:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave o2cb:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave clvm:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave dlm:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave o2cb:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave clvm:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig-pri:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig-pri:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vg_svn:0 (Stopped) Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Leave vg_svn:1 (Started xm02) Mar 1 13:34:55 xm02 pengine: [6298]: notice: 
LogActions: Stop VMSVN (xm02) Mar 1 13:34:55 xm02 crmd: [6299]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] Mar 1 13:34:55 xm02 crmd: [6299]: info: unpack_graph: Unpacked transition 2: 2 actions in 2 synapses Mar 1 13:34:55 xm02 crmd: [6299]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1330619695-21) derived from /var/lib/pengine/pe-input-2289.bz2 Mar 1 13:34:55 xm02 crmd: [6299]: info: te_rsc_command: Initiating action 5: stop VMSVN_stop_0 on xm02 (local) Mar 1 13:34:55 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing key=5:2:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=VMSVN_stop_0 ) Mar 1 13:34:55 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. Mar 1 13:34:55 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:34:55 xm02 pengine: [6298]: notice: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/pengine/pe-input-2289.bz2 Mar 1 13:34:56 xm02 mgmtd: [6300]: info: CIB query: cib Mar 1 13:34:56 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. 
This repeated several times until I got the following:

Mar 1 13:36:16 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:36:17 xm02 kernel: [ 414.513459] block drbd0: Handshake successful: Agreed network protocol version 96 Mar 1 13:36:17 xm02 kernel: [ 414.513468] block drbd1: Handshake successful: Agreed network protocol version 96 Mar 1 13:36:17 xm02 kernel: [ 414.513708] block drbd1: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:36:17 xm02 kernel: [ 414.513726] block drbd1: conn( WFConnection -> WFReportParams ) Mar 1 13:36:17 xm02 kernel: [ 414.513775] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:36:17 xm02 kernel: [ 414.513780] block drbd0: conn( WFConnection -> WFReportParams ) Mar 1 13:36:17 xm02 kernel: [ 414.513797] block drbd0: Starting asender thread (from drbd0_receiver [5689]) Mar 1 13:36:17 xm02 kernel: [ 414.513822] block drbd1: Starting asender thread (from drbd1_receiver [5691]) Mar 1 13:36:17 xm02 kernel: [ 414.513965] block drbd1: data-integrity-alg: <not-used> Mar 1 13:36:17 xm02 kernel: [ 414.513984] block drbd1: drbd_sync_handshake: Mar 1 13:36:17 xm02 kernel: [ 414.513988] block drbd1: self E6E23470FD3656AD:0000000000000000:65C464E576893481:65C364E576893481 bits:0 flags:0 Mar 1 13:36:17 xm02 kernel: [ 414.513992] block drbd1: peer E6E23470FD3656AC:0000000000000000:65C464E576893480:65C364E576893481 bits:30720 flags:2 Mar 1 13:36:17 xm02 kernel: [ 414.513995] block drbd1: uuid_compare()=-1 by rule 40 Mar 1 13:36:17 xm02 kernel: [ 414.513997] block drbd1: I shall become SyncTarget, but I am primary! Mar 1 13:36:17 xm02 kernel: [ 414.514001] block drbd1: conn( WFReportParams -> Disconnecting ) Mar 1 13:36:17 xm02 kernel: [ 414.514008] block drbd1: error receiving ReportState, l: 4! 
Mar 1 13:36:17 xm02 kernel: [ 414.514039] block drbd1: asender terminated Mar 1 13:36:17 xm02 kernel: [ 414.514045] block drbd1: Terminating asender thread Mar 1 13:36:17 xm02 kernel: [ 414.514051] block drbd0: data-integrity-alg: <not-used> Mar 1 13:36:17 xm02 kernel: [ 414.514090] block drbd0: drbd_sync_handshake: Mar 1 13:36:17 xm02 kernel: [ 414.514095] block drbd0: self EEDF542BD48564B5:0000000000000000:AF298F27A3172093:AF288F27A3172093 bits:0 flags:0 Mar 1 13:36:17 xm02 kernel: [ 414.514099] block drbd0: peer EEDF542BD48564B4:0000000000000000:AF298F27A3172092:AF288F27A3172093 bits:57344 flags:2 Mar 1 13:36:17 xm02 kernel: [ 414.514103] block drbd0: uuid_compare()=-1 by rule 40 Mar 1 13:36:17 xm02 kernel: [ 414.514105] block drbd0: I shall become SyncTarget, but I am primary! Mar 1 13:36:17 xm02 kernel: [ 414.514109] block drbd0: conn( WFReportParams -> Disconnecting ) Mar 1 13:36:17 xm02 kernel: [ 414.514117] block drbd0: error receiving ReportState, l: 4! Mar 1 13:36:17 xm02 kernel: [ 414.514158] block drbd0: asender terminated Mar 1 13:36:17 xm02 kernel: [ 414.514164] block drbd0: Terminating asender thread Mar 1 13:36:17 xm02 kernel: [ 414.514253] block drbd0: Connection closed Mar 1 13:36:17 xm02 kernel: [ 414.514285] block drbd1: Connection closed Mar 1 13:36:17 xm02 kernel: [ 414.514320] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 Mar 1 13:36:17 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. 
Mar 1 13:36:17 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:36:17 xm02 kernel: [ 414.514327] block drbd0: conn( Disconnecting -> StandAlone ) Mar 1 13:36:17 xm02 kernel: [ 414.514347] block drbd1: conn( Disconnecting -> StandAlone ) Mar 1 13:36:17 xm02 kernel: [ 414.514350] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 Mar 1 13:36:17 xm02 kernel: [ 414.514433] block drbd0: receiver terminated Mar 1 13:36:17 xm02 kernel: [ 414.514437] block drbd0: Terminating receiver thread Mar 1 13:36:17 xm02 kernel: [ 414.514473] block drbd1: receiver terminated Mar 1 13:36:17 xm02 kernel: [ 414.514475] block drbd1: Terminating receiver thread Mar 1 13:36:17 xm02 kernel: [ 414.517576] block drbd2: Handshake successful: Agreed network protocol version 96 Mar 1 13:36:17 xm02 kernel: [ 414.517651] block drbd3: Handshake successful: Agreed network protocol version 96 Mar 1 13:36:17 xm02 kernel: [ 414.517944] block drbd3: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:36:17 xm02 kernel: [ 414.517956] block drbd3: conn( WFConnection -> WFReportParams ) Mar 1 13:36:17 xm02 kernel: [ 414.517986] block drbd3: Starting asender thread (from drbd3_receiver [5703]) Mar 1 13:36:17 xm02 kernel: [ 414.518045] block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC Mar 1 13:36:17 xm02 kernel: [ 414.518054] block drbd2: conn( WFConnection -> WFReportParams ) Mar 1 13:36:17 xm02 kernel: [ 414.518073] block drbd2: Starting asender thread (from drbd2_receiver [5699]) Mar 1 13:36:17 xm02 kernel: [ 414.518164] block drbd3: data-integrity-alg: <not-used> Mar 1 13:36:17 xm02 kernel: [ 414.518204] block drbd3: drbd_sync_handshake: Mar 1 13:36:17 xm02 kernel: [ 414.518210] block drbd3: self 75C7AE841CB0682F:0000000000000000:99ABCCCBF1E4D001:99AACCCBF1E4D001 bits:0 flags:0 Mar 1 13:36:17 xm02 kernel: [ 414.518214] block drbd3: peer 75C7AE841CB0682E:0000000000000000:99ABCCCBF1E4D000:99AACCCBF1E4D001 bits:5120 flags:2 Mar 1 
13:36:17 xm02 kernel: [ 414.518218] block drbd3: uuid_compare()=-1 by rule 40 Mar 1 13:36:17 xm02 kernel: [ 414.518220] block drbd3: I shall become SyncTarget, but I am primary! Mar 1 13:36:17 xm02 kernel: [ 414.518233] block drbd3: conn( WFReportParams -> Disconnecting ) Mar 1 13:36:17 xm02 kernel: [ 414.518243] block drbd3: error receiving ReportState, l: 4! Mar 1 13:36:17 xm02 kernel: [ 414.518255] block drbd3: asender terminated Mar 1 13:36:17 xm02 kernel: [ 414.518258] block drbd3: Terminating asender thread Mar 1 13:36:17 xm02 kernel: [ 414.518333] block drbd3: Connection closed Mar 1 13:36:17 xm02 kernel: [ 414.518414] block drbd3: helper command: /sbin/drbdadm fence-peer minor-3 Mar 1 13:36:17 xm02 kernel: [ 414.518417] block drbd3: conn( Disconnecting -> StandAlone ) Mar 1 13:36:17 xm02 kernel: [ 414.518455] block drbd3: receiver terminated Mar 1 13:36:17 xm02 kernel: [ 414.518460] block drbd3: Terminating receiver thread Mar 1 13:36:17 xm02 kernel: [ 414.518551] block drbd2: data-integrity-alg: <not-used> Mar 1 13:36:17 xm02 kernel: [ 414.518572] block drbd2: drbd_sync_handshake: Mar 1 13:36:17 xm02 kernel: [ 414.518576] block drbd2: self 324E9CEEF0227FAD:0000000000000000:F91D77DB4FF3672B:F91C77DB4FF3672B bits:0 flags:0 Mar 1 13:36:17 xm02 kernel: [ 414.518580] block drbd2: peer 324E9CEEF0227FAC:0000000000000000:F91D77DB4FF3672A:F91C77DB4FF3672B bits:12288 flags:2 Mar 1 13:36:17 xm02 kernel: [ 414.518584] block drbd2: uuid_compare()=-1 by rule 40 Mar 1 13:36:17 xm02 kernel: [ 414.518587] block drbd2: I shall become SyncTarget, but I am primary! Mar 1 13:36:17 xm02 kernel: [ 414.518592] block drbd2: conn( WFReportParams -> Disconnecting ) Mar 1 13:36:17 xm02 kernel: [ 414.518598] block drbd2: error receiving ReportState, l: 4! 
Mar 1 13:36:17 xm02 kernel: [ 414.518616] block drbd2: asender terminated Mar 1 13:36:17 xm02 kernel: [ 414.518626] block drbd2: Terminating asender thread Mar 1 13:36:17 xm02 kernel: [ 414.518770] block drbd2: Connection closed Mar 1 13:36:17 xm02 kernel: [ 414.518839] block drbd2: conn( Disconnecting -> StandAlone ) Mar 1 13:36:17 xm02 kernel: [ 414.518836] block drbd2: helper command: /sbin/drbdadm fence-peer minor-2 Mar 1 13:36:17 xm02 kernel: [ 414.518869] block drbd2: receiver terminated Mar 1 13:36:17 xm02 kernel: [ 414.518871] block drbd2: Terminating receiver thread Mar 1 13:36:17 xm02 kernel: [ 414.522450] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 126 (0x7e00) Mar 1 13:36:17 xm02 kernel: [ 414.522454] block drbd0: fence-peer helper broken, returned 126 Mar 1 13:36:17 xm02 kernel: [ 414.522902] block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 126 (0x7e00) Mar 1 13:36:17 xm02 kernel: [ 414.522905] block drbd1: fence-peer helper broken, returned 126 Mar 1 13:36:17 xm02 kernel: [ 414.526993] block drbd2: helper command: /sbin/drbdadm fence-peer minor-2 exit code 126 (0x7e00) Mar 1 13:36:17 xm02 kernel: [ 414.526996] block drbd2: fence-peer helper broken, returned 126 Mar 1 13:36:17 xm02 kernel: [ 414.527230] block drbd3: helper command: /sbin/drbdadm fence-peer minor-3 exit code 126 (0x7e00) Mar 1 13:36:17 xm02 kernel: [ 414.527233] block drbd3: fence-peer helper broken, returned 126 Mar 1 13:36:18 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. 
Mar 1 13:36:18 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:36:19 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. Mar 1 13:36:19 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:36:20 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. Mar 1 13:36:20 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:36:21 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. Mar 1 13:36:21 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:36:22 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. 
Mar 1 13:36:22 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] CLM CONFIGURATION CHANGE Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] New Configuration: Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] r(0) ip(100.0.0.2) Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] Members Left: Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] Members Joined: Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1024: memb=1, new=0, lost=0 Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: pcmk_peer_update: memb: xm02 33554532 Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] CLM CONFIGURATION CHANGE Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] New Configuration: Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] r(0) ip(100.0.0.1) Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] r(0) ip(100.0.0.2) Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] Members Left: Mar 1 13:36:22 xm02 ocfs2_controld: [7172]: notice: ais_dispatch_message: Membership 1024: quorum acquired Mar 1 13:36:22 xm02 crmd: [6299]: notice: ais_dispatch_message: Membership 1024: quorum acquired Mar 1 13:36:22 xm02 crmd: [6299]: notice: crmd_peer_update: Status update: Client xm01/crmd now has status [online] (DC=true) Mar 1 13:36:22 xm02 cluster-dlm: [7099]: notice: ais_dispatch_message: Membership 1024: quorum acquired Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] Members Joined: Mar 1 13:36:22 xm02 ocfs2_controld: [7172]: info: crm_update_peer: Node xm01: id=16777316 state=member (new) addr=r(0) ip(100.0.0.1) votes=1 born=1016 seen=1024 proc=000000000000000000000000001513 12 Mar 1 13:36:22 xm02 cib: [6295]: notice: ais_dispatch_message: Membership 1024: quorum acquired Mar 1 13:36:22 xm02 cluster-dlm: [7099]: info: crm_update_peer: Node xm01: id=16777316 state=member (new) addr=r(0) ip(100.0.0.1) votes=1 born=1016 seen=1024 proc=00000000000000000000000000151312 Mar 1 13:36:22 xm02 crmd: [6299]: info: ais_status_callback: 
status: xm01 is now member (was lost) Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] r(0) ip(100.0.0.1) Mar 1 13:36:22 xm02 cib: [6295]: info: crm_update_peer: Node xm01: id=16777316 state=member (new) addr=r(0) ip(100.0.0.1) votes=1 born=1016 seen=1024 proc=00000000000000000000000000151312 Mar 1 13:36:22 xm02 cluster-dlm: update_cluster: Processing membership 1024 Mar 1 13:36:22 xm02 cib: [6295]: info: ais_dispatch_message: Membership 1024: quorum retained Mar 1 13:36:22 xm02 crmd: [6299]: info: crm_update_peer: Node xm01: id=16777316 state=member (new) addr=r(0) ip(100.0.0.1) votes=1 born=1016 seen=1024 proc=00000000000000000000000000151312 (new) Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1024: memb=2, new=1, lost=0 Mar 1 13:36:22 xm02 cluster-dlm: dlm_process_node: Adding address ip(100.0.0.1) to configfs for node 16777316 Mar 1 13:36:22 xm02 ocfs2_controld: [7172]: info: ais_dispatch_message: Membership 1024: quorum retained Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: update_member: Node 16777316/xm01 is now: member Mar 1 13:36:22 xm02 cluster-dlm: add_configfs_node: set_configfs_node 16777316 100.0.0.1 local 0 Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: pcmk_peer_update: NEW: xm01 16777316 Mar 1 13:36:22 xm02 crmd: [6299]: info: crm_update_quorum: Updating quorum status to true (call=87) Mar 1 13:36:22 xm02 cluster-dlm: dlm_process_node: Added active node 16777316: born-on=1016, last-seen=1024, this-event=1024, last-event=1020 Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: pcmk_peer_update: MEMB: xm01 16777316 Mar 1 13:36:22 xm02 cluster-dlm: dlm_process_node: Skipped active node 33554532: born-on=1016, last-seen=1024, this-event=1024, last-event=1020 Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: pcmk_peer_update: MEMB: xm02 33554532 Mar 1 13:36:22 xm02 cluster-dlm: [7099]: info: ais_dispatch_message: Membership 1024: quorum retained Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: 
send_member_notification: Sending membership update 1024 to 4 children Mar 1 13:36:22 xm02 corosync[6228]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: update_member: 0x6acba0 Node 16777316 (xm01) born on: 1024 Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: send_member_notification: Sending membership update 1024 to 4 children Mar 1 13:36:22 xm02 corosync[6228]: [CPG ] chosen downlist: sender r(0) ip(100.0.0.1) ; members(old:1 left:0) Mar 1 13:36:22 xm02 corosync[6228]: [MAIN ] Completed service synchronization, ready to provide service. Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='xm01']/lrm (origin=local/crmd/83, version=0.2472.138): ok (rc=0) Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='xm01']/transient_attributes (origin=local/crmd/84, version=0.2472.139) : ok (rc=0) Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/85, version=0.2472.140): ok (rc=0) Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/87, version=0.2472.142): ok (rc=0) Mar 1 13:36:22 xm02 crmd: [6299]: info: crmd_ais_dispatch: Setting expected votes to 2 Mar 1 13:36:22 xm02 crmd: [6299]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=crmd_peer_update ] Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph: do_te_invoke:175 - Triggered transition abort (complete=0) : Peer Halt Mar 1 13:36:22 xm02 crmd: [6299]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000 Mar 1 13:36:22 xm02 crmd: [6299]: info: update_abort_priority: Abort action done superceeded by stop Mar 1 13:36:22 xm02 
crmd: [6299]: WARN: match_down_event: No match for shutdown action on xm01 Mar 1 13:36:22 xm02 crmd: [6299]: info: te_update_diff: Stonith/shutdown of xm01 not matched Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph: te_update_diff:193 - Triggered transition abort (complete=0, tag=node_state, id=xm01, magic=NA, cib=0.2472.137) : Node failure Mar 1 13:36:22 xm02 crmd: [6299]: info: update_abort_priority: Abort action stop superceeded by restart Mar 1 13:36:22 xm02 crmd: [6299]: info: erase_xpath_callback: Deletion of "//node_state[@uname='xm01']/lrm": ok (rc=0) Mar 1 13:36:22 xm02 crmd: [6299]: info: erase_xpath_callback: Deletion of "//node_state[@uname='xm01']/transient_attributes": ok (rc=0) Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/89, version=0.2472.143): ok (rc=0) Mar 1 13:36:22 xm02 crmd: [6299]: info: ais_dispatch_message: Membership 1024: quorum retained Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/90, version=0.2472.144): ok (rc=0) Mar 1 13:36:22 xm02 crmd: [6299]: info: crmd_ais_dispatch: Setting expected votes to 2 Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph: do_te_invoke:175 - Triggered transition abort (complete=0) : Peer Halt Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/93, version=0.2472.146): ok (rc=0) Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph: do_te_invoke:175 - Triggered transition abort (complete=0) : Peer Halt Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph: do_te_invoke:175 - Triggered transition abort (complete=0) : Peer Halt Mar 1 13:36:23 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] 
xmfile=[/etc/xen/vm/vmsvn] CRM_meta_timeout=[60000] for rsc is already running.

The same two messages keep repeating until 13:38:59, when XM02 goes down:

Mar 1 13:38:59 xm02 pengine: [6298]: WARN: unpack_rsc_op: Processing failed op VMSVN_stop_0 on xm02: unknown error (1)
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: pe_fence_node: Node xm02 will be fenced to recover from resource failure(s)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: common_apply_stickiness: ms_drbd_vmconfig can fail 9 more times on xm01 before being forced off
Mar 1 13:38:59 xm02 pengine: [6298]: notice: common_apply_stickiness: ms_drbd_vmconfig can fail 9 more times on xm01 before being forced off
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: common_apply_stickiness: Forcing VMSVN away from xm02 after 1000000 failures (max=1000000)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for vmsvn-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for vmsvn-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for srvsvn1-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for srvsvn1-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for srvsvn2-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for srvsvn2-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (10s) for dlm:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (10s) for o2cb:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (10s) for clvm:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for vmconfig-pri:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (30s) for
VMSVN on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: WARN: stage6: Scheduling Node xm02 for STONITH Mar 1 13:38:59 xm02 pengine: [6298]: WARN: native_stop_constraints: Stop of failed resource VMSVN is implicit after xm02 is fenced Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Stop ipmi-stonith-xm01 (xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave ipmi-stonith-xm02 (Started xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote vmconfig:0 (Master -> Slave xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Recover vmconfig:0 (Master xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote vmconfig:1 (Master -> Stopped xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote vmsvn-drbd:0 (Slave -> Master xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote vmsvn-drbd:1 (Master -> Stopped xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote srvsvn1-drbd:0 (Slave -> Master xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote srvsvn1-drbd:1 (Master -> Stopped xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote srvsvn2-drbd:0 (Slave -> Master xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote srvsvn2-drbd:1 (Master -> Stopped xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave dlm:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave o2cb:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave clvm:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move dlm:1 (Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move o2cb:1 (Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move clvm:1 (Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig-pri:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move vmconfig-pri:1 
(Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave vg_svn:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move vg_svn:1 (Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move VMSVN (Started xm02 -> xm01) Mar 1 13:38:59 xm02 crmd: [6299]: info: handle_response: pe_calc calculation pe_calc-dc-1330619939-67 is obsolete Mar 1 13:38:59 xm02 pengine: [6298]: WARN: common_apply_stickiness: Forcing VMSVN away from xm02 after 1000000 failures (max=1000000) Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for vmsvn-drbd:0 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for vmsvn-drbd:0 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for srvsvn1-drbd:0 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for srvsvn1-drbd:0 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for srvsvn2-drbd:0 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for srvsvn2-drbd:0 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (10s) for dlm:1 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (10s) for o2cb:1 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (10s) for clvm:1 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (20s) for vmconfig-pri:1 on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start recurring monitor (30s) for VMSVN on xm01 Mar 1 13:38:59 xm02 pengine: [6298]: WARN: stage6: Scheduling Node xm02 for STONITH Mar 1 13:38:59 xm02 pengine: [6298]: WARN: native_stop_constraints: Stop of failed resource VMSVN is implicit after xm02 is fenced Mar 1 13:38:59 xm02 
pengine: [6298]: notice: LogActions: Stop ipmi-stonith-xm01 (xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave ipmi-stonith-xm02 (Started xm01) Mar 1 13:38:59 xm02 mgmtd: [6300]: info: CIB query: cib Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote vmconfig:0 (Master -> Slave xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Recover vmconfig:0 (Master xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote vmconfig:1 (Master -> Stopped xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote vmsvn-drbd:0 (Slave -> Master xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote vmsvn-drbd:1 (Master -> Stopped xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote srvsvn1-drbd:0 (Slave -> Master xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote srvsvn1-drbd:1 (Master -> Stopped xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote srvsvn2-drbd:0 (Slave -> Master xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Demote srvsvn2-drbd:1 (Master -> Stopped xm02) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave dlm:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave o2cb:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave clvm:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move dlm:1 (Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move o2cb:1 (Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move clvm:1 (Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave vmconfig-pri:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move vmconfig-pri:1 (Started xm02 -> xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Leave vg_svn:0 (Stopped) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move vg_svn:1 (Started xm02 -> 
xm01) Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Move VMSVN (Started xm02 -> xm01) Mar 1 13:38:59 xm02 lrmd: [6296]: info: perform_op:2932: operation start[45] with pid 8644 on VMSVN for client 6299, its parameters: CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running. Mar 1 13:38:59 xm02 lrmd: [6296]: info: perform_op:2942: postponing all ops on resource VMSVN by 1000 ms Mar 1 13:38:59 xm02 crmd: [6299]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] Mar 1 13:38:59 xm02 crmd: [6299]: info: unpack_graph: Unpacked transition 8: 128 actions in 128 synapses Mar 1 13:38:59 xm02 crmd: [6299]: info: do_te_invoke: Processing graph 8 (ref=pe_calc-dc-1330619939-68) derived from /var/lib/pengine/pe-warn-311.bz2 Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo action 15 fired and confirmed Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo action 42 fired and confirmed Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo action 73 fired and confirmed Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo action 104 fired and confirmed Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo action 135 fired and confirmed Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo action 184 fired and confirmed Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating action 216: notify vmconfig:0_pre_notify_demote_0 on xm01 Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating action 218: notify vmconfig:1_pre_notify_demote_0 on xm02 (local) Mar 1 13:38:59 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing key=218:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmconfig:1_notify_0 ) Mar 1 13:38:59 xm02 lrmd: [6296]: info: rsc:vmconfig:1 notify[57] (pid 13343) Mar 1 13:38:59 xm02 crmd: [6299]: info: 
te_rsc_command: Initiating action 224: notify vmsvn-drbd:0_pre_notify_demote_0 on xm01 Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating action 226: notify vmsvn-drbd:1_pre_notify_demote_0 on xm02 (local) Mar 1 13:38:59 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing key=226:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmsvn-drbd:1_notify_0 ) Mar 1 13:38:59 xm02 lrmd: [6296]: info: rsc:vmsvn-drbd:1 notify[58] (pid 13344) Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating action 232: notify srvsvn1-drbd:0_pre_notify_demote_0 on xm01 Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating action 234: notify srvsvn1-drbd:1_pre_notify_demote_0 on xm02 (local) Mar 1 13:38:59 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing key=234:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=srvsvn1-drbd:1_notify_0 ) Mar 1 13:38:59 xm02 lrmd: [6296]: info: rsc:srvsvn1-drbd:1 notify[59] (pid 13345) Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating action 240: notify srvsvn2-drbd:0_pre_notify_demote_0 on xm01 Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating action 242: notify srvsvn2-drbd:1_pre_notify_demote_0 on xm02 (local) Mar 1 13:38:59 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing key=242:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=srvsvn2-drbd:1_notify_0 ) Mar 1 13:38:59 xm02 crmd: [6299]: info: te_fence_node: Executing reboot fencing operation (186) on xm02 (timeout=60000) Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: initiate_remote_stonith_op: Initiating remote operation reboot for xm02: c1be22cc-e535-441c-a674-89551a2b9d4c Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: stonith_queryQuery <stonith_command t="stonith-ng" st_async_id="c1be22cc-e535-441c-a674-89551a2b9d4c" st_op="st_query" st_callid="0" st_callopt="0" st_ remote_op="c1be22cc-e535-441c-a674-89551a2b9d4c" st_target="xm02" st_device_action="reboot" st_clientid="bb653c7a-6351-4517-ad06-6fb0e20fe375" st_timeout="6000" src="xm02" seq="5" 
/>
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: process_pe_message: Transition 8: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-311.bz2
Mar 1 13:38:59 xm02 pengine: [6298]: notice: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
Mar 1 13:38:59 xm02 lrmd: [6296]: info: operation notify[58] on vmsvn-drbd:1 for client 6299: pid 13344 exited with return code 0
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: can_fence_host_with_device: Refreshing port list for ipmi-stonith-xm01
Mar 1 13:38:59 xm02 stonith-ng: [6294]: WARN: parse_host_line: Could not parse (0 0):
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: can_fence_host_with_device: ipmi-stonith-xm01 can not fence xm02: dynamic-list
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: stonith_query: Found 0 matching devices for 'xm02'
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: stonith_command: Processed st_query from xm02: rc=0
Mar 1 13:38:59 xm02 crmd: [6299]: info: process_lrm_event: LRM operation vmsvn-drbd:1_notify_0 (call=58, rc=0, cib-update=130, confirmed=true) ok

After the storm, both nodes came back online as Master/Master, and VMSVN is online as well. However, the cloned init-group in Pacemaker (dlm, o2cb, clvm) is not running on xm01.

Any feedback?

Thanks!
Daniel