Andreas, Lars,
Thanks much for the quick response.
I made the changes.
Here's the current drbd.conf:
global {
    usage-count yes;
}
common {
    protocol C;
    disk {
        on-io-error detach;
        fencing resource-and-stonith;
    }
    syncer {
        rate 33M;
        al-extents 3389;
    }
    net {
        allow-two-primaries; # Enable this *after* initial testing
        cram-hmac-alg sha1;
        shared-secret "a6a0680c40bca2439dbe48343ddddcf4";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    handlers {
        fence-peer "/usr/lib/drbd/stonith_admin-fence-peer.sh";
    }
}
resource vmsvn {
    device /dev/drbd0;
    disk /dev/sdb;
    meta-disk internal;
    on xm01 {
        address 100.0.0.1:7788;
    }
    on xm02 {
        address 100.0.0.2:7788;
    }
}
resource srvsvn1 {
    protocol C;
    device /dev/drbd1;
    disk /dev/sdc;
    meta-disk internal;
    on xm01 {
        address 100.0.0.1:7789;
    }
    on xm02 {
        address 100.0.0.2:7789;
    }
}
resource srvsvn2 {
    protocol C;
    device /dev/drbd2;
    disk /dev/sdd;
    meta-disk internal;
    on xm01 {
        address 100.0.0.1:7790;
    }
    on xm02 {
        address 100.0.0.2:7790;
    }
}
resource vmconfig {
    protocol C;
    device /dev/drbd3;
    meta-disk internal;
    on xm01 {
        address 100.0.0.1:7791;
        disk /dev/vg_xm01/lv_xm01_vmconfig;
    }
    on xm02 {
        address 100.0.0.2:7791;
        disk /dev/vg_xm02/lv_xm02_vmconfig;
    }
}
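A side note on the fence-peer handler configured above: the log further down reports the helper returning "exit code 126 (0x7e00)". The rough Python sketch below shows how that raw status word maps to the exit code, and how to check whether the handler path exists and is executable. The function names are made up for illustration. By shell convention, 126 usually means the command was found but could not be executed; whether that is the actual cause here is only an assumption.

```python
import os


def helper_exit_code(raw_status: int) -> int:
    """Extract the exit code from a wait()-style status word.

    The exit code sits in bits 8..15, so 0x7e00 carries 0x7e.
    """
    return (raw_status >> 8) & 0xFF


def describe_handler(path: str) -> str:
    """Report whether a handler script is missing, not executable, or ok.

    Hypothetical helper for illustration; it only inspects the file,
    it does not run it.
    """
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Status word as printed by the kernel in the log excerpt.
    print(helper_exit_code(0x7E00))
    # Handler path taken from the handlers section of the drbd.conf above.
    print(describe_handler("/usr/lib/drbd/stonith_admin-fence-peer.sh"))
```

Run on each node, the second line of output gives a quick hint whether the script named in `fence-peer` is present and has its execute bit set for the calling user. It says nothing about whether the script itself works.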
And here's what happened:
- rcnetwork stop on XM01 @ 1:33:00 PM:
Mar 1 13:32:59 xm01 ifdown: eth0 device: Broadcom
Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
Mar 1 13:33:00 xm01 ifdown: eth1 device: Broadcom
Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
Mar 1 13:33:01 xm01 /usr/sbin/cron[9479]: (root) CMD
(/usr/sbin/logwatch --service dmeventd)
Mar 1 13:33:01 xm01 ifdown: usb0 name: RNDIS/CDC ETHER
Mar 1 13:33:02 xm01 ifdown: vif1.0
Mar 1 13:33:02 xm01 ifdown: No configuration found for vif1.0
Mar 1 13:33:02 xm01 ifdown: Nevertheless the interface
will be shut down.
- XM01 is back:
Mar 1 13:36:35 xm01 kernel: [ 51.170175] drbd: initialized.
Version: 8.3.11 (api:88/proto:86-96)
Mar 1 13:36:35 xm01 kernel: [ 51.170178] drbd: GIT-hash:
0de839cee13a4160eed6037c4bddd066645e23c5 build by phil at fat-tyre,
2011-06-29 11:37:11
Mar 1 13:36:35 xm01 kernel: [ 51.170181] drbd: registered as block
device major 147
Mar 1 13:36:35 xm01 kernel: [ 51.170184] drbd: minor_table @
0xffff8807d66c5480
Mar 1 13:36:35 xm01 kernel: [ 51.319210] block drbd0: Starting
worker thread (from cqueue [4927])
Mar 1 13:36:35 xm01 kernel: [ 51.319283] block drbd0: disk(
Diskless -> Attaching )
Mar 1 13:36:35 xm01 kernel: klogd 1.4.1, ---------- state change ----------
Mar 1 13:36:35 xm01 kernel: [ 51.332408] block drbd0: Found 57
transactions (91 active extents) in activity log.
Mar 1 13:36:35 xm01 kernel: [ 51.332411] block drbd0: Method to
ensure write ordering: barrier
Mar 1 13:36:35 xm01 kernel: [ 51.332414] block drbd0: max BIO size = 131072
Mar 1 13:36:35 xm01 kernel: [ 51.332418] block drbd0:
drbd_bm_resize called with capacity == 1172087720
Mar 1 13:36:35 xm01 kernel: [ 51.336592] block drbd0: resync
bitmap: bits=146510965 words=2289234 pages=4472
Mar 1 13:36:35 xm01 kernel: [ 51.336598] block drbd0: size = 559
GB (586043860 KB)
Mar 1 13:36:35 xm01 kernel: [ 51.534814] block drbd0: bitmap READ
of 4472 pages took 50 jiffies
Mar 1 13:36:35 xm01 kernel: [ 51.551170] block drbd0: recounting
of set bits took additional 4 jiffies
Mar 1 13:36:35 xm01 kernel: [ 51.551174] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:35 xm01 kernel: [ 51.551231] block drbd0: Marked
additional 224 MB as out-of-sync based on AL.
Mar 1 13:36:35 xm01 kernel: [ 51.551274] block drbd0: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:36:35 xm01 kernel: [ 51.551296] block drbd0: 224 MB
(57344 bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:35 xm01 kernel: [ 51.551304] block drbd0: disk(
Attaching -> Consistent )
Mar 1 13:36:35 xm01 kernel: [ 51.551307] block drbd0: attached to
UUIDs EEDF542BD48564B5:0000000000000000:AF298F27A3172092:AF288F27A3172093
Mar 1 13:36:35 xm01 kernel: [ 51.567908] block drbd1: Starting
worker thread (from cqueue [4927])
Mar 1 13:36:35 xm01 kernel: [ 51.567981] block drbd1: disk(
Diskless -> Attaching )
Mar 1 13:36:35 xm01 kernel: [ 51.581253] block drbd1: Found 57
transactions (57 active extents) in activity log.
Mar 1 13:36:35 xm01 kernel: [ 51.581257] block drbd1: Method to
ensure write ordering: barrier
Mar 1 13:36:35 xm01 kernel: [ 51.581260] block drbd1: max BIO size = 131072
Mar 1 13:36:35 xm01 kernel: [ 51.581265] block drbd1:
drbd_bm_resize called with capacity == 1172087720
Mar 1 13:36:35 xm01 kernel: [ 51.585510] block drbd1: resync
bitmap: bits=146510965 words=2289234 pages=4472
Mar 1 13:36:35 xm01 kernel: [ 51.585525] block drbd1: size = 559
GB (586043860 KB)
Mar 1 13:36:36 xm01 kernel: [ 51.778368] block drbd1: bitmap READ
of 4472 pages took 48 jiffies
Mar 1 13:36:36 xm01 kernel: [ 51.794740] block drbd1: recounting
of set bits took additional 4 jiffies
Mar 1 13:36:36 xm01 kernel: [ 51.794744] block drbd1: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:36 xm01 kernel: [ 51.794797] block drbd1: Marked
additional 120 MB as out-of-sync based on AL.
Mar 1 13:36:36 xm01 kernel: [ 51.794838] block drbd1: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:36:36 xm01 kernel: [ 51.794860] block drbd1: 120 MB
(30720 bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:36 xm01 kernel: [ 51.794867] block drbd1: disk(
Attaching -> Consistent )
Mar 1 13:36:36 xm01 kernel: [ 51.794871] block drbd1: attached to
UUIDs E6E23470FD3656AD:0000000000000000:65C464E576893480:65C364E576893481
Mar 1 13:36:36 xm01 kernel: [ 51.811431] block drbd2: Starting
worker thread (from cqueue [4927])
Mar 1 13:36:36 xm01 kernel: [ 51.811511] block drbd2: disk(
Diskless -> Attaching )
Mar 1 13:36:36 xm01 kernel: [ 51.825901] block drbd2: Found 57
transactions (57 active extents) in activity log.
Mar 1 13:36:36 xm01 kernel: [ 51.825905] block drbd2: Method to
ensure write ordering: barrier
Mar 1 13:36:36 xm01 kernel: [ 51.825908] block drbd2: max BIO size = 131072
Mar 1 13:36:36 xm01 kernel: [ 51.825915] block drbd2:
drbd_bm_resize called with capacity == 1172087720
Mar 1 13:36:36 xm01 kernel: [ 51.830989] block drbd2: resync
bitmap: bits=146510965 words=2289234 pages=4472
Mar 1 13:36:36 xm01 kernel: [ 51.830995] block drbd2: size = 559
GB (586043860 KB)
Mar 1 13:36:36 xm01 kernel: [ 52.033592] block drbd2: bitmap READ
of 4472 pages took 51 jiffies
Mar 1 13:36:36 xm01 kernel: [ 52.050223] block drbd2: recounting
of set bits took additional 4 jiffies
Mar 1 13:36:36 xm01 kernel: [ 52.050228] block drbd2: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:36 xm01 kernel: [ 52.050291] block drbd2: Marked
additional 48 MB as out-of-sync based on AL.
Mar 1 13:36:36 xm01 kernel: [ 52.050352] block drbd2: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:36:36 xm01 kernel: [ 52.050382] block drbd2: 48 MB (12288
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:36 xm01 kernel: [ 52.050391] block drbd2: disk(
Attaching -> Consistent )
Mar 1 13:36:36 xm01 kernel: [ 52.050396] block drbd2: attached to
UUIDs 324E9CEEF0227FAD:0000000000000000:F91D77DB4FF3672A:F91C77DB4FF3672B
Mar 1 13:36:36 xm01 kernel: [ 52.079074] block drbd3: Starting
worker thread (from cqueue [4927])
Mar 1 13:36:36 xm01 kernel: [ 52.079172] block drbd3: disk(
Diskless -> Attaching )
Mar 1 13:36:36 xm01 kernel: [ 52.118864] block drbd3: Found 29
transactions (29 active extents) in activity log.
Mar 1 13:36:36 xm01 kernel: [ 52.118868] block drbd3: Method to
ensure write ordering: barrier
Mar 1 13:36:36 xm01 kernel: [ 52.118872] block drbd3: max BIO size = 131072
Mar 1 13:36:36 xm01 kernel: [ 52.118877] block drbd3:
drbd_bm_resize called with capacity == 2097016
Mar 1 13:36:36 xm01 kernel: [ 52.118888] block drbd3: resync
bitmap: bits=262127 words=4096 pages=8
Mar 1 13:36:36 xm01 kernel: [ 52.118891] block drbd3: size = 1024
MB (1048508 KB)
Mar 1 13:36:36 xm01 kernel: [ 52.125476] block drbd3: bitmap READ
of 8 pages took 2 jiffies
Mar 1 13:36:36 xm01 kernel: [ 52.125509] block drbd3: recounting
of set bits took additional 0 jiffies
Mar 1 13:36:36 xm01 kernel: [ 52.125511] block drbd3: 0 KB (0
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:36 xm01 kernel: [ 52.125540] block drbd3: Marked
additional 20 MB as out-of-sync based on AL.
Mar 1 13:36:36 xm01 kernel: [ 52.125543] block drbd3: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:36:36 xm01 kernel: [ 52.129955] block drbd3: 20 MB (5120
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:36 xm01 kernel: [ 52.129960] block drbd3: disk(
Attaching -> Consistent )
Mar 1 13:36:36 xm01 kernel: [ 52.129964] block drbd3: attached to
UUIDs 75C7AE841CB0682F:0000000000000000:99ABCCCBF1E4D000:99AACCCBF1E4D001
Mar 1 13:36:36 xm01 kernel: [ 52.204837] padlock: VIA PadLock Hash
Engine not detected.
Mar 1 13:36:36 xm01 modprobe: FATAL: Error inserting padlock_sha
(/lib/modules/2.6.32.49-0.3-xen/kernel/drivers/crypto/padlock-sha.ko):
No such device
Mar 1 13:36:36 xm01 kernel: [ 52.238263] block drbd0: conn(
StandAlone -> Unconnected )
Mar 1 13:36:36 xm01 kernel: [ 52.238301] block drbd0: Starting
receiver thread (from drbd0_worker [4938])
Mar 1 13:36:36 xm01 kernel: [ 52.238341] block drbd0: receiver (re)started
Mar 1 13:36:36 xm01 kernel: [ 52.238349] block drbd0: conn(
Unconnected -> WFConnection )
Mar 1 13:36:36 xm01 kernel: [ 52.241205] block drbd1: conn(
StandAlone -> Unconnected )
Mar 1 13:36:36 xm01 kernel: [ 52.241238] block drbd1: Starting
receiver thread (from drbd1_worker [4960])
Mar 1 13:36:36 xm01 kernel: [ 52.241311] block drbd1: receiver (re)started
Mar 1 13:36:36 xm01 kernel: [ 52.241318] block drbd1: conn(
Unconnected -> WFConnection )
Mar 1 13:36:36 xm01 kernel: [ 52.243718] block drbd2: conn(
StandAlone -> Unconnected )
Mar 1 13:36:36 xm01 kernel: [ 52.243743] block drbd2: Starting
receiver thread (from drbd2_worker [4986])
Mar 1 13:36:36 xm01 kernel: [ 52.243808] block drbd2: receiver (re)started
Mar 1 13:36:36 xm01 kernel: [ 52.243817] block drbd2: conn(
Unconnected -> WFConnection )
Mar 1 13:36:36 xm01 kernel: [ 52.246305] block drbd3: conn(
StandAlone -> Unconnected )
Mar 1 13:36:36 xm01 kernel: [ 52.246337] block drbd3: Starting
receiver thread (from drbd3_worker [5016])
Mar 1 13:36:36 xm01 kernel: [ 52.246406] block drbd3: receiver (re)started
Mar 1 13:36:36 xm01 kernel: [ 52.246415] block drbd3: conn(
Unconnected -> WFConnection )
Mar 1 13:36:37 xm01 kernel: [ 52.738908] block drbd1: Handshake
successful: Agreed network protocol version 96
Mar 1 13:36:37 xm01 kernel: [ 52.738985] block drbd0: Handshake
successful: Agreed network protocol version 96
Mar 1 13:36:37 xm01 kernel: [ 52.739113] block drbd1: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:36:37 xm01 kernel: [ 52.739122] block drbd1: conn(
WFConnection -> WFReportParams )
Mar 1 13:36:37 xm01 kernel: [ 52.739141] block drbd0: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:36:37 xm01 kernel: [ 52.739146] block drbd0: conn(
WFConnection -> WFReportParams )
Mar 1 13:36:37 xm01 kernel: [ 52.739182] block drbd1: Starting
asender thread (from drbd1_receiver [5114])
Mar 1 13:36:37 xm01 kernel: [ 52.739191] block drbd0: Starting
asender thread (from drbd0_receiver [5110])
Mar 1 13:36:37 xm01 kernel: [ 52.739298] block drbd0:
data-integrity-alg: <not-used>
Mar 1 13:36:37 xm01 kernel: [ 52.739316] block drbd0: drbd_sync_handshake:
Mar 1 13:36:37 xm01 kernel: [ 52.739320] block drbd0: self
EEDF542BD48564B4:0000000000000000:AF298F27A3172092:AF288F27A3172093
bits:57344 flags:0
Mar 1 13:36:37 xm01 kernel: [ 52.739324] block drbd0: peer
EEDF542BD48564B5:0000000000000000:AF298F27A3172093:AF288F27A3172093
bits:0 flags:0
Mar 1 13:36:37 xm01 kernel: [ 52.739328] block drbd0:
uuid_compare()=1 by rule 40
Mar 1 13:36:37 xm01 kernel: [ 52.739334] block drbd0: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk(
Consistent -> UpToDate ) pdsk( DUnknown -> Consistent )
Mar 1 13:36:37 xm01 kernel: [ 52.739374] block drbd1:
data-integrity-alg: <not-used>
Mar 1 13:36:37 xm01 kernel: [ 52.739389] block drbd1: drbd_sync_handshake:
Mar 1 13:36:37 xm01 kernel: [ 52.739393] block drbd1: self
E6E23470FD3656AC:0000000000000000:65C464E576893480:65C364E576893481
bits:30720 flags:0
Mar 1 13:36:37 xm01 kernel: [ 52.739397] block drbd1: peer
E6E23470FD3656AD:0000000000000000:65C464E576893481:65C364E576893481
bits:0 flags:0
Mar 1 13:36:37 xm01 kernel: [ 52.739400] block drbd1:
uuid_compare()=1 by rule 40
Mar 1 13:36:37 xm01 kernel: [ 52.739406] block drbd1: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk(
Consistent -> UpToDate ) pdsk( DUnknown -> Consistent )
Mar 1 13:36:37 xm01 kernel: [ 52.739584] block drbd1: meta
connection shut down by peer.
Mar 1 13:36:37 xm01 kernel: [ 52.739590] block drbd1: peer(
Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk(
Consistent -> DUnknown )
Mar 1 13:36:37 xm01 kernel: [ 52.739646] block drbd0: sock_sendmsg
returned -32
Mar 1 13:36:37 xm01 kernel: [ 52.739651] block drbd0: peer(
Primary -> Unknown ) conn( WFBitMapS -> BrokenPipe ) pdsk( Consistent
-> DUnknown )
Mar 1 13:36:37 xm01 kernel: [ 52.739657] block drbd0: short sent
ReportBitMap size=4096 sent=3172
Mar 1 13:36:37 xm01 kernel: [ 52.739674] block drbd0: meta
connection shut down by peer.
Mar 1 13:36:37 xm01 kernel: [ 52.739683] block drbd0: asender terminated
Mar 1 13:36:37 xm01 kernel: [ 52.739687] block drbd0: Terminating
asender thread
Mar 1 13:36:37 xm01 kernel: [ 52.739738] block drbd1: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:36:37 xm01 kernel: [ 52.741865] block drbd1: 120 MB
(30720 bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:37 xm01 kernel: [ 52.743017] block drbd2: Handshake
successful: Agreed network protocol version 96
Mar 1 13:36:37 xm01 kernel: [ 52.743091] block drbd3: Handshake
successful: Agreed network protocol version 96
Mar 1 13:36:37 xm01 kernel: [ 52.743270] block drbd2: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:36:37 xm01 kernel: [ 52.743278] block drbd2: conn(
WFConnection -> WFReportParams )
Mar 1 13:36:37 xm01 kernel: [ 52.743309] block drbd2: Starting
asender thread (from drbd2_receiver [5120])
Mar 1 13:36:37 xm01 kernel: [ 52.743341] block drbd3: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:36:37 xm01 kernel: [ 52.743348] block drbd3: conn(
WFConnection -> WFReportParams )
Mar 1 13:36:37 xm01 kernel: [ 52.743410] block drbd3: Starting
asender thread (from drbd3_receiver [5124])
Mar 1 13:36:37 xm01 kernel: [ 52.743494] block drbd3:
data-integrity-alg: <not-used>
Mar 1 13:36:37 xm01 kernel: [ 52.743532] block drbd3: drbd_sync_handshake:
Mar 1 13:36:37 xm01 kernel: [ 52.743536] block drbd3: self
75C7AE841CB0682E:0000000000000000:99ABCCCBF1E4D000:99AACCCBF1E4D001
bits:5120 flags:0
Mar 1 13:36:37 xm01 kernel: [ 52.743540] block drbd3: peer
75C7AE841CB0682F:0000000000000000:99ABCCCBF1E4D001:99AACCCBF1E4D001
bits:0 flags:0
Mar 1 13:36:37 xm01 kernel: [ 52.743543] block drbd3:
uuid_compare()=1 by rule 40
Mar 1 13:36:37 xm01 kernel: [ 52.743550] block drbd3: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk(
Consistent -> UpToDate ) pdsk( DUnknown -> Consistent )
Mar 1 13:36:37 xm01 kernel: [ 52.743733] block drbd3: meta
connection shut down by peer.
Mar 1 13:36:37 xm01 kernel: [ 52.743740] block drbd3: peer(
Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk(
Consistent -> DUnknown )
Mar 1 13:36:37 xm01 kernel: [ 52.743878] block drbd3: sock_sendmsg
returned -32
Mar 1 13:36:37 xm01 kernel: [ 52.743884] block drbd3: short sent
ReportBitMap size=4096 sent=276
Mar 1 13:36:37 xm01 kernel: [ 52.743894] block drbd2:
data-integrity-alg: <not-used>
Mar 1 13:36:37 xm01 kernel: [ 52.743905] block drbd3: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:36:37 xm01 kernel: [ 52.743908] block drbd2: drbd_sync_handshake:
Mar 1 13:36:37 xm01 kernel: [ 52.743914] block drbd2: self
324E9CEEF0227FAC:0000000000000000:F91D77DB4FF3672A:F91C77DB4FF3672B
bits:12288 flags:0
Mar 1 13:36:37 xm01 kernel: [ 52.743918] block drbd2: peer
324E9CEEF0227FAD:0000000000000000:F91D77DB4FF3672B:F91C77DB4FF3672B
bits:0 flags:0
Mar 1 13:36:37 xm01 kernel: [ 52.743921] block drbd2:
uuid_compare()=1 by rule 40
Mar 1 13:36:37 xm01 kernel: [ 52.743928] block drbd2: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) disk(
Consistent -> UpToDate ) pdsk( DUnknown -> Consistent )
Mar 1 13:36:37 xm01 kernel: [ 52.744091] block drbd2: meta
connection shut down by peer.
Mar 1 13:36:37 xm01 kernel: [ 52.744097] block drbd2: peer(
Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk(
Consistent -> DUnknown )
Mar 1 13:36:37 xm01 kernel: [ 52.744279] block drbd2: sock_sendmsg
returned -32
Mar 1 13:36:37 xm01 kernel: [ 52.744283] block drbd2: short sent
ReportBitMap size=4096 sent=2180
Mar 1 13:36:37 xm01 kernel: [ 52.744335] block drbd2: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:36:37 xm01 kernel: [ 52.747349] block drbd2: 48 MB (12288
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:37 xm01 kernel: [ 52.747833] block drbd1: asender terminated
Mar 1 13:36:37 xm01 kernel: [ 52.747837] block drbd1: Terminating
asender thread
Mar 1 13:36:37 xm01 kernel: [ 52.747902] block drbd1: Connection closed
Mar 1 13:36:37 xm01 kernel: [ 52.747908] block drbd1: conn(
NetworkFailure -> Unconnected )
Mar 1 13:36:37 xm01 kernel: [ 52.747915] block drbd1: receiver terminated
Mar 1 13:36:37 xm01 kernel: [ 52.747917] block drbd1: Restarting
receiver thread
Mar 1 13:36:37 xm01 kernel: [ 52.747933] block drbd1: receiver (re)started
Mar 1 13:36:37 xm01 kernel: [ 52.747938] block drbd1: conn(
Unconnected -> WFConnection )
Mar 1 13:36:37 xm01 kernel: [ 52.749723] block drbd0: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:36:37 xm01 kernel: [ 52.749734] block drbd0: 224 MB
(57344 bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:37 xm01 kernel: [ 52.749775] block drbd0: Connection closed
Mar 1 13:36:37 xm01 kernel: [ 52.749780] block drbd0: conn(
BrokenPipe -> Unconnected )
Mar 1 13:36:37 xm01 kernel: [ 52.749787] block drbd0: receiver terminated
Mar 1 13:36:37 xm01 kernel: [ 52.749789] block drbd0: Restarting
receiver thread
Mar 1 13:36:37 xm01 kernel: [ 52.749792] block drbd0: receiver (re)started
Mar 1 13:36:37 xm01 kernel: [ 52.749796] block drbd0: conn(
Unconnected -> WFConnection )
Mar 1 13:36:37 xm01 kernel: [ 52.753343] block drbd2: asender terminated
Mar 1 13:36:37 xm01 kernel: [ 52.753347] block drbd2: Terminating
asender thread
Mar 1 13:36:37 xm01 kernel: [ 52.753391] block drbd2: Connection closed
Mar 1 13:36:37 xm01 kernel: [ 52.753395] block drbd2: conn(
NetworkFailure -> Unconnected )
Mar 1 13:36:37 xm01 kernel: [ 52.753399] block drbd2: receiver terminated
Mar 1 13:36:37 xm01 kernel: [ 52.753401] block drbd2: Restarting
receiver thread
Mar 1 13:36:37 xm01 kernel: [ 52.753403] block drbd2: receiver (re)started
Mar 1 13:36:37 xm01 kernel: [ 52.753407] block drbd2: conn(
Unconnected -> WFConnection )
Mar 1 13:36:37 xm01 kernel: [ 52.754182] block drbd3: 20 MB (5120
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:36:37 xm01 kernel: [ 52.769214] block drbd3: asender terminated
Mar 1 13:36:37 xm01 kernel: [ 52.769222] block drbd3: Terminating
asender thread
Mar 1 13:36:37 xm01 kernel: [ 52.769303] block drbd3: Connection closed
Mar 1 13:36:37 xm01 kernel: [ 52.769309] block drbd3: conn(
NetworkFailure -> Unconnected )
Mar 1 13:36:37 xm01 kernel: [ 52.769317] block drbd3: receiver terminated
Mar 1 13:36:37 xm01 kernel: [ 52.769320] block drbd3: Restarting
receiver thread
Mar 1 13:36:37 xm01 kernel: [ 52.769322] block drbd3: receiver (re)started
Mar 1 13:36:37 xm01 kernel: [ 52.769327] block drbd3: conn(
Unconnected -> WFConnection )
...
Mar 1 13:37:17 xm01 kernel: [ 93.073374] block drbd0: Handshake
successful: Agreed network protocol version 96
Mar 1 13:37:17 xm01 kernel: [ 93.073589] block drbd0: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:37:17 xm01 kernel: [ 93.073609] block drbd0: conn(
WFConnection -> WFReportParams )
Mar 1 13:37:17 xm01 kernel: [ 93.073647] block drbd0: Starting
asender thread (from drbd0_receiver [5110])
Mar 1 13:37:17 xm01 kernel: [ 93.073768] block drbd0:
data-integrity-alg: <not-used>
Mar 1 13:37:17 xm01 kernel: [ 93.073786] block drbd0: drbd_sync_handshake:
Mar 1 13:37:17 xm01 kernel: [ 93.073790] block drbd0: self
EEDF542BD48564B4:0000000000000000:AF298F27A3172092:AF288F27A3172093
bits:57344 flags:0
Mar 1 13:37:17 xm01 kernel: [ 93.073794] block drbd0: peer
EEDF542BD48564B5:0000000000000000:AF298F27A3172093:AF288F27A3172093
bits:0 flags:0
Mar 1 13:37:17 xm01 kernel: [ 93.073798] block drbd0:
uuid_compare()=1 by rule 40
Mar 1 13:37:17 xm01 kernel: [ 93.073804] block drbd0: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk(
DUnknown -> Consistent )
Mar 1 13:37:17 xm01 kernel: [ 93.073985] block drbd0: sock_sendmsg
returned -32
Mar 1 13:37:17 xm01 kernel: [ 93.073990] block drbd0: peer(
Primary -> Unknown ) conn( WFBitMapS -> BrokenPipe ) pdsk( Consistent
-> DUnknown )
Mar 1 13:37:17 xm01 kernel: [ 93.073998] block drbd0: short sent
ReportBitMap size=4096 sent=732
Mar 1 13:37:17 xm01 kernel: [ 93.074015] block drbd0: meta
connection shut down by peer.
Mar 1 13:37:17 xm01 kernel: [ 93.074021] block drbd0: asender terminated
Mar 1 13:37:17 xm01 kernel: [ 93.074024] block drbd0: Terminating
asender thread
Mar 1 13:37:17 xm01 kernel: [ 93.077364] block drbd3: Handshake
successful: Agreed network protocol version 96
Mar 1 13:37:17 xm01 kernel: [ 93.078584] block drbd3: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:37:17 xm01 kernel: [ 93.078593] block drbd3: conn(
WFConnection -> WFReportParams )
Mar 1 13:37:17 xm01 kernel: [ 93.078633] block drbd3: Starting
asender thread (from drbd3_receiver [5124])
Mar 1 13:37:17 xm01 kernel: [ 93.078756] block drbd3:
data-integrity-alg: <not-used>
Mar 1 13:37:17 xm01 kernel: [ 93.078786] block drbd3: drbd_sync_handshake:
Mar 1 13:37:17 xm01 kernel: [ 93.078790] block drbd3: self
75C7AE841CB0682E:0000000000000000:99ABCCCBF1E4D000:99AACCCBF1E4D001
bits:5120 flags:0
Mar 1 13:37:17 xm01 kernel: [ 93.078794] block drbd3: peer
75C7AE841CB0682F:0000000000000000:99ABCCCBF1E4D001:99AACCCBF1E4D001
bits:0 flags:0
Mar 1 13:37:17 xm01 kernel: [ 93.078797] block drbd3:
uuid_compare()=1 by rule 40
Mar 1 13:37:17 xm01 kernel: [ 93.078803] block drbd3: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk(
DUnknown -> Consistent )
Mar 1 13:37:17 xm01 kernel: [ 93.078925] block drbd3: meta
connection shut down by peer.
Mar 1 13:37:17 xm01 kernel: [ 93.078930] block drbd3: peer(
Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk(
Consistent -> DUnknown )
Mar 1 13:37:17 xm01 kernel: [ 93.078970] block drbd3: sock_sendmsg
returned -32
Mar 1 13:37:17 xm01 kernel: [ 93.078975] block drbd3: short sent
ReportBitMap size=4096 sent=276
Mar 1 13:37:17 xm01 kernel: [ 93.078983] block drbd3: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:37:17 xm01 kernel: [ 93.084657] block drbd0: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:37:17 xm01 kernel: [ 93.084668] block drbd0: 224 MB
(57344 bits) marked out-of-sync by on disk bit-map.
Mar 1 13:37:17 xm01 kernel: [ 93.084678] block drbd0: Connection closed
Mar 1 13:37:17 xm01 kernel: [ 93.084683] block drbd0: conn(
BrokenPipe -> Unconnected )
Mar 1 13:37:17 xm01 kernel: [ 93.084687] block drbd0: receiver terminated
Mar 1 13:37:17 xm01 kernel: [ 93.084689] block drbd0: Restarting
receiver thread
Mar 1 13:37:17 xm01 kernel: [ 93.084692] block drbd0: receiver (re)started
Mar 1 13:37:17 xm01 kernel: [ 93.084696] block drbd0: conn(
Unconnected -> WFConnection )
Mar 1 13:37:17 xm01 kernel: [ 93.089359] block drbd1: Handshake
successful: Agreed network protocol version 96
Mar 1 13:37:17 xm01 kernel: [ 93.089575] block drbd1: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:37:17 xm01 kernel: [ 93.089582] block drbd1: conn(
WFConnection -> WFReportParams )
Mar 1 13:37:17 xm01 kernel: [ 93.089595] block drbd1: Starting
asender thread (from drbd1_receiver [5114])
Mar 1 13:37:17 xm01 kernel: [ 93.089691] block drbd1:
data-integrity-alg: <not-used>
Mar 1 13:37:17 xm01 kernel: [ 93.089745] block drbd1: drbd_sync_handshake:
Mar 1 13:37:17 xm01 kernel: [ 93.089749] block drbd1: self
E6E23470FD3656AC:0000000000000000:65C464E576893480:65C364E576893481
bits:30720 flags:0
Mar 1 13:37:17 xm01 kernel: [ 93.089753] block drbd1: peer
E6E23470FD3656AD:0000000000000000:65C464E576893481:65C364E576893481
bits:0 flags:0
Mar 1 13:37:17 xm01 kernel: [ 93.089757] block drbd1:
uuid_compare()=1 by rule 40
Mar 1 13:37:17 xm01 kernel: [ 93.089762] block drbd1: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk(
DUnknown -> Consistent )
Mar 1 13:37:17 xm01 kernel: [ 93.089862] block drbd1: meta
connection shut down by peer.
Mar 1 13:37:17 xm01 kernel: [ 93.089868] block drbd1: peer(
Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk(
Consistent -> DUnknown )
Mar 1 13:37:17 xm01 kernel: [ 93.089931] block drbd1: sock_sendmsg
returned -32
Mar 1 13:37:17 xm01 kernel: [ 93.089935] block drbd1: short sent
ReportBitMap size=4096 sent=2180
Mar 1 13:37:17 xm01 kernel: [ 93.089985] block drbd1: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:37:17 xm01 kernel: [ 93.094402] block drbd1: 120 MB
(30720 bits) marked out-of-sync by on disk bit-map.
Mar 1 13:37:17 xm01 kernel: [ 93.100362] block drbd1: asender terminated
Mar 1 13:37:17 xm01 kernel: [ 93.100367] block drbd1: Terminating
asender thread
Mar 1 13:37:17 xm01 kernel: [ 93.100451] block drbd1: Connection closed
Mar 1 13:37:17 xm01 kernel: [ 93.100456] block drbd1: conn(
NetworkFailure -> Unconnected )
Mar 1 13:37:17 xm01 kernel: [ 93.100464] block drbd1: receiver terminated
Mar 1 13:37:17 xm01 kernel: [ 93.100466] block drbd1: Restarting
receiver thread
Mar 1 13:37:17 xm01 kernel: [ 93.100468] block drbd1: receiver (re)started
Mar 1 13:37:17 xm01 kernel: [ 93.100472] block drbd1: conn(
Unconnected -> WFConnection )
Mar 1 13:37:17 xm01 kernel: [ 93.102859] block drbd3: 20 MB (5120
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:37:17 xm01 kernel: [ 93.119786] block drbd3: asender terminated
Mar 1 13:37:17 xm01 kernel: [ 93.119794] block drbd3: Terminating
asender thread
Mar 1 13:37:17 xm01 kernel: [ 93.119847] block drbd3: Connection closed
Mar 1 13:37:17 xm01 kernel: [ 93.119853] block drbd3: conn(
NetworkFailure -> Unconnected )
Mar 1 13:37:17 xm01 kernel: [ 93.119859] block drbd3: receiver terminated
Mar 1 13:37:17 xm01 kernel: [ 93.119861] block drbd3: Restarting
receiver thread
Mar 1 13:37:17 xm01 kernel: [ 93.119864] block drbd3: receiver (re)started
Mar 1 13:37:17 xm01 kernel: [ 93.119868] block drbd3: conn(
Unconnected -> WFConnection )
Mar 1 13:37:17 xm01 kernel: [ 93.625232] block drbd2: Handshake
successful: Agreed network protocol version 96
Mar 1 13:37:17 xm01 kernel: [ 93.625450] block drbd2: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:37:17 xm01 kernel: [ 93.625460] block drbd2: conn(
WFConnection -> WFReportParams )
Mar 1 13:37:17 xm01 kernel: [ 93.625476] block drbd2: Starting
asender thread (from drbd2_receiver [5120])
Mar 1 13:37:17 xm01 kernel: [ 93.625592] block drbd2:
data-integrity-alg: <not-used>
Mar 1 13:37:17 xm01 kernel: [ 93.625639] block drbd2: drbd_sync_handshake:
Mar 1 13:37:17 xm01 kernel: [ 93.625643] block drbd2: self
324E9CEEF0227FAC:0000000000000000:F91D77DB4FF3672A:F91C77DB4FF3672B
bits:12288 flags:0
Mar 1 13:37:17 xm01 kernel: [ 93.625647] block drbd2: peer
324E9CEEF0227FAD:0000000000000000:F91D77DB4FF3672B:F91C77DB4FF3672B
bits:0 flags:0
Mar 1 13:37:17 xm01 kernel: [ 93.625651] block drbd2:
uuid_compare()=1 by rule 40
Mar 1 13:37:17 xm01 kernel: [ 93.625657] block drbd2: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk(
DUnknown -> Consistent )
Mar 1 13:37:17 xm01 kernel: [ 93.625804] block drbd2: meta
connection shut down by peer.
Mar 1 13:37:17 xm01 kernel: [ 93.625812] block drbd2: peer(
Primary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk(
Consistent -> DUnknown )
Mar 1 13:37:17 xm01 kernel: [ 93.625819] block drbd2: sock_sendmsg
returned -32
Mar 1 13:37:17 xm01 kernel: [ 93.625824] block drbd2: short sent
ReportBitMap size=4096 sent=2180
Mar 1 13:37:17 xm01 kernel: [ 93.625875] block drbd2: bitmap WRITE
of 0 pages took 0 jiffies
Mar 1 13:37:17 xm01 kernel: [ 93.632366] block drbd2: 48 MB (12288
bits) marked out-of-sync by on disk bit-map.
Mar 1 13:37:17 xm01 kernel: [ 93.638339] block drbd2: asender terminated
Mar 1 13:37:17 xm01 kernel: [ 93.638344] block drbd2: Terminating
asender thread
Mar 1 13:37:17 xm01 kernel: [ 93.638395] block drbd2: Connection closed
Mar 1 13:37:17 xm01 kernel: [ 93.638400] block drbd2: conn(
NetworkFailure -> Unconnected )
Mar 1 13:37:17 xm01 kernel: [ 93.638405] block drbd2: receiver terminated
Mar 1 13:37:17 xm01 kernel: [ 93.638407] block drbd2: Restarting
receiver thread
Mar 1 13:37:17 xm01 kernel: [ 93.638409] block drbd2: receiver (re)started
Mar 1 13:37:18 xm01 kernel: [ 93.638413] block drbd2: conn(
Unconnected -> WFConnection )
Mar 1 13:37:19 xm01 lrmd: [5649]: info: rsc:vmconfig:0 promote[20] (pid 6032)
Mar 1 13:37:19 xm01 lrmd: [5649]: info: RA output:
(vmconfig:0:promote:stdout) allow-two-primaries;
Mar 1 13:37:19 xm01 kernel: [ 94.962300] block drbd3: helper
command: /sbin/drbdadm fence-peer minor-3
Mar 1 13:37:20 xm01 lrmd: [5649]: info: RA output:
(vmconfig:0:promote:stderr) 3: State change failed: (-7) Refusing to
be Primary while peer is not outdated
Mar 1 13:37:20 xm01 lrmd: [5649]: info: RA output:
(vmconfig:0:promote:stderr) Command 'drbdsetup 3 primary
Mar 1 13:37:20 xm01 lrmd: [5649]: info: RA output:
(vmconfig:0:promote:stderr) ' terminated with exit code 11
Mar 1 13:37:20 xm01 kernel: [ 95.978113] block drbd3: helper
command: /sbin/drbdadm fence-peer minor-3 exit code 126 (0x7e00)
Mar 1 13:37:20 xm01 kernel: [ 95.978117] block drbd3: fence-peer
helper broken, returned 126
Mar 1 13:37:20 xm01 kernel: [ 95.978124] block drbd3: State change
failed: Refusing to be Primary while peer is not outdated
Mar 1 13:37:20 xm01 kernel: [ 95.978128] block drbd3: state = {
cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown r----- }
Mar 1 13:37:20 xm01 kernel: [ 95.978132] block drbd3: wanted = {
cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s---F- }
Mar 1 13:37:20 xm01 drbd[6032]: ERROR: vmconfig: Called drbdadm -c
/etc/drbd.conf primary vmconfig
Mar 1 13:37:20 xm01 drbd[6032]: ERROR: vmconfig: Exit code 11
Mar 1 13:37:20 xm01 drbd[6032]: ERROR: vmconfig: Command output:
Mar 1 13:37:20 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stdout)
Mar 1 13:37:20 xm01 kernel: [ 96.012979] block drbd3: helper
command: /sbin/drbdadm fence-peer minor-3
Mar 1 13:37:21 xm01 lrmd: [5649]: info: RA output:
(vmconfig:0:promote:stderr) 3: State change failed: (-7) Refusing to
be Primary while peer is not outdated
Mar 1 13:37:21 xm01 lrmd: [5649]: info: RA output:
(vmconfig:0:promote:stderr) Command 'drbdsetup 3 primary
Mar 1 13:37:21 xm01 lrmd: [5649]: info: RA output:
(vmconfig:0:promote:stderr) ' terminated with exit code 11
Mar 1 13:37:21 xm01 kernel: [ 97.020366] block drbd3: helper
command: /sbin/drbdadm fence-peer minor-3 exit code 126 (0x7e00)
Mar 1 13:37:21 xm01 kernel: [ 97.020369] block drbd3: fence-peer
helper broken, returned 126
Mar 1 13:37:21 xm01 kernel: [ 97.020375] block drbd3: State change
failed: Refusing to be Primary while peer is not outdated
Mar 1 13:37:21 xm01 kernel: [ 97.020379] block drbd3: state = {
cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown r----- }
Mar 1 13:37:21 xm01 kernel: [ 97.020383] block drbd3: wanted = {
cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s---F- }
Mar 1 13:37:21 xm01 drbd[6032]: ERROR: vmconfig: Called drbdadm -c
/etc/drbd.conf primary vmconfig
Mar 1 13:37:21 xm01 drbd[6032]: ERROR: vmconfig: Exit code 11
This repeats several times, until I get this:
Mar 1 13:38:47 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:promote:stdout)
Mar 1 13:38:48 xm01 kernel: [ 184.088528] block drbd3: helper
command: /sbin/drbdadm fence-peer minor-3
Mar 1 13:38:49 xm01 lrmd: [5649]: WARN: vmconfig:0:promote process
(PID 6032) timed out (try 1). Killing with signal SIGTERM (15).
Mar 1 13:38:49 xm01 lrmd: [5649]: WARN: operation promote[20] on
vmconfig:0 for client 5652: pid 6032 timed out
Mar 1 13:38:49 xm01 crmd: [5652]: ERROR: process_lrm_event: LRM
operation vmconfig:0_promote_0 (20) Timed Out (timeout=90000ms)
Mar 1 13:38:49 xm01 attrd: [5650]: notice: attrd_ais_dispatch:
Update relayed from xm02
Mar 1 13:38:49 xm01 attrd: [5650]: info: find_hash_entry: Creating
hash entry for fail-count-vmconfig:0
Mar 1 13:38:49 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing
key=211:6:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmconfig:0_notify_0 )
Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_local_callback:
Expanded fail-count-vmconfig:0=value++ to 1
Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-vmconfig:0 (1)
Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_perform_update: Sent
update 33: fail-count-vmconfig:0=1
Mar 1 13:38:49 xm01 lrmd: [5649]: info: rsc:vmconfig:0 notify[21] (pid 7100)
Mar 1 13:38:49 xm01 attrd: [5650]: notice: attrd_ais_dispatch:
Update relayed from xm02
Mar 1 13:38:49 xm01 attrd: [5650]: info: find_hash_entry: Creating
hash entry for last-failure-vmconfig:0
Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-vmconfig:0 (1330619909)
Mar 1 13:38:49 xm01 attrd: [5650]: info: attrd_perform_update: Sent
update 36: last-failure-vmconfig:0=1330619909
Mar 1 13:38:52 xm01 lrmd: [5649]: info: RA output:
(vmconfig:0:notify:stderr) lock on /var/lock/drbd-147-3 currently
held by pid:7099
Mar 1 13:38:52 xm01 crm_attribute: [7128]: info: Invoked:
crm_attribute -N xm01 -n master-vmconfig:0 -l reboot -D
Mar 1 13:38:52 xm01 attrd: [5650]: info: attrd_trigger_update:
Sending flush op to all hosts for: master-vmconfig:0 (<null>)
Mar 1 13:38:52 xm01 attrd: [5650]: info: attrd_perform_update: Sent
delete 38: node=xm01, attr=master-vmconfig:0, id=<n/a>, set=(null),
section=status
Mar 1 13:38:52 xm01 attrd: [5650]: info: attrd_perform_update: Sent
delete 40: node=xm01, attr=master-vmconfig:0, id=<n/a>, set=(null),
section=status
Mar 1 13:38:52 xm01 lrmd: [5649]: info: RA output: (vmconfig:0:notify:stdout)
Mar 1 13:38:52 xm01 lrmd: [5649]: info: operation notify[21] on
vmconfig:0 for client 5652: pid 7100 exited with return code 0
Mar 1 13:38:52 xm01 crmd: [5652]: info: process_lrm_event: LRM
operation vmconfig:0_notify_0 (call=21, rc=0, cib-update=26, confirmed=true) ok
Mar 1 13:38:55 xm01 external/ipmi[7135]: [7146]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:38:56 xm01 stonith: [7131]: info: external/ipmi device OK.
Mar 1 13:39:01 xm01 /usr/sbin/cron[7148]: (root) CMD
(/usr/sbin/logwatch --service dmeventd)
Mar 1 13:39:11 xm01 external/ipmi[7176]: [7187]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:39:12 xm01 stonith: [7172]: info: external/ipmi device OK.
Mar 1 13:39:19 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing
key=216:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmconfig:0_notify_0 )
Mar 1 13:39:19 xm01 lrmd: [5649]: info: rsc:vmconfig:0 notify[22] (pid 7188)
Mar 1 13:39:19 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing
key=224:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmsvn-drbd:0_notify_0 )
Mar 1 13:39:19 xm01 lrmd: [5649]: info: rsc:vmsvn-drbd:0 notify[23] (pid 7189)
Mar 1 13:39:19 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing
key=232:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=srvsvn1-drbd:0_notify_0 )
Mar 1 13:39:19 xm01 lrmd: [5649]: info: rsc:srvsvn1-drbd:0
notify[24] (pid 7190)
Mar 1 13:39:19 xm01 crmd: [5652]: info: do_lrm_rsc_op: Performing
key=240:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=srvsvn2-drbd:0_notify_0 )
Mar 1 13:39:19 xm01 lrmd: [5649]: info: rsc:srvsvn2-drbd:0
notify[25] (pid 7191)
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: crm_new_peer: Node
xm02 now has id: 33554532
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: crm_new_peer: Node
33554532 is now known as xm02
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: stonith_queryQuery
<stonith_command t="stonith-ng"
st_async_id="c1be22cc-e535-441c-a674-89551a2b9d4c" st_op="st_query"
st_callid="0" st_callopt="0" st_
remote_op="c1be22cc-e535-441c-a674-89551a2b9d4c" st_target="xm02"
st_device_action="reboot"
st_clientid="bb653c7a-6351-4517-ad06-6fb0e20fe375" st_timeout="6000"
src="xm02" seq="5" />
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info:
can_fence_host_with_device: Refreshing port list for ipmi-stonith-xm02
Mar 1 13:39:19 xm01 stonith-ng: [5647]: WARN: parse_host_line: Could
not parse (0 0):
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info:
can_fence_host_with_device: ipmi-stonith-xm02 can fence xm02: dynamic-list
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: stonith_query: Found 1
matching devices for 'xm02'
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: stonith_fenceExec
<stonith_command t="stonith-ng"
st_async_id="c1be22cc-e535-441c-a674-89551a2b9d4c" st_op="st_fence"
st_callid="0" st_callopt="0" st_r
emote_op="c1be22cc-e535-441c-a674-89551a2b9d4c" st_target="xm02"
st_device_action="reboot" st_timeout="54000" src="xm02" seq="7" />
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info:
can_fence_host_with_device: ipmi-stonith-xm02 can fence xm02: dynamic-list
Mar 1 13:39:19 xm01 stonith-ng: [5647]: info: stonith_fence: Found 1
matching devices for 'xm02'
Mar 1 13:39:20 xm01 external/ipmi[7288]: [7302]: debug: ipmitool
output: Chassis Power Control: Reset
Mar 1 13:39:21 xm01 stonith-ng: [5647]: info: log_operation:
Operation 'reboot' [7277] for host 'xm02' with device
'ipmi-stonith-xm02' returned: 0 (call 0 from (null))
Mar 1 13:39:21 xm01 lrmd: [5649]: info: operation notify[22] on
vmconfig:0 for client 5652: pid 7188 exited with return code 0
Mar 1 13:39:21 xm01 crmd: [5652]: info: process_lrm_event: LRM
operation vmconfig:0_notify_0 (call=22, rc=0, cib-update=27, confirmed=true) ok
Mar 1 13:39:22 xm01 kernel: [ 218.177661] bnx2: eth1 NIC Copper Link is Down
Mar 1 13:39:24 xm01 kernel: [ 220.488280] bnx2: eth1 NIC Copper
Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Mar 1 13:39:24 xm01 corosync[5621]: [TOTEM ] A processor failed,
forming new configuration.
Mar 1 13:39:27 xm01 external/ipmi[7311]: [7322]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:39:28 xm01 stonith: [7307]: info: external/ipmi device OK.
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] CLM CONFIGURATION CHANGE
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] New Configuration:
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] r(0) ip(100.0.0.1)
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] Members Left:
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] r(0) ip(100.0.0.2)
Mar 1 13:39:30 xm01 cib: [5648]: notice: ais_dispatch_message:
Membership 1028: quorum lost
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] Members Joined:
Mar 1 13:39:30 xm01 crmd: [5652]: notice: ais_dispatch_message:
Membership 1028: quorum lost
Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] notice:
pcmk_peer_update: Transitional membership event on ring 1028: memb=1,
new=0, lost=1
Mar 1 13:39:30 xm01 cib: [5648]: info: crm_update_peer: Node xm02:
id=33554532 state=lost (new) addr=r(0) ip(100.0.0.2) votes=1
born=1016 seen=1024 proc=00000000000000000000000000151312
Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info:
pcmk_peer_update: memb: xm01 16777316
Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info:
pcmk_peer_update: lost: xm02 33554532
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] CLM CONFIGURATION CHANGE
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] New Configuration:
Mar 1 13:39:30 xm01 crmd: [5652]: info: ais_status_callback: status:
xm02 is now lost (was member)
Mar 1 13:39:30 xm01 crmd: [5652]: info: crm_update_peer: Node xm02:
id=33554532 state=lost (new) addr=r(0) ip(100.0.0.2) votes=1
born=1016 seen=1024 proc=00000000000000000000000000151312
Mar 1 13:39:30 xm01 stonith-ng: [5647]: info:
process_remote_stonith_execExecResult <st-reply
st_origin="stonith_construct_async_reply" t="stonith-ng"
st_op="st_notify" st_remote_op="c1be22cc-e535-
441c-a674-89551a2b9d4c" st_callid="0" st_callopt="0" st_rc="0"
st_output="Performing: stonith -t external/ipmi -T reset xm02
success: xm02 0 " src="xm01" seq="2" />
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] r(0) ip(100.0.0.1)
Mar 1 13:39:30 xm01 crmd: [5652]: WARN: check_dead_member: Our DC
node (xm02) left the cluster
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] Members Left:
Mar 1 13:39:30 xm01 stonith-ng: [5647]: info: remote_op_done:
Notifing clients of c1be22cc-e535-441c-a674-89551a2b9d4c (reboot of
xm02 from bb653c7a-6351-4517-ad06-6fb0e20fe375 by xm01): 0, rc=0
Mar 1 13:39:30 xm01 corosync[5621]: [CLM ] Members Joined:
Mar 1 13:39:30 xm01 stonith-ng: [5647]: info: stonith_notify_client:
Sending st_fence-notification to client
5652/c9e6b033-73f2-43a9-b848-81bffa3c6d9b
Mar 1 13:39:30 xm01 crmd: [5652]: info: do_state_transition: State
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=check_dead_member ]
Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] notice:
pcmk_peer_update: Stable membership event on ring 1028: memb=1, new=0, lost=0
Mar 1 13:39:30 xm01 crmd: [5652]: info: update_dc: Unset DC xm02
Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info:
pcmk_peer_update: MEMB: xm01 16777316
Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info:
ais_mark_unseen_peer_dead: Node xm02 was not seen in the previous transition
Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info: update_member:
Node 33554532/xm02 is now: lost
Mar 1 13:39:30 xm01 corosync[5621]: [pcmk ] info:
send_member_notification: Sending membership update 1028 to 2 children
Mar 1 13:39:30 xm01 corosync[5621]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Mar 1 13:39:30 xm01 corosync[5621]: [CPG ] chosen downlist:
sender r(0) ip(100.0.0.1) ; members(old:2 left:1)
Mar 1 13:39:30 xm01 crmd: [5652]: info: tengine_stonith_notify: Peer
xm02 was terminated (reboot) by xm01 for xm02
(ref=c1be22cc-e535-441c-a674-89551a2b9d4c): OK
Mar 1 13:39:30 xm01 crmd: [5652]: notice: tengine_stonith_notify:
Target was our leader xm02/xm02 (recorded leader: <unset>)
Mar 1 13:39:30 xm01 corosync[5621]: [MAIN ] Completed service
synchronization, ready to provide service.
Mar 1 13:39:30 xm01 crmd: [5652]: info: send_stonith_update: Sending
fencing update 28 for xm02
Mar 1 13:39:30 xm01 crmd: [5652]: notice: crmd_peer_update: Status
update: Client xm02/crmd now has status [offline] (DC=<null>)
Mar 1 13:39:30 xm01 crmd: [5652]: info: crm_update_peer: Node xm02:
id=33554532 state=lost addr=r(0) ip(100.0.0.2) votes=1 born=1016
seen=1024 proc=00000000000000000000000000000001 (new)
Mar 1 13:39:30 xm01 crmd: [5652]: info: cib_fencing_updated: Fencing
update 28 for xm02: complete
Mar 1 13:39:30 xm01 crmd: [5652]: info: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
At 13:39:32, XM02 was STONITHed.
Why?
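A side note on the xm01 log above: every fence-peer call ends with "exit code 126 (0x7e00)". The hex value looks like a wait()-style status with the exit code in the high byte, and 126 is what POSIX shells conventionally return when a command is found but cannot be executed. That may mean the handler configured in drbd.conf (/usr/lib/drbd/stonith_admin-fence-peer.sh) is missing its execute bit or is otherwise not runnable, though I have not confirmed this. Below is a minimal sketch for pulling these lines out of a log and decoding the status. The function name and the returned field names are my own, not part of DRBD, and the regex assumes the message wording shown above.

```python
import re

# Matches the kernel message logged when a DRBD helper finishes, e.g.
#   block drbd3: helper command: /sbin/drbdadm fence-peer minor-3 exit code 126 (0x7e00)
# Whitespace is matched loosely because the mail client wrapped the lines.
HELPER_RE = re.compile(
    r"block\s+(?P<dev>drbd\d+):\s+helper\s+command:\s+(?P<cmd>.+?)\s+"
    r"exit\s+code\s+(?P<code>\d+)\s+\((?P<raw>0x[0-9a-fA-F]+)\)"
)


def decode_helper_exit(text):
    """Return one dict per 'helper command ... exit code' message in text."""
    results = []
    for m in HELPER_RE.finditer(text):
        raw = int(m.group("raw"), 16)
        results.append({
            "device": m.group("dev"),
            "command": m.group("cmd"),
            "exit_code": int(m.group("code")),
            # High byte of a wait()-style status holds the exit code.
            "exit_from_raw": (raw >> 8) & 0xFF,
            # 126 = found but not executable (POSIX shell convention).
            "not_executable": int(m.group("code")) == 126,
        })
    return results


sample = (
    "Mar 1 13:37:21 xm01 kernel: [ 97.020366] block drbd3: helper\n"
    "command: /sbin/drbdadm fence-peer minor-3 exit code 126 (0x7e00)\n"
)
for entry in decode_helper_exit(sample):
    print(entry)
```

Running it over the full syslog from both nodes would show whether every DRBD minor hits the same exit code.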
With the drbd.conf modifications, I no longer have the constraints
(which is fine!) and both nodes become Master. But the VM never fails
over to XM02 as it should when XM01 goes down.
Here's the XM02 log from about 13:33:00 to 13:36:40, by which time XM01 is up again.
Mar 1 13:32:56 xm02 mgmtd: [6300]: info: CIB query: cib
Mar 1 13:33:01 xm02 /usr/sbin/cron[8783]: (root) CMD
(/usr/sbin/logwatch --service dmeventd)
Mar 1 13:33:09 xm02 external/ipmi[8858]: [8869]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:33:10 xm02 stonith: [8854]: info: external/ipmi device OK.
Mar 1 13:33:21 xm02 kernel: [ 238.815026] bnx2: eth1 NIC Copper Link is Down
Mar 1 13:33:23 xm02 kernel: [ 241.298581] bnx2: eth1 NIC Copper
Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Mar 1 13:33:27 xm02 external/ipmi[9035]: [9061]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:33:28 xm02 stonith: [9031]: info: external/ipmi device OK.
Mar 1 13:33:36 xm02 kernel: [ 254.005922] bnx2: eth1 NIC Copper Link is Down
Mar 1 13:33:39 xm02 kernel: [ 256.432743] bnx2: eth1 NIC Copper
Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Mar 1 13:33:39 xm02 kernel: [ 256.820486] bnx2: eth1 NIC Copper Link is Down
Mar 1 13:33:41 xm02 kernel: [ 259.290456] bnx2: eth1 NIC Copper
Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Mar 1 13:33:44 xm02 external/ipmi[9254]: [9265]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:33:45 xm02 stonith: [9250]: info: external/ipmi device OK.
Mar 1 13:33:55 xm02 lrmd: [6296]: WARN: VMSVN:start process (PID
8644) timed out (try 1). Killing with signal SIGTERM (15).
Mar 1 13:34:00 xm02 lrmd: [6296]: WARN: VMSVN:start process (PID
8644) timed out (try 2). Killing with signal SIGKILL (9).
Mar 1 13:34:00 xm02 external/ipmi[9478]: [9489]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:34:01 xm02 /usr/sbin/cron[9491]: (root) CMD
(/usr/sbin/logwatch --service dmeventd)
Mar 1 13:34:01 xm02 stonith: [9474]: info: external/ipmi device OK.
Mar 1 13:34:05 xm02 lrmd: [6296]: ERROR: TrackedProcTimeoutFunction:
VMSVN:start process (PID 8644) will not die!
Mar 1 13:34:17 xm02 external/ipmi[9697]: [9708]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:34:18 xm02 stonith: [9693]: info: external/ipmi device OK.
Mar 1 13:34:31 xm02 kernel: [ 308.659429] bnx2: eth1 NIC Copper Link is Down
Mar 1 13:34:33 xm02 external/ipmi[9804]: [9815]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:34:33 xm02 kernel: [ 311.171354] bnx2: eth1 NIC Copper
Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Mar 1 13:34:34 xm02 stonith: [9800]: info: external/ipmi device OK.
Mar 1 13:34:50 xm02 external/ipmi[10014]: [10025]: debug: ipmitool
output: Chassis Power is on
Mar 1 13:34:51 xm02 stonith: [10010]: info: external/ipmi device OK.
Mar 1 13:34:55 xm02 crmd: [6299]: WARN: action_timer_callback: Timer
popped (timeout=60000, abort_level=1000000, complete=false)
Mar 1 13:34:55 xm02 crmd: [6299]: ERROR: print_elem: Aborting
transition, action lost: [Action 180]: In-flight (id: VMSVN_start_0,
loc: xm02, priority: 0)
Mar 1 13:34:55 xm02 crmd: [6299]: info: abort_transition_graph:
action_timer_callback:486 - Triggered transition abort (complete=0) :
Action lost
Mar 1 13:34:55 xm02 crmd: [6299]: WARN: cib_action_update: rsc_op
180: VMSVN_start_0 on xm02 timed out
Mar 1 13:34:55 xm02 crmd: [6299]: info: create_operation_update:
cib_action_update: Updating resouce VMSVN after Timed Out start op (interval=0)
Mar 1 13:34:55 xm02 crmd: [6299]: info: run_graph:
====================================================
Mar 1 13:34:55 xm02 crmd: [6299]: notice: run_graph: Transition 0
(Complete=31, Pending=0, Fired=0, Skipped=35, Incomplete=37,
Source=/var/lib/pengine/pe-warn-309.bz2): Stopped
Mar 1 13:34:55 xm02 crmd: [6299]: info: te_graph_trigger: Transition
0 is now complete
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke: Query 78:
Requesting the current CIB: S_POLICY_ENGINE
Mar 1 13:34:55 xm02 crmd: [6299]: info: process_graph_event: Action
VMSVN_start_0 arrived after a completed transition
Mar 1 13:34:55 xm02 crmd: [6299]: info: abort_transition_graph:
process_graph_event:482 - Triggered transition abort (complete=1,
tag=lrm_rsc_op, id=VMSVN_start_0, magic=2:1;180:0:0:8b7a050b-901b-4
db7-b1f7-c3c5dd8a9653, cib=0.2472.134) : Inactive graph
Mar 1 13:34:55 xm02 crmd: [6299]: WARN: update_failcount: Updating
failcount for VMSVN on xm02 after failed start: rc=1
(update=INFINITY, time=1330619695)
Mar 1 13:34:55 xm02 attrd: [6297]: info: find_hash_entry: Creating
hash entry for fail-count-VMSVN
Mar 1 13:34:55 xm02 attrd: [6297]: info: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-VMSVN (INFINITY)
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke: Query 79:
Requesting the current CIB: S_POLICY_ENGINE
Mar 1 13:34:55 xm02 attrd: [6297]: info: attrd_perform_update: Sent
update 35: fail-count-VMSVN=INFINITY
Mar 1 13:34:55 xm02 attrd: [6297]: info: find_hash_entry: Creating
hash entry for last-failure-VMSVN
Mar 1 13:34:55 xm02 attrd: [6297]: info: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-VMSVN (1330619695)
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke_callback:
Invoking the PE: query=79, ref=pe_calc-dc-1330619695-20, seq=1020, quorate=0
Mar 1 13:34:55 xm02 crmd: [6299]: info: abort_transition_graph:
te_update_diff:142 - Triggered transition abort (complete=1,
tag=nvpair, id=status-xm02-fail-count-VMSVN, magic=NA, cib=0.2472.135) :
Transient attribute: update
Mar 1 13:34:55 xm02 attrd: [6297]: info: attrd_perform_update: Sent
update 38: last-failure-VMSVN=1330619695
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Mar 1 13:34:55 xm02 crmd: [6299]: info: abort_transition_graph:
te_update_diff:142 - Triggered transition abort (complete=1,
tag=nvpair, id=status-xm02-last-failure-VMSVN, magic=NA, cib=0.2472.136)
: Transient attribute: update
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke: Query 80:
Requesting the current CIB: S_POLICY_ENGINE
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke: Query 81:
Requesting the current CIB: S_POLICY_ENGINE
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op:
Operation vmsvn-drbd:1_monitor_0 found resource vmsvn-drbd:1 active on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op:
Operation srvsvn1-drbd:1_monitor_0 found resource srvsvn1-drbd:1 active on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op:
Operation srvsvn2-drbd:1_monitor_0 found resource srvsvn2-drbd:1 active on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op:
Operation vmconfig:1_monitor_0 found resource vmconfig:1 active on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: WARN: unpack_rsc_op: Processing
failed op VMSVN_start_0 on xm02: unknown error (1)
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_pe_invoke_callback:
Invoking the PE: query=81, ref=pe_calc-dc-1330619695-21, seq=1020, quorate=0
Mar 1 13:34:55 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (30s) for VMSVN on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave ipmi-stonith-xm01 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave ipmi-stonith-xm02 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig:1 (Master xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmsvn-drbd:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmsvn-drbd:1 (Master xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave srvsvn1-drbd:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave srvsvn1-drbd:1 (Master xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave srvsvn2-drbd:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave srvsvn2-drbd:1 (Master xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave dlm:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave o2cb:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave clvm:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave dlm:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave o2cb:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave clvm:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig-pri:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig-pri:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vg_svn:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vg_svn:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Recover
VMSVN (Started xm02)
Mar 1 13:34:55 xm02 crmd: [6299]: info: handle_response: pe_calc
calculation pe_calc-dc-1330619695-20 is obsolete
Mar 1 13:34:55 xm02 pengine: [6298]: notice: process_pe_message:
Transition 1: PEngine Input stored in: /var/lib/pengine/pe-input-2288.bz2
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op:
Operation vmsvn-drbd:1_monitor_0 found resource vmsvn-drbd:1 active on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op:
Operation srvsvn1-drbd:1_monitor_0 found resource srvsvn1-drbd:1 active on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op:
Operation srvsvn2-drbd:1_monitor_0 found resource srvsvn2-drbd:1 active on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: notice: unpack_rsc_op:
Operation vmconfig:1_monitor_0 found resource vmconfig:1 active on xm02
Mar 1 13:34:55 xm02 pengine: [6298]: WARN: unpack_rsc_op: Processing
failed op VMSVN_start_0 on xm02: unknown error (1)
Mar 1 13:34:55 xm02 pengine: [6298]: WARN: common_apply_stickiness:
Forcing VMSVN away from xm02 after 1000000 failures (max=1000000)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave ipmi-stonith-xm01 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave ipmi-stonith-xm02 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig:1 (Master xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmsvn-drbd:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmsvn-drbd:1 (Master xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave srvsvn1-drbd:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave srvsvn1-drbd:1 (Master xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave srvsvn2-drbd:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave srvsvn2-drbd:1 (Master xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave dlm:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave o2cb:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave clvm:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave dlm:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave o2cb:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave clvm:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig-pri:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig-pri:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vg_svn:0 (Stopped)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions:
Leave vg_svn:1 (Started xm02)
Mar 1 13:34:55 xm02 pengine: [6298]: notice: LogActions: Stop VMSVN (xm02)
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Mar 1 13:34:55 xm02 crmd: [6299]: info: unpack_graph: Unpacked
transition 2: 2 actions in 2 synapses
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_te_invoke: Processing
graph 2 (ref=pe_calc-dc-1330619695-21) derived from
/var/lib/pengine/pe-input-2289.bz2
Mar 1 13:34:55 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 5: stop VMSVN_stop_0 on xm02 (local)
Mar 1 13:34:55 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing
key=5:2:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=VMSVN_stop_0 )
Mar 1 13:34:55 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v
m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running.
Mar 1 13:34:55 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:34:55 xm02 pengine: [6298]: notice: process_pe_message:
Transition 2: PEngine Input stored in: /var/lib/pengine/pe-input-2289.bz2
Mar 1 13:34:56 xm02 mgmtd: [6300]: info: CIB query: cib
Mar 1 13:34:56 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v
m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running.
I got this several times, until I got the following:
Mar 1 13:36:16 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:36:17 xm02 kernel: [ 414.513459] block drbd0: Handshake
successful: Agreed network protocol version 96
Mar 1 13:36:17 xm02 kernel: [ 414.513468] block drbd1: Handshake
successful: Agreed network protocol version 96
Mar 1 13:36:17 xm02 kernel: [ 414.513708] block drbd1: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:36:17 xm02 kernel: [ 414.513726] block drbd1: conn(
WFConnection -> WFReportParams )
Mar 1 13:36:17 xm02 kernel: [ 414.513775] block drbd0: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:36:17 xm02 kernel: [ 414.513780] block drbd0: conn(
WFConnection -> WFReportParams )
Mar 1 13:36:17 xm02 kernel: [ 414.513797] block drbd0: Starting
asender thread (from drbd0_receiver [5689])
Mar 1 13:36:17 xm02 kernel: [ 414.513822] block drbd1: Starting
asender thread (from drbd1_receiver [5691])
Mar 1 13:36:17 xm02 kernel: [ 414.513965] block drbd1:
data-integrity-alg: <not-used>
Mar 1 13:36:17 xm02 kernel: [ 414.513984] block drbd1: drbd_sync_handshake:
Mar 1 13:36:17 xm02 kernel: [ 414.513988] block drbd1: self
E6E23470FD3656AD:0000000000000000:65C464E576893481:65C364E576893481
bits:0 flags:0
Mar 1 13:36:17 xm02 kernel: [ 414.513992] block drbd1: peer
E6E23470FD3656AC:0000000000000000:65C464E576893480:65C364E576893481
bits:30720 flags:2
Mar 1 13:36:17 xm02 kernel: [ 414.513995] block drbd1:
uuid_compare()=-1 by rule 40
Mar 1 13:36:17 xm02 kernel: [ 414.513997] block drbd1: I shall
become SyncTarget, but I am primary!
Mar 1 13:36:17 xm02 kernel: [ 414.514001] block drbd1: conn(
WFReportParams -> Disconnecting )
Mar 1 13:36:17 xm02 kernel: [ 414.514008] block drbd1: error
receiving ReportState, l: 4!
Mar 1 13:36:17 xm02 kernel: [ 414.514039] block drbd1: asender terminated
Mar 1 13:36:17 xm02 kernel: [ 414.514045] block drbd1: Terminating
asender thread
Mar 1 13:36:17 xm02 kernel: [ 414.514051] block drbd0:
data-integrity-alg: <not-used>
Mar 1 13:36:17 xm02 kernel: [ 414.514090] block drbd0: drbd_sync_handshake:
Mar 1 13:36:17 xm02 kernel: [ 414.514095] block drbd0: self
EEDF542BD48564B5:0000000000000000:AF298F27A3172093:AF288F27A3172093
bits:0 flags:0
Mar 1 13:36:17 xm02 kernel: [ 414.514099] block drbd0: peer
EEDF542BD48564B4:0000000000000000:AF298F27A3172092:AF288F27A3172093
bits:57344 flags:2
Mar 1 13:36:17 xm02 kernel: [ 414.514103] block drbd0:
uuid_compare()=-1 by rule 40
Mar 1 13:36:17 xm02 kernel: [ 414.514105] block drbd0: I shall
become SyncTarget, but I am primary!
Mar 1 13:36:17 xm02 kernel: [ 414.514109] block drbd0: conn(
WFReportParams -> Disconnecting )
Mar 1 13:36:17 xm02 kernel: [ 414.514117] block drbd0: error
receiving ReportState, l: 4!
Mar 1 13:36:17 xm02 kernel: [ 414.514158] block drbd0: asender terminated
Mar 1 13:36:17 xm02 kernel: [ 414.514164] block drbd0: Terminating
asender thread
Mar 1 13:36:17 xm02 kernel: [ 414.514253] block drbd0: Connection closed
Mar 1 13:36:17 xm02 kernel: [ 414.514285] block drbd1: Connection closed
Mar 1 13:36:17 xm02 kernel: [ 414.514320] block drbd0: helper
command: /sbin/drbdadm fence-peer minor-0
Mar 1 13:36:17 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/v
m/vmsvn] CRM_meta_timeout=[60000] for rsc is already running.
Mar 1 13:36:17 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:36:17 xm02 kernel: [ 414.514327] block drbd0: conn(
Disconnecting -> StandAlone )
Mar 1 13:36:17 xm02 kernel: [ 414.514347] block drbd1: conn(
Disconnecting -> StandAlone )
Mar 1 13:36:17 xm02 kernel: [ 414.514350] block drbd1: helper
command: /sbin/drbdadm fence-peer minor-1
Mar 1 13:36:17 xm02 kernel: [ 414.514433] block drbd0: receiver terminated
Mar 1 13:36:17 xm02 kernel: [ 414.514437] block drbd0: Terminating
receiver thread
Mar 1 13:36:17 xm02 kernel: [ 414.514473] block drbd1: receiver terminated
Mar 1 13:36:17 xm02 kernel: [ 414.514475] block drbd1: Terminating
receiver thread
Mar 1 13:36:17 xm02 kernel: [ 414.517576] block drbd2: Handshake
successful: Agreed network protocol version 96
Mar 1 13:36:17 xm02 kernel: [ 414.517651] block drbd3: Handshake
successful: Agreed network protocol version 96
Mar 1 13:36:17 xm02 kernel: [ 414.517944] block drbd3: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:36:17 xm02 kernel: [ 414.517956] block drbd3: conn(
WFConnection -> WFReportParams )
Mar 1 13:36:17 xm02 kernel: [ 414.517986] block drbd3: Starting
asender thread (from drbd3_receiver [5703])
Mar 1 13:36:17 xm02 kernel: [ 414.518045] block drbd2: Peer
authenticated using 20 bytes of 'sha1' HMAC
Mar 1 13:36:17 xm02 kernel: [ 414.518054] block drbd2: conn(
WFConnection -> WFReportParams )
Mar 1 13:36:17 xm02 kernel: [ 414.518073] block drbd2: Starting
asender thread (from drbd2_receiver [5699])
Mar 1 13:36:17 xm02 kernel: [ 414.518164] block drbd3:
data-integrity-alg: <not-used>
Mar 1 13:36:17 xm02 kernel: [ 414.518204] block drbd3: drbd_sync_handshake:
Mar 1 13:36:17 xm02 kernel: [ 414.518210] block drbd3: self
75C7AE841CB0682F:0000000000000000:99ABCCCBF1E4D001:99AACCCBF1E4D001
bits:0 flags:0
Mar 1 13:36:17 xm02 kernel: [ 414.518214] block drbd3: peer
75C7AE841CB0682E:0000000000000000:99ABCCCBF1E4D000:99AACCCBF1E4D001
bits:5120 flags:2
Mar 1 13:36:17 xm02 kernel: [ 414.518218] block drbd3:
uuid_compare()=-1 by rule 40
Mar 1 13:36:17 xm02 kernel: [ 414.518220] block drbd3: I shall
become SyncTarget, but I am primary!
Mar 1 13:36:17 xm02 kernel: [ 414.518233] block drbd3: conn(
WFReportParams -> Disconnecting )
Mar 1 13:36:17 xm02 kernel: [ 414.518243] block drbd3: error
receiving ReportState, l: 4!
Mar 1 13:36:17 xm02 kernel: [ 414.518255] block drbd3: asender terminated
Mar 1 13:36:17 xm02 kernel: [ 414.518258] block drbd3: Terminating
asender thread
Mar 1 13:36:17 xm02 kernel: [ 414.518333] block drbd3: Connection closed
Mar 1 13:36:17 xm02 kernel: [ 414.518414] block drbd3: helper
command: /sbin/drbdadm fence-peer minor-3
Mar 1 13:36:17 xm02 kernel: [ 414.518417] block drbd3: conn(
Disconnecting -> StandAlone )
Mar 1 13:36:17 xm02 kernel: [ 414.518455] block drbd3: receiver terminated
Mar 1 13:36:17 xm02 kernel: [ 414.518460] block drbd3: Terminating
receiver thread
Mar 1 13:36:17 xm02 kernel: [ 414.518551] block drbd2:
data-integrity-alg: <not-used>
Mar 1 13:36:17 xm02 kernel: [ 414.518572] block drbd2: drbd_sync_handshake:
Mar 1 13:36:17 xm02 kernel: [ 414.518576] block drbd2: self
324E9CEEF0227FAD:0000000000000000:F91D77DB4FF3672B:F91C77DB4FF3672B
bits:0 flags:0
Mar 1 13:36:17 xm02 kernel: [ 414.518580] block drbd2: peer
324E9CEEF0227FAC:0000000000000000:F91D77DB4FF3672A:F91C77DB4FF3672B
bits:12288 flags:2
Mar 1 13:36:17 xm02 kernel: [ 414.518584] block drbd2:
uuid_compare()=-1 by rule 40
Mar 1 13:36:17 xm02 kernel: [ 414.518587] block drbd2: I shall
become SyncTarget, but I am primary!
Mar 1 13:36:17 xm02 kernel: [ 414.518592] block drbd2: conn(
WFReportParams -> Disconnecting )
Mar 1 13:36:17 xm02 kernel: [ 414.518598] block drbd2: error
receiving ReportState, l: 4!
Mar 1 13:36:17 xm02 kernel: [ 414.518616] block drbd2: asender terminated
Mar 1 13:36:17 xm02 kernel: [ 414.518626] block drbd2: Terminating
asender thread
Mar 1 13:36:17 xm02 kernel: [ 414.518770] block drbd2: Connection closed
Mar 1 13:36:17 xm02 kernel: [ 414.518839] block drbd2: conn(
Disconnecting -> StandAlone )
Mar 1 13:36:17 xm02 kernel: [ 414.518836] block drbd2: helper
command: /sbin/drbdadm fence-peer minor-2
Mar 1 13:36:17 xm02 kernel: [ 414.518869] block drbd2: receiver terminated
Mar 1 13:36:17 xm02 kernel: [ 414.518871] block drbd2: Terminating
receiver thread
Mar 1 13:36:17 xm02 kernel: [ 414.522450] block drbd0: helper
command: /sbin/drbdadm fence-peer minor-0 exit code 126 (0x7e00)
Mar 1 13:36:17 xm02 kernel: [ 414.522454] block drbd0: fence-peer
helper broken, returned 126
Mar 1 13:36:17 xm02 kernel: [ 414.522902] block drbd1: helper
command: /sbin/drbdadm fence-peer minor-1 exit code 126 (0x7e00)
Mar 1 13:36:17 xm02 kernel: [ 414.522905] block drbd1: fence-peer
helper broken, returned 126
Mar 1 13:36:17 xm02 kernel: [ 414.526993] block drbd2: helper
command: /sbin/drbdadm fence-peer minor-2 exit code 126 (0x7e00)
Mar 1 13:36:17 xm02 kernel: [ 414.526996] block drbd2: fence-peer
helper broken, returned 126
Mar 1 13:36:17 xm02 kernel: [ 414.527230] block drbd3: helper
command: /sbin/drbdadm fence-peer minor-3 exit code 126 (0x7e00)
Mar 1 13:36:17 xm02 kernel: [ 414.527233] block drbd3: fence-peer
helper broken, returned 126
Mar 1 13:36:18 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/vm/vmsvn]
CRM_meta_timeout=[60000] for rsc is already running.
Mar 1 13:36:18 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:36:19 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/vm/vmsvn]
CRM_meta_timeout=[60000] for rsc is already running.
Mar 1 13:36:19 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:36:20 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/vm/vmsvn]
CRM_meta_timeout=[60000] for rsc is already running.
Mar 1 13:36:20 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:36:21 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/vm/vmsvn]
CRM_meta_timeout=[60000] for rsc is already running.
Mar 1 13:36:21 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:36:22 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/vm/vmsvn]
CRM_meta_timeout=[60000] for rsc is already running.
Mar 1 13:36:22 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] CLM CONFIGURATION CHANGE
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] New Configuration:
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] r(0) ip(100.0.0.2)
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] Members Left:
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] Members Joined:
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] notice:
pcmk_peer_update: Transitional membership event on ring 1024: memb=1,
new=0, lost=0
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info:
pcmk_peer_update: memb: xm02 33554532
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] CLM CONFIGURATION CHANGE
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] New Configuration:
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] r(0) ip(100.0.0.1)
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] r(0) ip(100.0.0.2)
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] Members Left:
Mar 1 13:36:22 xm02 ocfs2_controld: [7172]: notice:
ais_dispatch_message: Membership 1024: quorum acquired
Mar 1 13:36:22 xm02 crmd: [6299]: notice: ais_dispatch_message:
Membership 1024: quorum acquired
Mar 1 13:36:22 xm02 crmd: [6299]: notice: crmd_peer_update: Status
update: Client xm01/crmd now has status [online] (DC=true)
Mar 1 13:36:22 xm02 cluster-dlm: [7099]: notice:
ais_dispatch_message: Membership 1024: quorum acquired
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] Members Joined:
Mar 1 13:36:22 xm02 ocfs2_controld: [7172]: info: crm_update_peer:
Node xm01: id=16777316 state=member (new) addr=r(0)
ip(100.0.0.1) votes=1 born=1016 seen=1024 proc=00000000000000000000000000151312
Mar 1 13:36:22 xm02 cib: [6295]: notice: ais_dispatch_message:
Membership 1024: quorum acquired
Mar 1 13:36:22 xm02 cluster-dlm: [7099]: info: crm_update_peer: Node
xm01: id=16777316 state=member (new) addr=r(0) ip(100.0.0.1) votes=1
born=1016 seen=1024 proc=00000000000000000000000000151312
Mar 1 13:36:22 xm02 crmd: [6299]: info: ais_status_callback: status:
xm01 is now member (was lost)
Mar 1 13:36:22 xm02 corosync[6228]: [CLM ] r(0) ip(100.0.0.1)
Mar 1 13:36:22 xm02 cib: [6295]: info: crm_update_peer: Node xm01:
id=16777316 state=member (new) addr=r(0) ip(100.0.0.1) votes=1
born=1016 seen=1024 proc=00000000000000000000000000151312
Mar 1 13:36:22 xm02 cluster-dlm: update_cluster: Processing membership 1024
Mar 1 13:36:22 xm02 cib: [6295]: info: ais_dispatch_message:
Membership 1024: quorum retained
Mar 1 13:36:22 xm02 crmd: [6299]: info: crm_update_peer: Node xm01:
id=16777316 state=member (new) addr=r(0) ip(100.0.0.1) votes=1
born=1016 seen=1024 proc=00000000000000000000000000151312 (new)
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] notice:
pcmk_peer_update: Stable membership event on ring 1024: memb=2, new=1, lost=0
Mar 1 13:36:22 xm02 cluster-dlm: dlm_process_node: Adding address
ip(100.0.0.1) to configfs for node 16777316
Mar 1 13:36:22 xm02 ocfs2_controld: [7172]: info:
ais_dispatch_message: Membership 1024: quorum retained
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: update_member:
Node 16777316/xm01 is now: member
Mar 1 13:36:22 xm02 cluster-dlm: add_configfs_node:
set_configfs_node 16777316 100.0.0.1 local 0
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info:
pcmk_peer_update: NEW: xm01 16777316
Mar 1 13:36:22 xm02 crmd: [6299]: info: crm_update_quorum: Updating
quorum status to true (call=87)
Mar 1 13:36:22 xm02 cluster-dlm: dlm_process_node: Added active node
16777316: born-on=1016, last-seen=1024, this-event=1024, last-event=1020
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info:
pcmk_peer_update: MEMB: xm01 16777316
Mar 1 13:36:22 xm02 cluster-dlm: dlm_process_node: Skipped active
node 33554532: born-on=1016, last-seen=1024, this-event=1024, last-event=1020
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info:
pcmk_peer_update: MEMB: xm02 33554532
Mar 1 13:36:22 xm02 cluster-dlm: [7099]: info: ais_dispatch_message:
Membership 1024: quorum retained
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info:
send_member_notification: Sending membership update 1024 to 4 children
Mar 1 13:36:22 xm02 corosync[6228]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info: update_member:
0x6acba0 Node 16777316 (xm01) born on: 1024
Mar 1 13:36:22 xm02 corosync[6228]: [pcmk ] info:
send_member_notification: Sending membership update 1024 to 4 children
Mar 1 13:36:22 xm02 corosync[6228]: [CPG ] chosen downlist:
sender r(0) ip(100.0.0.1) ; members(old:1 left:0)
Mar 1 13:36:22 xm02 corosync[6228]: [MAIN ] Completed service
synchronization, ready to provide service.
Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='xm01']/lrm (origin=local/crmd/83,
version=0.2472.138): ok (rc=0)
Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='xm01']/transient_attributes
(origin=local/crmd/84, version=0.2472.139): ok (rc=0)
Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/85, version=0.2472.140): ok (rc=0)
Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request:
Operation complete: op cib_modify for section cib
(origin=local/crmd/87, version=0.2472.142): ok (rc=0)
Mar 1 13:36:22 xm02 crmd: [6299]: info: crmd_ais_dispatch: Setting
expected votes to 2
Mar 1 13:36:22 xm02 crmd: [6299]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=crmd_peer_update ]
Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph:
do_te_invoke:175 - Triggered transition abort (complete=0) : Peer Halt
Mar 1 13:36:22 xm02 crmd: [6299]: info: update_abort_priority: Abort
priority upgraded from 0 to 1000000
Mar 1 13:36:22 xm02 crmd: [6299]: info: update_abort_priority: Abort
action done superceeded by stop
Mar 1 13:36:22 xm02 crmd: [6299]: WARN: match_down_event: No match
for shutdown action on xm01
Mar 1 13:36:22 xm02 crmd: [6299]: info: te_update_diff:
Stonith/shutdown of xm01 not matched
Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph:
te_update_diff:193 - Triggered transition abort (complete=0,
tag=node_state, id=xm01, magic=NA, cib=0.2472.137) : Node failure
Mar 1 13:36:22 xm02 crmd: [6299]: info: update_abort_priority: Abort
action stop superceeded by restart
Mar 1 13:36:22 xm02 crmd: [6299]: info: erase_xpath_callback:
Deletion of "//node_state[@uname='xm01']/lrm": ok (rc=0)
Mar 1 13:36:22 xm02 crmd: [6299]: info: erase_xpath_callback:
Deletion of "//node_state[@uname='xm01']/transient_attributes": ok (rc=0)
Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/89, version=0.2472.143): ok (rc=0)
Mar 1 13:36:22 xm02 crmd: [6299]: info: ais_dispatch_message:
Membership 1024: quorum retained
Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/90, version=0.2472.144): ok (rc=0)
Mar 1 13:36:22 xm02 crmd: [6299]: info: crmd_ais_dispatch: Setting
expected votes to 2
Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph:
do_te_invoke:175 - Triggered transition abort (complete=0) : Peer Halt
Mar 1 13:36:22 xm02 cib: [6295]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/93, version=0.2472.146): ok (rc=0)
Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph:
do_te_invoke:175 - Triggered transition abort (complete=0) : Peer Halt
Mar 1 13:36:22 xm02 crmd: [6299]: info: abort_transition_graph:
do_te_invoke:175 - Triggered transition abort (complete=0) : Peer Halt
Mar 1 13:36:23 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/vm/vmsvn]
CRM_meta_timeout=[60000] for rsc is already running.
...and so on, the same message repeating every second
until 13:38:59, when XM02 goes down:
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: unpack_rsc_op: Processing
failed op VMSVN_stop_0 on xm02: unknown error (1)
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: pe_fence_node: Node xm02
will be fenced to recover from resource failure(s)
Mar 1 13:38:59 xm02 pengine: [6298]: notice:
common_apply_stickiness: ms_drbd_vmconfig can fail 9 more times on
xm01 before being forced off
Mar 1 13:38:59 xm02 pengine: [6298]: notice:
common_apply_stickiness: ms_drbd_vmconfig can fail 9 more times on
xm01 before being forced off
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: common_apply_stickiness:
Forcing VMSVN away from xm02 after 1000000 failures (max=1000000)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for vmsvn-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for vmsvn-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for srvsvn1-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for srvsvn1-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for srvsvn2-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for srvsvn2-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (10s) for dlm:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (10s) for o2cb:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (10s) for clvm:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for vmconfig-pri:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (30s) for VMSVN on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: stage6: Scheduling Node
xm02 for STONITH
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: native_stop_constraints:
Stop of failed resource VMSVN is implicit after xm02 is fenced
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Stop ipmi-stonith-xm01 (xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave ipmi-stonith-xm02 (Started xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote vmconfig:0 (Master -> Slave xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Recover
vmconfig:0 (Master xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote vmconfig:1 (Master -> Stopped xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote
vmsvn-drbd:0 (Slave -> Master xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote vmsvn-drbd:1 (Master -> Stopped xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote
srvsvn1-drbd:0 (Slave -> Master xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote srvsvn1-drbd:1 (Master -> Stopped xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote
srvsvn2-drbd:0 (Slave -> Master xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote srvsvn2-drbd:1 (Master -> Stopped xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave dlm:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave o2cb:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave clvm:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move dlm:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move o2cb:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move clvm:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig-pri:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move vmconfig-pri:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave vg_svn:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move vg_svn:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move VMSVN (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 crmd: [6299]: info: handle_response: pe_calc
calculation pe_calc-dc-1330619939-67 is obsolete
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: common_apply_stickiness:
Forcing VMSVN away from xm02 after 1000000 failures (max=1000000)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for vmsvn-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for vmsvn-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for srvsvn1-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for srvsvn1-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for srvsvn2-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for srvsvn2-drbd:0 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (10s) for dlm:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (10s) for o2cb:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (10s) for clvm:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (20s) for vmconfig-pri:1 on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: notice: RecurringOp: Start
recurring monitor (30s) for VMSVN on xm01
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: stage6: Scheduling Node
xm02 for STONITH
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: native_stop_constraints:
Stop of failed resource VMSVN is implicit after xm02 is fenced
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Stop ipmi-stonith-xm01 (xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave ipmi-stonith-xm02 (Started xm01)
Mar 1 13:38:59 xm02 mgmtd: [6300]: info: CIB query: cib
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote vmconfig:0 (Master -> Slave xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Recover
vmconfig:0 (Master xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote vmconfig:1 (Master -> Stopped xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote
vmsvn-drbd:0 (Slave -> Master xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote vmsvn-drbd:1 (Master -> Stopped xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote
srvsvn1-drbd:0 (Slave -> Master xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote srvsvn1-drbd:1 (Master -> Stopped xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions: Promote
srvsvn2-drbd:0 (Slave -> Master xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Demote srvsvn2-drbd:1 (Master -> Stopped xm02)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave dlm:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave o2cb:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave clvm:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move dlm:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move o2cb:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move clvm:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave vmconfig-pri:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move vmconfig-pri:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Leave vg_svn:0 (Stopped)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move vg_svn:1 (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 pengine: [6298]: notice: LogActions:
Move VMSVN (Started xm02 -> xm01)
Mar 1 13:38:59 xm02 lrmd: [6296]: info: perform_op:2932: operation
start[45] with pid 8644 on VMSVN for client 6299, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.5] xmfile=[/etc/xen/vm/vmsvn]
CRM_meta_timeout=[60000] for rsc is already running.
Mar 1 13:38:59 xm02 lrmd: [6296]: info: perform_op:2942: postponing
all ops on resource VMSVN by 1000 ms
Mar 1 13:38:59 xm02 crmd: [6299]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Mar 1 13:38:59 xm02 crmd: [6299]: info: unpack_graph: Unpacked
transition 8: 128 actions in 128 synapses
Mar 1 13:38:59 xm02 crmd: [6299]: info: do_te_invoke: Processing
graph 8 (ref=pe_calc-dc-1330619939-68) derived from
/var/lib/pengine/pe-warn-311.bz2
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo
action 15 fired and confirmed
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo
action 42 fired and confirmed
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo
action 73 fired and confirmed
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo
action 104 fired and confirmed
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo
action 135 fired and confirmed
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_pseudo_action: Pseudo
action 184 fired and confirmed
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 216: notify vmconfig:0_pre_notify_demote_0 on xm01
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 218: notify vmconfig:1_pre_notify_demote_0 on xm02 (local)
Mar 1 13:38:59 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing
key=218:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmconfig:1_notify_0 )
Mar 1 13:38:59 xm02 lrmd: [6296]: info: rsc:vmconfig:1 notify[57] (pid 13343)
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 224: notify vmsvn-drbd:0_pre_notify_demote_0 on xm01
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 226: notify vmsvn-drbd:1_pre_notify_demote_0 on xm02 (local)
Mar 1 13:38:59 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing
key=226:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=vmsvn-drbd:1_notify_0 )
Mar 1 13:38:59 xm02 lrmd: [6296]: info: rsc:vmsvn-drbd:1 notify[58]
(pid 13344)
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 232: notify srvsvn1-drbd:0_pre_notify_demote_0 on xm01
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 234: notify srvsvn1-drbd:1_pre_notify_demote_0 on xm02 (local)
Mar 1 13:38:59 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing
key=234:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=srvsvn1-drbd:1_notify_0 )
Mar 1 13:38:59 xm02 lrmd: [6296]: info: rsc:srvsvn1-drbd:1
notify[59] (pid 13345)
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 240: notify srvsvn2-drbd:0_pre_notify_demote_0 on xm01
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_rsc_command: Initiating
action 242: notify srvsvn2-drbd:1_pre_notify_demote_0 on xm02 (local)
Mar 1 13:38:59 xm02 crmd: [6299]: info: do_lrm_rsc_op: Performing
key=242:8:0:8b7a050b-901b-4db7-b1f7-c3c5dd8a9653 op=srvsvn2-drbd:1_notify_0 )
Mar 1 13:38:59 xm02 crmd: [6299]: info: te_fence_node: Executing
reboot fencing operation (186) on xm02 (timeout=60000)
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info:
initiate_remote_stonith_op: Initiating remote operation reboot for
xm02: c1be22cc-e535-441c-a674-89551a2b9d4c
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: stonith_queryQuery
<stonith_command t="stonith-ng"
st_async_id="c1be22cc-e535-441c-a674-89551a2b9d4c" st_op="st_query"
st_callid="0" st_callopt="0"
st_remote_op="c1be22cc-e535-441c-a674-89551a2b9d4c" st_target="xm02"
st_device_action="reboot"
st_clientid="bb653c7a-6351-4517-ad06-6fb0e20fe375" st_timeout="6000"
src="xm02" seq="5" />
Mar 1 13:38:59 xm02 pengine: [6298]: WARN: process_pe_message:
Transition 8: WARNINGs found during PE processing. PEngine Input
stored in: /var/lib/pengine/pe-warn-311.bz2
Mar 1 13:38:59 xm02 pengine: [6298]: notice: process_pe_message:
Configuration WARNINGs found during PE processing. Please run
"crm_verify -L" to identify issues.
Mar 1 13:38:59 xm02 lrmd: [6296]: info: operation notify[58] on
vmsvn-drbd:1 for client 6299: pid 13344 exited with return code 0
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info:
can_fence_host_with_device: Refreshing port list for ipmi-stonith-xm01
Mar 1 13:38:59 xm02 stonith-ng: [6294]: WARN: parse_host_line: Could
not parse (0 0):
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info:
can_fence_host_with_device: ipmi-stonith-xm01 can not fence xm02: dynamic-list
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: stonith_query: Found 0
matching devices for 'xm02'
Mar 1 13:38:59 xm02 stonith-ng: [6294]: info: stonith_command:
Processed st_query from xm02: rc=0
Mar 1 13:38:59 xm02 crmd: [6299]: info: process_lrm_event: LRM
operation vmsvn-drbd:1_notify_0 (call=58, rc=0, cib-update=130,
confirmed=true) ok
After the storm, both nodes came back online, DRBD is Master/Master,
and VMSVN is online as well. However, the cloned init-group in
Pacemaker (dlm, o2cb, clvm) is not running on xm01.
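A side note on the "fence-peer helper broken, returned 126" lines above:
the kernel prints the raw wait status (0x7e00), and the exit code sits in
its high byte. By the usual shell convention, 126 means the command was
found but could not be executed, often because of a missing execute bit
or an unusable interpreter. The sketch below only decodes the status and
checks the handler path from the drbd.conf above. It is a rough
diagnostic under those assumptions, not a statement of what actually went
wrong here; the function names are made up for illustration.

```python
import os

# Raw wait status as printed in the DRBD kernel lines above.
RAW_STATUS = 0x7E00

# Handler path as configured in the handlers section of drbd.conf.
HANDLER = "/usr/lib/drbd/stonith_admin-fence-peer.sh"


def exit_code(raw_status: int) -> int:
    """Return the exit code stored in bits 8-15 of a wait status."""
    return (raw_status >> 8) & 0xFF


def describe_handler(path: str) -> str:
    """Report whether the path exists and is executable by the current user."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "present but not executable"
    return "executable"


if __name__ == "__main__":
    print("exit code:", exit_code(RAW_STATUS))
    print(HANDLER, "->", describe_handler(HANDLER))
```

Run as root on xm02, since the helper is invoked with root privileges.
An "executable" result does not rule out a broken interpreter line
inside the script itself.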
Any feedback?
Thanks!
Daniel