<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hello,<br>
First I set up a dual-primary DRBD configuration, which worked perfectly; then I
configured a Pacemaker cluster resource to start the DRBD resource. As soon as the
cluster starts the DRBD resource, a split-brain occurs.<br>
Please let me know what I am doing wrong.<br>
<br>
<br>
Here is the drbd configuration:<br>
<br>
global_common.conf:<br>
global { usage-count no; }<br>
common {<br>
handlers {<br>
pri-on-incon-degr
"/usr/lib/drbd/notify-pri-on-incon-degr.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b >
/proc/sysrq-trigger ; reboot -f";<br>
pri-lost-after-sb
"/usr/lib/drbd/notify-pri-lost-after-sb.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b >
/proc/sysrq-trigger ; reboot -f";<br>
local-io-error "/usr/lib/drbd/notify-io-error.sh;
/usr/lib/drbd/notify-emergency-shutdown.sh; echo o >
/proc/sysrq-trigger ; halt -f";<br>
split-brain "/usr/lib/drbd/notify-split-brain.sh root";<br>
}<br>
<br>
startup { wfc-timeout 0; degr-wfc-timeout 120; become-primary-on
both; }<br>
<br>
disk { on-io-error detach; al-extents 3389; }<br>
<br>
net { <br>
allow-two-primaries; after-sb-0pri
discard-zero-changes;<br>
after-sb-1pri discard-secondary; after-sb-2pri
disconnect;<br>
max-buffers 8000; max-epoch-size 8000;<br>
sndbuf-size 0; verify-alg md5;<br>
ping-int 2; ping-timeout 2;<br>
connect-int 2; timeout 5; ko-count 5;<br>
}<br>
}<br>
<br>
r0.res:<br>
resource r0 {<br>
device /dev/drbd_r0 minor 0;<br>
meta-disk internal;<br>
on node1 {<br>
disk "/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:1:0-part1";<br>
address 172.16.241.131:7780;<br>
}<br>
on node2 {<br>
disk "/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:1:0-part1";<br>
address 172.16.241.132:7780;<br>
}<br>
syncer { rate 100M; }<br>
}<br>
<br>
Below is the cluster DRBD resource configuration:<br>
primitive p-drbd ocf:linbit:drbd \<br>
params drbd_resource="r0" \<br>
op monitor interval="50" role="Master" timeout="30" \<br>
op monitor interval="60" role="Slave" timeout="30" \<br>
op start interval="0" timeout="240" \<br>
op stop interval="0" timeout="100"<br>
ms ms-drbd p-drbd \<br>
meta master-max="2" clone-max="2" notify="true"
interleave="true"<br>
<br>
When the cluster starts the DRBD resource, /var/log/messages on node2 shows:<br>
Jul 1 19:04:40 node2 cibadmin[4754]: notice: crm_log_args:
Invoked: cibadmin -p -R -o resources <br>
Jul 1 19:04:41 node2 kernel: [ 494.932537] events: mcg drbd: 3<br>
Jul 1 19:04:41 node2 kernel: [ 494.943147] drbd: initialized.
Version: 8.4.3 (api:1/proto:86-101)<br>
Jul 1 19:04:41 node2 kernel: [ 494.943151] drbd: GIT-hash:
89a294209144b68adb3ee85a73221f964d3ee515 build by phil@fat-tyre,
2013-02-05 15:35:49<br>
Jul 1 19:04:41 node2 kernel: [ 494.943153] drbd: registered as
block device major 147<br>
Jul 1 19:04:42 node2 kernel: [ 495.981244] d-con r0: Starting
worker thread (from drbdsetup [4801])<br>
Jul 1 19:04:42 node2 kernel: [ 495.981560] block drbd0: disk(
Diskless -> Attaching ) <br>
Jul 1 19:04:42 node2 kernel: [ 495.982168] d-con r0: Method to
ensure write ordering: flush<br>
Jul 1 19:04:42 node2 kernel: [ 495.982174] block drbd0: max BIO
size = 1048576<br>
Jul 1 19:04:42 node2 kernel: [ 495.982179] block drbd0:
drbd_bm_resize called with capacity == 4192056<br>
Jul 1 19:04:42 node2 kernel: [ 495.982201] block drbd0: resync
bitmap: bits=524007 words=8188 pages=16<br>
Jul 1 19:04:42 node2 kernel: [ 495.982204] block drbd0: size =
2047 MB (2096028 KB)<br>
Jul 1 19:04:42 node2 kernel: [ 495.983736] block drbd0: bitmap
READ of 16 pages took 1 jiffies<br>
Jul 1 19:04:42 node2 kernel: [ 495.983757] block drbd0: recounting
of set bits took additional 0 jiffies<br>
Jul 1 19:04:42 node2 kernel: [ 495.983760] block drbd0: 0 KB (0
bits) marked out-of-sync by on disk bit-map.<br>
Jul 1 19:04:42 node2 kernel: [ 495.983767] block drbd0: disk(
Attaching -> UpToDate ) <br>
Jul 1 19:04:42 node2 kernel: [ 495.983771] block drbd0: attached
to UUIDs
62EE6E5BA23AC477:37CECFD41B2C30A4:1B8441319CED9865:1B8341319CED9865<br>
Jul 1 19:04:42 node2 attrd[4231]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p-drbd (1000)<br>
Jul 1 19:04:42 node2 attrd[4231]: notice: attrd_perform_update:
Sent update 24: master-p-drbd=1000<br>
Jul 1 19:04:42 node2 attrd[4231]: notice: attrd_perform_update:
Sent update 27: master-p-drbd=1000<br>
Jul 1 19:04:42 node2 crmd[4233]: notice: process_lrm_event: LRM
operation p-drbd_start_0 (call=68, rc=0, cib-update=18,
confirmed=true) ok<br>
Jul 1 19:04:42 node2 kernel: [ 495.993653] d-con r0: conn(
StandAlone -> Unconnected ) <br>
Jul 1 19:04:42 node2 kernel: [ 496.044937] d-con r0: Starting
receiver thread (from drbd_w_r0 [4802])<br>
Jul 1 19:04:42 node2 kernel: [ 496.045820] d-con r0: receiver
(re)started<br>
Jul 1 19:04:42 node2 kernel: [ 496.045830] d-con r0: conn(
Unconnected -> WFConnection ) <br>
Jul 1 19:04:42 node2 crmd[4233]: notice: process_lrm_event: LRM
operation p-drbd_notify_0 (call=71, rc=0, cib-update=0,
confirmed=true) ok<br>
Jul 1 19:04:42 node2 crmd[4233]: notice: process_lrm_event: LRM
operation p-drbd_notify_0 (call=74, rc=0, cib-update=0,
confirmed=true) ok<br>
Jul 1 19:04:42 node2 crmd[4233]: notice: process_lrm_event: LRM
operation p-drbd_promote_0 (call=77, rc=0, cib-update=19,
confirmed=true) ok<br>
Jul 1 19:04:42 node2 kernel: [ 496.197480] block drbd0: role(
Secondary -> Primary ) <br>
Jul 1 19:04:42 node2 attrd[4231]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p-drbd (10000)<br>
Jul 1 19:04:42 node2 attrd[4231]: notice: attrd_perform_update:
Sent update 31: master-p-drbd=10000<br>
Jul 1 19:04:42 node2 crmd[4233]: notice: process_lrm_event: LRM
operation p-drbd_notify_0 (call=80, rc=0, cib-update=0,
confirmed=true) ok<br>
Jul 1 19:04:42 node2 crmd[4233]: notice: process_lrm_event: LRM
operation p-drbd_monitor_50000 (call=83, rc=8, cib-update=20,
confirmed=false) master<br>
Jul 1 19:04:42 node2 crmd[4233]: notice: process_lrm_event:
node2-p-drbd_monitor_50000:83 [ ]<br>
Jul 1 19:04:42 node2 kernel: [ 496.342704] d-con r0: Handshake
successful: Agreed network protocol version 101<br>
Jul 1 19:04:42 node2 kernel: [ 496.342890] d-con r0: conn(
WFConnection -> WFReportParams ) <br>
Jul 1 19:04:42 node2 kernel: [ 496.342893] d-con r0: Starting
asender thread (from drbd_r_r0 [4821])<br>
Jul 1 19:04:42 node2 kernel: [ 496.356028] block drbd0:
drbd_sync_handshake:<br>
Jul 1 19:04:42 node2 kernel: [ 496.356033] block drbd0: self
62EE6E5BA23AC477:37CECFD41B2C30A4:1B8441319CED9865:1B8341319CED9865
bits:0 flags:0<br>
Jul 1 19:04:42 node2 kernel: [ 496.356035] block drbd0: peer
20FA2D65F94F24B7:37CECFD41B2C30A5:1B8441319CED9865:1B8341319CED9865
bits:0 flags:0<br>
Jul 1 19:04:42 node2 kernel: [ 496.356038] block drbd0:
uuid_compare()=100 by rule 90<br>
Jul 1 19:04:42 node2 kernel: [ 496.356041] block drbd0: helper
command: /sbin/drbdadm initial-split-brain minor-0<br>
Jul 1 19:04:42 node2 kernel: [ 496.358760] block drbd0: helper
command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)<br>
Jul 1 19:04:42 node2 kernel: [ 496.358776] block drbd0:
Split-Brain detected but unresolved, dropping connection!<br>
Jul 1 19:04:42 node2 kernel: [ 496.358811] block drbd0: helper
command: /sbin/drbdadm split-brain minor-0<br>
Jul 1 19:04:42 node2 notify-split-brain.sh[4966]: invoked for r0/0
(drbd0)<br>
Jul 1 19:04:42 node2 kernel: [ 496.385210] d-con r0: meta
connection shut down by peer.<br>
Jul 1 19:04:42 node2 kernel: [ 496.385225] d-con r0: conn(
WFReportParams -> NetworkFailure ) <br>
Jul 1 19:04:42 node2 kernel: [ 496.385228] d-con r0: asender
terminated<br>
Jul 1 19:04:42 node2 kernel: [ 496.385229] d-con r0: Terminating
drbd_a_r0<br>
Jul 1 19:04:42 node2 kernel: [ 496.389939] block drbd0: helper
command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)<br>
Jul 1 19:04:42 node2 kernel: [ 496.389961] d-con r0: conn(
NetworkFailure -> Disconnecting ) <br>
Jul 1 19:04:42 node2 kernel: [ 496.389964] d-con r0: error
receiving ReportState, e: -5 l: 0!<br>
Jul 1 19:04:42 node2 kernel: [ 496.390147] d-con r0: Connection
closed<br>
Jul 1 19:04:42 node2 kernel: [ 496.390174] d-con r0: conn(
Disconnecting -> StandAlone ) <br>
Jul 1 19:04:42 node2 kernel: [ 496.390176] d-con r0: receiver
terminated<br>
Jul 1 19:04:42 node2 kernel: [ 496.390177] d-con r0: Terminating
drbd_r_r0<br>
<br>
<br>
<div class="moz-signature">-- <br>
<font size="1">Regards,<br>
<br>
</font>
<font size="2">Muhammad Sharfuddin<br>
</font>
</div>
</body>
</html>