[DRBD-user] DRBD and Pacemaker configuration

webPragmatist chris at webpragmatist.com
Thu Jun 3 22:25:27 CEST 2010


I have a simple two-node DRBD setup in which the CRM promotes the surviving
node to Master on failure. My problem occurs when I put one node to sleep and
then wake that node back up.

In this case I end up with two Primary nodes (split brain). My understanding
is that Pacemaker then tries to promote and use the node that failed, because
DRBD has disconnected due to the split brain. I have set resource stickiness.

I've also experimented with the automatic split-brain recovery settings, but
I couldn't find a working combination.

Please help!

== disk0.res ==
resource disk0 {
        protocol C;
        syncer {
                rate 40M;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "mysharedsecret";
        }
        on cluster1 {
                device /dev/drbd0;
                disk /dev/sda7;
                address 10.211.55.5:7788;
                meta-disk internal;
        }
        on cluster2 {
                device /dev/drbd0;
                disk /dev/sda7;
                address 10.211.55.6:7788;
                meta-disk internal;
        }
}
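
For reference, the automatic split-brain recovery policies mentioned above
are set in the net section of the resource. The fragment below is only a
sketch of where they go; the "on" sections stay as shown above. The three
policy values are illustrative assumptions picked from the options that
drbd.conf documents, not a combination confirmed to work in this setup.

== disk0.res, net section with after-sb policies (illustrative) ==
resource disk0 {
        net {
                cram-hmac-alg sha1;
                shared-secret "mysharedsecret";
                # Assumed values, shown only to illustrate placement.
                # Split brain detected, neither node is Primary:
                after-sb-0pri discard-zero-changes;
                # Split brain detected, one node is Primary:
                after-sb-1pri discard-secondary;
                # Split brain detected, both nodes are Primary:
                after-sb-2pri disconnect;
        }
}

With after-sb-2pri set to disconnect, a split brain in which both nodes are
Primary would still be expected to end with the connection dropped and would
need manual resolution.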

== crm configure show ==
node cluster1
node cluster2
primitive Apache ocf:heartbeat:apache \
	params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
	op monitor interval="5s"
primitive ApacheIP ocf:heartbeat:IPaddr2 \
	params ip="10.211.55.10" nic="eth0"
primitive SrvDisk ocf:linbit:drbd \
	params drbd_resource="disk0" \
	op monitor interval="15s"
primitive SrvFs ocf:heartbeat:Filesystem \
	params device="/dev/drbd0" directory="/srv" fstype="ext4"
ms SrvGroup SrvDisk \
	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation Apache-with-SrvFs inf: Apache SrvFs
colocation fs_on_drbd inf: SrvFs SrvGroup:Master
colocation website-with-ip inf: Apache ApacheIP
order Apache-after-SrvFS inf: SrvFs Apache
order SrvFs-after-SrvDisk inf: SrvGroup:promote SrvFs:start
order apache-after-ip inf: ApacheIP Apache
property $id="cib-bootstrap-options" \
	dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	no-quorum-policy="ignore" \
	stonith-enabled="false"
rsc_defaults $id="rsc-options" \
	resource-stickiness="100"

== /var/log/messages on sleep ==
Jun  3 15:19:36 cluster2 kernel: [  832.574335] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun  3 15:19:36 cluster2 kernel: [  832.574350] block drbd0: asender terminated
Jun  3 15:19:36 cluster2 kernel: [  832.574352] block drbd0: Terminating asender thread
Jun  3 15:19:36 cluster2 kernel: [  832.597942] block drbd0: Connection closed
Jun  3 15:19:36 cluster2 kernel: [  832.597949] block drbd0: conn( NetworkFailure -> Unconnected )
Jun  3 15:19:36 cluster2 kernel: [  832.597953] block drbd0: receiver terminated
Jun  3 15:19:36 cluster2 kernel: [  832.597954] block drbd0: Restarting receiver thread
Jun  3 15:19:36 cluster2 kernel: [  832.597956] block drbd0: receiver (re)started
Jun  3 15:19:36 cluster2 kernel: [  832.597959] block drbd0: conn( Unconnected -> WFConnection )
Jun  3 15:19:36 cluster2 kernel: [  832.620505] block drbd0: role( Secondary -> Primary )
Jun  3 15:19:36 cluster2 kernel: [  832.621577] block drbd0: Creating new current UUID
Jun  3 15:19:36 cluster2 kernel: [  833.052499] EXT4-fs (drbd0): mounted filesystem with ordered data mode

== /var/log/messages on wake ==
Jun  3 15:21:18 cluster2 kernel: [  935.077211] block drbd0: Handshake successful: Agreed network protocol version 91
Jun  3 15:21:18 cluster2 kernel: [  935.090186] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Jun  3 15:21:18 cluster2 kernel: [  935.090194] block drbd0: conn( WFConnection -> WFReportParams )
Jun  3 15:21:18 cluster2 kernel: [  935.090208] block drbd0: Starting asender thread (from drbd0_receiver [2461])
Jun  3 15:21:18 cluster2 kernel: [  935.094911] block drbd0: data-integrity-alg: <not-used>
Jun  3 15:21:18 cluster2 kernel: [  935.094958] block drbd0: drbd_sync_handshake:
Jun  3 15:21:18 cluster2 kernel: [  935.094961] block drbd0: self 46BFC24C5A5ACEFB:D742CA23DC5EF406:AE656702A2EB3902:48414BE3B700DFC3 bits:3 flags:0
Jun  3 15:21:18 cluster2 kernel: [  935.094963] block drbd0: peer 66A59B86D1B19A5F:D742CA23DC5EF407:AE656702A2EB3902:48414BE3B700DFC3 bits:1 flags:0
Jun  3 15:21:18 cluster2 kernel: [  935.094966] block drbd0: uuid_compare()=100 by rule 90
Jun  3 15:21:18 cluster2 kernel: [  935.099958] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun  3 15:21:18 cluster2 kernel: [  935.127178] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun  3 15:21:18 cluster2 kernel: [  935.127185] block drbd0: conn( WFReportParams -> Disconnecting )
Jun  3 15:21:18 cluster2 kernel: [  935.133719] block drbd0: asender terminated
Jun  3 15:21:18 cluster2 kernel: [  935.133723] block drbd0: Terminating asender thread
Jun  3 15:21:18 cluster2 kernel: [  935.136794] block drbd0: Connection closed
Jun  3 15:21:18 cluster2 kernel: [  935.136808] block drbd0: conn( Disconnecting -> StandAlone )
Jun  3 15:21:18 cluster2 kernel: [  935.136816] block drbd0: receiver terminated
Jun  3 15:21:18 cluster2 kernel: [  935.136817] block drbd0: Terminating receiver thread
Jun  3 15:21:18 cluster2 kernel: [  935.244176] block drbd0: role( Primary -> Secondary )
Jun  3 15:21:19 cluster2 kernel: [  935.548493] block drbd0: disk( UpToDate -> Outdated )