I have a simple two-node DRBD setup in which the CRM promotes the surviving node to Master on failure. My problem occurs when I put one node to sleep and then wake it back up. In this case I end up with two Primary nodes (split brain). My understanding is that Pacemaker then tries to promote and use the node that failed, since DRBD has disconnected because of the split brain. I have set resource stickiness. I have also tried the automatic split-brain recovery settings, but I couldn't find a working combination. Please help!

== disk0.res ==
resource disk0 {
    protocol C;
    syncer {
        rate 40M;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "mysharedsecret";
    }
    on cluster1 {
        device /dev/drbd0;
        disk /dev/sda7;
        address 10.211.55.5:7788;
        meta-disk internal;
    }
    on cluster2 {
        device /dev/drbd0;
        disk /dev/sda7;
        address 10.211.55.6:7788;
        meta-disk internal;
    }
}

== crm configure show ==
node cluster1
node cluster2
primitive Apache ocf:heartbeat:apache \
    params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
    op monitor interval="5s"
primitive ApacheIP ocf:heartbeat:IPaddr2 \
    params ip="10.211.55.10" nic="eth0"
primitive SrvDisk ocf:linbit:drbd \
    params drbd_resource="disk0" \
    op monitor interval="15s"
primitive SrvFs ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/srv" fstype="ext4"
ms SrvGroup SrvDisk \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation Apache-with-SrvFs inf: Apache SrvFs
colocation fs_on_drbd inf: SrvFs SrvGroup:Master
colocation website-with-ip inf: Apache ApacheIP
order Apache-after-SrvFS inf: SrvFs Apache
order SrvFs-after-SrvDisk inf: SrvGroup:promote SrvFs:start
order apache-after-ip inf: ApacheIP Apache
property $id="cib-bootstrap-options" \
    dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    stonith-enabled="false"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"

== /var/log/messages on sleep ==
Jun 3 15:19:36 cluster2 kernel: [ 832.574335] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 3 15:19:36 cluster2 kernel: [ 832.574350] block drbd0: asender terminated
Jun 3 15:19:36 cluster2 kernel: [ 832.574352] block drbd0: Terminating asender thread
Jun 3 15:19:36 cluster2 kernel: [ 832.597942] block drbd0: Connection closed
Jun 3 15:19:36 cluster2 kernel: [ 832.597949] block drbd0: conn( NetworkFailure -> Unconnected )
Jun 3 15:19:36 cluster2 kernel: [ 832.597953] block drbd0: receiver terminated
Jun 3 15:19:36 cluster2 kernel: [ 832.597954] block drbd0: Restarting receiver thread
Jun 3 15:19:36 cluster2 kernel: [ 832.597956] block drbd0: receiver (re)started
Jun 3 15:19:36 cluster2 kernel: [ 832.597959] block drbd0: conn( Unconnected -> WFConnection )
Jun 3 15:19:36 cluster2 kernel: [ 832.620505] block drbd0: role( Secondary -> Primary )
Jun 3 15:19:36 cluster2 kernel: [ 832.621577] block drbd0: Creating new current UUID
Jun 3 15:19:36 cluster2 kernel: [ 833.052499] EXT4-fs (drbd0): mounted filesystem with ordered data mode

== /var/log/messages on wake ==
Jun 3 15:21:18 cluster2 kernel: [ 935.077211] block drbd0: Handshake successful: Agreed network protocol version 91
Jun 3 15:21:18 cluster2 kernel: [ 935.090186] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Jun 3 15:21:18 cluster2 kernel: [ 935.090194] block drbd0: conn( WFConnection -> WFReportParams )
Jun 3 15:21:18 cluster2 kernel: [ 935.090208] block drbd0: Starting asender thread (from drbd0_receiver [2461])
Jun 3 15:21:18 cluster2 kernel: [ 935.094911] block drbd0: data-integrity-alg: <not-used>
Jun 3 15:21:18 cluster2 kernel: [ 935.094958] block drbd0: drbd_sync_handshake:
Jun 3 15:21:18 cluster2 kernel: [ 935.094961] block drbd0: self 46BFC24C5A5ACEFB:D742CA23DC5EF406:AE656702A2EB3902:48414BE3B700DFC3 bits:3 flags:0
Jun 3 15:21:18 cluster2 kernel: [ 935.094963] block drbd0: peer 66A59B86D1B19A5F:D742CA23DC5EF407:AE656702A2EB3902:48414BE3B700DFC3 bits:1 flags:0
Jun 3 15:21:18 cluster2 kernel: [ 935.094966] block drbd0: uuid_compare()=100 by rule 90
Jun 3 15:21:18 cluster2 kernel: [ 935.099958] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 3 15:21:18 cluster2 kernel: [ 935.127178] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 3 15:21:18 cluster2 kernel: [ 935.127185] block drbd0: conn( WFReportParams -> Disconnecting )
Jun 3 15:21:18 cluster2 kernel: [ 935.133719] block drbd0: asender terminated
Jun 3 15:21:18 cluster2 kernel: [ 935.133723] block drbd0: Terminating asender thread
Jun 3 15:21:18 cluster2 kernel: [ 935.136794] block drbd0: Connection closed
Jun 3 15:21:18 cluster2 kernel: [ 935.136808] block drbd0: conn( Disconnecting -> StandAlone )
Jun 3 15:21:18 cluster2 kernel: [ 935.136816] block drbd0: receiver terminated
Jun 3 15:21:18 cluster2 kernel: [ 935.136817] block drbd0: Terminating receiver thread
Jun 3 15:21:18 cluster2 kernel: [ 935.244176] block drbd0: role( Primary -> Secondary )
Jun 3 15:21:19 cluster2 kernel: [ 935.548493] block drbd0: disk( UpToDate -> Outdated )

--
View this message in context: http://old.nabble.com/DRBD-and-Pacemaker-configuration-tp28772524p28772524.html
Sent from the DRBD - User mailing list archive at Nabble.com.