Easiest thing to do is to configure proper stonith (configure + test),
then change drbd to use 'fencing resource-and-stonith;' together with the
handlers 'fence-peer "/usr/lib/drbd/crm-fence-peer.sh";' and
'after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";'. That way, you
avoid split-brains entirely. You also need stonith in pacemaker anyway,
so win-win.

On 15/04/15 08:49 PM, Keith Ouellette wrote:
> We have two nodes that have two drbd resources for two different
> applications on a pair of servers managed by Pacemaker. All looks to
> work fine when the primary node is put into standby or power cycled.
> Meaning that the drbd Primary gets moved to the new active node and the
> applications continue to run as expected. I have an issue when I pull
> the Ethernet out of the primary node and let it sit there for about a
> half hour. When I unplug it the Primary gets moved as expected and the
> applications continue to work. However, when I plug the Ethernet back
> into the system, both nodes go into a standalone state.
>
> *Node 1:*
>
> drbd driver loaded OK; device status:
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: F97798065516C94BE0F27DC
> m:res  cs          ro               ds                 p       mounted  fstype
> 0:r0   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4
> 1:r1   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4
>
> *Node 2:*
>
> drbd driver loaded OK; device status:
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: F97798065516C94BE0F27DC
> m:res  cs          ro                 ds                 p       mounted  fstype
> 0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----
> 1:r1   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----
>
> As you can see one knows it is Primary and that is what the applications
> continue to run on. The second node knows it should be Secondary. All I
> do to resolve this is connect the resources on each node with the
> Secondary having the --discard-my-data option.
>
> Is there a way to have the connects done automatically.
> This looks to be a type of "split brain" and I do have that configured
> in the global.common.conf:
>
> global {
>     usage-count no;
>     # minor-count dialog-refresh disable-ip-verification
> }
>
> common {
>     handlers {
>         # These are EXAMPLE handlers only.
>         # They may have severe implications,
>         # like hard resetting the node under certain circumstances.
>         # Be careful when choosing your poison.
>
>         # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>         # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         *split-brain "/usr/lib/drbd/notify-split-brain.sh root";*
>         # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>         # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
>         # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>     }
>
>     startup {
>         # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
>     }
>
>     options {
>         # cpu-mask on-no-data-accessible
>     }
>
>     disk {
>         # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
>         # disk-drain md-flushes resync-rate resync-after al-extents
>         # c-plan-ahead c-delay-target c-fill-target c-max-rate
>         # c-min-rate disk-timeout
>     }
>
>     net {
>         *after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         # after-sb-2pri consensus;
>         after-sb-2pri disconnect;*
>         # protocol timeout max-epoch-size max-buffers unplug-watermark
>         # connect-int ping-int sndbuf-size rcvbuf-size ko-count
>         # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
>         # after-sb-1pri after-sb-2pri always-asbp rr-conflict
>         # ping-timeout data-integrity-alg tcp-cork on-congestion
>         # congestion-fill congestion-extents csums-alg verify-alg
>         # use-rle
>     }
> }
>
> The following are also the resource files:
>
> r0.res:
>
> resource r0 {
>     on Node1 {
>         volume 0 {
>             device /dev/drbd0;
>             disk /dev/ Node1-vg/AOS;
>             flexible-meta-disk internal;
>         }
>         address 10.0.6.221:7788;
>     }
>     on Node2 {
>         volume 0 {
>             device /dev/drbd0;
>             disk /dev/ Node2-vg/AOS;
>             flexible-meta-disk internal;
>         }
>         address 10.0.6.222:7788;
>     }
> }
>
> r1.res:
>
> resource r1 {
>     on Node1 {
>         volume 0 {
>             device /dev/drbd1;
>             disk /dev/ Node1-vg/Controller;
>             flexible-meta-disk internal;
>         }
>         address 10.0.6.221:7789;
>     }
>     on Node2 {
>         volume 0 {
>             device /dev/drbd1;
>             disk /dev/ Node2-vg/Controller;
>             flexible-meta-disk internal;
>         }
>         address 10.0.6.222:7789;
>     }
> }
>
> I am not sure if this is possible, but I figured I would ask.
>
> Thanks,
> Keith

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
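
For reference, the fencing change suggested at the top of this message might look roughly like the fragment below when added to the common section of the DRBD configuration. This is only a sketch, assuming DRBD 8.4 (where 'fencing' belongs in the disk section), the handler scripts installed under /usr/lib/drbd/, and stonith already configured and tested in Pacemaker. The DRBD user's guide pairs crm-unfence-peer.sh with the after-resync-target handler, so that is what is shown here.

```
common {
    handlers {
        # Asks Pacemaker to add a constraint that keeps the peer
        # from being promoted while replication is interrupted.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";

        # Removes that constraint once this node has finished
        # resyncing and is UpToDate again.
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }

    disk {
        # On loss of the replication link, freeze I/O and call the
        # fence-peer handler before continuing.
        fencing resource-and-stonith;
    }
}
```

After editing, 'drbdadm adjust all' on both nodes should apply the change; testing it by pulling the replication link again on a non-production pair is worthwhile before relying on it.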