On 2011-10-08 15:55, Bart Coninckx wrote:
> On 10/08/11 00:25, Lars Ellenberg wrote:
>> On Fri, Oct 07, 2011 at 10:21:08PM +0200, Bart Coninckx wrote:
>>> On 10/06/11 22:03, Florian Haas wrote:
>>>> On 2011-10-06 21:43, Bart Coninckx wrote:
>>>>> Hi all,
>>>>>
>>>>> would you mind sending me examples of your crm config for a
>>>>> dual-primary DRBD resource?
>>>>>
>>>>> I used the one on
>>>>>
>>>>> http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
>>>>>
>>>>> and on
>>>>>
>>>>> http://www.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
>>>>>
>>>>> and they both result in split brain, except when I start DRBD
>>>>> manually first.
>>>>
>>>> They clearly should not. Rather than soliciting other people's
>>>> configurations and then trying to adapt yours based on them, why don't
>>>> you upload _your_ CIB (not just a "crm configure dump", but a full
>>>> "cibadmin -Q") and your DRBD configuration to a pastebin/pastie/fpaste
>>>> and let people tell you where your problem is?
>>>
>>> OK, I posted the drbd.conf on http://pastebin.com/SQe9YxhY
>>>
>>> cibadmin -Q is on http://pastebin.com/gTZqsACq
>>>
>>> The split brain logging is on http://pastebin.com/7unKKkdi .
>>
>> I somehow think you added some "--force" or "--overwrite-data-of-peer"
>> to some drbdadm/drbdsetup primary invocation?
>>
>>> Could this be some sort of timing issue? Manually, things are fine,
>>> but there are some seconds in between the primary promotions.
>>
>
> OK, it seems to be some sort of timing issue. I "fixed" this by adding a
> "sleep 1" in the RA right before the "do_drbdadm primary $DRBD_RESOURCE"
> line.
>
> I'm surprised, though, that I'm the first one to run into this.

Er, wait. I'm cross-posting this to the Pacemaker list on a hunch.

Andrew, in Boston last year you mentioned that you were planning to
implement a change to Master/Slave sets in which, IIRC, startup and
promotion would happen in one fell swoop (I believe the NTT folks made a
compelling case for this).
Has that change ever been implemented? And if so, in which Pacemaker
version? Is there a configuration option to revert to the old behavior,
where the resource would be started first and then promoted some time
after that?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now