On Mon, Jul 14, 2014 at 05:09:42PM -0400, Michael Monette wrote:
> I have been having this really odd issue and I can't seem to figure it out. I have tried everything I can think of and I have compared it to all my other working DRBD setups and just cannot get this thing to work.
>
> node-1 is primary, /dev/drbd1 is mounted at /opt
> node-2 is secondary
> both are UpToDate
>
> shut down node-1, try to make node-2 primary and receive the error:
>
> 1: State change failed: (-7) Refusing to be Primary while peer is not outdated
> Command 'drbdsetup primary 1' terminated with exit code 11
>
> Also check out this one as well:
>
> node-1 is primary, /dev/drbd1 is mounted at /opt
> node-2 is secondary
> both are UpToDate(same as before)
>
> This time, I shut down node-2(secondary). Everything is fine and continues to run normally on node-1. I unmount /dev/drbd1 and put it into secondary, and immediately put it back into primary:
>
> umount /dev/drbd1
> drbdadm secondary all; drbdadm primary all # I ran these commands in one line so it switches as quick as possible.
> 1: State change failed: (-7) Refusing to be Primary while peer is not outdated
> Command 'drbdsetup primary 1' terminated with exit code 11
>
> iptables is off, SELinux is off. I ran the drbdadm secondary and drbdadm primary in one line so it is as quick as possible. It was just running fine as a primary, so why can't I even make it a secondary, then make it primary again? Out of the 30+ times I have set this up, I have never encountered this problem.
>
> When either of the peers go offline, cat /proc/drbd shows:
>
> # cat /proc/drbd
> version: 8.4.4 (api:1/proto:86-101)
> GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil at Build64R6, 2013-10-14 15:33:06
>
>  1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> If I restart DRBD and abort the timeout on the surviving node, it changes to this:
>
> # cat /proc/drbd
> version: 8.4.4 (api:1/proto:86-101)
> GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil at Build64R6, 2013-10-14 15:33:06
>
>  1: cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown C r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> Here is my config:
>
> ##########
>
> resource r0 {
>     protocol C;
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "pazzwurd1";
>         max-epoch-size 512;
>         sndbuf-size 0;
>     }
>     startup {
>         wfc-timeout 30;
>         outdated-wfc-timeout 20;
>         degr-wfc-timeout 30;
>     }
>     disk {
>         on-io-error detach;
>         fencing resource-only;
>     }
>     syncer {
>         rate 100M;
>     }
>     handlers {
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
>     volume 0 {
>         device /dev/drbd1;
>         disk /dev/mapper/vg_ottppencrzdb1-lv_pgsql;
>         meta-disk internal;
>     }
>     on db-node-1.myco.com {
>         address 172.16.99.1:7789;
>     }
>     on db-node-2.myco.com {
>         address 172.16.99.2:7789;
>     }
> }
>
> ##########
>
>
> I have tried to remove the fencing handlers and it did not help.
> I haven't even gotten to the pacemaker stage yet anyways.

There.
*that* is your problem.

A fence-peer-handler that uses pacemaker cannot possibly work
without pacemaker.

I would think that you should find loud complaints about that in your
system logs.

If you tell DRBD to use a fence handler,
that handler has to report success,
or you get above behavior, by design and configuration.
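
As a rough sketch of what that implies for the quoted config: while Pacemaker is not yet running, one option is to leave fencing switched off, so that DRBD never calls a handler that cannot succeed. The fragment below is an assumption about what fits this particular setup, not a tested recommendation. It only changes the disk and handlers sections of the quoted resource r0; "dont-care" is the default fencing policy in DRBD 8.4.

```
resource r0 {
    disk {
        on-io-error detach;
        # Assumption: no cluster manager yet, so do not ask for fencing.
        # With "dont-care", no fence-peer handler is invoked on promotion.
        fencing dont-care;
    }
    # Assumption: re-enable these (together with "fencing resource-only")
    # only once Pacemaker is actually managing the resource.
    # handlers {
    #     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    #     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    # }
    # all other sections of r0 stay as quoted above
}
```

After editing the file on both nodes, "drbdadm adjust r0" should apply the changed options to the running resource. Note that removing only the handlers while keeping "fencing resource-only" would presumably still leave DRBD expecting a fence-peer result, which may be why removing the handlers alone did not help.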
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed