[DRBD-user] Can't become primary when peer goes offline.

Tue Jul 15 18:02:18 CEST 2014

I managed to solve my problem by removing these lines from my DRBD config:

disk {
        on-io-error detach;
        fencing resource-only;
}

All my other working nodes have these same lines and show no problems, which makes me wonder whether the options are being ignored there or whether there is a bug.
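
One way to tell whether a node actually applies the option is to look at the configuration as drbdadm parses it instead of at the file on disk. What follows is only a rough sketch: fencing_policy is a made-up helper name (not part of DRBD), and it assumes the resource is called r0 as in the quoted config and that "drbdadm dump" prints one "name value;" option per line.

```shell
#!/bin/sh
# fencing_policy: hypothetical helper, not part of DRBD.
# Reads drbd.conf-style text on stdin and prints the value of the
# first "fencing" option it finds. It prints nothing if the option is
# absent, in which case DRBD uses its default policy (dont-care).
fencing_policy() {
    sed -n 's/^[[:space:]]*fencing[[:space:]]\{1,\}\([^;[:space:]]*\);.*/\1/p' | head -n 1
}

# Demo on the disk section from this thread.
fencing_policy <<'EOF'
disk {
        on-io-error detach;
        fencing resource-only;
}
EOF

# On a real node (assumption: run as root, resource named r0):
#   drbdadm dump r0 | fencing_policy
```

Running the same pipeline on a working node and on the failing one should show whether they really differ in fencing policy.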

Anyways, hope this helps somebody. 

Mike

On July 14, 2014 5:09:42 PM EDT, Michael Monette <mmonette at 2keys.ca> wrote:
>I have been having this really odd issue and I can't seem to figure it
>out. I have tried everything I can think of and I have compared it to
>all my other working DRBD setups and just cannot get this thing to
>work. 
>
>node-1 is primary, /dev/drbd1 is mounted at /opt
>node-2 is secondary
>both are UpToDate
>
>shut down node-1, try to make node-2 primary and receive the error:
>
>1: State change failed: (-7) Refusing to be Primary while peer is not outdated
>Command 'drbdsetup primary 1' terminated with exit code 11
>
>Here is a second case:
>
>node-1 is primary, /dev/drbd1 is mounted at /opt
>node-2 is secondary
>both are UpToDate (same as before)
>
>This time, I shut down node-2 (secondary). Everything is fine and
>continues to run normally on node-1. I unmount /dev/drbd1, put it
>into secondary, and immediately put it back into primary:
>
>umount /dev/drbd1
>drbdadm secondary all; drbdadm primary all # run on one line so it switches as quickly as possible
>1: State change failed: (-7) Refusing to be Primary while peer is not outdated
>Command 'drbdsetup primary 1' terminated with exit code 11
>
>iptables is off and SELinux is off. The node was running fine as
>primary a moment earlier, so why can't I make it secondary and then
>primary again? Out of the 30+ times I have set this up, I have never
>encountered this problem.
>
>When either of the peers goes offline, cat /proc/drbd shows:
>
># cat /proc/drbd
>version: 8.4.4 (api:1/proto:86-101)
>GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
>
> 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
>    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
>If I restart DRBD and abort the timeout on the surviving node, it
>changes to this:
>
># cat /proc/drbd
>version: 8.4.4 (api:1/proto:86-101)
>GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
>
> 1: cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown C r-----
>    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
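
The ds: field is the interesting part of these two outputs. According to the DRBD documentation, Consistent is the disk state of a node without a connection whose data is consistent but which cannot yet tell whether it is UpToDate or Outdated. A small sketch for pulling single fields out of a /proc/drbd device line follows; drbd_field is a hypothetical helper name, and the sample line is copied from the output above.

```shell
#!/bin/sh
# drbd_field: hypothetical helper, not part of DRBD.
# $1 = field name (cs, ro or ds), $2 = one device line from /proc/drbd.
# Prints the value after "name:", e.g. "Consistent/DUnknown" for ds.
drbd_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p"
}

line='1: cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown C r-----'

local_ds=$(drbd_field ds "$line")
echo "connection: $(drbd_field cs "$line")"
echo "local disk: ${local_ds%%/*}"   # part before the slash
echo "peer disk:  ${local_ds##*/}"   # part after the slash
```

On a live node the line could be taken from /proc/drbd itself, for example with grep '^ *1:' /proc/drbd for minor 1.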
>
>Here is my config:
>
>##########
>
>resource r0 {
>    protocol C;
>    net {
>        cram-hmac-alg sha1;
>        shared-secret "pazzwurd1";
>        max-epoch-size 512;
>        sndbuf-size 0;
>    }
>    startup {
>        wfc-timeout 30;
>        outdated-wfc-timeout 20;
>        degr-wfc-timeout 30;
>    }
>    disk {
>        on-io-error detach;
>        fencing resource-only;
>    }
>    syncer {
>        rate 100M;
>    }
>    handlers {
>        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>    }
>    volume 0 {
>        device /dev/drbd1;
>        disk /dev/mapper/vg_ottppencrzdb1-lv_pgsql;
>        meta-disk internal;
>    }
>    on db-node-1.myco.com {
>        address 172.16.99.1:7789;
>    }
>    on db-node-2.myco.com {
>        address 172.16.99.2:7789;
>    }
>}
>
>##########
>
>
>I tried removing the fencing handlers and it did not help. I haven't
>even gotten to the Pacemaker stage yet anyway. I can send logs if
>needed; just tell me which ones you need.
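
Some context on the handler side: with a fencing policy other than dont-care, DRBD calls the fence-peer handler when it needs the peer outdated, and interprets the handler's exit code. The sketch below lists those codes as they are described in the drbd.conf man page for 8.x. Treat it as an illustration only: fence_peer_result is a hypothetical helper name and the message wording is not DRBD's. Note that crm-fence-peer.sh talks to Pacemaker, so it is not expected to succeed on a node where Pacemaker is not yet running.

```shell
#!/bin/sh
# fence_peer_result: hypothetical helper, not part of DRBD.
# Maps a fence-peer handler exit code ($1) to a short description.
fence_peer_result() {
    case "$1" in
        3) echo "peer disk was already Inconsistent" ;;
        4) echo "peer disk is Outdated" ;;
        5) echo "peer could not be reached" ;;
        6) echo "peer is Primary and refused to be outdated" ;;
        7) echo "peer was fenced off the cluster (stonith)" ;;
        *) echo "handler failed or returned an unexpected code" ;;
    esac
}

# Print a few examples, including a generic failure code.
for rc in 4 5 1; do
    echo "exit $rc: $(fence_peer_result "$rc")"
done
```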
>
>Thanks for any help.
>
>Mike
>_______________________________________________
>drbd-user mailing list
>drbd-user at lists.linbit.com
>http://lists.linbit.com/mailman/listinfo/drbd-user