[DRBD-user] Can't become primary when peer goes offline.

Mon Jul 14 23:09:42 CEST 2014

I have been having a really odd issue that I can't figure out. I have tried everything I can think of and compared this setup against all my other working DRBD setups, but I cannot get it to work.

node-1 is primary, /dev/drbd1 is mounted at /opt
node-2 is secondary
both are UpToDate

When I shut down node-1 and try to make node-2 primary, I receive this error:

1: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup primary 1' terminated with exit code 11

Here is a second scenario:

node-1 is primary, /dev/drbd1 is mounted at /opt
node-2 is secondary 
both are UpToDate (same as before)

This time, I shut down node-2 (the secondary). Everything is fine and node-1 continues to run normally. I then unmount /dev/drbd1 on node-1, demote it to secondary, and immediately promote it back to primary:

umount /dev/drbd1
drbdadm secondary all; drbdadm primary all # I ran these commands in one line so it switches as quick as possible.
1: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup primary 1' terminated with exit code 11
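
To make the failure easier to report, the promotion attempt can be wrapped so that the states are printed whenever it fails. This is only a minimal sketch: try_promote, DRBDADM and RES are names made up for illustration, and r0 is the resource name taken from the config further down. The drbdadm subcommands used (primary, cstate, dstate, role) are the standard ones in 8.4.

```shell
# Sketch: attempt a promotion and, if it fails, print the states DRBD reports.
# DRBDADM and RES are illustrative variables, not part of DRBD itself.
DRBDADM=${DRBDADM:-drbdadm}
RES=${RES:-r0}

try_promote() {
    rc=0
    "$DRBDADM" primary "$RES" || rc=$?
    if [ "$rc" -ne 0 ]; then
        # Report connection state, disk states and roles on stderr.
        for what in cstate dstate role; do
            printf '%s: %s\n' "$what" "$("$DRBDADM" "$what" "$RES" 2>&1)" >&2
        done
    fi
    # Pass the original exit code of "drbdadm primary" back to the caller.
    return "$rc"
}
```

Running try_promote on the surviving node right after the peer goes away would show the connection state, disk states and roles alongside the "Refusing to be Primary" error.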

iptables is off and SELinux is off. The resource was running fine as primary a moment earlier, so why can't I demote it to secondary and then promote it again? I have set this up more than 30 times and have never encountered this problem.

When either of the peers goes offline, cat /proc/drbd shows:

# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil at Build64R6, 2013-10-14 15:33:06

 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

If I restart DRBD and abort the timeout on the surviving node, it changes to this:

# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil at Build64R6, 2013-10-14 15:33:06

 1: cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
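
The difference between the two outputs is in the ds: field (UpToDate/DUnknown before the restart, Consistent/DUnknown after). A small helper can pull those fields out of a status line for comparison. This is a rough sketch; drbd_field and drbd_local_disk are hypothetical helper names, and it assumes the 8.4 /proc/drbd layout shown above.

```shell
# Print the value of one "key:value" field (e.g. cs, ro, ds) from a
# /proc/drbd status line. Usage: drbd_field <key> <status-line>
drbd_field() {
    key=$1
    line=$2
    # Walk the whitespace-separated tokens and print the first one
    # that starts with "<key>:", minus that prefix.
    for tok in $line; do
        case $tok in
            "$key":*) printf '%s\n' "${tok#*:}"; return 0 ;;
        esac
    done
    return 1
}

# Print the local disk state, i.e. the part of ds: before the slash.
drbd_local_disk() {
    ds=$(drbd_field ds "$1") || return 1
    printf '%s\n' "${ds%%/*}"
}
```

For example, drbd_local_disk "$(grep 'cs:' /proc/drbd)" would print the local disk state of the single resource on this node.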

Here is my config:

##########

resource r0 {
    protocol C;
    net {
        cram-hmac-alg sha1;
        shared-secret "pazzwurd1";
        max-epoch-size 512;
        sndbuf-size 0;
    }
    startup {
        wfc-timeout 30;
        outdated-wfc-timeout 20;
        degr-wfc-timeout 30;
    }
    disk {
        on-io-error detach;
        fencing resource-only;
    }
    syncer {
        rate 100M;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    volume 0 {
        device /dev/drbd1;
        disk /dev/mapper/vg_ottppencrzdb1-lv_pgsql;
        meta-disk internal;
    }
    on db-node-1.myco.com {
        address 172.16.99.1:7789;
    }
    on db-node-2.myco.com {
        address 172.16.99.2:7789;
    }
}

##########
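
Since the config sets both a fencing policy in the disk section and a fence-peer handler, it may be worth confirming which of the two are actually present in the configuration that drbdadm parses. The filter below is a minimal sketch; show_fencing is a made-up helper name, and it only matches the two option names that appear in the config above. Note that edits to the config file only reach the running resource after something like "drbdadm adjust r0", so the file and the kernel state can differ.

```shell
# Print the "fencing" policy and "fence-peer" handler lines found in
# DRBD configuration text read from stdin, without indentation or the
# trailing semicolon.
show_fencing() {
    grep -E '^[[:space:]]*(fencing|fence-peer)[[:space:]]' |
        sed -e 's/^[[:space:]]*//' -e 's/;[[:space:]]*$//'
}
```

Typical use would be "drbdadm dump r0 | show_fencing". If the output still contains a "fencing resource-only" line after the handlers section has been removed, the policy itself is still configured.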

I have tried removing the fencing handlers, but it did not help. I haven't even gotten to the Pacemaker stage yet anyway. I can send logs if needed; just tell me which ones you need.

Thanks for any help.

Mike