I have been hitting a really odd issue that I can't figure out. I have tried everything I can think of and compared this setup against all my other working DRBD setups, and I just cannot get it to work.
node-1 is primary, /dev/drbd1 is mounted at /opt
node-2 is secondary
both are UpToDate
I shut down node-1, try to make node-2 primary, and receive this error:
1: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup primary 1' terminated with exit code 11
Here is a second case:
node-1 is primary, /dev/drbd1 is mounted at /opt
node-2 is secondary
both are UpToDate (same as before)
This time I shut down node-2 (the secondary). Everything is fine and node-1 continues to run normally. I unmount /dev/drbd1, demote node-1 to secondary, and immediately promote it back to primary:
umount /dev/drbd1
drbdadm secondary all; drbdadm primary all # run on one line so the switch happens as quickly as possible
1: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup primary 1' terminated with exit code 11
iptables and SELinux are both off. The node was running fine as primary a moment earlier, so why can't I demote it to secondary and then promote it back to primary? I have set this up 30+ times and have never run into this problem.
When either of the peers goes offline, cat /proc/drbd shows:
# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil at Build64R6, 2013-10-14 15:33:06
1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
If I restart DRBD and abort the timeout on the surviving node, it changes to this:
# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil at Build64R6, 2013-10-14 15:33:06
1: cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
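To compare the two outputs above at a glance, the state fields can be pulled out of /proc/drbd with a small helper. This is only a sketch: drbd_states is a hypothetical name, not a DRBD tool, and it assumes the 8.4-style layout shown above, with one "minor: cs:... ro:... ds:..." status line per device.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): read /proc/drbd-style text on
# stdin and print "minor cs ro ds" for every device status line.
drbd_states() {
    awk '
        # Status lines start with the minor number, e.g. " 1: cs:WFConnection ..."
        $1 ~ /^[0-9]+:$/ {
            minor = $1
            sub(/:$/, "", minor)
            cs = ""; ro = ""; ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/)      cs = substr($i, 4)   # connection state
                else if ($i ~ /^ro:/) ro = substr($i, 4)   # roles (local/peer)
                else if ($i ~ /^ds:/) ds = substr($i, 4)   # disk states (local/peer)
            }
            print minor, cs, ro, ds
        }
    '
}

# Usage on a live node: drbd_states < /proc/drbd
```

Fed the first output above, it prints "1 WFConnection Secondary/Unknown UpToDate/DUnknown"; fed the second, the only field that changes is ds, which becomes Consistent/DUnknown.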
Here is my config:
##########
resource r0 {
    protocol C;
    net {
        cram-hmac-alg sha1;
        shared-secret "pazzwurd1";
        max-epoch-size 512;
        sndbuf-size 0;
    }
    startup {
        wfc-timeout 30;
        outdated-wfc-timeout 20;
        degr-wfc-timeout 30;
    }
    disk {
        on-io-error detach;
        fencing resource-only;
    }
    syncer {
        rate 100M;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    volume 0 {
        device /dev/drbd1;
        disk /dev/mapper/vg_ottppencrzdb1-lv_pgsql;
        meta-disk internal;
    }
    on db-node-1.myco.com {
        address 172.16.99.1:7789;
    }
    on db-node-2.myco.com {
        address 172.16.99.2:7789;
    }
}
##########
I have tried removing the fencing handlers, and it did not help. I haven't even gotten to the Pacemaker stage yet anyway. I can send logs if needed; just tell me which ones you need.
Thanks for any help.
Mike