Hi,
My two test servers are having a split-brain problem. When I reboot
the primary of the two servers to simulate a failure, the logs show the
secondary becoming the primary. So far so good. Once the rebooted
server is back up and running, I manually start the drbd and heartbeat
services on it using the /etc/init.d scripts. This is where the problem
occurs. The two servers see each other, but register a split brain.
Running "drbdadm -- --discard-my-data connect all" on the rebooted
server, now the secondary, causes them to fully re-sync and sort
themselves out, but then I'm stuck waiting for a full sync to complete.
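For reference, the manual recovery described above can be sketched as a small shell helper. This is only a sketch under assumptions: the function name drbd_discard_and_reconnect and the DRBD_RUN switch are made up for illustration, the resource name r0 comes from the config below, and the explicit "drbdadm secondary" step is an added precaution rather than something I currently run. By default it only prints the commands, so nothing is changed unless DRBD_RUN=1 is set.

```shell
#!/bin/sh
# Sketch: manual split-brain recovery on the node whose data is discarded
# (here, the rebooted server). Hypothetical helper, not part of DRBD.

# Print the command by default; execute it only when DRBD_RUN=1.
run() {
    if [ "${DRBD_RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# $1: DRBD resource name (defaults to r0, the resource in the config).
drbd_discard_and_reconnect() {
    res="${1:-r0}"
    run cat /proc/drbd                               # overall state, for the record
    run drbdadm cstate "$res"                        # connection state before recovery
    run drbdadm secondary "$res"                     # make sure this node is not primary
    run drbdadm -- --discard-my-data connect "$res"  # drop local changes and reconnect
}
```

Calling drbd_discard_and_reconnect r0 prints the four commands in order. If the surviving node has also dropped to StandAlone, it would presumably need its own "drbdadm connect r0" before the two sides reconnect.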
I'm trying to understand why it is that only restarting the primary
causes a split brain. No data is getting written to either server, so
there is no difference in content. Shouldn't the rebooted server become
the secondary and re-sync itself with just the changes?
I've looked at the docs on drbd.org and the linux-ha.org wiki but I'm
not having any luck.
Any and all tips/hints/pointers appreciated.
Setup:
SLES 10
kernel 2.6.16.27-0.9-bigsmp
drbd-8.0.0 (compiled from source)
Config:
global {
    usage-count no;
}
common {
    syncer { rate 120M; }
}
resource r0 {
    protocol C;
    startup {
        degr-wfc-timeout 120;
    }
    disk { on-io-error detach; }
    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 120M;
        al-extents 257;
    }
    on node1 {
        device /dev/drbd0;
        disk /dev/sda4;
        address 172.16.0.2:7788;
        meta-disk /dev/sda3[0];
    }
    on node2 {
        device /dev/drbd0;
        disk /dev/sda4;
        address 172.16.0.1:7788;
        meta-disk /dev/sda3[0];
    }
}
--
David Filion