Hi,

My two test servers are having a split-brain problem. When I reboot the primary of the two servers to simulate a failure, the logs show the secondary becoming the primary. So far so good.

Once the rebooted server is back up and running, I manually start the drbd and heartbeat services on it using the /etc/init.d scripts. This is where the problem occurs. The two servers see each other but register a split brain. Running "drbdadm -- --discard-my-data connect all" on the rebooted server, now the secondary, causes them to fully re-sync and sort themselves out, but then I'm stuck waiting for a full sync to complete.

I'm trying to understand why only restarting the master causes a split brain. No data is getting written to either server, so there is no difference in content. Shouldn't the rebooted server become the secondary and re-sync itself with just the changes?

I've looked at the docs on drbd.org and the linux-ha.org wiki, but I'm not having any luck. Any and all tips/hints/pointers appreciated.

Setup:

  SLES 10
  kernel 2.6.16.27-0.9-bigsmp
  drbd-8.0.0 (compiled from source)

Config:

  global { usage-count no; }
  common { syncer { rate 120M; } }
  resource r0 {
    protocol C;
    startup {
      degr-wfc-timeout 120;
    }
    disk {
      on-io-error detach;
    }
    net {
      after-sb-0pri disconnect;
      after-sb-1pri disconnect;
      after-sb-2pri disconnect;
      rr-conflict disconnect;
    }
    syncer {
      rate 120M;
      al-extents 257;
    }
    on node1 {
      device /dev/drbd0;
      disk /dev/sda4;
      address 172.16.0.2:7788;
      meta-disk /dev/sda3[0];
    }
    on node2 {
      device /dev/drbd0;
      disk /dev/sda4;
      address 172.16.0.1:7788;
      meta-disk /dev/sda3[0];
    }
  }

--
David Filion