[DRBD-user] Constant split brain after reboot

Fri Mar 2 16:47:52 CET 2007

Hi,

My two test servers are having a split-brain problem.  When I reboot the 
primary of the two servers to simulate a failure, the logs show the 
secondary becoming the primary.  So far so good.  Once the rebooted 
server is back up and running, I manually start the drbd and heartbeat 
services on it using the /etc/init.d scripts.  This is where the problem 
occurs.  The two servers see each other, but register a split brain.  
Running "drbdadm -- --discard-my-data connect all" on the rebooted 
server, now the secondary, causes them to fully re-sync and sort 
themselves out, but then I'm stuck waiting for a full sync to complete.

What I'm trying to understand is why a split brain occurs only when the 
primary is restarted.  No data is getting written to either server, so 
there is no difference in content.  Shouldn't the rebooted server become 
the secondary and re-sync itself with just the changes?

I've looked at the docs on drbd.org and the linux-ha.org wiki but I'm 
not having any luck.

Any and all tips/hints/pointers appreciated.

Setup:
SLES 10
kernel 2.6.16.27-0.9-bigsmp
drbd-8.0.0 (compiled from source)

Config:

global {
    usage-count no;
}

common {
  syncer { rate 120M; }
}

resource r0 {

  protocol C;

  startup {
    degr-wfc-timeout 120;
  }

  disk { on-io-error   detach; }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 120M;
    al-extents 257;
  }

  on node1 {
    device     /dev/drbd0;
    disk       /dev/sda4;
    address    172.16.0.2:7788;
    meta-disk  /dev/sda3[0];
  }

  on node2 {
    device     /dev/drbd0;
    disk       /dev/sda4;
    address    172.16.0.1:7788;
    meta-disk  /dev/sda3[0];
  }
}
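
For comparison, a sketch of the net section with automatic split-brain 
recovery policies in place of "disconnect".  The policy names are taken 
from the drbd.conf man page; whether they would avoid the manual 
"--discard-my-data" step on this setup is an assumption and untested here.

  net {
    # Neither node is primary when the split brain is detected:
    # if one node made no changes at all, sync it from the other.
    after-sb-0pri discard-zero-changes;
    # One node is primary: discard the secondary's version.
    after-sb-1pri discard-secondary;
    # Both nodes are primary: do not resolve automatically.
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }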

-- 
David Filion