[DRBD-user] Split Brain Recovery - Maybe

ha at buglecreek.com
Tue Jan 20 17:20:23 CET 2009

I have a two-node cluster on which I have to upgrade the OS. To test my
upgrade procedure, I set up a quick test environment on virtual
machines.  The nodes run Fedora 8 with drbd-8.0.8, drbdlinks, and
heartbeat 2.1.2. I will attempt to upgrade the nodes to RH5,
drbd-8.0.14, and heartbeat-2.1.4, one system at a time.  I had both
machines running and everything seemed OK.

At this point Node A was primary and was running all services.  I tried
to stop heartbeat on Node A so that Node B could take over everything.
When I did this, Node A spontaneously rebooted.  At the same time Node B
took over all resources and seemed to be functioning properly, but the
drbd status showed a StandAlone connection state on Node B.  When Node A
came back up, it was in the WFConnection state and in Secondary mode.
This seemed to indicate a split-brain situation as described in the
manual, although split brain was not mentioned in the logs. So I
followed the steps outlined in "Manual split brain recovery".

On Node A 
> drbdadm secondary
> drbdadm -- --discard-my-data connect all

On Node B 
> drbdadm connect all
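
The decision behind the steps above (which node gives up its data and
which one keeps it) can be sketched as a small shell helper. This is
only an illustration under assumptions: classify_node is a made-up
function, not part of DRBD, and the resource name r0 is hypothetical.
It uses "drbdadm cstate" and "drbdadm state", the latter being the
8.0-era command for reporting the local/peer role.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): given a connection state and
# the local role, print which side of the manual recovery the node is on.
classify_node() {
    cstate="$1"   # e.g. StandAlone, WFConnection, Connected
    role="$2"     # local role: Primary or Secondary
    case "$cstate" in
        Connected|SyncSource|SyncTarget)
            # Already talking to the peer; no recovery needed.
            echo "connected" ;;
        StandAlone|WFConnection)
            if [ "$role" = "Primary" ]; then
                # Holds the data in use; plain "drbdadm connect" side.
                echo "survivor"
            else
                # Candidate for "--discard-my-data connect".
                echo "victim-candidate"
            fi ;;
        *)
            echo "unknown" ;;
    esac
}

# On a real node the inputs could come from drbdadm.
# The resource name r0 is an assumption for this sketch.
if command -v drbdadm >/dev/null 2>&1; then
    cs=$(drbdadm cstate r0)
    ro=$(drbdadm state r0 | cut -d/ -f1)   # "Primary/Secondary" -> "Primary"
    classify_node "$cs" "$ro"
fi
```

With the states described in this post, Node B (StandAlone, Primary)
would be classified as the survivor and Node A (WFConnection, Secondary)
as the candidate for discarding its data, which matches the commands
listed above.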

The logs on Node B kept saying:
kernel: drbd0: I shall become SyncTarget, but I am primary!

I could never get the nodes to reconnect.  I finally shut down both
nodes and brought up Node A first and then Node B. Everything seemed to
connect fine and it is syncing.  So I am curious whether there is
something else I could have done instead of the reboot, in case this
ever happens on a production system.  The good thing is that it seems
no data was lost.
