[DRBD-user] Split Brain Recovery - Maybe
ha at buglecreek.com
Tue Jan 20 17:20:23 CET 2009
I have a two-node cluster on which I have to upgrade the OS. To test my
upgrade procedure, I set up a quick test environment on virtual
machines. The nodes run Fedora 8 with drbd-8.0.8, drbdlinks, and
heartbeat 2.1.2. I will attempt to upgrade the nodes to RH5,
drbd-8.0.14, and heartbeat-2.1.4, one system at a time. I had both
machines running and everything seemed OK.
At this point Node A was primary and was running all services. I tried
to stop heartbeat on Node A so that Node B could take over everything.
When I did this, Node A rebooted spontaneously. At the same time Node B
took over all resources and seemed to be functioning properly, but the
drbd status showed a StandAlone connection state on Node B. When Node A
came back up, it was in the WFConnection state and in Secondary mode.
This seemed to indicate a split-brain situation as described in the
manual, although split brain was not mentioned in the logs. So I
followed the steps outlined in "Manual split brain recovery".
On Node A
> drbdadm secondary
> drbdadm -- --discard-my-data connect all
On Node B
> drbdadm connect all
The logs on Node B kept saying:
kernel: drbd0: I shall become SyncTarget, but I am primary!
I could never get the nodes to reconnect. I finally shut down both
nodes and brought up Node A first and then Node B. Everything seemed to
connect fine and it is now syncing. So I am curious whether there was
something else I could have done instead of the reboot, in case this
ever happens on a production system. The good thing is that no data
seems to have been lost.
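
For reference, the manual recovery sequence described above can be
written down as a small shell sketch. This is only an illustration of
the steps as given in the post, not a tested procedure. The helper
names run and recover, the RUN variable, and the victim/survivor
arguments are hypothetical and exist only for this example. "victim" is
the node whose changes are discarded (Node A in the post) and
"survivor" is the node whose data is kept (Node B). By default the
script only prints the commands; nothing is executed unless RUN=1 is
set.

```shell
#!/bin/sh
# Sketch of the manual split-brain recovery steps from the post.
# Assumption: every resource is addressed with "all", as in the post.
# Dry run by default: commands are printed, not executed, unless RUN=1.

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"                  # really execute the command
    else
        printf '%s\n' "$*"    # only show what would be executed
    fi
}

# recover victim|survivor  (hypothetical helper for this example)
recover() {
    case "$1" in
        victim)
            # Node whose changes are thrown away: demote it, then
            # reconnect while discarding its local modifications.
            run drbdadm secondary all
            run drbdadm -- --discard-my-data connect all
            ;;
        survivor)
            # Node whose data is kept: it only needs to reconnect
            # (required when it sits in the StandAlone state).
            run drbdadm connect all
            ;;
        *)
            echo "usage: recover victim|survivor" >&2
            return 1
            ;;
    esac
}
```

Calling "recover victim" on one node and "recover survivor" on the
other prints the same commands listed in the post. Before running
anything for real, the connection state and role of each node can be
checked with "cat /proc/drbd".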