[DRBD-user] Split-brain fixing without stopping cluster resources.

Igor Yu. Zhbanov bsg at uniyar.ac.ru
Wed Aug 2 19:05:26 CEST 2006




Sometimes split-brain happens, and we end up with two Standalone:Primary/Unknown
DRBDs. Let's call the cluster nodes nodeA and nodeB. Services now run on both
nodes, and the cluster IP belongs to whichever node sent its ARP notification
last. Suppose that it is nodeB. Despite the split-brain, nodeB serves clients
normally. (We have a switch, not a hub.) Now we want to reconnect the DRBD
devices on both nodes to each other:
nodeA# drbdadm secondary all
nodeB# drbdadm connect all
nodeA# drbdadm connect all

But this doesn't work, because DRBD on nodeA thinks that it has the more
up-to-date data. Even running "drbdadm -- --human primary" on nodeB doesn't help.
So the correct procedure for getting out of split-brain is:
1) Stop Heartbeat on both nodes to switch DRBD to Standalone:Secondary/Unknown.
2) On the node with the up-to-date data, run "drbdadm -- --human primary all".
3) On both nodes, run "drbdadm connect all".
4) On the node with the up-to-date data, run "drbdadm secondary all" so that
   Heartbeat can start without complaining that DRBD is already running.
5) Start Heartbeat on both nodes (in the desired order).
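Because the five steps have to be interleaved across the two nodes, one way to script them is a helper that is run on each node with the same step number before moving on to the next step. This is a minimal sketch, not a tested tool: the split_brain_step function, the good/peer role names and the DRY_RUN switch are hypothetical, and controlling Heartbeat through /etc/init.d/heartbeat is an assumption about the init setup. The drbdadm invocations are copied from the procedure as written.

```shell
#!/bin/sh
# Sketch: run (or just print) this node's commands for one step of the
# split-brain recovery procedure.
# Assumptions, not from the post: Heartbeat is controlled through
# /etc/init.d/heartbeat; split_brain_step and DRY_RUN are hypothetical names.

run() {
    # With DRY_RUN=1 only print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: split_brain_step ROLE STEP
#   ROLE: "good" on the node with the up-to-date data, "peer" on the other one
#   STEP: 1..5, matching the numbered procedure; finish each step on both
#         nodes before starting the next one.
split_brain_step() {
    role=$1
    step=$2
    case "$role" in
        good|peer) ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac
    case "$step" in
        1) run /etc/init.d/heartbeat stop ;;   # both nodes
        2) if [ "$role" = good ]; then         # good node only
               run drbdadm -- --human primary all
           fi ;;
        3) run drbdadm connect all ;;          # both nodes
        4) if [ "$role" = good ]; then         # good node only
               run drbdadm secondary all
           fi ;;
        5) run /etc/init.d/heartbeat start ;;  # both nodes, in the desired order
        *) echo "unknown step: $step" >&2; return 2 ;;
    esac
}
```

For example, after sourcing the file on both nodes, run "split_brain_step good 2" on the node with the up-to-date data and "split_brain_step peer 2" (a no-op) on the other; setting DRY_RUN=1 first only prints the commands.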

Now my questions.
1) Is it possible to connect the DRBDs without bringing both devices to the
   Secondary state? If one DRBD is Primary and we know for sure that it has the
   up-to-date data, why does the other DRBD not accept the admin's choice? This
   happens even if we run "drbdadm -- --human primary all" on nodeB, which is
   already Primary. I just don't want to stop cluster services that are already
   serving clients. By the way, the command "drbdadm -- --human secondary all"
   returns errors.

2) Is it possible to detect, before connecting, that DRBD will refuse the other
   node's connection? I mean asking DRBD to report the most recent generation
   counter, or whatever it uses to decide which DRBD will be in the Primary
   state. That would be useful when both nodes are in the Standalone state: we
   could see in which direction synchronization will start, or whether it will
   not start at all because of a conflict.

