/ 2006-08-02 21:05:26 +0400
\ Igor Yu. Zhbanov:
> Hello!
>
> Linux-2.6/DRBD-0.7.20.
>
> Sometimes split-brain happens. We have two Standalone:Primary/Unknown DRBDs.
> Let's name the cluster nodes nodeA and nodeB. Services now run on both nodes, and
> the cluster IP belongs to whichever node sent its ARP notification last. Suppose
> that is nodeB. Despite the split-brain, nodeB serves clients normally.
> (We have a switch, not a hub.) Now we want to connect the DRBDs on both nodes
> to each other:
> nodeA# drbdadm secondary all
> nodeB# drbdadm connect all
> nodeA# drbdadm connect all
>
> But it doesn't work, because DRBD on nodeA thinks that it has the more up-to-date
> data. Even running "drbdadm -- --human primary" on nodeB doesn't help.
> So the correct procedure for getting out of split-brain is:
> 1) Stop Heartbeat on both nodes to switch DRBD to Standalone:Secondary/Unknown.
> 2) On the node with the up-to-date data, run "drbdadm -- --human primary all".
> 3) On both nodes, run "drbdadm connect all".
> 4) On the node with the up-to-date data, run "drbdadm secondary all", so that
> Heartbeat can start without complaining that DRBD is already running.
> 5) On both nodes, start Heartbeat (in the desired order).
>
> Now my questions.
> 1) Is it possible to connect the DRBDs without bringing both devices to the
> Secondary state? If one DRBD is Primary and we know for sure that it has the
> up-to-date data, why does the other DRBD not accept the admin's choice? Even if
> we run "drbdadm -- --human primary all" on nodeB (which is already Primary).
> I just don't want to stop cluster services that are already serving clients.
> By the way, the command "drbdadm -- --human secondary all" returns errors.
easy, but it involves a full sync:
bad-node#  drbdadm secondary all
           drbdadm disconnect all
           (goes StandAlone Secondary/Unknown)
           drbdadm invalidate all
           drbdadm connect all
good-node# drbdadm connect all
-> full sync now.
by taking down the bad-node and carefully modifying the counters and
flags to the right values (for example: consistent, crashed primary, but
all counters zero), you could talk drbd into doing a partial sync of all
relevant areas.
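For anyone scripting the full-sync recipe above, here is a minimal Python sketch that only builds it as an ordered dry-run plan; it executes nothing. The function name `full_sync_plan` is hypothetical, "bad-node"/"good-node" are just the host labels from the recipe, and actually running each step on the right host (e.g. over ssh) is assumed to be handled elsewhere.

```python
from typing import List, Tuple

# (host label, drbdadm argv) pairs, in the order they must be run.
Step = Tuple[str, List[str]]


def full_sync_plan(resource: str = "all") -> List[Step]:
    """Hypothetical helper: the full-sync recovery as an ordered list of steps.

    The node with the stale data is demoted, disconnected, invalidated and
    reconnected; then the node with the good data connects, and DRBD does a
    full sync towards the bad node.
    """
    return [
        ("bad-node", ["drbdadm", "secondary", resource]),
        ("bad-node", ["drbdadm", "disconnect", resource]),  # -> StandAlone Secondary/Unknown
        ("bad-node", ["drbdadm", "invalidate", resource]),  # mark local data invalid: this node becomes sync target
        ("bad-node", ["drbdadm", "connect", resource]),
        ("good-node", ["drbdadm", "connect", resource]),    # full sync starts here
    ]


if __name__ == "__main__":
    # Dry run only: print the commands instead of executing them.
    for host, argv in full_sync_plan():
        print(f"{host}# {' '.join(argv)}")
```

Keeping the plan as data makes it easy to print and review before anything touches the bad node; `invalidate` throws away that node's copy, so a dry run first seems prudent.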
> 2) Is it possible to detect, before connecting, that DRBD will refuse the other
> node's connection? I mean by asking DRBD for information about the freshest
> generation counter, or whatever it uses to decide which DRBD will be in the
> Primary state. It would be useful, when both nodes are in the Standalone
> state, to see in which direction synchronization will start, or whether it
> will not start at all because of a conflict.
only by comparing the "generation counters", e.g. using the
read_gc.pl script (from the ./testing/ dir in the drbd source tarball).
rules:
the consistent flag has the highest priority;
otherwise, the "higher" counters win.
a node which is currently Primary will refuse to become sync target,
even if its counters would lose.
this last rule is useful: by making the good-node Primary before
attempting to connect, you can be sure that its data won't be destroyed
by a sync in the "wrong" direction, so you can safely attempt to connect
"and see what happens".
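The three rules can be written down as a small decision function. This is a simplified Python model, not DRBD's code: the real metadata holds several named counters plus flags, whereas here `counters` is one tuple compared element by element (an assumed ordering), and `NodeState`, `sync_source` and `Refused` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class Refused(Exception):
    """The node that would lose is Primary, so it will not become sync target."""


@dataclass
class NodeState:
    name: str
    consistent: bool
    # Simplification: the generation counters as one tuple, compared
    # element by element (most significant first). Layout is assumed.
    counters: Tuple[int, ...]
    primary: bool = False


def sync_source(a: NodeState, b: NodeState) -> Optional[str]:
    """Name of the node that would become sync source, or None if both
    sides look identical (nothing to sync). Raises Refused if the losing
    node is currently Primary."""
    # Highest priority: the consistent flag.
    if a.consistent != b.consistent:
        winner, loser = (a, b) if a.consistent else (b, a)
    # Otherwise the "higher" counters win.
    elif a.counters != b.counters:
        winner, loser = (a, b) if a.counters > b.counters else (b, a)
    else:
        return None
    # A node that is currently Primary refuses to become sync target,
    # even if its counters lose.
    if loser.primary:
        raise Refused(f"{loser.name} is Primary and would be sync target")
    return winner.name


if __name__ == "__main__":
    # Roughly the situation from the question: nodeA has "higher" counters,
    # but nodeB holds the good data and is Primary.
    good = NodeState("nodeB", consistent=True, counters=(1, 4, 9), primary=True)
    stale = NodeState("nodeA", consistent=True, counters=(1, 5, 3))
    try:
        print("sync source:", sync_source(good, stale))
    except Refused as err:
        print("connection refused:", err)
```

With nodeB Primary, the model refuses instead of overwriting nodeB, which is the protection the last rule gives: at worst the connect simply fails, and the stale side can then be resolved with the full-sync recipe.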
--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.