Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2006-08-02 21:05:26 +0400
\ Igor Yu. Zhbanov:
> Hello!
>
> Linux-2.6/DRBD-0.7.20.
>
> Sometimes split-brain happens. We have two Standalone:Primary/Unknown DRBDs.
> Let's name the cluster nodes nodeA and nodeB. Services now run on both nodes
> and the cluster IP belongs to the node that sent its ARP notification later.
> Suppose that is nodeB. Despite the split-brain, nodeB serves clients normally.
> (We have a switch, not a hub.) Now we want to connect DRBD on both nodes
> to each other:
> nodeA# drbdadm secondary all
> nodeB# drbdadm connect all
> nodeA# drbdadm connect all
>
> But it doesn't work because DRBD on nodeA thinks that it has the more
> up-to-date data. Even running "drbdadm -- --human primary" on nodeB doesn't help.
> So, the correct procedure for getting out of split-brain is:
> 1) Stop Heartbeat on both nodes to switch drbd to Standalone:Secondary/Unknown
> 2) On the node with the up-to-date data run "drbdadm -- --human primary all"
> 3) On both nodes run "drbdadm connect all"
> 4) On the node with the up-to-date data run "drbdadm secondary all" so Heartbeat
>    can start without complaining that DRBD is already running.
> 5) On both nodes start Heartbeat (in the desired order).
>
> Now my questions.
> 1) Is it possible to connect the DRBDs without bringing both devices to
>    secondary state? If one DRBD is in primary state and we know for sure that
>    it has the up-to-date data, why does the other DRBD not agree with the
>    admin's choice? Even if we run "drbdadm -- --human primary all" on nodeB
>    (which is already in Primary state).
>    I just don't want to stop cluster services which are already serving clients.
>    By the way, the command "drbdadm -- --human secondary all" returns errors.

easy, but involves full sync:

bad-node#
  drbdadm secondary all
  drbdadm disconnect all    (goes StandAlone Secondary/Unknown)
  drbdadm invalidate all
  drbdadm connect all
good-node#
  drbdadm connect all

-> full sync now.

by taking down the bad-node, and carefully modifying the counters and
flags to the right value (for example consistent, crashed primary, but
all counters zero) you could talk drbd into doing a partial sync of all
relevant areas.

> 2) Is it possible to detect, before connecting, a situation in which DRBD
>    will refuse the other node's connection? I mean by asking DRBD for some
>    information about the most recent generation counter or whatever it uses
>    to decide which DRBD will be in primary state. It would be useful, when
>    both nodes are in Standalone state, to see in which direction
>    synchronization will start, or that it will not start because of a conflict.

only by comparing the "generation counters",
e.g. utilising the read_gc.pl
(from the ./testing/ dir in the drbd source tar-ball)

rules:
 the "higher" counters win.
 highest priority is consistent-flag.
 a node which is currently primary will refuse to become sync-target,
 even if its counters would lose.

this last rule is useful: by making the good-node Primary prior to
connection attempts, you can be sure that its data won't be destroyed by
a sync in the "wrong" direction, so you can safely attempt to connect
"and see what happens".

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
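
Note: the three rules given in the reply above (consistent-flag first, then
the "higher" counters, and a Primary never becoming sync-target) can be
sketched roughly as below. This is only an illustration under stated
assumptions, not DRBD's actual code: NodeState and predict_sync are
hypothetical names, the counters are modelled as a plain tuple with the most
significant counter first, and the handling of exactly equal counters is a
guess. The real values would have to be read from the meta data, e.g. with
read_gc.pl.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NodeState:
    """Hypothetical snapshot of one node's DRBD state (not a DRBD structure)."""
    name: str
    consistent: bool            # the consistent-flag, highest priority
    counters: Tuple[int, ...]   # assumed layout: most significant counter first
    primary: bool               # is the node currently Primary?


def _rank(node: NodeState):
    # Consistent-flag outranks everything; then the counters compare
    # lexicographically, so the "higher" counters win.
    return (node.consistent, node.counters)


def predict_sync(a: NodeState, b: NodeState) -> str:
    """Guess what a connection attempt between a and b would lead to."""
    if _rank(a) == _rank(b):
        # Assumption of this sketch: equal state means no clear winner.
        return "no winner: flags and counters are equal"
    source, target = (a, b) if _rank(a) > _rank(b) else (b, a)
    if target.primary:
        # A node that is currently Primary refuses to become sync-target,
        # even if its counters would lose.
        return f"refused: {target.name} is Primary and will not become sync target"
    return f"sync {source.name} -> {target.name}"


if __name__ == "__main__":
    # The situation from the thread: nodeA's counters win, but nodeB is Primary.
    node_a = NodeState("nodeA", True, (1, 0, 6, 2), primary=False)
    node_b = NodeState("nodeB", True, (1, 0, 5, 2), primary=True)
    print(predict_sync(node_a, node_b))
```

With the good node made Primary first, as suggested in the reply, the worst
outcome in this model is a refused connection rather than a sync in the
"wrong" direction.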