[DRBD-user] Recovery from split-brain condition, please advise.
lars.ellenberg at linbit.com
Tue Nov 17 10:10:27 CET 2009
On Mon, Nov 16, 2009 at 11:41:44AM -0800, Adam Gandelman wrote:
> Ivan wrote:
> > 1. # umount block devices
> Only needed on the split-brain victim (secondary in this case) if your
> CRM's brain also split and you found drbd promoted and filesystem mounted.
> > 2. # disconnect all resources on both nodes
> > $ drbdadm disconnect all
> Not needed on the primary since it is already disconnected (StandAlone).
> > 3. # force both nodes to be secondary
> > $ drbdadm secondary all
> Again, only needed on the victim if you found it promoted.
> > 4. # select slave drive and tell it to drop all data
> > $ drbdadm -- --discard-my-data connect resource
> > to force all resources on the secondary node (bad) to be secondary
> > and to drop all data.
> It is already secondary, and it will reconnect to its peer and resync
> only the data needed to get back to UpToDate.
> > 5. # select source and master mode and start synchronisation.
> > $ drbdadm -- --overwrite-data-of-peer primary resource
> This will initiate a FULL resync. Not needed, just reconnect and begin
No, it will likely be a no-op ;)
The "--overwrite-data-of-peer" thing is only needed if you
want to force something to primary that otherwise would
refuse, i.e. on an Inconsistent or Outdated device.
Otherwise, this option is simply ignored.
You should _NEVER_ need to use it except for the initial full sync,
or possibly for data recovery after everything went wrong,
and you have no more UpToDate copy of data left.
It does not affect the amount of data to be resynced.
> > 6. # Start synchronisation on the source (master) node
> > $ drbdadm connect resource
> > I would greatly appreciate it if you could answer my questions.
> > 1. Any comments on the procedure?
> > 2. How do I know if the --discard-my-data option is necessary?
> Use it on the node with the outdated data (the changes you are willing
> to lose); this designates that node as the split-brain victim.
> > 3. After DRBD starts synchronisation, can I mount the block
> > devices on the master node, or do I have to wait until synchronisation
> > is completed?
> You shouldn't need to unmount, demote or otherwise stop services on the
> primary during any of this.
> Also, look into notify-split-brain.sh and crm-fence-peer.sh or dopd.
And all of this is explained in the appropriate sections
in the DRBD User's Guide.
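Pulling the corrections in this thread together, the manual procedure for one DRBD 8.x resource might be sketched as the POSIX shell functions below. This is an illustration, not a LINBIT-provided script: the function names and the DRBDADM override are hypothetical, and the "disconnect first unless already StandAlone" guard on the victim is an added assumption for the case where the victim is already trying to reconnect (e.g. WFConnection).

```shell
#!/bin/sh
# Sketch of manual split-brain recovery for one DRBD 8.x resource.
# Hypothetical helpers: the function names and the DRBDADM override
# (handy for pointing at a stub during a dry run) are not part of DRBD.
DRBDADM=${DRBDADM:-drbdadm}

# Print the connection state of a resource, e.g. StandAlone or WFConnection.
conn_state() {
    $DRBDADM cstate "$1"
}

# Run on the split-brain victim, the node whose changes will be thrown away.
# If your CRM promoted it and mounted the filesystem, unmount that first,
# otherwise the demotion below fails.
recover_victim() {
    res=$1
    # Assumption: reconnecting requires the resource to be StandAlone.
    if [ "$(conn_state "$res")" != StandAlone ]; then
        $DRBDADM disconnect "$res" || return 1
    fi
    # Harmless if the node is already secondary.
    $DRBDADM secondary "$res" || return 1
    # Reconnect, discarding local changes; only what is needed is resynced.
    $DRBDADM -- --discard-my-data connect "$res"
}

# Run on the survivor (the primary) only if it dropped to StandAlone.
# It stays primary, with its filesystem mounted, throughout.
recover_survivor() {
    res=$1
    if [ "$(conn_state "$res")" = StandAlone ]; then
        $DRBDADM connect "$res"
    fi
}
```

For a resource named `r0` (a placeholder), you would run `recover_victim r0` on the victim and `recover_survivor r0` on the primary. Note that `--overwrite-data-of-peer` does not appear anywhere, in line with the advice above.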
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
please don't Cc me, but send to list -- I'm subscribed