[DRBD-user] borked split-brain recovery

Matt Davidson mdavidson at allureglobal.com
Thu Oct 4 15:56:06 CEST 2012

Well, my boss just tried `drbdadm connect all` on openfiler2, and the nodes are now syncing with openfiler2 as source and openfiler1 as target, so all is well in the world again. Thanks, guys!

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Thursday, October 04, 2012 9:05 AM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] borked split-brain recovery

On Wed, Oct 03, 2012 at 01:08:40PM -0700, mdavidson at allureglobal.co wrote:
> 
> Sorry, I meant to say: when telling openfiler1 to connect, openfiler2
> is designated as sync target.

I have no idea how you managed to get yourself into that situation, but I suggest re-creating the DRBD metadata on the "bad" node and having it sync up from the good one.

On openfiler1:
	drbdadm down vg0_drbd
	drbdadm -- --force create-md vg0_drbd
	drbdadm up vg0_drbd
On openfiler2:
	drbdadm adjust vg0_drbd


Then have someone help you figure out what went wrong, and how to avoid that in the future...

	Lars
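Because the steps above wipe the DRBD metadata on openfiler1, it may be worth first confirming, on openfiler2, that it really is the node holding the good data. Below is a minimal, read-only bash sketch. It assumes the DRBD 8.3-style output seen in this thread, where `drbdadm role` prints a local/peer pair such as `Primary/Unknown` and `drbdadm dstate` prints one such as `UpToDate/Inconsistent`. The function names `local_part` and `is_good_source` are hypothetical helpers, not part of DRBD.

```shell
#!/usr/bin/env bash
# Read-only sanity check, meant to run on the node you intend to KEEP
# (openfiler2 in this thread) before wiping metadata on the other node.
# ASSUMPTION: DRBD 8.3-style output, e.g. "drbdadm role" -> "Primary/Unknown",
# "drbdadm dstate" -> "UpToDate/Inconsistent".
# Function names are hypothetical helpers, not part of DRBD.

# Print the local (left-hand) half of a "local/peer" pair; a value
# without a slash is printed unchanged.
local_part() {
    printf '%s\n' "${1%%/*}"
}

# Succeed only if this node is Primary with UpToDate data for resource $1,
# i.e. it looks like the right side to act as sync source.
is_good_source() {
    local res="$1" role dstate
    role=$(drbdadm role "$res") || return 1
    dstate=$(drbdadm dstate "$res") || return 1
    echo "$res: role=$role disk=$dstate" >&2
    [[ "$(local_part "$role")" == "Primary" ]] &&
        [[ "$(local_part "$dstate")" == "UpToDate" ]]
}
```

For example, on openfiler2: `is_good_source vg0_drbd && echo "keep openfiler2"`. With the states posted in this thread, the check should pass on openfiler2 (Primary, UpToDate) and fail on openfiler1 (Secondary, Inconsistent). It only reads state and changes nothing.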


> mdavidson at allureglobal.co wrote:
> > 
> > In the middle of trying to manually recover from a split-brain, it
> > seems I've created a little bit of a mess. I'm using two Openfiler
> > machines with DRBD as HA iSCSI storage for a XenServer cluster, as
> > described here:
> > http://www.howtoforge.com/installing-and-configuring-openfiler-with-drbd-and-heartbeat-p2
> > I've managed to get the cluster_metadata resource syncing properly,
> > but the actual data resource is being fussy. Openfiler2 is currently
> > primary and seems to be working fine, as all my VMs are currently
> > online. I'd like to keep openfiler2 as the primary, but when I tell
> > openfiler1 to connect, the system designates openfiler1 as the sync
> > target. I'm rather new to DRBD, so if there's any other info I need
> > to post, please let me know.
> > 
> > Openfiler1 status:
> > [root at openfiler1 log]# service drbd status
> > drbd driver loaded OK; device status:
> > version: 8.3.7 (api:88/proto:86-91)
> > GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
> > m:res               cs            ro                 ds                    p  mounted  fstype
> > 0:cluster_metadata  Connected     Secondary/Primary  UpToDate/UpToDate     C
> > 1:vg0_drbd          WFConnection  Secondary/Unknown  Inconsistent/DUnknown C
> > [root at openfiler1 log]#
> > 
> > 
> > Openfiler2 status:
> > [root at openfiler2 ha.d]# service drbd status
> > drbd driver loaded OK; device status:
> > version: 8.3.7 (api:88/proto:86-91)
> > GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
> > m:res               cs          ro                 ds                    p      mounted            fstype
> > 0:cluster_metadata  Connected   Primary/Secondary  UpToDate/UpToDate     C      /cluster_metadata  ext3
> > 1:vg0_drbd          StandAlone  Primary/Unknown    UpToDate/Inconsistent r----
> > [root at openfiler2 ha.d]#
> > 
> > 
> > dmesg output from openfiler2:
> > [1287966.539911] block drbd1: Starting receiver thread (from drbd1_worker [3145])
> > [1287966.540030] block drbd1: receiver (re)started
> > [1287966.540047] block drbd1: conn( Unconnected -> WFConnection )
> > [1287966.639236] block drbd1: Handshake successful: Agreed network protocol version 91
> > [1287966.639246] block drbd1: conn( WFConnection -> WFReportParams )
> > [1287966.639282] block drbd1: Starting asender thread (from drbd1_receiver [12115])
> > [1287966.639419] block drbd1: data-integrity-alg: <not-used>
> > [1287966.639526] block drbd1: drbd_sync_handshake:
> > [1287966.639532] block drbd1: self 89867987176E42C7:0000000000000000:C1B7F3C81019781C:2516E370EEC0B159 bits:29285 flags:0
> > [1287966.639538] block drbd1: peer 80413839405F0B3A:89867987176E42C6:C1B7F3C81019781C:2516E370EEC0B159 bits:0 flags:0
> > [1287966.639542] block drbd1: uuid_compare()=-1 by rule 50
> > [1287966.639546] block drbd1: I shall become SyncTarget, but I am primary!
> > [1287966.639777] block drbd1: conn( WFReportParams -> Disconnecting )
> > [1287966.639785] block drbd1: error receiving ReportState, l: 4!
> > [1287966.640033] block drbd1: asender terminated
> > [1287966.640042] block drbd1: Terminating asender thread
> > [1287966.640209] block drbd1: Connection closed
> > [1287966.640217] block drbd1: conn( Disconnecting -> StandAlone )
> > [1287966.640234] block drbd1: receiver terminated
> > [1287966.640239] block drbd1: Terminating receiver thread
> > 
> > 
> 
> --
> View this message in context:
> http://old.nabble.com/borked-split-brain-recovery-tp34510920p34510927.html
> Sent from the DRBD - User mailing list archive at Nabble.com.
> 
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
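The drbd_sync_handshake lines in the quoted dmesg output can be read by hand. Each `self`/`peer` line lists four UUIDs: current, bitmap, and two history UUIDs. openfiler2's current UUID (89867987176E42C7) matches openfiler1's bitmap UUID (89867987176E42C6) once the lowest bit is ignored, because DRBD uses that bit as a role flag. In DRBD 8.3 this appears to be the check logged as "rule 50", and it is why openfiler2 concludes it must become SyncTarget. The bash sketch below reproduces only that one comparison. It is a simplified illustration under the assumed current:bitmap:history1:history2 layout, not DRBD's full uuid_compare() logic, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified illustration (NOT DRBD's real code): reproduce only the
# comparison the log reports as "uuid_compare()=-1 by rule 50".
# ASSUMPTION: DRBD 8.3 log layout current:bitmap:history1:history2.
# Helper names are hypothetical. Requires bash 4+ for ${var^^}.

# Clear bit 0 of a hex UUID. DRBD uses that bit as a role flag, so it is
# masked off before two UUIDs are compared. Only the last hex digit changes.
mask_uuid() {
    local u="${1^^}"
    printf '%s%X\n' "${u:0:${#u}-1}" $(( 16#${u: -1} & ~1 ))
}

# Print the Nth (1-based) colon-separated field of a UUID set.
uuid_field() {
    local -a parts
    IFS=: read -r -a parts <<< "$1"
    printf '%s\n' "${parts[$2-1]}"
}

# Succeed when the peer's bitmap UUID equals our current UUID, i.e. the
# peer has been tracking changes made on top of our data generation,
# so this node would be the one to receive a resync (SyncTarget).
peer_is_ahead() {
    local self_cur peer_bm
    self_cur=$(mask_uuid "$(uuid_field "$1" 1)")
    peer_bm=$(mask_uuid "$(uuid_field "$2" 2)")
    [[ "$self_cur" == "$peer_bm" ]]
}

# UUID sets copied from the openfiler2 dmesg output in this thread.
self=89867987176E42C7:0000000000000000:C1B7F3C81019781C:2516E370EEC0B159
peer=80413839405F0B3A:89867987176E42C6:C1B7F3C81019781C:2516E370EEC0B159
if peer_is_ahead "$self" "$peer"; then
    echo "openfiler2 would become SyncTarget"
fi
```

Swapping the arguments (`peer_is_ahead "$peer" "$self"`) fails, because openfiler2's bitmap UUID is all zeros. That asymmetry is consistent with the log: openfiler2 refused to become SyncTarget while Primary and dropped to StandAlone.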

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed