[DRBD-user] Cannot force node to be Primary Connected after kernel panic

Lars Ellenberg lars.ellenberg at linbit.com
Wed Mar 11 19:14:45 CET 2009



On Tue, Mar 10, 2009 at 10:36:19AM -0700, Jeff Orr wrote:
> So I was trying to upgrade the primary in one of our DRBD pairs last
> night. The cluster manager moved the disk and virtual IP to the slave,
> which subsequently crashed with a kernel panic (in XFS). I moved the
> DRBD mount and virtual IP back to the primary, so services are up. But
> now I am presented with this message on attempting to reconnect to the
> secondary:
> 
> drbd0: I shall become SyncTarget, but I am primary!
> 
> I tried forcing the secondary to discard its data with "--
> --discard-my-data secondary", as well as "-- --overwrite-data-of-peer
> primary" on the primary, but DRBD is flatly refusing to force the
> secondary to become SyncTarget. I would like to discard the last 24hr or
> so of changes on the secondary, but not resync the entire 6TB disk. Any
> ideas on how to proceed?
> 
> Here are the dmesg logs from primary:
> 
> drbd0: conn( StandAlone -> Unconnected )
> drbd0: Starting receiver thread (from drbd0_worker [6728])
> drbd0: receiver (re)started
> drbd0: conn( Unconnected -> WFConnection )
> drbd0: Handshake successful: Agreed network protocol version 89
> drbd0: conn( WFConnection -> WFReportParams )
> drbd0: Starting asender thread (from drbd0_receiver [12436])
> drbd0: data-integrity-alg: md5
> drbd0: drbd_sync_handshake:
> drbd0: self 4E2A0149690C1915:0CE9A5964B14BF3A:AC676DD67EC523BD:8435486C64177EB3
> drbd0: peer 4E2A0149690C1914:0000000000000000:0CE9A5964B14BF3A:AC676DD67EC523BD
> drbd0: uuid_compare()=-1 by rule 4
> drbd0: I shall become SyncTarget, but I am primary!

> drbd0: self 4E2A0149690C1915:0CE9A5964B14BF3A:AC676DD67EC523BD:8435486C64177EB3
> drbd0: peer 4E2A0149690C1914:0000000000000000:0CE9A5964B14BF3A:AC676DD67EC523BD


I have no idea how exactly you got into this situation.

But to recover from it without doing a full sync,
you need to modify the DRBD meta data UUIDs at a low level.

On the box where you want to throw away the data:
stop (unconfigure) DRBD,
then run
drbdadm -- 0CE9A5964B14BF3A:0000000000000000:AC676DD67EC523BD set-gi $resourcename
and start (configure and connect) DRBD again.

DRBD should then do a bitmap-based sync.
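
The three steps above could be scripted roughly as follows. This is only a
sketch: the resource name r0, the RES and DRY_RUN variables and the run
helper are placeholders of mine, and I am assuming that "drbdadm down" and
"drbdadm up" are what is meant by unconfigure and configure-and-connect.
The UUID string is copied verbatim from the set-gi line above. By default
the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. Run on the node whose data is to be thrown away.
# RES is a placeholder resource name; DRY_RUN=1 (the default) only prints.
RES="${RES:-r0}"
DRY_RUN="${DRY_RUN:-1}"

# UUID string taken verbatim from the set-gi command in this thread.
NEW_GI="0CE9A5964B14BF3A:0000000000000000:AC676DD67EC523BD"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_gi() {
    run drbdadm down "$RES"                 # stop (unconfigure) the resource
    run drbdadm -- "$NEW_GI" set-gi "$RES"  # overwrite the generation identifiers
    run drbdadm up "$RES"                   # configure and connect again
}

reset_gi
```

Check the printed commands first, then rerun with DRY_RUN=0 and RES set to
the real resource name.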

I would feel more comfortable with a full sync, though.
To do so, I'd probably "start from scratch"
and re-create the DRBD meta data on the "victim" node.

Or at least do an online verify afterwards (we have had various issues
with online verify in the past, unfortunately, so use the latest version).
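
Both alternatives could look roughly like this. Again a sketch under
assumptions: r0, RES, DRY_RUN and the helper names are placeholders of mine,
"drbdadm create-md" is my reading of "re-create drbd meta data", and
"drbdadm verify" only works if a verify-alg is configured for the resource.
create-md may ask for confirmation before it overwrites existing meta data.

```shell
#!/bin/sh
# Sketch only. RES is a placeholder; DRY_RUN=1 (the default) only prints.
RES="${RES:-r0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Alternative 1, on the "victim" node: fresh meta data, so the
# reconnect should end in a full sync from the peer.
full_sync_from_scratch() {
    run drbdadm down "$RES"
    run drbdadm create-md "$RES"
    run drbdadm up "$RES"
}

# Alternative 2, once the bitmap-based sync has finished:
# compare both sides block by block (needs verify-alg in the config).
online_verify() {
    run drbdadm verify "$RES"
}

# Pick one alternative on the command line: "full" or "verify".
case "${1:-}" in
    full)   full_sync_from_scratch ;;
    verify) online_verify ;;
esac
```

Called without an argument the script does nothing; with DRY_RUN left at 1
it only prints what it would run.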

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


