[DRBD-user] Resizing DRBD/LVM, stuck in WFSyncUUID

Mike Sweetser - Adhost mikesw at adhost.com
Thu May 7 20:28:07 CEST 2009



Lars,

Thanks for the information - what will happen to anything trying to
write to the resource when I run suspend-io?  Will it simply hang until
resume-io is run?

Thanks!

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Thursday, May 07, 2009 12:45 AM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Resizing DRBD/LVM, stuck in WFSyncUUID

On Wed, May 06, 2009 at 04:22:35PM -0700, Mike Sweetser - Adhost wrote:
> Hello:
> 
> I'm doing some testing of resizing an LVM-based DRBD partition.  I've
> successfully resized the LVM, and then resized the partition via
> drbdadm (the partition is named part3):
> 
> drbdadm resize part3
> 
> I see the following on the Primary server, and everything is OK.
> 
> May  6 16:15:19 SERVER1 kernel: drbd4: drbd_bm_resize called with
> capacity == 31457280
> May  6 16:15:19 SERVER1 kernel: drbd4: resync bitmap: bits=3932160
> words=122880
> May  6 16:15:19 SERVER1 kernel: drbd4: size = 15 GB (15728640 KB)
> May  6 16:15:42 SERVER1 kernel: drbd4: Writing the whole bitmap, size
> changed
> 
> However, I see this on the Secondary server, and it's stuck in
> WFSyncUUID:
> 
> May  6 16:15:18 SERVER2 kernel: drbd4: drbd_bm_resize called with
> capacity == 31457280
> May  6 16:15:18 SERVER2 kernel: drbd4: resync bitmap: bits=3932160
> words=122880
> May  6 16:15:18 SERVER2 kernel: drbd4: size = 15 GB (15728640 KB)
> May  6 16:15:18 SERVER2 kernel: drbd4: Writing the whole bitmap, size
> changed
> May  6 16:15:19 SERVER2 kernel: drbd4: writing of bitmap took 1376
> jiffies
> May  6 16:15:19 SERVER2 kernel: drbd4: 10 GB (2621440 bits) marked
> out-of-sync by on disk bit-map.
> May  6 16:15:19 SERVER2 kernel: drbd4: Writing meta data super block
> now.
> May  6 16:15:19 SERVER2 kernel: drbd4: No resync, but 2621440 bits in
> bitmap!
> May  6 16:15:19 SERVER2 kernel: drbd4: bm_set was 2621440, corrected
> to 2621472. /usr/local/src/drbd-8.2.6/drbd/drbd_receiver.c:2144
>
> May  6 16:15:19 SERVER2 kernel: drbd4: Resync of new storage after
> online grow
> May  6 16:15:19 SERVER2 kernel: drbd4: conn( Connected -> WFSyncUUID )
> 
> Seven minutes later, it's still in WFSyncUUID on the Secondary.
> 
> Am I missing a step?  Is something possibly configured wrong on my
> end?
> Help? :)

There has been an unlikely but possible "wait forever" condition in
some versions of DRBD if the connection or resync handshake happens
while there is IO in flight.

To get out of WFSync*: try drbdadm disconnect, then drbdadm connect.
If the disconnect does not work, cut the TCP connection by other means
(e.g. an iptables reject rule, or ifdown).
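The recovery steps above, sketched as shell commands. The resource name
part3 comes from this thread; the port 7788 is only DRBD's conventional
default and must be adjusted to match the address in your resource
config. Run on the node stuck in WFSyncUUID.

```shell
# First attempt: tear down and re-establish the replication link.
drbdadm disconnect part3
drbdadm connect part3

# If the disconnect hangs, cut the TCP connection by other means,
# e.g. a temporary iptables REJECT rule (assumed port 7788):
iptables -I INPUT -p tcp --dport 7788 -j REJECT
sleep 5                    # give DRBD time to notice the broken link
iptables -D INPUT -p tcp --dport 7788 -j REJECT
drbdadm connect part3      # reconnect once the stuck state has cleared
```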

workaround to make the race impossible:
	drbdadm suspend-io
	do-the-interesting-stuff-here
	drbdadm resume-io
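A sketch of the full online grow wrapped in that workaround. The LV path
/dev/vg0/part3 and the +5G increment are assumptions for illustration;
only the resource name part3 is taken from this thread. The lvextend
must have been done on both nodes before the resize.

```shell
drbdadm suspend-io part3          # quiesce the resource: no IO in flight
lvextend -L +5G /dev/vg0/part3    # grow the backing LV (on BOTH nodes)
drbdadm resize part3              # let DRBD pick up the new size
drbdadm resume-io part3           # suspended writers continue
```

Writers that hit the resource between suspend-io and resume-io simply
block until resume-io runs, which is what closes the race window.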

fix:
upgrade to 8.3 ;)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


