[DRBD-user] Question: how to avoid full resync

Lars Ellenberg lars.ellenberg at linbit.com
Mon May 18 17:48:51 CEST 2009



On Mon, May 18, 2009 at 05:18:41PM +0300, Eli Dorfman (Voltaire) wrote:
> Lars Ellenberg wrote:
> > On Sun, May 17, 2009 at 05:12:50PM +0300, Eli Dorfman (Voltaire) wrote:
> >> Lars Ellenberg wrote:
> >>> On Thu, May 14, 2009 at 04:45:44PM +0300, Eli Dorfman (Voltaire) wrote:
> >>>> Lars Ellenberg wrote:
> >>>>> On Wed, May 13, 2009 at 06:35:43PM +0300, Eli Dorfman (Voltaire) wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Assuming a setup of 2 nodes primary A and secondary B.
> >>>>>> After primary node A reboot and B became primary, is there a way to avoid full resync
> >>>>>> from B to A?
> >>>>> Uhm, well, yes: just do nothing and let DRBD do its job ;)
> >>>>> Bitmap based resync is the "normal" resync operation.
> >>>>>
> >>>> After A was rebooted and B had almost no changes - 
> >>>> yet when A goes up again it seems that B (drbd) performs full resync.  
> >>> what makes you think so?
> >> That's what I see in /proc/drbd.
> >> we have a partition of 20GB and from the link speed and the time till process is completed,
> >> it seems that drbd performs full resync.
> >>
> >>>> Is this correct?
> >>> it would not be expected.
> >> So what could be the reason for this full resync?
> > 
> > I still doubt the full sync, though.
> > care to show logs?
> > 
> 
> After rebooting node A these are the messages on B:
> 
> May 18 19:12:23 optimus2 kernel: drbd0: PingAck did not arrive in time.
> May 18 19:12:23 optimus2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> May 18 19:12:23 optimus2 kernel: drbd0: asender terminated
> May 18 19:12:23 optimus2 kernel: drbd0: Terminating asender thread
> May 18 19:12:23 optimus2 kernel: drbd0: short read expecting header on sock: r=-512
> May 18 19:12:23 optimus2 kernel: drbd0: Writing meta data super block now.
> May 18 19:12:23 optimus2 kernel: drbd0: tl_clear()
> May 18 19:12:23 optimus2 kernel: drbd0: Connection closed
> May 18 19:12:23 optimus2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
> May 18 19:12:23 optimus2 kernel: drbd0: receiver terminated
> May 18 19:12:23 optimus2 kernel: drbd0: receiver (re)started
> May 18 19:12:23 optimus2 kernel: drbd0: conn( Unconnected -> WFConnection )
> May 18 19:13:16 optimus2 ResourceManager[30674]: [30685]: info: Acquiring resource group: optimus2 172.30.3.99/24/eth0/172.30.3.255 drbddisk::ufmdb Filesystem::/dev/drbd0::/opt/ufm/files/::ext3 mysqld ufmd
> May 18 19:13:16 optimus2 kernel: drbd0: role( Secondary -> Primary )
> May 18 19:13:16 optimus2 kernel: drbd0: Writing meta data super block now.
> May 18 19:13:16 optimus2 kernel: drbd0: Creating new current UUID
> May 18 19:13:16 optimus2 kernel: drbd0: Writing meta data super block now.
> May 18 19:13:16 optimus2 ResourceManager[30674]: [30965]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /opt/ufm/files/ ext3 start
> May 18 19:13:16 optimus2 Filesystem[30978]: [31008]: INFO: Running start for /dev/drbd0 on /opt/ufm/files
> May 18 19:13:16 optimus2 kernel: EXT3 FS on drbd0, internal journal
> May 18 19:13:56 optimus2 kernel: drbd0: Handshake successful: Agreed network protocol version 88
> May 18 19:13:56 optimus2 kernel: drbd0: conn( WFConnection -> WFReportParams )
> May 18 19:13:56 optimus2 kernel: drbd0: Starting asender thread (from drbd0_receiver [22112])
> May 18 19:13:56 optimus2 kernel: drbd0: data-integrity-alg: <not-used>
> May 18 19:13:56 optimus2 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> May 18 19:13:56 optimus2 kernel: drbd0: Writing meta data super block now.
> May 18 19:13:56 optimus2 kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
> May 18 19:13:56 optimus2 kernel: drbd0: Began resync as SyncSource (will sync 20482176 KB [5120544 bits set]).
> May 18 19:13:56 optimus2 kernel: drbd0: Writing meta data super block now.
> 
> 
> [root at optimus2 ~]# cat /proc/drbd
> version: 8.2.6 (api:88/proto:86-88)
> GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn at c5-x8664-build, 2008-06-21 08:48:13
>  0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
>     ns:2244236 nr:482168 dw:483008 dr:2256073 al:18 bm:136 lo:3 pe:5 ua:253 ap:0 oos:18238236
>         [=>..................] sync'ed: 11.0% (17810/20002)M
>         finish: 0:24:30 speed: 12,320 (11,504) K/sec
> 
> 
> It seems as if drbd performs full resync of node A - why?
> The link is 100 Mb so drbd is actually using the maximal bandwidth.

for some reason (which only you can determine, as you know what happened),
the log says
"Began resync as SyncSource (will sync 20482176 KB [5120544 bits set])":
that many bits are set in the bitmap, so that much will be synced.
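
A quick check of the arithmetic behind that log line, as a minimal sketch.
It assumes each bit in DRBD's out-of-sync bitmap covers one 4 KiB block
(the thread does not state this granularity); the constant and function
names are made up for illustration.

```python
# Assumption: one bitmap bit marks one 4 KiB block as out of sync.
KIB_PER_BIT = 4


def resync_kib(bits_set: int) -> int:
    """Amount of data (in KiB) a bitmap-based resync would transfer."""
    return bits_set * KIB_PER_BIT


bits_set = 5120544  # from the "Began resync as SyncSource" log line
to_sync_kib = resync_kib(bits_set)

print(to_sync_kib)                        # 20482176, matching "will sync 20482176 KB"
print(round(to_sync_kib / 1024 ** 2, 2))  # same amount in GiB, roughly the whole 20GB partition
```

Under that assumption the resync is still bitmap based; the bitmap simply has
(nearly) every bit set. Had only a few blocks changed on B, the bit count and
the "will sync" figure would be correspondingly small.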

either you modified that much within 40 seconds (highly unlikely),
or you tried something "interesting", and it did not work out.
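
To put "highly unlikely" into numbers, here is a rough sketch. It assumes the
40 seconds are the gap between the "role( Secondary -> Primary )" line
(19:13:16) and the "Began resync" line (19:13:56) in the quoted log, and that
all 20482176 KB would have had to be written within that window. The variable
names are illustrative only.

```python
from datetime import datetime

# Timestamps taken from the quoted log on optimus2 (same day, so time of day suffices).
promoted = datetime.strptime("19:13:16", "%H:%M:%S")        # B became Primary
resync_started = datetime.strptime("19:13:56", "%H:%M:%S")  # resync began

window_s = (resync_started - promoted).total_seconds()

to_sync_kib = 20482176  # "will sync" figure from the log
required_kib_per_s = to_sync_kib / window_s

print(window_s)                         # 40.0
print(round(required_kib_per_s / 1024)) # sustained write rate in MiB/s needed to dirty that much
```

That works out to roughly 500 MiB/s of sustained writes touching nearly every
block of the device, which a freshly mounted ext3 filesystem with a database
on top is not going to produce.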

have you been trying to outsmart DRBD?

what exactly is the sequence of events to reproduce this?
anything "interesting" in the logs on the other node?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


