Lars Ellenberg wrote:
> On Mon, May 18, 2009 at 05:18:41PM +0300, Eli Dorfman (Voltaire) wrote:
>> Lars Ellenberg wrote:
>>> On Sun, May 17, 2009 at 05:12:50PM +0300, Eli Dorfman (Voltaire) wrote:
>>>> Lars Ellenberg wrote:
>>>>> On Thu, May 14, 2009 at 04:45:44PM +0300, Eli Dorfman (Voltaire) wrote:
>>>>>> Lars Ellenberg wrote:
>>>>>>> On Wed, May 13, 2009 at 06:35:43PM +0300, Eli Dorfman (Voltaire) wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Assuming a setup of 2 nodes primary A and secondary B.
>>>>>>>> After primary node A reboot and B became primary, is there a way to avoid full resync
>>>>>>>> from B to A?
>>>>>>> Uhm, well, yes: just do nothing and let DRBD do its job ;)
>>>>>>> Bitmap based resync is the "normal" resync operation.
>>>>>>>
>>>>>> After A was rebooted and B had almost no changes -
>>>>>> yet when A goes up again it seems that B (drbd) performs full resync.
>>>>> what makes you think so?
>>>> That's what I see in /proc/drbd.
>>>> we have a partition of 20GB and from the link speed and the time till process is completed,
>>>> it seems that drbd performs full resync.
>>>>
>>>>>> Is this correct?
>>>>> it would not be expected.
>>>> So what could be the reason for this full resync?
>>> I still doubt the full sync, though.
>>> care to show logs?
>>>
>> After rebooting node A these are the messages on B:
>>
>> May 18 19:12:23 optimus2 kernel: drbd0: PingAck did not arrive in time.
>> May 18 19:12:23 optimus2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>> May 18 19:12:23 optimus2 kernel: drbd0: asender terminated
>> May 18 19:12:23 optimus2 kernel: drbd0: Terminating asender thread
>> May 18 19:12:23 optimus2 kernel: drbd0: short read expecting header on sock: r=-512
>> May 18 19:12:23 optimus2 kernel: drbd0: Writing meta data super block now.
>> May 18 19:12:23 optimus2 kernel: drbd0: tl_clear()
>> May 18 19:12:23 optimus2 kernel: drbd0: Connection closed
>> May 18 19:12:23 optimus2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
>> May 18 19:12:23 optimus2 kernel: drbd0: receiver terminated
>> May 18 19:12:23 optimus2 kernel: drbd0: receiver (re)started
>> May 18 19:12:23 optimus2 kernel: drbd0: conn( Unconnected -> WFConnection )
>> May 18 19:13:16 optimus2 ResourceManager[30674]: [30685]: info: Acquiring resource group: optimus2 172.30.3.99/24/eth0/172.30.3.255 drbddisk::ufmdb Filesystem::/dev/drbd0::/opt/ufm/files/::ext3 mysqld ufmd
>> May 18 19:13:16 optimus2 kernel: drbd0: role( Secondary -> Primary )
>> May 18 19:13:16 optimus2 kernel: drbd0: Writing meta data super block now.
>> May 18 19:13:16 optimus2 kernel: drbd0: Creating new current UUID
>> May 18 19:13:16 optimus2 kernel: drbd0: Writing meta data super block now.
>> May 18 19:13:16 optimus2 ResourceManager[30674]: [30965]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /opt/ufm/files/ ext3 start
>> May 18 19:13:16 optimus2 Filesystem[30978]: [31008]: INFO: Running start for /dev/drbd0 on /opt/ufm/files
>> May 18 19:13:16 optimus2 kernel: EXT3 FS on drbd0, internal journal
>> May 18 19:13:56 optimus2 kernel: drbd0: Handshake successful: Agreed network protocol version 88
>> May 18 19:13:56 optimus2 kernel: drbd0: conn( WFConnection -> WFReportParams )
>> May 18 19:13:56 optimus2 kernel: drbd0: Starting asender thread (from drbd0_receiver [22112])
>> May 18 19:13:56 optimus2 kernel: drbd0: data-integrity-alg: <not-used>
>> May 18 19:13:56 optimus2 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
>> May 18 19:13:56 optimus2 kernel: drbd0: Writing meta data super block now.
>> May 18 19:13:56 optimus2 kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
>> May 18 19:13:56 optimus2 kernel: drbd0: Began resync as SyncSource (will sync 20482176 KB [5120544 bits set]).
>> May 18 19:13:56 optimus2 kernel: drbd0: Writing meta data super block now.
>>
>>
>> [root@optimus2 ~]# cat /proc/drbd
>> version: 8.2.6 (api:88/proto:86-88)
>> GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-06-21 08:48:13
>>  0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
>>     ns:2244236 nr:482168 dw:483008 dr:2256073 al:18 bm:136 lo:3 pe:5 ua:253 ap:0 oos:18238236
>>     [=>..................] sync'ed: 11.0% (17810/20002)M
>>     finish: 0:24:30 speed: 12,320 (11,504) K/sec
>>
>>
>> It seems as if drbd performs full resync of node A - why?
>> The link is 100 Mb so drbd is actually using the maximal bandwidth.
>
> for some reason (which only you can determine, as you know what happened),
> "Began resync as SyncSource (will sync 20482176 KB [5120544 bits set])"
> that many bits are set, so that much will be synced.
>
> either you modified that much within 40 seconds (highly unlikely),
> or you tried something "interesting", and it did not work out.
>
> have you been trying to outsmart DRBD?
>
> what exactly is the sequence of events to reproduce this?
> something "interesting" in the logs on the other node?

We only rebooted node A (reboot -f).
These are the messages from node A:

May 18 17:14:17 shay2 kernel: drbd0: disk( Diskless -> Attaching )
May 18 17:14:17 shay2 kernel: drbd0: Starting worker thread (from cqueue/3 [155])
May 18 17:14:17 shay2 kernel: drbd0: Found 4 transactions (34 active extents) in activity log.
May 18 17:14:17 shay2 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
May 18 17:14:17 shay2 kernel: drbd0: drbd_bm_resize called with capacity == 40964352
May 18 17:14:17 shay2 kernel: drbd0: resync bitmap: bits=5120544 words=80009
May 18 17:14:17 shay2 kernel: drbd0: size = 20 GB (20482176 KB)
May 18 17:14:17 shay2 kernel: drbd0: reading of bitmap took 15 jiffies
May 18 17:14:17 shay2 kernel: drbd0: recounting of set bits took additional 2 jiffies
May 18 17:14:17 shay2 kernel: drbd0: 20 GB (5120544 bits) marked out-of-sync by on disk bit-map.
May 18 17:14:17 shay2 kernel: drbd0: Marked additional 0 KB as out-of-sync based on AL.
May 18 17:14:17 shay2 kernel: drbd0: disk( Attaching -> UpToDate )
May 18 17:14:17 shay2 kernel: drbd0: Writing meta data super block now.
May 18 17:14:17 shay2 kernel: drbd0: conn( StandAlone -> Unconnected )
May 18 17:14:17 shay2 kernel: drbd0: Starting receiver thread (from drbd0_worker [5891])
May 18 17:14:17 shay2 kernel: drbd0: receiver (re)started
May 18 17:14:17 shay2 kernel: drbd0: conn( Unconnected -> WFConnection )
May 18 17:14:18 shay2 kernel: drbd0: Handshake successful: Agreed network protocol version 88
May 18 17:14:18 shay2 kernel: drbd0: conn( WFConnection -> WFReportParams )
May 18 17:14:18 shay2 kernel: drbd0: Starting asender thread (from drbd0_receiver [5900])
May 18 17:14:18 shay2 kernel: drbd0: data-integrity-alg: <not-used>
May 18 17:14:18 shay2 kernel: drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
May 18 17:14:18 shay2 kernel: drbd0: Writing meta data super block now.
May 18 17:14:18 shay2 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
May 18 17:14:18 shay2 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target
May 18 17:14:18 shay2 kernel: drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
May 18 17:14:18 shay2 kernel: drbd0: Began resync as SyncTarget (will sync 20482176 KB [5120544 bits set]).
May 18 17:14:18 shay2 kernel: drbd0: Writing meta data super block now.
May 18 17:21:19 shay2 kernel: drbd0: peer( Primary -> Secondary )

Thanks,
Eli
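
For readers following the numbers in the logs, here is a rough arithmetic check. It is only a sketch and not part of the original exchange. It assumes that one resync bitmap bit covers 4 KiB and that the "capacity" value is counted in 512-byte sectors. Both assumptions match the figures printed in the logs, but they are not stated there. The function names are made up for illustration. The check suggests that every bitmap bit was set, so the resync really did cover the whole 20 GB device. It says nothing about why the on-disk bitmap on node A was already fully set at attach time.

```python
# Sanity check of the figures quoted in the kernel logs and /proc/drbd.
# Assumption (not stated in the logs): one bitmap bit covers 4 KiB.
KIB_PER_BIT = 4


def bits_to_kib(bits: int) -> int:
    """Amount of data (KiB) represented by a number of set bitmap bits."""
    return bits * KIB_PER_BIT


def sectors_to_kib(sectors: int, sector_bytes: int = 512) -> int:
    """Device size in KiB, assuming capacity is given in 512-byte sectors."""
    return sectors * sector_bytes // 1024


def percent_synced(total_kib: int, out_of_sync_kib: int) -> float:
    """Share of the device already resynced, given the 'oos' counter."""
    return 100.0 * (total_kib - out_of_sync_kib) / total_kib


# "drbd_bm_resize called with capacity == 40964352"
device_kib = sectors_to_kib(40964352)
# "will sync 20482176 KB [5120544 bits set]"
dirty_kib = bits_to_kib(5120544)
# True means the set bits cover the entire device, i.e. a full resync.
full_resync = dirty_kib == device_kib
# "oos:18238236" from /proc/drbd on node B
progress = round(percent_synced(device_kib, 18238236), 1)

print(device_kib, dirty_kib, full_resync, progress)
```

The progress value can be compared with the "sync'ed" percentage shown in /proc/drbd above.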