On Wed, Feb 10, 2010 at 09:46:45AM -0500, Petrakis, Peter wrote:
> Hi All,
>
> We're encountering a resync problem with 8.3.6 where after we resync,
> the target node transitions to UpToDate, which the peer sees, but then
> another state transition happens that claims the pdsk state is UpToDate
> -> Inconsistent. The circumstances surrounding the fault were we lost
> connectivity to our peer which was then rebooted, after which point the
> resync began.
>
> Here's the config (same for all resources):
> /sbin/drbdsetup /dev/drbd16 show
>
> disk {
>     size 0s _is_default; # bytes
>     on-io-error detach;
>     fencing dont-care _is_default;
>     max-bio-bvecs 0 _is_default;
> }
> net {
>     timeout 60 _is_default; # 1/10 seconds
>     max-epoch-size 2048 _is_default;
>     max-buffers 2048 _is_default;
>     unplug-watermark 128 _is_default;
>     connect-int 10 _is_default; # seconds
>     ping-int 10 _is_default; # seconds
>     sndbuf-size 0 _is_default; # bytes
>     rcvbuf-size 0 _is_default; # bytes
>     ko-count 2;
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri violently-as0p;
>     after-sb-2pri violently-as0p;
>     rr-conflict violently;
>     ping-timeout 20; # 1/10 seconds
> }
> syncer {
>     rate 30720k; # bytes/second
>     after 15;
>     al-extents 709;
> }
> protocol C;
> _this_host {
>     device minor 16;
>     disk "/dev/disk-drbd16";
>     meta-disk internal;
>     address ipv4 169.254.84.220:8916;
> }
> _remote_host {
>     address ipv4 169.254.214.196:8916;
> }
>
> and the log snippets from both sides, I have full logs if needed. I
> tried sending them to the list, even zipped I can't get them across.

But that's still no reason not to do correct line wraps when pasting ;)

> (Source)
>
> Feb 6 01:57:13 node0 kernel: block drbd16: Starting asender thread (from drbd16_receiver [4790])
> Feb 6 01:57:13 node0 kernel: block drbd16: data-integrity-alg: <not-used>
> Feb 6 01:57:13 node0 kernel: block drbd16: drbd_sync_handshake:
> Feb 6 01:57:13 node0 kernel: block drbd16: self 93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0000000000000006 bits:0 flags:0
> Feb 6 01:57:13 node0 kernel: block drbd16: peer 3F4D478748F24FE6:0000000000000000:6E1F4F316DBF9290:0000000000000006 bits:0 flags:0
> Feb 6 01:57:13 node0 kernel: block drbd16: uuid_compare()=1 by rule 70
> Feb 6 01:57:13 node0 kernel: block drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
> Feb 6 01:57:13 node0 kernel: block drbd16: conn( WFBitMapS -> PausedSyncS ) pdsk( UpToDate -> Inconsistent )
> Feb 6 01:57:13 node0 kernel: block drbd16: Began resync as PausedSyncS (will sync 0 KB [0 bits set]).
> Feb 6 01:57:14 node0 kernel: block drbd16: aftr_isp( 1 -> 0 )
> Feb 6 01:57:15 node0 kernel: block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec)
> Feb 6 01:57:15 node0 kernel: block drbd16: conn( PausedSyncS -> Connected ) pdsk( Inconsistent -> UpToDate )
> Feb 6 01:57:15 node0 kernel: block drbd16: pdsk( UpToDate -> Inconsistent ) peer_isp( 1 -> 0 )

"interesting".
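
For anyone hunting for the same pattern in their own logs: a minimal
sketch, not part of DRBD and not from the original report, that scans
kernel log lines for a pdsk( UpToDate -> Inconsistent ) transition
arriving while the connection state is already Connected, as in the last
Source line above. The regular expression, the function names
parse_transitions and find_suspect_lines, and the shortened SAMPLE lines
are assumptions made for illustration; the "field( old -> new )" layout
is taken from the log excerpt only.

```python
import re

# One "field( old -> new )" group as it appears in the log excerpt above.
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def parse_transitions(line):
    """Return {field: (old, new)} for every state change on one log line."""
    return {field: (old, new) for field, old, new in TRANSITION.findall(line)}


def find_suspect_lines(lines):
    """Yield lines where pdsk drops UpToDate -> Inconsistent while the
    last connection state seen in the log is Connected."""
    conn = None  # last known connection state, None until first seen
    for line in lines:
        changes = parse_transitions(line)
        if "conn" in changes:
            conn = changes["conn"][1]
        if conn == "Connected" and changes.get("pdsk") == ("UpToDate", "Inconsistent"):
            yield line


# Shortened copies of the Source-side lines (timestamp and host removed).
SAMPLE = [
    "block drbd16: uuid_compare()=1 by rule 70",
    "block drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )",
    "block drbd16: conn( WFBitMapS -> PausedSyncS ) pdsk( UpToDate -> Inconsistent )",
    "block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec)",
    "block drbd16: conn( PausedSyncS -> Connected ) pdsk( Inconsistent -> UpToDate )",
    "block drbd16: pdsk( UpToDate -> Inconsistent ) peer_isp( 1 -> 0 )",
]

if __name__ == "__main__":
    for suspect in find_suspect_lines(SAMPLE):
        print(suspect)
```

The expected drop to Inconsistent at the start of the resync is not
reported, because the connection state on that line is PausedSyncS; only
a drop seen after the return to Connected is treated as suspect. The
sketch tracks a single device, so logs covering several minors would
need to be split per device first.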

> (Target)
>
> Feb 6 01:57:13 node1 kernel: block drbd16: Starting asender thread (from drbd16_receiver [18186])
> Feb 6 01:57:13 node1 kernel: block drbd16: data-integrity-alg: <not-used>
> Feb 6 01:57:13 node1 kernel: block drbd16: drbd_sync_handshake:
> Feb 6 01:57:13 node1 kernel: block drbd16: self 3F4D478748F24FE6:0000000000000000:6E1F4F316DBF9290:0000000000000006 bits:0 flags:0
> Feb 6 01:57:13 node1 kernel: block drbd16: peer 93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0000000000000006 bits:0 flags:0
> Feb 6 01:57:13 node1 kernel: block drbd16: uuid_compare()=-1 by rule 50
> Feb 6 01:57:13 node1 kernel: block drbd16: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
> Feb 6 01:57:13 node1 kernel: block drbd16: conn( WFBitMapT -> WFSyncUUID )
> Feb 6 01:57:13 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper before-resync-target minor-16
> Feb 6 01:57:13 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper before-resync-target minor-16 exit code 0 (0x0)
> Feb 6 01:57:13 node1 kernel: block drbd16: conn( WFSyncUUID -> PausedSyncT ) disk( UpToDate -> Inconsistent )
> Feb 6 01:57:13 node1 kernel: block drbd16: Began resync as PausedSyncT (will sync 0 KB [0 bits set]).
> Feb 6 01:57:14 node1 kernel: block drbd16: aftr_isp( 1 -> 0 )
> Feb 6 01:57:15 node1 kernel: block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec)
> Feb 6 01:57:15 node1 kernel: block drbd16: conn( PausedSyncT -> Connected ) disk( Inconsistent -> UpToDate )
> Feb 6 01:57:15 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper after-resync-target minor-16
> Feb 6 01:57:15 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper after-resync-target minor-16 exit code 0 (0x0)
> Feb 6 01:57:15 node1 kernel: block drbd16: peer_isp( 1 -> 0 )
>
> We're trying to reproduce it now but haven't had any success so far.

Good.

> Any ideas? Thanks.

Nope. Not seen that one so far.
If it is real, it has to be a race condition when exchanging the state
information about "paused" flag state changes.

Once you find a way to reproduce, let us know.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed