On Wed, Feb 10, 2010 at 09:46:45AM -0500, Petrakis, Peter wrote:
> Hi All,
>
> We're encountering a resync problem with 8.3.6: after the resync
> finishes, the target node transitions to UpToDate, which the peer sees,
> but then another state transition happens that reports the pdsk state
> going from UpToDate -> Inconsistent. The fault occurred after we lost
> connectivity to our peer, which was then rebooted; after that, the
> resync began.
>
> Here's the config (same for all resources):
> /sbin/drbdsetup /dev/drbd16 show
>
> disk {
>     size 0s _is_default; # bytes
>     on-io-error detach;
>     fencing dont-care _is_default;
>     max-bio-bvecs 0 _is_default;
> }
> net {
>     timeout 60 _is_default; # 1/10 seconds
>     max-epoch-size 2048 _is_default;
>     max-buffers 2048 _is_default;
>     unplug-watermark 128 _is_default;
>     connect-int 10 _is_default; # seconds
>     ping-int 10 _is_default; # seconds
>     sndbuf-size 0 _is_default; # bytes
>     rcvbuf-size 0 _is_default; # bytes
>     ko-count 2;
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri violently-as0p;
>     after-sb-2pri violently-as0p;
>     rr-conflict violently;
>     ping-timeout 20; # 1/10 seconds
> }
> syncer {
>     rate 30720k; # bytes/second
>     after 15;
>     al-extents 709;
> }
> protocol C;
> _this_host {
>     device minor 16;
>     disk "/dev/disk-drbd16";
>     meta-disk internal;
>     address ipv4 169.254.84.220:8916;
> }
> _remote_host {
>     address ipv4 169.254.214.196:8916;
> }
>
> and the log snippets from both sides. I have full logs if needed; I
> tried sending them to the list, but even zipped I can't get them across.
But that's still no reason not to wrap lines correctly when pasting ;)
> (Source)
>
>
> Feb 6 01:57:13 node0 kernel: block drbd16: Starting asender thread (from drbd16_receiver [4790])
> Feb 6 01:57:13 node0 kernel: block drbd16: data-integrity-alg: <not-used>
> Feb 6 01:57:13 node0 kernel: block drbd16: drbd_sync_handshake:
> Feb 6 01:57:13 node0 kernel: block drbd16: self 93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0000000000000006 bits:0 flags:0
> Feb 6 01:57:13 node0 kernel: block drbd16: peer 3F4D478748F24FE6:0000000000000000:6E1F4F316DBF9290:0000000000000006 bits:0 flags:0
> Feb 6 01:57:13 node0 kernel: block drbd16: uuid_compare()=1 by rule 70
> Feb 6 01:57:13 node0 kernel: block drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
> Feb 6 01:57:13 node0 kernel: block drbd16: conn( WFBitMapS -> PausedSyncS ) pdsk( UpToDate -> Inconsistent )
> Feb 6 01:57:13 node0 kernel: block drbd16: Began resync as PausedSyncS (will sync 0 KB [0 bits set]).
> Feb 6 01:57:14 node0 kernel: block drbd16: aftr_isp( 1 -> 0 )
> Feb 6 01:57:15 node0 kernel: block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec)
> Feb 6 01:57:15 node0 kernel: block drbd16: conn( PausedSyncS -> Connected ) pdsk( Inconsistent -> UpToDate )
> Feb 6 01:57:15 node0 kernel: block drbd16: pdsk( UpToDate -> Inconsistent ) peer_isp( 1 -> 0 )
"interesting".
> (Target)
>
>
> Feb 6 01:57:13 node1 kernel: block drbd16: Starting asender thread (from drbd16_receiver [18186])
> Feb 6 01:57:13 node1 kernel: block drbd16: data-integrity-alg: <not-used>
> Feb 6 01:57:13 node1 kernel: block drbd16: drbd_sync_handshake:
> Feb 6 01:57:13 node1 kernel: block drbd16: self 3F4D478748F24FE6:0000000000000000:6E1F4F316DBF9290:0000000000000006 bits:0 flags:0
> Feb 6 01:57:13 node1 kernel: block drbd16: peer 93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0000000000000006 bits:0 flags:0
> Feb 6 01:57:13 node1 kernel: block drbd16: uuid_compare()=-1 by rule 50
> Feb 6 01:57:13 node1 kernel: block drbd16: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
> Feb 6 01:57:13 node1 kernel: block drbd16: conn( WFBitMapT -> WFSyncUUID )
> Feb 6 01:57:13 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper before-resync-target minor-16
> Feb 6 01:57:13 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper before-resync-target minor-16 exit code 0 (0x0)
> Feb 6 01:57:13 node1 kernel: block drbd16: conn( WFSyncUUID -> PausedSyncT ) disk( UpToDate -> Inconsistent )
> Feb 6 01:57:13 node1 kernel: block drbd16: Began resync as PausedSyncT (will sync 0 KB [0 bits set]).
> Feb 6 01:57:14 node1 kernel: block drbd16: aftr_isp( 1 -> 0 )
> Feb 6 01:57:15 node1 kernel: block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec)
> Feb 6 01:57:15 node1 kernel: block drbd16: conn( PausedSyncT -> Connected ) disk( Inconsistent -> UpToDate )
> Feb 6 01:57:15 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper after-resync-target minor-16
> Feb 6 01:57:15 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper after-resync-target minor-16 exit code 0 (0x0)
> Feb 6 01:57:15 node1 kernel: block drbd16: peer_isp( 1 -> 0 )
>
> We're trying to reproduce it now but haven't had any success so far.
Good.
> Any ideas? Thanks.
Nope. Haven't seen that one so far.
If it is real, it has to be a race condition in exchanging the state
information about changes to the "paused" flags.
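A toy model of that hypothesis, for illustration only (plain Python, not
DRBD code; the update names and the naive "last update wins" rule are
assumptions). One ordering that would fit the logs: the target clears its
pause flag mid-resync (aftr_isp( 1 -> 0 ) at 01:57:14) and reports its
state while its disk is still Inconsistent, but node0 applies that report
only after its own resync-done transition. The same two updates then give
a different pdsk depending on arrival order:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """Something that sets node0's view of the peer disk (pdsk). Hypothetical."""
    source: str
    pdsk: str


# Local transition on node0: resync finished, so the peer disk is now UpToDate.
RESYNC_DONE = Update("node0: conn( PausedSyncS -> Connected )", "UpToDate")
# Hypothetical state report built by node1 while its disk was still
# Inconsistent (for example when its pause flag cleared mid-resync).
PEER_REPORT = Update("node1: state report, disk Inconsistent, isp 0", "Inconsistent")


def final_pdsk(updates, start="Inconsistent"):
    """Apply updates in arrival order; the last one wins (naive on purpose)."""
    pdsk = start
    for update in updates:
        pdsk = update.pdsk
    return pdsk


# Report arrives before resync-done: the final view is correct.
print(final_pdsk([PEER_REPORT, RESYNC_DONE]))  # UpToDate
# Report arrives after resync-done: the stale report wins, like in the node0 log.
print(final_pdsk([RESYNC_DONE, PEER_REPORT]))  # Inconsistent
```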
Once you find a way to reproduce, let us know.
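While trying to reproduce, a small filter over the kernel logs can spot
the symptom automatically. A minimal sketch in Python, assuming the
8.3-style state-change lines shown in this thread ("block drbdN:
name( old -> new )"). The heuristic is an assumption drawn from the node0
log above: the legitimate pdsk( UpToDate -> Inconsistent ) at resync start
comes with a conn change into a sync state on the same line, so a drop
while conn is already Connected is flagged. The script name and log path
are hypothetical:

```python
import re
import sys

# "block drbd16: " identifies the device; "name( old -> new )" is one state change.
DEV_RE = re.compile(r"block (drbd\d+): ")
CHANGE_RE = re.compile(r"(\w+)\( (\S+) -> (\S+) \)")


def find_pdsk_regressions(lines):
    """Yield lines where pdsk drops UpToDate -> Inconsistent while conn is Connected.

    Heuristic only: tracks the last seen conn state per device and ignores
    drops that happen while the connection is in a sync state.
    """
    conn = {}  # last seen conn state, per device
    for line in lines:
        dev = DEV_RE.search(line)
        if dev is None:
            continue
        name = dev.group(1)
        changes = {key: (old, new) for key, old, new in CHANGE_RE.findall(line)}
        # Apply a conn change on the same line first, so a resync start
        # (e.g. WFBitMapS -> PausedSyncS) is not flagged.
        if "conn" in changes:
            conn[name] = changes["conn"][1]
        if (changes.get("pdsk") == ("UpToDate", "Inconsistent")
                and conn.get(name) == "Connected"):
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_pdsk_regressions.py /var/log/messages ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for hit in find_pdsk_regressions(log):
                print(hit)
```

Run over the node0 excerpt above, it would report only the
pdsk( UpToDate -> Inconsistent ) peer_isp( 1 -> 0 ) line, not the earlier
one at WFBitMapS -> PausedSyncS.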
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed