[DRBD-user] drbd-8.3.6 pdsk: Uptodate->Inconsistent but is really uptodate after resync

Lars Ellenberg lars.ellenberg at linbit.com
Wed Feb 10 19:31:07 CET 2010



On Wed, Feb 10, 2010 at 09:46:45AM -0500, Petrakis, Peter wrote:
> Hi All,
> 
> We're encountering a resync problem with 8.3.6: after a resync, the
> target node transitions to UpToDate, which the peer sees, but then
> another state transition changes the pdsk state from UpToDate to
> Inconsistent. The fault occurred after we lost connectivity to our
> peer, which was then rebooted; the resync began after that.
> 
> Here's the config (same for all resources):
> /sbin/drbdsetup /dev/drbd16 show
> 
> disk {                                                   
>         size                    0s _is_default; # bytes  
>         on-io-error             detach;                  
>         fencing                 dont-care _is_default;
>         max-bio-bvecs           0 _is_default;
> }
> net {
>         timeout                 60 _is_default; # 1/10 seconds
>         max-epoch-size          2048 _is_default;
>         max-buffers             2048 _is_default;
>         unplug-watermark        128 _is_default;
>         connect-int             10 _is_default; # seconds
>         ping-int                10 _is_default; # seconds
>         sndbuf-size             0 _is_default; # bytes
>         rcvbuf-size             0 _is_default; # bytes
>         ko-count                2;
>         allow-two-primaries;
>         after-sb-0pri           discard-zero-changes;
>         after-sb-1pri           violently-as0p;
>         after-sb-2pri           violently-as0p;
>         rr-conflict             violently;
>         ping-timeout            20; # 1/10 seconds
> }
> syncer {
>         rate                    30720k; # bytes/second
>         after                   15;
>         al-extents              709;
> }
> protocol C;
> _this_host {
>         device                  minor 16;
>         disk                    "/dev/disk-drbd16";
>         meta-disk               internal;
>         address                 ipv4 169.254.84.220:8916;
> }
> _remote_host {
>         address                 ipv4 169.254.214.196:8916;
> }
> 
> And the log snippets from both sides. I have full logs if needed; I
> tried sending them to the list, but even zipped I can't get them across.

But that's still no reason not to wrap lines correctly when pasting ;)

> (Source)
> 
> 
> Feb  6 01:57:13 node0 kernel: block drbd16: Starting asender thread (from drbd16_receiver [4790])
> Feb  6 01:57:13 node0 kernel: block drbd16: data-integrity-alg: <not-used>
> Feb  6 01:57:13 node0 kernel: block drbd16: drbd_sync_handshake:
> Feb  6 01:57:13 node0 kernel: block drbd16: self 93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0000000000000006 bits:0 flags:0
> Feb  6 01:57:13 node0 kernel: block drbd16: peer 3F4D478748F24FE6:0000000000000000:6E1F4F316DBF9290:0000000000000006 bits:0 flags:0 
> Feb  6 01:57:13 node0 kernel: block drbd16: uuid_compare()=1 by rule 70 
> Feb  6 01:57:13 node0 kernel: block drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) 
> Feb  6 01:57:13 node0 kernel: block drbd16: conn( WFBitMapS -> PausedSyncS ) pdsk( UpToDate -> Inconsistent ) 
> Feb  6 01:57:13 node0 kernel: block drbd16: Began resync as PausedSyncS (will sync 0 KB [0 bits set]). 
> Feb  6 01:57:14 node0 kernel: block drbd16: aftr_isp( 1 -> 0 ) 
> Feb  6 01:57:15 node0 kernel: block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec) 
> Feb  6 01:57:15 node0 kernel: block drbd16: conn( PausedSyncS -> Connected ) pdsk( Inconsistent -> UpToDate ) 
> Feb  6 01:57:15 node0 kernel: block drbd16: pdsk( UpToDate -> Inconsistent ) peer_isp( 1 -> 0 ) 

"interesting".
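The handshake itself looks consistent; only the final line is odd. As a plausibility check, the sketch below tests the two comparisons that the logged rule numbers appear to correspond to. It assumes that the four hex fields are the current, bitmap, and two history UUIDs, that the lowest bit is a flag cleared before comparing, that "rule 50" means "my current UUID equals the peer's bitmap UUID" (become sync target) and "rule 70" means "my bitmap UUID equals the peer's current UUID" (become sync source). All other rules are ignored, so this is not a reimplementation of `uuid_compare()`. Both sides also report `bits:0`, which matches "will sync 0 KB".

```python
from __future__ import annotations

# Partial, hypothetical sketch of two uuid_compare() rules as read from the log.
# Assumptions: fields are current:bitmap:history:history, and the lowest bit
# is a flag that must be cleared before comparing.

FLAG_MASK = ~1  # clears bit 0 of a non-negative Python int


def parse_uuids(field: str) -> list[int]:
    """Parse 'AAAA:BBBB:CCCC:DDDD' from a 'self'/'peer' log line into ints."""
    return [int(part, 16) for part in field.split(":")]


def compare_sketch(mine: list[int], peer: list[int]) -> tuple[int, int] | None:
    """Return (result, rule) if assumed rule 50 or 70 matches, else None.

    Rules other than 50 and 70 are deliberately not modelled.
    """
    my_current, my_bitmap = mine[0] & FLAG_MASK, mine[1] & FLAG_MASK
    peer_current, peer_bitmap = peer[0] & FLAG_MASK, peer[1] & FLAG_MASK
    if my_current == peer_bitmap:  # assumed rule 50: become sync target
        return (-1, 50)
    if my_bitmap == peer_current:  # assumed rule 70: become sync source
        return (1, 70)
    return None


# UUID fields copied from the node0 "self" and "peer" lines above.
node0 = parse_uuids("93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0000000000000006")
node1 = parse_uuids("3F4D478748F24FE6:0000000000000000:6E1F4F316DBF9290:0000000000000006")

print(compare_sketch(node0, node1))  # (1, 70), as logged on node0
print(compare_sketch(node1, node0))  # (-1, 50), as logged on node1
```

Note that node0's bitmap UUID ends in ...FE7 and node1's current UUID in ...FE6; they only match once the low bit is masked, which is why the mask is part of the sketch.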


> (Target)
> 
> 
> Feb  6 01:57:13 node1 kernel: block drbd16: Starting asender thread (from drbd16_receiver [18186]) 
> Feb  6 01:57:13 node1 kernel: block drbd16: data-integrity-alg: <not-used> 
> Feb  6 01:57:13 node1 kernel: block drbd16: drbd_sync_handshake: 
> Feb  6 01:57:13 node1 kernel: block drbd16: self 3F4D478748F24FE6:0000000000000000:6E1F4F316DBF9290:0000000000000006 bits:0 flags:0 
> Feb  6 01:57:13 node1 kernel: block drbd16: peer 93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0000000000000006 bits:0 flags:0 
> Feb  6 01:57:13 node1 kernel: block drbd16: uuid_compare()=-1 by rule 50 
> Feb  6 01:57:13 node1 kernel: block drbd16: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) 
> Feb  6 01:57:13 node1 kernel: block drbd16: conn( WFBitMapT -> WFSyncUUID ) 
> Feb  6 01:57:13 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper before-resync-target minor-16 
> Feb  6 01:57:13 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper before-resync-target minor-16 exit code 0 (0x0) 
> Feb  6 01:57:13 node1 kernel: block drbd16: conn( WFSyncUUID -> PausedSyncT ) disk( UpToDate -> Inconsistent ) 
> Feb  6 01:57:13 node1 kernel: block drbd16: Began resync as PausedSyncT (will sync 0 KB [0 bits set]). 
> Feb  6 01:57:14 node1 kernel: block drbd16: aftr_isp( 1 -> 0 ) 
> Feb  6 01:57:15 node1 kernel: block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec) 
> Feb  6 01:57:15 node1 kernel: block drbd16: conn( PausedSyncT -> Connected ) disk( Inconsistent -> UpToDate ) 
> Feb  6 01:57:15 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper after-resync-target minor-16 
> Feb  6 01:57:15 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper after-resync-target minor-16 exit code 0 (0x0) 
> Feb  6 01:57:15 node1 kernel: block drbd16: peer_isp( 1 -> 0 ) 
> 
> We're trying to reproduce it now but haven't had any success so far.

Good.
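For the reproduction attempts, a small hypothetical helper can flag the symptom automatically. It assumes the kernel log lines have already been filtered to a single DRBD minor, records the last value of every `field( old -> new )` transition, and reports the case seen on node0: connection back to Connected, but pdsk ending as Inconsistent. The function names and the symptom definition are illustrative only, not part of DRBD.

```python
import re

# Matches state-change tokens such as "pdsk( UpToDate -> Inconsistent )".
TRANSITION = re.compile(r"(\w+)\( (\S+) -> (\S+) \)")


def final_states(lines):
    """Return the last value logged for each state field (conn, disk, pdsk, ...)."""
    state = {}
    for line in lines:
        for field, _old, new in TRANSITION.findall(line):
            state[field] = new
    return state


def looks_like_reported_symptom(lines):
    """True if the connection ended up Connected but pdsk ended Inconsistent."""
    state = final_states(lines)
    return state.get("conn") == "Connected" and state.get("pdsk") == "Inconsistent"


# Last two state-change lines from each side of the report above.
source_tail = [
    "block drbd16: conn( PausedSyncS -> Connected ) pdsk( Inconsistent -> UpToDate )",
    "block drbd16: pdsk( UpToDate -> Inconsistent ) peer_isp( 1 -> 0 )",
]
target_tail = [
    "block drbd16: conn( PausedSyncT -> Connected ) disk( Inconsistent -> UpToDate )",
    "block drbd16: peer_isp( 1 -> 0 )",
]

print(looks_like_reported_symptom(source_tail))  # True
print(looks_like_reported_symptom(target_tail))  # False
```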

> Any ideas? Thanks.

Nope. Not seen that one so far.
If it is real, it has to be a race condition in exchanging the state
information about changes of the "paused" flags (aftr_isp, peer_isp).
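To make the suspected failure mode concrete, here is a toy model (not DRBD code; all class and event names are made up). The target sends a state snapshot when its paused flag clears, taken while its disk is still Inconsistent. If the source applies that snapshot after it has already marked the peer UpToDate at the end of the resync, the last-applied packet wins and pdsk falls back to Inconsistent, together with peer_isp going to 0, which is the combined transition logged on node0. Whether DRBD 8.3.6 actually handles the packet this way is an assumption the logs do not prove.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatePacket:
    """Hypothetical snapshot of the sender's state, as sent to the peer."""
    disk: str
    isp: bool  # sender's "paused" flag (aftr_isp/user_isp folded together)


class SourceView:
    """What the sync source believes about its peer (toy model only)."""

    def __init__(self) -> None:
        self.pdsk = "Inconsistent"
        self.peer_isp = True

    def resync_done(self) -> None:
        # Local event: resync finished, so the source marks the peer UpToDate.
        self.pdsk = "UpToDate"

    def receive(self, pkt: StatePacket) -> None:
        # Naive handling: adopt whatever the (possibly stale) packet says.
        self.pdsk = pkt.disk
        self.peer_isp = pkt.isp


def run(order):
    """Apply events in the given order; return the source's final (pdsk, peer_isp)."""
    view = SourceView()
    # Snapshot taken when the target un-paused, while its disk was still Inconsistent.
    stale = StatePacket(disk="Inconsistent", isp=False)
    for event in order:
        if event == "packet":
            view.receive(stale)
        elif event == "resync_done":
            view.resync_done()
        else:
            raise ValueError(f"unknown event: {event}")
    return view.pdsk, view.peer_isp


print(run(["packet", "resync_done"]))  # ('UpToDate', False)
print(run(["resync_done", "packet"]))  # ('Inconsistent', False)
```

Only the ordering differs between the two runs, which is the kind of timing dependence a reproduction would have to hit.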

Once you find a way to reproduce it, let us know.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
