[DRBD-user] Sync stuck at 100%

Per Liden per at fukt.bth.se
Fri Nov 26 14:02:47 CET 2004


On Fri, 26 Nov 2004, Philipp Reisner wrote:

[...]
> I think that I have resolved the issue of Eugene Crosser by now, it
> should be solved with the patch attached to this e-mail. I am waiting
> for Eugene to confirm that the issue is solved now.
> 
> I do not think it has anything to do with LVM or not LVM. It has
> to do with whether you have application IO during the _start_ of
> the resync process or not.

Ok, interesting.

> I will write a longish explanation of the bug and the fix to the list
> as soon as I have either the confirmation of Eugene or I have found
> the time to reproduce it here in the office.
> 
> ( Today there is an outbreak of some stupid windows worm, and we have
>   to take care of the systems of our paying customers first... )
> 
> If you could confirm this behavior (bug triggered by app IO during
> start of resync) and that p4 fixes it, this would help a lot...

I've tried doing heavy write IO on the primary side while issuing
invalidate on the secondary node. However, every time, both nodes have
reported the same number of bytes needing to be synced in their logs,
and the sync has completed without getting stuck. I'll try this some
more, but so far it seems I am unable to trigger the bug.


Another minor thing, not really a problem as far as I can see, but it
might be interesting to know about. If I drop the connection (by running
"drbdadm down all") on the secondary node while it is syncing, I trigger
a D_ASSERT in drbd_receiver.c.

Nov 26 12:47:22 Proc1 kernel: drbd0: size = 62 GB (65928176 KB)
Nov 26 12:47:23 Proc1 kernel: drbd0: 62 GB marked out-of-sync by on disk bit-map.
Nov 26 12:47:23 Proc1 kernel: drbd0: Found 4 transactions (248 active extents) in activity log.
Nov 26 12:47:23 Proc1 kernel: drbd0: drbdsetup [2505]: cstate Unconfigured --> StandAlone
Nov 26 12:47:23 Proc1 kernel: drbd0: drbdsetup [2507]: cstate StandAlone --> Unconnected
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate Unconnected --> WFConnection
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate WFConnection --> WFReportParams
Nov 26 12:47:23 Proc1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Nov 26 12:47:23 Proc1 kernel: drbd0: Connection established.
Nov 26 12:47:23 Proc1 kernel: drbd0: I am(S): 0:00000007:00000001:00000019:0000000a:01
Nov 26 12:47:23 Proc1 kernel: drbd0: Peer(P): 1:00000007:00000001:0000001a:0000000a:10
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate WFReportParams --> WFBitMapT
Nov 26 12:47:23 Proc1 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate WFBitMapT --> SyncTarget
Nov 26 12:47:23 Proc1 kernel: drbd0: Resync started as SyncTarget (need to sync 65349168 KB [16337292 bits set]).
Nov 26 12:48:36 Proc1 kernel: drbd0: drbdsetup [2529]: cstate SyncTarget --> Unconnected
Nov 26 12:48:36 Proc1 kernel: drbd0: /home/per/drbd-0.7.4/drbd/drbd_receiver.c:895: Unconnected flags=0x3032
Nov 26 12:48:36 Proc1 kernel: drbd0: asender terminated
Nov 26 12:48:36 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate Unconnected --> BrokenPipe
Nov 26 12:48:36 Proc1 kernel: drbd0: short read receiving data block: read 640 expected 4096
Nov 26 12:48:36 Proc1 kernel: drbd0: error receiving RSDataReply, l: 4112!
Nov 26 12:48:36 Proc1 kernel: drbd0: ASSERT( mdev->resync_work.cb == w_resync_inactive ) in /home/per/drbd-0.7.4/drbd/drbd_receiver.c:1760
Nov 26 12:48:36 Proc1 kernel: drbd0: worker terminated
Nov 26 12:48:36 Proc1 kernel: drbd0: unacked_cnt = 9
Nov 26 12:48:36 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate BrokenPipe --> StandAlone
Nov 26 12:48:36 Proc1 kernel: drbd0: Connection lost.
Nov 26 12:48:36 Proc1 kernel: drbd0: receiver terminated
Nov 26 12:48:36 Proc1 kernel: drbd0: drbdsetup [2529]: cstate StandAlone --> StandAlone
Nov 26 12:48:36 Proc1 kernel: drbd0: drbdsetup [2529]: cstate StandAlone --> Unconfigured
Nov 26 12:48:36 Proc1 kernel: drbd0: worker terminated
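
In case it helps when comparing logs between nodes, here is a rough
sketch of how the interesting parts of a log like the one above could be
pulled out with a small script: the cstate transitions, the ASSERT
location, and the resync size. The function names and regular
expressions are my own and only assume the message formats visible in
this log (they are not part of DRBD), and the sample lines are copied
from above with the syslog prefix cut off. Going by the figures in the
"Resync started" line, each set bit seems to stand for 4 KB
(65349168 KB / 16337292 bits).

```python
import re

# A few lines copied from the log above, syslog prefix removed for brevity.
LOG = """\
drbd0: drbd0_receiver [2508]: cstate WFBitMapT --> SyncTarget
drbd0: Resync started as SyncTarget (need to sync 65349168 KB [16337292 bits set]).
drbd0: drbdsetup [2529]: cstate SyncTarget --> Unconnected
drbd0: drbd0_receiver [2508]: cstate Unconnected --> BrokenPipe
drbd0: ASSERT( mdev->resync_work.cb == w_resync_inactive ) in /home/per/drbd-0.7.4/drbd/drbd_receiver.c:1760
drbd0: drbd0_receiver [2508]: cstate BrokenPipe --> StandAlone
"""

# Patterns guessed from the message formats seen in this log only.
CSTATE_RE = re.compile(r"(\S+) \[(\d+)\]: cstate (\w+) --> (\w+)")
RESYNC_RE = re.compile(r"need to sync (\d+) KB \[(\d+) bits set\]")
ASSERT_RE = re.compile(r"ASSERT\( (.+) \) in (\S+):(\d+)")


def cstate_transitions(log):
    """Return (process, pid, old_state, new_state) for every cstate line."""
    return [
        (m.group(1), int(m.group(2)), m.group(3), m.group(4))
        for m in CSTATE_RE.finditer(log)
    ]


def find_asserts(log):
    """Return (condition, source_file, line_number) for every ASSERT line."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in ASSERT_RE.finditer(log)
    ]


def kb_per_bit(log):
    """Return KB covered by one bitmap bit, or None without a resync line."""
    m = RESYNC_RE.search(log)
    if m is None:
        return None
    kilobytes, bits = int(m.group(1)), int(m.group(2))
    return kilobytes / bits


if __name__ == "__main__":
    for proc, pid, old, new in cstate_transitions(LOG):
        print(f"{proc}[{pid}]: {old} -> {new}")
    for cond, path, line in find_asserts(LOG):
        print(f"ASSERT failed at {path}:{line}: {cond}")
    print("KB per bit:", kb_per_bit(LOG))
```

Running the same functions over the full logs of both nodes should make
it easy to check that they agree on the amount of data to be synced.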

/Per


