On Fri, 26 Nov 2004, Philipp Reisner wrote:

[...]
> I think that I have resolved the issue of Eugene Crosser by now, it
> should be solved with the patch applied to this e-mail. I am waiting
> for Eugene to confirm that the issue is solved now.
>
> I do not think it has anything to do with LVM or not LVM. It has
> to do with whether you have application IO during the _start_ of
> the resync process or not.

Ok, interesting.

> I will write a longish explanation of the bug and the fix to the list
> as soon as I have either the confirmation of Eugene or I found
> the time to reproduce it here in the office
>
> ( Today is some outbreak of some stupid windows worm, and we have
> to take care of the systems of our paying customers first... )
>
> If you could confirm this behavior (bug triggered by app IO during
> start of resync) and that p4 fixes it, this would help a lot...

I've tried doing some heavy write IO on the primary side while issuing
invalidate on the secondary node. However, every time both nodes have had
the same number of bytes that need to be synced in their logs, and the
sync has completed without getting stuck. I'll try this some more, but so
far it seems I am unable to trigger this.

Another minor thing, not really a problem as far as I can see, but it
might be interesting to know about. If I drop the connection (doing
"drbdadm down all") on the secondary node while it is syncing, I trigger a
D_ASSERT in drbd_receiver.c.

Nov 26 12:47:22 Proc1 kernel: drbd0: size = 62 GB (65928176 KB)
Nov 26 12:47:23 Proc1 kernel: drbd0: 62 GB marked out-of-sync by on disk bit-map.
Nov 26 12:47:23 Proc1 kernel: drbd0: Found 4 transactions (248 active extents) in activity log.
Nov 26 12:47:23 Proc1 kernel: drbd0: drbdsetup [2505]: cstate Unconfigured --> StandAlone
Nov 26 12:47:23 Proc1 kernel: drbd0: drbdsetup [2507]: cstate StandAlone --> Unconnected
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate Unconnected --> WFConnection
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate WFConnection --> WFReportParams
Nov 26 12:47:23 Proc1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Nov 26 12:47:23 Proc1 kernel: drbd0: Connection established.
Nov 26 12:47:23 Proc1 kernel: drbd0: I am(S): 0:00000007:00000001:00000019:0000000a:01
Nov 26 12:47:23 Proc1 kernel: drbd0: Peer(P): 1:00000007:00000001:0000001a:0000000a:10
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate WFReportParams --> WFBitMapT
Nov 26 12:47:23 Proc1 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate WFBitMapT --> SyncTarget
Nov 26 12:47:23 Proc1 kernel: drbd0: Resync started as SyncTarget (need to sync 65349168 KB [16337292 bits set]).
Nov 26 12:48:36 Proc1 kernel: drbd0: drbdsetup [2529]: cstate SyncTarget --> Unconnected
Nov 26 12:48:36 Proc1 kernel: drbd0: /home/per/drbd-0.7.4/drbd/drbd_receiver.c:895: Unconnected flags=0x3032
Nov 26 12:48:36 Proc1 kernel: drbd0: asender terminated
Nov 26 12:48:36 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate Unconnected --> BrokenPipe
Nov 26 12:48:36 Proc1 kernel: drbd0: short read receiving data block: read 640 expected 4096
Nov 26 12:48:36 Proc1 kernel: drbd0: error receiving RSDataReply, l: 4112!
Nov 26 12:48:36 Proc1 kernel: drbd0: ASSERT( mdev->resync_work.cb == w_resync_inactive ) in /home/per/drbd-0.7.4/drbd/drbd_receiver.c:1760
Nov 26 12:48:36 Proc1 kernel: drbd0: worker terminated
Nov 26 12:48:36 Proc1 kernel: drbd0: unacked_cnt = 9
Nov 26 12:48:36 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate BrokenPipe --> StandAlone
Nov 26 12:48:36 Proc1 kernel: drbd0: Connection lost.
Nov 26 12:48:36 Proc1 kernel: drbd0: receiver terminated
Nov 26 12:48:36 Proc1 kernel: drbd0: drbdsetup [2529]: cstate StandAlone --> StandAlone
Nov 26 12:48:36 Proc1 kernel: drbd0: drbdsetup [2529]: cstate StandAlone --> Unconfigured
Nov 26 12:48:36 Proc1 kernel: drbd0: worker terminated

/Per
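
For anyone comparing logs like the one above, here is a minimal Python sketch that pulls the cstate transitions out of the kernel log and checks the resync figures. It is not part of DRBD: the regular expressions and the function names (cstate_transitions, kb_per_bit) are assumptions made for illustration, based only on the line format shown in this message. For this log the reported numbers work out to 4 KB per set bit (65349168 KB / 16337292 bits), which is consistent with one bitmap bit per 4 KB block.

```python
import re

# A few lines copied verbatim from the log in this message.
LOG = """\
Nov 26 12:47:23 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate WFBitMapT --> SyncTarget
Nov 26 12:47:23 Proc1 kernel: drbd0: Resync started as SyncTarget (need to sync 65349168 KB [16337292 bits set]).
Nov 26 12:48:36 Proc1 kernel: drbd0: drbdsetup [2529]: cstate SyncTarget --> Unconnected
Nov 26 12:48:36 Proc1 kernel: drbd0: drbd0_receiver [2508]: cstate Unconnected --> BrokenPipe
"""

# Assumed patterns, derived only from the log lines shown above.
CSTATE_RE = re.compile(r"(\S+) \[(\d+)\]: cstate (\w+) --> (\w+)")
RESYNC_RE = re.compile(r"need to sync (\d+) KB \[(\d+) bits set\]")


def cstate_transitions(text):
    """Return (process, pid, old_state, new_state) for each cstate line."""
    return [
        (m.group(1), int(m.group(2)), m.group(3), m.group(4))
        for m in CSTATE_RE.finditer(text)
    ]


def kb_per_bit(text):
    """Return KB covered per set bitmap bit, or None if no resync line is found."""
    m = RESYNC_RE.search(text)
    if m is None:
        return None
    kb, bits = int(m.group(1)), int(m.group(2))
    return kb / bits


if __name__ == "__main__":
    for proc, pid, old, new in cstate_transitions(LOG):
        print(f"{proc}[{pid}]: {old} -> {new}")
    print("KB per bit:", kb_per_bit(LOG))
```

Listing the transitions this way makes it easy to see that drbdsetup (pid 2529) moved the device from SyncTarget to Unconnected while the receiver thread was still in the middle of a resync data block.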