I have a very similar problem here (SyncSource gets stuck at 100% after
"invalidate all" on the other node). Would you tell us your exact hardware
configuration and kernel version?

Eugene

Per Liden wrote:
> Hi,
>
> I'm having problems with DRBD getting stuck at around 99-100% during an
> initial/full sync. This seems to be happening about 8 out of 10 times. If
> I do "drbdadm down all" on both sides and then "drbdadm up all", both
> nodes connect just fine and both end up in a consistent state. But for
> some reason drbd will not by itself detect that the sync has actually
> completed. This is what it looks like when they get stuck:
>
> Proc1:~ # cat /proc/drbd
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
>  0: cs:SyncSource st:Primary/Secondary ld:Consistent
>     ns:60558616 nr:0 dw:360 dr:60558461 al:0 bm:3697 lo:0 pe:0 ua:0 ap:0
>     [===================>] sync'ed: 99.6% (248/59387)M
>     finish: 4:45:21 speed: 12 (10,488) K/sec
>
> Proc2:~ # cat /proc/drbd
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
>  0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
>     ns:0 nr:60558616 dw:60558616 dr:0 al:0 bm:3697 lo:0 pe:0 ua:0 ap:0
>     [===================>] sync'ed:100.0% (0/59139)M
>     finish: 0:00:00 speed: 16 (10,480) K/sec
>
>
> /var/log/messages on Proc1:
> ...
> Nov 25 11:07:02 Proc1 kernel: drbd: initialised. Version: 0.7.4 (api:76/proto:74)
> Nov 25 11:07:02 Proc1 kernel: drbd: SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
> Nov 25 11:07:02 Proc1 kernel: drbd: registered as block device major 147
> Nov 25 11:07:02 Proc1 kernel: drbd0: resync bitmap: bits=1251563 words=39112
> Nov 25 11:07:02 Proc1 kernel: drbd0: size = 4888 MB (5006250 KB)
> Nov 25 11:07:02 Proc1 kernel: drbd0: 248 MB marked out-of-sync by on disk bit-map.
> Nov 25 11:07:02 Proc1 kernel: drbd0: Found 4 transactions (64 active extents) in activity log.
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbdsetup [1094]: cstate Unconfigured --> StandAlone
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbdsetup [1096]: cstate StandAlone --> Unconnected
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate Unconnected --> WFConnection
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFConnection --> WFReportParams
> Nov 25 11:07:03 Proc1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Nov 25 11:07:03 Proc1 kernel: drbd0: resync bitmap: bits=16391168 words=512224
> Nov 25 11:07:03 Proc1 kernel: drbd0: size = 62 GB (65564672 KB)
> Nov 25 11:07:03 Proc1 kernel: drbd0: Connection established.
> Nov 25 11:07:03 Proc1 kernel: drbd0: I am(S): 1:00000005:00000003:00000091:0000004e:00
> Nov 25 11:07:03 Proc1 kernel: drbd0: Peer(S): 1:00000005:00000003:00000090:0000004e:00
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFReportParams --> WFBitMapS
> Nov 25 11:07:03 Proc1 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFBitMapS --> SyncSource
> Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need to sync 60812372 KB [15203093 bits set]).
> Nov 25 11:07:18 Proc1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
> Nov 25 11:15:38 Proc1 kernel: drbd0: [drbd0_worker/1095] sock_sendmsg time expired, ko = 4294967295
>
> /var/log/messages on Proc2:
> ...
> Nov 25 11:07:02 Proc2 kernel: drbd: initialised. Version: 0.7.4 (api:76/proto:74)
> Nov 25 11:07:02 Proc2 kernel: drbd: SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
> Nov 25 11:07:02 Proc2 kernel: drbd: registered as block device major 147
> Nov 25 11:07:02 Proc2 kernel: drbd0: resync bitmap: bits=1251563 words=39112
> Nov 25 11:07:02 Proc2 kernel: drbd0: size = 4888 MB (5006250 KB)
> Nov 25 11:07:02 Proc2 kernel: drbd0: 80 KB marked out-of-sync by on disk bit-map.
> Nov 25 11:07:02 Proc2 kernel: drbd0: Found 4 transactions (52 active extents) in activity log.
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbdsetup [1105]: cstate Unconfigured --> StandAlone
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbdsetup [1107]: cstate StandAlone --> Unconnected
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate Unconnected --> WFConnection
> Nov 25 11:07:03 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFConnection --> WFReportParams
> Nov 25 11:07:03 Proc2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Nov 25 11:07:03 Proc2 kernel: drbd0: resync bitmap: bits=16391168 words=512224
> Nov 25 11:07:03 Proc2 kernel: drbd0: size = 62 GB (65564672 KB)
> Nov 25 11:07:03 Proc2 kernel: drbd0: Connection established.
> Nov 25 11:07:03 Proc2 kernel: drbd0: I am(S): 1:00000005:00000003:00000090:0000004e:00
> Nov 25 11:07:03 Proc2 kernel: drbd0: Peer(S): 1:00000005:00000003:00000091:0000004e:00
> Nov 25 11:07:03 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFReportParams --> WFBitMapT
> Nov 25 11:07:03 Proc2 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
> Nov 25 11:07:04 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFBitMapT --> SyncTarget
> Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need to sync 60558500 KB [15139625 bits set]).
> Nov 25 11:07:18 Proc2 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
>
>
> Interesting to note is that the nodes seem to have different ideas about
> how much data needs to be synchronized, i.e.:
> Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need to sync 60812372 KB [15203093 bits set]).
> vs.
> Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need to sync 60558500 KB [15139625 bits set]).
>
> The nodes are connected with a gigabit crossover. The network itself works
> fine even after the sync halts. Sync rate is set to 30M, but I've also got
> the same result using 10M. Also, in my configuration DRBD runs on top of a
> LVM device.
>
> Any ideas?
>
> /Per
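
To put a number on the mismatch Per points out, here is a minimal sketch that parses the two "Resync started" lines quoted above and compares them. The helper names (parse_resync, compare) are made up for illustration and are not part of DRBD or its tools; the 4 KB-per-bit figure is only what the logged numbers themselves imply.

```python
import re

# The two log lines, copied verbatim from the quoted message.
LOG_LINES = [
    "Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource "
    "(need to sync 60812372 KB [15203093 bits set]).",
    "Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget "
    "(need to sync 60558500 KB [15139625 bits set]).",
]

# Matches the role, the KB count and the bit count of a "Resync started" line.
PATTERN = re.compile(
    r"Resync started as (\w+) \(need to sync (\d+) KB \[(\d+) bits set\]\)"
)


def parse_resync(line):
    """Return (role, kb, bits) for a 'Resync started' line, else None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def compare(lines):
    """Compare what SyncSource and SyncTarget each think must be synced."""
    by_role = {}
    for line in lines:
        parsed = parse_resync(line)
        if parsed is not None:
            role, kb, bits = parsed
            by_role[role] = (kb, bits)
    src_kb, src_bits = by_role["SyncSource"]
    tgt_kb, tgt_bits = by_role["SyncTarget"]
    return {
        "kb_per_bit": src_kb / src_bits,      # bitmap granularity implied by the log
        "diff_kb": src_kb - tgt_kb,           # extra KB the source wants to send
        "diff_bits": src_bits - tgt_bits,     # same difference in bitmap bits
        "diff_mb": (src_kb - tgt_kb) / 1024,  # same difference in MB
    }


if __name__ == "__main__":
    for key, value in compare(LOG_LINES).items():
        print(f"{key}: {value}")
```

With the lines above, the source wants to send 253872 KB (63468 bits, roughly 248 MB) more than the target expects. That is about the same as the "248 MB marked out-of-sync by on disk bit-map" logged on Proc1 and the (248/59387)M that Proc1 still shows as outstanding. Whether that is the cause of the hang or a coincidence, the logs alone do not say.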