[DRBD-user] Sync stuck at 100%
Eugene Crosser
crosser at rol.ru
Thu Nov 25 14:17:01 CET 2004
I have a very similar problem here (SyncSource gets stuck at 100% after
"invalidate all" on the other node). Would you tell us your exact
hardware configuration and kernel version?
Eugene
Per Liden wrote:
> Hi,
>
> I'm having problems with DRBD getting stuck at around 99-100% during an
> initial/full sync. This seems to be happening about 8 out of 10 times. If
> I do "drbdadm down all" on both sides and then "drbdadm up all", both
> nodes connect just fine and both end up in a consistent state. But for
> some reason drbd will not by itself detect that the sync has actually
> completed. This is what it looks like when they get stuck:
>
> Proc1:~ # cat /proc/drbd
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
> 0: cs:SyncSource st:Primary/Secondary ld:Consistent
> ns:60558616 nr:0 dw:360 dr:60558461 al:0 bm:3697 lo:0 pe:0 ua:0 ap:0
> [===================>] sync'ed: 99.6% (248/59387)M
> finish: 4:45:21 speed: 12 (10,488) K/sec
>
> Proc2:~ # cat /proc/drbd
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
> 0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
> ns:0 nr:60558616 dw:60558616 dr:0 al:0 bm:3697 lo:0 pe:0 ua:0 ap:0
> [===================>] sync'ed:100.0% (0/59139)M
> finish: 0:00:00 speed: 16 (10,480) K/sec
>
>
> /var/log/messages on Proc1:
> ...
> Nov 25 11:07:02 Proc1 kernel: drbd: initialised. Version: 0.7.4 (api:76/proto:74)
> Nov 25 11:07:02 Proc1 kernel: drbd: SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
> Nov 25 11:07:02 Proc1 kernel: drbd: registered as block device major 147
> Nov 25 11:07:02 Proc1 kernel: drbd0: resync bitmap: bits=1251563 words=39112
> Nov 25 11:07:02 Proc1 kernel: drbd0: size = 4888 MB (5006250 KB)
> Nov 25 11:07:02 Proc1 kernel: drbd0: 248 MB marked out-of-sync by on disk bit-map.
> Nov 25 11:07:02 Proc1 kernel: drbd0: Found 4 transactions (64 active extents) in activity log.
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbdsetup [1094]: cstate Unconfigured --> StandAlone
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbdsetup [1096]: cstate StandAlone --> Unconnected
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate Unconnected --> WFConnection
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFConnection --> WFReportParams
> Nov 25 11:07:03 Proc1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Nov 25 11:07:03 Proc1 kernel: drbd0: resync bitmap: bits=16391168 words=512224
> Nov 25 11:07:03 Proc1 kernel: drbd0: size = 62 GB (65564672 KB)
> Nov 25 11:07:03 Proc1 kernel: drbd0: Connection established.
> Nov 25 11:07:03 Proc1 kernel: drbd0: I am(S): 1:00000005:00000003:00000091:0000004e:00
> Nov 25 11:07:03 Proc1 kernel: drbd0: Peer(S): 1:00000005:00000003:00000090:0000004e:00
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFReportParams --> WFBitMapS
> Nov 25 11:07:03 Proc1 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFBitMapS --> SyncSource
> Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need to sync 60812372 KB [15203093 bits set]).
> Nov 25 11:07:18 Proc1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
> Nov 25 11:15:38 Proc1 kernel: drbd0: [drbd0_worker/1095] sock_sendmsg time expired, ko = 4294967295
>
> /var/log/messages on Proc2:
> ...
> Nov 25 11:07:02 Proc2 kernel: drbd: initialised. Version: 0.7.4 (api:76/proto:74)
> Nov 25 11:07:02 Proc2 kernel: drbd: SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
> Nov 25 11:07:02 Proc2 kernel: drbd: registered as block device major 147
> Nov 25 11:07:02 Proc2 kernel: drbd0: resync bitmap: bits=1251563 words=39112
> Nov 25 11:07:02 Proc2 kernel: drbd0: size = 4888 MB (5006250 KB)
> Nov 25 11:07:02 Proc2 kernel: drbd0: 80 KB marked out-of-sync by on disk bit-map.
> Nov 25 11:07:02 Proc2 kernel: drbd0: Found 4 transactions (52 active extents) in activity log.
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbdsetup [1105]: cstate Unconfigured --> StandAlone
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbdsetup [1107]: cstate StandAlone --> Unconnected
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate Unconnected --> WFConnection
> Nov 25 11:07:03 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFConnection --> WFReportParams
> Nov 25 11:07:03 Proc2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Nov 25 11:07:03 Proc2 kernel: drbd0: resync bitmap: bits=16391168 words=512224
> Nov 25 11:07:03 Proc2 kernel: drbd0: size = 62 GB (65564672 KB)
> Nov 25 11:07:03 Proc2 kernel: drbd0: Connection established.
> Nov 25 11:07:03 Proc2 kernel: drbd0: I am(S): 1:00000005:00000003:00000090:0000004e:00
> Nov 25 11:07:03 Proc2 kernel: drbd0: Peer(S): 1:00000005:00000003:00000091:0000004e:00
> Nov 25 11:07:03 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFReportParams --> WFBitMapT
> Nov 25 11:07:03 Proc2 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
> Nov 25 11:07:04 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFBitMapT --> SyncTarget
> Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need to sync 60558500 KB [15139625 bits set]).
> Nov 25 11:07:18 Proc2 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
>
>
> Interesting to note is that the nodes seem to have different ideas about
> how much data needs to be synchronized, i.e.:
> Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need to sync 60812372 KB [15203093 bits set]).
> vs.
> Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need to sync 60558500 KB [15139625 bits set]).
>
> The nodes are connected with a gigabit crossover. The network itself works
> fine even after the sync halts. Sync rate is set to 30M, but I've also got
> the same result using 10M. Also, in my configuration DRBD runs on top of an
> LVM device.
>
> Any ideas?
>
> /Per
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
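
A note on the numbers in the quoted logs: the gap between the two "need to sync" figures can be worked out directly. The sketch below only does arithmetic on the two log lines copied from above. The names LOG_LINES, PATTERN and parse_resync are made up for illustration and are not part of DRBD. The result comes out at roughly the same 248 MB that Proc1 reports as still outstanding in /proc/drbd. That may well be a coincidence, so treat it as an observation, not a diagnosis.

```python
import re

# The two "Resync started" lines, copied from the quoted logs above.
LOG_LINES = {
    "Proc1": "drbd0: Resync started as SyncSource "
             "(need to sync 60812372 KB [15203093 bits set]).",
    "Proc2": "drbd0: Resync started as SyncTarget "
             "(need to sync 60558500 KB [15139625 bits set]).",
}

# Hypothetical helper pattern: pulls out the KB count and the bit count.
PATTERN = re.compile(r"need to sync (\d+) KB \[(\d+) bits set\]")


def parse_resync(line):
    """Return (kilobytes, bits) from a 'Resync started' log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'Resync started' line: %r" % line)
    return int(match.group(1)), int(match.group(2))


src_kb, src_bits = parse_resync(LOG_LINES["Proc1"])
tgt_kb, tgt_bits = parse_resync(LOG_LINES["Proc2"])

# In these logs each bitmap bit accounts for the same number of KB on both nodes.
kb_per_bit = src_kb // src_bits

# How much more the SyncSource thinks it has to send than the SyncTarget expects.
diff_kb = src_kb - tgt_kb
diff_bits = src_bits - tgt_bits
diff_mb = diff_kb / 1024

print("KB per bitmap bit:", kb_per_bit)
print("difference: %d KB (%d bits), about %.1f MB" % (diff_kb, diff_bits, diff_mb))
```

The same 248 also shows up as the difference between the totals in the two /proc/drbd progress lines (59387 M on Proc1 versus 59139 M on Proc2).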