[DRBD-user] Sync stuck at 100%

Eugene Crosser crosser at rol.ru
Thu Nov 25 14:17:01 CET 2004


I have a very similar problem here (SyncSource gets stuck at 100% after 
"invalidate all" on the other node).  Could you tell us your exact 
hardware configuration and kernel version?

Eugene

Per Liden wrote:
> Hi,
> 
> I'm having problems with DRBD getting stuck at around 99-100% during an 
> initial/full sync. This seems to be happening about 8 out of 10 times. If 
> I do "drbdadm down all" on both sides and then "drbdadm up all", both 
> nodes connect just fine and both end up in a consistent state. But for 
> some reason drbd will not by itself detect that the sync has actually 
> completed. This is what it looks like when they get stuck:
> 
> Proc1:~ # cat /proc/drbd 
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
>  0: cs:SyncSource st:Primary/Secondary ld:Consistent
>     ns:60558616 nr:0 dw:360 dr:60558461 al:0 bm:3697 lo:0 pe:0 ua:0 ap:0
>         [===================>] sync'ed: 99.6% (248/59387)M
>         finish: 4:45:21 speed: 12 (10,488) K/sec
> 
> Proc2:~ # cat /proc/drbd 
> version: 0.7.4 (api:76/proto:74)
> SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
>  0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
>     ns:0 nr:60558616 dw:60558616 dr:0 al:0 bm:3697 lo:0 pe:0 ua:0 ap:0
>         [===================>] sync'ed:100.0% (0/59139)M
>         finish: 0:00:00 speed: 16 (10,480) K/sec
> 
> 
> /var/log/messages on Proc1:
> ...
> Nov 25 11:07:02 Proc1 kernel: drbd: initialised. Version: 0.7.4 (api:76/proto:74)
> Nov 25 11:07:02 Proc1 kernel: drbd: SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
> Nov 25 11:07:02 Proc1 kernel: drbd: registered as block device major 147
> Nov 25 11:07:02 Proc1 kernel: drbd0: resync bitmap: bits=1251563 words=39112
> Nov 25 11:07:02 Proc1 kernel: drbd0: size = 4888 MB (5006250 KB)
> Nov 25 11:07:02 Proc1 kernel: drbd0: 248 MB marked out-of-sync by on disk bit-map.
> Nov 25 11:07:02 Proc1 kernel: drbd0: Found 4 transactions (64 active extents) in activity log.
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbdsetup [1094]: cstate Unconfigured --> StandAlone
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbdsetup [1096]: cstate StandAlone --> Unconnected
> Nov 25 11:07:02 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate Unconnected --> WFConnection
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFConnection --> WFReportParams
> Nov 25 11:07:03 Proc1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Nov 25 11:07:03 Proc1 kernel: drbd0: resync bitmap: bits=16391168 words=512224
> Nov 25 11:07:03 Proc1 kernel: drbd0: size = 62 GB (65564672 KB)
> Nov 25 11:07:03 Proc1 kernel: drbd0: Connection established.
> Nov 25 11:07:03 Proc1 kernel: drbd0: I am(S): 1:00000005:00000003:00000091:0000004e:00
> Nov 25 11:07:03 Proc1 kernel: drbd0: Peer(S): 1:00000005:00000003:00000090:0000004e:00
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFReportParams --> WFBitMapS
> Nov 25 11:07:03 Proc1 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
> Nov 25 11:07:03 Proc1 kernel: drbd0: drbd0_receiver [1097]: cstate WFBitMapS --> SyncSource
> Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need to sync 60812372 KB [15203093 bits set]).
> Nov 25 11:07:18 Proc1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
> Nov 25 11:15:38 Proc1 kernel: drbd0: [drbd0_worker/1095] sock_sendmsg time expired, ko = 4294967295
> 
> /var/log/messages on Proc2:
> ...
> Nov 25 11:07:02 Proc2 kernel: drbd: initialised. Version: 0.7.4 (api:76/proto:74)
> Nov 25 11:07:02 Proc2 kernel: drbd: SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
> Nov 25 11:07:02 Proc2 kernel: drbd: registered as block device major 147
> Nov 25 11:07:02 Proc2 kernel: drbd0: resync bitmap: bits=1251563 words=39112
> Nov 25 11:07:02 Proc2 kernel: drbd0: size = 4888 MB (5006250 KB)
> Nov 25 11:07:02 Proc2 kernel: drbd0: 80 KB marked out-of-sync by on disk bit-map.
> Nov 25 11:07:02 Proc2 kernel: drbd0: Found 4 transactions (52 active extents) in activity log.
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbdsetup [1105]: cstate Unconfigured --> StandAlone
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbdsetup [1107]: cstate StandAlone --> Unconnected
> Nov 25 11:07:02 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate Unconnected --> WFConnection
> Nov 25 11:07:03 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFConnection --> WFReportParams
> Nov 25 11:07:03 Proc2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> Nov 25 11:07:03 Proc2 kernel: drbd0: resync bitmap: bits=16391168 words=512224
> Nov 25 11:07:03 Proc2 kernel: drbd0: size = 62 GB (65564672 KB)
> Nov 25 11:07:03 Proc2 kernel: drbd0: Connection established.
> Nov 25 11:07:03 Proc2 kernel: drbd0: I am(S): 1:00000005:00000003:00000090:0000004e:00
> Nov 25 11:07:03 Proc2 kernel: drbd0: Peer(S): 1:00000005:00000003:00000091:0000004e:00
> Nov 25 11:07:03 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFReportParams --> WFBitMapT
> Nov 25 11:07:03 Proc2 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
> Nov 25 11:07:04 Proc2 kernel: drbd0: drbd0_receiver [1108]: cstate WFBitMapT --> SyncTarget
> Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need to sync 60558500 KB [15139625 bits set]).
> Nov 25 11:07:18 Proc2 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
> 
> 
> It is interesting to note that the nodes seem to disagree about 
> how much data needs to be synchronized, i.e.:
>   Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource (need to sync 60812372 KB [15203093 bits set]).
> vs.
>   Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget (need to sync 60558500 KB [15139625 bits set]).
> 
> The nodes are connected with a gigabit crossover. The network itself works 
> fine even after the sync halts. The sync rate is set to 30M, but I get 
> the same result with 10M. Also, in my configuration DRBD runs on top of an 
> LVM device.
> 
> Any ideas?
> 
> /Per
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
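
The two "need to sync" figures quoted above can be compared directly. The
sketch below parses both log lines and computes the gap; the function names
are made up for illustration and nothing here is part of DRBD itself. The
4 KB-per-bit granularity is not stated anywhere in the thread, it is only
what the logged numbers imply (KB divided by bits is exactly 4 on both
nodes). The gap comes out at roughly 248 MB, the same figure Proc1 reports
as "marked out-of-sync by on disk bit-map" and as still outstanding in
/proc/drbd. Whether that is the cause of the hang or just a coincidence is
an open question here, not a diagnosis.

```python
import re

# The two "Resync started" lines, copied verbatim from the quoted logs.
LOG_LINES = [
    "Nov 25 11:07:03 Proc1 kernel: drbd0: Resync started as SyncSource "
    "(need to sync 60812372 KB [15203093 bits set]).",
    "Nov 25 11:07:04 Proc2 kernel: drbd0: Resync started as SyncTarget "
    "(need to sync 60558500 KB [15139625 bits set]).",
]

# Hypothetical pattern written for these two lines only; other DRBD
# versions may word the message differently.
PATTERN = re.compile(
    r"(?P<host>\S+) kernel: drbd\d+: Resync started as (?P<role>\w+) "
    r"\(need to sync (?P<kb>\d+) KB \[(?P<bits>\d+) bits set\]\)"
)


def parse_resync(line):
    """Extract host, role, KB to sync and bit count from one log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'Resync started' line: %r" % line)
    return {
        "host": match.group("host"),
        "role": match.group("role"),
        "kb": int(match.group("kb")),
        "bits": int(match.group("bits")),
    }


def resync_gap(source, target):
    """Difference between what the source and the target expect to sync."""
    kb = source["kb"] - target["kb"]
    return {
        "kb": kb,
        "bits": source["bits"] - target["bits"],
        "mb": kb / 1024.0,
    }


if __name__ == "__main__":
    source, target = (parse_resync(line) for line in LOG_LINES)
    for node in (source, target):
        # KB per bitmap bit, as implied by the logged numbers.
        print(node["host"], node["role"], node["kb"] // node["bits"], "KB per bit")
    gap = resync_gap(source, target)
    print("gap: %d KB, %d bits, %.1f MB" % (gap["kb"], gap["bits"], gap["mb"]))
```

The same subtraction works on the /proc/drbd totals: 59387 M on Proc1
minus 59139 M on Proc2 is also 248 M.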
