I'm getting problems with the initial synchronization. It starts normally:

drbd0: Secondary/Unknown --> Secondary/Secondary
drbd0: drbd0_receiver [4437]: cstate WFBitMapS --> SyncSource
drbd0: Resync started as SyncSource (need to sync 442081152 KB [110520288 bits set]).

But after ~5-10 minutes it breaks, logging the same message several hundred times over ~15 minutes:

drbd0: [drbd0_worker/4435] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/4435] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/4435] sock_sendmsg time expired, ko = 4294967295

Then I get a disconnection:

drbd0: _drbd_send_page: size=4096 len=1936 sent=-110
drbd0: drbd_send_block() failed
drbd0: drbd0_worker [4435]: cstate SyncSource --> NetworkFailure
drbd0: drbd_get_ee interrupted!
drbd0: error receiving RSDataRequest, l: 24!
drbd0: asender terminated
drbd0: worker terminated
drbd0: drbd0_receiver [4437]: cstate NetworkFailure --> Unconnected
drbd0: Connection lost.
drbd0: drbd0_receiver [4437]: cstate Unconnected --> WFConnection
drbd0: drbd0_receiver [4437]: cstate WFConnection --> WFReportParams

And it automatically starts again:

drbd0: Handshake successful: DRBD Network Protocol version 74
drbd0: Connection established.
drbd0: I am(P): 1:00000003:00000009:00000027:00000002:10
drbd0: Peer(S): 0:00000003:00000009:00000026:00000002:01
drbd0: drbd0_receiver [4437]: cstate WFReportParams --> WFBitMapS
drbd0: Primary/Unknown --> Primary/Secondary
drbd0: drbd0_receiver [4437]: cstate WFBitMapS --> SyncSource
drbd0: Resync started as SyncSource (need to sync 419722172 KB [104930543 bits set]).

And again the same hundreds of messages after a while:

drbd0: [drbd0_worker/5853] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/5853] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/5853] sock_sendmsg time expired, ko = 4294967295
(...)

It has been restarting over and over at 0% for the last 5 hours today.
:(

I'm using "rate 100M" for this initial synchronization. It looks like it does not break with "rate 10M", but I haven't waited for the full sync to finish before sending this message. At least it's more robust: it has been synchronizing for two and a half hours now.

When it works with "rate 100M", it runs at ~70M/s. Maybe that's the problem... too fast? Has anyone already done a synchronization at that speed?

The full kernel log, with timestamps and everything, is available at: http://cyril.bouthors.org/tmp/kern.log

The configuration is attached. I'm using DRBD 0.7.5, Linux 2.4.27 and EXT3. Both nodes are currently idle. The same thing happens even if the device is not mounted. The two nodes are directly connected with a gigabit crossover cable and Realtek NICs.

-- 
Cyril Bouthors

-------------- next part --------------
Name: drbd.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20041124/8207e6fe/attachment.asc>
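For a sense of scale, here is a rough back-of-the-envelope sketch of how long the full resync would take at the speeds mentioned above. The sizes are copied from the "Resync started" log line; the function names are made up for illustration, and the sketch assumes a constant transfer rate and that 1 MB = 1024 KB, neither of which is confirmed by the logs.

```python
# Rough estimate only; not part of DRBD. Numbers come from the first
# "Resync started as SyncSource" log line in this message.
TOTAL_KB = 442081152      # "need to sync 442081152 KB"
BITS_SET = 110520288      # "[110520288 bits set]"


def kb_per_bit(total_kb, bits_set):
    """Amount of data represented by one bit of the sync bitmap, in KB."""
    return total_kb / bits_set


def sync_hours(total_kb, rate_mb_per_s):
    """Hours needed to transfer total_kb at a constant rate.

    Assumption: 1 MB = 1024 KB and the rate never drops.
    """
    seconds = total_kb / (rate_mb_per_s * 1024)
    return seconds / 3600


if __name__ == "__main__":
    print("KB per bitmap bit: %.1f" % kb_per_bit(TOTAL_KB, BITS_SET))
    # ~70M/s is the speed observed with "rate 100M"; 10 is "rate 10M".
    for rate in (70, 10):
        print("at %d MB/s: about %.1f hours" % (rate, sync_hours(TOTAL_KB, rate)))
```

Under these assumptions the full sync takes a bit under two hours at ~70M/s and about twelve hours at 10M/s, so the "rate 10M" run would need to survive much longer than two and a half hours before it counts as a complete sync.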