[DRBD-user] Initial sync fails with "rate 100M"

Cyril Bouthors cyril at bouthors.org
Wed Nov 24 13:38:41 CET 2004



I'm having problems with the initial synchronization. It starts
normally:

drbd0: Secondary/Unknown --> Secondary/Secondary
drbd0: drbd0_receiver [4437]: cstate WFBitMapS --> SyncSource
drbd0: Resync started as SyncSource (need to sync 442081152 KB [110520288 bits set]).
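
As a sanity check on the two figures in that last log line, here is a minimal Python sketch. It only divides the numbers printed in the log. Reading the result as "each set bitmap bit covers one 4 KB block" is an inference from the ratio, not something the log states, and the function name is made up for illustration.

```python
# Figures copied from the "Resync started" log line above.
KB_TO_SYNC = 442081152
BITS_SET = 110520288


def kb_per_bit(kb_to_sync: int, bits_set: int) -> float:
    """Return how many KB each set bitmap bit accounts for."""
    return kb_to_sync / bits_set


if __name__ == "__main__":
    # Ratio of the two logged numbers.
    print(kb_per_bit(KB_TO_SYNC, BITS_SET))  # -> 4.0
    # Same amount expressed in GiB, assuming the log's "KB" means 1024 bytes.
    print(KB_TO_SYNC / 1024 ** 2)
```

So the whole device, roughly 420 GiB if "KB" means 1024 bytes, is marked out of sync, which is what one would expect for an initial full synchronization.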

But after ~5-10 minutes, it breaks, logging the same message several
hundred times over ~15 minutes:

drbd0: [drbd0_worker/4435] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/4435] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/4435] sock_sendmsg time expired, ko = 4294967295

Then I get a disconnection:

drbd0: _drbd_send_page: size=4096 len=1936 sent=-110
drbd0: drbd_send_block() failed
drbd0: drbd0_worker [4435]: cstate SyncSource --> NetworkFailure
drbd0: drbd_get_ee interrupted!
drbd0: error receiving RSDataRequest, l: 24!
drbd0: asender terminated
drbd0: worker terminated
drbd0: drbd0_receiver [4437]: cstate NetworkFailure --> Unconnected
drbd0: Connection lost.
drbd0: drbd0_receiver [4437]: cstate Unconnected --> WFConnection
drbd0: drbd0_receiver [4437]: cstate WFConnection --> WFReportParams
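
Two numbers in these messages look odd at first: "ko = 4294967295" and "sent=-110". The sketch below decodes them under two assumptions that the log itself does not confirm: that ko is a 32-bit value printed as unsigned, and that sent holds a negated Linux errno. The helper names are hypothetical.

```python
import errno
import os


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def describe_errno(negated: int) -> str:
    """Describe a negated errno value using this platform's errno table."""
    code = -negated
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 4294967295 is 2**32 - 1, i.e. -1 when read as a signed 32-bit integer.
    print(as_signed32(4294967295))  # -> -1
    # errno numbering is platform specific; on Linux, 110 is ETIMEDOUT.
    print(describe_errno(-110))
```

If those assumptions hold, the send timed out (ETIMEDOUT) and the ko value is simply -1 printed as an unsigned number.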

And it automatically starts again:

drbd0: Handshake successful: DRBD Network Protocol version 74
drbd0: Connection established.
drbd0: I am(P): 1:00000003:00000009:00000027:00000002:10
drbd0: Peer(S): 0:00000003:00000009:00000026:00000002:01
drbd0: drbd0_receiver [4437]: cstate WFReportParams --> WFBitMapS
drbd0: Primary/Unknown --> Primary/Secondary
drbd0: drbd0_receiver [4437]: cstate WFBitMapS --> SyncSource
drbd0: Resync started as SyncSource (need to sync 419722172 KB [104930543 bits set]).
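
Comparing this "need to sync" figure with the one from the first attempt gives a rough idea of how far the first resync got. This is a small Python sketch, assuming the drop between the two figures is data that was actually transferred before the failure, and assuming a constant ~70M/s (read here as MiB/s), the speed observed when the sync runs. Both are assumptions, not facts taken from the log.

```python
# "need to sync" values from the first and second "Resync started" lines.
FIRST_KB = 442081152
SECOND_KB = 419722172
ASSUMED_RATE_MIB_S = 70  # assumption: the ~70M/s observed during sync


def resync_progress(first_kb: int, second_kb: int) -> tuple[int, float]:
    """Return (KB no longer marked out of sync, percent of the first total)."""
    done = first_kb - second_kb
    return done, 100.0 * done / first_kb


def seconds_at_rate(kb: int, rate_mib_s: float) -> float:
    """Time in seconds to move `kb` KiB at `rate_mib_s` MiB/s."""
    return kb / (rate_mib_s * 1024)


if __name__ == "__main__":
    done_kb, percent = resync_progress(FIRST_KB, SECOND_KB)
    print(done_kb, round(percent, 2))
    print(seconds_at_rate(done_kb, ASSUMED_RATE_MIB_S) / 60, "minutes")
```

Under those assumptions about 5% of the device was synced, which corresponds to roughly five minutes at that speed and is in line with the failure showing up after ~5-10 minutes.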

And again, after a while, the same message hundreds of times:

drbd0: [drbd0_worker/5853] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/5853] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/5853] sock_sendmsg time expired, ko = 4294967295
(...)

It has been restarting over and over at 0% for the last 5 hours
today. :(

I'm using "rate 100M" for this initial synchronization. It looks like
it doesn't break with "rate 10M", but I haven't waited for the full
sync to finish before sending this message. At least it's more robust:
it has been synchronizing for two and a half hours now.

When it works with "rate 100M", it runs at ~70M/s. Maybe that's the
problem... too fast? Has anyone already done a synchronization at that
speed?
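
To put those rates in perspective, here is a back-of-the-envelope Python sketch. It assumes the "M" in the rate means MiB per second and that the 442081152 KB reported by the resync is in KiB. Neither is confirmed in this message, and the function names are only illustrative.

```python
TOTAL_KIB = 442081152  # "need to sync" figure from the first resync


def sync_hours(total_kib: int, rate_mib_s: float) -> float:
    """Hours needed to transfer total_kib KiB at rate_mib_s MiB/s."""
    return total_kib / (rate_mib_s * 1024) / 3600


def mbit_per_s(rate_mib_s: float) -> float:
    """Convert MiB/s to decimal megabits per second (payload only)."""
    return rate_mib_s * 1024 * 1024 * 8 / 1e6


if __name__ == "__main__":
    for rate in (10, 70, 100):
        print(f"{rate:>3} MiB/s: {sync_hours(TOTAL_KIB, rate):5.2f} h, "
              f"{mbit_per_s(rate):6.1f} Mbit/s")
```

With these assumptions, a full sync takes about 12 hours at 10M and under 2 hours at 70M, while 100M would be about 839 Mbit/s of payload, close to what a gigabit link can carry.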

The full kernel log with times and everything is available at:

http://cyril.bouthors.org/tmp/kern.log

The configuration is attached here.

I'm using DRBD 0.7.5, Linux 2.4.27 and EXT3.  Both nodes are currently
idle.  The same thing happens even if the device is not mounted.

The two nodes are directly connected with a gigabit crossover cable
and Realtek NICs.
-- 
Cyril Bouthors
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: drbd.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20041124/8207e6fe/attachment.asc>