[DRBD-user] Sync speed ok, then drop then stop

Seki Lau sekilau at gmail.com
Fri Nov 5 11:54:22 CET 2004


Dear All,
I am using kernels 2.6.8 and 2.6.9. Initially, I set up the primary
machine and asked the secondary machine to perform a full sync.
That worked on most of the /dev/drbdX devices.
But the last device has a problem. When the drbd daemon has just
started on the secondary machine, the sync speed is OK. I limited
the sync rate to 10 MBytes/s, and initially it reaches 10,500 KB/s. As
time goes by, it drops from 10,xxx to 6,xxx and then 4,xxx. I checked
the ns, nr, dw and dr counters; in fact, they do not increase when the
speed drops. I used smartctl (smartd tools) to check the health of the
hard disks; all disks in both machines are OK. I checked the network
link with ttcp; although the speed does not reach 700 Mbit/s, it is
still more than 100 Mbit/s.
I checked /var/log/messages and found no error messages. I unmounted
all the shared disks on the primary, and the umount commands succeeded,
but that did not fix the problem.
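
The counter check described above can be sketched as a small script
that compares two snapshots of the ns, nr, dw and dr values. This is
only an illustration: the snapshot strings are hypothetical (they are
not taken from the machines in this post), and the "name:value" layout
is assumed to match what /proc/drbd prints.

```python
import re

# Matches tokens such as "nr:52340"; only the four counters named in
# the post (ns, nr, dw, dr) are extracted.
COUNTER_RE = re.compile(r"\b(ns|nr|dw|dr):(\d+)")


def parse_counters(text):
    """Return a dict like {'ns': 0, 'nr': 52340, ...} from status text."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def stalled(before, after, keys=("ns", "nr", "dw", "dr")):
    """True if none of the counters advanced between two snapshots."""
    return all(after.get(k, 0) == before.get(k, 0) for k in keys)


# Hypothetical snapshots taken a few seconds apart (illustrative only).
snap1 = "ns:0 nr:52340 dw:52340 dr:0 al:0 bm:12"
snap2 = "ns:0 nr:52340 dw:52340 dr:0 al:0 bm:12"

if __name__ == "__main__":
    print(stalled(parse_counters(snap1), parse_counters(snap2)))
```

If the counters stay flat while the connection state still shows a
running resync, the transfer itself has stalled rather than merely
slowed down.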

When I left the machines alone for a while (tea break), I found the
following messages in /var/log/messages on the secondary:

Nov  5 18:44:25 Slave kernel: drbd4: meta connection shut down by peer.
Nov  5 18:44:25 Slave kernel: drbd4: drbd4_asender [5107]: cstate
SyncTarget --> NetworkFailure
Nov  5 18:44:25 Slave kernel: drbd4: asender terminated
Nov  5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate
NetworkFailure --> BrokenPipe
Nov  5 18:44:25 Slave kernel: drbd4: short read receiving data block:
read 2672 expected 4096
Nov  5 18:44:25 Slave kernel: drbd4: error receiving RSDataReply, l: 4112!
Nov  5 18:44:25 Slave kernel: drbd4: worker terminated
Nov  5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate
BrokenPipe --> Unconnected
Nov  5 18:44:25 Slave kernel: drbd4: Connection lost.
Nov  5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate
Unconnected --> WFConnection
Nov  5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate
WFConnection --> WFReportParams
Nov  5 18:44:25 Slave kernel: drbd4: Handshake successful: DRBD
Network Protocol version 74
Nov  5 18:44:25 Slave kernel: drbd4: Connection established.
Nov  5 18:44:25 Slave kernel: drbd4: I am(S):
0:00000017:00000015:00000062:0000001b:01
Nov  5 18:44:25 Slave kernel: drbd4: Peer(P):
1:00000017:00000015:00000063:0000001b:10
Nov  5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate
WFReportParams --> WFBitMapT
Nov  5 18:44:25 Slave kernel: drbd4: Secondary/Unknown --> Secondary/Primary
Nov  5 18:44:26 Slave kernel: drbd4: drbd4_receiver [5106]: cstate
WFBitMapT --> SyncTarget
Nov  5 18:44:26 Slave kernel: drbd4: Resync started as SyncTarget
(need to sync 101777592 KB [25444398 bits set]).
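
The two figures in the "Resync started" line are consistent with each
bitmap bit covering 4 KB (101777592 / 25444398 = 4). Under that
assumption, a rough estimate of how long the full resync would take at
the rates mentioned above can be sketched as follows. The 6000 and
4000 KB/s values are assumed round stand-ins for the "6,xxx" and
"4,xxx" readings, not measured numbers.

```python
# Assumption inferred from the log line: one set bit = 4 KB to resync.
KB_PER_BIT = 4


def resync_kb(bits_set):
    """Amount of data to resync, in KB, for a given number of set bits."""
    return bits_set * KB_PER_BIT


def eta_hours(total_kb, rate_kb_per_s):
    """Estimated resync duration in hours at a constant rate."""
    return total_kb / rate_kb_per_s / 3600.0


if __name__ == "__main__":
    total = resync_kb(25444398)  # bit count taken from the log above
    for rate in (10500, 6000, 4000):  # KB/s; the last two are assumed
        print(f"{rate:>6} KB/s -> {eta_hours(total, rate):.1f} h")
```

Because the resync restarts with the same bit count after the
reconnect, the estimate applies again from the beginning each time the
connection drops.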

Then I checked the primary machine and found the following:

Nov  5 18:44:25 Master kernel: drbd4: sock_recvmsg returned -110
Nov  5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate
SyncSource --> BrokenPipe
Nov  5 18:44:25 Master kernel: drbd4: short read expecting header on
sock: r=-110
Nov  5 18:44:25 Master kernel: drbd4: worker terminated
Nov  5 18:44:25 Master kernel: drbd4: asender terminated
Nov  5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate
BrokenPipe --> Unconnected
Nov  5 18:44:25 Master kernel: drbd4: Connection lost.
Nov  5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate
Unconnected --> WFConnection
Nov  5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate
WFConnection --> WFReportParams
Nov  5 18:44:25 Master kernel: drbd4: Handshake successful: DRBD
Network Protocol version 74
Nov  5 18:44:25 Master kernel: drbd4: Connection established.
Nov  5 18:44:25 Master kernel: drbd4: I am(P):
1:00000017:00000015:00000063:0000001b:10
Nov  5 18:44:25 Master kernel: drbd4: Peer(S):
0:00000017:00000015:00000062:0000001b:01
Nov  5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate
WFReportParams --> WFBitMapS
Nov  5 18:44:25 Master kernel: drbd4: Primary/Unknown --> Primary/Secondary
Nov  5 18:44:26 Master kernel: drbd4: drbd4_receiver [2451]: cstate
WFBitMapS --> SyncSource
Nov  5 18:44:26 Master kernel: drbd4: Resync started as SyncSource
(need to sync 101777592 KB [25444398 bits set]).
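
The "-110" in "sock_recvmsg returned -110" follows the Linux kernel
convention of returning a negated errno value. On common Linux
architectures such as x86, errno 110 is ETIMEDOUT, which would suggest
that the primary timed out waiting for data from the peer. The lookup
below is a minimal sketch: the table is a hand-written subset, and a
few architectures number these errors differently.

```python
# Hand-written subset of Linux errno values (x86 numbering assumed).
LINUX_ERRNO = {
    32: "EPIPE",
    104: "ECONNRESET",
    110: "ETIMEDOUT",
}


def describe_kernel_return(value):
    """Translate a kernel-style return value into a readable label."""
    if value >= 0:
        # Non-negative results from a receive call are byte counts.
        return f"{value} bytes"
    return LINUX_ERRNO.get(-value, f"errno {-value}")


if __name__ == "__main__":
    print(describe_kernel_return(-110))
```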

Can anyone shed some light on how to fix this problem?
Thanks in advance,
Seki Lau


