Dear All,

I am using kernels 2.6.8 and 2.6.9. Initially, I set up the primary machine and asked the secondary machine to perform a full sync. This worked on most of the /dev/drbdX disks, but the last disk has a problem.

When the drbd daemon has just started on the secondary machine, the sync speed is fine. I limited the sync rate to 10 MBytes/s, and initially it reaches 10,500 KB/s. As time goes by, it drops from 10,xxx to 6,xxx and then to 4,xxx. I checked the ns, nr, dw and dr readings; in fact, they do not increase when the speed drops.

I used smartctl (smartd tools) to check the health of the hard disks; all disks in both machines are OK. I tested the network link with ttcp; it does not reach 700 Mbit/s, but it still manages more than 100 Mbit/s. I checked /var/log/messages and found no error messages. I unmounted all the shared disks on the primary, and the umount actions succeeded, but that did not fix the problem.

When I leave the machines alone for a while (a tea break), I find the following messages in /var/log/messages on the secondary:

Nov 5 18:44:25 Slave kernel: drbd4: meta connection shut down by peer.
Nov 5 18:44:25 Slave kernel: drbd4: drbd4_asender [5107]: cstate SyncTarget --> NetworkFailure
Nov 5 18:44:25 Slave kernel: drbd4: asender terminated
Nov 5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate NetworkFailure --> BrokenPipe
Nov 5 18:44:25 Slave kernel: drbd4: short read receiving data block: read 2672 expected 4096
Nov 5 18:44:25 Slave kernel: drbd4: error receiving RSDataReply, l: 4112!
Nov 5 18:44:25 Slave kernel: drbd4: worker terminated
Nov 5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate BrokenPipe --> Unconnected
Nov 5 18:44:25 Slave kernel: drbd4: Connection lost.
Nov 5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate Unconnected --> WFConnection
Nov 5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate WFConnection --> WFReportParams
Nov 5 18:44:25 Slave kernel: drbd4: Handshake successful: DRBD Network Protocol version 74
Nov 5 18:44:25 Slave kernel: drbd4: Connection established.
Nov 5 18:44:25 Slave kernel: drbd4: I am(S): 0:00000017:00000015:00000062:0000001b:01
Nov 5 18:44:25 Slave kernel: drbd4: Peer(P): 1:00000017:00000015:00000063:0000001b:10
Nov 5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate WFReportParams --> WFBitMapT
Nov 5 18:44:25 Slave kernel: drbd4: Secondary/Unknown --> Secondary/Primary
Nov 5 18:44:26 Slave kernel: drbd4: drbd4_receiver [5106]: cstate WFBitMapT --> SyncTarget
Nov 5 18:44:26 Slave kernel: drbd4: Resync started as SyncTarget (need to sync 101777592 KB [25444398 bits set]).

Then I checked the primary machine and found the following:

Nov 5 18:44:25 Master kernel: drbd4: sock_recvmsg returned -110
Nov 5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate SyncSource --> BrokenPipe
Nov 5 18:44:25 Master kernel: drbd4: short read expecting header on sock: r=-110
Nov 5 18:44:25 Master kernel: drbd4: worker terminated
Nov 5 18:44:25 Master kernel: drbd4: asender terminated
Nov 5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate BrokenPipe --> Unconnected
Nov 5 18:44:25 Master kernel: drbd4: Connection lost.
Nov 5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate Unconnected --> WFConnection
Nov 5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate WFConnection --> WFReportParams
Nov 5 18:44:25 Master kernel: drbd4: Handshake successful: DRBD Network Protocol version 74
Nov 5 18:44:25 Master kernel: drbd4: Connection established.
Nov 5 18:44:25 Master kernel: drbd4: I am(P): 1:00000017:00000015:00000063:0000001b:10
Nov 5 18:44:25 Master kernel: drbd4: Peer(S): 0:00000017:00000015:00000062:0000001b:01
Nov 5 18:44:25 Master kernel: drbd4: drbd4_receiver [2451]: cstate WFReportParams --> WFBitMapS
Nov 5 18:44:25 Master kernel: drbd4: Primary/Unknown --> Primary/Secondary
Nov 5 18:44:26 Master kernel: drbd4: drbd4_receiver [2451]: cstate WFBitMapS --> SyncSource
Nov 5 18:44:26 Master kernel: drbd4: Resync started as SyncSource (need to sync 101777592 KB [25444398 bits set]).

Can anyone shed some light on how to fix this problem?

Thanks in advance,
Seki Lau
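
For anyone comparing similar logs, here is a minimal sketch of how the messages above could be summarised. It pulls out the cstate transitions and checks the resync figures. The function names are hypothetical, the regular expressions are written only against the line format quoted in this message, and the 10 MBytes/s limit is assumed to mean 10 * 1024 KB/s. The two figures in the "need to sync" line work out to 4 KB per set bit, and at the configured limit the remaining data would take roughly 2.8 hours if the rate held.

```python
import re

# Hypothetical helpers; the patterns are based only on the log lines quoted above.
CSTATE_RE = re.compile(r"(drbd\d+): \S+ \[\d+\]: cstate (\w+) --> (\w+)")
RESYNC_RE = re.compile(r"need to sync (\d+) KB \[(\d+) bits set\]")


def cstate_transitions(log_text):
    """Return (device, old_state, new_state) tuples in log order."""
    return [m.groups() for m in CSTATE_RE.finditer(log_text)]


def resync_estimate(log_text, rate_kb_per_s):
    """Return (kb_to_sync, kb_per_bit, seconds_at_rate), or None if no resync line."""
    m = RESYNC_RE.search(log_text)
    if m is None:
        return None
    kb, bits = int(m.group(1)), int(m.group(2))
    return kb, kb / bits, kb / rate_kb_per_s


sample = """\
Nov 5 18:44:25 Slave kernel: drbd4: drbd4_asender [5107]: cstate SyncTarget --> NetworkFailure
Nov 5 18:44:25 Slave kernel: drbd4: drbd4_receiver [5106]: cstate NetworkFailure --> BrokenPipe
Nov 5 18:44:26 Slave kernel: drbd4: Resync started as SyncTarget (need to sync 101777592 KB [25444398 bits set]).
"""

if __name__ == "__main__":
    for dev, old, new in cstate_transitions(sample):
        print(f"{dev}: {old} -> {new}")
    # Assumption: the 10 MBytes/s sync limit equals 10 * 1024 KB/s.
    print(resync_estimate(sample, rate_kb_per_s=10 * 1024))
```

Running the same functions over the full /var/log/messages from both machines would show whether the SyncTarget/SyncSource to BrokenPipe sequence repeats at regular intervals.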