Dear all,

I have a DRBD device (I named it /dev/drbd3). Small file copies and deletes on it work fine. But when we try to tar/cp a large volume of data (a few GB) onto /dev/drbd3, the operation stops in the middle. I checked the network connection with iptraf and saw that the traffic on the dedicated NIC is less than 3 kbps during the stalls.

I ran 'cat /proc/drbd' and nothing unusual appears:

    version: 0.7.5 (api:76/proto:74)
    SVN Revision: 1578 build by root at Master, 2004-10-20 18:44:52
     0: cs:Connected st:Primary/Secondary ld:Consistent
        ns:12235516 nr:0 dw:12235516 dr:6333445 al:2929 bm:1 lo:0 pe:0 ua:0 ap:0
     1: cs:Connected st:Primary/Secondary ld:Consistent
        ns:168 nr:0 dw:168 dr:225 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
     2: cs:Connected st:Primary/Secondary ld:Consistent
        ns:192 nr:0 dw:272 dr:537 al:0 bm:5 lo:0 pe:0 ua:0 ap:0
     3: cs:Connected st:Primary/Secondary ld:Consistent
        ns:58122621 nr:0 dw:58106173 dr:25682705 al:18126 bm:79 lo:0 pe:185 ua:0 ap:184
     4: cs:Connected st:Primary/Secondary ld:Consistent
        ns:364 nr:0 dw:364 dr:29877389 al:11 bm:1 lo:0 pe:0 ua:0 ap:0
    [root at Master share4]#

Then I ran 'drbdadm down r3' and 'drbdadm up r3' on the secondary computer. The tar/cp went back to normal for a few files (each file is more than 600 MB). After a couple of minutes, the tar/cp paused again. I then left the machines alone to see what would happen if I took no action at all.
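In case it helps anyone compare against their own output, here is a minimal Python sketch that pulls the counters out of /proc/drbd text and lists the devices whose pe or ap counters are non-zero. The function names and the SAMPLE string are made up for illustration, and the reading of pe as "pending" and ap as "application pending" is my understanding of the 0.7 counters, not something I have confirmed.

```python
import re

# Two devices copied from the output above (hypothetical SAMPLE constant).
SAMPLE = """\
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:12235516 nr:0 dw:12235516 dr:6333445 al:2929 bm:1 lo:0 pe:0 ua:0 ap:0
 3: cs:Connected st:Primary/Secondary ld:Consistent
    ns:58122621 nr:0 dw:58106173 dr:25682705 al:18126 bm:79 lo:0 pe:185 ua:0 ap:184
"""


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for each device block in the text."""
    devices = {}
    current = None
    for line in text.splitlines():
        start = re.match(r"\s*(\d+):\s+(.*)", line)
        if start:
            # A line such as " 3: cs:Connected ..." opens a new device block.
            current = int(start.group(1))
            devices[current] = {}
            rest = start.group(2)
        else:
            rest = line
        if current is None:
            continue  # header lines before the first device
        for key, value in re.findall(r"(\w+):(\S+)", rest):
            devices[current][key] = int(value) if value.isdigit() else value
    return devices


def devices_with_pending(devices):
    """Minors whose pe or ap counter is above zero (assumed to mean outstanding requests)."""
    return [minor for minor, fields in sorted(devices.items())
            if fields.get("pe", 0) > 0 or fields.get("ap", 0) > 0]


if __name__ == "__main__":
    print(devices_with_pending(parse_proc_drbd(SAMPLE)))
```

On the output above, only device 3 has non-zero pe and ap, which is the device that stalls.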
After a few hours, I got the following from the primary machine:

    Feb 3 21:12:59 Master kernel: drbd3: sock_recvmsg returned -110
    Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Connected --> BrokenPipe
    Feb 3 21:12:59 Master kernel: drbd3: short read expecting header on sock: r=-110
    Feb 3 21:12:59 Master kernel: drbd3: worker terminated
    Feb 3 21:12:59 Master kernel: drbd3: asender terminated
    Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate BrokenPipe --> Unconnected
    Feb 3 21:12:59 Master kernel: drbd3: Connection lost.
    Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Unconnected --> WFConnection
    Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFConnection --> WFReportParams
    Feb 3 21:12:59 Master kernel: drbd3: Handshake successful: DRBD Network Protocol version 74
    Feb 3 21:12:59 Master kernel: drbd3: Connection established.
    Feb 3 21:12:59 Master kernel: drbd3: I am(P): 1:0000002a:00000016:000000b1:00000019:10
    Feb 3 21:12:59 Master kernel: drbd3: Peer(S): 1:0000002a:00000016:000000b0:00000019:01
    Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFReportParams --> WFBitMapS
    Feb 3 21:12:59 Master kernel: drbd3: Primary/Unknown --> Primary/Secondary
    Feb 3 21:13:00 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFBitMapS --> SyncSource
    Feb 3 21:13:00 Master kernel: drbd3: Resync started as SyncSource (need to sync 4136 KB [1034 bits set]).
    Feb 3 21:13:00 Master kernel: drbd3: Resync done (total 1 sec; paused 0 sec; 4136 K/sec)
    Feb 3 21:13:00 Master kernel: drbd3: drbd3_worker [20765]: cstate SyncSource --> Connected

I guess it takes a long time for the primary machine to give up the TCP connection and reconnect to the secondary (or the secondary takes a long time to give up the existing connection; I am not sure which side is the initiator). I am looking for help...
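To make the sequence in the log easier to follow, here is a small Python sketch that extracts only the cstate transitions from kernel log text. The regular expression is written against the log format shown above and is an assumption about that format, not something taken from DRBD itself; the names LOG, TRANSITION and cstate_transitions are hypothetical.

```python
import re

# A few lines copied from the log above (hypothetical LOG constant).
LOG = """\
Feb 3 21:12:59 Master kernel: drbd3: sock_recvmsg returned -110
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Connected --> BrokenPipe
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate BrokenPipe --> Unconnected
Feb 3 21:12:59 Master kernel: drbd3: Connection lost.
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Unconnected --> WFConnection
"""

# Captures: time of day, device name, old state, new state.
TRANSITION = re.compile(
    r"(\d\d:\d\d:\d\d) \S+ kernel: (drbd\d+): .*cstate (\w+) --> (\w+)"
)


def cstate_transitions(log_text):
    """Return (time, device, old_state, new_state) tuples in log order."""
    found = []
    for line in log_text.splitlines():
        match = TRANSITION.search(line)
        if match:
            found.append(match.groups())
    return found


if __name__ == "__main__":
    for time_of_day, device, old, new in cstate_transitions(LOG):
        print(time_of_day, device, old, "->", new)
```

Lines without a cstate change (such as the sock_recvmsg line) are skipped, so the result is just the chain of connection states with the time each one was logged.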
Here is some information about my setup.

drbd.conf (both machines):

    resource r3 {
      protocol C;
      # inittimeout=-0;
      incon-degr-cmd "echo '!DRBD! pri on incon-degr'";
      startup {
        degr-wfc-timeout 20; # 2 minutes.
        wfc-timeout 20;
      }
      disk {
        on-io-error detach;
      }
      net {
        timeout 90;
        ping-int 30;
        connect-int 10;
        sndbuf-size 50M;
        max-buffers 5000;
        max-epoch-size 5000;
      }
      syncer {
        rate 11M;
        group 1;
        al-extents 257;
      }
      on Master {
        device /dev/drbd3;
        disk /dev/hdg;
        address 10.0.0.1:50003;
        meta-disk internal;
      }
      on Slave {
        device /dev/drbd3;
        disk /dev/hdg;
        address 10.0.0.2:50003;
        meta-disk internal;
      }
    }

DRBD version: 0.7.5
Dedicated network card (for sync): gigabit NIC

Thanks in advance,
Seki
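P.S. For reference, a rough Python sketch of how I read the three timing values in the net section. It assumes that timeout is given in tenths of a second while ping-int and connect-int are in seconds, and that timeout is meant to be lower than both of the others; that is my recollection of the drbd.conf man page and should be checked against the version installed. The names NET, net_timings_seconds and timeout_is_consistent are hypothetical.

```python
# Values copied from the net section of the drbd.conf above.
NET = {"timeout": 90, "ping-int": 30, "connect-int": 10}


def net_timings_seconds(net):
    """Convert the net timing options to seconds (units are an assumption)."""
    return {
        "timeout": net["timeout"] / 10.0,          # assumed unit: 0.1 s
        "ping-int": float(net["ping-int"]),        # assumed unit: seconds
        "connect-int": float(net["connect-int"]),  # assumed unit: seconds
    }


def timeout_is_consistent(net):
    """True if timeout is below both ping-int and connect-int (assumed rule)."""
    timings = net_timings_seconds(net)
    return (timings["timeout"] < timings["ping-int"]
            and timings["timeout"] < timings["connect-int"])


if __name__ == "__main__":
    print(net_timings_seconds(NET))
    print(timeout_is_consistent(NET))
```

Under those assumptions, timeout 90 would mean 9 seconds, just below the connect-int of 10 seconds.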