Dear all,
I have a DRBD device (I call it /dev/drbd3). Copying and deleting small
files on it works fine. But when we try to tar/cp a large volume of data
(a few GB) onto /dev/drbd3, the operation stops in the middle.
I checked the network connection with iptraf and saw that the traffic
on the dedicated NIC is less than 3 kbps while the copy is stalled.
I ran 'cat /proc/drbd', but nothing unusual appears:
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@Master, 2004-10-20 18:44:52
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:12235516 nr:0 dw:12235516 dr:6333445 al:2929 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Connected st:Primary/Secondary ld:Consistent
ns:168 nr:0 dw:168 dr:225 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
2: cs:Connected st:Primary/Secondary ld:Consistent
ns:192 nr:0 dw:272 dr:537 al:0 bm:5 lo:0 pe:0 ua:0 ap:0
3: cs:Connected st:Primary/Secondary ld:Consistent
ns:58122621 nr:0 dw:58106173 dr:25682705 al:18126 bm:79 lo:0 pe:185 ua:0 ap:184
4: cs:Connected st:Primary/Secondary ld:Consistent
ns:364 nr:0 dw:364 dr:29877389 al:11 bm:1 lo:0 pe:0 ua:0 ap:0
[root@Master share4]#
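One detail in the output above is that device 3 shows pe:185 and ap:184, while every other device shows 0. Below is a minimal Python sketch (not part of DRBD; the function names are made up for illustration) that parses DRBD 0.7-style /proc/drbd text and lists the minors with non-zero pe (requests sent to the peer, not yet answered) or ap (block I/O handed to DRBD, not yet completed). A single snapshot with pending requests is normal under load; it only hints at a stall if the counters stay the same while ns stops growing.

```python
import re

# DRBD 0.7 /proc/drbd counters (per the DRBD 0.7 documentation):
#   ns/nr = network send/receive, dw/dr = disk write/read,
#   al/bm = activity log / bitmap updates, lo = open local requests,
#   pe = pending (sent to the peer, not yet answered),
#   ua = unacknowledged (received from the peer, not yet answered),
#   ap = application pending (block I/O given to DRBD, not yet completed).
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")
COUNTER_RE = re.compile(r"\b([a-z]{2}):(\d+)\b")


def parse_proc_drbd(text):
    """Return {minor: {"cs": state, "ns": int, ...}} for each device."""
    devices = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            current = int(match.group(1))
            devices[current] = {"cs": match.group(2)}
        elif current is not None:
            # Counter lines may be wrapped; keep filling the current device.
            for key, value in COUNTER_RE.findall(line):
                devices[current][key] = int(value)
    return devices


def stuck_devices(devices):
    """Minors that still have requests pending towards the peer or the app."""
    return sorted(minor for minor, d in devices.items()
                  if d.get("pe", 0) > 0 or d.get("ap", 0) > 0)


SAMPLE = """\
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@Master, 2004-10-20 18:44:52
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:12235516 nr:0 dw:12235516 dr:6333445 al:2929 bm:1 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Primary/Secondary ld:Consistent
    ns:168 nr:0 dw:168 dr:225 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
 2: cs:Connected st:Primary/Secondary ld:Consistent
    ns:192 nr:0 dw:272 dr:537 al:0 bm:5 lo:0 pe:0 ua:0 ap:0
 3: cs:Connected st:Primary/Secondary ld:Consistent
    ns:58122621 nr:0 dw:58106173 dr:25682705 al:18126 bm:79 lo:0
    pe:185 ua:0 ap:184
 4: cs:Connected st:Primary/Secondary ld:Consistent
    ns:364 nr:0 dw:364 dr:29877389 al:11 bm:1 lo:0 pe:0 ua:0 ap:0
"""

devices = parse_proc_drbd(SAMPLE)
print(stuck_devices(devices))              # → [3]
print(devices[3]["pe"], devices[3]["ap"])  # → 185 184
```

Running it twice during a stall and comparing ns and pe for device 3 would show whether anything is moving at all.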
Then I ran 'drbdadm down r3' and 'drbdadm up r3' on the secondary
machine. The tar/cp went back to normal for a few files (each file is
more than 600 MB). After a couple of minutes, the tar/cp paused again.
I then left the machine alone to see what would happen if I took no
action at all. After a few hours, I got the following on the primary
machine:
Feb 3 21:12:59 Master kernel: drbd3: sock_recvmsg returned -110
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Connected --> BrokenPipe
Feb 3 21:12:59 Master kernel: drbd3: short read expecting header on sock: r=-110
Feb 3 21:12:59 Master kernel: drbd3: worker terminated
Feb 3 21:12:59 Master kernel: drbd3: asender terminated
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate BrokenPipe --> Unconnected
Feb 3 21:12:59 Master kernel: drbd3: Connection lost.
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Unconnected --> WFConnection
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFConnection --> WFReportParams
Feb 3 21:12:59 Master kernel: drbd3: Handshake successful: DRBD Network Protocol version 74
Feb 3 21:12:59 Master kernel: drbd3: Connection established.
Feb 3 21:12:59 Master kernel: drbd3: I am(P): 1:0000002a:00000016:000000b1:00000019:10
Feb 3 21:12:59 Master kernel: drbd3: Peer(S): 1:0000002a:00000016:000000b0:00000019:01
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFReportParams --> WFBitMapS
Feb 3 21:12:59 Master kernel: drbd3: Primary/Unknown --> Primary/Secondary
Feb 3 21:13:00 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFBitMapS --> SyncSource
Feb 3 21:13:00 Master kernel: drbd3: Resync started as SyncSource (need to sync 4136 KB [1034 bits set]).
Feb 3 21:13:00 Master kernel: drbd3: Resync done (total 1 sec; paused 0 sec; 4136 K/sec)
Feb 3 21:13:00 Master kernel: drbd3: drbd3_worker [20765]: cstate SyncSource --> Connected
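To read the sequence at a glance, here is a small Python sketch (the helper names are hypothetical, written only for this post) that pulls the cstate transitions out of a kernel log like the one above and decodes the negative return code. On Linux, errno 110 is ETIMEDOUT, so "sock_recvmsg returned -110" means the receive call timed out. The sketch only restates what the log says; it does not explain why the link went quiet for so long.

```python
import re

# Linux errno values. The log comes from a Linux kernel, so these are
# hard-coded instead of taken from the errno table of whatever host runs this.
LINUX_ERRNO_NAMES = {32: "EPIPE", 104: "ECONNRESET", 110: "ETIMEDOUT"}

# "\s+" also tolerates lines that a mail client wrapped after "cstate".
CSTATE_RE = re.compile(r"(drbd\d+): \w+ \[\d+\]: cstate\s+(\w+)\s+-->\s+(\w+)")
RETURN_RE = re.compile(r"(\w+) returned (-\d+)")


def cstate_transitions(log_text):
    """List the (old, new) connection states in the order they were logged."""
    return [(old, new) for _dev, old, new in CSTATE_RE.findall(log_text)]


def socket_errors(log_text):
    """Decode '<call> returned -N' lines into (call, code, errno name)."""
    return [(call, int(code), LINUX_ERRNO_NAMES.get(-int(code), "unknown"))
            for call, code in RETURN_RE.findall(log_text)]


LOG = """\
Feb 3 21:12:59 Master kernel: drbd3: sock_recvmsg returned -110
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Connected --> BrokenPipe
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate BrokenPipe --> Unconnected
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Unconnected --> WFConnection
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFConnection --> WFReportParams
Feb 3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFReportParams --> WFBitMapS
Feb 3 21:12:59 Master kernel: drbd3: Primary/Unknown --> Primary/Secondary
Feb 3 21:13:00 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFBitMapS --> SyncSource
Feb 3 21:13:00 Master kernel: drbd3: drbd3_worker [20765]: cstate SyncSource --> Connected
"""

for old, new in cstate_transitions(LOG):
    print(f"{old} -> {new}")
print(socket_errors(LOG))  # → [('sock_recvmsg', -110, 'ETIMEDOUT')]
```

The role line (Primary/Unknown --> Primary/Secondary) is deliberately not matched, because it is not a cstate change.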
I guess it takes a long time for the primary machine to give up the TCP
connection and reconnect to the secondary (or the secondary takes a long
time to give up the existing connection; I am not sure which side is the
initiator).
I am looking for help...
Here is some information about my setup.
drbd.conf (same on both machines):
resource r3 {
  protocol C;
  # inittimeout=-0;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr'";
  startup {
    degr-wfc-timeout 20; # 20 seconds.
    wfc-timeout 20;
  }
  disk { on-io-error detach; }
  net {
    timeout 90;
    ping-int 30;
    connect-int 10;
    sndbuf-size 50M;
    max-buffers 5000;
    max-epoch-size 5000;
  }
  syncer { rate 11M; group 1; al-extents 257; }
  on Master {
    device /dev/drbd3;
    disk /dev/hdg;
    address 10.0.0.1:50003;
    meta-disk internal;
  }
  on Slave {
    device /dev/drbd3;
    disk /dev/hdg;
    address 10.0.0.2:50003;
    meta-disk internal;
  }
}
DRBD version: 0.7.5
Dedicated network card (for sync): gigabit NIC
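For reference when reading the net section: in DRBD 0.7, timeout is given in tenths of a second, while ping-int and connect-int are in seconds, so "timeout 90" means 9 seconds, not 90. The Python sketch below (hypothetical helper names, assuming one option per line as in the config above) converts those three values to seconds and checks that timeout is lower than both connect-int and ping-int, a requirement the drbd.conf documentation states (explicitly at least for later releases).

```python
import re

# Units of the DRBD 0.7 net-section timing options, expressed as
# "how many config units make one second":
#   timeout is in tenths of a second; ping-int and connect-int are in seconds.
UNITS_PER_SECOND = {"timeout": 10, "ping-int": 1, "connect-int": 1}
OPTION_RE = re.compile(r"^\s*([a-z-]+)\s+(\d+)\s*;", re.MULTILINE)


def net_timings(conf_text):
    """Return the timing options found in conf_text, converted to seconds."""
    timings = {}
    for name, value in OPTION_RE.findall(conf_text):
        if name in UNITS_PER_SECOND:  # exact match, so wfc-timeout is ignored
            timings[name] = int(value) / UNITS_PER_SECOND[name]
    return timings


def timeout_is_consistent(timings):
    """timeout must be lower than both connect-int and ping-int."""
    return (timings["timeout"] < timings["connect-int"]
            and timings["timeout"] < timings["ping-int"])


NET_SECTION = """\
net {
  timeout 90;
  ping-int 30;
  connect-int 10;
  sndbuf-size 50M;
  max-buffers 5000;
  max-epoch-size 5000;
}
"""

timings = net_timings(NET_SECTION)
print(timings)                         # → {'timeout': 9.0, 'ping-int': 30.0, 'connect-int': 10.0}
print(timeout_is_consistent(timings))  # → True
```

With these values the ordering holds (9 s < 10 s < 30 s), so the settings themselves pass this particular check.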
Thanks in advance,
Seki