[DRBD-user] Please help! cp/tar operations stop midway on a DRBD device

Seki Lau sekilau at gmail.com
Thu Feb 3 17:42:06 CET 2005



Dear all,
I have a DRBD device (I named it /dev/drbd3). Copying and deleting small
files on it works fine. But when we try to tar/cp a large volume of data
(a few GB) onto /dev/drbd3, the operation stops in the middle.

I checked the network connection with iptraf and saw that the traffic
on the dedicated NIC is less than 3 kbps while the copy is stalled.

I ran 'cat /proc/drbd', and nothing unusual appeared:
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@Master, 2004-10-20 18:44:52
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:12235516 nr:0 dw:12235516 dr:6333445 al:2929 bm:1 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Primary/Secondary ld:Consistent
    ns:168 nr:0 dw:168 dr:225 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
 2: cs:Connected st:Primary/Secondary ld:Consistent
    ns:192 nr:0 dw:272 dr:537 al:0 bm:5 lo:0 pe:0 ua:0 ap:0
 3: cs:Connected st:Primary/Secondary ld:Consistent
    ns:58122621 nr:0 dw:58106173 dr:25682705 al:18126 bm:79 lo:0 pe:185 ua:0 ap:184
 4: cs:Connected st:Primary/Secondary ld:Consistent
    ns:364 nr:0 dw:364 dr:29877389 al:11 bm:1 lo:0 pe:0 ua:0 ap:0
[root@Master share4]#
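One way to read the output above: in DRBD 0.7, `pe` (pending) counts requests sent to the peer that the peer has not answered yet, and `ap` (application pending) counts block I/O requests handed to DRBD that DRBD has not completed yet. Device 3 is the only one with non-zero values (pe:185, ap:184). The Python sketch below is a hypothetical helper (`parse_proc_drbd` and `pending_devices` are illustrative names, not part of DRBD) that pulls the `key:value` fields out of this 0.7-style format and lists devices with outstanding requests. A single snapshot taken under load can legitimately show non-zero counters; counts that stay high while the link is nearly idle are what hint at stuck requests.

```python
import re
from typing import Dict, List

# Sample taken from the /proc/drbd output above (devices 0 and 3 only).
SAMPLE = """\
version: 0.7.5 (api:76/proto:74)
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:12235516 nr:0 dw:12235516 dr:6333445 al:2929 bm:1 lo:0 pe:0 ua:0 ap:0
 3: cs:Connected st:Primary/Secondary ld:Consistent
    ns:58122621 nr:0 dw:58106173 dr:25682705 al:18126 bm:79 lo:0 pe:185 ua:0 ap:184
"""

# A device block starts with its minor number, e.g. " 3: cs:Connected ...".
DEVICE_RE = re.compile(r"^\s*(\d+):\s")
# Individual fields look like "cs:Connected" or "pe:185".
FIELD_RE = re.compile(r"(\w+):(\S+)")


def parse_proc_drbd(text: str) -> Dict[int, Dict[str, str]]:
    """Map each device minor number to its key:value fields."""
    devices: Dict[int, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            current = int(match.group(1))
            devices[current] = {}
            rest = line[match.end():]
        elif current is not None:
            rest = line  # continuation line carrying the counters
        else:
            continue  # global header lines (version, SVN revision)
        devices[current].update(FIELD_RE.findall(rest))
    return devices


def pending_devices(devices: Dict[int, Dict[str, str]]) -> List[int]:
    """Return minors whose pe (pending) or ap (application pending) is non-zero."""
    return sorted(
        minor
        for minor, fields in devices.items()
        if int(fields.get("pe", "0")) > 0 or int(fields.get("ap", "0")) > 0
    )


if __name__ == "__main__":
    devs = parse_proc_drbd(SAMPLE)
    for minor in pending_devices(devs):
        print(f"drbd{minor}: pe={devs[minor]['pe']} ap={devs[minor]['ap']}")
```

Run against the sample, this prints `drbd3: pe=185 ap=184`. Polling it a few times during a stall, rather than once, is what would distinguish stuck requests from normal in-flight traffic.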


Then I ran 'drbdadm down r3' and 'drbdadm up r3' on the secondary
machine. The tar/cp operation went back to normal for a few files (each
file is more than 600 MB). After a couple of minutes, the tar/cp
operation paused again.

I then left the machines alone to see what would happen if I took no
action at all. After a few hours, I got the following on the primary
machine:


Feb  3 21:12:59 Master kernel: drbd3: sock_recvmsg returned -110
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Connected --> BrokenPipe
Feb  3 21:12:59 Master kernel: drbd3: short read expecting header on sock: r=-110
Feb  3 21:12:59 Master kernel: drbd3: worker terminated
Feb  3 21:12:59 Master kernel: drbd3: asender terminated
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate BrokenPipe --> Unconnected
Feb  3 21:12:59 Master kernel: drbd3: Connection lost.
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Unconnected --> WFConnection
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFConnection --> WFReportParams
Feb  3 21:12:59 Master kernel: drbd3: Handshake successful: DRBD Network Protocol version 74
Feb  3 21:12:59 Master kernel: drbd3: Connection established.
Feb  3 21:12:59 Master kernel: drbd3: I am(P): 1:0000002a:00000016:000000b1:00000019:10
Feb  3 21:12:59 Master kernel: drbd3: Peer(S): 1:0000002a:00000016:000000b0:00000019:01
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFReportParams --> WFBitMapS
Feb  3 21:12:59 Master kernel: drbd3: Primary/Unknown --> Primary/Secondary
Feb  3 21:13:00 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFBitMapS --> SyncSource
Feb  3 21:13:00 Master kernel: drbd3: Resync started as SyncSource (need to sync 4136 KB [1034 bits set]).
Feb  3 21:13:00 Master kernel: drbd3: Resync done (total 1 sec; paused 0 sec; 4136 K/sec)
Feb  3 21:13:00 Master kernel: drbd3: drbd3_worker [20765]: cstate SyncSource --> Connected
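The `-110` in the log is a negated Linux errno value; on Linux, 110 is `ETIMEDOUT` ("Connection timed out"), which suggests the receiver dropped the link because a read on the socket timed out. The Python sketch below uses hypothetical helper names (`describe_kernel_error`, `cstate_transitions`) to decode such a value with the standard `errno` and `os` modules and to pull out the `cstate` transitions, so the sequence Connected → BrokenPipe → Unconnected → WFConnection → WFReportParams → WFBitMapS → SyncSource → Connected is easier to follow. The errno-to-name mapping is platform specific, so decode values on the same OS that logged them.

```python
import errno
import os
import re
from typing import List, Tuple

# The sock_recvmsg line and the cstate lines from the kernel log above.
LOG = """\
Feb  3 21:12:59 Master kernel: drbd3: sock_recvmsg returned -110
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Connected --> BrokenPipe
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate BrokenPipe --> Unconnected
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate Unconnected --> WFConnection
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFConnection --> WFReportParams
Feb  3 21:12:59 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFReportParams --> WFBitMapS
Feb  3 21:13:00 Master kernel: drbd3: drbd3_receiver [2415]: cstate WFBitMapS --> SyncSource
Feb  3 21:13:00 Master kernel: drbd3: drbd3_worker [20765]: cstate SyncSource --> Connected
"""

CSTATE_RE = re.compile(r"cstate (\w+) --> (\w+)")
RETURNED_RE = re.compile(r"returned (-\d+)")


def describe_kernel_error(code: int) -> Tuple[str, str]:
    """Turn a negative kernel return code such as -110 into (name, message)."""
    num = -code
    return errno.errorcode.get(num, "UNKNOWN"), os.strerror(num)


def cstate_transitions(log: str) -> List[Tuple[str, str]]:
    """List (old_state, new_state) pairs in the order they were logged."""
    return CSTATE_RE.findall(log)


if __name__ == "__main__":
    for code in RETURNED_RE.findall(LOG):
        name, message = describe_kernel_error(int(code))
        print(f"{code}: {name} ({message})")
    for old, new in cstate_transitions(LOG):
        print(f"{old} -> {new}")
```

On a Linux machine the first line printed should be `-110: ETIMEDOUT (Connection timed out)`, followed by the seven state transitions.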

I guess it takes a long time for the primary machine to give up the TCP
connection and reconnect to the secondary (or the secondary takes a long
time to give up the existing connection; I am not sure which side is the
initiator).

I am looking for help.
Here is some information about my setup:


drbd.conf (both machines)

resource r3 {
    
  protocol C;
#  inittimeout=-0;
 
  incon-degr-cmd "echo '!DRBD! pri on incon-degr'";

  startup {
    degr-wfc-timeout 20;    # 20 seconds.
    wfc-timeout 20;
  }
  
  disk { on-io-error   detach;  }
  
  net {
    timeout        90;
    ping-int       30;
    connect-int    10;
    sndbuf-size    50M;
    max-buffers    5000;
    max-epoch-size 5000;
  }
  syncer {    rate 11M;    group 1;    al-extents 257;  }
        
  on Master {
    device     /dev/drbd3;
    disk       /dev/hdg;
    address    10.0.0.1:50003;
    meta-disk  internal;
  }
   
  on Slave {
    device    /dev/drbd3;
    disk      /dev/hdg;
    address   10.0.0.2:50003;
    meta-disk internal;
  }
}
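As I understand the drbd.conf documentation, `timeout` in the `net` section is given in tenths of a second, while `ping-int` and `connect-int` are in seconds, and `timeout` is expected to be lower than both. If that holds for 0.7.5 (an assumption worth checking against the drbd.conf man page shipped with it), `timeout 90` means 9 seconds, not 90. The hypothetical Python helper below (`net_timings_seconds` is an illustrative name) reads the `net` options from this config and converts them to seconds under those assumed units.

```python
import re
from typing import Dict

# The net section from the configuration above.
NET_SECTION = """\
  net {
    timeout        90;
    ping-int       30;
    connect-int    10;
    sndbuf-size    50M;
    max-buffers    5000;
    max-epoch-size 5000;
  }
"""

# Assumed units (check the drbd.conf man page for your version):
# timeout is in tenths of a second; ping-int and connect-int are in seconds.
DIVISOR = {"timeout": 10, "ping-int": 1, "connect-int": 1}

# Matches simple "option value;" lines; block headers like "net {" do not match.
OPTION_RE = re.compile(r"^\s*([\w-]+)\s+(\S+?);", re.MULTILINE)


def net_timings_seconds(section: str) -> Dict[str, float]:
    """Return the timing options from a net section, converted to seconds."""
    options = dict(OPTION_RE.findall(section))
    return {
        name: int(options[name]) / divisor
        for name, divisor in DIVISOR.items()
        if name in options
    }


if __name__ == "__main__":
    t = net_timings_seconds(NET_SECTION)
    print(t)
    print("timeout below ping-int and connect-int:",
          t["timeout"] < min(t["ping-int"], t["connect-int"]))
```

With the values above this prints `{'timeout': 9.0, 'ping-int': 30.0, 'connect-int': 10.0}` and `True`, so under the assumed units the configured timings at least satisfy the ordering constraint.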

DRBD version: 0.7.5
Dedicated network card (for sync): gigabit NIC


Thanks in advance,
Seki


