[DRBD-user] Broken connection when drbd starts the sync

Zachár Balázs zachar at direkt-kfki.hu
Sat Dec 9 09:47:08 CET 2006

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Daniel,

I ran a "harder" test with iperf and found a problem:
Node a:
iperf -s 2>&1

Node b:
iperf -c 10.0.0.1 -d -P10 -t 30 2>&1    (-d: dual test, send and
receive at the same time; -P10: ten parallel sessions)
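
If I understand iperf's dual mode correctly, -d makes each node both
connect and listen, so it is worth confirming first that nothing else
is bound to the iperf port on either side (assuming the default port
5001):

netstat -tln | grep 5001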

Part of the output on node a was:

connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
connect failed: Connection refused
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
write2 failed: Broken pipe
------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 31] local 0.0.0.0 port 39461 connected with 10.0.0.2 port 5001
[ 31]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[ 18] local 0.0.0.0 port 39449 connected with 10.0.0.2 port 5001
[ 18]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[  6] local 0.0.0.0 port 39448 connected with 10.0.0.2 port 5001
[  6]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[ 19] local 0.0.0.0 port 39450 connected with 10.0.0.2 port 5001
[ 19]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[ 21] local 0.0.0.0 port 39451 connected with 10.0.0.2 port 5001
[ 21]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[ 22] local 0.0.0.0 port 39452 connected with 10.0.0.2 port 5001
[ 22]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[ 23] local 0.0.0.0 port 39453 connected with 10.0.0.2 port 5001
[ 23]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[ 24] local 0.0.0.0 port 39454 connected with 10.0.0.2 port 5001
[ 24]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[ 26] local 0.0.0.0 port 39455 connected with 10.0.0.2 port 5001
[ 26]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[ 29] local 0.0.0.0 port 39457 connected with 10.0.0.2 port 5001
[ 29]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec
[SUM]  0.0- 0.0 sec  0.00 Bytes  0.00 bits/sec

I switched to the default SUSE kernel, and the output was the same. :( 
Both kernels use the same driver for my network card...
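
For reference, a quick way to confirm that both kernels really load the
same driver and version for the NIC (assuming the sync interface is
eth1; adjust as needed):

ethtool -i eth1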

I use protocol C for the sync...
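
For completeness, the relevant part of the configuration looks roughly
like this (the resource name and disk paths here are only illustrative):

resource r0 {
  protocol C;
  on node-a {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}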

Now I will search for the problem...
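
As a first step I will try Daniel's MTU suggestion below; assuming the
sync link is eth1 on both nodes, something like:

ifconfig eth1 mtu 1500    # standard MTU first
ifconfig eth1 mtu 5000    # then a mid-size jumbo MTU, if the NIC supports it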

Thanks,
Balázs

Daniel van Ham Colchete wrote:
> Balázs,
>
> looking at DRBD's code, I found that the 'magic??' line means the
> secondary node received a corrupt packet: the packet's header is
> invalid (bad magic code). Think of the magic as the "first bytes of
> something".
>
> Try the following (in this order):
> 1 - Disable jumbo frames if they are enabled; if not, try a small MTU
> on both nodes. If that doesn't solve the issue, then enable jumbo
> frames and try an MTU of 5000 bytes. Playing with the MTU setting
> helps you understand why you would have corrupt packets in your
> traffic. It may be an Ethernet problem, in which case a smaller MTU
> helps; it may also be a fragmentation problem, in which case a larger
> MTU helps.
>
> 2 - Upgrade DRBD to 0.7.22. Every version before it has a serious bug
> in protocols A and B that can result in data loss.
>
> At least you know where the problem is: the "magic??" issue.
>
> Best regards,
> Daniel Colchete
>




