[DRBD-user] DRBD disconnects with error: short read error -512

Gilbert Cassar gilbert.cassar at um.edu.mt
Tue Feb 12 14:39:37 CET 2008


We have a problem with DRBD randomly disconnecting one volume after a few 
days of synchronised operation. We run five other volumes on the same 
server, but the traffic on those volumes is significantly lower.

We got the following errors on the servers:
=== Server 1 Syslog ===
Feb 11 22:00:34 mailstore1 kernel: drbd2: [pdflush/170] sock_sendmsg time expired, ko = 3
Feb 11 22:00:39 mailstore1 kernel: drbd2: [pdflush/170] sock_sendmsg time expired, ko = 2
Feb 11 22:00:44 mailstore1 kernel: drbd2: [pdflush/170] sock_sendmsg time expired, ko = 1
Feb 11 22:00:49 mailstore1 kernel: drbd2: /var/tmp/bach-build/BUILD/drbd-0.7.21/drbd/drbd_main.c:1095: Connected flags=0x120a
Feb 11 22:00:49 mailstore1 kernel: drbd2: pdflush [170]: cstate Connected --> NetworkFailure

=== Server 1 dmesg ===
drbd2: drbd2_receiver [23259]: cstate NetworkFailure --> BrokenPipe
drbd2: short read expecting header on sock: r=-512
drbd2: worker terminated
drbd2: asender terminated
drbd2: drbd2_receiver [23259]: cstate BrokenPipe --> Unconnected
drbd2: Connection lost.
drbd2: drbd2_receiver [23259]: cstate Unconnected --> StandAlone

=== Server 2 Syslog ===
Feb 11 22:00:49 mailstore2 kernel: drbd2: meta connection shut down by peer.
Feb 11 22:00:59 mailstore2 kernel: drbd2: drbd2_asender [2672]: cstate Connected --> NetworkFailure
Feb 11 22:00:59 mailstore2 kernel: drbd2: asender terminated
Feb 11 22:00:59 mailstore2 kernel: drbd2: short sent BarrierAck size=16 sent=-1001
Feb 11 22:00:59 mailstore2 kernel: drbd2: error receiving Barrier, l: 8!
Feb 11 22:01:00 mailstore2 kernel: drbd2: worker terminated
Feb 11 22:01:00 mailstore2 kernel: drbd2: unacked_cnt = 59
Feb 11 22:01:00 mailstore2 kernel: drbd2: drbd2_receiver [2570]: cstate NetworkFailure --> Unconnected
Feb 11 22:01:00 mailstore2 kernel: drbd2: Connection lost.

=== Server 2 dmesg ===
drbd2: meta connection shut down by peer.
drbd2: drbd2_asender [2672]: cstate Connected --> NetworkFailure
drbd2: asender terminated
drbd2: short sent BarrierAck size=16 sent=-1001
drbd2: error receiving Barrier, l: 8!
drbd2: worker terminated
drbd2: unacked_cnt = 59
drbd2: drbd2_receiver [2570]: cstate NetworkFailure --> Unconnected
drbd2: Connection lost.
drbd2: drbd2_receiver [2570]: cstate Unconnected --> StandAlone
drbd2: receiver terminated
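
To make the sequence of connection states easier to follow, the cstate 
transitions can be pulled out of the kernel messages. The following is only 
a rough sketch: the regular expression is inferred from the log lines quoted 
in this post, not from any documented DRBD log format, and the function name 
cstate_transitions is made up for illustration.

```python
import re

# Sample lines copied from the Server 2 dmesg output quoted in this post.
DMESG = """\
drbd2: meta connection shut down by peer.
drbd2: drbd2_asender [2672]: cstate Connected --> NetworkFailure
drbd2: asender terminated
drbd2: drbd2_receiver [2570]: cstate NetworkFailure --> Unconnected
drbd2: Connection lost.
drbd2: drbd2_receiver [2570]: cstate Unconnected --> StandAlone
"""

# Pattern inferred from the lines above (an assumption, not an official format).
# search() is used so that a syslog prefix in front of "drbdN:" is tolerated.
CSTATE_RE = re.compile(
    r"(?P<dev>drbd\d+): (?P<thread>\S+) \[(?P<pid>\d+)\]: "
    r"cstate (?P<old>\w+) --> (?P<new>\w+)"
)


def cstate_transitions(text):
    """Return (device, thread, old_state, new_state) for each cstate line."""
    found = []
    for line in text.splitlines():
        match = CSTATE_RE.search(line)
        if match:
            found.append(
                (match["dev"], match["thread"], match["old"], match["new"])
            )
    return found


transitions = cstate_transitions(DMESG)
for dev, thread, old, new in transitions:
    print(f"{dev} {thread}: {old} -> {new}")
```

On the sample above, the last transition ends in StandAlone, i.e. the node 
has stopped trying to reconnect on its own.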

The servers are connected via gigabit Ethernet to a dedicated VLAN 
on a Cisco 2960G switch.

We noticed a number of errors on the drbd interface:
Server 1
         RX packets:938278799 errors:6 dropped:22308 overruns:0 frame:3
         TX packets:999591802 errors:0 dropped:0 overruns:0 carrier:0

Server 2
       RX packets:545215102 errors:2 dropped:7795 overruns:0 frame:1
       TX packets:419240487 errors:0 dropped:0 overruns:0 carrier:0
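
To put these counters in proportion, the receive-side errors and drops can 
be expressed per million received packets. This is only a back-of-the-envelope 
sketch using the numbers quoted above; the dictionary layout and the helper 
name per_million are made up for illustration, and the counters are totals 
since the interfaces came up, so they say nothing about when the drops 
happened.

```python
# RX counters copied from the interface statistics quoted in this post.
RX_COUNTERS = {
    "Server 1": {"packets": 938278799, "errors": 6, "dropped": 22308, "frame": 3},
    "Server 2": {"packets": 545215102, "errors": 2, "dropped": 7795, "frame": 1},
}


def per_million(count, total):
    """Scale an event count to events per one million packets."""
    return count / total * 1_000_000


rates = {}
for server, rx in RX_COUNTERS.items():
    rates[server] = {
        "errors": per_million(rx["errors"], rx["packets"]),
        "dropped": per_million(rx["dropped"], rx["packets"]),
    }
    print(
        f"{server}: {rates[server]['dropped']:.2f} dropped and "
        f"{rates[server]['errors']:.4f} errors per million RX packets"
    )
```

Both servers come out at a few tens of dropped packets per million received, 
with errors far rarer than that.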

Distribution: Fedora Core 4
Linux kernel: 2.6.17-1.2142_FC4smp
DRBD version: 0.7.21 (api:79/proto:74)
