We have a problem with drbd randomly disconnecting a volume after a few days of synchronised operation. We run five other volumes on the same server; however, the traffic on those volumes is significantly lower.

We got the following errors on the servers:

=== Server 1 Syslog ===
Feb 11 22:00:34 mailstore1 kernel: drbd2: [pdflush/170] sock_sendmsg time expired, ko = 3
Feb 11 22:00:39 mailstore1 kernel: drbd2: [pdflush/170] sock_sendmsg time expired, ko = 2
Feb 11 22:00:44 mailstore1 kernel: drbd2: [pdflush/170] sock_sendmsg time expired, ko = 1
Feb 11 22:00:49 mailstore1 kernel: drbd2: /var/tmp/bach-build/BUILD/drbd-0.7.21/drbd/drbd_main.c:1095: Connected flags=0x120a
Feb 11 22:00:49 mailstore1 kernel: drbd2: pdflush [170]: cstate Connected --> NetworkFailure

=== Server 1 dmesg ===
drbd2: drbd2_receiver [23259]: cstate NetworkFailure --> BrokenPipe
drbd2: short read expecting header on sock: r=-512
drbd2: worker terminated
drbd2: asender terminated
drbd2: drbd2_receiver [23259]: cstate BrokenPipe --> Unconnected
drbd2: Connection lost.
drbd2: drbd2_receiver [23259]: cstate Unconnected --> StandAlone

=== Server 2 Syslog ===
Feb 11 22:00:49 mailstore2 kernel: drbd2: meta connection shut down by peer.
Feb 11 22:00:59 mailstore2 kernel: drbd2: drbd2_asender [2672]: cstate Connected --> NetworkFailure
Feb 11 22:00:59 mailstore2 kernel: drbd2: asender terminated
Feb 11 22:00:59 mailstore2 kernel: drbd2: short sent BarrierAck size=16 sent=-1001
Feb 11 22:00:59 mailstore2 kernel: drbd2: error receiving Barrier, l: 8!
Feb 11 22:01:00 mailstore2 kernel: drbd2: worker terminated
Feb 11 22:01:00 mailstore2 kernel: drbd2: unacked_cnt = 59
Feb 11 22:01:00 mailstore2 kernel: drbd2: drbd2_receiver [2570]: cstate NetworkFailure --> Unconnected
Feb 11 22:01:00 mailstore2 kernel: drbd2: Connection lost.

=== Server 2 dmesg ===
drbd2: meta connection shut down by peer.
drbd2: drbd2_asender [2672]: cstate Connected --> NetworkFailure
drbd2: asender terminated
drbd2: short sent BarrierAck size=16 sent=-1001
drbd2: error receiving Barrier, l: 8!
drbd2: worker terminated
drbd2: unacked_cnt = 59
drbd2: drbd2_receiver [2570]: cstate NetworkFailure --> Unconnected
drbd2: Connection lost.
drbd2: drbd2_receiver [2570]: cstate Unconnected --> StandAlone
drbd2: receiver terminated

The servers are connected via gigabit to a dedicated VLAN on a Cisco 2960G switch. We noticed a number of errors on the drbd interface:

Server 1
RX packets:938278799 errors:6 dropped:22308 overruns:0 frame:3
TX packets:999591802 errors:0 dropped:0 overruns:0 carrier:0

Server 2
RX packets:545215102 errors:2 dropped:7795 overruns:0 frame:1
TX packets:419240487 errors:0 dropped:0 overruns:0 carrier:0

Distribution: Fedora 4
Linux Version: 2.6.17-1.2142_FC4smp
DRBD Version: 0.7.21 (api:79/proto:74)
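
To put the interface counters in proportion, here is a rough sketch that turns the RX lines quoted above into a drop ratio per server. The helper names `parse_counters` and `drop_ratio` are made up for this illustration; the counter values are copied from the statistics in this message, and the script only does arithmetic on them. It says nothing about whether the drops are what causes the disconnects.

```python
import re

# RX counter lines copied from the interface statistics quoted in this message.
STATS = {
    "Server 1": "RX packets:938278799 errors:6 dropped:22308 overruns:0 frame:3",
    "Server 2": "RX packets:545215102 errors:2 dropped:7795 overruns:0 frame:1",
}


def parse_counters(line):
    """Return the name:value pairs of an ifconfig-style counter line as a dict."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def drop_ratio(line):
    """Fraction of received packets that the interface counted as dropped."""
    counters = parse_counters(line)
    return counters["dropped"] / counters["packets"]


for host, line in STATS.items():
    print(f"{host}: {drop_ratio(line):.6%} of RX packets dropped")
```

Comparing these ratios against the other interfaces on the same hosts would show whether the replication link is unusual in this respect.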