Hello,
I have a strange problem with DRBD.
I have two boxes running kernel 2.6.13.4-grsec and DRBD 0.7.14.
After a network failure the nodes disconnect, and once the network is back online
the primary node stays frozen in cs:WFReportParams.
After some time I ran "drbdadm disconnect r0" on the primary node:
root@mrwolf:~adm# drbdadm disconnect r0
Child process does not terminate!
Exiting.
The kernel log then shows:
Nov 15 17:11:49 mrwolf drbd0: drbdsetup [17855]: cstate WFReportParams --> Unconnected
Nov 15 17:11:49 mrwolf drbd0: interrupted during initial handshake
Nov 15 17:11:49 mrwolf drbd0: worker terminated
However, drbdsetup stays in the process list and cannot be killed, even with SIGKILL.
After that, running "drbdadm any_command r0" on the primary never returns (although I can interrupt it with Ctrl+C).
After rebooting the primary node, everything works again.
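A process that ignores SIGKILL is often blocked inside the kernel in uninterruptible sleep (state "D"). Whether that applies to the stuck drbdsetup here is an assumption, not something the logs confirm. A minimal sketch for checking it on Linux reads the state field from /proc/<pid>/stat; the helper names parse_stat_state and process_state are made up for this illustration.

```python
def parse_stat_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read the state of a live process on Linux, e.g. 'R', 'S' or 'D'."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


# Usage (hypothetical): process_state(17855) for the drbdsetup PID from the log.
# A result of 'D' would mean the process is blocked in a kernel call, and
# SIGKILL only takes effect once it leaves that state.
```

If the state is "D", the problem would be in what the kernel call is waiting for rather than in signal handling.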
Here is my drbd.conf:
resource r0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr'";
  startup {
    degr-wfc-timeout 120;
  }
  disk {
    on-io-error   detach;
  }
  net {
    on-disconnect reconnect;
  }
  syncer {
    rate 4M;
    group 1;
    al-extents 257;
  }
  on mrwolf {
    device     /dev/drbd0;
    disk       /dev/md5;
    address    10.10.10.6:7788;
    meta-disk  /dev/md6[0];
  }
  on tonymontana {
    device    /dev/drbd0;
    disk      /dev/md7;
    address   10.10.10.5:7788;
    meta-disk /dev/md8[0];
  }
}
Kernel log on the primary box (mrwolf):
Nov 15 17:00:31 mrwolf drbd0: [imapd/2902] sock_sendmsg time expired, ko = 4294967295
Nov 15 17:00:34 mrwolf drbd0: PingAck did not arrive in time.
Nov 15 17:00:34 mrwolf drbd0: drbd0_asender [11531]: cstate Connected --> NetworkFailure
Nov 15 17:00:34 mrwolf drbd0: asender terminated
Nov 15 17:00:34 mrwolf drbd0: drbd0_receiver [3794]: cstate NetworkFailure --> BrokenPipe
Nov 15 17:00:34 mrwolf drbd0: short read expecting header on sock: r=-512
Nov 15 17:00:34 mrwolf drbd0: imapd [2902]: cstate BrokenPipe --> Timeout
Nov 15 17:00:34 mrwolf drbd0: short sent UnplugRemote size=8 sent=-1001
Nov 15 17:00:34 mrwolf drbd0: worker terminated
Nov 15 17:00:34 mrwolf drbd0: drbd0_receiver [3794]: cstate Timeout --> Unconnected
Nov 15 17:00:34 mrwolf drbd0: Connection lost.
Nov 15 17:00:34 mrwolf drbd0: drbd0_receiver [3794]: cstate Unconnected --> WFConnection
Nov 15 17:01:03 mrwolf drbd0: drbd0_receiver [3794]: cstate WFConnection --> WFReportParams
Nov 15 17:11:49 mrwolf drbd0: drbdsetup [17855]: cstate WFReportParams --> Unconnected
Nov 15 17:11:49 mrwolf drbd0: interrupted during initial handshake
Nov 15 17:11:49 mrwolf drbd0: worker terminated
Kernel log on the secondary box (tonymontana):
Nov 15 17:00:24 tonymontana e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 15 17:00:24 tonymontana TDH                  <b5>
Nov 15 17:00:24 tonymontana TDT                  <a1>
Nov 15 17:00:24 tonymontana next_to_use          <a1>
Nov 15 17:00:24 tonymontana next_to_clean        <b5>
Nov 15 17:00:24 tonymontana buffer_info[next_to_clean]
Nov 15 17:00:24 tonymontana dma                  <323f285e>
Nov 15 17:00:24 tonymontana time_stamp           <48b09a7>
Nov 15 17:00:24 tonymontana next_to_watch        <b5>
Nov 15 17:00:24 tonymontana jiffies              <48b0ac1>
Nov 15 17:00:24 tonymontana next_to_watch.status <0>
Nov 15 17:00:26 tonymontana e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 15 17:00:26 tonymontana TDH                  <b5>
Nov 15 17:00:26 tonymontana TDT                  <a1>
Nov 15 17:00:26 tonymontana next_to_use          <a1>
Nov 15 17:00:26 tonymontana next_to_clean        <b5>
Nov 15 17:00:26 tonymontana buffer_info[next_to_clean]
Nov 15 17:00:26 tonymontana dma                  <323f285e>
Nov 15 17:00:26 tonymontana time_stamp           <48b09a7>
Nov 15 17:00:26 tonymontana next_to_watch        <b5>
Nov 15 17:00:26 tonymontana jiffies              <48b0cb5>
Nov 15 17:00:26 tonymontana next_to_watch.status <0>
Nov 15 17:00:28 tonymontana e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 15 17:00:28 tonymontana TDH                  <b5>
Nov 15 17:00:28 tonymontana TDT                  <a1>
Nov 15 17:00:28 tonymontana next_to_use          <a1>
Nov 15 17:00:28 tonymontana next_to_clean        <b5>
Nov 15 17:00:28 tonymontana buffer_info[next_to_clean]
Nov 15 17:00:28 tonymontana dma                  <323f285e>
Nov 15 17:00:28 tonymontana time_stamp           <48b09a7>
Nov 15 17:00:28 tonymontana next_to_watch        <b5>
Nov 15 17:00:28 tonymontana jiffies              <48b0ea9>
Nov 15 17:00:28 tonymontana next_to_watch.status <0>
Nov 15 17:00:29 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:30 tonymontana e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 15 17:00:30 tonymontana TDH                  <b5>
Nov 15 17:00:30 tonymontana TDT                  <a1>
Nov 15 17:00:30 tonymontana next_to_use          <a1>
Nov 15 17:00:30 tonymontana next_to_clean        <b5>
Nov 15 17:00:30 tonymontana buffer_info[next_to_clean]
Nov 15 17:00:30 tonymontana dma                  <323f285e>
Nov 15 17:00:30 tonymontana time_stamp           <48b09a7>
Nov 15 17:00:30 tonymontana next_to_watch        <b5>
Nov 15 17:00:30 tonymontana jiffies              <48b109d>
Nov 15 17:00:30 tonymontana next_to_watch.status <0>
Nov 15 17:00:31 tonymontana NETDEV WATCHDOG: eth0: transmit timed out
Nov 15 17:00:32 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:32 tonymontana e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
Nov 15 17:00:35 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:37 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:40 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:43 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:44 tonymontana drbd0: PingAck did not arrive in time.
Nov 15 17:00:44 tonymontana drbd0: drbd0_asender [6316]: cstate Connected --> NetworkFailure
Nov 15 17:00:44 tonymontana drbd0: asender terminated
Nov 15 17:00:44 tonymontana drbd0: drbd0_receiver [20119]: cstate NetworkFailure --> BrokenPipe
Nov 15 17:00:44 tonymontana drbd0: short read expecting header on sock: r=-512
Nov 15 17:00:44 tonymontana drbd0: worker terminated
Nov 15 17:00:44 tonymontana drbd0: drbd0_receiver [20119]: cstate BrokenPipe --> Unconnected
Nov 15 17:00:44 tonymontana drbd0: Connection lost.
Nov 15 17:00:44 tonymontana drbd0: drbd0_receiver [20119]: cstate Unconnected --> WFConnection
Nov 15 17:00:46 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:49 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:51 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:54 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:57 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:01:00 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:01:03 tonymontana drbd0: drbd0_receiver [20119]: cstate WFConnection --> WFReportParams
Nov 15 17:01:05 tonymontana drbd0: sock_recvmsg returned -11
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate WFReportParams --> BrokenPipe
Nov 15 17:01:05 tonymontana drbd0: short read expecting header on sock: r=-11
Nov 15 17:01:05 tonymontana drbd0: worker terminated
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate BrokenPipe --> Unconnected
Nov 15 17:01:05 tonymontana drbd0: Connection lost.
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate Unconnected --> WFConnection
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:09:03 tonymontana drbd0: drbdsetup [25051]: cstate WFConnection --> Unconnected
Nov 15 17:09:03 tonymontana drbd0: worker terminated
Nov 15 17:09:03 tonymontana drbd0: drbd0_receiver [20119]: cstate Unconnected --> StandAlone
Nov 15 17:09:03 tonymontana drbd0: Connection lost.
Nov 15 17:09:03 tonymontana drbd0: Discarding network configuration.
Nov 15 17:09:03 tonymontana drbd0: drbd0_receiver [20119]: cstate StandAlone --> StandAlone
Nov 15 17:09:03 tonymontana drbd0: receiver terminated
Nov 15 17:09:03 tonymontana drbd0: drbdsetup [25051]: cstate StandAlone --> StandAlone
Nov 15 17:09:27 tonymontana drbd0: drbdsetup [32029]: cstate StandAlone --> Unconnected
Nov 15 17:09:27 tonymontana drbd0: drbd0_receiver [23315]: cstate Unconnected --> WFConnection
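To compare the two nodes, it can help to reduce both logs to their connection-state changes. The following is a rough sketch that extracts the "cstate A --> B" lines; the regular expression assumes the syslog layout shown above, and the names CSTATE_RE and cstate_transitions are invented for this example.

```python
import re

# Matches lines such as:
# Nov 15 17:01:03 mrwolf drbd0: drbd0_receiver [3794]: cstate WFConnection --> WFReportParams
CSTATE_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) (?P<dev>drbd\d+): "
    r"(?P<proc>\S+) \[(?P<pid>\d+)\]: cstate (?P<old>\w+) --> (?P<new>\w+)"
)


def cstate_transitions(lines):
    """Return (time, host, process, old_state, new_state) for each cstate line."""
    result = []
    for line in lines:
        m = CSTATE_RE.match(line.strip())
        if m:
            result.append((m["time"], m["host"], m["proc"], m["old"], m["new"]))
    return result


# Three lines copied from the primary's log above; only two are cstate changes.
sample = [
    "Nov 15 17:00:34 mrwolf drbd0: Connection lost.",
    "Nov 15 17:00:34 mrwolf drbd0: drbd0_receiver [3794]: cstate Unconnected --> WFConnection",
    "Nov 15 17:01:03 mrwolf drbd0: drbd0_receiver [3794]: cstate WFConnection --> WFReportParams",
]

for time, host, proc, old, new in cstate_transitions(sample):
    print(f"{time} {host:12} {proc:16} {old} -> {new}")
```

Run over both full logs, this shows that both nodes enter WFReportParams at 17:01:03, that tonymontana leaves it again at 17:01:05, and that mrwolf logs no further state change until the manual disconnect at 17:11:49.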