Hello,

I have a strange problem with DRBD. I have two boxes running a 2.6.13.4-grsec kernel and DRBD 0.7.14. After a network failure the nodes disconnect, and once the network is back online the primary node stays frozen in cs:WFReportParams. After some time I ran "drbdadm disconnect r0" on the primary node:

root@mrwolf:~adm# drbdadm disconnect r0
Child process does not terminate!
Exiting.

In the kernel log I have:

Nov 15 17:11:49 mrwolf drbd0: drbdsetup [17855]: cstate WFReportParams --> Unconnected
Nov 15 17:11:49 mrwolf drbd0: interrupted during initial handshake
Nov 15 17:11:49 mrwolf drbd0: worker terminated

but drbdsetup stays in the process list and cannot be killed with SIGKILL. After that, whenever I run "drbdadm any_command r0" on the primary, it never responds (although I can kill it with Ctrl+C). After rebooting the primary node, everything works OK.

Here is my drbd.conf:

resource r0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr'";

  startup {
    degr-wfc-timeout 120;
  }

  disk {
    on-io-error detach;
  }

  net {
    on-disconnect reconnect;
  }

  syncer {
    rate 4M;
    group 1;
    al-extents 257;
  }

  on mrwolf {
    device /dev/drbd0;
    disk /dev/md5;
    address 10.10.10.6:7788;
    meta-disk /dev/md6[0];
  }

  on tonymontana {
    device /dev/drbd0;
    disk /dev/md7;
    address 10.10.10.5:7788;
    meta-disk /dev/md8[0];
  }
}

Kernel log on the primary box:

Nov 15 17:00:31 mrwolf drbd0: [imapd/2902] sock_sendmsg time expired, ko = 4294967295
Nov 15 17:00:34 mrwolf drbd0: PingAck did not arrive in time.
Nov 15 17:00:34 mrwolf drbd0: drbd0_asender [11531]: cstate Connected --> NetworkFailure
Nov 15 17:00:34 mrwolf drbd0: asender terminated
Nov 15 17:00:34 mrwolf drbd0: drbd0_receiver [3794]: cstate NetworkFailure --> BrokenPipe
Nov 15 17:00:34 mrwolf drbd0: short read expecting header on sock: r=-512
Nov 15 17:00:34 mrwolf drbd0: imapd [2902]: cstate BrokenPipe --> Timeout
Nov 15 17:00:34 mrwolf drbd0: short sent UnplugRemote size=8 sent=-1001
Nov 15 17:00:34 mrwolf drbd0: worker terminated
Nov 15 17:00:34 mrwolf drbd0: drbd0_receiver [3794]: cstate Timeout --> Unconnected
Nov 15 17:00:34 mrwolf drbd0: Connection lost.
Nov 15 17:00:34 mrwolf drbd0: drbd0_receiver [3794]: cstate Unconnected --> WFConnection
Nov 15 17:01:03 mrwolf drbd0: drbd0_receiver [3794]: cstate WFConnection --> WFReportParams
Nov 15 17:11:49 mrwolf drbd0: drbdsetup [17855]: cstate WFReportParams --> Unconnected
Nov 15 17:11:49 mrwolf drbd0: interrupted during initial handshake
Nov 15 17:11:49 mrwolf drbd0: worker terminated

Kernel log on the secondary box:

Nov 15 17:00:24 tonymontana e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 15 17:00:24 tonymontana TDH <b5>
Nov 15 17:00:24 tonymontana TDT <a1>
Nov 15 17:00:24 tonymontana next_to_use <a1>
Nov 15 17:00:24 tonymontana next_to_clean <b5>
Nov 15 17:00:24 tonymontana buffer_info[next_to_clean]
Nov 15 17:00:24 tonymontana dma <323f285e>
Nov 15 17:00:24 tonymontana time_stamp <48b09a7>
Nov 15 17:00:24 tonymontana next_to_watch <b5>
Nov 15 17:00:24 tonymontana jiffies <48b0ac1>
Nov 15 17:00:24 tonymontana next_to_watch.status <0>
Nov 15 17:00:26 tonymontana e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 15 17:00:26 tonymontana TDH <b5>
Nov 15 17:00:26 tonymontana TDT <a1>
Nov 15 17:00:26 tonymontana next_to_use <a1>
Nov 15 17:00:26 tonymontana next_to_clean <b5>
Nov 15 17:00:26 tonymontana buffer_info[next_to_clean]
Nov 15 17:00:26 tonymontana dma <323f285e>
Nov 15 17:00:26 tonymontana time_stamp <48b09a7>
Nov 15 17:00:26 tonymontana next_to_watch <b5>
Nov 15 17:00:26 tonymontana jiffies <48b0cb5>
Nov 15 17:00:26 tonymontana next_to_watch.status <0>
Nov 15 17:00:28 tonymontana e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 15 17:00:28 tonymontana TDH <b5>
Nov 15 17:00:28 tonymontana TDT <a1>
Nov 15 17:00:28 tonymontana next_to_use <a1>
Nov 15 17:00:28 tonymontana next_to_clean <b5>
Nov 15 17:00:28 tonymontana buffer_info[next_to_clean]
Nov 15 17:00:28 tonymontana dma <323f285e>
Nov 15 17:00:28 tonymontana time_stamp <48b09a7>
Nov 15 17:00:28 tonymontana next_to_watch <b5>
Nov 15 17:00:28 tonymontana jiffies <48b0ea9>
Nov 15 17:00:28 tonymontana next_to_watch.status <0>
Nov 15 17:00:29 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:30 tonymontana e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 15 17:00:30 tonymontana TDH <b5>
Nov 15 17:00:30 tonymontana TDT <a1>
Nov 15 17:00:30 tonymontana next_to_use <a1>
Nov 15 17:00:30 tonymontana next_to_clean <b5>
Nov 15 17:00:30 tonymontana buffer_info[next_to_clean]
Nov 15 17:00:30 tonymontana dma <323f285e>
Nov 15 17:00:30 tonymontana time_stamp <48b09a7>
Nov 15 17:00:30 tonymontana next_to_watch <b5>
Nov 15 17:00:30 tonymontana jiffies <48b109d>
Nov 15 17:00:30 tonymontana next_to_watch.status <0>
Nov 15 17:00:31 tonymontana NETDEV WATCHDOG: eth0: transmit timed out
Nov 15 17:00:32 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:32 tonymontana e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
Nov 15 17:00:35 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:37 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:40 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:43 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:44 tonymontana drbd0: PingAck did not arrive in time.
Nov 15 17:00:44 tonymontana drbd0: drbd0_asender [6316]: cstate Connected --> NetworkFailure
Nov 15 17:00:44 tonymontana drbd0: asender terminated
Nov 15 17:00:44 tonymontana drbd0: drbd0_receiver [20119]: cstate NetworkFailure --> BrokenPipe
Nov 15 17:00:44 tonymontana drbd0: short read expecting header on sock: r=-512
Nov 15 17:00:44 tonymontana drbd0: worker terminated
Nov 15 17:00:44 tonymontana drbd0: drbd0_receiver [20119]: cstate BrokenPipe --> Unconnected
Nov 15 17:00:44 tonymontana drbd0: Connection lost.
Nov 15 17:00:44 tonymontana drbd0: drbd0_receiver [20119]: cstate Unconnected --> WFConnection
Nov 15 17:00:46 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:49 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:51 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:54 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:00:57 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:01:00 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:01:03 tonymontana drbd0: drbd0_receiver [20119]: cstate WFConnection --> WFReportParams
Nov 15 17:01:05 tonymontana drbd0: sock_recvmsg returned -11
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate WFReportParams --> BrokenPipe
Nov 15 17:01:05 tonymontana drbd0: short read expecting header on sock: r=-11
Nov 15 17:01:05 tonymontana drbd0: worker terminated
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate BrokenPipe --> Unconnected
Nov 15 17:01:05 tonymontana drbd0: Connection lost.
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate Unconnected --> WFConnection
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 not responding, still trying
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:01:05 tonymontana nfs: server 10.10.10.6 OK
Nov 15 17:09:03 tonymontana drbd0: drbdsetup [25051]: cstate WFConnection --> Unconnected
Nov 15 17:09:03 tonymontana drbd0: worker terminated
Nov 15 17:09:03 tonymontana drbd0: drbd0_receiver [20119]: cstate Unconnected --> StandAlone
Nov 15 17:09:03 tonymontana drbd0: Connection lost.
Nov 15 17:09:03 tonymontana drbd0: Discarding network configuration.
Nov 15 17:09:03 tonymontana drbd0: drbd0_receiver [20119]: cstate StandAlone --> StandAlone
Nov 15 17:09:03 tonymontana drbd0: receiver terminated
Nov 15 17:09:03 tonymontana drbd0: drbdsetup [25051]: cstate StandAlone --> StandAlone
Nov 15 17:09:27 tonymontana drbd0: drbdsetup [32029]: cstate StandAlone --> Unconnected
Nov 15 17:09:27 tonymontana drbd0: drbd0_receiver [23315]: cstate Unconnected --> WFConnection
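To make the connection-state sequence in these logs easier to compare between the two nodes, the "cstate A --> B" lines can be pulled out with a short script. This is a minimal sketch, assuming the syslog line format shown in this post; the regex and the helper names parse_transitions and final_states are hypothetical and are not part of DRBD or drbdadm.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for lines such as:
#   Nov 15 17:00:34 mrwolf drbd0: drbd0_asender [11531]: cstate Connected --> NetworkFailure
# Lines without a "cstate A --> B" part (e.g. "[imapd/2902] sock_sendmsg ...") do not match.
CSTATE_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+(?P<dev>drbd\d+):\s+"
    r"(?P<proc>\S+)\s+\[(?P<pid>\d+)\]:\s+cstate\s+(?P<old>\w+)\s+-->\s+(?P<new>\w+)\s*$"
)


class Transition(NamedTuple):
    time: str
    host: str
    device: str
    process: str
    old: str
    new: str


def parse_transitions(log_text: str) -> list[Transition]:
    """Return every 'cstate A --> B' transition found in a syslog excerpt, in order."""
    result = []
    for line in log_text.splitlines():
        m = CSTATE_RE.match(line.strip())
        if m:
            result.append(
                Transition(m["time"], m["host"], m["dev"], m["proc"], m["old"], m["new"])
            )
    return result


def final_states(transitions: list[Transition]) -> dict[str, str]:
    """Last connection state reached by each host, assuming the lines are in log order."""
    states: dict[str, str] = {}
    for t in transitions:
        states[t.host] = t.new
    return states


# Excerpt copied from the two kernel logs in this post (primary mrwolf, secondary tonymontana).
sample = """\
Nov 15 17:00:34 mrwolf drbd0: drbd0_receiver [3794]: cstate Unconnected --> WFConnection
Nov 15 17:01:03 mrwolf drbd0: drbd0_receiver [3794]: cstate WFConnection --> WFReportParams
Nov 15 17:01:03 tonymontana drbd0: drbd0_receiver [20119]: cstate WFConnection --> WFReportParams
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate WFReportParams --> BrokenPipe
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate BrokenPipe --> Unconnected
Nov 15 17:01:05 tonymontana drbd0: drbd0_receiver [20119]: cstate Unconnected --> WFConnection
"""

if __name__ == "__main__":
    for t in parse_transitions(sample):
        print(f"{t.time} {t.host:<12} {t.old} --> {t.new}")
    print(final_states(parse_transitions(sample)))
    # {'mrwolf': 'WFReportParams', 'tonymontana': 'WFConnection'}
```

On this excerpt it shows the asymmetry described above: at 17:01:05 the secondary had already given up the handshake (BrokenPipe, back to WFConnection), while the primary was still sitting in WFReportParams, where it stayed until the manual disconnect at 17:11:49.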