I have two dual-Xeon machines running i386 kernels that I have previously connected with DRBD over a direct gigabit Ethernet link, with no problems. As an experiment, I switched the secondary to an x86_64 2.6.17.11 kernel. Now, after a while, the connection drops.

On the primary (i386, 2.6.11.12) I see:

  Sep 5 17:20:22 freshair2 kernel: drbd1: [pdflush/2094] sock_sendmsg time expired, ko = 4294967295
  Sep 5 17:20:30 freshair2 kernel: drbd0: [pdflush/3208] sock_sendmsg time expired, ko = 4294967295
  Sep 5 17:20:32 freshair2 kernel: drbd1: [pdflush/2094] sock_sendmsg time expired, ko = 4294967295
  Sep 5 17:20:33 freshair2 kernel: drbd0: [pdflush/3208] sock_sendmsg time expired, ko = 4294967294
  Sep 5 17:20:33 freshair2 kernel: drbd0: PingAck did not arrive in time.
  Sep 5 17:20:33 freshair2 kernel: drbd0: drbd0_asender [2802]: cstate Connected --> NetworkFailure
  Sep 5 17:20:33 freshair2 kernel: drbd0: asender terminated
  Sep 5 17:20:33 freshair2 kernel: drbd0: drbd0_receiver [3306]: cstate NetworkFailure --> BrokenPipe
  Sep 5 17:20:33 freshair2 kernel: drbd0: short read expecting header on sock: r=-512
  Sep 5 17:20:33 freshair2 kernel: drbd0: _drbd_send_page: size=4096 len=2664 sent=-4
  Sep 5 17:20:33 freshair2 kernel: drbd0: short sent UnplugRemote size=8 sent=-1001
  Sep 5 17:20:33 freshair2 kernel: drbd0: worker terminated
  Sep 5 17:20:33 freshair2 kernel: drbd0: drbd0_receiver [3306]: cstate BrokenPipe --> Unconnected
  Sep 5 17:20:33 freshair2 kernel: drbd0: Connection lost.
  Sep 5 17:20:33 freshair2 kernel: drbd0: drbd0_receiver [3306]: cstate Unconnected --> WFConnection
  Sep 5 17:20:35 freshair2 kernel: drbd1: PingAck did not arrive in time.
  Sep 5 17:20:35 freshair2 kernel: drbd1: [pdflush/2094] sock_sendmsg time expired, ko = 4294967294
  Sep 5 17:20:35 freshair2 kernel: drbd1: drbd1_asender [2803]: cstate Connected --> NetworkFailure
  Sep 5 17:20:35 freshair2 kernel: drbd1: asender terminated
  Sep 5 17:20:35 freshair2 kernel: drbd1: drbd1_receiver [3314]: cstate NetworkFailure --> BrokenPipe
  Sep 5 17:20:35 freshair2 kernel: drbd1: short read expecting header on sock: r=-512
  Sep 5 17:20:35 freshair2 kernel: drbd1: worker terminated
  Sep 5 17:20:35 freshair2 kernel: drbd1: _drbd_send_page: size=4096 len=1192 sent=-4
  Sep 5 17:20:35 freshair2 kernel: drbd1: drbd1_receiver [3314]: cstate BrokenPipe --> Unconnected
  Sep 5 17:20:36 freshair2 kernel: drbd1: Connection lost.
  Sep 5 17:20:37 freshair2 kernel: drbd1: drbd1_receiver [3314]: cstate Unconnected --> WFConnection

On the secondary (x86_64, 2.6.17.11) I see:

  Sep 5 17:25:55 freshair1 kernel: drbd0: meta connection shut down by peer.
  Sep 5 17:25:55 freshair1 kernel: drbd0: drbd0_asender [6909]: cstate Connected --> NetworkFailure
  Sep 5 17:25:55 freshair1 kernel: drbd0: asender terminated
  Sep 5 17:25:55 freshair1 kernel: drbd0: drbd0_receiver [6884]: cstate NetworkFailure --> BrokenPipe
  Sep 5 17:25:55 freshair1 kernel: drbd0: short read receiving data block: read 864 expected 4096
  Sep 5 17:25:55 freshair1 kernel: drbd0: error receiving Data, l: 4112!
  Sep 5 17:25:55 freshair1 kernel: drbd0: worker terminated
  Sep 5 17:25:55 freshair1 kernel: drbd0: drbd0_receiver [6884]: cstate BrokenPipe --> Unconnected
  Sep 5 17:25:55 freshair1 kernel: drbd0: Connection lost.
  Sep 5 17:25:55 freshair1 kernel: drbd0: drbd0_receiver [6884]: cstate Unconnected --> StandAlone
  Sep 5 17:25:55 freshair1 kernel: drbd0: receiver terminated
  Sep 5 17:25:55 freshair1 kernel: drbd1: meta connection shut down by peer.
  Sep 5 17:25:55 freshair1 kernel: drbd1: drbd1_asender [6910]: cstate Connected --> NetworkFailure
  Sep 5 17:25:55 freshair1 kernel: drbd1: asender terminated
  Sep 5 17:25:55 freshair1 kernel: drbd1: short sent BarrierAck size=16 sent=-1001
  Sep 5 17:25:55 freshair1 kernel: drbd1: error receiving Barrier, l: 8!
  Sep 5 17:25:55 freshair1 kernel: drbd1: worker terminated
  Sep 5 17:25:55 freshair1 kernel: drbd1: drbd1_receiver [6892]: cstate NetworkFailure --> Unconnected
  Sep 5 17:25:55 freshair1 kernel: drbd1: Connection lost.
  Sep 5 17:25:55 freshair1 kernel: drbd1: drbd1_receiver [6892]: cstate Unconnected --> StandAlone
  Sep 5 17:25:55 freshair1 kernel: drbd1: receiver terminated

One interesting detail is the delay of about 5 minutes between the primary and the secondary on the disconnect: the primary logs it at 17:20:33 and the secondary at 17:25:55, a gap of 5 minutes 22 seconds. Both systems run NTP against the same servers, so their clocks are in sync, and the timestamps in the connection log entries show no delays. Any ideas what is going on?
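One detail in the primary's log: the ko values 4294967295 and 4294967294 are 2^32 - 1 and 2^32 - 2, which is what a 32-bit unsigned counter shows after being decremented from 0 once and then twice. The minimal sketch below reproduces those printed values by emulating modulo-2^32 arithmetic. It assumes the ko field is such a counter that started at 0; that is an interpretation of the log output, not something the messages themselves state, and decrement_u32 is a hypothetical helper for illustration only.

```python
# Assumption: "ko" behaves like a C unsigned 32-bit counter starting at 0.
MASK32 = 0xFFFFFFFF  # 2**32 - 1


def decrement_u32(value: int) -> int:
    """Decrement by one modulo 2**32, the way unsigned C arithmetic wraps."""
    return (value - 1) & MASK32


ko = 0
for _ in range(2):
    ko = decrement_u32(ko)
    print(f"ko = {ko}")
```

Running it prints "ko = 4294967295" followed by "ko = 4294967294", matching the sequence in the primary's log. Since unsigned int is 32 bits wide on both i386 and x86_64 Linux, the wrapped value would read the same on either machine.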