[DRBD-user] 0.7.21 problem between i386 and x86_64 machine

Harry Edmon harry at atmos.washington.edu
Wed Sep 6 18:05:17 CEST 2006



I have two dual Xeon machines running i386 kernels that, in the past, I
have connected with drbd over a direct gigabit Ethernet link with no
problems.  As an experiment, I changed the secondary to an x86_64
2.6.17.11 kernel.  Now the connection drops after a while.  On the
primary (i386, 2.6.11.12) I see:

Sep  5 17:20:22 freshair2 kernel: drbd1: [pdflush/2094] sock_sendmsg 
time expired, ko = 4294967295
Sep  5 17:20:30 freshair2 kernel: drbd0: [pdflush/3208] sock_sendmsg 
time expired, ko = 4294967295
Sep  5 17:20:32 freshair2 kernel: drbd1: [pdflush/2094] sock_sendmsg 
time expired, ko = 4294967295
Sep  5 17:20:33 freshair2 kernel: drbd0: [pdflush/3208] sock_sendmsg 
time expired, ko = 4294967294
Sep  5 17:20:33 freshair2 kernel: drbd0: PingAck did not arrive in time.
Sep  5 17:20:33 freshair2 kernel: drbd0: drbd0_asender [2802]: cstate 
Connected --> NetworkFailure
Sep  5 17:20:33 freshair2 kernel: drbd0: asender terminated
Sep  5 17:20:33 freshair2 kernel: drbd0: drbd0_receiver [3306]: cstate 
NetworkFailure --> BrokenPipe
Sep  5 17:20:33 freshair2 kernel: drbd0: short read expecting header on 
sock: r=-512
Sep  5 17:20:33 freshair2 kernel: drbd0: _drbd_send_page: size=4096 
len=2664 sent=-4
Sep  5 17:20:33 freshair2 kernel: drbd0: short sent UnplugRemote size=8 
sent=-1001
Sep  5 17:20:33 freshair2 kernel: drbd0: worker terminated
Sep  5 17:20:33 freshair2 kernel: drbd0: drbd0_receiver [3306]: cstate 
BrokenPipe --> Unconnected
Sep  5 17:20:33 freshair2 kernel: drbd0: Connection lost.
Sep  5 17:20:33 freshair2 kernel: drbd0: drbd0_receiver [3306]: cstate 
Unconnected --> WFConnection
Sep  5 17:20:35 freshair2 kernel: drbd1: PingAck did not arrive in time.
Sep  5 17:20:35 freshair2 kernel: drbd1: [pdflush/2094] sock_sendmsg 
time expired, ko = 4294967294
Sep  5 17:20:35 freshair2 kernel: drbd1: drbd1_asender [2803]: cstate 
Connected --> NetworkFailure
Sep  5 17:20:35 freshair2 kernel: drbd1: asender terminated
Sep  5 17:20:35 freshair2 kernel: drbd1: drbd1_receiver [3314]: cstate 
NetworkFailure --> BrokenPipe
Sep  5 17:20:35 freshair2 kernel: drbd1: short read expecting header on 
sock: r=-512
Sep  5 17:20:35 freshair2 kernel: drbd1: worker terminated
Sep  5 17:20:35 freshair2 kernel: drbd1: _drbd_send_page: size=4096 
len=1192 sent=-4
Sep  5 17:20:35 freshair2 kernel: drbd1: drbd1_receiver [3314]: cstate 
BrokenPipe --> Unconnected
Sep  5 17:20:36 freshair2 kernel: drbd1: Connection lost.
Sep  5 17:20:37 freshair2 kernel: drbd1: drbd1_receiver [3314]: cstate 
Unconnected --> WFConnection
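
A side note on the "ko" values in the log above: 4294967295 and
4294967294 are 2^32 - 1 and 2^32 - 2, which is what -1 and -2 look like
when printed as unsigned 32-bit integers.  The sketch below only shows
that arithmetic.  The helper name as_signed32 is made up for
illustration, and whether drbd really keeps this counter as a 32-bit
value that wrapped below zero is an assumption, not something the log
proves.

```python
import struct

def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    # Pack as unsigned ("I"), unpack the same four bytes as signed ("i").
    return struct.unpack("<i", struct.pack("<I", value))[0]

# The two "ko" values that appear in the primary's log.
for ko in (4294967295, 4294967294):
    print(ko, "->", as_signed32(ko))
```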

On the secondary (x86_64, 2.6.17.11) I see:

Sep  5 17:25:55 freshair1 kernel: drbd0: meta connection shut down by peer.
Sep  5 17:25:55 freshair1 kernel: drbd0: drbd0_asender [6909]: cstate 
Connected --> NetworkFailure
Sep  5 17:25:55 freshair1 kernel: drbd0: asender terminated
Sep  5 17:25:55 freshair1 kernel: drbd0: drbd0_receiver [6884]: cstate 
NetworkFailure --> BrokenPipe
Sep  5 17:25:55 freshair1 kernel: drbd0: short read receiving data 
block: read 864 expected 4096
Sep  5 17:25:55 freshair1 kernel: drbd0: error receiving Data, l: 4112!
Sep  5 17:25:55 freshair1 kernel: drbd0: worker terminated
Sep  5 17:25:55 freshair1 kernel: drbd0: drbd0_receiver [6884]: cstate 
BrokenPipe --> Unconnected
Sep  5 17:25:55 freshair1 kernel: drbd0: Connection lost.
Sep  5 17:25:55 freshair1 kernel: drbd0: drbd0_receiver [6884]: cstate 
Unconnected --> StandAlone
Sep  5 17:25:55 freshair1 kernel: drbd0: receiver terminated
Sep  5 17:25:55 freshair1 kernel: drbd1: meta connection shut down by peer.
Sep  5 17:25:55 freshair1 kernel: drbd1: drbd1_asender [6910]: cstate 
Connected --> NetworkFailure
Sep  5 17:25:55 freshair1 kernel: drbd1: asender terminated
Sep  5 17:25:55 freshair1 kernel: drbd1: short sent BarrierAck size=16 
sent=-1001
Sep  5 17:25:55 freshair1 kernel: drbd1: error receiving Barrier, l: 8!
Sep  5 17:25:55 freshair1 kernel: drbd1: worker terminated
Sep  5 17:25:55 freshair1 kernel: drbd1: drbd1_receiver [6892]: cstate 
NetworkFailure --> Unconnected
Sep  5 17:25:55 freshair1 kernel: drbd1: Connection lost.
Sep  5 17:25:55 freshair1 kernel: drbd1: drbd1_receiver [6892]: cstate 
Unconnected --> StandAlone
Sep  5 17:25:55 freshair1 kernel: drbd1: receiver terminated


One interesting item is the roughly 5 minute difference between the
timestamps of the disconnect on the primary and on the secondary.  Both
systems are running NTP off of the same servers, so they are in time
sync.  The log entry times for the connection itself show no delay.
Any ideas what is going on?
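
To put a number on that gap, here is a minimal sketch that compares the
"Connection lost." entries for drbd0 from the two logs.  Syslog lines
carry no year, so the year 2006 is an assumption taken from the date of
this post, and the helper name syslog_time is made up for illustration.
Month names are parsed with %b, which assumes an English/C locale.

```python
from datetime import datetime

def syslog_time(line, year=2006):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line."""
    # split() collapses the padding in "Sep  5", so single-digit days work.
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")

# One matching event from each host, copied from the logs in this post.
primary = "Sep  5 17:20:33 freshair2 kernel: drbd0: Connection lost."
secondary = "Sep  5 17:25:55 freshair1 kernel: drbd0: Connection lost."

gap = syslog_time(secondary) - syslog_time(primary)
print(gap)  # 0:05:22
```

For drbd0 this gives 5 minutes 22 seconds between the primary and the
secondary logging the lost connection.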


