[DRBD-user] DRBD locks up after PingAck did not arrive in time
John Du
jjohndu at gmail.com
Wed Jan 14 20:50:52 CET 2009
We are running five pairs of DRBD servers. Three pairs are on 32-bit
RHEL 4 and two pairs on 64-bit RHEL 5. On Saturday night (January 3, 2009)
there were some network problems, and on both 64-bit DRBD pairs the
primary servers locked up; we had to power-cycle them. The 32-bit
servers all survived fine. Is this a known problem in the 64-bit
version of DRBD we are running? Will upgrading to the latest version
fix the problem? Your help is greatly appreciated.
The Linux kernel version is 2.6.18-8.1.15.el5 #1 SMP Thu Oct 4 04:06:39
EDT 2007 x86_64 x86_64 x86_64 GNU/Linux and the DRBD version is 8.2.0-3.
The logs from one of the locked-up servers are:
Jan 3 22:47:17 newimapn kernel: drbd1: PingAck did not arrive in time.
Jan 3 22:47:17 newimapn kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jan 3 22:47:17 newimapn kernel: drbd1: Creating new current UUID
Jan 3 22:47:17 newimapn kernel: drbd1: asender terminated
Jan 3 22:47:17 newimapn kernel: drbd1: short read expecting header on sock: r=-512
Jan 3 22:47:17 newimapn kernel: drbd1: tl_clear()
Jan 3 22:47:17 newimapn kernel: drbd1: Connection closed
Jan 3 22:47:17 newimapn kernel: drbd1: Writing meta data super block now.
Jan 3 22:47:17 newimapn kernel: drbd1: conn( NetworkFailure -> Unconnected )
Jan 3 22:47:17 newimapn kernel: drbd1: receiver terminated
Jan 3 22:47:17 newimapn kernel: drbd1: receiver (re)started
Jan 3 22:47:17 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
Jan 3 22:47:57 newimapn kernel: drbd1: conn( WFConnection -> WFReportParams )
Jan 3 22:47:57 newimapn kernel: drbd1: Handshake successful: Agreed network protocol version 87
Jan 3 22:47:57 newimapn kernel: drbd1: data-integrity-alg:
Jan 3 22:47:58 newimapn kernel: drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Jan 3 22:48:28 newimapn kernel: drbd1: PingAck did not arrive in time.
Jan 3 22:48:28 newimapn kernel: drbd1: peer( Secondary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jan 3 22:48:28 newimapn kernel: drbd1: asender terminated
Jan 3 22:48:34 newimapn kernel: drbd1: short sent ReportBitMap size=4096 sent=2288
Jan 3 22:48:34 newimapn kernel: drbd1: Writing meta data super block now.
Jan 3 22:48:34 newimapn kernel: list_add corruption. prev->next should be ffff81023d295848, but was ffff81023d295a90
Jan 3 22:48:34 newimapn kernel: ----------- [cut here ] --------- [please bite here ] ---------