We are running five pairs of DRBD servers. Three pairs are on 32-bit RHEL 4 and two pairs are on 64-bit RHEL 5. On Saturday night (January 3, 2008) there were some network problems, and the primary servers of both 64-bit DRBD pairs locked up; we had to power them off and on. The 32-bit servers all survived fine.

Is this a known problem in the 64-bit version of DRBD we are running? Will upgrading to the latest version fix the problem? Your help is greatly appreciated.

The Linux kernel version is
2.6.18-8.1.15.el5 #1 SMP Thu Oct 4 04:06:39 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
and the DRBD version is 8.2.0-3.

The logs from one of the locked-up servers are:

Jan 3 22:47:17 newimapn kernel: drbd1: PingAck did not arrive in time.
Jan 3 22:47:17 newimapn kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jan 3 22:47:17 newimapn kernel: drbd1: Creating new current UUID
Jan 3 22:47:17 newimapn kernel: drbd1: asender terminated
Jan 3 22:47:17 newimapn kernel: drbd1: short read expecting header on sock: r=-512
Jan 3 22:47:17 newimapn kernel: drbd1: tl_clear()
Jan 3 22:47:17 newimapn kernel: drbd1: Connection closed
Jan 3 22:47:17 newimapn kernel: drbd1: Writing meta data super block now.
Jan 3 22:47:17 newimapn kernel: drbd1: conn( NetworkFailure -> Unconnected )
Jan 3 22:47:17 newimapn kernel: drbd1: receiver terminated
Jan 3 22:47:17 newimapn kernel: drbd1: receiver (re)started
Jan 3 22:47:17 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
Jan 3 22:47:57 newimapn kernel: drbd1: conn( WFConnection -> WFReportParams )
Jan 3 22:47:57 newimapn kernel: drbd1: Handshake successful: Agreed network protocol version 87
Jan 3 22:47:57 newimapn kernel: drbd1: data-integrity-alg:
Jan 3 22:47:58 newimapn kernel: drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Jan 3 22:48:28 newimapn kernel: drbd1: PingAck did not arrive in time.
Jan 3 22:48:28 newimapn kernel: drbd1: peer( Secondary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jan 3 22:48:28 newimapn kernel: drbd1: asender terminated
Jan 3 22:48:34 newimapn kernel: drbd1: short sent ReportBitMap size=4096 sent=2288
Jan 3 22:48:34 newimapn kernel: drbd1: Writing meta data super block now.
Jan 3 22:48:34 newimapn kernel: list_add corruption. prev->next should be ffff81023d295848, but was ffff81023d295a90
Jan 3 22:48:34 newimapn kernel: ----------- [cut here ] --------- [please bite here ] ---------