I am having a problem with drbd disconnecting and reconnecting frequently. I am currently running version: 8.0.5 (api:86/proto:86) SVN Revision: 3012 (and 0.7.22, see below). When these disconnects occur, the load on the machine spikes to very high numbers (sometimes more than 100).

For background, we are using a crossover cable on a gigabit Ethernet card for the syncing. The disk that is synced between the drbd peers has a high volume of writes. I think the probability that it is a hardware problem is low, as I am seeing it on both our test boxes and our production boxes.

I am seeing similar problems on version: 0.7.22 (api:79/proto:74) SVN Revision: 2555M, which our production machines are running. I upgraded our test machines to 8.0.5 to see if it solved the problem, and it does not seem to have done so.

At one point on the 8.0.5 box, I saw this in the log:

Aug 20 08:05:40 kernel: [44069281.930000] drbd0: BUG! md_sync_timer expired! Worker calls drbd_md_sync().

I have included snippets from /var/log/messages from the Primary and Secondary for a corresponding time where the disconnect occurred.

Primary:

Aug 20 07:44:04 kernel: [44067986.030000] drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 20 07:44:04 kernel: [44067986.030000] drbd0: Creating new current UUID
Aug 20 07:44:04 kernel: [44067986.030000] drbd0: asender terminated
Aug 20 07:44:04 kernel: [44067986.030000] drbd0: sock was shut down by peer
Aug 20 07:44:04 kernel: [44067986.030000] drbd0: tl_clear()
Aug 20 07:44:04 kernel: [44067986.030000] drbd0: Connection closed
Aug 20 07:44:04 kernel: [44067986.030000] drbd0: Writing meta data super block now.
Aug 20 07:44:04 kernel: [44067986.120000] drbd0: conn( NetworkFailure -> Unconnected )
Aug 20 07:44:04 kernel: [44067986.120000] drbd0: receiver terminated
Aug 20 07:44:04 kernel: [44067986.120000] drbd0: receiver (re)started
Aug 20 07:44:04 kernel: [44067986.120000] drbd0: conn( Unconnected -> WFConnection )
Aug 20 07:44:04 kernel: [44067986.120000] drbd0: conn( WFConnection -> WFReportParams )
Aug 20 07:44:04 kernel: [44067986.120000] drbd0: Handshake successful: DRBD Network Protocol version 86
Aug 20 07:44:04 kernel: [44067986.120000] drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Aug 20 07:44:04 kernel: [44067986.170000] drbd0: Writing meta data super block now.
Aug 20 07:44:04 kernel: [44067986.210000] drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Aug 20 07:44:04 kernel: [44067986.210000] drbd0: Began resync as SyncSource (will sync 484 KB [121 bits set]).
Aug 20 07:44:04 kernel: [44067986.210000] drbd0: Writing meta data super block now.
Aug 20 07:44:04 kernel: [44067986.370000] drbd0: Resync done (total 1 sec; paused 0 sec; 484 K/sec)
Aug 20 07:44:04 kernel: [44067986.380000] drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Aug 20 07:44:04 kernel: [44067986.380000] drbd0: Writing meta data super block now.

Secondary:

Aug 20 07:44:01 kernel: [44067168.510000] drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 20 07:44:01 kernel: [44067168.510000] drbd0: asender terminated
Aug 20 07:44:01 kernel: [44067168.510000] drbd0: tl_clear()
Aug 20 07:44:01 kernel: [44067168.510000] drbd0: Connection closed
Aug 20 07:44:01 kernel: [44067168.510000] drbd0: Writing meta data super block now.
Aug 20 07:44:01 kernel: [44067168.520000] drbd0: conn( NetworkFailure -> Unconnected )
Aug 20 07:44:01 kernel: [44067168.520000] drbd0: receiver terminated
Aug 20 07:44:01 kernel: [44067168.520000] drbd0: receiver (re)started
Aug 20 07:44:01 kernel: [44067168.520000] drbd0: conn( Unconnected -> WFConnection )
Aug 20 07:44:04 kernel: [44067171.670000] drbd0: conn( WFConnection -> WFReportParams )
Aug 20 07:44:04 kernel: [44067171.670000] drbd0: Handshake successful: DRBD Network Protocol version 86
Aug 20 07:44:04 kernel: [44067171.680000] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Aug 20 07:44:04 kernel: [44067171.680000] drbd0: Writing meta data super block now.
Aug 20 07:44:04 kernel: [44067171.760000] drbd0: conn( WFBitMapT -> WFSyncUUID )
Aug 20 07:44:04 kernel: [44067171.770000] drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Aug 20 07:44:04 kernel: [44067171.770000] drbd0: Began resync as SyncTarget (will sync 484 KB [121 bits set]).
Aug 20 07:44:04 kernel: [44067171.770000] drbd0: Writing meta data super block now.
Aug 20 07:44:04 kernel: [44067171.930000] drbd0: Resync done (total 1 sec; paused 0 sec; 484 K/sec)
Aug 20 07:44:04 kernel: [44067171.940000] drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Aug 20 07:44:04 kernel: [44067171.940000] drbd0: Writing meta data super block now.

Any help is much appreciated,
Brian
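
To see how often these disconnects happen over a longer period, the state-change lines in the log excerpts can be pulled out with a small script. The following is only a rough sketch, not part of drbd: it assumes the syslog line format shown in this message, and the function names (transitions, count_disconnects) are made up for illustration.

```python
import re

# Matches state changes such as "conn( Connected -> NetworkFailure )"
# as they appear in the drbd 8.0.x kernel messages quoted in this post.
TRANSITION = re.compile(r"\b(peer|conn|disk|pdsk)\( (\w+) -> (\w+) \)")


def transitions(lines):
    """Yield (timestamp, field, old_state, new_state) for each state change."""
    for line in lines:
        if "drbd" not in line:
            continue
        # Assumption: classic syslog prefix, e.g. "Aug 20 07:44:04" (15 chars).
        stamp = line[:15]
        for field, old, new in TRANSITION.findall(line):
            yield stamp, field, old, new


def count_disconnects(lines):
    """Count Connected -> NetworkFailure changes of the connection state."""
    return sum(
        1
        for _, field, old, new in transitions(lines)
        if field == "conn" and old == "Connected" and new == "NetworkFailure"
    )
```

It could be run against one host's log with something like count_disconnects(open("/var/log/messages")); comparing the timestamps from the Primary and the Secondary may then help show which side notices the failure first.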