Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Thu, Oct 18, 2007 at 12:16:49PM -0400, Maurice Volaski wrote:
> >On Wed, Oct 17, 2007 at 12:51:59PM -0400, Maurice Volaski wrote:
> >> For a while now, I've been noticing that when starting the secondary,
> >> the script is just stuck waiting. I don't know what it's waiting for,
> >> though. As you can see on the primary, it's been connected, synced,
> >> and up-to-date...
> >>
> >> On the secondary:
> >> DRBD's startup script waits for the peer node(s) to appear.
> >>  - In case this node was already a degraded cluster before the
> >>    reboot the timeout is 120 seconds. [degr-wfc-timeout]
> >>  - If the peer was available before the reboot the timeout will
> >>    expire after 0 seconds. [wfc-timeout]
> >>    (These values are for resource 'logs'; 0 sec -> wait forever)
> >>  To abort waiting enter 'yes' [ 60]:
> >>
> >>
> >> And for a number of seconds already, the primary has been reporting:
> >
> >you should not have stripped the version here.
> >what drbd, what kernel etc.
>
> Sorry, it's 8.0.6 on both systems and the kernel is Gentoo 2.6.23,
> but it's been happening with .22-rX and earlier versions of drbd
> (8.03).
>
> >also, can you reproduce this?
>
> Easily. It's been this way possibly going back before 8.0.3.

it works here, though.

> >could you provide a process listing (grep for drbd)?
>
> It's the same on both. Here's one:

there should at least also be the drbdadm / drbdsetup process
that is "waiting for nothing"? and its "children"?

> 13076 ?        S      0:00 [drbd0_asender]
> 13077 ?        S      0:00 [drbd1_asender]
> 13078 ?        S      0:00 [drbd2_asender]
> 13079 ?        S      1:07 [drbd3_asender]
> 13081 ?        S      0:10 [drbd4_asender]
> 13082 ?        S      0:04 [drbd5_asender]
> 13083 ?        S      0:02 [drbd6_asender]
> 13084 ?        S      0:01 [drbd7_asender]
> 14942 ?        S      0:00 [drbd0_worker]
> 14950 ?        S      0:00 [drbd1_worker]
> 14958 ?        S      0:00 [drbd2_worker]
> 14966 ?        S      1:21 [drbd3_worker]
> 14974 ?        S      0:10 [drbd4_worker]
> 14982 ?        S      0:07 [drbd5_worker]
> 14990 ?        S      0:05 [drbd6_worker]
> 14998 ?        S      0:03 [drbd7_worker]
> 15050 ?        S      0:00 [drbd0_receiver]
> 15058 ?        S      0:01 [drbd1_receiver]
> 15066 ?        S      0:00 [drbd2_receiver]
> 15074 ?        S      2:10 [drbd3_receiver]
> 15082 ?        S      0:17 [drbd4_receiver]
> 15090 ?        S      0:11 [drbd5_receiver]
> 15098 ?        S      0:08 [drbd6_receiver]
> 15106 ?        S      0:05 [drbd7_receiver]
>
> >is there anything "unusual" in the kernel log?
>
> No, here's a time when I started it on the secondary with the stuck
> init script:

Oct 17 11:46:25 drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd0: Writing meta data super block now.
Oct 17 11:46:25 drbd0: conn( WFBitMapT -> WFSyncUUID )
Oct 17 11:46:25 drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Oct 17 11:46:25 drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Oct 17 11:46:25 drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Oct 17 11:46:25 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Oct 17 11:46:25 drbd0: Writing meta data super block now.

Oct 17 11:46:25 drbd1: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd1: receiver (re)started
Oct 17 11:46:25 drbd1: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd1: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd1: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25 drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd1: Writing meta data super block now.
Oct 17 11:46:26 drbd1: conn( WFBitMapT -> WFSyncUUID )

not yet Connected

Oct 17 11:46:25 drbd2: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd2: receiver (re)started
Oct 17 11:46:25 drbd2: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd2: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd2: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25 drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd2: Writing meta data super block now.
Oct 17 11:46:25 drbd2: conn( WFBitMapT -> WFSyncUUID )
Oct 17 11:46:25 drbd2: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Oct 17 11:46:25 drbd2: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Oct 17 11:46:25 drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Oct 17 11:46:25 drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Oct 17 11:46:25 drbd2: Writing meta data super block now.

Oct 17 11:46:25 drbd3: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd3: receiver (re)started
Oct 17 11:46:25 drbd3: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd3: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd3: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25 drbd3: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd3: Writing meta data super block now.

not yet Connected

Oct 17 11:46:25 drbd4: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd4: receiver (re)started
Oct 17 11:46:25 drbd4: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd4: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd4: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25 drbd4: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd4: Writing meta data super block now.

not yet connected

Oct 17 11:46:25 drbd5: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd5: receiver (re)started
Oct 17 11:46:25 drbd5: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd5: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd5: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:26 drbd5: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:26 drbd5: Writing meta data super block now.

not yet connected

Oct 17 11:46:25 drbd6: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd6: receiver (re)started
Oct 17 11:46:25 drbd6: conn( Unconnected -> WFConnection )
Oct 17 11:46:26 drbd6: conn( WFConnection -> WFReportParams )
Oct 17 11:46:26 drbd6: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:26 drbd6: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:26 drbd6: Writing meta data super block now.

not yet connected

Oct 17 11:46:26 drbd7: conn( StandAlone -> Unconnected )
Oct 17 11:46:26 drbd7: receiver (re)started
Oct 17 11:46:26 drbd7: conn( Unconnected -> WFConnection )
Oct 17 11:46:26 drbd7: conn( WFConnection -> WFReportParams )
Oct 17 11:46:26 drbd7: Handshake successful: DRBD Network Protocol version 86

not yet connected

"waiting for connection" would be expected to wait, still.

-- 
: Lars Ellenberg                            http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe    Fax +43-1-8178292-82  :
__
please use the "List-Reply" function of your email client.
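
The "not yet Connected" annotations above can also be derived mechanically from such a log excerpt. What follows is only a rough sketch, not a DRBD tool: it assumes kernel-log lines in the `drbdN: conn( Old -> New )` format quoted in this message, and the names `CONN_RE`, `last_conn_states`, `not_connected` and `SAMPLE` are made up for illustration. It records the last connection state seen per device and lists those that never reached Connected within the excerpt.

```python
import re

# Assumed line format, taken from the log excerpt in this message, e.g.
#   "Oct 17 11:46:25 drbd0: conn( SyncTarget -> Connected ) disk( ... )"
CONN_RE = re.compile(r"\b(drbd\d+): .*?\bconn\( (\S+) -> (\S+) \)")


def last_conn_states(log_lines):
    """Return {device: last connection state seen}, in first-seen order."""
    states = {}
    for line in log_lines:
        match = CONN_RE.search(line)
        if match:
            device, _old_state, new_state = match.groups()
            states[device] = new_state
    return states


def not_connected(log_lines):
    """List devices whose latest conn() transition did not end in Connected."""
    return [dev for dev, state in last_conn_states(log_lines).items()
            if state != "Connected"]


# A few lines copied from the excerpt above (drbd0 and drbd1 only).
SAMPLE = [
    "Oct 17 11:46:25 drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )",
    "Oct 17 11:46:25 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )",
    "Oct 17 11:46:25 drbd1: conn( WFConnection -> WFReportParams )",
    "Oct 17 11:46:25 drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )",
    "Oct 17 11:46:26 drbd1: conn( WFBitMapT -> WFSyncUUID )",
]

if __name__ == "__main__":
    print(not_connected(SAMPLE))
```

This only says what the excerpt shows; a device listed here may well have connected later, after the excerpt ends.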