[DRBD-user] In the init script, what exactly is drbdadm waiting for, exactly?
Lars Ellenberg
lars.ellenberg at linbit.com
Fri Oct 19 11:45:04 CEST 2007
On Thu, Oct 18, 2007 at 12:16:49PM -0400, Maurice Volaski wrote:
> >On Wed, Oct 17, 2007 at 12:51:59PM -0400, Maurice Volaski wrote:
> >> For a while now, I've been noticing that when starting the secondary,
> >> the script is just stuck waiting. I don't know what it's waiting for,
> >> though. As you can see on the primary, it's been connected, synced,
> >> and up-to-date...
> >>
> >> On the secondary:
> >> DRBD's startup script waits for the peer node(s) to appear.
> >> - In case this node was already a degraded cluster before the
> >> reboot the timeout is 120 seconds. [degr-wfc-timeout]
> >> - If the peer was available before the reboot the timeout will
> >> expire after 0 seconds. [wfc-timeout]
> >> (These values are for resource 'logs'; 0 sec -> wait forever)
> >> To abort waiting enter 'yes' [ 60]:
> >>
> >>
> >> And for a number of seconds already, the primary has been reporting:
> >
> >you should not have stripped the version here.
> >what drbd, what kernel etc.
>
> Sorry, it's 8.0.6 on both systems and the kernel is Gentoo 2.6.23,
> but it's been happening with .22-rX and earlier versions of drbd
> (8.0.3).
>
> >also, can you reproduce this?
>
> Easily. It's been this way possibly going back before 8.0.3.
it works here, though.
> >could you provide a process listing (grep for drbd)?
>
> It's the same on both. Here's one:
shouldn't there at least also be the drbdadm / drbdsetup process
that is "waiting for nothing", and its "children"?
> 13076 ? S 0:00 [drbd0_asender]
> 13077 ? S 0:00 [drbd1_asender]
> 13078 ? S 0:00 [drbd2_asender]
> 13079 ? S 1:07 [drbd3_asender]
> 13081 ? S 0:10 [drbd4_asender]
> 13082 ? S 0:04 [drbd5_asender]
> 13083 ? S 0:02 [drbd6_asender]
> 13084 ? S 0:01 [drbd7_asender]
> 14942 ? S 0:00 [drbd0_worker]
> 14950 ? S 0:00 [drbd1_worker]
> 14958 ? S 0:00 [drbd2_worker]
> 14966 ? S 1:21 [drbd3_worker]
> 14974 ? S 0:10 [drbd4_worker]
> 14982 ? S 0:07 [drbd5_worker]
> 14990 ? S 0:05 [drbd6_worker]
> 14998 ? S 0:03 [drbd7_worker]
> 15050 ? S 0:00 [drbd0_receiver]
> 15058 ? S 0:01 [drbd1_receiver]
> 15066 ? S 0:00 [drbd2_receiver]
> 15074 ? S 2:10 [drbd3_receiver]
> 15082 ? S 0:17 [drbd4_receiver]
> 15090 ? S 0:11 [drbd5_receiver]
> 15098 ? S 0:08 [drbd6_receiver]
> 15106 ? S 0:05 [drbd7_receiver]
>
> >is there anything "unusual" in the kernel log?
>
> No, here's a time when I started it on the secondary with the stuck
> init script:
Oct 17 11:46:25 drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd0: Writing meta data super block now.
Oct 17 11:46:25 drbd0: conn( WFBitMapT -> WFSyncUUID )
Oct 17 11:46:25 drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Oct 17 11:46:25 drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Oct 17 11:46:25 drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Oct 17 11:46:25 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Oct 17 11:46:25 drbd0: Writing meta data super block now.
Oct 17 11:46:25 drbd1: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd1: receiver (re)started
Oct 17 11:46:25 drbd1: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd1: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd1: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25 drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd1: Writing meta data super block now.
Oct 17 11:46:26 drbd1: conn( WFBitMapT -> WFSyncUUID )
not yet Connected
Oct 17 11:46:25 drbd2: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd2: receiver (re)started
Oct 17 11:46:25 drbd2: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd2: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd2: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25 drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd2: Writing meta data super block now.
Oct 17 11:46:25 drbd2: conn( WFBitMapT -> WFSyncUUID )
Oct 17 11:46:25 drbd2: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Oct 17 11:46:25 drbd2: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Oct 17 11:46:25 drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Oct 17 11:46:25 drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Oct 17 11:46:25 drbd2: Writing meta data super block now.
Oct 17 11:46:25 drbd3: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd3: receiver (re)started
Oct 17 11:46:25 drbd3: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd3: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd3: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25 drbd3: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd3: Writing meta data super block now.
not yet Connected
Oct 17 11:46:25 drbd4: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd4: receiver (re)started
Oct 17 11:46:25 drbd4: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd4: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd4: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25 drbd4: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25 drbd4: Writing meta data super block now.
not yet connected
Oct 17 11:46:25 drbd5: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd5: receiver (re)started
Oct 17 11:46:25 drbd5: conn( Unconnected -> WFConnection )
Oct 17 11:46:25 drbd5: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25 drbd5: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:26 drbd5: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:26 drbd5: Writing meta data super block now.
not yet connected
Oct 17 11:46:25 drbd6: conn( StandAlone -> Unconnected )
Oct 17 11:46:25 drbd6: receiver (re)started
Oct 17 11:46:25 drbd6: conn( Unconnected -> WFConnection )
Oct 17 11:46:26 drbd6: conn( WFConnection -> WFReportParams )
Oct 17 11:46:26 drbd6: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:26 drbd6: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:26 drbd6: Writing meta data super block now.
not yet connected
Oct 17 11:46:26 drbd7: conn( StandAlone -> Unconnected )
Oct 17 11:46:26 drbd7: receiver (re)started
Oct 17 11:46:26 drbd7: conn( Unconnected -> WFConnection )
Oct 17 11:46:26 drbd7: conn( WFConnection -> WFReportParams )
Oct 17 11:46:26 drbd7: Handshake successful: DRBD Network Protocol version 86
not yet connected
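To check this from a log rather than by eye, something like the following might help. It is only a rough sketch: it assumes kernel log lines in the format shown above ("drbdN: ... conn( A -> B ) ..."), and the function names last_conn_states and not_connected are made up for this illustration, not part of any DRBD tool. It keeps the last connection state logged per device and lists the devices whose last state is not Connected.

```python
import re

# Device name as it appears in the kernel log, e.g. "drbd3:".
DEV = re.compile(r"\b(drbd\d+):")
# Connection state transition, e.g. "conn( WFBitMapT -> WFSyncUUID )".
CONN = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")

def last_conn_states(lines):
    """Return {device: last connection state logged} for the given lines."""
    states = {}
    for line in lines:
        dev = DEV.search(line)
        conn = CONN.search(line)
        if dev and conn:
            # Later lines overwrite earlier ones, so the last transition wins.
            states[dev.group(1)] = conn.group(2)
    return states

def not_connected(lines):
    """Devices whose last logged conn state is something other than Connected."""
    return sorted(d for d, s in last_conn_states(lines).items() if s != "Connected")

# A few lines copied from the log above (assumed format).
sample = [
    "Oct 17 11:46:25 drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )",
    "Oct 17 11:46:25 drbd1: conn( WFConnection -> WFReportParams )",
    "Oct 17 11:46:26 drbd1: conn( WFBitMapT -> WFSyncUUID )",
    "Oct 17 11:46:26 drbd1: Writing meta data super block now.",
]

if __name__ == "__main__":
    print(not_connected(sample))
```

Fed the complete log excerpt, this should name exactly the devices marked "not yet connected" above. Note that it only sees what was logged up to that moment, so a device may well have reached Connected later.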
"waiting for connection" would be expected to wait, still.
--
: Lars Ellenberg http://www.linbit.com :
: DRBD/HA support and consulting sales at linbit.com :
: LINBIT Information Technologies GmbH Tel +43-1-8178292-0 :
: Vivenotgasse 48, A-1120 Vienna/Europe Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.