[DRBD-user] In the init script, what exactly is drbdadm waiting for, exactly?

Lars Ellenberg lars.ellenberg at linbit.com
Fri Oct 19 11:45:04 CEST 2007


On Thu, Oct 18, 2007 at 12:16:49PM -0400, Maurice Volaski wrote:
> >On Wed, Oct 17, 2007 at 12:51:59PM -0400, Maurice Volaski wrote:
> >> For a while now, I've been noticing that when starting the secondary,
> >> the script is just stuck waiting. I don't know what it's waiting for,
> >> though. As you can see on the primary, it's been connected, synced,
> >> and up-to-date...
> >>
> >> On the secondary:
> >> DRBD's startup script waits for the peer node(s) to appear.
> >>  - In case this node was already a degraded cluster before the
> >>    reboot the timeout is 120 seconds. [degr-wfc-timeout]
> >>  - If the peer was available before the reboot the timeout will
> >>    expire after 0 seconds. [wfc-timeout]
> >>    (These values are for resource 'logs'; 0 sec -> wait forever)
> >>  To abort waiting enter 'yes' [  60]:
> >>
> >>
> >> And for a number of seconds already, the primary has been reporting:
> >
> >you should not have stripped the version here.
> >what drbd, what kernel etc.
> 
> Sorry, it's 8.0.6 on both systems and the kernel is Gentoo 2.6.23, 
> but it's been happening with .22-rX and earlier versions of drbd 
> (8.0.3).
> 
> >also, can you reproduce this?
> 
> Easily. It's been this way possibly going back before 8.0.3.

it works here, though.

> >could you provide a process listing (grep for drbd)?
> 
> It's the same on both. Here's one:

the listing should at least also show the drbdadm / drbdsetup process
that is "waiting for nothing", and its "children".
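
One way to check for that: in `ps` output, kernel threads are shown in
square brackets, while a userland drbdadm or drbdsetup would appear
without them. The following is only a minimal sketch, assuming `ps ax`
style columns (PID TTY STAT TIME COMMAND); the function name and the
userland sample line are hypothetical, made up for illustration.

```python
def split_ps_lines(ps_lines):
    """Split `ps ax`-style lines into (kernel_threads, userland) commands.

    Only commands mentioning "drbd" are kept. Bracketed commands such as
    [drbd0_worker] are treated as kernel threads, everything else as
    userland processes.
    """
    kernel, userland = [], []
    for line in ps_lines:
        # Assumed column layout: PID TTY STAT TIME COMMAND
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        command = fields[4].strip()
        if "drbd" not in command:
            continue
        if command.startswith("[") and command.endswith("]"):
            kernel.append(command)
        else:
            userland.append(command)
    return kernel, userland


if __name__ == "__main__":
    sample = [
        "13076 ?        S      0:00 [drbd0_asender]",
        "14942 ?        S      0:00 [drbd0_worker]",
        # Hypothetical line, not taken from the report:
        "15200 pts/0    S+     0:00 drbdadm wait-con-int",
    ]
    kernel, userland = split_ps_lines(sample)
    print("kernel threads:", kernel)
    print("userland:", userland)
```

If the userland list comes back empty while the init script is still
sitting at its prompt, that by itself would be worth reporting.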

> 13076 ?        S      0:00 [drbd0_asender]
> 13077 ?        S      0:00 [drbd1_asender]
> 13078 ?        S      0:00 [drbd2_asender]
> 13079 ?        S      1:07 [drbd3_asender]
> 13081 ?        S      0:10 [drbd4_asender]
> 13082 ?        S      0:04 [drbd5_asender]
> 13083 ?        S      0:02 [drbd6_asender]
> 13084 ?        S      0:01 [drbd7_asender]
> 14942 ?        S      0:00 [drbd0_worker]
> 14950 ?        S      0:00 [drbd1_worker]
> 14958 ?        S      0:00 [drbd2_worker]
> 14966 ?        S      1:21 [drbd3_worker]
> 14974 ?        S      0:10 [drbd4_worker]
> 14982 ?        S      0:07 [drbd5_worker]
> 14990 ?        S      0:05 [drbd6_worker]
> 14998 ?        S      0:03 [drbd7_worker]
> 15050 ?        S      0:00 [drbd0_receiver]
> 15058 ?        S      0:01 [drbd1_receiver]
> 15066 ?        S      0:00 [drbd2_receiver]
> 15074 ?        S      2:10 [drbd3_receiver]
> 15082 ?        S      0:17 [drbd4_receiver]
> 15090 ?        S      0:11 [drbd5_receiver]
> 15098 ?        S      0:08 [drbd6_receiver]
> 15106 ?        S      0:05 [drbd7_receiver]
> 
> >is there anything "unusual" in the kernel log?
> 
> No, here's a time when I started it on the secondary with the stuck 
> init script:

Oct 17 11:46:25  drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25  drbd0: Writing meta data super block now.
Oct 17 11:46:25  drbd0: conn( WFBitMapT -> WFSyncUUID )
Oct 17 11:46:25  drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Oct 17 11:46:25  drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Oct 17 11:46:25  drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Oct 17 11:46:25  drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Oct 17 11:46:25  drbd0: Writing meta data super block now.

Oct 17 11:46:25  drbd1: conn( StandAlone -> Unconnected )
Oct 17 11:46:25  drbd1: receiver (re)started
Oct 17 11:46:25  drbd1: conn( Unconnected -> WFConnection )
Oct 17 11:46:25  drbd1: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25  drbd1: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25  drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25  drbd1: Writing meta data super block now.
Oct 17 11:46:26  drbd1: conn( WFBitMapT -> WFSyncUUID )

not yet Connected

Oct 17 11:46:25  drbd2: conn( StandAlone -> Unconnected )
Oct 17 11:46:25  drbd2: receiver (re)started
Oct 17 11:46:25  drbd2: conn( Unconnected -> WFConnection )
Oct 17 11:46:25  drbd2: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25  drbd2: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25  drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25  drbd2: Writing meta data super block now.
Oct 17 11:46:25  drbd2: conn( WFBitMapT -> WFSyncUUID )
Oct 17 11:46:25  drbd2: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Oct 17 11:46:25  drbd2: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Oct 17 11:46:25  drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Oct 17 11:46:25  drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Oct 17 11:46:25  drbd2: Writing meta data super block now.

Oct 17 11:46:25  drbd3: conn( StandAlone -> Unconnected )
Oct 17 11:46:25  drbd3: receiver (re)started
Oct 17 11:46:25  drbd3: conn( Unconnected -> WFConnection )
Oct 17 11:46:25  drbd3: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25  drbd3: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25  drbd3: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25  drbd3: Writing meta data super block now.

not yet Connected

Oct 17 11:46:25  drbd4: conn( StandAlone -> Unconnected )
Oct 17 11:46:25  drbd4: receiver (re)started
Oct 17 11:46:25  drbd4: conn( Unconnected -> WFConnection )
Oct 17 11:46:25  drbd4: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25  drbd4: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:25  drbd4: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:25  drbd4: Writing meta data super block now.

not yet connected

Oct 17 11:46:25  drbd5: conn( StandAlone -> Unconnected )
Oct 17 11:46:25  drbd5: receiver (re)started
Oct 17 11:46:25  drbd5: conn( Unconnected -> WFConnection )
Oct 17 11:46:25  drbd5: conn( WFConnection -> WFReportParams )
Oct 17 11:46:25  drbd5: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:26  drbd5: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:26  drbd5: Writing meta data super block now.

not yet connected

Oct 17 11:46:25  drbd6: conn( StandAlone -> Unconnected )
Oct 17 11:46:25  drbd6: receiver (re)started
Oct 17 11:46:25  drbd6: conn( Unconnected -> WFConnection )
Oct 17 11:46:26  drbd6: conn( WFConnection -> WFReportParams )
Oct 17 11:46:26  drbd6: Handshake successful: DRBD Network Protocol version 86
Oct 17 11:46:26  drbd6: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 17 11:46:26  drbd6: Writing meta data super block now.

not yet connected

Oct 17 11:46:26  drbd7: conn( StandAlone -> Unconnected )
Oct 17 11:46:26  drbd7: receiver (re)started
Oct 17 11:46:26  drbd7: conn( Unconnected -> WFConnection )
Oct 17 11:46:26  drbd7: conn( WFConnection -> WFReportParams )
Oct 17 11:46:26  drbd7: Handshake successful: DRBD Network Protocol version 86

not yet connected

so at that point, "waiting for connection" would still be expected to
wait: in this excerpt, most of the devices have not reached Connected yet.
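
The per-device check done by eye above can be sketched as a small
script. This is only an illustration, assuming the 8.0-style kernel log
format shown in this mail ("drbdN: ... conn( Old -> New ) ..."); the
function names are made up here and are not part of any DRBD tool.

```python
import re

# Matches the "conn( A -> B )" part of a DRBD kernel log line, e.g.
# "drbd1: conn( WFBitMapT -> WFSyncUUID )".
CONN_RE = re.compile(r"(drbd\d+): .*?\bconn\( (\w+) -> (\w+) \)")


def last_conn_states(log_lines):
    """Return {device: last connection state seen} for the given lines."""
    states = {}
    for line in log_lines:
        match = CONN_RE.search(line)
        if match:
            device, _old, new = match.groups()
            states[device] = new  # later lines overwrite earlier ones
    return states


def not_connected(log_lines):
    """Return the sorted devices whose last logged state is not Connected."""
    states = last_conn_states(log_lines)
    return sorted(dev for dev, state in states.items() if state != "Connected")


if __name__ == "__main__":
    sample = [
        "Oct 17 11:46:25  drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )",
        "Oct 17 11:46:25  drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )",
        "Oct 17 11:46:25  drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )",
        "Oct 17 11:46:26  drbd1: conn( WFBitMapT -> WFSyncUUID )",
    ]
    print(not_connected(sample))
```

Run against the full log excerpt, this lists every device whose last
logged transition did not end in Connected, which are the ones the
startup script would still be waiting on.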


-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.