[DRBD-user] Bad network connection causing DRBD to freeze

Rainer Sabelka sabelka at iue.tuwien.ac.at
Tue Feb 3 15:52:30 CET 2009



On Tuesday 03 February 2009 09:41:42 Lars Ellenberg wrote:
> On Mon, Feb 02, 2009 at 05:20:23PM +0100, Rainer Sabelka wrote:
> > > if you configure your ko count smaller,
> > > then if it actually reaches zero,
> > > drbd stays disconnected (StandAlone),
> > > until you tell it to reconnect explicitly.
> >
> > I just tried this on a pair of test machines (with ko-count=5).
> > But DRBD always tries to reconnect.
>
> hm.  it should not.
> if ko-count reaches zero, DRBD is supposed to go StandAlone.

This would be fine.
Basically this is what I do manually now. If users complain that the server
"hangs", I run "drbdadm disconnect all".

> you say it just does a disconnect/reconnect cycle.
> guess we have to have a look at this more closely.

Hmm. When ko-count reaches zero my DRBD goes to "Timeout", then to 
"Unconnected", and again to "WFConnection", and the cycle continues.

Best Regards,
-Rainer


Feb  3 12:54:09 server1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Feb  3 12:54:09 server1 kernel: drbd0: conn( WFConnection -> WFReportParams )
Feb  3 12:54:09 server1 kernel: drbd0: Starting asender thread (from drbd0_receiver [30508])
Feb  3 12:54:09 server1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Feb  3 12:54:09 server1 kernel: drbd0: Writing meta data super block now.
Feb  3 12:57:08 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 4
Feb  3 12:57:22 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 4
Feb  3 12:57:28 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 3
Feb  3 12:57:34 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 2
Feb  3 12:57:40 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 1
Feb  3 12:57:52 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 4
Feb  3 12:57:58 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 3
Feb  3 12:58:04 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 2
Feb  3 12:58:10 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time expired, ko = 1
Feb  3 12:58:16 server1 kernel: drbd0: peer( Secondary -> Unknown ) conn( WFBitMapS -> Timeout ) pdsk( UpToDate -> DUnknown )
Feb  3 12:58:16 server1 kernel: drbd0: short sent ReportBitMap size=4096 sent=3124
Feb  3 12:58:16 server1 kernel: drbd0: Writing meta data super block now.
Feb  3 12:58:16 server1 kernel: drbd0: short read expecting header on sock: r=-512
Feb  3 12:58:16 server1 kernel: drbd0: asender terminated
Feb  3 12:58:16 server1 kernel: drbd0: Terminating asender thread
Feb  3 12:58:16 server1 kernel: drbd0: tl_clear()
Feb  3 12:58:16 server1 kernel: drbd0: Connection closed
Feb  3 12:58:16 server1 kernel: drbd0: conn( Timeout -> Unconnected )
Feb  3 12:58:16 server1 kernel: drbd0: receiver terminated
Feb  3 12:58:16 server1 kernel: drbd0: receiver (re)started
Feb  3 12:58:16 server1 kernel: drbd0: conn( Unconnected -> WFConnection )
Feb  3 12:58:17 server1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
Feb  3 12:58:17 server1 kernel: drbd0: conn( WFConnection -> WFReportParams )
Feb  3 12:58:17 server1 kernel: drbd0: Starting asender thread (from drbd0_receiver [30508])
Feb  3 12:58:17 server1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )

[and so on ...]
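
In case it helps anyone reproduce this, here is a rough sketch of how the cycle could be picked out of a kernel log like the one above. It is only an illustration: the function names (conn_transitions, ko_values, went_standalone) are my own and not part of DRBD, and the regular expressions assume the message wording shown in this log, which may differ in other DRBD versions. The patterns are applied to the whole text, so lines wrapped by a mail client still match.

```python
import re

# "conn( Old -> New )"; \s also matches newlines, so wrapped lines still match.
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")
# "sock_sendmsg time expired, ko = N", wording taken from the log above.
KO_RE = re.compile(r"sock_sendmsg time\s+expired, ko = (\d+)")


def conn_transitions(log_text):
    """Return the (old, new) connection state changes in log order."""
    return CONN_RE.findall(log_text)


def ko_values(log_text):
    """Return the logged ko counter values in log order."""
    return [int(n) for n in KO_RE.findall(log_text)]


def went_standalone(log_text):
    """True if any connection state change ended in StandAlone."""
    return any(new == "StandAlone" for _, new in conn_transitions(log_text))


# Short excerpt from the log above, wrapped the way it arrived by mail.
sample = """\
Feb  3 12:58:04 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time
expired, ko = 2
Feb  3 12:58:10 server1 kernel: drbd0: [drbd0_worker/23326] sock_sendmsg time
expired, ko = 1
Feb  3 12:58:16 server1 kernel: drbd0: peer( Secondary -> Unknown ) conn(
WFBitMapS -> Timeout ) pdsk( UpToDate -> DUnknown )
Feb  3 12:58:16 server1 kernel: drbd0: conn( Timeout -> Unconnected )
Feb  3 12:58:16 server1 kernel: drbd0: conn( Unconnected -> WFConnection )
"""

print(ko_values(sample))
print(conn_transitions(sample))
print(went_standalone(sample))
```

On the excerpt, went_standalone() is false: the connection state goes from Timeout to Unconnected and back to WFConnection, which is the cycle I described.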
