[DRBD-user] Bad network connection causing DRBD to freeze

Rainer Sabelka sabelka at iue.tuwien.ac.at
Mon Feb 2 17:20:23 CET 2009


On Monday 02 February 2009 13:51:49 Lars Ellenberg wrote:
> On Mon, Feb 02, 2009 at 01:23:51PM +0100, Rainer Sabelka wrote:
> > Hi,
> >
> > I'm using DRBD (0.8.12) on a pair of servers in separate locations
> > connected by an (almost) dedicated 1GBit ethernet link.
> > This connection has become unreliable in a way that from time to time we
> > see packet loss of up to 30 percent.
> > During these phases of high packet loss, access to the DRBD device
> > blocks for several minutes and the applications accessing the disk become
> > completely unresponsive.
> >
> > While we are trying to fix the network connection in the first place, I
> > wonder if I can do something with DRBD to work around this problem.
> >
> > From what I see in the logfiles it seems that DRBD detects the network
> > failure, disconnects, and immediately tries to reconnect. Then it stays for
> > several minutes in the WFBitMapS state.
> > It seems that any access to the DRBD device during this time blocks until
> > the state SyncSource is reached.
> > If the packet loss on the network continues for a longer period, this
> > disconnect-reconnect cycle repeats several times.
> > The result is that a disturbance in the network connection between the
> > servers basically suspends all running services which depend on DRBD.
> >
> > To work around the problem I've now put DRBD into stand alone mode.
> > Is there anything else I can do about this?
>
> fix the network link?

Yes, of course, this is being worked on. But since the problem occurs only
transiently, it's not easy to identify the faulty component.

> well, seriously: what would you like DRBD to do?

Allow IO to the local disk during transfer of the bitmap.

However, I'm not sure if this would be possible since this would mean that the 
bitmap changes during the transfer. Maybe this could be handled with a second
bitmap to keep track of which blocks of the first bitmap have been altered during
the transfer and need to be re-transmitted. But I feel this is starting to get 
complicated ...
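
The second-bitmap idea could be sketched roughly as follows. This is only a
toy model in Python and not how DRBD handles its bitmap; the class, the
chunking, and all names are hypothetical. The first bitmap marks out-of-sync
blocks, the second marks which chunks of the first still have to be sent
(or sent again) because a local write touched them.

```python
class BitmapTransfer:
    """Toy model: send a dirty-block bitmap in chunks while writes continue."""

    def __init__(self, bitmap, chunk_size):
        self.bitmap = list(bitmap)      # first bitmap: True = block out of sync
        self.chunk_size = chunk_size
        n_chunks = (len(self.bitmap) + chunk_size - 1) // chunk_size
        # second bitmap: True = this chunk of the first bitmap must be (re)sent
        self.pending = [True] * n_chunks

    def local_write(self, block):
        """A local write dirties a block and re-marks its chunk for sending."""
        self.bitmap[block] = True
        self.pending[block // self.chunk_size] = True

    def send_next(self, send):
        """Send one pending chunk via send(index, bits); False when none left."""
        for i, needed in enumerate(self.pending):
            if needed:
                self.pending[i] = False
                start = i * self.chunk_size
                send(i, self.bitmap[start:start + self.chunk_size])
                return True
        return False


# Usage: a write arrives after chunk 0 has already been sent once.
peer = {}

def receive(index, bits):
    peer[index] = bits

transfer = BitmapTransfer([False] * 8, chunk_size=4)
transfer.send_next(receive)       # chunk 0 goes out
transfer.local_write(1)           # block 1 changes, chunk 0 is marked again
while transfer.send_next(receive):
    pass                          # chunk 0 is resent, then chunk 1
```

The transfer is finished only when the second bitmap is empty, so under
sustained writes it may take several passes to converge.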

> > Jan 30 11:21:33 server2 kernel: drbd0: [drbd0_worker/13962] sock_sendmsg
> > time expired, ko = 19
>
> if you configure your ko count smaller,
> then if it actually reaches zero,
> drbd stays disconnected (StandAlone),
> until you tell it to reconnect explicitly.

I just tried this on a pair of test machines (with ko-count=5).
But DRBD always tries to reconnect.
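
For reference, the drbd.conf man page describes ko-count as the number of
timeout intervals the peer may take to complete a single write request before
it is dropped, with timeout given in tenths of a second. Under that reading
the window can be estimated as below. The timeout of 60 is the documented
default and is an assumption here, since the configured value is not stated
in this thread; the helper name is made up.

```python
def ko_window_seconds(timeout_tenths, ko_count):
    """Estimate how long a single write may stall before ko-count hits zero.

    timeout_tenths: the 'timeout' net option, in units of 0.1 seconds
    ko_count:       the 'ko-count' net option (0 disables the feature)
    """
    return ko_count * timeout_tenths / 10.0


# Assumed default timeout (60 = 6 seconds) with the ko-count=5 used in the test
window = ko_window_seconds(60, 5)
print(window)
```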

-Rainer



