[DRBD-user] DRBD ping-timeout values

Florian Haas florian.haas at linbit.com
Fri Apr 4 15:32:49 CEST 2008


On Friday 04 April 2008 15:18:21 George H wrote:
> OK I upgraded both my blades to the latest stable kernel 2.6.24.
> Rebuilt drbd 8.0.8 and restarted the sync.
>
> I noticed off hand the connection to get sync was quicker than before
>
> Apr  4 15:49:35 mailserv1 drbd0: conn( Connected -> WFBitMapS )
> Apr  4 15:50:18 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )
>
> normally it used to take 5 or more minutes.
>
> But as that was quick.. so was the "network failure" see below
>
> Apr  4 15:49:35 mailserv1 drbd0: Writing meta data super block now.
> Apr  4 15:49:35 mailserv1 drbd0: Becoming sync source due to disk states.
> Apr  4 15:49:35 mailserv1 drbd0: Writing meta data super block now.
> Apr  4 15:49:35 mailserv1 drbd0: writing of bitmap took 7 jiffies
> Apr  4 15:49:35 mailserv1 drbd0: 476 GB (124997941 bits) marked
> out-of-sync by on disk bit-map.
> Apr  4 15:49:35 mailserv1 drbd0: Writing meta data super block now.
> Apr  4 15:49:35 mailserv1 drbd0: conn( Connected -> WFBitMapS )
> Apr  4 15:50:18 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )
> Apr  4 15:50:18 mailserv1 drbd0: Began resync as SyncSource (will sync
> 499991764 KB [124997941 bits set]).
> Apr  4 15:50:18 mailserv1 drbd0: Writing meta data super block now.
> Apr  4 16:03:26 mailserv1 drbd0: PingAck did not arrive in time.
> Apr  4 16:03:26 mailserv1 drbd0: peer( Secondary -> Unknown ) conn(
> SyncSource -> NetworkFailure )
> Apr  4 16:03:26 mailserv1 drbd0: asender terminated
> Apr  4 16:03:26 mailserv1 drbd0: drbd_pp_alloc interrupted!
> Apr  4 16:03:26 mailserv1 drbd0: alloc_ee: Allocation of a page failed
> Apr  4 16:03:26 mailserv1 drbd0: error receiving RSDataRequest, l: 24!
> Apr  4 16:03:26 mailserv1 drbd0: tl_clear()
> Apr  4 16:03:26 mailserv1 drbd0: Connection closed
> Apr  4 16:03:26 mailserv1 drbd0: Writing meta data super block now.
> Apr  4 16:03:26 mailserv1 drbd0: conn( NetworkFailure -> Unconnected )
> Apr  4 16:03:26 mailserv1 drbd0: receiver terminated
> Apr  4 16:03:26 mailserv1 drbd0: receiver (re)started
> Apr  4 16:03:26 mailserv1 drbd0: conn( Unconnected -> WFConnection )
> Apr  4 16:03:26 mailserv1 drbd0: Handshake successful: DRBD Network
> Protocol version 86

OK, so that's a very quick disconnection and subsequent reconnection: the
missed PingAck and the successful handshake both carry the 16:03:26
timestamp. How often does that occur? Do you ever get network interruptions
for longer periods? When you do, what does
"tcpdump -i <your replication interface>" say?
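
To put a number on "how often", the kernel log can be tallied with a few
lines of Python. This is only a rough sketch: it assumes the syslog line
layout shown in the quoted excerpt above, and the function name summarize()
is made up for illustration, it is not part of DRBD or any DRBD tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt quoted above:
#   "Apr  4 16:03:26 mailserv1 drbd0: PingAck did not arrive in time."
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+(drbd\d+):\s+(.*)$"
)
# Connection state changes look like "conn( SyncSource -> NetworkFailure )".
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def summarize(lines):
    """Return (ping_timeouts, transitions) for an iterable of syslog lines.

    ping_timeouts is a list of (timestamp, device) tuples, one per
    "PingAck did not arrive in time" message; transitions is a Counter
    keyed by (old_state, new_state). Non-DRBD lines are skipped.
    """
    ping_timeouts = []
    transitions = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        stamp, device, message = match.groups()
        if "PingAck did not arrive in time" in message:
            ping_timeouts.append((stamp, device))
        conn = CONN_RE.search(message)
        if conn is not None:
            transitions[(conn.group(1), conn.group(2))] += 1
    return ping_timeouts, transitions
```

Feed it the lines of whatever file your syslog daemon writes kernel messages
to, for example summarize(open(path)) with your own path. A long list of
timeouts spread over the day tells a different story than a single hiccup.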

I strongly suspect at this point all your DRBD tuning efforts, while 
admirable, are futile. You really need to fix your network stack first.

Cheers,
Florian

-- 
: Florian G. Haas
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria

When replying, there is no need to CC my personal address.
I monitor the list on a daily basis. Thank you.


