[DRBD-user] Re: DRBD ping-timeout values

Fri Apr 4 10:37:31 CEST 2008

On 4/4/08, George H <george.dma at gmail.com> wrote:
> Hi,
>
>  I'm using DRBD v8.0.8 with kernel 2.6.23 and I am doing an initial
>  sync of around 500GB.
>  It repeatedly fails with "PingAck did not arrive in time." after it
>  reaches 2% progress. I re-ran the sync and pinged the other node
>  separately the whole time. The ping replies were all at 0.200 ms.
>
>  Below are the DRBD logs. It ran for around 45 minutes before a
>  PingAck failed to arrive.
>
>  Apr  4 09:02:05 mailserv1 drbd0: Peer authenticated using 32 bytes of
>  'sha256' HMAC
>  Apr  4 09:02:05 mailserv1 drbd0: conn( WFConnection -> WFReportParams )
>  Apr  4 09:02:05 mailserv1 drbd0: Becoming sync source due to disk states.
>  Apr  4 09:02:05 mailserv1 drbd0: peer( Unknown -> Secondary ) conn(
>  WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
>  Apr  4 09:02:09 mailserv1 drbd0: Writing meta data super block now.
>  Apr  4 09:08:16 mailserv1 drbd0: role( Secondary -> Primary )
>  Apr  4 09:08:16 mailserv1 drbd0: Writing meta data super block now.
>  Apr  4 09:08:18 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )
>  Apr  4 09:08:18 mailserv1 drbd0: Began resync as SyncSource (will sync
>  453330132 KB [113332533 bits set]).
>  Apr  4 09:08:18 mailserv1 drbd0: Writing meta data super block now.
>  Apr  4 09:53:56 mailserv1 drbd0: PingAck did not arrive in time.
>  Apr  4 09:53:56 mailserv1 drbd0: peer( Secondary -> Unknown ) conn(
>  SyncSource -> NetworkFailure )
>  Apr  4 09:53:56 mailserv1 drbd0: asender terminated
>  Apr  4 09:53:56 mailserv1 drbd0: drbd_pp_alloc interrupted!
>  Apr  4 09:53:56 mailserv1 drbd0: alloc_ee: Allocation of a page failed
>  Apr  4 09:53:56 mailserv1 drbd0: error receiving RSDataRequest, l: 24!
>  Apr  4 09:53:56 mailserv1 drbd0: tl_clear()
>  Apr  4 09:53:56 mailserv1 drbd0: Connection closed
>  Apr  4 09:53:56 mailserv1 drbd0: Writing meta data super block now.
>  Apr  4 09:53:56 mailserv1 drbd0: conn( NetworkFailure -> Unconnected )
>  Apr  4 09:53:56 mailserv1 drbd0: receiver terminated
>  Apr  4 09:53:56 mailserv1 drbd0: receiver (re)started
>  Apr  4 09:53:56 mailserv1 drbd0: conn( Unconnected -> WFConnection )
>  Apr  4 09:53:59 mailserv1 drbd0: Handshake successful: DRBD Network
>  Protocol version 86
>  Apr  4 09:53:59 mailserv1 drbd0: Peer authenticated using 32 bytes of
>  'sha256' HMAC
>  Apr  4 09:53:59 mailserv1 drbd0: conn( WFConnection -> WFReportParams )
>  Apr  4 09:53:59 mailserv1 drbd0: Becoming sync source due to disk states.
>  Apr  4 09:53:59 mailserv1 drbd0: peer( Unknown -> Secondary ) conn(
>  WFReportParams -> WFBitMapS )
>  Apr  4 09:54:03 mailserv1 drbd0: Writing meta data super block now.
>  Apr  4 10:00:14 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )
>  Apr  4 10:00:14 mailserv1 drbd0: Began resync as SyncSource (will sync
>  445001204 KB [111250301 bits set]).
>  Apr  4 10:00:14 mailserv1 drbd0: Writing meta data super block now.
>
>  I am using the default values for ping-int and ping-timeout, which are
>  10 and 500 (respectively). To me this looks like the DRBD software is
>  lagging in sending the PingAck; am I right about this? If I increase
>  ping-timeout to something bigger, like 1000 or 2000, will that solve
>  this problem?
>
>  My eth0 settings are (ethtool output):
>
>  Settings for eth0:
>         Supported ports: [ FIBRE ]
>         Supported link modes:   1000baseT/Full
>         Supports auto-negotiation: Yes
>         Advertised link modes:  1000baseT/Full
>         Advertised auto-negotiation: Yes
>         Speed: 1000Mb/s
>         Duplex: Full
>         Port: FIBRE
>         PHYAD: 2
>         Transceiver: internal
>         Auto-negotiation: on
>         Supports Wake-on: d
>         Wake-on: d
>         Link detected: yes
>
>  Thanks.
>
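
As a rough cross-check of the "2%" figure, the two "Began resync" lines in the log can be compared. The log itself shows that each bitmap bit stands for 4 KB (453330132 KB = 4 x 113332533). The minimal Python sketch below assumes no blocks were re-dirtied between the two runs, so the difference between the two bitmaps is only an estimate of what was synced before the PingAck failure; all variable names are made up for illustration.

```python
from datetime import datetime

# Assumption: one bitmap bit tracks one 4 KB block (the log's own numbers agree).
BLOCK_KB = 4

# Values copied from the quoted syslog excerpt.
first_bits = 113332533   # "Began resync ... [113332533 bits set]" at 09:08:18
second_bits = 111250301  # "Began resync ... [111250301 bits set]" at 10:00:14

start = datetime.strptime("09:08:18", "%H:%M:%S")    # first resync began
failure = datetime.strptime("09:53:56", "%H:%M:%S")  # "PingAck did not arrive in time."

first_kb = first_bits * BLOCK_KB
second_kb = second_bits * BLOCK_KB
synced_kb = first_kb - second_kb  # estimate: assumes nothing was re-dirtied
elapsed_s = (failure - start).total_seconds()

percent_done = 100.0 * synced_kb / first_kb
rate_kb_s = synced_kb / elapsed_s

print(f"bitmap check: {first_kb} KB, {second_kb} KB")
print(f"synced before failure: {synced_kb} KB ({percent_done:.2f}%)")
print(f"elapsed: {elapsed_s:.0f} s, average rate: {rate_kb_s:.0f} KB/s")
```

With the numbers from the log this comes to roughly 8.3 million KB (about 1.8%) in 2738 seconds, on the order of 3 MB/s, which is consistent with the failure "at 2%" described above.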

OK, I just noticed that the man page says the default ping-timeout
value is 500 ms, but DRBD actually refuses to load the configuration
if the value is not between 1 and 100.
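
The mismatch between "500 ms" and the 1..100 range is consistent with ping-timeout being given in tenths of a second, which is how I read the DRBD 8.0 drbd.conf man page; treat that as an assumption to check against the man page for your version. Under that assumption, the hypothetical helpers below convert between the configured value and milliseconds and reject values outside 1..100.

```python
# Assumptions (check against drbd.conf(5) for your DRBD version):
#   * ping-timeout is expressed in tenths of a second
#   * the accepted range is 1..100
PING_TIMEOUT_MIN = 1
PING_TIMEOUT_MAX = 100
PING_TIMEOUT_UNIT_MS = 100  # one unit = 0.1 s


def ping_timeout_to_ms(value: int) -> int:
    """Hypothetical helper: convert a ping-timeout setting to milliseconds."""
    if not PING_TIMEOUT_MIN <= value <= PING_TIMEOUT_MAX:
        raise ValueError(
            f"ping-timeout {value} outside {PING_TIMEOUT_MIN}..{PING_TIMEOUT_MAX}"
        )
    return value * PING_TIMEOUT_UNIT_MS


def ms_to_ping_timeout(ms: int) -> int:
    """Hypothetical helper: convert a timeout in milliseconds to ping-timeout units."""
    units, remainder = divmod(ms, PING_TIMEOUT_UNIT_MS)
    if remainder:
        raise ValueError(f"{ms} ms is not a multiple of {PING_TIMEOUT_UNIT_MS} ms")
    ping_timeout_to_ms(units)  # reuse the range check
    return units


print(ping_timeout_to_ms(5))     # documented default
print(ping_timeout_to_ms(100))   # the value used below
print(ms_to_ping_timeout(2000))  # a 2-second timeout
```

Under this reading, the documented 500 ms default corresponds to a setting of 5, a setting of 100 already means 10 seconds, and timeouts of 1000 or 2000 ms would be written as 10 and 20.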

So I set it to 100, and now I get the PingAck error many times, always
during the WFBitMapT and WFBitMapS states.
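
Since these PingAck errors show up around the bitmap exchange, it may help to see how long that phase lasted in the original log. The Python sketch below takes the timestamps at which the connection entered WFBitMapS and moved on to SyncSource (copied from the quoted log; the helper name is hypothetical) and computes the length of each phase. It only measures duration; it does not show why the PingAcks arrive late.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# (entered WFBitMapS, left WFBitMapS for SyncSource), copied from the quoted log.
bitmap_phases = [
    ("09:02:05", "09:08:18"),  # first connection
    ("09:53:59", "10:00:14"),  # after the reconnect
]


def phase_seconds(entered: str, left: str) -> float:
    """Hypothetical helper: seconds between two same-day HH:MM:SS timestamps."""
    return (datetime.strptime(left, FMT) - datetime.strptime(entered, FMT)).total_seconds()


durations = [phase_seconds(a, b) for a, b in bitmap_phases]
for (a, b), secs in zip(bitmap_phases, durations):
    print(f"WFBitMapS {a} -> {b}: {secs:.0f} s")
```

Both phases come out at a little over six minutes (373 s and 375 s).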