On 4/4/08, George H <george.dma at gmail.com> wrote:
> Hi,
>
> I'm using DRBD v8.0.8 with kernel 2.6.23 and I am doing an initial
> sync of around 500GB.
> It continuously fails with "PingAck did not arrive in time." after it
> reaches 2% of progress. I re-ran the sync and separately pinged the
> other node the whole time. The ping replies were all at 0.200 ms.
>
> Below are the DRBD logs. Basically it took around 45 minutes before a
> ping ack wasn't received.
>
> Apr 4 09:02:05 mailserv1 drbd0: Peer authenticated using 32 bytes of
> 'sha256' HMAC
> Apr 4 09:02:05 mailserv1 drbd0: conn( WFConnection -> WFReportParams )
> Apr 4 09:02:05 mailserv1 drbd0: Becoming sync source due to disk states.
> Apr 4 09:02:05 mailserv1 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
> Apr 4 09:02:09 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 09:08:16 mailserv1 drbd0: role( Secondary -> Primary )
> Apr 4 09:08:16 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 09:08:18 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )
> Apr 4 09:08:18 mailserv1 drbd0: Began resync as SyncSource (will sync
> 453330132 KB [113332533 bits set]).
> Apr 4 09:08:18 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 09:53:56 mailserv1 drbd0: PingAck did not arrive in time.
> Apr 4 09:53:56 mailserv1 drbd0: peer( Secondary -> Unknown ) conn(
> SyncSource -> NetworkFailure )
> Apr 4 09:53:56 mailserv1 drbd0: asender terminated
> Apr 4 09:53:56 mailserv1 drbd0: drbd_pp_alloc interrupted!
> Apr 4 09:53:56 mailserv1 drbd0: alloc_ee: Allocation of a page failed
> Apr 4 09:53:56 mailserv1 drbd0: error receiving RSDataRequest, l: 24!
> Apr 4 09:53:56 mailserv1 drbd0: tl_clear()
> Apr 4 09:53:56 mailserv1 drbd0: Connection closed
> Apr 4 09:53:56 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 09:53:56 mailserv1 drbd0: conn( NetworkFailure -> Unconnected )
> Apr 4 09:53:56 mailserv1 drbd0: receiver terminated
> Apr 4 09:53:56 mailserv1 drbd0: receiver (re)started
> Apr 4 09:53:56 mailserv1 drbd0: conn( Unconnected -> WFConnection )
> Apr 4 09:53:59 mailserv1 drbd0: Handshake successful: DRBD Network
> Protocol version 86
> Apr 4 09:53:59 mailserv1 drbd0: Peer authenticated using 32 bytes of
> 'sha256' HMAC
> Apr 4 09:53:59 mailserv1 drbd0: conn( WFConnection -> WFReportParams )
> Apr 4 09:53:59 mailserv1 drbd0: Becoming sync source due to disk states.
> Apr 4 09:53:59 mailserv1 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS )
> Apr 4 09:54:03 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 10:00:14 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )
> Apr 4 10:00:14 mailserv1 drbd0: Began resync as SyncSource (will sync
> 445001204 KB [111250301 bits set]).
> Apr 4 10:00:14 mailserv1 drbd0: Writing meta data super block now.
>
> I am using default values for ping-int and ping-timeout, which are 10
> and 500 (respectively). To me this looks like the DRBD software is
> lagging in replying to the PingAck, am I right on this? If I increase
> the ping-timeout to something bigger like 1000 or 2000, will it solve
> this problem?
>
> My eth0 settings are (ethtool output):
>
> Settings for eth0:
> Supported ports: [ FIBRE ]
> Supported link modes: 1000baseT/Full
> Supports auto-negotiation: Yes
> Advertised link modes: 1000baseT/Full
> Advertised auto-negotiation: Yes
> Speed: 1000Mb/s
> Duplex: Full
> Port: FIBRE
> PHYAD: 2
> Transceiver: internal
> Auto-negotiation: on
> Supports Wake-on: d
> Wake-on: d
> Link detected: yes
>
> Thanks.
>

OK, I just noticed that the man page says the default ping-timeout
value is 500ms, but in practice DRBD won't load unless the value is
between 1 and 100. So I set it to 100, and now I am getting the PingAck
error many times, just at the WFBitMapT and WFBitMapS states.
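
For what it's worth, the numbers in the log above can be cross-checked with a short script. This is only a rough sketch: the helper names are made up for illustration, it assumes the difference between the two "will sync" figures approximates what the first attempt actually transferred, and it assumes ping-timeout is counted in tenths of a second (which would explain a 500ms default alongside an accepted range of 1 to 100).

```python
from datetime import datetime

# Each bitmap bit covers 4 KB in this log: 113332533 bits * 4 = 453330132 KB.
KB_PER_BIT = 4


def elapsed_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day syslog timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def resync_summary(start, fail, kb_first, kb_second):
    """Summarise the first resync attempt from values copied out of the log."""
    secs = elapsed_seconds(start, fail)
    synced_kb = kb_first - kb_second  # assumption: no new writes in between
    return {
        "seconds": secs,
        "synced_kb": synced_kb,
        "percent": 100.0 * synced_kb / kb_first,
        "kb_per_sec": synced_kb / secs,
    }


def ping_timeout_seconds(value, unit_seconds=0.1):
    """Assumption: the ping-timeout setting is in tenths of a second."""
    return value * unit_seconds


# Values taken from the "Began resync" and "PingAck did not arrive" lines.
summary = resync_summary("09:08:18", "09:53:56", 453330132, 445001204)
print(summary)
print("ping-timeout 5   ->", ping_timeout_seconds(5), "s")
print("ping-timeout 100 ->", ping_timeout_seconds(100), "s")
```

Under those assumptions the first attempt ran for a bit over 45 minutes and covered a little under 2% of the device, which matches what was reported, and works out to roughly 3 MB/s. If the unit assumption holds, a setting of 100 would mean a 10 second timeout rather than 100ms.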