On Friday 04 April 2008 15:18:21 George H wrote:
> OK I upgraded both my blades to the latest stable kernel 2.6.24.
> Rebuilt drbd 8.0.8 and restarted the sync.
>
> I noticed off hand the connection to get sync was quicker than before
>
> Apr 4 15:49:35 mailserv1 drbd0: conn( Connected -> WFBitMapS )
> Apr 4 15:50:18 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )
>
> normally it used to take 5 or more minutes.
>
> But as that was quick.. so was the "network failure" see below
>
> Apr 4 15:49:35 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 15:49:35 mailserv1 drbd0: Becoming sync source due to disk states.
> Apr 4 15:49:35 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 15:49:35 mailserv1 drbd0: writing of bitmap took 7 jiffies
> Apr 4 15:49:35 mailserv1 drbd0: 476 GB (124997941 bits) marked out-of-sync by on disk bit-map.
> Apr 4 15:49:35 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 15:49:35 mailserv1 drbd0: conn( Connected -> WFBitMapS )
> Apr 4 15:50:18 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )
> Apr 4 15:50:18 mailserv1 drbd0: Began resync as SyncSource (will sync 499991764 KB [124997941 bits set]).
> Apr 4 15:50:18 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 16:03:26 mailserv1 drbd0: PingAck did not arrive in time.
> Apr 4 16:03:26 mailserv1 drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )
> Apr 4 16:03:26 mailserv1 drbd0: asender terminated
> Apr 4 16:03:26 mailserv1 drbd0: drbd_pp_alloc interrupted!
> Apr 4 16:03:26 mailserv1 drbd0: alloc_ee: Allocation of a page failed
> Apr 4 16:03:26 mailserv1 drbd0: error receiving RSDataRequest, l: 24!
> Apr 4 16:03:26 mailserv1 drbd0: tl_clear()
> Apr 4 16:03:26 mailserv1 drbd0: Connection closed
> Apr 4 16:03:26 mailserv1 drbd0: Writing meta data super block now.
> Apr 4 16:03:26 mailserv1 drbd0: conn( NetworkFailure -> Unconnected )
> Apr 4 16:03:26 mailserv1 drbd0: receiver terminated
> Apr 4 16:03:26 mailserv1 drbd0: receiver (re)started
> Apr 4 16:03:26 mailserv1 drbd0: conn( Unconnected -> WFConnection )
> Apr 4 16:03:26 mailserv1 drbd0: Handshake successful: DRBD Network Protocol version 86

OK so that's a very quick disconnection and subsequent reconnection. How often does that occur? Do you ever get network interruptions for longer periods? When you do, what does "tcpdump -i <your replication interface>" say?

I strongly suspect at this point all your DRBD tuning efforts, while admirable, are futile. You really need to fix your network stack first.

Cheers,
Florian

--
: Florian G. Haas
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria

When replying, there is no need to CC my personal address. I monitor the list on a daily basis. Thank you.
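
Editorial addendum: one rough way to answer "how often does that occur?" is to scan syslog for the conn( ... ) transitions quoted in this thread. The Python sketch below is only an illustration. The function names are hypothetical, the regular expression is modelled solely on the log lines shown above (other DRBD versions may format them differently), and the year has to be supplied by hand because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Modelled on lines quoted in the thread, for example:
#   Apr 4 16:03:26 mailserv1 drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ (?P<dev>drbd\d+): "
    r".*conn\( (?P<old>\w+) -> (?P<new>\w+) \)"
)


def conn_transitions(lines, year=2008):
    """Yield (timestamp, device, old_state, new_state) for each conn() change."""
    for line in lines:
        # Tolerate mail-quoted input by dropping leading '>' and spaces.
        m = LINE_RE.search(line.lstrip("> "))
        if m is None:
            continue
        # Syslog timestamps have no year, so the caller provides one (assumption).
        stamp = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        yield stamp, m["dev"], m["old"], m["new"]


def sync_failures(lines, year=2008):
    """Return (failure_time, seconds_since_resync_began) per SyncSource drop."""
    started = {}   # device -> time it last entered SyncSource
    failures = []
    for stamp, dev, old, new in conn_transitions(lines, year):
        if new == "SyncSource":
            started[dev] = stamp
        elif old == "SyncSource" and new == "NetworkFailure":
            begun = started.pop(dev, None)
            elapsed = (stamp - begun).total_seconds() if begun else None
            failures.append((stamp, elapsed))
    return failures


# Sample input copied from the log excerpt in this thread.
sample = [
    "Apr 4 15:49:35 mailserv1 drbd0: conn( Connected -> WFBitMapS )",
    "Apr 4 15:50:18 mailserv1 drbd0: conn( WFBitMapS -> SyncSource )",
    "Apr 4 16:03:26 mailserv1 drbd0: peer( Secondary -> Unknown ) "
    "conn( SyncSource -> NetworkFailure )",
    "Apr 4 16:03:26 mailserv1 drbd0: conn( NetworkFailure -> Unconnected )",
]

for when, elapsed in sync_failures(sample):
    print(when, elapsed)
```

Run against a full syslog file, for instance by passing an open file object as `lines`, this would list each drop out of SyncSource and how long the resync had been running. Counting drops only shows how frequent the problem is; it does not replace the packet capture on the replication interface suggested above.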