[DRBD-user] DRBD constantly re-syncing, getting to 100%, starting over. What?

Lars Ellenberg lars.ellenberg at linbit.com
Tue Oct 18 20:16:52 CEST 2016


On Fri, Oct 14, 2016 at 07:07:55AM +0000, Eric Robinson wrote:
> > > > Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget
> > (will sync 0 KB [0 bits set]).
> > > > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in
> > time.
> > > > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary ->
> > > > Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate ->
> > > > DUnknown )
> > 
> > As has been said before:
> > The DRBD ping timeout is apparently too short for the latency in your setup.
> > Increase it appropriately.
> > 
> > Where latency in this case involves network rtt plus kernel thread scheduling
> > plus maybe additional synchronous (flush/fua) IO plus whatever else DRBD
> > feels is necessary for a full DRBD to DRBD round-trip.
> > 
> > > > However, I can guarantee that the network connection is solid.
> > > > Running ping flood, I get 30,000 packets sent with no loss or
> > > > latency.
> > 
> > Mind telling us the network characteristics?  IO backend?
> > Virtualized?  Distribution? Kernel and DRBD version(s)?
> > 
> 
> We have a dozen other DRBD clusters and this has never happened to any
> of the others over the past decade or so, and they are all on the same
> switched network. The nodes are in different data centers 22 miles
> apart connected by gigabit fiber. Latency is always sub-millisecond.
> See the following ping test...
> 
> [root@ha14a ~]# ping -f ha14b-cl
> PING ha14b-cl.mycharts.md (198.51.100.43) 56(84) bytes of data.
> .^C
> --- ha14b-cl.mycharts.md ping statistics ---
> 23433 packets transmitted, 23432 received, 0% packet loss, time 15911ms
> rtt min/avg/max/mdev = 0.585/0.659/0.847/0.021 ms, ipg/ewma 0.679/0.658 ms
> 
> The servers are all physical, running RHEL 6.3 kernel 2.6.32-279.el6.x86_64. SSD drives.
> 
> DRBD version is 8.4.3


So, did you try to increase the ping timeout setting?
Did it help?
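
For reference, a minimal sketch of what raising the ping timeout could
look like in the resource configuration. The resource name ha02_mysql is
taken from the log lines quoted above; the value 30 is only an
illustrative assumption, not a figure tuned for this setup. In DRBD 8.4,
ping-timeout belongs in the net section and is given in tenths of a
second (default 5, i.e. 500 ms).

```
resource ha02_mysql {
    net {
        # unit is tenths of a second: 30 = 3 seconds (illustrative value only)
        ping-timeout 30;
    }
    # ... existing volume and host sections stay unchanged ...
}
```

Once the same change is in place on both nodes, "drbdadm adjust ha02_mysql"
should apply it to the running resource.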

Did you try to upgrade to DRBD 8.4.8?
Did that help?


-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed


