[DRBD-user] "PingAck not received" messages

Felix Frank ff at mpexnet.de
Mon May 21 10:03:04 CEST 2012


Hi,

On 05/18/2012 06:49 PM, Matthew Bloch wrote:
> I set up two pairs of VMs to write 1MB to the DRBD every second, and
> time it.  On the problematic machines, I saw lots of times where the
> write took more than 10s, and a couple of those corresponded with DRBD
> reconnections.  On the normal machines, only two of the writes took more
> than 0.1s!
> 
> So I'm still hunting for what might be going wrong, even though the
> software versions are the same, the drbd links aren't hitting the
> ceiling, they're doing no more I/O than the "good" pairs.  I think next
> will be to take some packet dumps to see if there is anything odd going
> on at the TCP layer.
> 
> If nobody else on the list has seen this sort of behaviour, and Linbit
> have a day rate :-) please get in touch privately, I'd rather get you
> guys to fix this for our customer.
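
For anyone wanting to reproduce the kind of timing test described above, here
is a minimal sketch in Python. It is not Matthew's actual script: the function
names, the choice of fsync to force the data out, and the 0.1s threshold as a
default are all assumptions for illustration. `path` is assumed to be a file on
a filesystem that sits on the DRBD device.

```python
import os
import time

def timed_write(path, size=1024 * 1024):
    """Write `size` bytes to `path`, fsync, and return the elapsed seconds."""
    payload = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        written = 0
        # os.write may write fewer bytes than asked, so loop until done.
        while written < size:
            written += os.write(fd, payload[written:])
        # Force the data to the block device so the replication link is exercised.
        os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)

def run(path, count, interval=1.0, slow=0.1):
    """Do `count` timed writes, roughly one per `interval` seconds.

    Returns a list of (index, seconds) for writes slower than `slow`.
    """
    slow_writes = []
    for i in range(count):
        elapsed = timed_write(path)
        if elapsed > slow:
            slow_writes.append((i, elapsed))
        # Sleep for the remainder of the interval, if any is left.
        time.sleep(max(0.0, interval - elapsed))
    return slow_writes
```

Running something like `run("/mnt/drbd/testfile", 3600)` (hypothetical mount
point) on both a "good" and a "bad" pair and comparing the returned lists with
the reconnect timestamps in the DRBD logs should show whether the slow writes
and the reconnects line up.
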

I did have a couple of VMs with severe network problems. They were based
on the 2.6.33-ish KVM with userland as found in Debian Squeeze.

I'd find more or less frequent lost pings and reconnects in the DRBD
logs. Once every couple of weeks the network stack on the Primary would
completely stop receiving packets (funnily enough, it would still send,
so HA didn't kick in, but that may be a different story).

Switching from virtio to e1000 solved this for me.
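
To check which NIC driver a guest is currently using before and after such a
switch, something like the following sketch should work on Linux. It relies on
the sysfs symlink /sys/class/net/<iface>/device/driver; the function name and
the `sysfs_root` parameter are my own, added only so the lookup can be pointed
at a different directory. With a virtio NIC I would expect it to report
"virtio_net", with an emulated Intel card "e1000".

```python
import os

def nic_driver(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to `iface`, or None if unknown."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        # The symlink points at the driver directory; its last component is the name.
        return os.path.basename(os.readlink(link))
    except OSError:
        # No such interface, or a virtual interface without a backing device.
        return None
```

For example, `nic_driver("eth0")` inside the guest, assuming the interface is
called eth0.
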

Cheers,
Felix


