[DRBD-user] "PingAck not received" messages

Florian Haas florian at hastexo.com
Mon May 21 08:14:31 CEST 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Matthew,

On Wed, May 16, 2012 at 10:11 PM, Matthew Bloch <matthew at bytemark.co.uk> wrote:
> I'm trying to understand a symptom for a client who uses drbd to run
> sets of virtual machines between three pairs of servers (v1a/v1b,
> v2a/v2b, v3a/v3b), and I wanted to understand a bit better how DRBD I/O
> is buffered depending on which mode is chosen and what the buffer
> settings are.

When you say virtual machines, how exactly are they being virtualized?
VMware? Libvirt/KVM? Xen?

> Firstly, it surprised me that even in replication mode "A", the system
> still seemed limited by the bandwidth between nodes.

Not surprising; bandwidth is always a limiting factor. In protocol A,
if your send buffers don't drain quickly enough, all DRBD can do is
block incoming I/O. DRBD Proxy helps you with compression and by
buffering write bursts, but if your system is getting constantly
hammered to the point where even the _average_ throughput is higher
than the bandwidth of the replication link (minus DRBD Proxy
compression, of course), you can still get it to block. Not too many
people actually run into this, but if you're trying to write to a
DRBD device at, say, 100 MB/s, and all you've got between sites is
2 Mbps, then that's not going to work. :)
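
To make that concrete, here's a minimal sketch of what a protocol A
resource might look like; the resource name, host names, devices and
addresses below are made up, so adjust them to your setup:

  resource r0 {
    protocol A;             # async: a write is "done" locally once it has
                            # hit the local disk and the TCP send buffer
    net {
      sndbuf-size 0;        # 0 = let the kernel auto-tune (more below)
    }
    on node-a {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.1:7789;
      meta-disk internal;
    }
    on node-b {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7789;
      meta-disk internal;
    }
  }

Once that send buffer is full, even protocol A has to throttle the
application until it drains.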


>  I found this
> out when the customer's bonded interface had flipped over to its 100Mb
> backup connection, and suddenly they had I/O problems.  While I was
> investigating this and running tests, I noticed that switching to mode A
> didn't help, even when measuring short transfers that I'd expect would
> fit into reasonable-sized buffers.  What kind of buffer size can I
> expect from an "auto-tuned" DRBD?  It seems important to be able to
> cover bursts without leaning on the network, so I'd like to know whether
> that's possible with some special tuning.

With sndbuf-size 0 and rcvbuf-size 0, DRBD lets the kernel auto-tune
the socket buffers. Refer to your TCP rmem/wmem sysctls to get an idea
of the buffer sizes you'll end up with.
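
To see what the kernel will give you, and to pin the send buffer
explicitly if you prefer, something along these lines (the 2M value and
the resource name r0 are placeholders, not recommendations):

  # auto-tuning limits: min / default / max, in bytes
  sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem

  # or set a fixed DRBD send buffer in the resource's net section,
  # e.g. in /etc/drbd.d/r0.res:
  #
  #   net {
  #     sndbuf-size 2M;
  #   }
  #
  # and apply it to the running resource:
  drbdadm adjust r0

Keep in mind that a bigger send buffer only absorbs bursts; it does
nothing for sustained writes above what the link can carry.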


> And secondly, what should I be doing about it?  My unsatisfactory
> response to the customer's worry is to reconnect all the drbds with a
> longer ping-timeout, and in 10 hours it hasn't reoccurred, which is an
> unusually long record.  I will be more convinced by the end of the day.

My hunch would be to start troubleshooting the virtualized network
layer. Hence my question above as to what virtualization you're using.
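
As for the longer ping-timeout: that's a net option too, and its value
is in tenths of a second (the default of 5 means 500 ms). A sketch,
again assuming a resource named r0:

  net {
    ping-timeout 30;    # wait 3 seconds for the PingAck instead of 500 ms
  }

  # apply without taking the resource down:
  drbdadm adjust r0

That may well make the "PingAck not received" messages go away, but if
the underlying (virtual) network is dropping or delaying packets, you
are only masking the symptom.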

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now


