Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 24/05/12 13:54, Florian Haas wrote:
> On Thu, May 24, 2012 at 1:53 PM, Matthew Bloch <matthew at bytemark.co.uk> wrote:
>> Hmm, thanks. Unrelated to any of this, the v3a kernel (Debian 2.6.32-4)
>> crashed pretty badly 48hrs ago. Since it has been rebooted, there have
>> been no "PingAck not received" messages.
>
> Sure, if you have kernel-induced network problems on one of your
> nodes, that would definitely explain the issues you're seeing. But you
> insisted from the start that there were no network issues. :)

No indeed - nothing external that we could detect after hours of layer 2
tracing, and no messages that would indicate a malfunction on either of
the hosts. But this network problem was only visible through DRBD's
messages, and now that it's gone it's hard to reason about it any further
(not that I miss it). As I said, I couldn't see any symptoms via ICMP or
TCP-based tests between the hosts.

>> We are preparing to jump to a 2.6.32 sourced from CentOS because this
>> Debian kernel seems to crash with one bug or another every few months.
>
> That would seem like an odd thing to do. FWIW, we've been running
> happily on squeeze kernels for months.

Then you've not hit the "scheduler divide by zero" bug, or the "I/O
frozen for 120s for no reason" bug, or the "CPU#x stuck for 9999999s"
bug? These are all filed vaguely on the Red Hat bug trackers, as far as
I know, and usually closed a few kernel versions later with "well, I
haven't seen it for a few kernel versions so it's probably OK"! They are
relatively rare bugs - except for some of our customers, for whom they
aren't rare at all, and whom we haul up to e.g. whatever kernel wheezy
has. Except in this case that broke the bridging code in 3.2.0, which is
going to cause a virtualising customer some problems :-)

>> The reason we're using external meta-devices is for backup: without the
>> metadata at the end, the underlying disk image represents exactly what
>> the VMs see. We can then snapshot this and take a reasonably consistent
>> backup without bothering DRBD. We later verify this backup by booting it
>> back up, disconnected, and taking a snapshot of the VNC console!
>
> You can always do that from a device with metadata as well. kpartx is
> your friend.

Sure, but we don't pay any penalty for doing it externally either. It's
all on LVM and proper battery-backed RAID.

>> The reason I picked protocol B is because LVM snapshots kill the local
>> DRBD performance if we snapshot the LVM device underlying the DRBD
>> Primary. If we snapshot the Secondary and use protocol B, where we
>> aren't dependent on local write speeds, my working theory was that the
>> performance hit wouldn't be as noticeable, and the customer seemed to
>> concur (previously we were using C).
>
> That's a fair point, but realistically, how long does it take you to
> take the backup off your snapshot?

10-60 minutes per system - long enough that the I/O-sensitive VMs notice.
And the customer has customers who are up 24 hours a day, so there is no
reliable "quiet time" when we can reduce their I/O bandwidth and not have
it commented on.

> And does this normally coincide
> with the DRBD device getting hammered, which is pretty much the only
> situation in which a downstream client would likely feel any
> disruption?

The DRBDs don't really get hammered at any one time - the backups happen
directly from LVs on the host, and go over the main (not replication)
interface. So the host system's I/O is stressed, sure.
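For the archive, the relevant parts of the resource config look roughly
like this - resource, host and device names here are illustrative, not
our exact configuration:

    resource vm1 {
      protocol B;                          # previously C, see above
      on host-a {
        device    /dev/drbd0;
        disk      /dev/vg0/vm1_disk;       # LV holds exactly what the VM sees
        meta-disk /dev/vg0/vm1_meta[0];    # external metadata, kept off the image
        address   10.0.0.1:7788;
      }
      on host-b {
        # mirror of the above with its own disk, meta LV and address
      }
    }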
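And the backup flow on the Secondary is roughly the following (LV names,
sizes and the backup host are illustrative; with internal metadata you
would add a kpartx -av step on the snapshot to expose the partitions, as
Florian suggests):

    # snapshot the LV underneath the DRBD Secondary, not the Primary
    lvcreate --snapshot --size 10G --name vm1_backup /dev/vg0/vm1_disk

    # copy the image off over the main (non-replication) interface
    dd if=/dev/vg0/vm1_backup bs=4M | ssh backuphost 'cat > /backup/vm1.img'

    # drop the snapshot so it stops costing us write performance
    lvremove -f /dev/vg0/vm1_backup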
Previously the disconnects happened several times a day, not just when
the backups ran - that is a separate issue from the one I asked about,
while still being relevant to the list. Arguably a customer running a
heavily interactive system to very remote destinations shouldn't be
using such a complex I/O stack and should be on dedicated hardware - but
that's a pragmatic, expensive, unambitious argument :-) drbd+LVM has
worked very well for them for 18 months, and the peace of mind of being
able to start their customers' VMs in one of two places makes diagnosing
this properly worth the effort.

-- 
Matthew