Date: Wed, 24 Oct 2012 17:16:21 -0400
From: Whit Blauvelt <whit.drbd at transpect.com>
To: Rasto Levrinc <rasto.levrinc at gmail.com>, drbd-user at lists.linbit.com
Cc: drbd-mc at lists.linbit.com
Subject: Re: [drbd-mc] LCMC display says "up to date" but DRBD is not
User-Agent: Mutt/1.5.21 (2010-09-15)

I wrote:

> > I've got a fairly simple setup, that back some time ago was working well,
> > but at some point has slipped away from me. I have a number of KVM VMs which
> > have been set up by using a distinct LVM partition behind each, and then
> > using DRBD to mirror these between two servers via a dedicated crossover.
> > There are 6-8 VMs on each of the two servers, with both dedicated LVMs and
> > dedicated DRBD resources. I've been using current versions of LCMC along the
> > way to set up the DRBD mirroring. The LVMs have been set up using native
> > tools, and the KVM VMs through libvirt.
> >
> > To put the problem briefly, I've recently discovered, on shutting down VMs
> > on one server and then restarting the VMs on the other, after shifting DRBD
> > primary assignments, that the secondary DRBD storage has not kept up. This
> > is despite Connected/UpToDate claims in the Storage display of LCMC.
> The display in LCMC should be ok. Your problem is probably either your
> config or an administration error at some point, forcing the DRBD to think
> the data are up-to-date. You can run online verify to check if your
> secondary has the same data as primary, before finding out the hard way. For
> DRBD specific questions, you should ask in drbd-user mailing list.
>
> Rasto

Thanks Rasto,

Including the drbd list now.

I'm certainly capable of administrative error. And the reporting of UpToDate
when the filesystems are definitely not is deeper than LCMC - drbd-overview
shows the same thing, "Connected UpToDate/UpToDate" even though the mirror
doesn't match. "cat /proc/drbd" gives the same misinformation.
"drbdadm cstate xxx" also gives "Connected".
And "drbdadm dstate cent_s" gives "UpToDate/UpToDate" on both servers.

A problem with the "administrative error" hypothesis is that the DRBD
administration has been, beyond the initial installation, entirely through
LCMC. That is, it's a problem for LCMC (perhaps an older version though) if
it allows an admin's error that results in false reports of up-to-date
connections.

Using online verify also confirms that we're not at all up to date:

Oct 24 16:49:24 vm1 kernel: [5730169.131424] block drbd0: conn( Connected -> VerifyS )
Oct 24 16:49:24 vm1 kernel: [5730169.131434] block drbd0: Starting Online Verify from sector 0
Oct 24 16:49:24 vm1 kernel: [5730169.185546] block drbd0: Out of sync: start=584, size=8 (sectors)
Oct 24 16:49:24 vm1 kernel: [5730169.188980] block drbd0: Out of sync: start=1112, size=16 (sectors)
Oct 24 16:49:24 vm1 kernel: [5730169.236967] block drbd0: Out of sync: start=64, size=8 (sectors)
Oct 24 16:49:24 vm1 kernel: [5730169.630823] block drbd0: Out of sync: start=32832, size=8 (sectors)

... on for 947 lines of such notices in this case.

Disconnecting and reconnecting the secondary should cause a resync per the
manual. Okay. But that's not preventing the problem redeveloping - not
identifying and correcting the cause.

To review how these were administratively set up: An LVM partition was used
as a backing store in creating each VM. A matching LVM partition was created
on the second server. LCMC was used at that point to assign both to DRBD,
using the data from the first LVM. It is initially working, or else the
secondary wouldn't be populated at all. But it stops working at some point,
while leaving DRBD showing that everything's just fine - short of running
online verify or doing the disconnect-reconnect sequence.

I could script disconnect-reconnect behavior overnight. That still wouldn't
guarantee good mirrors in between, so DRBD still can't be 100% depended on
for failover then.
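
For anyone wanting to quantify the online verify output rather than count
log lines by hand, here is a minimal sketch in Python. It assumes kernel log
lines in exactly the format quoted above. The function names
(summarize_out_of_sync, resync_commands) are hypothetical and not part of
DRBD or LCMC. The second function only builds the "drbdadm disconnect" and
"drbdadm connect" command lines for the disconnect-reconnect sequence
mentioned above; it deliberately does not run them.

```python
import re

# Matches the kernel messages quoted above, for example:
#   block drbd0: Out of sync: start=584, size=8 (sectors)
OUT_OF_SYNC = re.compile(
    r"block (?P<dev>drbd\d+): Out of sync: "
    r"start=(?P<start>\d+), size=(?P<size>\d+) \(sectors\)"
)

# Sample input copied from the log excerpt in this message.
SAMPLE = [
    "Oct 24 16:49:24 vm1 kernel: [5730169.131424] block drbd0: conn( Connected -> VerifyS )",
    "Oct 24 16:49:24 vm1 kernel: [5730169.185546] block drbd0: Out of sync: start=584, size=8 (sectors)",
    "Oct 24 16:49:24 vm1 kernel: [5730169.188980] block drbd0: Out of sync: start=1112, size=16 (sectors)",
    "Oct 24 16:49:24 vm1 kernel: [5730169.236967] block drbd0: Out of sync: start=64, size=8 (sectors)",
    "Oct 24 16:49:24 vm1 kernel: [5730169.630823] block drbd0: Out of sync: start=32832, size=8 (sectors)",
]


def summarize_out_of_sync(log_lines):
    """Return {device: (range_count, total_sectors)} for 'Out of sync' lines.

    Lines that do not match the pattern (state changes, etc.) are ignored.
    """
    summary = {}
    for line in log_lines:
        match = OUT_OF_SYNC.search(line)
        if match is None:
            continue
        dev = match.group("dev")
        count, sectors = summary.get(dev, (0, 0))
        summary[dev] = (count + 1, sectors + int(match.group("size")))
    return summary


def resync_commands(resource):
    """Build (but do not execute) the disconnect/reconnect command lines."""
    return [
        ["drbdadm", "disconnect", resource],
        ["drbdadm", "connect", resource],
    ]


if __name__ == "__main__":
    print(summarize_out_of_sync(SAMPLE))
    for cmd in resync_commands("cent_s"):
        print(" ".join(cmd))
```

Run against the four "Out of sync" lines quoted above, this reports 4 ranges
totalling 40 sectors on drbd0. Feeding it the full log would give the same
totals for all 947 notices; tracking that number per night would at least
show how fast the mirrors drift, though it does nothing to identify the
cause.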

This is not the most up-to-date system, drbd version 188.8.131.52. Still....

Whit