On Wed, Oct 24, 2012 at 05:44:49PM -0400, Whit Blauvelt wrote:
>
> Date: Wed, 24 Oct 2012 17:16:21 -0400
> From: Whit Blauvelt <whit.drbd at transpect.com>
> To: Rasto Levrinc <rasto.levrinc at gmail.com>, drbd-user at lists.linbit.com
> Cc: drbd-mc at lists.linbit.com
> Subject: Re: [drbd-mc] LCMC display says "up to date" but DRBD is not
> User-Agent: Mutt/1.5.21 (2010-09-15)
>
> I wrote:
>
> > > I've got a fairly simple setup, that back some time ago was working well,
> > > but at some point has slipped away from me. I have a number of KVM VMs which
> > > have been set up by using a distinct LVM partition behind each, and then
> > > using DRBD to mirror these between two servers via a dedicated crossover.
> > > There are 6-8 VMs on each of the two servers, with both dedicated LVMs and
> > > dedicated DRBD resources. I've been using current versions of LCMC along the
> > > way to set up the DRBD mirroring. The LVMs have been set up using native
> > > tools, and the KVM VMs through libvirt.
> > >
> > > To put the problem briefly, I've recently discovered, on shutting down VMs
> > > on one server and then restarting the VMs on the other, after shifting DRBD
> > > primary assignments, that the secondary DRBD storage has not kept up. This
> > > is despite Connected/UpToDate claims in the Storage display of LCMC.
> >
> > The display in LCMC should be ok. Your problem is probably either your
> > config or an administration error at some point, forcing the DRBD to think
> > the data are up-to-date. You can run online verify to check if your
> > secondary has the same data as primary, before finding out the hard way. For
> > DRBD specific questions, you should ask in drbd-user mailing list.
> >
> > Rasto
>
> Thanks Rasto,
>
> Including the drbd list now.
>
> I'm certainly capable of administrative error.

One such "administrative error" we've come across much too frequently,
and which shows exactly these "symptoms", is this:
(I'm in the ascii art mood today...)

You at one point had:
---------------------

   VM
    \
     \
   [logical volume]


Then you added DRBD, and now you have:
--------------------------------------

   VM        [DRBD] -------- [DRBD] remote node
    \         /
     \       /     !! THIS IS WRONG !!
   [logical volume]

   (DRBD does not see or know about any changes done by VM)


But what you need is actually:
==============================

   VM
    |
   [DRBD] -------- [DRBD] remote node
    /
   [logical volume]

   (DRBD sees every change done by the VM,
    and thus has a chance to mirror the changes over).


Cheers,

	Lars

> And the reporting of UpToDate
> when the filesystems are definitely not is deeper than LCMC - drbd-overview
> shows the same thing, "Connected UpToDate/UpToDate" even though the mirror
> doesn't match. "cat /proc/drbd" gives the same misinformation. "drbdadm
> cstate xxx" also gives "Connected". And "drbdadm dstate cent_s" gives
> "UpToDate/UpToDate" on both servers.
>
> A problem with the "administrative error" hypothesis is that the DRBD
> administration has been, beyond the initial installation, entirely through
> LCMC. That is, it's a problem for LCMC (perhaps an older version though) if
> it allows an admin's error that results in false reports of up-to-date
> connections.
>
> Using online verify also confirms that we're not at all up to date:
>
> Oct 24 16:49:24 vm1 kernel: [5730169.131424] block drbd0: conn( Connected -> VerifyS )
> Oct 24 16:49:24 vm1 kernel: [5730169.131434] block drbd0: Starting Online Verify from sector 0
> Oct 24 16:49:24 vm1 kernel: [5730169.185546] block drbd0: Out of sync: start=584, size=8 (sectors)
> Oct 24 16:49:24 vm1 kernel: [5730169.188980] block drbd0: Out of sync: start=1112, size=16 (sectors)
> Oct 24 16:49:24 vm1 kernel: [5730169.236967] block drbd0: Out of sync: start=64, size=8 (sectors)
> Oct 24 16:49:24 vm1 kernel: [5730169.630823] block drbd0: Out of sync: start=32832, size=8 (sectors)
> ... on for 947 lines of such notices in this case.
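
To put a number on verify output like the above, the "Out of sync"
lines can be tallied with a short script. This is only a sketch, not
part of DRBD: the function name summarize_out_of_sync and the regular
expression are made up for illustration and written against the message
format quoted above, and the byte figure assumes 512-byte sectors.

```python
import re

# Pattern for the "Out of sync" kernel messages in the form quoted above.
# Written against that sample only; other DRBD versions may word it differently.
OOS_RE = re.compile(
    r"block (drbd\d+): Out of sync: start=(\d+), size=(\d+) \(sectors\)"
)


def summarize_out_of_sync(log_lines):
    """Return {device: (extent_count, total_sectors)} for matching lines."""
    totals = {}
    for line in log_lines:
        match = OOS_RE.search(line)
        if match is None:
            continue  # not an out-of-sync report, e.g. the VerifyS state change
        device = match.group(1)
        size = int(match.group(3))
        count, sectors = totals.get(device, (0, 0))
        totals[device] = (count + 1, sectors + size)
    return totals


# The sample lines from the message above.
sample = [
    "Oct 24 16:49:24 vm1 kernel: [5730169.131424] block drbd0: conn( Connected -> VerifyS )",
    "Oct 24 16:49:24 vm1 kernel: [5730169.131434] block drbd0: Starting Online Verify from sector 0",
    "Oct 24 16:49:24 vm1 kernel: [5730169.185546] block drbd0: Out of sync: start=584, size=8 (sectors)",
    "Oct 24 16:49:24 vm1 kernel: [5730169.188980] block drbd0: Out of sync: start=1112, size=16 (sectors)",
    "Oct 24 16:49:24 vm1 kernel: [5730169.236967] block drbd0: Out of sync: start=64, size=8 (sectors)",
    "Oct 24 16:49:24 vm1 kernel: [5730169.630823] block drbd0: Out of sync: start=32832, size=8 (sectors)",
]

summary = summarize_out_of_sync(sample)
for device, (count, sectors) in sorted(summary.items()):
    # Assumption: sizes are reported in 512-byte sectors.
    print(f"{device}: {count} extents, {sectors} sectors, {sectors * 512} bytes")
```

Fed the full kernel log instead of the four sample lines, the same
function would give a per-device total, which makes it easier to tell
whether the divergence is growing between verify runs.
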
>
> Disconnecting and reconnecting the secondary should cause a resync per the
> manual. Okay. But that's not preventing the problem redeveloping - not
> identifying and correcting the cause.
>
> To review how these were administratively set up: An LVM partition was used
> as a backing store in creating each VM. A matching LVM partition was created
> on the second server. LCMC was used at that point to assign both to DRBD,
> using the data from the first LVM.
>
> It is initially working, or else the secondary wouldn't be populated at all.
> But it stops working at some point, while leaving DRBD showing that
> everything's just fine - short of running online verify or doing the
> disconnect-reconnect sequence. I could script disconnect-reconnect behavior
> overnight. That still wouldn't guarantee good mirrors in between, so DRBD
> still can't be 100% depended on for failover then.
>
> This is not the most up-to-date system, drbd version 8.3.8.1. Still....
>
> Whit
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
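
The layering mistake drawn in the diagrams above can also be checked
mechanically: look at which block device each VM definition actually
opens. The following is a rough sketch under stated assumptions, not a
supported tool. It parses a libvirt domain XML document (for example
the output of "virsh dumpxml") and lists disk sources that are not DRBD
devices. The function name disks_bypassing_drbd, the "/dev/drbd" prefix
test, and the sample paths /dev/vg0/vm_disk and /dev/drbd0 are all
hypothetical and would need adjusting to the local naming.

```python
import xml.etree.ElementTree as ET


def disks_bypassing_drbd(domain_xml, drbd_prefix="/dev/drbd"):
    """List disk source paths in a libvirt domain XML that are not DRBD devices.

    Assumption: DRBD devices are reachable under paths starting with
    drbd_prefix (e.g. /dev/drbd0 or /dev/drbd/by-res/...).
    """
    root = ET.fromstring(domain_xml)
    bypassing = []
    for disk in root.findall("./devices/disk"):
        if disk.get("device") != "disk":
            continue  # skip cdrom, floppy, and similar
        source = disk.find("source")
        if source is None:
            continue
        # Block-backed disks use dev=..., file-backed disks use file=...
        path = source.get("dev") or source.get("file")
        if path and not path.startswith(drbd_prefix):
            bypassing.append(path)
    return bypassing


# Hypothetical VM opening the logical volume directly (the WRONG layout).
SAMPLE_WRONG = """
<domain type='kvm'>
  <devices>
    <disk type='block' device='disk'>
      <source dev='/dev/vg0/vm_disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

# Hypothetical VM opening the DRBD device stacked on the logical volume.
SAMPLE_RIGHT = SAMPLE_WRONG.replace("/dev/vg0/vm_disk", "/dev/drbd0")

print(disks_bypassing_drbd(SAMPLE_WRONG))
print(disks_bypassing_drbd(SAMPLE_RIGHT))
```

Any path reported for a VM whose storage is supposed to be mirrored is
worth a closer look, since writes to it never pass through DRBD.
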